Lucene

From WhyNotWiki


http://en.wikipedia.org/wiki/Lucene

Lucene is a free, open-source information retrieval library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and released under the Apache License. Lucene has been ported to other programming languages, including Perl, C#, C++, Python, Ruby and PHP.

Although Lucene suits any application that requires full-text indexing and search, it is best known for its use in internet search engines and local, single-site search. This has occasionally led to the misperception that Lucene is itself a search engine with built-in crawling and HTML parsing. It is not: an application that uses Lucene must supply crawling and parsing itself.
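To make the division of labor concrete, here is a minimal sketch of the kind of HTML-to-text step an application would have to supply before handing anything to an indexer. It uses only Python's standard html.parser module; the names TextExtractor and extract_text are hypothetical, and nothing here is part of Lucene.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>.

    Hypothetical helper for illustration; not part of Lucene.
    """

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, extract_text("<h1>Lucene</h1><p>Full text <b>search</b></p>") returns "Lucene Full text search". A real application would also need a crawler and more robust parsing; this only shows where that responsibility sits.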

At the core of Lucene's logical architecture is the notion of a document containing fields of text. Because Lucene only ever sees these text fields, its API is independent of file format: text from PDF, HTML, Microsoft Word and many other formats can be indexed, as long as the textual content can be extracted first.
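The document-and-fields model can be illustrated with a toy inverted index. This is a sketch in plain Python, not Lucene's API: the class ToyIndex, its methods, and the field names "title" and "body" are all assumptions made up for the example, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy inverted index over documents made of named text fields.

    Illustrative only; this is not how Lucene is implemented or called.
    """

    def __init__(self):
        # Maps (field, term) -> set of ids of documents containing the term.
        self._postings = defaultdict(set)
        self._docs = []

    @staticmethod
    def _tokenize(text):
        # Naive analysis: lowercase, then split on non-alphanumerics.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_document(self, fields):
        """Index a document given as a {field_name: text} mapping."""
        doc_id = len(self._docs)
        self._docs.append(dict(fields))
        for field, text in fields.items():
            for term in self._tokenize(text):
                self._postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, query):
        """Return sorted ids of documents whose field contains every query term."""
        terms = self._tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            # .get avoids creating empty entries for unknown terms.
            ids = self._postings.get((field, term), set())
            result = set(ids) if result is None else result & ids
        return sorted(result)
```

The index never learns where the text came from. A document whose "body" was extracted from a PDF and one whose "body" came from a Word file are added with the same add_document call, which is the sense in which the model is format-agnostic.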

Lucene is used for full-text search on Wikipedia, so it could probably be used by other MediaWiki sites as well.
