Lucene 3 Extension
The Lucene Search Extension adds full-text search to Lucee using Apache Lucene — no external search service, API keys, or monthly bill.
Version 3.0+ (the Maven-based extension) works with Lucee 6.2+ for traditional keyword search. Lucee 7.0+ unlocks vector and hybrid search, content passage extraction for RAG, and tighter integration with Lucee's AI features.
Getting started with search? The Adding Full Text Search to Your Application recipe walks through creating collections, indexing files and database content, and running searches. This recipe covers what is new in version 3: collection modes, embeddings, context passages, and how Lucee itself uses the extension.
Installation
Install via the Lucee Administrator under Services > Search, or see Extension Installation for all options (Dockerfile, deploy directory, env var, .CFConfig.json).
- Maven GAV: org.lucee:lucene-search-extension
- Extension ID: EFDEB172-F52E-4D84-9CD1A1F561B3DFC8
- Source: github.com/lucee/extension-lucene
- Issues: Jira (label: cfsearch)
- Docs category: docs.lucee.org/categories/search.html
Check that the extension is installed and at version 3+:
var extId = "EFDEB172-F52E-4D84-9CD1A1F561B3DFC8";
if ( extensionExists( extId ) ) {
    var info = extensionInfo( extId );
    echo( "Lucene Search #info.version# installed" );
}
Pin the version in production — see Extension Installation for why unpinned extensions can resolve to a different version after redeploy.
Collection Modes
When you create a collection, the mode attribute controls how documents are indexed and searched:
| Mode | Since | Description |
|---|---|---|
| keyword | 6.2 | Traditional full-text search (default). Best for exact term and phrase matching. |
| vector | 7.0 | Semantic search using document embeddings. Finds conceptually similar content even when keywords differ. |
| hybrid | 7.0 | Combines keyword and vector scoring. Usually the best choice for natural-language queries and RAG. |
Vector and hybrid collections also need an embedding method (see below) and optionally a ratio for hybrid weighting.
Keyword collection (default)
cfcollection(
    action="create",
    collection="helpdesk",
    path=expandPath( "{lucee-config-dir}/collections/helpdesk" )
);
This is the same model described in Adding Full Text Search to Your Application — index files, paths, URLs, or query results, then search with cfsearch.
Hybrid collection
Lucee's own documentation search uses a hybrid collection with TF-IDF embeddings:
cfcollection(
    action="create",
    collection="lucee-documentation",
    path=expandPath( "{lucee-config-dir}/doc/search" ),
    mode="hybrid",
    embedding="TF-IDF",
    ratio="0.5"
);
The ratio controls how much vector vs keyword scoring contributes in hybrid mode:
- 0.5: equal weight (default)
- Above 0.5: more emphasis on semantic/vector matches
- Below 0.5: more emphasis on keyword/exact matches
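One way to picture the ratio is as a linear blend of a keyword score and a vector score. The sketch below is an assumption for illustration only: the extension's actual scoring formula is not described here, `blend_scores` is a hypothetical helper, the 0 to 1 range for the ratio is assumed, and Python is used simply as executable pseudocode.

```python
def blend_scores(keyword_score: float, vector_score: float, ratio: float = 0.5) -> float:
    """Hypothetical linear blend; NOT the extension's documented formula.

    ratio is the share given to the vector score, so ratio > 0.5 favours
    semantic matches and ratio < 0.5 favours keyword matches.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1 (assumed range)")
    return ratio * vector_score + (1.0 - ratio) * keyword_score


# A document with a strong keyword match but a weak semantic match:
# its blended score drops as the ratio moves toward the vector side.
for ratio in (0.2, 0.5, 0.8):
    blended = blend_scores(keyword_score=0.9, vector_score=0.3, ratio=ratio)
    print(ratio, round(blended, 2))
```

Under this assumed model, a lower ratio keeps exact-term hits near the top, while a higher ratio lets conceptually related documents overtake them.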
Vector collection
cfcollection(
    action="create",
    collection="articles",
    path=expandPath( "{lucee-config-dir}/collections/articles" ),
    mode="vector",
    embedding="word2vec"
);
Embedding Methods
Embeddings turn text into numeric vectors for semantic search. Lucene 3 supports:
| Value | Type | Notes |
|---|---|---|
| TF-IDF | Statistical | Fast, no external files. Good default for hybrid collections. |
| word2vec | Neural (GloVe) | Better semantic relationships. Loads pre-trained vectors from disk. |
| /path/to/vectors.txt | Custom file | Any path containing / or \ is treated as a GloVe-format vectors file. |
By default, word2vec looks for .txt files in {lucee-server}/context/search/embedding/ and uses the largest one found. Pre-trained GloVe vectors are available from Stanford NLP.
You can also load a custom EmbeddingService Java class by passing its fully qualified class name as the embedding value.
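To make "statistical embedding" concrete, here is a minimal, generic TF-IDF sketch with cosine similarity. It is not the extension's implementation: the tokenizer (whitespace split), the smoothed IDF formula, and the helper names `tfidf_vectors` and `cosine` are all assumptions chosen for brevity, and Python is used only as executable pseudocode.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["configure cache settings", "cache configuration guide", "upload a file"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: both documents contain "cache"
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

In this toy version "configure" and "configuration" count as unrelated terms, which illustrates why neural vectors such as word2vec/GloVe can capture relationships that a purely term-based embedding misses.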
Indexing Content
All indexing goes through cfindex. The Adding Full Text Search to Your Application recipe covers type="path", type="file", and type="custom" in detail. Two additional patterns are worth highlighting:
Index from a URL (web crawl)
Crawl a website and index linked pages:
cfindex(
    action="update",
    collection="website",
    type="url",
    key="https://example.com/docs/",
    extensions=".html,.htm,.cfm",
    recurse="yes"
);
The built-in web crawler follows links within the same host, respects the extensions filter, and honours the request timeout.
Incremental indexing with content hashes
When indexing large, slowly changing datasets, avoid re-indexing everything on every request. Lucee's debug documentation reference stores a content hash in custom4 and only re-indexes when the hash changes:
// Build or load your content query
var qryRecipes = queryExecute( "SELECT id, title, body, keywords FROM recipes WHERE active = 1" );
var contentHash = hash( qryRecipes.toString(), "quick" );

// Check whether this version is already indexed
cfindex( action="list", collection="lucee-documentation", name="indexes" );
var needsUpdate = true;
loop query="indexes" {
    if ( indexes.custom4 == "hash:" & contentHash ) {
        needsUpdate = false;
        break;
    }
}

// Re-index only when no stored hash matches the current content
if ( needsUpdate ) {
    cfindex(
        action="update",
        collection="lucee-documentation",
        type="custom",
        query="qryRecipes",
        key="id",
        title="title",
        body="body,keywords",
        custom1="keywords",
        custom4="hash:" & contentHash
    );
}
The same pattern works for indexing function/tag reference data, recipe files, or any other source you can represent as a query.
Searching
Basic keyword search is covered in Adding Full Text Search to Your Application. This section focuses on v3 features.
Context passages (for RAG and AI)
Since 3.0, cfsearch can return multiple scored passages from each matching document — not just a single context snippet. This is what powers Lucee's documentation AI assistant and the AI Augmentation with Lucene pattern.
cfsearch(
    collection="lucee-documentation",
    criteria=form.searchTerm,
    name="results",
    maxrows=3,
    contextPassages=3,
    contextBytes=3000,
    contextPassageLength=1000,
    contextHighlightBegin="<mark>",
    contextHighlightEnd="</mark>",
    suggestions="always"
);
Each result row includes a context struct with a passages query. Each passage has:
| Column | Description |
|---|---|
| start | Start position in the original document text |
| end | End position in the original document text |
| score | Lucene relevance score for this passage |
| original | The passage text (with highlighting if configured) |
loop query="results" {
    echo( "<h3>#results.title#</h3>" );
    loop query="results.context.passages" {
        echo( "<p>#results.context.passages.original#</p>" );
    }
}
Loader requirement: Context highlighting attributes (contextHighlightBegin, contextHighlightEnd, contextPassages, contextPassageLength, contextBytes) require Lucee 7.0.3.30+ (or 6.2.6.11+) with extension 3.0.0.168+. On older loaders the extension still works, but these attributes fall back to defaults.
Augmenting AI queries
Lucee 7's debug documentation reference combines Lucene search with LuceeInquiryAISession() — search first, then attach the best passages as context for the LLM:
function augmentSearchCriteria( required string criteria ) {
    // Escape Lucene special characters in user input
    criteria = rereplace( criteria, '([+\-&|!(){}\[\]\^"~*?:\\\/])', '\\1', 'ALL' );

    cfsearch(
        collection="lucee-documentation",
        criteria=arguments.criteria,
        name="local.searchResults",
        maxrows=3,
        contextPassages=3,
        contextBytes=3000,
        contextPassageLength=1000,
        contextHighlightBegin="<match>",
        contextHighlightEnd="</match>",
        suggestions="always"
    );

    var augmentedQuery = "User Query: #arguments.criteria#";
    var contextData = [];

    loop query="searchResults" {
        var passages = [];
        loop query="searchResults.context.passages" {
            passages.append( {
                "start": searchResults.context.passages.start,
                "end": searchResults.context.passages.end,
                "score": searchResults.context.passages.score,
                "data": searchResults.context.passages.original
            } );
        }
        contextData.append( {
            "title": searchResults.title,
            "summary": searchResults.summary,
            "keywords": searchResults.custom1,
            "source": searchResults.custom2,
            "score": searchResults.score,
            "rank": searchResults.rank,
            "content": passages
        } );
    }

    if ( contextData.len() ) {
        augmentedQuery &= chr( 10 ) & "Documentation Context: #serializeJSON( contextData )#";
    }
    return augmentedQuery;
}
// Use with an AI session (named aiSession to avoid clashing with the built-in session scope)
var aiSession = LuceeCreateAISession( name: "myclaude" );
var response = LuceeInquiryAISession( aiSession, augmentSearchCriteria( "how do I configure caching?" ) );
See AI Augmentation with Lucene for a full RAG implementation guide, including indexing from databases, files, and remote URLs.
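The escaping step in augmentSearchCriteria() is easy to misread, so here is the same idea as a small standalone sketch. It reuses the character class from the rereplace() call and prefixes each Lucene query metacharacter with a backslash. The helper name `escape_lucene` is hypothetical, and Python is used only as executable pseudocode.

```python
import re

# Same character class as the rereplace() call in augmentSearchCriteria().
LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]\^"~*?:\\/])')


def escape_lucene(text: str) -> str:
    """Prefix every Lucene query metacharacter with a backslash."""
    return LUCENE_SPECIAL.sub(r"\\\1", text)


# User input that would otherwise be parsed as query syntax.
print(escape_lucene('title:"cache" AND (a+b)'))
print(escape_lucene("plain text stays unchanged"))
```

Escaping makes user input match literally instead of being interpreted as field selectors, grouping, or wildcard operators.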
Administrator UI
Once installed, the Lucee Administrator exposes search under Services > Search (services.search.cfm):
- List, create, repair, optimize, purge, and delete collections
- View collection mode, embedding, ratio, and index count
- Index a directory path with file extension filters
- Run test searches against a collection
The admin UI uses the same cfcollection, cfindex, and cfsearch tags under the hood.
Real-World Use in Lucee 7
Lucee 7 itself uses the Lucene 3 extension in several places:
| Use case | Where | What it does |
|---|---|---|
| Documentation AI assistant | Debug monitor reference.cfm | Hybrid collection indexes recipes, tags, and functions; augmentSearchCriteria() feeds passages to the AI session |
| Admin AI navigation | overview.cfm, adminAINavigation.cfm | Checks whether Lucene 3+ is installed before showing search-related AI features |
| MCP doc search | MCP server extension | search_lucee_docs queries Lucene indexes for functions, tags, and recipes |
| Search admin | services.search.cfm | Manage collections and run test searches from the Administrator |
These are useful reference implementations when building your own search or RAG features.
Performance Considerations
- Vector and hybrid searches are more CPU-intensive than keyword-only search
- Hybrid mode runs both keyword and vector queries, so tune ratio and maxrows for your workload
- Vector indexes are larger than keyword-only indexes
- Run cfcollection( action="optimize", collection="..." ) after bulk updates (good candidate for a scheduled task)
- For RAG, keep contextPassages and contextBytes proportional to your LLM context window; Lucee's documentation assistant uses 3 passages and 3000 bytes as a practical default
- Cache frequent search results at the application level when the index changes infrequently
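A rough way to size the RAG settings is to estimate the worst-case context they can add to a prompt. The sketch below rests on two assumptions that are not confirmed here: that contextBytes caps the passage text returned per result row, and that about 4 bytes per token is an acceptable heuristic for English text. `estimate_context_tokens` is a hypothetical helper, written in Python as executable pseudocode.

```python
import math


def estimate_context_tokens(max_rows: int, context_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Worst-case token estimate for the passages attached to a prompt.

    Assumes contextBytes applies per result row and that one token is
    roughly bytes_per_token bytes; both are heuristics, not documented facts.
    """
    worst_case_bytes = max_rows * context_bytes
    return math.ceil(worst_case_bytes / bytes_per_token)


# Settings used by Lucee's documentation assistant: maxrows=3, contextBytes=3000.
print(estimate_context_tokens(max_rows=3, context_bytes=3000))
```

Comparing this estimate against the model's context window, minus room for the question and the answer, gives a quick check on whether maxrows or contextBytes should be lowered.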
Related Recipes
- Adding Full Text Search to Your Application — collections, indexing, search syntax, categories, maintenance
- AI Augmentation with Lucene — full RAG pattern with Lucene + AI sessions
- AI — configuring AI endpoints in Lucee
- Extension Installation — install and pin the extension in Docker and CI