Engines Overview

brinicle provides three specialized engines, each designed for a different type of search problem. All three engines share the same lifecycle pattern, making it easy to switch between them or use multiple engines in the same application.

The Three Engines

Engine               Use it for
VectorEngine         Raw vector similarity search
ItemSearchEngine     Product, catalog, or structured item search (lexical, semantic, or hybrid)
AutocompleteEngine   Autocomplete, title suggestions, and query suggestions

Shared Lifecycle

All engines follow the same init → ingest → finalize → search pattern, so one workflow carries over to every engine:
# 1. Initialize an ingest session
engine.init(mode="build")

# 2. Ingest data one record at a time (streaming-first)
for record in records:
    engine.ingest(...)

# 3. Finalize the index (builds the HNSW graph)
engine.finalize()

# 4. Search the index
results = engine.search(...)
This streaming-first approach is fundamental to brinicle’s design. Unlike systems that load the entire dataset into memory before indexing, brinicle ingests data one record at a time, so you can index data from streaming sources such as JSONL files, databases, APIs, and object storage without holding the full dataset in memory.

The write mode is passed to init(mode=...). The supported modes are:
Mode     Meaning
build    Build a new index
insert   Add new records to an existing index
upsert   Replace records with the same external IDs, or insert them if they do not exist
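To make the streaming-first idea concrete, here is a minimal sketch of reading a JSONL source lazily with a Python generator. The helper name iter_jsonl and the file name items.jsonl are hypothetical; the engine.ingest(...) call is left elided because its arguments depend on the engine.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines lazily: each record is decoded only when requested."""
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            yield json.loads(line)


# Usage with a file handle (hypothetical file name). The file is read line by
# line, so memory use is bounded by the largest single record, not the file size.
# with open("items.jsonl", encoding="utf-8") as f:
#     for record in iter_jsonl(f):
#         engine.ingest(...)  # arguments depend on the engine
```

Because iter_jsonl is a generator, nothing past the current line is parsed until the ingest loop asks for the next record.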

Main Index and Delta Index

brinicle splits each index into a main index and a delta index. The main index holds the primary HNSW graph; the delta index holds records added by later inserts and upserts. At search time, brinicle queries both indexes, merges the results, filters out deleted records, and returns the top matches. This lets brinicle accept updates without rebuilding the full index after every insert.
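The merge step can be sketched as follows. This is an illustration of the idea, not brinicle's implementation: the function name merge_search_results is hypothetical, hits are assumed to be (external_id, distance) pairs where lower is better, and a record found in the delta index is assumed to override the same ID from the main index.

```python
from __future__ import annotations

import heapq
from typing import Iterable


def merge_search_results(
    main_hits: Iterable[tuple[str, float]],
    delta_hits: Iterable[tuple[str, float]],
    deleted_ids: set[str],
    k: int,
) -> list[tuple[str, float]]:
    """Merge (external_id, distance) hits from two indexes; lower distance is better."""
    best: dict[str, float] = {}
    for ext_id, dist in main_hits:
        best[ext_id] = dist
    # Assumption: a delta hit is the newer version of a record (e.g. written by
    # an upsert), so it replaces any main-index hit with the same ID.
    for ext_id, dist in delta_hits:
        best[ext_id] = dist
    # Drop logically deleted records, then keep the k smallest distances.
    alive = ((dist, ext_id) for ext_id, dist in best.items() if ext_id not in deleted_ids)
    return [(ext_id, dist) for dist, ext_id in heapq.nsmallest(k, alive)]


main = [("a", 0.10), ("b", 0.40), ("c", 0.25)]
delta = [("b", 0.05), ("d", 0.30)]
print(merge_search_results(main, delta, {"c"}, k=2))  # [('b', 0.05), ('a', 0.1)]
```

Here "b" is ranked by its delta distance, and "c" is excluded because it is marked deleted, even though it would otherwise have made the top two.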

Common Operations

Beyond the basic lifecycle, all engines support a common set of operations:

Insert and Upsert

After the initial build, you can add new data using insert mode or update existing data using upsert mode:
engine.init(mode="insert")  # or mode="upsert" to replace records that share external IDs
for record in new_records:
    engine.ingest(...)
engine.finalize()

Delete

Remove items from the index by their external IDs:
deleted_count, not_found = engine.delete_items(["id1", "id2"], return_not_found=True)
Deletes are logical: deleted records are filtered out of search results but stay on disk until the index is compacted with rebuild_compact().
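A toy model helps show what "logical until compacted" means. The class TombstoneIndex and its methods are hypothetical and only mirror the behavior described above: a delete marks a record as hidden, and compaction later rebuilds storage from the alive records. Treating already-deleted IDs as "not found" is an assumption of this sketch.

```python
from __future__ import annotations


class TombstoneIndex:
    """Toy model of logical deletes followed by compaction (not brinicle's code)."""

    def __init__(self, records: dict[str, list[float]]) -> None:
        self.records = dict(records)     # stands in for on-disk storage
        self.deleted: set[str] = set()   # tombstones: hidden but still stored

    def delete(self, ids: list[str]) -> tuple[int, list[str]]:
        """Mark IDs as deleted; returns (deleted_count, not_found)."""
        count, not_found = 0, []
        for ext_id in ids:
            # Unknown or already-deleted IDs are reported as not found.
            if ext_id in self.records and ext_id not in self.deleted:
                self.deleted.add(ext_id)
                count += 1
            else:
                not_found.append(ext_id)
        return count, not_found

    def alive_ids(self) -> list[str]:
        """IDs that a search would be allowed to return."""
        return [i for i in self.records if i not in self.deleted]

    def compact(self) -> None:
        """Rebuild storage from alive records, physically dropping tombstones."""
        self.records = {i: v for i, v in self.records.items() if i not in self.deleted}
        self.deleted.clear()


idx = TombstoneIndex({"a": [0.0], "b": [1.0], "c": [2.0]})
print(idx.delete(["a", "x"]))  # (1, ['x'])
print(len(idx.records))        # 3: "a" is hidden, not yet removed
idx.compact()
print(len(idx.records))        # 2
```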

Rebuild and Optimize

After deletions and updates, the index may benefit from a compact rebuild or graph optimization:
Method              Meaning
needs_rebuild()     Returns whether the index has enough update or delete drift to justify rebuilding
rebuild_compact()   Rebuilds the index from alive records and removes deleted records physically
optimize_graph()    Rebuilds only when the index crosses the configured maintenance threshold
delta_ratio controls when brinicle considers an index ready for maintenance.
# Check if rebuild is needed
if engine.needs_rebuild():
    engine.rebuild_compact(M=16, ef_construction=200, ef_search=64)

# Or run conditional maintenance
engine.optimize_graph()
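The exact drift formula behind needs_rebuild() and optimize_graph() is not specified here. As a hedged illustration only, one plausible rule compares pending delta records plus deleted records against delta_ratio times the size of the main index; the function name needs_maintenance and this formula are assumptions.

```python
def needs_maintenance(main_count: int, delta_count: int, deleted_count: int,
                      delta_ratio: float) -> bool:
    """Hypothetical drift check, not brinicle's documented rule.

    Signals maintenance once records in the delta index plus logically
    deleted records reach delta_ratio times the size of the main index.
    """
    drift = delta_count + deleted_count
    # max(..., 1) avoids a zero threshold when the main index is empty.
    return drift >= delta_ratio * max(main_count, 1)


print(needs_maintenance(1000, 50, 40, delta_ratio=0.1))  # False: 90 < 100
print(needs_maintenance(1000, 80, 30, delta_ratio=0.1))  # True: 110 >= 100
```

Under a rule like this, a smaller delta_ratio triggers compaction sooner, trading more frequent rebuilds for less time spent searching a large delta index.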

Close and Destroy

When you’re done with an index, you can either close it (preserving the data on disk) or destroy it (deleting all index files):
engine.close()    # Close the index, data remains on disk
engine.destroy()  # Delete all index files permanently
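Because engines expose close(), Python's standard contextlib.closing can guarantee that close() runs even if a search raises. The sketch below uses a hypothetical DemoEngine stand-in so it runs without brinicle; with a real engine you would wrap the engine object the same way.

```python
from contextlib import closing


class DemoEngine:
    """Hypothetical stand-in that only implements close(), like the engines above."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True  # stand-in for releasing index resources


engine = DemoEngine()
try:
    with closing(engine):
        raise RuntimeError("simulated error while searching")
except RuntimeError:
    pass

print(engine.closed)  # True: closing() called close() despite the exception
```

Use close() in this pattern for normal shutdown; destroy() is a separate, deliberate step because it deletes the index files.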

Which Engine Should I Use?

Use VectorEngine if you already have embeddings or other numeric vectors. It is the most general-purpose engine and works with any data that can be represented as float32 vectors, from neural network embeddings to hand-crafted feature vectors. It supports L2, cosine, and dot-product distance.

Use ItemSearchEngine if you have structured, catalog-like data such as products, movies, books, jobs, real estate listings, or any records with titles and attributes. This engine handles encoding internally using a lexical encoder, so you don’t need to generate embeddings yourself. It supports lexical-only search (alpha=0.0), semantic search (alpha=1.0), and hybrid search (0.0 < alpha < 1.0).

Use AutocompleteEngine if you need low-RAM query suggestions or title autocomplete. It is optimized for prefix-aware matching and can power search-as-you-type experiences.
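The alpha values above are consistent with a linear blend in which alpha=0.0 keeps only the lexical score and alpha=1.0 only the semantic score. The sketch below shows that common fusion technique; brinicle's actual scoring is not documented here, so the function names, the min-max normalization, and the "higher score is better" convention are all assumptions.

```python
from __future__ import annotations


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}


def hybrid_rank(lexical: dict[str, float], semantic: dict[str, float],
                alpha: float, k: int) -> list[str]:
    """Rank IDs by alpha * semantic + (1 - alpha) * lexical (higher is better)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    lex, sem = min_max(lexical), min_max(semantic)
    ids = set(lex) | set(sem)
    # An ID missing from one result list contributes 0 for that signal.
    fused = {i: alpha * sem.get(i, 0.0) + (1 - alpha) * lex.get(i, 0.0) for i in ids}
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(fused, key=lambda i: (-fused[i], i))[:k]


lexical = {"a": 3.0, "b": 1.0}
semantic = {"b": 0.9, "c": 0.5}
print(hybrid_rank(lexical, semantic, alpha=0.0, k=1))  # ['a']
print(hybrid_rank(lexical, semantic, alpha=1.0, k=1))  # ['b']
```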

Common Configuration Parameters

Parameter         Meaning
dim               Vector or encoded representation dimension
M                 HNSW graph connectivity
ef_construction   Build-time search width
ef_search         Query-time search width
delta_ratio       Maintenance threshold for delta and deleted records
build_n_threads   Number of build threads
seed              Random seed for graph construction
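dim and M drive most of an HNSW index's memory footprint. The back-of-the-envelope estimate below is a sketch under stated assumptions, not a measurement of brinicle: vectors are stored as float32, and the base layer reserves up to 2 * M neighbor links per node as 4-byte IDs, a common HNSW convention (used, for example, by hnswlib). Upper layers and metadata are ignored, so treat the result as approximate.

```python
def estimate_index_bytes(n_vectors: int, dim: int, M: int, link_bytes: int = 4) -> int:
    """Approximate size of a float32 HNSW index (assumptions in comments)."""
    # Raw vectors: dim float32 values (4 bytes each) per record.
    vector_bytes = n_vectors * dim * 4
    # Base-layer neighbor lists: assume up to 2 * M links per node,
    # each stored as a link_bytes-sized integer ID.
    graph_bytes = n_vectors * 2 * M * link_bytes
    # Upper layers, ID maps, and engine metadata are not counted.
    return vector_bytes + graph_bytes


print(estimate_index_bytes(1_000_000, 384, 16))  # 1664000000 (about 1.66 GB)
```

With these assumptions, raw vectors dominate at typical embedding sizes, while doubling M roughly doubles the graph term.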

Index Files

For an index at path "my_index", brinicle stores the following files on disk:
my_index.main     # Main HNSW graph
my_index.delta    # Delta/pending segment
my_index.lock     # Lock file
The main file contains the primary HNSW graph. The delta file holds pending vectors that have been ingested but not yet merged into the main graph. The lock file prevents concurrent access conflicts. High-level engines like ItemSearchEngine and AutocompleteEngine may store additional metadata beside the index files.
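How brinicle implements its lock file is not described here. One common technique for a lock file like my_index.lock is atomic exclusive creation with os.O_CREAT | os.O_EXCL, which fails if the file already exists, so only one process can hold the lock. The helper names acquire_lock and release_lock are hypothetical.

```python
import os


def acquire_lock(index_path: str) -> str:
    """Create <index_path>.lock atomically; raises FileExistsError if already held."""
    lock_path = index_path + ".lock"
    # O_EXCL with O_CREAT makes creation fail when the file exists,
    # so two processes cannot both acquire the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
    finally:
        os.close(fd)
    return lock_path


def release_lock(lock_path: str) -> None:
    """Remove the lock file so another process can open the index."""
    os.remove(lock_path)
```

A design caveat of this approach: if the owning process crashes, the lock file remains and must be cleared manually, which is one reason real systems often use OS-level file locks instead.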