VectorEngine

VectorEngine is the raw vector search engine in Brinicle. Use it when you already have embeddings or numeric vectors and want approximate nearest neighbor search through a disk-first HNSW index. VectorEngine supports:
  • build
  • insert
  • upsert
  • delete
  • single-query search
  • batch search
  • search with distances
  • compact rebuild
  • graph optimization

Constructor

engine = brinicle.VectorEngine(
    index_path,
    dim,
    delta_ratio=0.10,
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    seed=0,
    dist_func="l2",
)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| index_path | str | required | Base path for the index files |
| dim | int | required | Vector dimension |
| delta_ratio | float | 0.10 | Maintenance threshold for delta and deleted records |
| M | int | 16 | HNSW graph connectivity |
| ef_construction | int | 200 | Build-time search width |
| ef_search | int | 64 | Default query-time search width |
| build_n_threads | int | 1 | Number of build threads |
| seed | int | 0 | Random seed for graph construction |
| dist_func | str | "l2" | Distance function used by the index |

Example:
engine = brinicle.VectorEngine(
    "vector_index",
    dim=384,
    M=48,
    ef_construction=1024,
    ef_search=512,
    delta_ratio=0.1,
)

Choosing HNSW Parameters

The HNSW parameters control the trade-off between search quality, indexing speed, and memory usage:
  • M — Higher values improve recall but increase memory usage and indexing time. Values between 16 and 64 are common. For high-recall applications, use M=48 or higher.
  • ef_construction — Higher values produce better graphs at the cost of slower builds. Values between 200 and 1024 are typical.
  • ef_search — Higher values improve recall at the cost of slower queries. This can also be overridden per-query using the efs parameter.
  • delta_ratio — Controls how large the delta segment may grow relative to the main segment. A value of 0.10 means the delta segment can grow to 10% of the main segment before maintenance (a rebuild) is needed. Deleted records also count toward this threshold.
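The delta_ratio rule is simple arithmetic. Below is a minimal sketch of it, using hypothetical helpers that are not part of brinicle. It assumes the check compares the delta segment size with delta_ratio times the main segment size, as described above. It counts only the delta segment, although the real maintenance check also considers deleted records.

```python
def delta_budget(main_count: int, delta_ratio: float = 0.10) -> int:
    """Largest delta segment allowed for a given main segment size (hypothetical)."""
    # Assumption: the budget is a plain fraction of the main segment size.
    return int(main_count * delta_ratio)


def exceeds_delta_budget(main_count: int, delta_count: int, delta_ratio: float = 0.10) -> bool:
    """True once the delta segment outgrows delta_ratio * main_count (hypothetical).

    brinicle's real check may also count deleted records; this sketch does not.
    """
    return delta_count > delta_ratio * main_count


if __name__ == "__main__":
    # 100,000 vectors in the main segment with delta_ratio=0.10:
    # the delta segment may hold up to 10,000 vectors.
    print(delta_budget(100_000))                  # 10000
    print(exceeds_delta_budget(100_000, 10_000))  # False
    print(exceeds_delta_budget(100_000, 10_001))  # True
```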

Distance Functions

VectorEngine supports these distance functions:
| dist_func | Meaning |
|---|---|
| "l2" | Squared Euclidean distance |
| "cosine_distance" | 1 - cosine_similarity(a, b) |
| "dot_product_distance" | -dot_product(a, b) |

Brinicle ranks results by ascending distance. Smaller distance means a better match. For dot_product_distance, a larger dot product becomes a smaller distance:
dot_product = 0.90  -> distance = -0.90
dot_product = 0.20  -> distance = -0.20
So the result with distance -0.90 is ranked before the result with distance -0.20.
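The three formulas and the ascending-distance ranking rule can be checked by hand with a few lines of plain Python. This is an illustrative sketch, not Brinicle's implementation. The helper names (l2, cosine_distance, dot_product_distance, rank) and the sample IDs are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    # "l2": squared Euclidean distance (no square root is taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    # "cosine_distance": 1 - cosine_similarity(a, b).
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def dot_product_distance(a: Vector, b: Vector) -> float:
    # "dot_product_distance": the negated dot product.
    return -sum(x * y for x, y in zip(a, b))


def rank(query: Vector, candidates: Dict[str, Vector],
         dist: Callable[[Vector, Vector], float]) -> List[Tuple[str, float]]:
    # Smaller distance means a better match, so sort ascending.
    scored = [(ext_id, dist(query, vec)) for ext_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


query = [1.0, 0.0]
candidates = {"x": [0.2, 0.5], "y": [0.9, 0.1]}
print(rank(query, candidates, dot_product_distance))
# [('y', -0.9), ('x', -0.2)]
```

The larger dot product (0.9) becomes the smaller distance (-0.9), so "y" is ranked first, matching the example above.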

Building an Index

Use build mode to create a new index.
import numpy as np
import brinicle

dim = 128

engine = brinicle.VectorEngine("vector_index", dim=dim)

engine.init(mode="build")

for i in range(1000):
    vector = np.random.randn(dim).astype("float32")
    engine.ingest(str(i), vector)

engine.finalize()
Vectors must be one-dimensional float32 arrays with the same dimension as the index.
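The build flow above follows an init, ingest, finalize protocol. The toy class below models that protocol in plain Python so the ordering and dimension rules are easy to see. ToyWriteSession is hypothetical. It stores vectors in a list instead of building an HNSW graph, and it only approximates the checks brinicle performs (for example, it does not check the float32 dtype).

```python
from typing import List, Optional, Sequence, Tuple

SUPPORTED_MODES = ("build", "insert", "upsert")


class ToyWriteSession:
    """Hypothetical stand-in for a VectorEngine write session (no real index)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.mode: Optional[str] = None
        self.pending: List[Tuple[str, List[float]]] = []

    def init(self, mode: str = "build") -> None:
        # Opens a write session in one of the supported modes.
        if mode not in SUPPORTED_MODES:
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        self.pending = []

    def ingest(self, external_id: str, vector: Sequence[float]) -> None:
        # ingest(...) is only valid inside a session opened by init(...).
        if self.mode is None:
            raise RuntimeError("call init(...) before ingest(...)")
        # Every vector must match the index dimension.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(vector)}")
        self.pending.append((external_id, list(vector)))

    def finalize(self) -> int:
        # Closes the session and returns how many vectors were written.
        if self.mode is None:
            raise RuntimeError("no pending write session")
        written = len(self.pending)
        self.mode = None
        self.pending = []
        return written


session = ToyWriteSession(dim=3)
session.init(mode="build")
for i in range(5):
    session.ingest(str(i), [0.0, 1.0, 2.0])
print(session.finalize())  # 5
```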

Finalize Options

finalize(...) completes a pending build, insert, or upsert.
engine.finalize(
    optimize=False,
    M=0,
    ef_construction=0,
    ef_search=0,
    build_n_threads=0,
    seed=0,
)
Passing 0 for any build parameter uses the engine default. When optimize=False, inserts and upserts are absorbed into the delta index. When optimize=True, Brinicle may rebuild the index if the projected delta size crosses the maintenance threshold controlled by delta_ratio.

Search

Use search(...) to return external IDs only.
query = np.random.randn(dim).astype("float32")

results = engine.search(query, k=10)

print(results)
Example output:
["42", "18", "901"]

Search with Distance

Use search_with_distance(...) to return both IDs and distances.
results = engine.search_with_distance(query, k=10)

print(results)
Example output:
[("42", 0.183), ("18", 0.241)]
The result format is:
[(external_id, distance), ...]
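When checking approximate results, an exact reference that returns the same [(external_id, distance), ...] shape is useful. The sketch below scans every stored vector with squared L2 distance and keeps the k closest. exact_search_with_distance is a hypothetical helper, not part of brinicle, and it is far slower than an HNSW index on large data.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Same quantity as dist_func="l2": squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search_with_distance(
    vectors: Dict[str, Sequence[float]],
    q: Sequence[float],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Exact k nearest neighbors as (external_id, distance), nearest first."""
    scored = ((ext_id, squared_l2(q, vec)) for ext_id, vec in vectors.items())
    # nsmallest keeps the k best pairs and returns them in ascending order.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


vectors = {"42": [0.1, 0.2], "18": [0.4, 0.1], "901": [2.0, 2.0]}
results = exact_search_with_distance(vectors, [0.0, 0.0], k=2)
print([ext_id for ext_id, _ in results])  # ['42', '18']
```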

Search Parameters

engine.search(
    q,
    k=10,
    efs=64,
    threshold=float("inf"),
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| q | np.ndarray | required | Query vector (float32, 1-D) |
| k | int | 10 | Maximum number of results |
| efs | int | 64 | Query-time search width |
| threshold | float | inf | Maximum accepted distance |

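The threshold and k parameters act as two filters. A result is kept only if its distance does not exceed threshold, and at most k results come back, so a tight threshold can return fewer than k results. The helper below sketches that behavior on already-sorted (external_id, distance) pairs. apply_k_and_threshold is hypothetical, and it assumes a distance exactly equal to threshold is accepted, which this page does not specify.

```python
import math
from typing import List, Tuple


def apply_k_and_threshold(
    pairs: List[Tuple[str, float]],
    k: int = 10,
    threshold: float = math.inf,
) -> List[Tuple[str, float]]:
    """Keep at most k pairs whose distance is <= threshold (hypothetical helper).

    `pairs` must already be sorted by ascending distance.
    """
    kept = [(ext_id, d) for ext_id, d in pairs if d <= threshold]
    return kept[:k]


candidates = [("42", 0.183), ("18", 0.241), ("901", 0.512)]
print(apply_k_and_threshold(candidates, k=10, threshold=0.25))
# [('42', 0.183), ('18', 0.241)]
print(apply_k_and_threshold(candidates, k=1))
# [('42', 0.183)]
```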
Increasing efs usually improves recall but increases query latency.

Batch Search

Use search_batch(...) to search multiple query vectors.
queries = np.random.randn(100, dim).astype("float32")

results = engine.search_batch(
    queries,
    k=10,
    efs=64,
    n_jobs=4,
)
queries must be a two-dimensional float32 array with shape (num_queries, dim).
The return value contains one result list per query:
[
    ["42", "18", "901"],
    ["7", "103", "88"],
    ...
]
n_jobs controls parallel query execution when parallel execution is available.
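Conceptually, a batch search returns the same list of lists you would get by running one search per query row, in input order. The sketch below shows that contract, with a thread pool standing in for n_jobs. batch_search and fake_search are hypothetical. The sketch takes any single-query search function and says nothing about how brinicle actually parallelizes queries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

SearchFn = Callable[[Sequence[float]], List[str]]


def batch_search(search_one: SearchFn, queries: Sequence[Sequence[float]],
                 n_jobs: int = 1) -> List[List[str]]:
    """Run search_one on every query row; results keep the input order."""
    if n_jobs <= 1:
        return [search_one(q) for q in queries]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(search_one, queries))


def fake_search(q: Sequence[float]) -> List[str]:
    # Stand-in for a single-query search: derives IDs from the first value.
    return [f"id-{int(q[0])}", f"id-{int(q[0]) + 1}"]


queries = [[0.0, 1.0], [5.0, 2.0], [9.0, 3.0]]
print(batch_search(fake_search, queries, n_jobs=4))
# [['id-0', 'id-1'], ['id-5', 'id-6'], ['id-9', 'id-10']]
```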

Building from File

For very large datasets, you can build the index directly from a file:
engine.build_from_file(
    vectors_path="/path/to/vectors.bin",
    external_ids="/path/to/ids.txt",
    params=None,  # Optional HNSWParams
)

Index State

engine.has_index
Returns whether the engine currently has a loaded main or delta index.
engine.dim
Returns the vector dimension of the index.
engine.needs_rebuild()
Returns whether the index has enough delta or deleted records to justify a rebuild.

Utility Functions

The brinicle module also exposes utility functions for distance computation and brute-force search:
from brinicle import l2_sqr, dot, brute_knn_batch
import numpy as np

# Compute squared L2 distance between two vectors
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([3.0, 4.0], dtype=np.float32)
dist = l2_sqr(a, b)  # 8.0

# Compute dot product
product = dot(a, b)  # 11.0

# Brute-force KNN with OpenMP parallelism
X = np.random.randn(10000, 128).astype(np.float32)
Q = np.random.randn(10, 128).astype(np.float32)
neighbors = brute_knn_batch(X, Q, k=10, n_jobs=4)
These utility functions are useful for benchmarking, testing, and verifying search results against brute-force computation.
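A common benchmarking step is to compute recall@k: the fraction of the exact top-k neighbors that the approximate index also returned. The function below is a plain-Python sketch of that metric. recall_at_k is a hypothetical name, and the sketch assumes both inputs are lists of ID lists, one per query, as search_batch(...) returns. The return type of brute_knn_batch is not documented on this page, so its output may need converting to external IDs first.

```python
from typing import Sequence


def recall_at_k(approx: Sequence[Sequence[str]], exact: Sequence[Sequence[str]], k: int) -> float:
    """Average fraction of the exact top-k IDs found in the approximate top-k."""
    if len(approx) != len(exact):
        raise ValueError("need one approximate and one exact list per query")
    if not exact:
        return 0.0
    total = 0.0
    for approx_ids, exact_ids in zip(approx, exact):
        truth = set(exact_ids[:k])
        found = set(approx_ids[:k])
        # Fraction of true neighbors recovered; a query with no truth scores 0.
        total += len(truth & found) / len(truth) if truth else 0.0
    return total / len(exact)


approx = [["42", "18", "7"], ["3", "5", "9"]]
exact = [["42", "18", "901"], ["3", "5", "9"]]
print(recall_at_k(approx, exact, k=3))  # about 0.833 (2/3 and 3/3, averaged)
```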

Complete API Reference

init

engine.init(mode="build")
Starts a write session. Supported modes: "build", "insert", and "upsert".

ingest

engine.ingest(external_id, vector)
Adds one vector to the current pending write session. Call init(...) before calling ingest(...).

finalize

engine.finalize(
    optimize=False,
    M=0,
    ef_construction=0,
    ef_search=0,
    build_n_threads=0,
    seed=0,
)
Completes the pending write session.
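The finalize options above state that passing 0 for a build parameter means "use the engine default". The helper below sketches that resolution rule in plain Python. resolve_build_params is hypothetical. Its `defaults` argument stands for the values the engine was constructed with; the dictionary used here holds the constructor's default values for illustration. This page does not say whether seed=0 is treated as "default" or as a literal seed, so the sketch leaves seed out.

```python
from typing import Dict

# Illustrative engine settings: the VectorEngine constructor defaults on this page.
ENGINE_DEFAULTS = {
    "M": 16,
    "ef_construction": 200,
    "ef_search": 64,
    "build_n_threads": 1,
}


def resolve_build_params(defaults: Dict[str, int], **overrides: int) -> Dict[str, int]:
    """Return effective build parameters; 0 (or an omitted name) keeps the default.

    Hypothetical helper illustrating the finalize(...) rule.
    """
    resolved = dict(defaults)
    for name, value in overrides.items():
        if name not in defaults:
            raise KeyError(f"unknown build parameter: {name}")
        if value != 0:
            resolved[name] = value
    return resolved


print(resolve_build_params(ENGINE_DEFAULTS, M=0, ef_construction=400))
# {'M': 16, 'ef_construction': 400, 'ef_search': 64, 'build_n_threads': 1}
```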

search

engine.search(
    q,
    k=10,
    efs=64,
    threshold=float("inf"),
)
Returns external IDs.

search_with_distance

engine.search_with_distance(
    q,
    k=10,
    efs=64,
    threshold=float("inf"),
)
Returns (external_id, distance) pairs.

search_batch

engine.search_batch(
    Q,
    k=10,
    efs=64,
    threshold=float("inf"),
    n_jobs=1,
)
Runs batch search over a two-dimensional query matrix.

delete_items

engine.delete_items(
    external_ids,
    return_not_found=False,
)
Deletes records by external ID.

needs_rebuild

engine.needs_rebuild()
Returns whether the index has crossed its maintenance threshold.

rebuild_compact

engine.rebuild_compact(
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    seed=0,
)
Rebuilds the index from alive records.
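"Alive records" are records that were written and not deleted. The toy model below sketches which records survive a compaction when data is split across a main segment, a delta segment, and a set of deleted IDs. ToyIndexState is hypothetical: it builds no graph and ignores the build parameters rebuild_compact(...) accepts. It rests on two assumptions this page does not state. First, a delta entry replaces a main entry with the same ID. Second, delete_items with return_not_found=True reports the IDs that were not present.

```python
from typing import Dict, Iterable, List, Sequence, Set


class ToyIndexState:
    """Hypothetical model of main/delta segments plus deleted IDs (no HNSW graph)."""

    def __init__(self) -> None:
        self.main: Dict[str, Sequence[float]] = {}
        self.delta: Dict[str, Sequence[float]] = {}
        self.deleted: Set[str] = set()

    def delete_items(self, external_ids: Iterable[str]) -> List[str]:
        # Mark known IDs as deleted; return the unknown ones (assumed behavior).
        not_found = []
        for ext_id in external_ids:
            if ext_id in self.main or ext_id in self.delta:
                self.deleted.add(ext_id)
            else:
                not_found.append(ext_id)
        return not_found

    def rebuild_compact(self) -> int:
        # Alive = main and delta records minus deleted IDs.
        # Assumption: a delta entry replaces a main entry with the same ID.
        merged = {**self.main, **self.delta}
        self.main = {k: v for k, v in merged.items() if k not in self.deleted}
        self.delta = {}
        self.deleted = set()
        return len(self.main)


state = ToyIndexState()
state.main = {"1": [0.0], "2": [1.0], "3": [2.0]}
state.delta = {"4": [3.0]}
print(state.delete_items(["2", "99"]))  # ['99']
print(state.rebuild_compact())          # 3
```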

optimize_graph

engine.optimize_graph()
Runs conditional graph maintenance.

close

engine.close()
Closes loaded index resources.

destroy

engine.destroy()
Removes index files from disk.