ItemSearchEngine

ItemSearchEngine is brinicle’s high-level engine for structured item search. Use it when records have titles, categories, subcategories, attributes, and optionally semantic vectors. It supports:
  • lexical item search
  • semantic item search
  • hybrid lexical + semantic search
  • single-query search
  • batch search
  • search with distances
  • insert, upsert, and delete
  • compact rebuild and graph optimization
ItemSearchEngine uses the same disk-first HNSW infrastructure as VectorEngine, but it encodes structured item fields internally before indexing and searching. It fits structured, catalog-like data that needs text search with filtering. Typical use cases include:
  • E-commerce product catalogs — search products by title with category and attribute filters
  • Movie databases — find movies by title, genre, and attributes like director or year
  • Job boards — search job listings by title, category, and required skills
  • Real estate listings — find properties by description, type, and features
  • Book catalogs — search books by title, genre, and author attributes
For lexical-only search, use alpha=0.0. For semantic or hybrid search, set vector_dim to the dimension of your vectors, then pass a vector with each item during ingest and with each query during search.

Constructor

engine = brinicle.ItemSearchEngine(
    index_path,
    dim=96,
    vector_dim=0,
    vector_normalized=False,
    tokenizer_path=None,
    text_prep=None,
    title_ratio=0.9,
    delta_ratio=0.10,
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    alpha=0.95,
    seed=0,
    lexical_config=None,
)

Parameters

Parameter | Type | Default | Description
index_path | str/Path | required | Base path for the index files
dim | int | 96 | Dimension used for the encoded lexical representation
vector_dim | int | 0 | Dimension of optional semantic vectors
vector_normalized | bool | False | Whether semantic vectors are already normalized
tokenizer_path | str/Path/None | None | Optional custom tokenizer path
text_prep | None | None | Optional text preprocessing
title_ratio | float | 0.9 | Portion of lexical encoding space reserved for title tokens
delta_ratio | float | 0.10 | Maintenance threshold for delta and deleted records
M | int | 16 | HNSW graph connectivity
ef_construction | int | 200 | Build-time search width
ef_search | int | 64 | Default query-time search width
build_n_threads | int | 1 | Number of build threads
alpha | float | 0.95 | Balance between lexical and semantic scoring
seed | int | 0 | Random seed for graph construction
lexical_config | LexicalConfig/None | None | Optional custom lexical scoring configuration
Example:
engine = brinicle.ItemSearchEngine(
    "items_index",
    dim=96,
    vector_dim=384,
    alpha=0.95,
    M=48,
    ef_construction=1024,
    ef_search=512,
)

Search Modes

ItemSearchEngine can be used in three main modes.
Mode | Setup
Lexical search | Use structured fields and set alpha=0.0
Semantic search | Provide vectors and set alpha=1.0
Hybrid search | Provide structured fields and vectors, then use 0.0 < alpha < 1.0

Understanding alpha

alpha controls the balance between semantic vector similarity and lexical matching.
alpha | Behavior
0.0 | Lexical-only
0.5 | Balanced lexical + semantic
0.95 | Mostly semantic, with lexical correction
1.0 | Semantic-only
alpha affects both graph construction and search scoring, so choose it before building the index. When lexical_config is provided, the custom config controls the weights directly.
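To make the table concrete, here is a minimal sketch that assumes alpha acts as a linear interpolation weight between a lexical distance and a semantic distance. The document does not state the exact formula brinicle uses, so the blend function and the sample distances are purely illustrative.

```python
def blend(lexical_distance: float, semantic_distance: float, alpha: float) -> float:
    """Assumed form: (1 - alpha) * lexical + alpha * semantic."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    return (1.0 - alpha) * lexical_distance + alpha * semantic_distance


# Made-up distances for a single item.
lexical, semantic = 0.40, 0.10

for alpha in (0.0, 0.5, 0.95, 1.0):
    # alpha=0.0 returns the lexical distance, alpha=1.0 the semantic one.
    print(alpha, round(blend(lexical, semantic, alpha), 4))
```

Under this assumption, the default alpha=0.95 lets the lexical distance shift the result only slightly, which matches the "lexical correction" description in the table.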

How Item Scoring Works

In general mode, brinicle combines several distance components:
distance =
    title_weight       * title_distance
  + attribute_weight   * attribute_distance
  + category_weight    * category_distance
  + subcategory_weight * subcategory_distance
  + vector_weight      * vector_distance
Smaller distance means a better match.
Component | What it compares
Title distance | Query/title token overlap
Attribute distance | Attribute key-value matches and mismatches
Category distance | Category match or mismatch
Subcategory distance | Subcategory match or mismatch
Vector distance | Semantic vector similarity
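The weighted sum above can be sketched in a few lines of plain Python. The weights reuse the search-time values from the LexicalConfig example later on this page. The per-component distances are made-up numbers, and combined_distance is a hypothetical helper, not part of brinicle.

```python
COMPONENTS = ("title", "attribute", "category", "subcategory", "vector")


def combined_distance(weights: dict, distances: dict) -> float:
    """Weighted sum of per-component distances; smaller means a better match."""
    return sum(weights[name] * distances[name] for name in COMPONENTS)


# Search-time weights taken from the LexicalConfig example in this document.
search_weights = {
    "title": 0.60,
    "attribute": 0.10,
    "category": 0.15,
    "subcategory": 0.15,
    "vector": 0.10,
}

# Illustrative component distances for one candidate item (not real output).
distances = {
    "title": 0.20,
    "attribute": 0.0,
    "category": 0.0,
    "subcategory": 0.30,
    "vector": 0.50,
}

total = combined_distance(search_weights, distances)
print(round(total, 3))
```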

Title Matching

Title matching uses a Tversky-style distance over title tokens. The main tuning parameters are:
Parameter | Effect
search_title_alpha | Penalizes query tokens missing from the item title
search_title_beta | Penalizes extra item-title tokens not present in the query
build_title_alpha | Same idea during graph construction
build_title_beta | Same idea during graph construction
Higher alpha makes missing query terms more expensive. Higher beta makes extra item-title terms more expensive. Lower beta is useful when short queries should match longer titles. For example, "iphone 15" can still match "Apple iPhone 15 Pro Max 256GB".
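The effect of beta is easiest to see on the standard Tversky index. The sketch below assumes lowercase whitespace tokenization and defines distance as 1 minus the index. brinicle's tokenizer and exact formula may differ, and tversky_distance is a hypothetical helper used only for illustration.

```python
def tversky_distance(query_tokens, title_tokens, alpha=0.5, beta=0.5):
    """1 - Tversky index over two token sets.

    alpha weights query tokens missing from the title,
    beta weights extra title tokens absent from the query.
    """
    q, t = set(query_tokens), set(title_tokens)
    common = len(q & t)
    missing = len(q - t)   # in the query, not in the title
    extra = len(t - q)     # in the title, not in the query
    denom = common + alpha * missing + beta * extra
    if denom == 0:
        return 1.0  # nothing to compare: treat as no match
    return 1.0 - common / denom


query = "iphone 15".lower().split()
title = "Apple iPhone 15 Pro Max 256GB".lower().split()

d_default = tversky_distance(query, title, alpha=0.5, beta=0.5)
d_low_beta = tversky_distance(query, title, alpha=0.5, beta=0.1)

# Lowering beta shrinks the distance between the short query and the long title.
print(round(d_default, 4), round(d_low_beta, 4))
```

Both query tokens appear in the title, so only beta matters here: the four extra title tokens cost less as beta drops.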

Attribute Matching

Attributes are key-value pairs.
attributes={
    "brand": "Apple",
    "storage": "256GB",
}
At search time, if the query and item share an attribute key but the values differ, brinicle applies a very large distance penalty. This allows attributes to behave like hard filters when threshold is used.
results = engine.search(
    "iphone 15",
    attributes={"brand": "Apple"},
    threshold=10.0,
)
An item with brand="Samsung" receives a large penalty and can be filtered out by the threshold. If an item does not have the queried attribute, brinicle applies a smaller penalty than a direct value mismatch.
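The three cases (match, missing key, value mismatch) can be modeled as follows. The penalty constants are hypothetical placeholders, since the document only says that a mismatch is "very large" and a missing key is smaller. Whether the threshold comparison is inclusive is also an assumption.

```python
MISMATCH_PENALTY = 1000.0  # hypothetical "very large" penalty for differing values
MISSING_PENALTY = 1.0      # hypothetical smaller penalty when the item lacks the key


def attribute_distance(query_attrs: dict, item_attrs: dict) -> float:
    """Sum penalties over the attributes named in the query."""
    total = 0.0
    for key, wanted in query_attrs.items():
        if key not in item_attrs:
            total += MISSING_PENALTY
        elif item_attrs[key] != wanted:
            total += MISMATCH_PENALTY
        # matching values add nothing
    return total


items = {
    "p1": {"brand": "Apple", "storage": "256GB"},
    "p2": {"brand": "Samsung"},
    "p3": {"storage": "128GB"},  # no brand attribute at all
}
query_attrs = {"brand": "Apple"}
threshold = 10.0

# Keep items whose attribute distance stays within the threshold.
kept = [
    item_id
    for item_id, attrs in items.items()
    if attribute_distance(query_attrs, attrs) <= threshold
]
print(kept)
```

In this toy model the Samsung item is dropped, while the item without a brand attribute survives with a small penalty.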

Category and Subcategory Matching

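Category and subcategory mismatches use the category penalty (build_category_penalty during graph construction, search_category_penalty during search). If category or subcategory is missing on either side, no penalty is applied. If both sides have values and they differ, brinicle applies the penalty. At search time, a category mismatch contributes:
search_category_weight * search_category_penalty
and a subcategory mismatch contributes: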
search_subcategory_weight * search_category_penalty
Example hard-filter style setup:
cfg = brinicle.LexicalConfig()

cfg.search_category_weight = 1.0
cfg.search_category_penalty = 2.0

engine = brinicle.ItemSearchEngine(
    "items_index",
    dim=96,
    lexical_config=cfg,
)

results = engine.search(
    "iphone 15",
    category="Electronics",
    threshold=2.0,
)
brinicle accepts results with distance below the threshold. With this setup, a category mismatch contributes 2.0 by itself, so mismatched categories are filtered out. The same approach can be used with subcategories.
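The rule described in this section (no penalty when either side is missing, weight * penalty when both are set and differ) can be written as a small function. category_contribution is a hypothetical helper that mirrors the prose, not a brinicle API.

```python
def category_contribution(query_value, item_value, weight: float, penalty: float) -> float:
    """Distance added by a category (or subcategory) comparison."""
    if query_value is None or item_value is None:
        return 0.0  # missing on either side: no penalty
    if query_value == item_value:
        return 0.0  # same value: no penalty
    return weight * penalty


# Values from the hard-filter style setup above.
weight, penalty = 1.0, 2.0

print(category_contribution("Electronics", "Fashion", weight, penalty))
print(category_contribution("Electronics", "Electronics", weight, penalty))
print(category_contribution("Electronics", None, weight, penalty))
```

With weight 1.0 and penalty 2.0, a mismatch alone adds 2.0, which is not below the threshold of 2.0 used in the example.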

Vector Matching

Semantic vector matching is used when:
  • vector_dim > 0
  • vectors are provided during ingest
  • a vector is provided during search
  • vector weight is greater than zero
Example:
engine = brinicle.ItemSearchEngine(
    "semantic_items",
    dim=96,
    vector_dim=384,
    alpha=1.0,
)
If your vectors are already normalized, use:
engine = brinicle.ItemSearchEngine(
    "semantic_items",
    dim=96,
    vector_dim=384,
    alpha=1.0,
    vector_normalized=True,
)
For semantic and hybrid item search, query vectors must have the same dimension as vector_dim.
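A dimension mismatch is easy to catch before calling the engine. The sketch below checks the length and, optionally, scales the vector to unit length. It assumes that normalization means L2 normalization, which the document does not state, and prepare_query_vector is a hypothetical helper written in plain Python rather than NumPy.

```python
import math


def prepare_query_vector(vector, vector_dim: int, normalize: bool = True) -> list:
    """Validate the dimension and optionally L2-normalize a query vector."""
    values = [float(x) for x in vector]
    if len(values) != vector_dim:
        raise ValueError(
            f"query vector has {len(values)} dimensions, expected {vector_dim}"
        )
    if not normalize:
        return values
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in values]


unit = prepare_query_vector([3.0, 4.0], vector_dim=2)
print(unit)
```

If vectors are prepared this way ahead of time, vector_normalized=True in the constructor would be the matching setting.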

Ingesting Items

Each item has a required external_id and title, plus optional structured fields.
engine.init(mode="build")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
        "storage": "256GB",
        "color": "Natural Titanium",
    },
)

engine.finalize()
Only external_id and title are required. category, subcategory, attributes, and vector are optional. For semantic or hybrid search, pass a vector during ingest:
engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
    vector=item_vector,
    normalize=True,
)

Item Fields

Field | Type | Required | Description
external_id | str | Yes | Unique identifier for the item
title | str | Yes | Item title or name
category | str/int/None | No | Primary category
subcategory | str/int/None | No | Secondary category
attributes | dict/None | No | Key-value attribute pairs
vector | np.ndarray/None | No | Semantic vector for hybrid/semantic search
normalize | bool | No | Whether to normalize the vector before encoding (default False)

Attribute Value Types

The attributes dictionary supports the following value types:
  • str — Text values (tokenized and hashed)
  • int/float — Numeric values (converted to token IDs)
  • bool — Boolean values (mapped to reserved tokens)
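Since only these value types are listed, it can help to check an attributes dictionary before ingest. The helper below is a hypothetical pre-check written for this page. How brinicle itself reacts to other value types is not documented here.

```python
SUPPORTED_TYPES = (str, int, float, bool)


def unsupported_attributes(attributes: dict) -> dict:
    """Return {key: type name} for values outside the listed types."""
    return {
        key: type(value).__name__
        for key, value in attributes.items()
        if not isinstance(value, SUPPORTED_TYPES)
    }


attributes = {
    "brand": "Apple",        # str
    "year": 2023,            # int
    "weight_kg": 0.187,      # float
    "refurbished": False,    # bool
    "tags": ["5g", "oled"],  # list: not in the supported list
    "notes": None,           # None: not in the supported list
}

print(unsupported_attributes(attributes))
```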

Lexical Search Example

Lexical search does not require semantic vectors.
import brinicle

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
    alpha=0.0,
)

engine.init(mode="build")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
)

engine.ingest(
    external_id="p2",
    title="Samsung Galaxy S24 Ultra 512GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Samsung"},
)

engine.finalize()

results = engine.search("iphone 15 pro", k=10)

print(results)

Semantic Search Example

Semantic search uses vectors as the main retrieval signal.
import numpy as np
import brinicle

vector_dim = 384

engine = brinicle.ItemSearchEngine(
    "semantic_item_index",
    dim=96,
    vector_dim=vector_dim,
    alpha=1.0,
)

engine.init(mode="build")

item_vector = np.random.randn(vector_dim).astype("float32")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    vector=item_vector,
    normalize=True,
)

engine.finalize()

query_vector = np.random.randn(vector_dim).astype("float32")

results = engine.search(
    "iphone 15 pro",
    vector=query_vector,
    normalize=True,
    k=10,
)

print(results)

Hybrid Search Example

Hybrid search combines structured lexical fields with semantic vectors.
import numpy as np
import brinicle

vector_dim = 384

engine = brinicle.ItemSearchEngine(
    "hybrid_item_index",
    dim=96,
    vector_dim=vector_dim,
    alpha=0.95,
)

engine.init(mode="build")

item_vector = np.random.randn(vector_dim).astype("float32")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple", "storage": "256GB"},
    vector=item_vector,
    normalize=True,
)

engine.finalize()

query_vector = np.random.randn(vector_dim).astype("float32")

results = engine.search(
    "iphone 15 pro",
    category="Electronics",
    attributes={"brand": "Apple"},
    vector=query_vector,
    normalize=True,
    k=10,
)

print(results)

Searching

Search by text query — the engine encodes the query and finds the most similar items:
results = engine.search("iphone 15 pro max", k=10)
print(results)  # ['p1', 'p2']

Search with Distance

Get results with distance scores:
results = engine.search_with_distance("iphone 15 pro", category="Electronics", k=10)
# [('p1', 0.12), ('p7', 0.19)]
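Because the result is a list of (external_id, distance) pairs, it can be post-processed with ordinary Python. The sketch below applies a client-side cutoff to pairs shaped like the output above. The sample pairs are made up, and within_distance is a hypothetical helper.

```python
def within_distance(pairs, max_distance: float) -> list:
    """Sort (external_id, distance) pairs by distance and keep those within the cutoff."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    return [(item_id, dist) for item_id, dist in ordered if dist <= max_distance]


# Illustrative pairs in the same shape as search_with_distance output.
pairs = [("p7", 0.19), ("p1", 0.12), ("p9", 0.42)]

close_matches = within_distance(pairs, max_distance=0.2)
print(close_matches)
```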

Structured Search with Filters

Search with category, subcategory, and attribute filters to narrow down results:
results = engine.search(
    "iphone 15",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
    k=10,
)

Search with Vector

Search with a semantic vector for hybrid or semantic search:
results = engine.search(
    "iphone 15 pro",
    category="Electronics",
    attributes={"brand": "Apple"},
    vector=query_vector,
    normalize=True,
    k=10,
)

Search Parameters

engine.search(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)
Parameter | Type | Default | Description
query | str | required | Search query text
k | int | 10 | Maximum number of results to return
efs | int/None | None | Override ef_search for this query
threshold | float | inf | Maximum accepted distance
category | str/int/None | None | Filter by category
subcategory | str/int/None | None | Filter by subcategory
attributes | dict/None | None | Filter by attribute key-value pairs
vector | np.ndarray/None | None | Optional query vector for semantic/hybrid search
normalize | bool | False | Whether to normalize the query vector

Batch Search

Use search_batch(...) to search multiple queries.
results = engine.search_batch(
    queries,
    categories=categories,
    subcategories=subcategories,
    attributes_list=attributes_list,
    vectors=vectors,
    k=10,
    n_jobs=4,
)
If categories, subcategories, attributes_list, or vectors are provided, their length must match len(queries). Example:
queries = [
    "iphone 15 pro",
    "running shoes",
]

categories = [
    "Electronics",
    "Fashion",
]

attributes_list = [
    {"brand": "Apple"},
    {"brand": "Nike"},
]

results = engine.search_batch(
    queries,
    categories=categories,
    attributes_list=attributes_list,
    k=10,
)
Batch search supports optimized paths for:
  • query-only batch search
  • query + vector batch search
  • full per-query metadata batch search
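The length requirement for per-query lists can be checked before the batch call. check_batch_alignment is a hypothetical helper. It only mirrors the rule that every provided list must have len(queries) entries.

```python
def check_batch_alignment(queries, **per_query_lists) -> None:
    """Raise ValueError if any provided per-query list differs in length from queries."""
    expected = len(queries)
    for name, values in per_query_lists.items():
        if values is not None and len(values) != expected:
            raise ValueError(
                f"{name} has {len(values)} entries, expected {expected}"
            )


queries = ["iphone 15 pro", "running shoes"]
categories = ["Electronics", "Fashion"]
attributes_list = [{"brand": "Apple"}, {"brand": "Nike"}]

# Lists that are not used can be passed as None and are skipped.
check_batch_alignment(
    queries,
    categories=categories,
    subcategories=None,
    attributes_list=attributes_list,
    vectors=None,
)
print("batch inputs are aligned")
```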

Insert

Use insert mode to add new items to an existing index.
engine.init(mode="insert")

engine.ingest(
    external_id="p3",
    title="Google Pixel 8 Pro 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Google"},
)

engine.finalize()
Inserted items are added through the delta index.

Upsert

Use upsert mode to replace existing items or insert new ones.
engine.init(mode="upsert")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 512GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
        "storage": "512GB",
    },
)

engine.finalize()
If the external ID already exists, brinicle marks the old record as deleted and inserts the new version. If the external ID does not exist, the item is inserted as a new record.

Delete

Use delete_items(...) to delete items by external ID.
deleted_count, not_found = engine.delete_items(
    ["p1", "p2"],
    return_not_found=True,
)

print(deleted_count)
print(not_found)
If return_not_found=False, the second returned value is None. Deletes are logical until compact rebuild.
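The upsert and delete semantics described above can be pictured with a toy in-memory model. ToyItemStore is a hypothetical class written for this page. It imitates tombstoning and the (deleted_count, not_found) return shape, and says nothing about how brinicle stores records on disk.

```python
class ToyItemStore:
    """Toy model of logical deletes: old records are flagged, not removed."""

    def __init__(self):
        self.records = []  # each entry: [external_id, title, alive]
        self.live = {}     # external_id -> index of the alive record

    def upsert(self, external_id, title):
        if external_id in self.live:
            # Mark the previous version as deleted instead of removing it.
            self.records[self.live[external_id]][2] = False
        self.live[external_id] = len(self.records)
        self.records.append([external_id, title, True])

    def delete_items(self, external_ids, return_not_found=False):
        deleted, missing = 0, []
        for external_id in external_ids:
            index = self.live.pop(external_id, None)
            if index is None:
                missing.append(external_id)
            else:
                self.records[index][2] = False
                deleted += 1
        return deleted, (missing if return_not_found else None)

    def rebuild_compact(self):
        # Physically drop flagged records and rebuild the lookup table.
        self.records = [record for record in self.records if record[2]]
        self.live = {record[0]: i for i, record in enumerate(self.records)}


store = ToyItemStore()
store.upsert("p1", "Apple iPhone 15 Pro Max 256GB")
store.upsert("p1", "Apple iPhone 15 Pro Max 512GB")  # replaces the first version
store.upsert("p2", "Samsung Galaxy S24 Ultra 512GB")

result = store.delete_items(["p1", "missing-id"], return_not_found=True)
size_before = len(store.records)
store.rebuild_compact()
size_after = len(store.records)
print(result, size_before, size_after)
```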

Rebuild and Optimize

Item indexes use the same maintenance model as VectorEngine.
engine.needs_rebuild()
Returns whether the index has enough update or delete drift to justify rebuilding.
engine.rebuild_compact()
Rebuilds the index from alive records, removes deleted records physically, and clears the delta index.
engine.optimize_graph()
Runs conditional maintenance. If the index crosses the configured maintenance threshold, brinicle rebuilds the graph.
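One plausible reading of the maintenance threshold is a ratio check against delta_ratio. The sketch below assumes the drift is measured as (delta records + deleted records) divided by the base record count. The document does not give the exact formula, so drift_exceeds_threshold and its counters are hypothetical.

```python
def drift_exceeds_threshold(
    base_count: int,
    delta_count: int,
    deleted_count: int,
    delta_ratio: float = 0.10,
) -> bool:
    """Assumed rule: rebuild when (delta + deleted) / base exceeds delta_ratio."""
    drift = delta_count + deleted_count
    if base_count <= 0:
        return drift > 0  # no base index yet: any pending change counts
    return drift / base_count > delta_ratio


# 80 changed records out of 1000 stays under the default 0.10 ratio.
print(drift_exceeds_threshold(1000, delta_count=50, deleted_count=30))
# 120 changed records out of 1000 crosses it.
print(drift_exceeds_threshold(1000, delta_count=80, deleted_count=40))
```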

Lexical Scoring Configuration

Use LexicalConfig when you want direct control over item scoring.
cfg = brinicle.LexicalConfig()

# Build-time weights
cfg.build_title_weight = 0.70
cfg.build_attr_weight = 0.15
cfg.build_subcategory_weight = 0.10
cfg.build_category_weight = 0.05
cfg.build_vector_weight = 0.10
cfg.build_category_penalty = 0.20

# Search-time weights
cfg.search_title_weight = 0.60
cfg.search_attr_weight = 0.10
cfg.search_category_weight = 0.15
cfg.search_subcategory_weight = 0.15
cfg.search_vector_weight = 0.10
cfg.search_category_penalty = 0.30

# Title alpha and beta parameters
cfg.build_title_alpha = 0.5
cfg.build_title_beta = 0.5
cfg.search_title_alpha = 0.5
cfg.search_title_beta = 0.5

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
    lexical_config=cfg,
)
LexicalConfig has separate build-time and search-time weights. Build-time weights affect graph construction. Search-time weights affect query ranking.

Available LexicalConfig Fields

Field | Type | Description
build_title_weight | float | Title weight during graph construction
build_attr_weight | float | Attribute weight during graph construction
build_category_weight | float | Category weight during graph construction
build_subcategory_weight | float | Subcategory weight during graph construction
build_vector_weight | float | Vector weight during graph construction
search_title_weight | float | Title weight during search
search_attr_weight | float | Attribute weight during search
search_category_weight | float | Category weight during search
search_subcategory_weight | float | Subcategory weight during search
search_vector_weight | float | Vector weight during search
build_category_penalty | float | Category and subcategory mismatch penalty during graph construction
search_category_penalty | float | Category and subcategory mismatch penalty during search
build_title_alpha | float | Build-time title Tversky alpha
build_title_beta | float | Build-time title Tversky beta
search_title_alpha | float | Search-time title Tversky alpha
search_title_beta | float | Search-time title Tversky beta
vector_normalized | bool | Whether vectors are already normalized

Close and Destroy

Close loaded index resources:
engine.close()
Destroy the index files:
engine.destroy()
destroy() removes the index from disk.

Complete API Reference

init

engine.init(mode="build")
Starts a write session. Supported modes: build, insert, upsert.

ingest

engine.ingest(
    external_id,
    title,
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)
Adds one item to the current write session.

finalize

engine.finalize(
    optimize=False,
    M=0,
    ef_construction=0,
    ef_search=0,
    build_n_threads=0,
    seed=0,
)
Completes the pending write session.

search

engine.search(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)
Returns external IDs.

search_with_distance

engine.search_with_distance(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)
Returns (external_id, distance) pairs.

search_batch

engine.search_batch(
    queries,
    k=10,
    efs=None,
    threshold=float("inf"),
    categories=None,
    subcategories=None,
    attributes_list=None,
    vectors=None,
    normalize=False,
    n_jobs=1,
)
Runs batch search over multiple item queries.

delete_items

engine.delete_items(
    external_ids,
    return_not_found=False,
)
Deletes items by external ID.

needs_rebuild

engine.needs_rebuild()
Returns whether the index has crossed its maintenance threshold.

rebuild_compact

engine.rebuild_compact(
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    seed=0,
)
Rebuilds the index from alive records.

optimize_graph

engine.optimize_graph()
Runs conditional graph maintenance.

close

engine.close()
Closes loaded index resources.

destroy

engine.destroy()
Removes index files from disk.