ItemSearchEngine

ItemSearchEngine is brinicle’s high-level engine for structured item search. Use it when records have titles, categories, subcategories, attributes, and optionally semantic vectors. It supports:

lexical item search
semantic item search
hybrid lexical + semantic search
single-query search
batch search
search with distances
insert, upsert, and delete
compact rebuild and graph optimization

ItemSearchEngine uses the same disk-first HNSW infrastructure as VectorEngine, but it encodes structured item fields internally before indexing and searching.

When to Use Item Search

Use ItemSearchEngine when you have structured catalog-like data and need text-based search with filtering. Ideal use cases include:

E-commerce product catalogs — search products by title with category and attribute filters
Movie databases — find movies by title, genre, and attributes like director or year
Job boards — search job listings by title, category, and required skills
Real estate listings — find properties by description, type, and features
Book catalogs — search books by title, genre, and author attributes

For lexical-only search, use alpha=0.0. For semantic or hybrid search, provide vector_dim and pass vectors during ingest and search.

Constructor

engine = brinicle.ItemSearchEngine(
    index_path,
    dim=96,
    vector_dim=0,
    vector_normalized=False,
    tokenizer_path=None,
    text_prep=None,
    title_ratio=0.9,
    delta_ratio=0.10,
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    alpha=0.95,
    seed=0,
    lexical_config=None,
)

Parameters

Parameter	Type	Default	Description
`index_path`	str/Path	required	Base path for the index files
`dim`	int	96	Dimension used for the encoded lexical representation
`vector_dim`	int	0	Dimension of optional semantic vectors
`vector_normalized`	bool	False	Whether semantic vectors are already normalized
`tokenizer_path`	str/Path/None	None	Optional custom tokenizer path
`text_prep`	None	None	Optional text preprocessing
`title_ratio`	float	0.9	Portion of lexical encoding space reserved for title tokens
`delta_ratio`	float	0.10	Maintenance threshold for delta and deleted records
`M`	int	16	HNSW graph connectivity
`ef_construction`	int	200	Build-time search width
`ef_search`	int	64	Default query-time search width
`build_n_threads`	int	1	Number of build threads
`alpha`	float	0.95	Balance between lexical and semantic scoring
`seed`	int	0	Random seed for graph construction
`lexical_config`	LexicalConfig/None	None	Optional custom lexical scoring configuration

Example:

engine = brinicle.ItemSearchEngine(
    "items_index",
    dim=96,
    vector_dim=384,
    alpha=0.95,
    M=48,
    ef_construction=1024,
    ef_search=512,
)

Search Modes

ItemSearchEngine can be used in three main modes.

Mode	Setup
Lexical search	Use structured fields and set `alpha=0.0`
Semantic search	Provide vectors and set `alpha=1.0`
Hybrid search	Provide structured fields and vectors, then use `0.0 < alpha < 1.0`

Understanding `alpha`

alpha controls the balance between semantic vector similarity and lexical matching.

`alpha`	Behavior
`0.0`	Lexical-only
`0.5`	Balanced lexical + semantic
`0.95`	Mostly semantic, with lexical correction
`1.0`	Semantic-only

alpha affects both graph construction and search scoring, so choose it before building the index. When lexical_config is provided, the custom config controls the weights directly.
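One way to build intuition for the table above is to treat alpha as a linear interpolation weight between a lexical distance and a semantic distance. This is only a mental model: the source does not state brinicle's internal formula, and `blend_distance` is a hypothetical helper, not part of the library.

```python
def blend_distance(lexical_distance: float, vector_distance: float, alpha: float) -> float:
    """Hypothetical linear blend; an assumption, not taken from brinicle itself."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return (1.0 - alpha) * lexical_distance + alpha * vector_distance


# One item that matches poorly on text (0.40) but well on meaning (0.10).
for alpha in (0.0, 0.5, 0.95, 1.0):
    print(alpha, round(blend_distance(0.40, 0.10, alpha), 4))
```

Under this model, alpha=0.0 returns the lexical distance unchanged and alpha=1.0 returns the vector distance unchanged, which matches the lexical-only and semantic-only rows of the table.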

How Item Scoring Works

In general mode, brinicle combines several distance components:

distance =
    title_weight       * title_distance
  + attribute_weight   * attribute_distance
  + category_weight    * category_distance
  + subcategory_weight * subcategory_distance
  + vector_weight      * vector_distance

Smaller distance means a better match.

Component	What it compares
Title distance	Query/title token overlap
Attribute distance	Attribute key-value matches and mismatches
Category distance	Category match or mismatch
Subcategory distance	Subcategory match or mismatch
Vector distance	Semantic vector similarity
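The weighted sum above can be sketched in a few lines of plain Python. The weights reuse the search-time values from the LexicalConfig example on this page; the per-component distances are made-up numbers, and `combined_distance` is a hypothetical helper used only to show how the components add up.

```python
FIELDS = ("title", "attribute", "category", "subcategory", "vector")


def combined_distance(weights: dict, distances: dict) -> float:
    """Weighted sum of per-field distances; smaller means a better match."""
    return sum(weights[field] * distances[field] for field in FIELDS)


# Search-time weights borrowed from the LexicalConfig example.
weights = {"title": 0.60, "attribute": 0.10, "category": 0.15,
           "subcategory": 0.15, "vector": 0.10}

# Made-up component distances for two candidate items.
close_item = {"title": 0.2, "attribute": 0.0, "category": 0.0,
              "subcategory": 0.0, "vector": 0.3}
far_item = {"title": 0.8, "attribute": 0.0, "category": 1.0,
            "subcategory": 0.0, "vector": 0.6}

# Rank candidates by ascending distance, best match first.
ranked = sorted(
    [("close", combined_distance(weights, close_item)),
     ("far", combined_distance(weights, far_item))],
    key=lambda pair: pair[1],
)
print(ranked)
```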

Title Matching

Title matching uses a Tversky-style distance over title tokens. The main tuning parameters are:

Parameter	Effect
`search_title_alpha`	Penalizes query tokens missing from the item title
`search_title_beta`	Penalizes extra item-title tokens not present in the query
`build_title_alpha`	Same idea during graph construction
`build_title_beta`	Same idea during graph construction

Higher alpha makes missing query terms more expensive. Higher beta makes extra item-title terms more expensive. Lower beta is useful when short queries should match longer titles. For example, "iphone 15" can still match "Apple iPhone 15 Pro Max 256GB".
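The effect of alpha and beta can be illustrated with the textbook Tversky index, converted to a distance as 1 minus the index. This is a sketch under assumptions: brinicle's tokenizer and exact formula are not given in the source, and `tokenize` and `tversky_distance` are hypothetical names.

```python
def tokenize(text: str) -> set:
    """Naive lowercase whitespace tokenizer, used only for this illustration."""
    return set(text.lower().split())


def tversky_distance(query: str, title: str, alpha: float, beta: float) -> float:
    q, t = tokenize(query), tokenize(title)
    common = len(q & t)
    missing = len(q - t)  # query tokens absent from the title, weighted by alpha
    extra = len(t - q)    # title tokens absent from the query, weighted by beta
    denominator = common + alpha * missing + beta * extra
    if denominator == 0:
        return 1.0  # nothing to compare; treated as no match in this sketch
    return 1.0 - common / denominator


title = "Apple iPhone 15 Pro Max 256GB"
# Same short query, two beta values: the lower beta gives the smaller distance.
print(tversky_distance("iphone 15", title, alpha=0.5, beta=0.5))
print(tversky_distance("iphone 15", title, alpha=0.5, beta=0.1))
```

With beta=0.5 the four extra title tokens cost as much as the two matched tokens, so the distance is 0.5. Lowering beta to 0.1 shrinks that cost and the short query matches the long title more closely.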

Attribute Matching

Attributes are key-value pairs.

attributes={
    "brand": "Apple",
    "storage": "256GB",
}

At search time, if the query and item share an attribute key but the values differ, brinicle applies a very large distance penalty. This allows attributes to behave like hard filters when threshold is used.

results = engine.search(
    "iphone 15",
    attributes={"brand": "Apple"},
    threshold=10.0,
)

An item with brand="Samsung" receives a large penalty and can be filtered out by the threshold. If an item does not have the queried attribute, brinicle applies a smaller penalty than a direct value mismatch.
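The three cases described above (matching value, mismatched value, missing key) can be modeled with a small sketch. The penalty constants are invented for illustration; the source only says that a mismatch penalty is very large and a missing-key penalty is smaller. `attribute_distance` and `passes` are hypothetical helpers.

```python
MISMATCH_PENALTY = 1000.0  # hypothetical stand-in for the "very large" penalty
MISSING_PENALTY = 1.0      # hypothetical stand-in for the smaller penalty


def attribute_distance(query_attrs: dict, item_attrs: dict) -> float:
    """Sum penalties over the attribute keys that the query asks for."""
    total = 0.0
    for key, wanted in query_attrs.items():
        if key not in item_attrs:
            total += MISSING_PENALTY    # item lacks the queried attribute
        elif item_attrs[key] != wanted:
            total += MISMATCH_PENALTY   # shared key, different value
    return total


def passes(distance: float, threshold: float) -> bool:
    """Keep a result only when its distance is below the threshold."""
    return distance < threshold


query = {"brand": "Apple"}
items = {
    "apple_phone": {"brand": "Apple"},
    "samsung_phone": {"brand": "Samsung"},
    "unbranded_case": {"color": "black"},
}
for name, attrs in items.items():
    d = attribute_distance(query, attrs)
    print(name, d, passes(d, threshold=10.0))
```

In this model the Samsung item is rejected by threshold=10.0, while the item with no brand attribute is penalized but still passes.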

Category and Subcategory Matching

Category and subcategory mismatches use category_penalty. If category or subcategory is missing on either side, no penalty is applied. If both sides have values and they differ, brinicle applies the penalty. The contribution is:

search_category_weight * search_category_penalty

or:

search_subcategory_weight * search_category_penalty

Example hard-filter style setup:

cfg = brinicle.LexicalConfig()

cfg.search_category_weight = 1.0
cfg.search_category_penalty = 2.0

engine = brinicle.ItemSearchEngine(
    "items_index",
    dim=96,
    lexical_config=cfg,
)

results = engine.search(
    "iphone 15",
    category="Electronics",
    threshold=2.0,
)

brinicle accepts results with distance below the threshold. With this setup, a category mismatch contributes 2.0 by itself, so mismatched categories are filtered out. The same approach can be used with subcategories.
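The arithmetic behind this hard-filter setup can be checked with a short sketch. It follows the rules stated above (no penalty when either side is missing, weight times penalty when both sides differ); `category_contribution` is a hypothetical helper, not a brinicle function.

```python
def category_contribution(query_value, item_value, weight: float, penalty: float) -> float:
    """Distance added by a category (or subcategory) comparison."""
    if query_value is None or item_value is None:
        return 0.0                 # missing on either side: no penalty
    if query_value != item_value:
        return weight * penalty    # both present and different
    return 0.0                     # both present and equal


weight, penalty, threshold = 1.0, 2.0, 2.0

for item_category in ("Electronics", "Fashion", None):
    contribution = category_contribution("Electronics", item_category, weight, penalty)
    # A result is kept only when its distance is below the threshold.
    print(item_category, contribution, contribution < threshold)
```

A mismatch contributes exactly 2.0, which is not below the threshold of 2.0, so the item is dropped regardless of how well its title matches.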

Vector Matching

Semantic vector matching is used when:

vector_dim > 0
vectors are provided during ingest
a vector is provided during search
vector weight is greater than zero

Example:

engine = brinicle.ItemSearchEngine(
    "semantic_items",
    dim=96,
    vector_dim=384,
    alpha=1.0,
)

If your vectors are already normalized, use:

engine = brinicle.ItemSearchEngine(
    "semantic_items",
    dim=96,
    vector_dim=384,
    alpha=1.0,
    vector_normalized=True,
)

For semantic and hybrid item search, query vectors must have the same dimension as vector_dim.
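If you normalize vectors yourself before setting vector_normalized=True, a helper along these lines can catch dimension mismatches early. It assumes that "normalized" means unit L2 length, which the source does not state explicitly, and it uses plain Python lists to stay dependency-free; in practice you would likely work with NumPy arrays. `l2_normalize` is a hypothetical name.

```python
import math


def l2_normalize(vector, expected_dim: int):
    """Check the dimension, then scale the vector to unit L2 length."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0], expected_dim=2)
print(unit)  # [0.6, 0.8]
```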

Ingesting Items

Each item has a required external ID and title, plus optional structured fields.

engine.init(mode="build")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
        "storage": "256GB",
        "color": "Natural Titanium",
    },
)

engine.finalize()

Only external_id and title are required. category, subcategory, attributes, and vector are optional. For semantic or hybrid search, pass a vector during ingest:

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
    vector=item_vector,
    normalize=True,
)

Item Fields

Field	Type	Required	Description
`external_id`	str	Yes	Unique identifier for the item
`title`	str	Yes	Item title or name
`category`	str/int/None	No	Primary category
`subcategory`	str/int/None	No	Secondary category
`attributes`	dict/None	No	Key-value attribute pairs
`vector`	np.ndarray/None	No	Semantic vector for hybrid/semantic search
`normalize`	bool	No	Whether to normalize the vector before encoding (default: False)

Attribute Value Types

The attributes dictionary supports the following value types:

str — Text values (tokenized and hashed)
int/float — Numeric values (converted to token IDs)
bool — Boolean values (mapped to reserved tokens)
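Because only these value types are listed as supported, it can help to validate attribute dictionaries before ingest. The check below is a client-side sketch; `find_unsupported_attributes` is a hypothetical helper, and how brinicle itself reacts to other types is not described in the source.

```python
SUPPORTED_TYPES = (str, int, float, bool)


def find_unsupported_attributes(attributes: dict) -> dict:
    """Return {key: type name} for every value whose type is not listed as supported."""
    return {
        key: type(value).__name__
        for key, value in attributes.items()
        if not isinstance(value, SUPPORTED_TYPES)
    }


good = {"brand": "Apple", "ram_gb": 8, "price": 999.0, "in_stock": True}
bad = {"brand": "Apple", "tags": ["phone", "5g"]}

print(find_unsupported_attributes(good))  # {}
print(find_unsupported_attributes(bad))   # {'tags': 'list'}
```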

Lexical Search Example

Lexical search does not require semantic vectors.

import brinicle

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
    alpha=0.0,
)

engine.init(mode="build")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
)

engine.ingest(
    external_id="p2",
    title="Samsung Galaxy S24 Ultra 512GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Samsung"},
)

engine.finalize()

results = engine.search("iphone 15 pro", k=10)

print(results)

Semantic Search Example

Semantic search uses vectors as the main retrieval signal.

import numpy as np
import brinicle

vector_dim = 384

engine = brinicle.ItemSearchEngine(
    "semantic_item_index",
    dim=96,
    vector_dim=vector_dim,
    alpha=1.0,
)

engine.init(mode="build")

item_vector = np.random.randn(vector_dim).astype("float32")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    vector=item_vector,
    normalize=True,
)

engine.finalize()

query_vector = np.random.randn(vector_dim).astype("float32")

results = engine.search(
    "iphone 15 pro",
    vector=query_vector,
    normalize=True,
    k=10,
)

print(results)

Hybrid Search Example

Hybrid search combines structured lexical fields with semantic vectors.

import numpy as np
import brinicle

vector_dim = 384

engine = brinicle.ItemSearchEngine(
    "hybrid_item_index",
    dim=96,
    vector_dim=vector_dim,
    alpha=0.95,
)

engine.init(mode="build")

item_vector = np.random.randn(vector_dim).astype("float32")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple", "storage": "256GB"},
    vector=item_vector,
    normalize=True,
)

engine.finalize()

query_vector = np.random.randn(vector_dim).astype("float32")

results = engine.search(
    "iphone 15 pro",
    category="Electronics",
    attributes={"brand": "Apple"},
    vector=query_vector,
    normalize=True,
    k=10,
)

print(results)

Searching

Basic Text Search

Search by text query — the engine encodes the query and finds the most similar items:

results = engine.search("iphone 15 pro max", k=10)
print(results)  # ['p1', 'p2']

Search with Distance

Get results with distance scores:

results = engine.search_with_distance("iphone 15 pro", category="Electronics", k=10)
# [('p1', 0.12), ('p7', 0.19)]

Structured Search with Filters

Search with category, subcategory, and attribute filters to narrow down results:

results = engine.search(
    "iphone 15",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Apple"},
    k=10,
)

Search with Vector

Search with a semantic vector for hybrid or semantic search:

results = engine.search(
    "iphone 15 pro",
    category="Electronics",
    attributes={"brand": "Apple"},
    vector=query_vector,
    normalize=True,
    k=10,
)

Search Parameters

engine.search(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)

Parameter	Type	Default	Description
`query`	str	required	Search query text
`k`	int	10	Maximum number of results to return
`efs`	int/None	None	Override ef_search for this query
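`threshold`	float	inf	Distance cutoff; only results with distance below this value are returned
`category`	str/int/None	None	Category to match; a mismatch adds a distance penalty
`subcategory`	str/int/None	None	Subcategory to match; a mismatch adds a distance penalty
`attributes`	dict/None	None	Attribute key-value pairs to match; a value mismatch adds a large distance penalty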
`vector`	np.ndarray/None	None	Optional query vector for semantic/hybrid search
`normalize`	bool	False	Whether to normalize the query vector

Batch Search

Use search_batch(...) to search multiple queries.

results = engine.search_batch(
    queries,
    categories=categories,
    subcategories=subcategories,
    attributes_list=attributes_list,
    vectors=vectors,
    k=10,
    n_jobs=4,
)

If categories, subcategories, attributes_list, or vectors are provided, their length must match len(queries). Example:

queries = [
    "iphone 15 pro",
    "running shoes",
]

categories = [
    "Electronics",
    "Fashion",
]

attributes_list = [
    {"brand": "Apple"},
    {"brand": "Nike"},
]

results = engine.search_batch(
    queries,
    categories=categories,
    attributes_list=attributes_list,
    k=10,
)

Batch search supports optimized paths for:

query-only batch search
query + vector batch search
full per-query metadata batch search
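Since every per-query list must have the same length as queries, a small pre-flight check can surface mismatches before calling search_batch. This is a sketch: `check_batch_lengths` is a hypothetical helper, and the error that brinicle itself raises on a mismatch is not documented here.

```python
def check_batch_lengths(queries, **per_query_lists) -> None:
    """Raise ValueError if any provided per-query list differs in length from queries."""
    expected = len(queries)
    for name, values in per_query_lists.items():
        if values is not None and len(values) != expected:
            raise ValueError(
                f"{name} has {len(values)} entries, expected {expected}"
            )


queries = ["iphone 15 pro", "running shoes"]
categories = ["Electronics", "Fashion"]
attributes_list = [{"brand": "Apple"}, {"brand": "Nike"}]

# Lists that were not provided (None) are skipped.
check_batch_lengths(
    queries,
    categories=categories,
    subcategories=None,
    attributes_list=attributes_list,
    vectors=None,
)
print("batch inputs are consistent")
```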

Insert

Use insert mode to add new items to an existing index.

engine.init(mode="insert")

engine.ingest(
    external_id="p3",
    title="Google Pixel 8 Pro 256GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={"brand": "Google"},
)

engine.finalize()

Inserted items are added through the delta index.

Upsert

Use upsert mode to replace existing items or insert new ones.

engine.init(mode="upsert")

engine.ingest(
    external_id="p1",
    title="Apple iPhone 15 Pro Max 512GB",
    category="Electronics",
    subcategory="Smartphones",
    attributes={
        "brand": "Apple",
        "storage": "512GB",
    },
)

engine.finalize()

If the external ID already exists, brinicle marks the old record as deleted and inserts the new version. If the external ID does not exist, the item is inserted as a new record.
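The upsert rule above can be pictured with a toy in-memory model: the old version is flagged as deleted rather than overwritten, and the new version is appended. `ToyIndex` is purely illustrative and says nothing about brinicle's actual on-disk layout.

```python
class ToyIndex:
    """Append-only record list with logical deletes, for illustration only."""

    def __init__(self):
        self.records = []    # each entry: [external_id, title, alive]
        self.position = {}   # external_id -> index of the live record

    def upsert(self, external_id: str, title: str) -> None:
        old = self.position.get(external_id)
        if old is not None:
            self.records[old][2] = False  # mark the previous version as deleted
        self.position[external_id] = len(self.records)
        self.records.append([external_id, title, True])

    def alive(self):
        return [(rid, title) for rid, title, ok in self.records if ok]


index = ToyIndex()
index.upsert("p1", "Apple iPhone 15 Pro Max 256GB")
index.upsert("p2", "Samsung Galaxy S24 Ultra 512GB")
index.upsert("p1", "Apple iPhone 15 Pro Max 512GB")  # replaces the first p1

print(len(index.records))  # 3 stored records, one of them logically deleted
print(index.alive())
```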

Delete

Use delete_items(...) to delete items by external ID.

deleted_count, not_found = engine.delete_items(
    ["p1", "p2"],
    return_not_found=True,
)

print(deleted_count)
print(not_found)

If return_not_found=False, the second returned value is None. Deletes are logical: deleted records stay in the index files until a compact rebuild physically removes them.
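The return shape can be mimicked with a toy function that works on a plain set of IDs. This is an illustration only: `toy_delete_items` is hypothetical, and returning the missing IDs as a list is an assumption about the type of the second value.

```python
def toy_delete_items(alive_ids: set, external_ids, return_not_found: bool = False):
    """Remove known IDs from alive_ids; optionally report the IDs that were not found."""
    deleted_count = 0
    not_found = []
    for external_id in external_ids:
        if external_id in alive_ids:
            alive_ids.discard(external_id)
            deleted_count += 1
        else:
            not_found.append(external_id)
    return deleted_count, (not_found if return_not_found else None)


alive = {"p1", "p2", "p3"}
print(toy_delete_items(alive, ["p1", "p9"], return_not_found=True))  # (1, ['p9'])
print(toy_delete_items(alive, ["p2"]))                                # (1, None)
```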

Rebuild and Optimize

Item indexes use the same maintenance model as VectorEngine.

engine.needs_rebuild()

Returns whether the index has enough update or delete drift to justify rebuilding.

engine.rebuild_compact()

Rebuilds the index from alive records, removes deleted records physically, and clears the delta index.

engine.optimize_graph()

Runs conditional maintenance. If the index crosses the configured maintenance threshold, brinicle rebuilds the graph.
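A maintenance step built from needs_rebuild() and rebuild_compact() might look like the sketch below. The helper `maintain` is hypothetical, and `FakeEngine` is a stand-in so the example runs without an index on disk; with a real engine you would pass the ItemSearchEngine instance instead.

```python
def maintain(engine) -> bool:
    """Rebuild only when the engine reports enough drift. Returns True if it rebuilt."""
    if engine.needs_rebuild():
        engine.rebuild_compact()
        return True
    return False


class FakeEngine:
    """Stand-in that exposes only the two methods used by maintain()."""

    def __init__(self, drifted: bool):
        self.drifted = drifted
        self.rebuild_calls = 0

    def needs_rebuild(self) -> bool:
        return self.drifted

    def rebuild_compact(self) -> None:
        self.rebuild_calls += 1
        self.drifted = False  # a compact rebuild clears the accumulated drift


engine = FakeEngine(drifted=True)
print(maintain(engine))  # True: the first call rebuilds
print(maintain(engine))  # False: nothing left to do
```

Running a check like this after a batch of upserts or deletes keeps rebuild cost off the query path, at the price of carrying logically deleted records until the threshold is crossed.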

Lexical Scoring Configuration

Use LexicalConfig when you want direct control over item scoring.

cfg = brinicle.LexicalConfig()

# Build-time weights
cfg.build_title_weight = 0.70
cfg.build_attr_weight = 0.15
cfg.build_subcategory_weight = 0.10
cfg.build_category_weight = 0.05
cfg.build_vector_weight = 0.10
cfg.build_category_penalty = 0.20

# Search-time weights
cfg.search_title_weight = 0.60
cfg.search_attr_weight = 0.10
cfg.search_category_weight = 0.15
cfg.search_subcategory_weight = 0.15
cfg.search_vector_weight = 0.10
cfg.search_category_penalty = 0.30

# Title alpha and beta parameters
cfg.build_title_alpha = 0.5
cfg.build_title_beta = 0.5
cfg.search_title_alpha = 0.5
cfg.search_title_beta = 0.5

engine = brinicle.ItemSearchEngine(
    "item_index",
    dim=96,
    lexical_config=cfg,
)

LexicalConfig has separate build-time and search-time weights. Build-time weights affect graph construction. Search-time weights affect query ranking.

Available LexicalConfig Fields

Field	Type	Default	Description
`build_title_weight`	float	—	Title weight during graph construction
`build_attr_weight`	float	—	Attribute weight during graph construction
`build_category_weight`	float	—	Category weight during graph construction
`build_subcategory_weight`	float	—	Subcategory weight during graph construction
`build_vector_weight`	float	—	Vector weight during graph construction
`search_title_weight`	float	—	Title weight during search
`search_attr_weight`	float	—	Attribute weight during search
`search_category_weight`	float	—	Category weight during search
`search_subcategory_weight`	float	—	Subcategory weight during search
`search_vector_weight`	float	—	Vector weight during search
`build_category_penalty`	float	—	Category and subcategory mismatch penalty during graph construction
`search_category_penalty`	float	—	Category and subcategory mismatch penalty during search
`build_title_alpha`	float	—	Build-time title Tversky alpha
`build_title_beta`	float	—	Build-time title Tversky beta
`search_title_alpha`	float	—	Search-time title Tversky alpha
`search_title_beta`	float	—	Search-time title Tversky beta
`vector_normalized`	bool	—	Whether vectors are already normalized

Close and Destroy

Close loaded index resources:

engine.close()

Destroy the index files:

engine.destroy()

destroy() removes the index from disk.

Complete API Reference

`init`

engine.init(mode="build")

Starts a write session. Supported modes: build, insert, upsert.

`ingest`

engine.ingest(
    external_id,
    title,
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)

Adds one item to the current write session.

`finalize`

engine.finalize(
    optimize=False,
    M=0,
    ef_construction=0,
    ef_search=0,
    build_n_threads=0,
    seed=0,
)

Completes the pending write session.

`search`

engine.search(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)

Returns external IDs.

`search_with_distance`

engine.search_with_distance(
    query,
    k=10,
    efs=None,
    threshold=float("inf"),
    category=None,
    subcategory=None,
    attributes=None,
    vector=None,
    normalize=False,
)

Returns (external_id, distance) pairs.

`search_batch`

engine.search_batch(
    queries,
    k=10,
    efs=None,
    threshold=float("inf"),
    categories=None,
    subcategories=None,
    attributes_list=None,
    vectors=None,
    normalize=False,
    n_jobs=1,
)

Runs batch search over multiple item queries.

`delete_items`

engine.delete_items(
    external_ids,
    return_not_found=False,
)

Deletes items by external ID.

`needs_rebuild`

engine.needs_rebuild()

Returns whether the index has crossed its maintenance threshold.

`rebuild_compact`

engine.rebuild_compact(
    M=16,
    ef_construction=200,
    ef_search=64,
    build_n_threads=1,
    seed=0,
)

Rebuilds the index from alive records.

`optimize_graph`

engine.optimize_graph()

Runs conditional graph maintenance.

`close`

engine.close()

Closes loaded index resources.

`destroy`

engine.destroy()

Removes index files from disk.
