ItemSearchEngine
ItemSearchEngine is brinicle’s high-level engine for structured item search.
Use it when records have titles, categories, subcategories, attributes, and optionally semantic vectors.
It supports:
- lexical item search
- semantic item search
- hybrid lexical + semantic search
- single-query search
- batch search
- search with distances
- insert, upsert, and delete
- compact rebuild and graph optimization
ItemSearchEngine uses the same disk-first HNSW infrastructure as VectorEngine, but it encodes structured item fields internally before indexing and searching.
When to Use Item Search
Use ItemSearchEngine when you have structured, catalog-like data and need text-based search with filtering. Typical use cases include:
- E-commerce product catalogs — search products by title with category and attribute filters
- Movie databases — find movies by title, genre, and attributes like director or year
- Job boards — search job listings by title, category, and required skills
- Real estate listings — find properties by description, type, and features
- Book catalogs — search books by title, genre, and author attributes
For lexical-only search, set alpha=0.0. For semantic or hybrid search, provide vector_dim and pass vectors during ingest and search.
Constructor
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| index_path | str/Path | required | Base path for the index files |
| dim | int | 96 | Dimension used for the encoded lexical representation |
| vector_dim | int | 0 | Dimension of optional semantic vectors |
| vector_normalized | bool | False | Whether semantic vectors are already normalized |
| tokenizer_path | str/Path/None | None | Optional custom tokenizer path |
| text_prep | None | None | Optional text preprocessing |
| title_ratio | float | 0.9 | Portion of lexical encoding space reserved for title tokens |
| delta_ratio | float | 0.10 | Maintenance threshold for delta and deleted records |
| M | int | 16 | HNSW graph connectivity |
| ef_construction | int | 200 | Build-time search width |
| ef_search | int | 64 | Default query-time search width |
| build_n_threads | int | 1 | Number of build threads |
| alpha | float | 0.95 | Balance between lexical and semantic scoring |
| seed | int | 0 | Random seed for graph construction |
| lexical_config | LexicalConfig/None | None | Optional custom lexical scoring configuration |
Search Modes
ItemSearchEngine can be used in three main modes.
| Mode | Setup |
|---|---|
| Lexical search | Use structured fields and set alpha=0.0 |
| Semantic search | Provide vectors and set alpha=1.0 |
| Hybrid search | Provide structured fields and vectors, then use 0.0 < alpha < 1.0 |
Understanding alpha
alpha controls the balance between semantic vector similarity and lexical matching.
| alpha | Behavior |
|---|---|
| 0.0 | Lexical-only |
| 0.5 | Balanced lexical + semantic |
| 0.95 | Mostly semantic, with lexical correction |
| 1.0 | Semantic-only |
alpha affects both graph construction and search scoring, so choose it before building the index.
When lexical_config is provided, the custom config controls the weights directly.
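One way to picture the table above is as a linear blend of a lexical distance and a semantic distance. The sketch below is an illustration only: the function name `blend_distance`, the sample distances, and the linear formula itself are assumptions made for this example, not brinicle's internal scoring code.

```python
def blend_distance(lexical_dist: float, vector_dist: float, alpha: float) -> float:
    """Hypothetical linear blend: alpha weights the semantic side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    return alpha * vector_dist + (1.0 - alpha) * lexical_dist


# Made-up distances for a single candidate item.
lexical, semantic = 0.25, 0.75

lexical_only = blend_distance(lexical, semantic, alpha=0.0)   # only the lexical term counts
balanced = blend_distance(lexical, semantic, alpha=0.5)       # equal mix of both terms
semantic_only = blend_distance(lexical, semantic, alpha=1.0)  # only the semantic term counts
```

Under this reading, alpha=0.95 leaves a small lexical term that can still reorder items whose semantic distances are close.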
How Item Scoring Works
In general mode, brinicle combines several distance components:

| Component | What it compares |
|---|---|
| Title distance | Query/title token overlap |
| Attribute distance | Attribute key-value matches and mismatches |
| Category distance | Category match or mismatch |
| Subcategory distance | Subcategory match or mismatch |
| Vector distance | Semantic vector similarity |
Title Matching
Title matching uses a Tversky-style distance over title tokens. The main tuning parameters are:

| Parameter | Effect |
|---|---|
| search_title_alpha | Penalizes query tokens missing from the item title |
| search_title_beta | Penalizes extra item-title tokens not present in the query |
| build_title_alpha | Same idea during graph construction |
| build_title_beta | Same idea during graph construction |

Higher alpha makes missing query terms more expensive.
Higher beta makes extra item-title terms more expensive.
Lower beta is useful when short queries should match longer titles. For example, "iphone 15" can still match "Apple iPhone 15 Pro Max 256GB".
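The effect of beta can be seen with a small Tversky distance over token sets. This is a sketch under stated assumptions: it uses the textbook Tversky index with lowercase whitespace tokenization, and the helper names `tokenize` and `tversky_distance` are hypothetical. brinicle's own tokenizer and exact formula may differ.

```python
def tokenize(text: str) -> list[str]:
    """Simplified tokenizer for illustration: lowercase and split on whitespace."""
    return text.lower().split()


def tversky_distance(query_tokens, title_tokens, alpha: float, beta: float) -> float:
    """1 - Tversky index, with the query as the first set and the title as the second."""
    q, t = set(query_tokens), set(title_tokens)
    common = len(q & t)    # tokens shared by query and title
    missing = len(q - t)   # query tokens absent from the title (weighted by alpha)
    extra = len(t - q)     # title tokens absent from the query (weighted by beta)
    denom = common + alpha * missing + beta * extra
    if denom == 0:
        # Only reachable with no overlap; identical (empty) sets count as distance 0.
        return 0.0 if q == t else 1.0
    return 1.0 - common / denom


query = tokenize("iphone 15")
title = tokenize("Apple iPhone 15 Pro Max 256GB")

low_beta = tversky_distance(query, title, alpha=1.0, beta=0.1)   # extra title tokens are cheap
high_beta = tversky_distance(query, title, alpha=1.0, beta=1.0)  # extra title tokens are costly
```

With the low beta the long title stays close to the short query, while the high beta pushes it noticeably further away.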
Attribute Matching
Attributes are key-value pairs. A value that conflicts with the query, for example an item with brand="Samsung" when the query asks for a different brand, receives a large penalty and can be filtered out by the threshold.
If an item does not have the queried attribute, brinicle applies a smaller penalty than a direct value mismatch.
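The difference between a value mismatch and a missing attribute can be sketched as follows. The penalty constants, the `attribute_distance` helper, the sample items, and the threshold are all made up for illustration; the source only states that a missing attribute costs less than a direct mismatch.

```python
# Assumed penalty values, chosen only so that a mismatch costs more than a missing key.
MISMATCH_PENALTY = 1.0
MISSING_PENALTY = 0.25


def attribute_distance(query_attrs: dict, item_attrs: dict) -> float:
    """Sum a penalty for every queried attribute the item lacks or contradicts."""
    total = 0.0
    for key, wanted in query_attrs.items():
        if key not in item_attrs:
            total += MISSING_PENALTY      # item does not have the attribute at all
        elif item_attrs[key] != wanted:
            total += MISMATCH_PENALTY     # item has the attribute with a different value
    return total


query = {"brand": "Apple"}
items = {
    "p1": {"brand": "Apple"},      # exact match
    "p2": {"brand": "Samsung"},    # direct value mismatch
    "p3": {"color": "black"},      # brand attribute missing
}

distances = {eid: attribute_distance(query, attrs) for eid, attrs in items.items()}

# A threshold between the two penalties drops the mismatch but keeps the missing case.
threshold = 0.5
kept = [eid for eid, dist in distances.items() if dist <= threshold]
```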
Category and Subcategory Matching
Category and subcategory mismatches use category_penalty.
If category or subcategory is missing on either side, no penalty is applied.
If both sides have values and they differ, brinicle applies the penalty.
The contribution is:
2.0 by itself, so mismatched categories are filtered out.
The same approach can be used with subcategories.
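The missing-value and mismatch rules for categories can be written as a small function. This is a minimal sketch: `category_distance` is a hypothetical helper, and the penalty value in the example is arbitrary.

```python
def category_distance(query_category, item_category, category_penalty: float) -> float:
    """Apply the penalty only when both sides have a value and the values differ."""
    if query_category is None or item_category is None:
        return 0.0                      # missing on either side: no penalty
    if query_category != item_category:
        return category_penalty         # both present and different: penalize
    return 0.0                          # both present and equal


penalty = 3.0  # arbitrary example value

same = category_distance("phones", "phones", penalty)
different = category_distance("phones", "laptops", penalty)
item_missing = category_distance("phones", None, penalty)
query_missing = category_distance(None, "laptops", penalty)
```

The same function applies unchanged to subcategory values.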
Vector Matching
Semantic vector matching is used when:
- vector_dim > 0
- vectors are provided during ingest
- a vector is provided during search
- vector weight is greater than zero
vector_dim.
Ingesting Items
Each item has a required title and optional structured fields.
title is required. category, subcategory, attributes, and vector are optional.
For semantic or hybrid search, pass a vector during ingest:
Item Fields
| Field | Type | Required | Description |
|---|---|---|---|
| external_id | str | Yes | Unique identifier for the item |
| title | str | Yes | Item title or name |
| category | str/int/None | No | Primary category |
| subcategory | str/int/None | No | Secondary category |
| attributes | dict/None | No | Key-value attribute pairs |
| vector | np.ndarray/None | No | Semantic vector for hybrid/semantic search |
| normalize | bool | False | Whether to normalize the vector before encoding |
Attribute Value Types
The attributes dictionary supports the following value types:
- str — Text values (tokenized and hashed)
- int/float — Numeric values (converted to token IDs)
- bool — Boolean values (mapped to reserved tokens)
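Before ingesting, it can help to check records against the field table and the supported attribute value types. The `make_item` helper below is hypothetical and is not part of brinicle; it only builds plain dictionaries keyed by the field names listed above, and it leaves the optional vector out to stay dependency-free.

```python
ALLOWED_ATTRIBUTE_TYPES = (str, int, float, bool)


def make_item(external_id, title, category=None, subcategory=None, attributes=None):
    """Build a plain item record, rejecting missing required fields and bad attribute values."""
    if not external_id:
        raise ValueError("external_id is required")
    if not title:
        raise ValueError("title is required")
    attributes = dict(attributes or {})
    for key, value in attributes.items():
        if not isinstance(value, ALLOWED_ATTRIBUTE_TYPES):
            raise TypeError(f"attribute {key!r} has unsupported type {type(value).__name__}")
    return {
        "external_id": external_id,
        "title": title,
        "category": category,
        "subcategory": subcategory,
        "attributes": attributes,
    }


# One attribute of each supported kind: text, numeric, and boolean.
phone = make_item(
    "sku-001",
    "Apple iPhone 15 Pro Max 256GB",
    category="electronics",
    subcategory="phones",
    attributes={"brand": "Apple", "storage_gb": 256, "refurbished": False},
)

# Only the required fields; everything else stays empty.
minimal = make_item("sku-002", "USB-C cable")
```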
Lexical Search Example
Lexical search does not require semantic vectors.
Semantic Search Example
Semantic search uses vectors as the main retrieval signal.
Hybrid Search Example
Hybrid search combines structured lexical fields with semantic vectors.
Searching
Basic Text Search
Search by text query. The engine encodes the query and finds the most similar items.
Search with Distance
Get results with distance scores.
Structured Search with Filters
Search with category, subcategory, and attribute filters to narrow down results.
Search with Vector
Search with a semantic vector for hybrid or semantic search.
Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | Search query text |
| k | int | 10 | Maximum number of results to return |
| efs | int/None | None | Override ef_search for this query |
| threshold | float | inf | Maximum accepted distance |
| category | str/int/None | None | Filter by category |
| subcategory | str/int/None | None | Filter by subcategory |
| attributes | dict/None | None | Filter by attribute key-value pairs |
| vector | np.ndarray/None | None | Optional query vector for semantic/hybrid search |
| normalize | bool | False | Whether to normalize the query vector |
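The interaction between k and threshold can be sketched on a list of already scored candidates. `select_results` is a hypothetical helper written for this illustration; it mirrors the documented defaults (k=10, threshold=inf) but is not the engine's search code, and the candidate distances are invented.

```python
import math


def select_results(scored, k: int = 10, threshold: float = math.inf):
    """Keep pairs whose distance is within the threshold, then return the k closest."""
    kept = [(eid, dist) for eid, dist in scored if dist <= threshold]
    kept.sort(key=lambda pair: pair[1])  # smaller distance means a better match
    return kept[:k]


# Invented (external_id, distance) pairs for one query.
candidates = [("a", 0.9), ("b", 0.1), ("c", 2.5), ("d", 0.4)]

top_two = select_results(candidates, k=2)                 # two closest, no distance cut-off
within_one = select_results(candidates, threshold=1.0)    # drops "c", keeps the rest in order
```

Because threshold is a maximum accepted distance, a query can return fewer than k results.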
Batch Search
Use search_batch(...) to search multiple queries.
If categories, subcategories, attributes_list, or vectors are provided, their length must match len(queries).
Example:
- query-only batch search
- query + vector batch search
- full per-query metadata batch search
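The length rule for per-query metadata can be checked up front before calling search_batch(...). The `check_batch_lengths` helper below is hypothetical; it only reuses the argument names mentioned above and does not call the engine.

```python
def check_batch_lengths(queries, categories=None, subcategories=None,
                        attributes_list=None, vectors=None):
    """Raise ValueError if any provided per-query list differs in length from queries."""
    expected = len(queries)
    optional = {
        "categories": categories,
        "subcategories": subcategories,
        "attributes_list": attributes_list,
        "vectors": vectors,
    }
    for name, values in optional.items():
        if values is not None and len(values) != expected:
            raise ValueError(f"{name} has length {len(values)}, expected {expected}")
    return expected


queries = ["iphone 15", "gaming laptop"]

# Query-only batch: nothing else to validate.
n_plain = check_batch_lengths(queries)

# Full per-query metadata: every list has one entry per query (None entries are allowed).
n_full = check_batch_lengths(
    queries,
    categories=["phones", "laptops"],
    attributes_list=[{"brand": "Apple"}, None],
)
```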
Insert
Use insert mode to add new items to an existing index.
Upsert
Use upsert mode to replace existing items or insert new ones.
Delete
Use delete_items(...) to delete items by external ID.
When return_not_found=False, the second returned value is None.
Deletes are logical until compact rebuild.
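Logical deletion is commonly implemented with tombstones: a deleted ID is only marked, and the record is physically dropped later during a rebuild. The toy class below illustrates that idea and the return_not_found behavior. It is a sketch, not brinicle's storage code; in particular, the assumption that the first returned value is a count of deleted items is made only for this example.

```python
class TombstoneIndex:
    """Toy in-memory index that marks deletions instead of removing records."""

    def __init__(self, external_ids):
        self.records = set(external_ids)   # everything physically stored
        self.deleted = set()               # tombstones: stored but logically removed

    def delete_items(self, external_ids, return_not_found=False):
        removed, not_found = 0, []
        for eid in external_ids:
            if eid in self.records and eid not in self.deleted:
                self.deleted.add(eid)      # mark only; the record stays in storage
                removed += 1
            else:
                not_found.append(eid)
        return removed, (not_found if return_not_found else None)

    def live_ids(self):
        """IDs that searches should still be able to return."""
        return self.records - self.deleted

    def compact(self):
        """Physically drop tombstoned records, as a compact rebuild would."""
        self.records -= self.deleted
        self.deleted.clear()


index = TombstoneIndex(["a", "b", "c"])
removed, missing = index.delete_items(["a", "zzz"], return_not_found=True)
stored_before_compact = len(index.records)   # "a" is still stored, only marked
index.compact()
stored_after_compact = len(index.records)    # "a" is gone for good
```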
Rebuild and Optimize
Item indexes use the same maintenance model as VectorEngine.
Lexical Scoring Configuration
Use LexicalConfig when you want direct control over item scoring.
LexicalConfig has separate build-time and search-time weights. Build-time weights affect graph construction. Search-time weights affect query ranking.
Available LexicalConfig Fields
| Field | Type | Default | Description |
|---|---|---|---|
| build_title_weight | float | - | Title weight during graph construction |
| build_attr_weight | float | - | Attribute weight during graph construction |
| build_category_weight | float | - | Category weight during graph construction |
| build_subcategory_weight | float | - | Subcategory weight during graph construction |
| build_vector_weight | float | - | Vector weight during graph construction |
| search_title_weight | float | - | Title weight during search |
| search_attr_weight | float | - | Attribute weight during search |
| search_category_weight | float | - | Category weight during search |
| search_subcategory_weight | float | - | Subcategory weight during search |
| search_vector_weight | float | - | Vector weight during search |
| build_category_penalty | float | - | Category and subcategory mismatch penalty during graph construction |
| search_category_penalty | float | - | Category and subcategory mismatch penalty during search |
| build_title_alpha | float | - | Build-time title Tversky alpha |
| build_title_beta | float | - | Build-time title Tversky beta |
| search_title_alpha | float | - | Search-time title Tversky alpha |
| search_title_beta | float | - | Search-time title Tversky beta |
| vector_normalized | bool | - | Whether vectors are already normalized |
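To see why separate build-time and search-time weights matter, consider a weighted sum over the five distance components. The sketch below assumes such a weighted sum for illustration; the source does not spell out the exact combination. The dictionaries stand in for the build_*_weight and search_*_weight fields, and all numbers are made up.

```python
def weighted_distance(components: dict, weights: dict) -> float:
    """Combine per-component distances using one weight per component."""
    return sum(weights[name] * components[name] for name in weights)


# Made-up per-component distances between one query and one item.
components = {"title": 0.5, "attr": 0.0, "category": 1.0, "subcategory": 0.0, "vector": 0.25}

# Stand-ins for the build_*_weight fields: lean on vectors when shaping the graph.
build_weights = {"title": 0.25, "attr": 0.25, "category": 0.0, "subcategory": 0.0, "vector": 0.5}

# Stand-ins for the search_*_weight fields: lean on title and category when ranking.
search_weights = {"title": 0.5, "attr": 0.25, "category": 0.25, "subcategory": 0.0, "vector": 0.0}

build_score = weighted_distance(components, build_weights)    # distance seen during construction
search_score = weighted_distance(components, search_weights)  # distance seen at query time
```

The same pair of records ends up at different distances in the two phases, which is the kind of control the separate fields provide.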
Close and Destroy
Close the engine to release loaded index resources.
destroy() removes the index from disk.
Complete API Reference
init
build, insert, upsert
ingest
finalize
search
search_with_distance
Returns (external_id, distance) pairs.