Deployment

brinicle can be deployed in several ways depending on your architecture and requirements. This guide covers the most common deployment patterns.

Docker Deployment

The simplest way to deploy brinicle as a service is with Docker. The repository includes a Dockerfile and a docker-compose.yml for production-ready deployment.

Using Docker Compose

# Clone the repository
git clone https://github.com/bicardinal/brinicle.git
cd brinicle

# Start the server
docker compose up -d

The default configuration runs the server on port 1984 with a 1 GB memory limit and stores index data in the /app/data/ directory inside the container.
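docker compose up -d returns once the container has started, which may be before the API is accepting requests. The following is a minimal readiness check in Python. It assumes the default port 1984 and the root health endpoint described under Health Monitoring. The function name wait_for_server and its timeout and polling interval are illustrative choices, not part of brinicle.

```python
import http.client
import json
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:1984/", timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll the root endpoint until it reports "success": true, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("success"):
                return True
        except (OSError, ValueError, http.client.HTTPException):
            pass  # not accepting connections yet, or an unexpected reply: try again
        time.sleep(interval)
    return False
```

In a deployment script, call wait_for_server() right after docker compose up -d and only send traffic once it returns True.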

Custom Configuration

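Adjust docker-compose.yml to suit your needs. The example below raises the memory limit to 2 GB and keeps index data in ./data on the host. The top-level version key is omitted because current Docker Compose ignores it and warns that it is obsolete: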
services:
  brinicle:
    build: .
    ports:
      - "1984:1984"
    environment:
      - STORE_DIR=/app/data
    mem_limit: 2g
    volumes:
      - ./data:/app/data
    restart: unless-stopped

Persistent Data

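To keep index data when the container is removed or recreated (for example, by docker compose down followed by docker compose up), mount storage at /app/data. The configuration above uses a bind mount to ./data. Alternatively, you can use a named volume managed by Docker:
services:
  brinicle:
    # ...other settings as above...
    volumes:
      - brinicle-data:/app/data

volumes:
  brinicle-data:
    driver: local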

Manual Deployment

If you prefer to deploy without Docker, you can install and run the server directly.

Install Dependencies

pip install "brinicle[server]"

This installs the core library along with FastAPI, Uvicorn, and other server dependencies. The quotes stop shells such as zsh from interpreting the square brackets.

Run the Server

uvicorn brinicle.ref.api:app --host 0.0.0.0 --port 1984 --workers 1

Systemd Service

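For production, create a systemd unit file such as /etc/systemd/system/brinicle.service. This unit assumes brinicle is installed in a virtual environment at /opt/brinicle/venv and runs as a dedicated brinicle user: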
[Unit]
Description=brinicle Vector Engine API
After=network.target

[Service]
Type=simple
User=brinicle
WorkingDirectory=/opt/brinicle
ExecStart=/opt/brinicle/venv/bin/uvicorn brinicle.ref.api:app --host 0.0.0.0 --port 1984
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Nginx Reverse Proxy

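For production, it’s recommended to place brinicle behind a reverse proxy that handles TLS termination, authentication, and rate limiting. The following Nginx configuration covers TLS termination and proxying. Add authentication and rate limiting to suit your environment. Because the proxy reaches brinicle on 127.0.0.1, you can also start Uvicorn with --host 127.0.0.1 so that port 1984 is not exposed directly: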
server {
    listen 443 ssl;
    server_name brinicle.example.com;

    ssl_certificate /etc/ssl/certs/brinicle.crt;
    ssl_certificate_key /etc/ssl/private/brinicle.key;

    location / {
        proxy_pass http://127.0.0.1:1984;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Binary protocol support: accept large request bodies and stream them upstream unbuffered
        client_max_body_size 100M;
        proxy_request_buffering off;
    }
}

Embedded Deployment

For applications that don’t need a separate server, you can embed brinicle directly in your Python process:
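import numpy as np
import brinicle

dim = 384
# Assumption: vectors are 1-D float32 NumPy arrays of length `dim`
vector = np.random.rand(dim).astype(np.float32)
query = np.random.rand(dim).astype(np.float32)

# Direct in-process usage: no server needed
engine = brinicle.VectorEngine("my_index", dim=dim)
engine.init(mode="build")             # start building the index
engine.ingest("v1", vector)           # add a vector under the ID "v1"
engine.finalize()                     # finish the build
results = engine.search(query, k=10)  # up to 10 nearest neighbors of `query`

This approach provides the lowest latency because there’s no network overhead. However, it’s limited to Python applications, and the index shares memory with your application process.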

Scaling Considerations

Single Instance

brinicle is designed for single-instance deployments with datasets under 10M vectors. For most use cases, a single instance is sufficient and provides the simplest operational model.

Memory Planning

When planning memory for your deployment, consider:
  • Index size: brinicle is disk-first, so the bulk of the index is read from disk, but it still needs some RAM for the delta segment and search buffers
  • Concurrent queries: more concurrent queries require more RAM for search buffers
  • Delta ratio: a higher delta_ratio means more RAM for the delta segment
As a rough guideline, brinicle can run in as little as 256 MB of RAM for small datasets (for example, 60K vectors), and 1 to 2 GB is typically sufficient for datasets of up to 1M vectors.
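To put the guideline in perspective, it helps to calculate the raw vector size. This sketch assumes uncompressed float32 components (4 bytes each) and counts nothing else, so it is not a model of brinicle's actual memory or disk use. At 1M vectors of dimension 384, the raw data alone is 1,536,000,000 bytes (about 1.5 GB). That figure illustrates why a disk-first design helps when RAM is limited to 1 to 2 GB.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed for the raw vectors alone (default: 4-byte float32 components).

    Ignores index structures, IDs, and metadata, so treat it only as a baseline.
    """
    return num_vectors * dim * bytes_per_component


for n in (60_000, 1_000_000):
    size = raw_vector_bytes(n, dim=384)
    print(f"{n:>9,} vectors x 384 dims: {size / 2**20:,.1f} MiB")
```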

Index Sharding

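For datasets larger than 10M vectors, consider sharding your data across multiple brinicle instances, each handling a subset of the data. A simple routing layer can then direct requests according to your partitioning strategy. If data is partitioned by a key known at query time (such as a tenant ID), each query goes to a single shard. If vectors are spread by hashing their IDs, each query must go to every shard and the per-shard results must be merged.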
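A minimal sketch of the hash-partitioned variant follows. Each write goes to one shard chosen by its ID, and each query is fanned out to all shards, with the results merged. Everything here is a hypothetical illustration. SearchFn, shard_for, and search_all are not part of brinicle. Results are assumed to be (id, distance) pairs in which a smaller distance means closer. The in-memory brute-force shards stand in for real brinicle instances.

```python
import hashlib
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shard interface: any callable (query, k) -> [(vector_id, distance), ...]
# where a smaller distance means closer. Wrap your own client for each brinicle instance.
Hit = Tuple[str, float]
SearchFn = Callable[[Sequence[float], int], List[Hit]]


def shard_for(vector_id: str, num_shards: int) -> int:
    """Choose a shard for an ID with a stable hash (built-in hash() of str varies per process)."""
    digest = hashlib.sha256(vector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def search_all(shards: Sequence[SearchFn], query: Sequence[float], k: int) -> List[Hit]:
    """Scatter the query to every shard, then gather the global top-k by distance."""
    candidates: List[Hit] = []
    for search in shards:
        candidates.extend(search(query, k))  # each shard contributes its local top-k
    return heapq.nsmallest(k, candidates, key=lambda hit: hit[1])


# --- Usage with in-memory, brute-force stand-ins for real shards ---
def make_brute_force_shard(points: Dict[str, List[float]]) -> SearchFn:
    def search(query: Sequence[float], k: int) -> List[Hit]:
        scored = [
            (vid, sum((a - b) ** 2 for a, b in zip(vec, query)))  # squared L2 distance
            for vid, vec in points.items()
        ]
        return heapq.nsmallest(k, scored, key=lambda hit: hit[1])
    return search


data = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [5.0, 5.0]}
num_shards = 2
partitions: List[Dict[str, List[float]]] = [{} for _ in range(num_shards)]
for vid, vec in data.items():
    partitions[shard_for(vid, num_shards)][vid] = vec  # route each write by ID

shards = [make_brute_force_shard(p) for p in partitions]
print([vid for vid, _ in search_all(shards, [0.1, 0.0], k=2)])  # ['a', 'b']
```

Asking every shard for k results and keeping the k closest overall is enough. Any hit in the global top-k must also rank within its own shard's top-k.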

Health Monitoring

The HTTP server’s root endpoint (/) serves as a health check and reports the number of loaded indexes:
curl http://localhost:1984/
# {"success":true,"message":"Vector Engine API is running. 2 index(es) loaded."}

Use this endpoint for load balancer health checks and monitoring:
  • A successful response indicates the server is running
  • The message includes the number of loaded indexes for operational awareness
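A monitoring probe can also read the index count, not only whether the server answers. The sketch below uses only the Python standard library. The default URL, the function names, and the regular expression that extracts the count from the human-readable message are assumptions based on the example response above. If the message wording changes, the count will come back as None.

```python
import http.client
import json
import re
import urllib.request
from typing import Optional, Tuple

# Assumption: the index count appears in the message as in the documented example,
# e.g. "... 2 index(es) loaded." Parsing human-readable text is brittle.
_LOADED_RE = re.compile(r"(\d+) index\(es\) loaded")


def parse_health(body: str) -> Tuple[bool, Optional[int]]:
    """Return (healthy, loaded_index_count) from a health-check response body."""
    data = json.loads(body)
    healthy = bool(data.get("success"))
    match = _LOADED_RE.search(str(data.get("message", "")))
    return healthy, (int(match.group(1)) if match else None)


def check_health(url: str = "http://localhost:1984/", timeout: float = 2.0) -> Tuple[bool, Optional[int]]:
    """Query the root endpoint; network errors and malformed replies count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except (OSError, ValueError, AttributeError, http.client.HTTPException):
        return False, None


sample = '{"success":true,"message":"Vector Engine API is running. 2 index(es) loaded."}'
print(parse_health(sample))  # (True, 2)
```

A monitoring job could alert when check_health() reports the server as unhealthy, or when the index count drops below the number you expect to be loaded.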