# CefasDB Documentation Full Context

Canonical docs site: https://docs.cefasdb.com
Source of truth: https://github.com/CefasDB/cefasdb-docs/wiki

This file is generated from the GitHub Wiki markdown copied during the documentation build. Use canonical rendered links when citing user-facing docs.

## CefasDB

Rendered: https://docs.cefasdb.com/docs/Home
Markdown: https://docs.cefasdb.com/wiki/Home.md

# CefasDB

CefasDB is a high-performance NoSQL key-value and document database written in Go. It is designed for predictable millisecond-class access, horizontal scale, and a small operational footprint while giving teams direct control over deployment, storage, replication, and extensions.

The engine combines primary-key access, typed document attributes, SQL and PartiQL-style querying, geospatial indexes, similarity search, plugin-backed indexes, raft replication, backup and restore, and a CLI built around table, item, query, plugin, and cluster operations.

The project is intentionally split into a small core and a broad plugin surface. The core owns tables, items, conditional writes, TTL, streams, secondary index lifecycle, query planning, storage, replication, and API transport. Plugins own specialized search and ads-workflow behavior: bloom filters, trigram text search, vector LSH, geohash, HyperLogLog, Count-Min Sketch, distance operators, deduplication, frequency caps, and privacy-aware aggregation.

## Read by section

The wiki is organized into five sections. Each section starts with an overview page and then moves into focused pages that can be read independently.

The [Get Started](Get-Started-Overview) section shows how to build and run CefasDB locally, in Docker, with Docker Compose, and on Kubernetes. Start there if you want a server process running before reading internals.

The [Concepts and Architecture](Concepts-Overview) section explains the data model, storage layout, query planner, indexing model, replication path, deployment modes, and security model. Read it before deciding how CefasDB should fit into an application or platform.

The [Plugins](Plugins-Overview) section documents the plugin boundary and every built-in plugin family. It covers how to write a plugin, how the import graph is enforced, how index plugins are configured, and how audience workflows are composed.

The [Interfaces](Interfaces-Overview) section covers the CLI, SQL and PartiQL surface, HTTP and gRPC APIs, and the Go client package. Read it when integrating CefasDB from another process.

The [Operations](Operations-Overview) section covers configuration, backup and restore, observability, benchmark results, security, privacy, and production runbooks.

## Quick paths

| Intent | Start here |
| --- | --- |
| I want to run CefasDB on my laptop | [Run Locally](Get-Started-Run-Locally) |
| I want a one-container server | [Run In Docker](Get-Started-Run-In-Docker) |
| I want a local multi-node cluster | [Run With Docker Compose](Get-Started-Run-With-Docker-Compose) |
| I want Kubernetes manifests | [Run In Kubernetes](Get-Started-Run-In-Kubernetes) |
| I want the architecture in one pass | [Architecture Overview](Concepts-Architecture-Overview) |
| I want to model data | [Data Model](Concepts-Data-Model) |
| I want query and index behavior | [Query And Indexes](Concepts-Query-And-Indexes) |
| I want to author a plugin | [Plugin Authoring](Plugins-Authoring) |
| I want CLI commands | [CLI](Interfaces-CLI) |
| I want backup and restore | [Backup And Restore](Operations-Backup-And-Restore) |
| I want benchmark evidence | [Benchmark Results](Operations-Benchmark-Results) |
| I want ads audience privacy details | [Security And Privacy](Operations-Security-And-Privacy) |

## Source code and issues

The source code lives at [github.com/CefasDB/cefasdb-core](https://github.com/CefasDB/cefasdb-core). Issues and pull requests are tracked there. The repository intentionally does not keep long-form documentation in `docs/`; the GitHub Wiki is the canonical documentation surface.


## Get Started: Overview

Rendered: https://docs.cefasdb.com/docs/Get-Started-Overview
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Overview.md

# Get Started: Overview

This section is for readers who want CefasDB running before they study internals. By the end of the section you can start a single node, create a table, write an item, query it, run the server in Docker, run a small replicated topology with Docker Compose, and understand how the Helm chart maps those same concepts onto Kubernetes.

CefasDB ships as two Go binaries:

- `cefasdb`, the database process.
- `cefas`, the CLI that talks to the server over gRPC and exposes table, item, query, plugin, and cluster operations.

The topology changes how CefasDB is operated, not how the data model works. A local process, a Docker container, a three-node Compose cluster, and a Kubernetes StatefulSet all expose the same table, item, query, plugin, backup, and cluster concepts.

## How to install

Three install paths are supported. The Docker image provides the server, the npm package provides the CLI, and a source build produces both binaries.

Pull the server image:

```sh
docker pull ghcr.io/cefasdb/cefasdb:latest
```

Install the CLI from npm:

```sh
npm install -g @cefasdb/cefas
```

Build both binaries from source. Go 1.25+ is required; the Makefile drives the build:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make build              # produces ./bin/cefasdb and ./bin/cefas
```

The server lives at `cmd/cefasdb` and the CLI at `cmd/cefasctl`. `make help` lists every developer target.

## What you will know after this section

You will know which ports matter. The HTTP API defaults to `:8080`. The gRPC API is enabled with `-grpc`, commonly `:9090`. Raft is enabled with `-raft-bind`, and multi-raft deployments use a shared mux listener with `-mux`. Metrics are served on the HTTP listener unless disabled.

You will know the first table workflow. A table has a partition key and an optional sort key. Items are typed attribute maps, so documents can evolve without a migration for every new field.

You will know where data lives. In single-node mode CefasDB stores table metadata, items, secondary indexes, TTL state, backups, and raft state under the configured data directory. Docker and Kubernetes deployments must mount that directory on persistent storage if data should survive process or node restarts.

You will know which deployment mode fits the current task. Local binary is best for development. Single Docker is best for demos and local integration. Docker Compose is the smallest useful replicated lab. Kubernetes is the production-oriented path when you need managed restarts, persistent volumes, service discovery, and operational policy.

You will know which consistency guarantee each deployment mode gives. Single-node and single-container Docker offer local durability only. Raft, multi-raft, and Kubernetes-backed deployments commit to a quorum and let clients opt into strong reads per call. See the per-mode comparison in [Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees).

## Reading order

Read [Run Locally](Get-Started-Run-Locally) first. It introduces the data model and CLI without Docker or raft.

Read [Run In Docker](Get-Started-Run-In-Docker) next if you need a container image. It focuses on volume mounts and server flags.

Read [Run With Docker Compose](Get-Started-Run-With-Docker-Compose) when you want raft behavior and leader failover on a laptop.

Read [Run In Kubernetes](Get-Started-Run-In-Kubernetes) when you need the Helm chart and production-shaped configuration.

## Prerequisites

For local development you need a Unix-like shell, Go 1.25 or newer, and optionally `jq` for readable JSON examples.

For Docker you need a working Docker daemon.

For Docker Compose you need Docker Compose v2 and enough disk for multiple data directories.

For Kubernetes you need `kubectl`, Helm 3, and a cluster that can provision persistent volumes.


## Get Started: Run Locally

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-Locally
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-Locally.md

# Get Started: Run Locally

This page starts a single CefasDB node on a laptop and exercises the table and item lifecycle. It is the best first run because it keeps every moving part visible: one process, one data directory, one HTTP listener, and one gRPC listener.

## Build the binaries

Pick either path. The Makefile is the canonical entry point and produces both binaries under `./bin`:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make build              # ./bin/cefasdb (server) + ./bin/cefas (CLI)
```

Or install only the CLI from npm and run the server from the public image (see [Run In Docker](Get-Started-Run-In-Docker)):

```sh
npm install -g @cefasdb/cefas
```

The server entry point is `cmd/cefasdb/main.go`. The CLI entry point is `cmd/cefasctl/main.go`, and the root command is registered in `cmd/cefasctl/cmd/root.go`.

## Start a single node

```sh
rm -rf ./cefas-data
./bin/cefasdb \
  -data ./cefas-data \
  -http :8080 \
  -grpc :9090
```

The server creates the data directory if it does not exist. HTTP requests go to `localhost:8080`. CLI commands use gRPC, so the examples below point the CLI at `127.0.0.1:9090` and mark the connection plaintext with `--insecure`.

## Create a table

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure create-table \
  --table-name Users \
  --attribute-definitions AttributeName=pk,AttributeType=S \
  --attribute-definitions AttributeName=sk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH \
  --key-schema AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```

CefasDB accepts and ignores `--billing-mode` so that existing command scripts keep working. The engine is self-hosted and does not bill by request.

## Write and read an item

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure put-item \
  --table-name Users \
  --item '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"},"name":{"S":"Ova"},"city":{"S":"Santos"}}'

./bin/cefas --endpoint 127.0.0.1:9090 --insecure get-item \
  --table-name Users \
  --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}'
```

The wire shape is a typed attribute map. Strings are `{ "S": "..." }`, numbers are `{ "N": "..." }`, maps are `{ "M": { ... } }`, and lists are `{ "L": [ ... ] }`.
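As a minimal sketch of that wire shape, the Go program below builds a typed attribute map and prints the JSON you could pass to `--item`. The `AttributeValue` type and its helpers are hypothetical and local to this example; they are not the generated protobuf types that the Go SDK uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AttributeValue is a hypothetical local type that mirrors the typed wire
// shape described above. Exactly one field should be set per value.
// Note: with omitempty, an empty list or map is dropped from the output.
type AttributeValue struct {
	S *string                   `json:"S,omitempty"`
	N *string                   `json:"N,omitempty"` // numbers travel as strings
	M map[string]AttributeValue `json:"M,omitempty"`
	L []AttributeValue          `json:"L,omitempty"`
}

func str(v string) AttributeValue { return AttributeValue{S: &v} }
func num(v string) AttributeValue { return AttributeValue{N: &v} }

// encodeItem renders an item as a JSON object of typed attribute values.
func encodeItem(item map[string]AttributeValue) (string, error) {
	b, err := json.Marshal(item)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	item := map[string]AttributeValue{
		"pk":   str("USER#1"),
		"sk":   str("PROFILE"),
		"age":  num("42"),
		"tags": {L: []AttributeValue{str("a"), str("b")}},
	}
	out, err := encodeItem(item)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

`encoding/json` sorts map keys, so the attribute order in the output is alphabetical rather than insertion order.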

## Query and scan

Use `query` when you know the partition key or have a predicate the planner can route. Use `scan` when you deliberately want to stream the table.

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure query \
  --table-name Users \
  --where "pk = 'USER#1'"

./bin/cefas --endpoint 127.0.0.1:9090 --insecure scan \
  --table-name Users
```

## Stop and restart

Stop the server with `Ctrl-C`, then restart it with the same `-data` path. The table and item still exist because they were persisted under `./cefas-data`.

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

If you delete `./cefas-data`, you reset the local node.

## A note on durability and consistency

Single-node mode has no raft and no replicas. Writes go through Pebble's WAL on the local disk. The `--consistency` flag on `get-item`, `query`, and `scan` is still accepted, but `STRONG` and `EVENTUAL` both read the same local store, so the result is identical.

The one knob that matters here is `-fsync`. With the default `false`, the WAL flushes asynchronously and a crash can lose the most recently acknowledged writes. Add `-fsync` to the start command if you want every acknowledged write to be on disk before the response returns:

```sh
./bin/cefasdb -data ./cefas-data -http :8080 -grpc :9090 -fsync
```

For replication and the per-call consistency knob, move to [Run With Docker Compose](Get-Started-Run-With-Docker-Compose#consistency-model) or [Run In Kubernetes](Get-Started-Run-In-Kubernetes).

## Next steps

Read [Interfaces CLI](Interfaces-CLI) for the full command list. Read [Data Model](Concepts-Data-Model) before designing a schema. Read [Query And Indexes](Concepts-Query-And-Indexes) before adding secondary, spatial, or plugin-backed indexes.


## Get Started: Run In Docker

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-In-Docker
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-In-Docker.md

# Get Started: Run In Docker

The Docker path runs the `cefasdb` binary inside a minimal image. Use it when you want a disposable integration target or when a service needs a local database dependency without building Go locally.

## Pull the public image

The server image is published to GHCR per release with the tags `<version>`, `v<version>`, and `latest`.

```sh
docker pull ghcr.io/cefasdb/cefasdb:latest
```

Pin to a specific release for production:

```sh
docker pull ghcr.io/cefasdb/cefasdb:0.8.5
```

## Build the image locally (alternative)

If you need a custom build, the repository keeps its Dockerfile at `deploy/Dockerfile`:

```sh
docker build -f deploy/Dockerfile -t cefasdb:local .
```

The build stage compiles `./cmd/cefasdb`. The runtime image runs as a non-root user and exposes HTTP and gRPC ports.

## Start a container

```sh
docker volume create cefas-data

docker run --rm --name cefasdb \
  -p 8080:8080 \
  -p 9090:9090 \
  -v cefas-data:/var/lib/cefasdb \
  ghcr.io/cefasdb/cefasdb:latest \
  -data /var/lib/cefasdb \
  -http :8080 \
  -grpc :9090 \
  -grpc-reflection
```

The important part is the volume mount. The container runs with `--rm`, so without the volume the data is removed when the container stops.

## Connect with the CLI

Install the CLI from npm on the host:

```sh
npm install -g @cefasdb/cefas
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

Or, if you prefer not to use npm, build the CLI from source:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make cli                # produces ./bin/cefas
./bin/cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

## Health and logs

The server logs to stdout and stderr. In Docker, read them with:

```sh
docker logs -f cefasdb
```

Metrics are exposed on the HTTP listener unless disabled. If Prometheus is scraping the container directly, scrape the mapped HTTP port.

## When to use this mode

Single-container Docker is suitable for demos, local development, integration tests, and small non-critical environments. It is not high availability and it has no replication: if the host or the volume is lost, the data is lost, and the per-call `CONSISTENCY_STRONG` knob does the same thing as `CONSISTENCY_EVENTUAL` because there is only one node to read from. For replicated behavior and meaningful strong reads, read [Run With Docker Compose](Get-Started-Run-With-Docker-Compose#consistency-model) or [Run In Kubernetes](Get-Started-Run-In-Kubernetes).


## Get Started: Run With Docker Compose

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-With-Docker-Compose
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-With-Docker-Compose.md

# Get Started: Run With Docker Compose

Docker Compose is the smallest useful way to observe a replicated CefasDB deployment. It lets you run multiple `cefasdb` processes, persistent volumes, raft listeners, and a client-facing endpoint on a single machine.

## Start from the repository compose file

The repository keeps a single-node observability Compose template at `deploy/docker-compose.yml`.

```sh
docker compose -f deploy/docker-compose.yml up --build
```

For the three-node raft cluster used by local load tests, use `deploy/docker-compose.cluster.yml`:

```sh
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml down -v
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Default host endpoints:

```text
n1 HTTP: localhost:18081  gRPC: localhost:9191
n2 HTTP: localhost:18082  gRPC: localhost:9192
n3 HTTP: localhost:18083  gRPC: localhost:9193
```

The exact service names may change as the Compose file evolves, but the topology is stable: each node needs its own data directory, HTTP listener, gRPC listener, raft identity, raft bind address, and peer list.

## Storage profiles for Compose

The cluster Compose file supports storage profiles through `STORAGE_PROFILE`.

Use the default `balanced` profile when Docker Desktop has its usual local memory limit, or when the goal is a stable developer cluster:

```sh
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Use `write-heavy` when the Docker VM has enough memory allocated for larger Pebble caches and memtables:

```sh
STORAGE_PROFILE=write-heavy \
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

On a local Docker Desktop VM with about 7.65 GiB available, `write-heavy` can run the bulk test but may OOM a follower during raft snapshot pressure. With Docker Desktop increased to about 64 GiB, the same `write-heavy` workload completed the full 30-minute benchmark with all three nodes alive.

Keep `balanced` as the portable default. Treat `write-heavy` as an opt-in performance profile for larger local machines or production-like test hosts.

Host ports can be changed without editing the file:

```sh
CEFAS_NODE1_GRPC_PORT=29491 \
CEFAS_NODE2_GRPC_PORT=29492 \
CEFAS_NODE3_GRPC_PORT=29493 \
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

## What Compose demonstrates

Compose demonstrates three production concerns that a single process cannot:

1. **Membership**: every node has a stable raft ID.
2. **Replication**: writes go through raft before they are acknowledged.
3. **Failure behavior**: if the leader exits, a remaining quorum can elect a new leader.

The same storage engine applies underneath raft. Raft changes when a write is acknowledged; it does not change table or item semantics.

## Client behavior

The CLI talks to gRPC. If a node is not leader for a write path, the server returns `ErrNotLeader` plus the leader hint published by the cluster surface, and the client retries against the leader.

```sh
cefas --endpoint 127.0.0.1:9191 --insecure cluster status
cefas --endpoint 127.0.0.1:9191 --insecure list-tables
```
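The redirect behavior described in this section can be sketched as a bounded retry loop. Everything in the block is hypothetical and local to the example: `NotLeaderError` stands in for the not-leader error, its `Leader` field models the leader hint, and `writeFunc` replaces a real gRPC call. The endpoints are the default Compose host ports listed earlier on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// NotLeaderError is a hypothetical stand-in for the not-leader error. The
// Leader field models the leader hint returned alongside it.
type NotLeaderError struct{ Leader string }

func (e *NotLeaderError) Error() string { return "not leader; try " + e.Leader }

// writeFunc sends one write to the node at endpoint.
type writeFunc func(endpoint string) error

// writeWithRedirect follows leader hints for at most maxHops redirects and
// reports which endpoint finally handled the write.
func writeWithRedirect(endpoint string, maxHops int, send writeFunc) (string, error) {
	for hop := 0; hop <= maxHops; hop++ {
		err := send(endpoint)
		var nl *NotLeaderError
		if errors.As(err, &nl) && nl.Leader != "" {
			endpoint = nl.Leader // follow the hint and try again
			continue
		}
		return endpoint, err // success, or an error that is not a redirect
	}
	return endpoint, fmt.Errorf("no leader found after %d redirects", maxHops)
}

func main() {
	leader := "127.0.0.1:9192"
	send := func(endpoint string) error {
		if endpoint != leader {
			return &NotLeaderError{Leader: leader}
		}
		return nil
	}
	served, err := writeWithRedirect("127.0.0.1:9191", 3, send)
	fmt.Println(served, err)
}
```

Bounding the number of hops matters during an election, when hints can briefly point at a node that has just lost leadership.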

## Consistency model

With three voters, the cluster commits a write once two of them have the entry in their raft log (quorum = `floor(3/2)+1` = 2). The third node catches up via `AppendEntries`. The CLI does not wait on the slowest follower.
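The quorum arithmetic generalizes to any voter count. A short sketch, with hypothetical helper names:

```go
package main

import "fmt"

// quorum returns how many voters must hold an entry before it commits:
// floor(voters/2) + 1.
func quorum(voters int) int { return voters/2 + 1 }

// tolerated returns how many voters can be down while a quorum remains.
func tolerated(voters int) int { return voters - quorum(voters) }

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("voters=%d quorum=%d tolerated_failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

Four voters tolerate the same single failure as three, which is why odd voter counts are the usual choice.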

Reads default to eventual:

```sh
cefas --endpoint 127.0.0.1:9191 --insecure get-item \
  --table-name Users --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}'
```

The read above might land on a follower whose log is a few heartbeats behind the leader. For read-after-write correctness, opt into strong on that one call — the request will be routed to the leader and pass a raft barrier first:

```sh
cefas --endpoint 127.0.0.1:9191 --insecure get-item \
  --table-name Users --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}' \
  --consistency strong
```

When the leader is killed (see the Failover exercise below), the cluster pauses writes for at most a few hundred milliseconds while a new leader is elected. Reads keep returning eventual data from the survivors during the gap. Once two voters can talk to each other, writes resume.

If you stop a second container while the first is still down, quorum is gone (only one voter alive). Writes start failing fast with `ErrNotLeader` / quorum-loss errors; eventual reads keep working off the lone survivor.

See [Concepts: Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees) for the per-mode comparison table and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the tuning knobs.

## Failover exercise

1. Create a table and write an item.
2. Stop the leader container.
3. Wait for a new leader.
4. Read the item from another node.
5. Restart the old leader and verify it catches up.

The point of the exercise is not just uptime. It proves the storage layer applies replicated batches through the raft FSM and that committed writes survive a process failure.

## When to move beyond Compose

Compose is a lab, not an orchestrator. Use it to understand the topology and reproduce bugs. Use Kubernetes or another scheduler when you need node placement, volume lifecycle, service discovery, restart policy, and operational controls.


## Get Started: Run In Kubernetes

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-In-Kubernetes
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-In-Kubernetes.md

# Get Started: Run In Kubernetes

The Kubernetes path packages CefasDB as a StatefulSet-oriented deployment with stable identity and persistent storage. Use it when you want CefasDB managed by the same control plane that manages the services using it.

## Chart location

The Helm chart lives under:

```text
dist/helm/cefas/
```

Important files:

| File | Purpose |
| --- | --- |
| `Chart.yaml` | Chart metadata. |
| `values.yaml` | Default values for image, ports, storage, raft, and config. |
| `templates/statefulset.yaml` | Server pods, volumes, and command flags. |
| `templates/service.yaml` | Network identity for clients and peers. |
| `templates/configmap.yaml` | Runtime configuration. |

## Install

Clone the core repository and install the chart from its bundled path:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
helm upgrade --install cefas ./dist/helm/cefas \
  --namespace cefas \
  --create-namespace
```

The chart `values.yaml` already pins `image.repository: ghcr.io/cefasdb/cefasdb`. For a local cluster such as `kind` or `minikube`, make sure a default StorageClass exists. For production, set storage class, requested size, resource limits, and image tag explicitly in a values file.

## Connect

Install the CLI on your workstation:

```sh
npm install -g @cefasdb/cefas
```

Port-forward the gRPC service for a quick test:

```sh
kubectl -n cefas port-forward svc/cefas 9090:9090
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

In a real deployment, services inside the cluster should connect through the Kubernetes Service DNS name instead of port-forwarding.

## Storage

Kubernetes must provide persistent volumes for the data directory. Treat the volume as the durable database state for the pod. If a pod is rescheduled with its PVC intact, CefasDB can reopen the local Pebble store and raft state.

A pod restart on its existing PVC is safe. Losing the PVC is equivalent to losing a voter from the raft cluster; the remaining replicas keep serving as long as a quorum is healthy. Writes pause when quorum is lost — see the per-mode breakdown in [Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees) and the operational knob inventory in [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning).

## Operational checklist

- Pin the image tag. Do not run production on `latest`.
- Set resource requests and limits based on workload.
- Use persistent volumes with enough IOPS for write-heavy workloads.
- Configure identity and TLS before exposing the gRPC listener beyond a trusted network.
- Scrape metrics from the HTTP listener.
- Test backup and restore before storing production data.

## Where to go next

Read [Deployment Modes](Concepts-Deployment-Modes) for topology tradeoffs and [Operations Overview](Operations-Overview) for production runbooks.


## Concepts Overview

Rendered: https://docs.cefasdb.com/docs/Concepts-Overview
Markdown: https://docs.cefasdb.com/wiki/Concepts-Overview.md

# Concepts Overview

CefasDB is a high-performance NoSQL key-value and document database server. It accepts table, item, query, plugin, backup, and cluster operations over HTTP and gRPC; persists them in an embedded Pebble LSM tree; optionally replicates writes through raft; and exposes specialized query behavior through a plugin registry.

The simplest useful sentence is this: CefasDB stores typed document items in partition-keyed tables, adds SQL and PartiQL query surfaces, and lets plugins supply indexes, distance functions, estimators, and audience workflows without coupling those plugins to engine internals.

## What this section covers

[Architecture Overview](Concepts-Architecture-Overview) shows the full request path: CLI or SDK, gRPC handler, catalog, storage, planner, plugin registry, raft, metrics, and tracing.

[Data Model](Concepts-Data-Model) explains tables, partition keys, sort keys, attribute maps, conditions, TTL, backups, and streams.

[Storage And Replication](Concepts-Storage-And-Replication) explains how Pebble stores catalog and item keys, how write batches are committed, how raft wraps writes, and how multi-shard deployments distribute ownership.

[Query And Indexes](Concepts-Query-And-Indexes) explains built-in secondary indexes, spatial indexes, plugin-backed indexes, distance operators, candidate sets, top-k search, explain plans, and query planning.

[Deployment Modes](Concepts-Deployment-Modes) compares local, Docker, Compose, raft, multi-raft, and Kubernetes topologies.

[Authentication And Authorization](Concepts-Authentication-And-Authorization) covers bearer-token validation, identity provider configuration, and per-operation scopes.

## What CefasDB is not

CefasDB is not a broker. It does not model topics, consumer groups, or offsets. It is not an analytics warehouse. It is designed for operational workloads where an application needs low-latency primary-key access, conditional mutation, secondary lookup, spatial matching, or similarity search close to the write path.

CefasDB is also not a plugin marketplace. Built-in plugins compile into the server. The plugin boundary exists to keep specialized logic out of the storage engine, not to load arbitrary untrusted code at runtime.

## The core vocabulary

**Table** means a named collection of items with a key schema.

**Item** means a map of attribute names to typed attribute values.

**Partition key** means the required key component used for identity and distribution.

**Sort key** means the optional second key component used for ordered ranges within a partition.

**Index** means either a built-in GSI/LSI/spatial index or a plugin-backed index descriptor.

**Plugin** means a Go implementation of an index, distance, estimator, or audience interface registered in-process.

**Raft** means the optional consensus layer used to replicate write batches.

**Shard** means a partition of the keyspace owned by a manager and optionally replicated by its own raft group.


## Concepts: Architecture Overview

Rendered: https://docs.cefasdb.com/docs/Concepts-Architecture-Overview
Markdown: https://docs.cefasdb.com/wiki/Concepts-Architecture-Overview.md

# Concepts: Architecture Overview

CefasDB is structured as a set of narrow layers. The transport layer accepts requests. The catalog describes tables. The storage layer commits items and indexes. The query planner chooses operators. The plugin registry supplies specialized behavior. Raft and multi-raft wrap the write path when replication is enabled.

```mermaid
flowchart LR
  subgraph Clients
    CLI[cefas CLI]
    SDK[Go client]
    HTTP[HTTP clients]
  end

  subgraph Server[cefasdb]
    API[pkg/api gRPC and HTTP]
    Catalog[internal/catalog]
    Storage[internal/storage Pebble]
    Planner[internal/core/query]
    Registry[pkg/plugin registry]
    Raft[internal/replication]
    Cluster[internal/cluster]
    Metrics[internal/metrics]
    Tracing[internal/tracing]
  end

  subgraph Plugins
    Index[Index plugins]
    Distance[Distance plugins]
    Estimator[Estimator plugins]
    Audience[Audience plugin]
  end

  CLI --> SDK --> API
  HTTP --> API
  API --> Catalog
  API --> Storage
  API --> Planner
  API --> Registry
  Storage --> Raft
  Storage --> Cluster
  Planner --> Registry
  Registry --> Index
  Registry --> Distance
  Registry --> Estimator
  Registry --> Audience
  API --> Metrics
  API --> Tracing
```

## Write lifecycle

```mermaid
sequenceDiagram
  participant Client
  participant API as gRPC/HTTP handler
  participant Catalog
  participant Storage
  participant Raft
  participant Plugins

  Client->>API: PutItem / UpdateItem / DeleteItem
  API->>Catalog: load table descriptor
  API->>Storage: validate key, condition, and mutation
  Storage->>Raft: replicate batch when raft is attached
  Raft-->>Storage: majority-committed batch
  Storage->>Plugins: update index hooks
  API-->>Client: response
```

The important invariant is that the table mutation and built-in index mutation are part of the same storage batch. A committed write cannot update the primary record without updating the built-in indexes that describe it.
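A minimal in-memory sketch of that invariant follows. The `Store` and `Batch` types and the `item/` and `idx/email/` key prefixes are hypothetical; they only illustrate the idea that the primary record and its index entry are written in one atomic step. They do not reflect how CefasDB encodes keys in Pebble.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch collects key/value writes that must land together.
type Batch struct{ puts map[string]string }

func NewBatch() *Batch { return &Batch{puts: map[string]string{}} }

func (b *Batch) Put(key, value string) { b.puts[key] = value }

// Store is a hypothetical in-memory key/value store.
type Store struct {
	mu sync.RWMutex
	kv map[string]string
}

func NewStore() *Store { return &Store{kv: map[string]string{}} }

// Apply writes every entry in the batch while holding the write lock, so
// readers see either none of the batch or all of it.
func (s *Store) Apply(b *Batch) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range b.puts {
		s.kv[k] = v
	}
}

func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.kv[key]
	return v, ok
}

// putUser writes the primary record and its email index entry in one batch.
func putUser(s *Store, pk, email string) {
	b := NewBatch()
	b.Put("item/"+pk, email)
	b.Put("idx/email/"+email, pk)
	s.Apply(b)
}

func main() {
	s := NewStore()
	putUser(s, "USER#1", "ova@example.com")
	pk, _ := s.Get("idx/email/ova@example.com") // index lookup yields the key
	email, _ := s.Get("item/" + pk)             // key lookup yields the record
	fmt.Println(pk, email)
}
```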

## Read lifecycle

```mermaid
sequenceDiagram
  participant Client
  participant API
  participant Planner
  participant Registry
  participant Index
  participant Storage

  Client->>API: Query / Scan / ExecuteStatement / TopK
  API->>Planner: parse and plan predicate
  Planner->>Registry: resolve plugin operators
  Planner->>Index: candidate set when available
  Planner->>Storage: primary or index-backed reads
  API-->>Client: rows or streamed items
```

Reads prefer the cheapest available route. A primary-key lookup is direct. A partition query is range-oriented. A secondary index query follows pointers. A plugin-backed query can ask a plugin for candidates before applying exact filters.

## Consistency model

Each call carries a single consistency knob, the `Consistency` enum from `cefas.v1.Cefas`:

- `CONSISTENCY_EVENTUAL` — the read is served locally from whichever node received the call. Cheapest path on the diagrams above: API → Planner → Storage on the same node. May trail the leader by a few raft heartbeats.
- `CONSISTENCY_STRONG` — the read is routed to the shard leader, which applies a raft barrier before answering. On the diagrams that adds one hop (`Client → API → Leader API → Planner → Storage`) and a small linearization wait. The result reflects every previously acknowledged write on that shard.

Writes never use this knob; they always travel to the shard leader, get replicated to a quorum of voters, and are acknowledged only after commit. A write that arrives at a follower returns `client.ErrNotLeader` (Go sentinel), and the client transparently retries via the `-raft-http-peers` redirect map.

The choice is per call, not per session. A high-throughput cohort scan can stay eventual; the credit-check that follows can opt into `Strong()`. See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Interfaces HTTP and gRPC](Interfaces-HTTP-And-GRPC#consistency-options) for the wire shape.
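The difference between the two read modes can be seen in a toy model. The sketch below is hypothetical throughout: it uses one leader and one follower, skips quorum commit, and treats "apply everything in the log" as a stand-in for the raft barrier.

```go
package main

import "fmt"

type entry struct{ key, value string }

// node is a hypothetical replica that applies log entries in order.
type node struct {
	applied int
	data    map[string]string
}

func newNode() *node { return &node{data: map[string]string{}} }

// catchUp applies every log entry the node has not applied yet.
func (n *node) catchUp(log []entry) {
	for ; n.applied < len(log); n.applied++ {
		e := log[n.applied]
		n.data[e.key] = e.value
	}
}

type cluster struct {
	log              []entry
	leader, follower *node
}

// write appends to the log and applies on the leader. The follower sees the
// entry only after a later catchUp.
func (c *cluster) write(key, value string) {
	c.log = append(c.log, entry{key, value})
	c.leader.catchUp(c.log)
}

// readEventual answers from the follower's local state, which may be stale.
func (c *cluster) readEventual(key string) string { return c.follower.data[key] }

// readStrong answers from the leader after it has applied the full log.
func (c *cluster) readStrong(key string) string {
	c.leader.catchUp(c.log)
	return c.leader.data[key]
}

func main() {
	c := &cluster{leader: newNode(), follower: newNode()}
	c.write("k", "v1")
	c.follower.catchUp(c.log) // follower is current
	c.write("k", "v2")        // follower now trails by one entry
	fmt.Println(c.readEventual("k"), c.readStrong("k"))
	c.follower.catchUp(c.log)
	fmt.Println(c.readEventual("k"))
}
```

Until the follower catches up, the eventual read returns the older value while the strong read returns the latest one.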

## Why the layers are separate

The storage engine should not know how Levenshtein distance works. The audience plugin should not know how Pebble encodes primary keys. The HTTP API should not know the in-memory representation of a trigram index. That separation is the point of `internal/core` and `pkg/plugin`.

Import-graph tests enforce the boundary. If plugin code imports engine internals, tests fail. If core code imports concrete plugins, tests fail. This makes it possible to add search and audience behavior without turning the database kernel into a pile of feature-specific branches.


## Concepts: Data Model

Rendered: https://docs.cefasdb.com/docs/Concepts-Data-Model
Markdown: https://docs.cefasdb.com/wiki/Concepts-Data-Model.md

# Concepts: Data Model

CefasDB models operational data as tables containing typed document items. Each table has a key schema, optional indexes, optional TTL configuration, and optional plugin-backed descriptors.

## Tables

A table descriptor includes:

- Table name.
- Partition key name.
- Optional sort key name.
- Global secondary index descriptors.
- Local secondary index descriptors.
- Spatial index descriptors.
- TTL configuration.
- Plugin-backed index descriptors.

The table descriptor is persisted in the catalog. API handlers load the descriptor before validating keys, conditions, index routing, or TTL behavior.

## Items

An item is a map from attribute name to typed value. The supported attribute family is:

| Type | Meaning |
| --- | --- |
| `S` | String |
| `N` | Number encoded as a string |
| `B` | Binary |
| `BOOL` | Boolean |
| `NULL` | Null marker |
| `SS`, `NS`, `BS` | String, number, and binary sets |
| `L` | List |
| `M` | Map |

The CLI accepts JSON in this shape. The Go SDK uses generated protobuf types and helper codecs.
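
As a rough illustration of the JSON shape, the sketch below decodes an item with the standard library. The `AttributeValue` struct here is a hypothetical stand-in covering only some of the types in the table; it is not the generated protobuf type the Go SDK uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AttributeValue is a hypothetical mirror of the typed JSON shape.
// Exactly one field is expected to be set per value.
type AttributeValue struct {
	S    *string                   `json:"S,omitempty"`
	N    *string                   `json:"N,omitempty"` // numbers travel as strings
	BOOL *bool                     `json:"BOOL,omitempty"`
	SS   []string                  `json:"SS,omitempty"`
	L    []AttributeValue          `json:"L,omitempty"`
	M    map[string]AttributeValue `json:"M,omitempty"`
}

// decodeItem parses an item: a map from attribute name to typed value.
func decodeItem(raw string) (map[string]AttributeValue, error) {
	var item map[string]AttributeValue
	if err := json.Unmarshal([]byte(raw), &item); err != nil {
		return nil, err
	}
	return item, nil
}

func main() {
	item, err := decodeItem(`{"id":{"S":"s1"},"loc":{"M":{"lat":{"N":"-23.5510"},"lon":{"N":"-46.6340"}}}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(*item["id"].S, *item["loc"].M["lat"].N)
}
```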

## Primary key

Every table has a partition key. A table may also have a sort key. The partition key alone, or the partition key and sort key together when a sort key is defined, identifies a single item.

Good partition keys distribute writes across the keyspace and support the most common lookup path. Good sort keys encode range semantics: timestamp, version, event ID, account-local sequence, or another ordered value.

## Conditional writes

Conditional writes evaluate a predicate against the existing item before applying a mutation. They are used for optimistic concurrency, insert-if-absent, compare-and-set, and safe deletes.

Examples:

```sql
attribute_not_exists(pk)
version = :expected
status IN ('pending', 'active')
```

The condition evaluator lives under `internal/core/condition` and the storage layer applies it before committing the batch.

Under raft, the condition is evaluated on the shard leader as part of the write batch and the batch is replicated to a quorum before acknowledgement. Conditional puts on the same partition key are therefore linearizable: two clients racing on `attribute_not_exists(pk)` will see one succeed and the other receive a condition-failure error, regardless of where they connected. See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model).

## TTL

TTL lets a table nominate an attribute containing an expiration timestamp. The engine indexes TTL buckets and a reaper can remove expired items without scanning the full table. TTL is not a hard real-time deadline; it is a cleanup contract.

Use TTL for session records, temporary campaign state, dedup windows, and short-lived operational records.
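
A minimal sketch of bucketed expiry, assuming one-minute buckets (the real bucket width is not stated here) and a hypothetical `TTLIndex` type. It shows why the reaper only touches expired buckets, and why an item can outlive its timestamp by up to one bucket.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// bucketWidth is an assumed granularity for illustration only.
const bucketWidth = time.Minute

type TTLIndex struct {
	buckets map[int64][]string // bucket start (unix seconds) -> item keys
}

func NewTTLIndex() *TTLIndex { return &TTLIndex{buckets: make(map[int64][]string)} }

func (x *TTLIndex) Add(key string, expires time.Time) {
	b := expires.Truncate(bucketWidth).Unix()
	x.buckets[b] = append(x.buckets[b], key)
}

// Reap returns keys from buckets that ended at or before now and drops those
// buckets. Items in the table itself are never scanned.
func (x *TTLIndex) Reap(now time.Time) []string {
	var expired []string
	for b, keys := range x.buckets {
		if time.Unix(b, 0).Add(bucketWidth).After(now) {
			continue // bucket still has time left
		}
		expired = append(expired, keys...)
		delete(x.buckets, b)
	}
	sort.Strings(expired)
	return expired
}

// demo adds one short-lived and one long-lived key, then reaps two minutes in.
func demo() []string {
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	idx := NewTTLIndex()
	idx.Add("session#1", base.Add(10*time.Second))
	idx.Add("session#2", base.Add(5*time.Minute))
	return idx.Reap(base.Add(2 * time.Minute))
}

func main() {
	fmt.Println(demo())
}
```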

## Streams

The stream abstraction under `internal/core/stream` represents change events. It is the seam for change-data-capture behavior and for plugin/index hooks that need to observe mutations.

## Backups

Backups are named checkpoints. They can cover one or more tables and are tracked under the admin backup namespace. Restore can recreate a table from a backup into a target table name.


## Concepts: Storage And Replication

Rendered: https://docs.cefasdb.com/docs/Concepts-Storage-And-Replication
Markdown: https://docs.cefasdb.com/wiki/Concepts-Storage-And-Replication.md

# Concepts: Storage And Replication

CefasDB uses Pebble as its embedded storage engine. Pebble is an LSM-tree, so write-heavy workloads are committed as ordered key-value batches and compacted in the background.

## Namespaces

The storage layer uses prefixes to separate logical data:

| Prefix | Contents |
| --- | --- |
| `cefas/catalog/<table>` | Table descriptor JSON. |
| `cefas/data/<table>/...` | Primary item records. |
| `cefas/gsi/<table>/<index>/...` | Global secondary index pointers. |
| `cefas/lsi/<table>/<index>/...` | Local secondary index pointers. |
| `cefas/spatial/<table>/<index>/...` | Geohash and Z-order index pointers. |
| `cefas/ttl/<table>/<ttlAttr>/...` | TTL bucket entries. |
| `cefas/admin/backups/<name>` | Backup metadata. |

Plugin-backed indexes own their internal format. Built-in v1 plugins mostly keep state in memory, with persistence seams documented in the plugin pages.
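
The prefix layout can be pictured with a few string helpers. These functions are hypothetical and only build the prefixes listed in the table; the encoding of the key suffix after each prefix is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical helpers that build the documented prefixes.
func catalogKey(table string) string       { return "cefas/catalog/" + table }
func dataPrefix(table string) string       { return "cefas/data/" + table + "/" }
func gsiPrefix(table, index string) string { return "cefas/gsi/" + table + "/" + index + "/" }
func ttlPrefix(table, attr string) string  { return "cefas/ttl/" + table + "/" + attr + "/" }

// namespaceOf reports which logical namespace a raw key belongs to,
// or "" when the key is outside the cefas/ tree.
func namespaceOf(key string) string {
	parts := strings.SplitN(key, "/", 3)
	if len(parts) < 3 || parts[0] != "cefas" {
		return ""
	}
	return parts[1]
}

func main() {
	fmt.Println(catalogKey("Users"))
	fmt.Println(gsiPrefix("Users", "by_email"))
	fmt.Println(namespaceOf(ttlPrefix("Sessions", "expires_at")))
}
```

Because each namespace is a contiguous key range, a prefix scan over `cefas/gsi/<table>/<index>/` reads one index without touching primary records.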

## Write batches

A write batch groups all changes for one operation. For a `PutItem`, the batch can contain:

- Primary item write.
- GSI pointer updates.
- LSI pointer updates.
- Spatial pointer updates.
- TTL bucket updates.
- Backup or stream metadata as needed.

The batch is the atomic unit. Either every key in the batch becomes visible or none of them do.
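
The all-or-nothing property can be sketched with a map-backed store that validates every operation before applying any of them. `Op` and `Store` are hypothetical types and the keys are placeholders; Pebble's real batch API is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

type Op struct {
	Key    string
	Value  string
	Delete bool
}

type Store struct{ kv map[string]string }

// Apply validates the whole batch first, then mutates. A failed validation
// leaves the store untouched, so no partial batch becomes visible.
func (s *Store) Apply(batch []Op) error {
	for _, op := range batch {
		if op.Key == "" {
			return errors.New("empty key in batch")
		}
	}
	for _, op := range batch {
		if op.Delete {
			delete(s.kv, op.Key)
		} else {
			s.kv[op.Key] = op.Value
		}
	}
	return nil
}

func main() {
	s := &Store{kv: map[string]string{}}
	// Placeholder keys: a primary record plus one index pointer.
	err := s.Apply([]Op{
		{Key: "data/Users/u1", Value: "item"},
		{Key: "gsi/Users/by_email/a@example.com", Value: "u1"},
	})
	fmt.Println(err, len(s.kv))
}
```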

## Group commit

Single-node mode commits through the storage layer's group-commit path. Group commit amortizes sync and write overhead across concurrent producers. For latency-sensitive workloads, fsync behavior is configurable.

## Raft replication

When raft is attached, the storage write batch is replicated before it is applied. A majority of voters must agree on the log entry. The raft finite-state machine (FSM) then applies the batch to Pebble.

This changes durability and availability. A single-node write is durable to one disk. A raft write is durable to the majority of raft members.
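
The majority rule is simple arithmetic. The helper names below are illustrative only.

```go
package main

import "fmt"

// quorum returns the majority size for a raft group with the given voter count.
func quorum(voters int) int { return voters/2 + 1 }

// tolerated returns how many voters can fail while writes still commit.
func tolerated(voters int) int { return voters - quorum(voters) }

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

An even voter count raises the quorum without raising the number of failures the group can absorb, which is why odd group sizes are the usual choice.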

## Consistency model

Replication and read consistency are decoupled. Raft delivers a single, ordered commit history per shard; the read path picks how visible those commits must be at the moment of the call.

Per-call consistency is a single enum on `GetItemRequest`, `QueryRequest`, and `ScanRequest`:

| Option | Where to set it | Behaviour |
| --- | --- | --- |
| `CONSISTENCY_EVENTUAL` (default) | Omit on the request, or `CONSISTENCY_UNSPECIFIED`. Go client: default `GetOptions` / `ScanOptions`. | Local read on whichever node received the call. May trail the leader by a few raft heartbeats. |
| `CONSISTENCY_STRONG` | Set on the gRPC enum. Go client: `client.GetOptions{Strong: true}`, `client.ScanOptions{Strong: true}`, or `QueryBuilder.Strong()`. | Routed to the shard leader. The leader applies a read barrier so the call sees every previously acknowledged write on that shard. |

Writes always go through the shard leader and are acknowledged only after a quorum of voters has the entry in its raft log. A write against a follower returns `client.ErrNotLeader`; the client redirects using the `-raft-http-peers` map.
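
The difference between the two options can be shown with a toy two-replica shard. Everything here is a simplified model: the types are hypothetical, there is no real read barrier, and follower lag is simulated by an explicit `CatchUp` call.

```go
package main

import "fmt"

type Consistency int

const (
	Eventual Consistency = iota // local read on the receiving node
	Strong                      // routed to the shard leader
)

// Replica holds the entries it has applied so far.
type Replica struct{ applied map[string]string }

type Shard struct {
	leader   *Replica
	follower *Replica
	pending  [][2]string // committed on the leader, not yet applied on the follower
}

func NewShard() *Shard {
	return &Shard{
		leader:   &Replica{applied: map[string]string{}},
		follower: &Replica{applied: map[string]string{}},
	}
}

// Put commits on the leader; the follower applies the entry later.
func (s *Shard) Put(k, v string) {
	s.leader.applied[k] = v
	s.pending = append(s.pending, [2]string{k, v})
}

// CatchUp applies outstanding entries on the follower.
func (s *Shard) CatchUp() {
	for _, e := range s.pending {
		s.follower.applied[e[0]] = e[1]
	}
	s.pending = nil
}

// GetFromFollower models a read that lands on the follower.
func (s *Shard) GetFromFollower(k string, c Consistency) (string, bool) {
	if c == Strong {
		v, ok := s.leader.applied[k] // redirected to the leader
		return v, ok
	}
	v, ok := s.follower.applied[k]
	return v, ok
}

func main() {
	s := NewShard()
	s.Put("USER#1", "v1")
	_, eventual := s.GetFromFollower("USER#1", Eventual)
	_, strong := s.GetFromFollower("USER#1", Strong)
	fmt.Println("eventual sees write:", eventual, "strong sees write:", strong)
	s.CatchUp()
	_, eventual = s.GetFromFollower("USER#1", Eventual)
	fmt.Println("eventual after catch-up:", eventual)
}
```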

`fsync` is the durability lever. With `-fsync=false` (default), the WAL flushes asynchronously and group commit batches multiple writes per disk sync. Throughput is higher, but a leader crash can lose entries that have not yet reached stable storage. With `-fsync=true`, every commit hits disk before acknowledgement; throughput drops but a crashed leader has nothing in flight.

See [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the full list of consistency and durability knobs, including the raft timeout block (heartbeat, election, leader lease, commit, snapshot threshold) and the Pebble storage profile selectors.

## Multi-shard mode

Multi-shard mode partitions tables by key hash. Each shard can have its own raft group. This keeps per-shard ordering and replication while allowing more parallelism across partitions.

The important operational detail is that data placement follows the partition key. If the partition key is skewed, shard load is skewed.
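
The effect of key skew is easy to see with a hash-based placement sketch. FNV-1a is an assumption made for illustration; this page does not say which hash function CefasDB uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a partition key to one of n shards (assumed hash: FNV-1a).
func shardFor(partitionKey string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(partitionKey))
	return int(h.Sum32() % uint32(n))
}

// load counts how many writes land on each shard.
func load(keys []string, n int) []int {
	counts := make([]int, n)
	for _, k := range keys {
		counts[shardFor(k, n)]++
	}
	return counts
}

func main() {
	var spread, skewed []string
	for i := 0; i < 1000; i++ {
		spread = append(spread, fmt.Sprintf("USER#%d", i))
		skewed = append(skewed, "TENANT#big") // one hot partition key
	}
	fmt.Println("spread:", load(spread, 4))
	fmt.Println("skewed:", load(skewed, 4))
}
```

With a single hot partition key, every write lands on the same shard regardless of how many shards exist.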

## Backups and restore

Backups use Pebble checkpoint semantics and CefasDB metadata to create named recovery points. Restore reads the backup and writes a new target table, preserving the source data without overwriting the original table by default.


## Concepts: Query And Indexes

Rendered: https://docs.cefasdb.com/docs/Concepts-Query-And-Indexes
Markdown: https://docs.cefasdb.com/wiki/Concepts-Query-And-Indexes.md

# Concepts: Query And Indexes

CefasDB has three query layers:

1. Primary-key and range access.
2. Built-in secondary and spatial indexes.
3. Plugin-backed candidate generation, distance operators, estimators, and top-k ranking.

## Primary access

Primary access is the cheapest path. `GetItem` resolves one key. `Query` can read a partition and optionally filter within it. Data modeling should start here: choose a partition key and sort key that make the most common read path direct.

## Built-in indexes

Global secondary indexes and local secondary indexes persist pointers alongside primary writes. They are part of the same write batch as the item mutation, so an acknowledged write and its built-in index entries move together.

Spatial indexes support geohash and Z-order access patterns for location-aware data. They narrow a search to cells or ranges before exact filtering.
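
For readers unfamiliar with geohash cells, here is a sketch of the standard public geohash encoding. It illustrates the general technique and is not taken from the CefasDB spatial index code, whose key layout may differ.

```go
package main

import "fmt"

const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// encode returns the standard geohash of (lat, lon) with `precision` characters.
func encode(lat, lon float64, precision int) string {
	latLo, latHi := -90.0, 90.0
	lonLo, lonHi := -180.0, 180.0
	out := make([]byte, 0, precision)
	bits, ch := 0, 0
	evenBit := true // bits alternate, starting with longitude
	for len(out) < precision {
		if evenBit {
			mid := (lonLo + lonHi) / 2
			if lon >= mid {
				ch = ch<<1 | 1
				lonLo = mid
			} else {
				ch <<= 1
				lonHi = mid
			}
		} else {
			mid := (latLo + latHi) / 2
			if lat >= mid {
				ch = ch<<1 | 1
				latLo = mid
			} else {
				ch <<= 1
				latHi = mid
			}
		}
		evenBit = !evenBit
		bits++
		if bits == 5 { // every 5 bits become one base32 character
			out = append(out, base32[ch])
			bits, ch = 0, 0
		}
	}
	return string(out)
}

func main() {
	fmt.Println(encode(-23.5510, -46.6340, 7))
}
```

A shorter hash is always a prefix of a longer one for the same point, which is what lets a prefix scan narrow a search to a cell before exact filtering.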

## Plugin-backed indexes

Plugin-backed indexes are descriptors that route candidate generation to a registered plugin. Examples:

- `trigram` for fuzzy text candidate sets.
- `minhash` for set similarity.
- `simhash` for near duplicate detection.
- `vectorlsh` for approximate nearest neighbors.
- `geohash` for spatial candidates.
- `bloom`, `cbloom`, and `cuckoo` for membership tests.

The query planner can combine candidate generation with exact distance evaluation.

## Distance operators

Distance operators return a scalar where smaller is closer. That convention makes predicates consistent:

```sql
levenshtein(name, 'habibs') <= 2
cosine(embedding, :query) <= 0.25
haversine(loc, :center) <= 1500
```

Distance operators are plugins. They do not own storage. They evaluate typed values and can be paired with an index plugin that narrows the candidate set.
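
To make the "smaller is closer" convention concrete, here is a standalone Levenshtein sketch using the classic two-row dynamic program. It is not the built-in plugin's code, and it works on plain strings rather than `model.AttributeValue`.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b, counted in runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

func main() {
	// Mirrors the predicate levenshtein(name, 'habibs') <= 2.
	for _, name := range []string{"habib", "habibs", "burger"} {
		fmt.Println(name, levenshtein(name, "habibs") <= 2)
	}
}
```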

## Top-k

Top-k search ranks candidates by a distance expression:

```sh
cefas top-k \
  --table Documents \
  --by "cosine(embedding, :query)" \
  --k 20 \
  --query '{"L":[{"N":"0.1"},{"N":"0.2"}]}'
```

The best top-k plans use an index plugin to avoid scanning the full table, then apply the exact distance operator to rank survivors.
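
The ranking step can be sketched as follows, assuming candidates have already been narrowed by an index and that all vectors share the same dimension. `Candidate` and `topK` are hypothetical names; a real planner would likely use a bounded heap rather than a full sort.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Candidate struct {
	ID        string
	Embedding []float64
}

// cosineDistance returns 1 - cosine similarity, so smaller means closer.
// It assumes a and b have the same length.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1 // treat a zero vector as orthogonal to everything
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// topK ranks candidates by distance to query and returns the k closest IDs.
func topK(cands []Candidate, query []float64, k int) []string {
	ranked := append([]Candidate(nil), cands...) // do not reorder the caller's slice
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosineDistance(ranked[i].Embedding, query) < cosineDistance(ranked[j].Embedding, query)
	})
	if k > len(ranked) {
		k = len(ranked)
	}
	ids := make([]string, 0, k)
	for _, c := range ranked[:k] {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	cands := []Candidate{
		{"a", []float64{1, 0}},
		{"b", []float64{0, 1}},
		{"c", []float64{1, 1}},
	}
	fmt.Println(topK(cands, []float64{1, 0.1}, 2))
}
```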

## Explain

`explain` prints the plan tree:

```sh
cefas explain --table Merchants --where "levenshtein(name, 'habibs') <= 2"
```

Use explain before adding an index, after adding an index, and after rebuilding an index. It is the fastest way to verify whether the planner can use the path you expect.


## Concepts: Deployment Modes

Rendered: https://docs.cefasdb.com/docs/Concepts-Deployment-Modes
Markdown: https://docs.cefasdb.com/wiki/Concepts-Deployment-Modes.md

# Concepts: Deployment Modes

CefasDB can run in several topologies. The API is stable across them; the difference is durability, failure behavior, and operational complexity.

## Local binary

Run `cefasdb` directly on a developer machine. This is best for development and debugging because logs, flags, data files, and binaries are all local.

Use this mode when:

- You are learning CefasDB.
- You are debugging a CLI or SDK workflow.
- You want a disposable local database.

Do not use it when a process or machine failure must be transparent.

## Single Docker container

Docker packages the server and its runtime environment. It is useful for integration tests and demos.

Use this mode when:

- You want repeatable local setup.
- Another service needs CefasDB as a local dependency.
- You want to test image packaging.

Mount the data directory on a volume if the data should survive container removal.

## Raft cluster

Raft replicates writes across members and acknowledges only after a majority has the entry. This is the first high-availability topology.

Use this mode when:

- One process or host must be able to fail without losing acknowledged writes.
- Operators can provide stable node IDs and network addresses.
- Write latency can include raft coordination.

## Multi-raft sharding

Multi-raft partitions the keyspace across independent raft groups. It is intended for scale-out write throughput and failure-domain isolation.

Use this mode when:

- One raft group cannot deliver enough write throughput.
- Partition-key distribution is understood.
- Operational automation can manage multiple shard groups.

## Kubernetes

Kubernetes wraps the server in StatefulSets, Services, ConfigMaps, Secrets, and PersistentVolumeClaims.

Use this mode when:

- You already operate workloads on Kubernetes.
- You need managed restarts and declarative config.
- You can provision persistent storage and monitor the pods.

## Choosing a mode

Start with the simplest mode that proves the workload. Move to Docker when packaging matters. Move to raft when availability matters. Move to Kubernetes when operations and scheduling matter. Move to multi-raft when throughput and data placement matter.

## Consistency guarantees

Every mode exposes the same `CONSISTENCY_EVENTUAL` / `CONSISTENCY_STRONG` knob per call. What changes between modes is the underlying replication shape — and therefore what each guarantee means under failure.

| Mode | Quorum size | Write guarantee on success | Read with `Strong` | Behaviour under leader loss |
| --- | --- | --- | --- | --- |
| Local binary | n/a (1 node) | Write reached the local Pebble batch. With `-fsync=true`, also durable on disk. | Local read; same answer as eventual. | Process death = downtime. Data survives if the data directory is intact. |
| Single Docker | n/a (1 container) | Same as local binary. | Same as local binary. | Volume loss = data loss. |
| Single-shard raft | majority of voters (e.g. 2 of 3) | Replicated to a quorum of raft voters. | Routed to the leader and read after a barrier, so the call sees every acknowledged write. | New leader elected after `ElectionMS`; writes pause during the gap. Loss of quorum (e.g. 2 of 3 unreachable) means no new writes until quorum returns. |
| Multi-raft sharding | majority **per shard** | Same as single-shard raft, but scoped to the partitioning shard. | Same, leader is per-shard. | Loss only impacts shards whose quorum is unhealthy; other partitions keep serving. |
| Kubernetes (StatefulSet + raft) | majority of replicas | Same as raft. PVC tied to pod identity. | Same as raft. | Pod restart preserves data via PVC; pod loss without PVC = node loss for that voter. |

Two practical consequences:

1. **Strong reads cost a leader hop.** In any raft-backed mode, `CONSISTENCY_STRONG` adds round-trip + barrier latency. Use it where read-after-write correctness matters; leave eventual for high-volume scans and cohort scoring.
2. **Quorum loss is a write halt, not corruption.** When a shard cannot reach quorum, the leader rejects writes (`client.ErrNotLeader` or transport error). Reads on followers still return eventual data. Restoring a node, or evicting one via `cefas cluster remove-server`, re-forms quorum.

See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the tuning knobs.


## Concepts: Authentication And Authorization

Rendered: https://docs.cefasdb.com/docs/Concepts-Authentication-And-Authorization
Markdown: https://docs.cefasdb.com/wiki/Concepts-Authentication-And-Authorization.md

# Concepts: Authentication And Authorization

CefasDB can run open in a trusted development environment or validate bearer tokens against an identity provider. Production deployments should enable token validation and configure per-operation authorization scopes.

## Identity provider configuration

The server exposes flags for identity configuration:

- JWKS URL.
- Expected issuer.
- Expected audience.
- Allowed clock skew.

When JWKS configuration is empty, the server can run in open development mode. When configured, requests must carry a bearer token that validates against the issuer and audience.

## Scope model

Scopes are operation and resource oriented. A caller can be allowed to read one table, write another, manage plugins, or perform admin operations depending on token claims.

Examples of scope shapes:

```text
cefas:item:read:<table>
cefas:item:write:<table>
cefas:table:admin:<table>
cefas:cluster:admin
```

The exact scope checks live in `internal/auth` and API handlers.
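
As a hedged illustration of the scope shapes above, the sketch below checks a table-scoped permission by exact string match. The function name and the matching rule are assumptions; the real checks in `internal/auth` may support other forms.

```go
package main

import (
	"fmt"
	"strings"
)

// allowed reports whether any granted scope exactly matches
// cefas:<resource>:<action>:<table>. Exact matching is an assumption.
func allowed(granted []string, resource, action, table string) bool {
	want := strings.Join([]string{"cefas", resource, action, table}, ":")
	for _, s := range granted {
		if s == want {
			return true
		}
	}
	return false
}

func main() {
	claims := []string{"cefas:item:read:Users", "cefas:item:write:Orders"}
	fmt.Println(allowed(claims, "item", "read", "Users"))
	fmt.Println(allowed(claims, "item", "write", "Users"))
}
```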

## CLI authentication

The CLI can receive a token directly, from a token file, from the environment, or from a profile config.

Common flags:

```sh
cefas --token "$TOKEN" ...
cefas --token-file ./token.txt ...
cefas --profile prod ...
```

## Transport security

For local examples, `--insecure` means plaintext gRPC. Production deployments should use TLS, configure a CA bundle where needed, and restrict network access to the gRPC and HTTP listeners.

## Operational posture

Do not expose an open CefasDB server to untrusted networks. Token validation and TLS should be part of the deployment baseline, not a later hardening pass.


## Plugins Overview

Rendered: https://docs.cefasdb.com/docs/Plugins-Overview
Markdown: https://docs.cefasdb.com/wiki/Plugins-Overview.md

# Plugins Overview

CefasDB plugins are in-process Go implementations registered against `plugin.Default`. They provide specialized behavior without coupling that behavior to the storage engine.

## Plugin kinds

| Kind | Used for |
| --- | --- |
| Index | Candidate generation, membership tests, search indexes. |
| Distance | Scalar similarity or distance evaluation. |
| Estimator | Approximate aggregates such as cardinality or frequency. |
| Audience | Composite ads workflows such as geo select, dedup, frequency cap, and privacy aggregation. |

## Why plugins exist

Search and similarity features evolve faster than the database kernel. A geohash selector, a trigram inverted index, and a Count-Min Sketch have different state, configuration, and evaluation behavior. Putting all of that directly into storage would make the core hard to reason about.

The plugin boundary keeps the kernel narrow. Core code defines stable interfaces and data structures. Plugins implement those interfaces.

## Built-in families

Membership filters and approximate estimators:

- `bloom`
- `cbloom`
- `cuckoo`
- `hll`
- `cms`

Search and similarity indexes:

- `radix`
- `trigram`
- `minhash`
- `simhash`
- `vectorlsh`
- `geohash`
- `roaring`

Distance operators:

- `hamming`
- `levenshtein`
- `damerau`
- `jaro_winkler`
- `jaccard`
- `cosine`
- `euclidean`
- `manhattan`
- `haversine`

Audience workflows:

- Geo radius selection.
- Approximate reach estimation.
- Dedup with TTL.
- Sliding-window frequency cap.
- Privacy-aware aggregation.
- Composite eligibility.

## Reading order

Read [Core Boundaries](Plugins-Core-Boundaries) first if you will change code. Read [Index Examples](Plugins-Index-Examples) if you will use plugins from the CLI. Read [Plugin Authoring](Plugins-Authoring) if you will add a new plugin.


## Plugins: Core Boundaries

Rendered: https://docs.cefasdb.com/docs/Plugins-Core-Boundaries
Markdown: https://docs.cefasdb.com/wiki/Plugins-Core-Boundaries.md

# Plugins: Core Boundaries

CefasDB keeps plugin code away from engine internals. This is enforced by tests, not convention.

```mermaid
flowchart LR
  Server[pkg/api and internal packages] --> Core[internal/core]
  Server --> Plugin[pkg/plugin]
  Plugin --> Core
  Core -. forbidden .-> Plugin
  Core -. forbidden .-> Server
  Plugin -. forbidden .-> Server
```

## Core packages

Core packages define stable concepts:

| Concept | Package |
| --- | --- |
| Model aliases and item types | `internal/core/model` |
| Conditions | `internal/core/condition` |
| TTL service | `internal/core/ttl` |
| Change streams | `internal/core/stream` |
| Index lifecycle | `internal/core/index` |
| Query planner and top-k | `internal/core/query` |

Core code does not import concrete plugin packages.

## Plugin packages

Plugin packages live under `pkg/plugin/<name>`. They can depend on core packages and shared helpers under `pkg/plugin/internal`, but they must not import `internal/storage`, `pkg/api`, `pkg/client`, or the SQL executor directly.

The server wires built-in plugins through blank imports in `pkg/plugin/builtins`.

## Boundary tests

Run:

```sh
go test ./internal/core/... -run CoreHasNoEngineImports
go test ./pkg/plugin/... -run PluginHasNoEngineImports
```

These tests parse imports and fail if a package crosses the boundary. That makes the architecture reviewable in CI.
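
The idea behind such a test can be sketched with the standard `go/parser` package. This is not the project's actual test; the function name and the forbidden prefixes passed in `main` are illustrative.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenImports parses Go source and returns every import path that
// starts with one of the forbidden prefixes.
func forbiddenImports(src string, forbidden []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "plugin.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, prefix := range forbidden {
			if strings.HasPrefix(path, prefix) {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}

const sample = `package myplugin

import (
	"github.com/CefasDB/cefasdb-core/internal/core/model"
	"github.com/CefasDB/cefasdb-core/internal/storage"
)
`

func main() {
	bad, err := forbiddenImports(sample, []string{
		"github.com/CefasDB/cefasdb-core/internal/storage",
		"github.com/CefasDB/cefasdb-core/pkg/api",
	})
	fmt.Println(bad, err)
}
```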

## Practical rule

If a plugin needs a capability from the engine, do not import the engine. Add a small interface or data type to `internal/core`, make the engine implement it, and keep the plugin dependent only on that core contract.
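
The rule amounts to dependency inversion. The single-file sketch below marks where each piece would live; `ItemCounter`, `engine`, and `sizingPlugin` are hypothetical names invented for the example.

```go
package main

import "fmt"

// --- would live in internal/core: the narrow contract ---

// ItemCounter is the only capability the plugin needs from the engine.
type ItemCounter interface {
	CountItems(table string) int
}

// --- would live in the engine, which implements the contract ---

type engine struct{ tables map[string][]string }

func (e *engine) CountItems(table string) int { return len(e.tables[table]) }

// --- would live in pkg/plugin/<name>, depending only on the contract ---

type sizingPlugin struct{ counter ItemCounter }

// SuggestBits sizes a filter from the item count without touching engine types.
func (p sizingPlugin) SuggestBits(table string, bitsPerItem int) int {
	return p.counter.CountItems(table) * bitsPerItem
}

func main() {
	e := &engine{tables: map[string][]string{"Users": {"u1", "u2", "u3"}}}
	p := sizingPlugin{counter: e} // the server wires the two together
	fmt.Println(p.SuggestBits("Users", 10))
}
```

The plugin compiles against the interface alone, so the import-graph tests keep passing while the engine stays free to change its internals.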


## Plugins: Authoring

Rendered: https://docs.cefasdb.com/docs/Plugins-Authoring
Markdown: https://docs.cefasdb.com/wiki/Plugins-Authoring.md

# Plugins: Authoring

A CefasDB plugin is a Go type that implements one of the plugin contracts and registers itself during package initialization.

## Pick the plugin kind

| Kind | Use it when |
| --- | --- |
| Index | You maintain searchable state and return candidate item IDs. |
| Distance | You evaluate two typed values and return a numeric distance. |
| Estimator | You observe values and return approximate aggregate answers. |
| Audience | You compose selection, reach, dedup, frequency cap, and aggregation workflows. |

## Minimal distance plugin

```go
package mydistance

import (
  "github.com/CefasDB/cefasdb-core/internal/core/model"
  "github.com/CefasDB/cefasdb-core/pkg/plugin"
)

type Op struct{}

func (Op) Manifest() plugin.Manifest {
  return plugin.Manifest{
    Name: "mydistance",
    Kind: plugin.KindDistance,
    Version: "1",
    Description: "example distance operator",
  }
}

func (Op) Name() string { return "mydistance" }

func (Op) Supports(a, b model.AttrType) bool {
  return a == model.AttrS && b == model.AttrS
}

func (Op) Eval(a, b model.AttributeValue) (float64, error) {
  // Placeholder: a real operator returns a distance where smaller means closer.
  return 0, nil
}

func init() {
  plugin.Default.MustRegister(Op{})
}
```

## Index configuration

Index plugins receive an opaque JSON config. Decode it into a typed config and validate it before building state.

```go
type Config struct {
  Field string `json:"field"`
  K     int    `json:"k,omitempty"`
}
```

Every index plugin should reject missing required fields with a clear error. Defaults should be explicit and tested.
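
A minimal decode-and-validate sketch for the `Config` above, using only `encoding/json`. The default value for `K` and the error wording are assumptions chosen for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Config struct {
	Field string `json:"field"`
	K     int    `json:"k,omitempty"`
}

// defaultK is a hypothetical default applied when "k" is omitted.
const defaultK = 4

// parseConfig decodes the opaque JSON config, rejects a missing field name,
// and applies the explicit default.
func parseConfig(raw []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(raw, &c); err != nil {
		return Config{}, fmt.Errorf("decode index config: %w", err)
	}
	if c.Field == "" {
		return Config{}, errors.New(`index config: "field" is required`)
	}
	if c.K < 0 {
		return Config{}, errors.New(`index config: "k" must not be negative`)
	}
	if c.K == 0 {
		c.K = defaultK
	}
	return c, nil
}

func main() {
	c, err := parseConfig([]byte(`{"field":"email"}`))
	fmt.Println(c, err)
	_, err = parseConfig([]byte(`{"k":6}`))
	fmt.Println(err)
}
```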

## Per-index state

Index plugins usually maintain state per `(table, indexName)` descriptor. Use a stable key such as `table + "/" + name` and guard state with a mutex or another concurrency primitive.
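
One way to hold per-descriptor state, sketched with a mutex-guarded map. The `registry` and `indexState` types are hypothetical; real plugins hold whatever structure their index needs.

```go
package main

import (
	"fmt"
	"sync"
)

type indexState struct{ entries map[string]struct{} }

type registry struct {
	mu     sync.Mutex
	states map[string]*indexState
}

func newRegistry() *registry { return &registry{states: make(map[string]*indexState)} }

// stateKey builds the stable key for a (table, indexName) descriptor.
func stateKey(table, name string) string { return table + "/" + name }

// add records a value in the state for (table, name), creating it on first use.
func (r *registry) add(table, name, value string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := stateKey(table, name)
	st, ok := r.states[k]
	if !ok {
		st = &indexState{entries: make(map[string]struct{})}
		r.states[k] = st
	}
	st.entries[value] = struct{}{}
}

// size returns the number of distinct values held for (table, name).
func (r *registry) size(table, name string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	if st, ok := r.states[stateKey(table, name)]; ok {
		return len(st.entries)
	}
	return 0
}

func main() {
	r := newRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.add("Users", "email_bloom", fmt.Sprintf("user%d@example.com", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(r.size("Users", "email_bloom"))
}
```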

## Tests

Use `pkg/plugin/testharness` instead of booting a full server. A plugin unit test should seed model items, build the plugin state, and assert query or estimate behavior directly.

## Wire the plugin into the server

Add a blank import in `pkg/plugin/builtins/builtins.go`.

```go
import (
  _ "github.com/CefasDB/cefasdb-core/pkg/plugin/mydistance"
)
```

After rebuilding `cefasdb`, the CLI can show it:

```sh
cefas list-plugins
cefas describe-plugin --name mydistance
```

## Boundary checklist

- Do not import engine internals such as `internal/storage`; `internal/core` is the only `internal` tree a plugin may import.
- Do not import API handlers.
- Do not import the SQL executor.
- Put shared types in `internal/core`.
- Add unit tests for manifest validation, configuration, and core behavior.


## Plugins: Index Examples

Rendered: https://docs.cefasdb.com/docs/Plugins-Index-Examples
Markdown: https://docs.cefasdb.com/wiki/Plugins-Index-Examples.md

# Plugins: Index Examples

This page shows the built-in index and estimator plugins with CLI examples. Assume `cefasdb` is running and the CLI can reach it.

## Bloom filter

Use `bloom` for membership checks where false positives are acceptable and deletes are not required.

```sh
cefas create-index \
  --table Users \
  --name email_bloom \
  --type bloom \
  --config '{"field":"email","m":16384,"k":6}'
```
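
For intuition about the `m` and `k` parameters, here is a textbook Bloom filter sketch with `m` bits and `k` probe positions derived by double hashing over FNV. It is a generic illustration, not the `bloom` plugin's implementation, and the hash choice is an assumption.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type Bloom struct {
	bits []bool
	k    int
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *Bloom) positions(value string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(value))
	h2 := fnv.New64()
	h2.Write([]byte(value))
	base, step := h1.Sum64(), h2.Sum64()|1 // odd step avoids a zero stride
	out := make([]int, b.k)
	for i := range out {
		out[i] = int((base + uint64(i)*step) % uint64(len(b.bits)))
	}
	return out
}

func (b *Bloom) Add(value string) {
	for _, p := range b.positions(value) {
		b.bits[p] = true
	}
}

// MayContain can return a false positive but never a false negative.
func (b *Bloom) MayContain(value string) bool {
	for _, p := range b.positions(value) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := NewBloom(16384, 6) // same m and k as the CLI example
	f.Add("a@example.com")
	fmt.Println(f.MayContain("a@example.com"))
}
```

Bits are only ever set, never cleared, which is why a plain Bloom filter cannot support deletes.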

## Counting Bloom filter

Use `cbloom` when membership checks need delete support.

```sh
cefas create-index \
  --table Sessions \
  --name session_cbloom \
  --type cbloom \
  --config '{"field":"session_id","m":4096,"k":5,"width":4}'
```

## Cuckoo filter

Use `cuckoo` for membership checks with deletes and compact fingerprints.

```sh
cefas create-index \
  --table Orders \
  --name order_cuckoo \
  --type cuckoo \
  --config '{"field":"order_id","buckets":2048,"fingerprint_bits":12}'
```

## Roaring bitmap

Use `roaring` for cohorts over numeric or stable integer-like identifiers.

```sh
cefas cohort create \
  --table Users \
  --cohort high_value \
  --field user_id \
  --where "spend >= :floor" \
  --binds '{":floor":{"N":"1000"}}'
```

## HyperLogLog

Use `hll` for approximate distinct counts.

```sh
cefas cohort estimate --table Events --field user_id
```

## Count-Min Sketch

Use `cms` for approximate frequency estimates. It is useful when exact counters are too expensive or too large.
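
A textbook Count-Min Sketch, included only to show why the estimate can overcount but never undercount. The depth, width, and per-row seeding scheme are assumptions and do not describe the `cms` plugin's internals.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type CMS struct {
	rows  [][]uint32
	width int
}

func NewCMS(depth, width int) *CMS {
	rows := make([][]uint32, depth)
	for i := range rows {
		rows[i] = make([]uint32, width)
	}
	return &CMS{rows: rows, width: width}
}

// col hashes the value with a per-row seed byte to pick one counter per row.
func (c *CMS) col(row int, value string) int {
	h := fnv.New32a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(value))
	return int(h.Sum32() % uint32(c.width))
}

func (c *CMS) Add(value string) {
	for r := range c.rows {
		c.rows[r][c.col(r, value)]++
	}
}

// Estimate returns the minimum counter across rows. Collisions can inflate
// a counter, so the result is an upper bound on the true count.
func (c *CMS) Estimate(value string) uint32 {
	est := c.rows[0][c.col(0, value)]
	for r := 1; r < len(c.rows); r++ {
		if v := c.rows[r][c.col(r, value)]; v < est {
			est = v
		}
	}
	return est
}

func main() {
	s := NewCMS(4, 1024)
	for i := 0; i < 5; i++ {
		s.Add("USER#1")
	}
	s.Add("USER#2")
	fmt.Println(s.Estimate("USER#1") >= 5, s.Estimate("USER#2") >= 1)
}
```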

## Radix

Use `radix` for prefix search and autocomplete-style access.

```sh
cefas create-index \
  --table Cities \
  --name name_prefix \
  --type radix \
  --config '{"field":"name"}'
```

## Trigram

Use `trigram` for fuzzy text candidate generation.

```sh
cefas create-index \
  --table Merchants \
  --name merchant_name_trigram \
  --type trigram \
  --field name

cefas query \
  --table-name Merchants \
  --where "levenshtein(name, 'habibs') <= 2"
```

## MinHash

Use `minhash` for set similarity.

```sh
cefas create-index \
  --table Users \
  --name tag_sim \
  --type minhash \
  --config '{"field":"tags","k":128,"r":8}'
```

## SimHash

Use `simhash` for near-duplicate text or document detection.

```sh
cefas create-index \
  --table Docs \
  --name dedupe \
  --type simhash \
  --config '{"field":"body","prefix_bits":16,"max_radius":3}'
```

## Vector LSH

Use `vectorlsh` for approximate vector candidate generation.

```sh
cefas create-index \
  --table Documents \
  --name emb_lsh \
  --type vectorlsh \
  --config '{"field":"embedding","dim":768,"sketches":8,"bits_per_sketch":12}'
```

## Geohash

Use `geohash` for spatial candidate generation.

```sh
cefas create-index \
  --table Stores \
  --name loc_geo \
  --type geohash \
  --config '{"field":"loc","precision":7}'
```


## Plugins: Distance Operators

Rendered: https://docs.cefasdb.com/docs/Plugins-Distance-Operators
Markdown: https://docs.cefasdb.com/wiki/Plugins-Distance-Operators.md

# Plugins: Distance Operators

Distance plugins return a scalar where smaller means closer. This lets every operator fit the same predicate shape:

```sql
operator(left, right) <= threshold
```

## Operator table

| Operator | Inputs | Typical use |
| --- | --- | --- |
| `hamming` | Equal-length strings or binary | SimHash post-filtering. |
| `levenshtein` | String vs string | Fuzzy text matching. |
| `damerau` | String vs string | Fuzzy text with adjacent transpositions. |
| `jaro_winkler` | String vs string | Names and short labels. |
| `jaccard` | Sets or shingled strings | Tag or set similarity. |
| `cosine` | Numeric vectors | Embeddings. |
| `euclidean` | Numeric vectors | Spatial or vector distance. |
| `manhattan` | Numeric vectors | L1 vector distance. |
| `haversine` | `{lat, lon}` maps | Earth distance in meters. |

## Query examples

```sh
cefas query \
  --table-name Merchants \
  --where "levenshtein(name, 'habibs') <= 2"
```

```sh
cefas top-k \
  --table Documents \
  --by "cosine(embedding, :query)" \
  --k 20 \
  --query '{"L":[{"N":"0.1"},{"N":"0.2"},{"N":"0.3"}]}'
```

```sh
cefas geo audience \
  --table Stores \
  --center "-23.9608,-46.3336" \
  --radius 1500m
```

## Choosing an operator

Use edit distance for strings where typos matter. Use Jaro-Winkler for names. Use Jaccard for sets. Use cosine for normalized embeddings. Use Haversine for latitude and longitude.

At scale, pair distance operators with an index plugin. A trigram index can narrow candidates before Levenshtein. Vector LSH can narrow candidates before cosine. Geohash can narrow candidates before Haversine.


## Plugins: Audience Workflows

Rendered: https://docs.cefasdb.com/docs/Plugins-Audience-Workflows
Markdown: https://docs.cefasdb.com/wiki/Plugins-Audience-Workflows.md

# Plugins: Audience Workflows

The audience plugin composes geo selection, approximate reach, deduplication, frequency capping, eligibility checks, and privacy-aware aggregation. It is designed for campaign and audience workflows where raw identity should stay server-side.

## Setup

Create a table with store locations:

```sh
cefas create-table \
  --table-name Stores \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

Seed items with a location map:

```sh
cefas put-item --table-name Stores --item '{
  "id":{"S":"s1"},
  "loc":{"M":{"lat":{"N":"-23.5510"},"lon":{"N":"-46.6340"}}}
}'
```

Create a geohash index:

```sh
cefas create-index \
  --table Stores \
  --name loc_geo \
  --type geohash \
  --config '{"field":"loc","precision":7}'
```

## Select an audience

```sh
cefas geo audience \
  --table Stores \
  --index loc_geo \
  --center "-23.5505,-46.6333" \
  --radius 2000m
```

The geohash plugin returns candidates from the center cell and neighboring cells. Haversine removes false positives at cell boundaries.
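
The exact-filter step can be sketched with the standard haversine formula. The mean Earth radius constant, the `Store` struct, and `withinRadius` are illustrative choices, not the plugin's code.

```go
package main

import (
	"fmt"
	"math"
)

const earthRadiusM = 6371000.0 // mean Earth radius in meters

// haversine returns the great-circle distance in meters between two points
// given in decimal degrees.
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const rad = math.Pi / 180
	dLat := (lat2 - lat1) * rad
	dLon := (lon2 - lon1) * rad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*rad)*math.Cos(lat2*rad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusM * math.Asin(math.Sqrt(a))
}

type Store struct {
	ID       string
	Lat, Lon float64
}

// withinRadius keeps only candidates whose exact distance is inside the radius.
func withinRadius(cands []Store, lat, lon, radiusM float64) []string {
	var ids []string
	for _, s := range cands {
		if haversine(lat, lon, s.Lat, s.Lon) <= radiusM {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	cands := []Store{
		{"s1", -23.5510, -46.6340}, // the seeded store from Setup
		{"s2", -23.9608, -46.3336}, // a far-away candidate
	}
	fmt.Println(withinRadius(cands, -23.5505, -46.6333, 2000))
}
```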

## Estimate reach

```sh
cefas cohort estimate \
  --table Stores \
  --field id
```

HyperLogLog estimates distinct reach without returning member identity.

## Dedup

```sh
cefas dedup put \
  --scope campaign-123 \
  --key USER#1 \
  --ttl 168h
```

The response is a boolean verdict. Stored dedup keys are never returned to the caller as a list.
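
A toy model of dedup with TTL, using explicit timestamps so the behaviour is deterministic. The `Dedup` type is hypothetical, and which boolean the real command returns for a first-seen key is an assumption (here `true` means "new").

```go
package main

import (
	"fmt"
	"time"
)

// Dedup maps scope|key to the time its dedup window expires.
type Dedup struct{ seen map[string]time.Time }

// Put returns true when the key is new within the scope's TTL window and
// false when it is a duplicate. Only the verdict leaves the function.
func (d *Dedup) Put(scope, key string, ttl time.Duration, now time.Time) bool {
	id := scope + "|" + key
	if exp, ok := d.seen[id]; ok && now.Before(exp) {
		return false
	}
	d.seen[id] = now.Add(ttl)
	return true
}

// demo replays three puts of the same key: first seen, duplicate, and
// seen again after the window has expired.
func demo() []bool {
	d := &Dedup{seen: map[string]time.Time{}}
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	week := 168 * time.Hour
	first := d.Put("campaign-123", "USER#1", week, t0)
	second := d.Put("campaign-123", "USER#1", week, t0.Add(time.Hour))
	third := d.Put("campaign-123", "USER#1", week, t0.Add(169*time.Hour))
	return []bool{first, second, third}
}

func main() {
	fmt.Println(demo())
}
```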

## Frequency cap

```sh
cefas freqcap check \
  --scope merchant-456 \
  --key USER#1 \
  --limit 3 \
  --window 168h
```

The plugin increments and checks the sliding window server-side.
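
A sliding-window cap can be sketched by keeping recent hit timestamps per key. This is a simplified model with a hypothetical `FreqCap` type; it assumes denied checks are not recorded as hits, which the real plugin may handle differently.

```go
package main

import (
	"fmt"
	"time"
)

// FreqCap maps scope|key to the timestamps of recent allowed hits.
type FreqCap struct{ hits map[string][]time.Time }

// Check drops hits older than the window, then allows and records the new
// hit only if fewer than limit hits remain.
func (f *FreqCap) Check(scope, key string, limit int, window time.Duration, now time.Time) bool {
	id := scope + "|" + key
	cutoff := now.Add(-window)
	kept := f.hits[id][:0] // filter in place
	for _, t := range f.hits[id] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= limit {
		f.hits[id] = kept
		return false
	}
	f.hits[id] = append(kept, now)
	return true
}

// demo replays five checks with limit 3 over a 168h window.
func demo() []bool {
	f := &FreqCap{hits: map[string][]time.Time{}}
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	offsets := []time.Duration{0, time.Hour, 2 * time.Hour, 3 * time.Hour, 170 * time.Hour}
	var out []bool
	for _, off := range offsets {
		out = append(out, f.Check("merchant-456", "USER#1", 3, 168*time.Hour, t0.Add(off)))
	}
	return out
}

func main() {
	fmt.Println(demo())
}
```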

## Privacy-aware aggregation

```sh
cefas aggregate \
  --table CampaignEvents \
  --group-by campaign_id,geohash5 \
  --metrics impressions,clicks,redemptions \
  --min-group-size 100
```

If any group is below the privacy floor, the whole operation fails instead of returning results that include an undersized group.
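
The fail-closed rule can be sketched as an aggregation that checks every group before returning anything. The `Event` shape is hypothetical, and measuring group size in distinct members is an assumption about what `--min-group-size` counts.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrGroupTooSmall = errors.New("group below minimum size")

type Event struct {
	Group  string
	Member string
	Clicks int
}

type GroupStats struct {
	Members int
	Clicks  int
}

// aggregate sums clicks per group and fails as a whole if any group has
// fewer than minGroupSize distinct members.
func aggregate(events []Event, minGroupSize int) (map[string]GroupStats, error) {
	members := map[string]map[string]struct{}{}
	out := map[string]GroupStats{}
	for _, e := range events {
		if members[e.Group] == nil {
			members[e.Group] = map[string]struct{}{}
		}
		members[e.Group][e.Member] = struct{}{}
		st := out[e.Group]
		st.Clicks += e.Clicks
		out[e.Group] = st
	}
	for g, set := range members {
		if len(set) < minGroupSize {
			return nil, fmt.Errorf("%w: %s", ErrGroupTooSmall, g)
		}
		st := out[g]
		st.Members = len(set)
		out[g] = st
	}
	return out, nil
}

func main() {
	events := []Event{
		{"campaign-1", "u1", 1},
		{"campaign-1", "u2", 2},
		{"campaign-2", "u3", 1},
	}
	_, err := aggregate(events, 2)
	fmt.Println(err != nil) // campaign-2 has a single member
}
```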


## Interfaces Overview

Rendered: https://docs.cefasdb.com/docs/Interfaces-Overview
Markdown: https://docs.cefasdb.com/wiki/Interfaces-Overview.md

# Interfaces Overview

CefasDB exposes four main interfaces:

- CLI for operators and scripts.
- HTTP/JSON for simple clients and compatibility.
- gRPC for typed clients and streaming.
- SQL and PartiQL for query-oriented access.

The interfaces are different entry points into the same engine. A table created through the CLI can be queried through HTTP, read through gRPC, and inspected through SQL.

## CLI

The CLI binary is `cefas`. It groups the operational surface into table, item, query, plugin, and cluster commands:

```sh
cefas create-table ...
cefas put-item ...
cefas query ...
cefas execute-statement ...
```

The CLI also exposes CefasDB-specific operations for plugins, cohorts, top-k, audience selection, backups, and cluster membership.

## HTTP

The HTTP API is useful for curl, simple integrations, and diagnostics. It exposes table, item, and query routes over JSON.

## gRPC

The gRPC API is the primary typed transport. The protobuf definition lives at `pkg/protocol/cefas.proto`, and generated Go code lives beside it.

## SQL and PartiQL

The SQL layer parses and plans a useful subset of `SELECT`, `INSERT`, `UPDATE`, `DELETE`, conditions, scalar functions, and spatial/similarity predicates. PartiQL-style commands are exposed through `execute-statement`.

## Go client

The Go client package lives under `pkg/client`. It wraps gRPC calls and encodes the typed item model.


## Interfaces: CLI

Rendered: https://docs.cefasdb.com/docs/Interfaces-CLI
Markdown: https://docs.cefasdb.com/wiki/Interfaces-CLI.md

# Interfaces: CLI

The `cefas` CLI is the operational surface for local development, scripting, plugin inspection, backup and restore, and cluster administration.

Global flags:

| Flag | Purpose |
| --- | --- |
| `--config` | Config file path. |
| `--profile` | Named profile. |
| `--endpoint` | gRPC endpoint host:port. |
| `--token` | Bearer token. |
| `--token-file` | File containing bearer token. |
| `--ca` | TLS CA bundle. |
| `--insecure` | Use plaintext gRPC. |
| `--output` | `json`, `table`, or `text`. |
| `--timeout` | Per-call timeout. |

## Table management

```sh
cefas list-tables
cefas describe-table --table-name Users
cefas create-table --table-name Users ...
cefas delete-table --table-name Users
cefas update-time-to-live --table-name Users ...
cefas describe-time-to-live --table-name Users
```

## Item operations

```sh
cefas put-item --table-name Users --item '{...}'
cefas get-item --table-name Users --key '{...}'
cefas update-item --table-name Users --key '{...}' --update-expression "SET #n = :v"
cefas delete-item --table-name Users --key '{...}'
```

## Query operations

```sh
cefas query --table-name Users --where "pk = 'USER#1'"
cefas scan --table-name Users
cefas execute-statement --statement "SELECT * FROM Users WHERE pk = 'USER#1'"
```

## Batch and transaction operations

```sh
cefas batch-get-item --request-items '{...}'
cefas batch-write-item --request-items '{...}'
cefas transact-get-items --transact-items '[...]'
cefas transact-write-items --transact-items '[...]'
```

## Plugin and query planning operations

```sh
cefas list-plugins
cefas describe-plugin --name trigram
cefas create-index --table Merchants --name merchant_name_trigram --type trigram --field name
cefas explain --table Merchants --where "levenshtein(name, 'habibs') <= 2"
cefas top-k --table Documents --by "cosine(embedding, :query)" --k 20 --query '{...}'
```
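The `explain` example above filters on `levenshtein(name, 'habibs') <= 2`. As a reference for what that predicate measures, here is a minimal sketch of the textbook edit-distance definition. How the trigram plugin narrows candidates before this check is not shown here, and the function name, helper, and sample merchant names are illustrative assumptions rather than CefasDB code.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, counting
// single-rune insertions, deletions, and substitutions.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1) // distances for the row being filled
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	// Hypothetical merchant names; only those within distance 2 are printed.
	for _, name := range []string{"habibs", "habib's", "habbibs", "subway"} {
		if d := levenshtein(name, "habibs"); d <= 2 {
			fmt.Printf("%q matches (distance %d)\n", name, d)
		}
	}
}
```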

## Audience operations

```sh
cefas geo audience --table Stores --center "-23.9608,-46.3336" --radius 1500m
cefas dedup put --scope campaign-123 --key USER#1 --ttl 168h
cefas freqcap check --scope merchant-456 --key USER#1 --limit 3 --window 168h
cefas aggregate --table CampaignEvents --group-by campaign_id --metrics impressions --min-group-size 100
```

## Backup and cluster operations

```sh
cefas create-backup --backup-name nightly --table-name Users
cefas list-backups
cefas restore-table-from-backup --backup-name nightly --source-table-name Users --target-table-name Users_restored
cefas cluster status
cefas cluster add-voter --id node-b --addr 10.0.0.2:9001
cefas cluster remove-server --id node-b
```


## Interfaces: SQL And PartiQL

Rendered: https://docs.cefasdb.com/docs/Interfaces-SQL-And-PartiQL
Markdown: https://docs.cefasdb.com/wiki/Interfaces-SQL-And-PartiQL.md

# Interfaces: SQL And PartiQL

CefasDB includes a SQL parser, planner, and executor for operational queries. It is not intended to be a full relational database. It is a pragmatic query surface over the item model and index system.

## Supported statement families

- `SELECT`
- `INSERT`
- `UPDATE`
- `DELETE`
- `RETURNING` on supported mutations
- PartiQL-style execution through `execute-statement`

## Predicates

Predicates can include key conditions, scalar comparisons, boolean composition, and supported functions.

Examples:

```sql
SELECT * FROM Users WHERE pk = 'USER#1'
SELECT * FROM Merchants WHERE levenshtein(name, 'habibs') <= 2
SELECT * FROM Stores WHERE haversine(loc, :center) <= 1500
```
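For readers unfamiliar with the `haversine` predicate, the sketch below computes the great-circle distance it is named after. It assumes the result is in meters (consistent with the `1500` threshold and the CLI's `--radius 1500m`) and uses a mean Earth radius of 6,371 km. Both are assumptions, as are the `point` type and function name, which are not part of CefasDB.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusMeters is the mean Earth radius; an assumption, since the
// radius used by the engine is not documented here.
const earthRadiusMeters = 6371000.0

// point is a hypothetical latitude/longitude pair in degrees.
type point struct{ Lat, Lon float64 }

// haversineMeters returns the great-circle distance between a and b.
func haversineMeters(a, b point) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(b.Lat - a.Lat)
	dLon := toRad(b.Lon - a.Lon)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(a.Lat))*math.Cos(toRad(b.Lat))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusMeters * math.Asin(math.Sqrt(h))
}

func main() {
	center := point{Lat: -23.55, Lon: -46.63}
	store := point{Lat: -23.56, Lon: -46.64} // hypothetical store location
	d := haversineMeters(center, store)
	fmt.Printf("distance: %.0f m, within 1500 m: %t\n", d, d <= 1500)
}
```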

## Update expressions

Update expressions supplied through the CLI, such as `--update-expression "SET #n = :v"`, can be translated into CefasDB SQL updates. This keeps expression-based mutation scripts compatible while the engine uses one mutation path internally.

## Parameters

Parameters are represented as typed attribute values:

```sh
cefas execute-statement \
  --statement "SELECT * FROM Stores WHERE haversine(loc, :center) <= :radius" \
  --parameters '[{":center":{"M":{"lat":{"N":"-23.55"},"lon":{"N":"-46.63"}}}}, {":radius":{"N":"1500"}}]'
```
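When the `--parameters` value is assembled by a script instead of typed by hand, generating the JSON avoids quoting mistakes. The sketch below reproduces the shape of the example above with `encoding/json`. The `attr` type and helper names are illustrative assumptions, not part of the Go client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attr mirrors the typed attribute JSON shape used in the example above,
// for instance {"N":"1500"} or {"M":{...}}. It is a hypothetical helper type.
type attr map[string]any

// num wraps a numeric value, which travels as a string under "N".
func num(v string) attr { return attr{"N": v} }

// mapOf wraps nested attributes under "M".
func mapOf(fields map[string]attr) attr { return attr{"M": fields} }

// buildParameters returns the JSON list of single-entry parameter objects.
func buildParameters() (string, error) {
	params := []map[string]attr{
		{":center": mapOf(map[string]attr{"lat": num("-23.55"), "lon": num("-46.63")})},
		{":radius": num("1500")},
	}
	b, err := json.Marshal(params)
	return string(b), err
}

func main() {
	s, err := buildParameters()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // pass this string as the --parameters value
}
```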

## Planner behavior

The planner tries to push work to the cheapest available source:

- Primary key lookup.
- Sort-key range.
- Built-in secondary index.
- Spatial index.
- Plugin candidate set.
- Table scan when no better path exists.

Use `cefas explain` to inspect the plan before assuming an index is active.
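As a mental model only, the list above can be read as a fixed preference order in which the first available source wins. The real planner may weigh costs differently, so every name in this sketch is hypothetical and `cefas explain` remains the authority on what a query actually uses.

```go
package main

import "fmt"

// accessPath names a data source the planner could use. These identifiers
// are illustrative and do not come from the CefasDB code base.
type accessPath string

const (
	primaryKey       accessPath = "primary-key lookup"
	sortKeyRange     accessPath = "sort-key range"
	secondaryIndex   accessPath = "built-in secondary index"
	spatialIndex     accessPath = "spatial index"
	pluginCandidates accessPath = "plugin candidate set"
	tableScan        accessPath = "table scan"
)

// preference lists the sources in the order given in the text above.
var preference = []accessPath{
	primaryKey, sortKeyRange, secondaryIndex, spatialIndex, pluginCandidates,
}

// choosePath returns the first available source, falling back to a scan.
func choosePath(available map[accessPath]bool) accessPath {
	for _, p := range preference {
		if available[p] {
			return p
		}
	}
	return tableScan
}

func main() {
	fmt.Println(choosePath(map[accessPath]bool{pluginCandidates: true, spatialIndex: true}))
	fmt.Println(choosePath(nil))
}
```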


## Interfaces: HTTP And gRPC

Rendered: https://docs.cefasdb.com/docs/Interfaces-HTTP-And-GRPC
Markdown: https://docs.cefasdb.com/wiki/Interfaces-HTTP-And-GRPC.md

# Interfaces: HTTP And gRPC

CefasDB exposes both HTTP/JSON and gRPC. HTTP is useful for simple clients and diagnostics. gRPC is the main typed API and the transport used by the CLI and Go client.

## HTTP

The HTTP listener is configured with `-http`, defaulting to `:8080`.

Common local pattern:

```sh
cefasdb -data ./cefas-data -http :8080 -grpc :9090
```

Example table creation over HTTP:

```sh
curl -X POST localhost:8080/v1/tables \
  -d '{"name":"events","keySchema":{"pk":"user_id","sk":"ts"}}'
```

Example write:

```sh
curl -X POST localhost:8080/v1/PutItem \
  -d '{"table":"events","item":{"user_id":{"S":"alice"},"ts":{"N":"100"},"event":{"S":"login"}}}'
```

## gRPC

The gRPC listener is enabled with `-grpc`.

The protobuf definition lives at:

```text
pkg/protocol/cefas.proto
```

The Go client wraps generated gRPC calls under:

```text
pkg/client
```

## TLS and auth

For development, clients commonly use plaintext:

```sh
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

For production, configure TLS and bearer-token authentication. Use `--ca`, `--token`, or `--token-file` from the CLI as needed.
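For clients written in Go, the two production inputs mentioned above, a CA bundle and a bearer token, are typically turned into a `tls.Config` and an authorization header value. The following is a minimal standard-library sketch. The function names are hypothetical, and whether CefasDB expects the token under a `Bearer ` prefix in the `authorization` metadata key is an assumption based on common gRPC practice.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"strings"
)

// tlsConfigFromCA builds a client TLS config that trusts only the given
// PEM-encoded CA bundle (the file passed to --ca).
func tlsConfigFromCA(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

// bearerHeader turns token-file contents into an authorization value.
// Trimming whitespace guards against a trailing newline in the file.
func bearerHeader(tokenFileContents string) (string, error) {
	token := strings.TrimSpace(tokenFileContents)
	if token == "" {
		return "", errors.New("empty bearer token")
	}
	return "Bearer " + token, nil
}

func main() {
	header, err := bearerHeader("example-token\n")
	fmt.Println(header, err)

	_, err = tlsConfigFromCA([]byte("not a PEM bundle"))
	fmt.Println("CA bundle error:", err)
}
```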

## Streaming

Some operations can stream result rows. The CLI can buffer streams into a single response with `--no-stream` when scripts need one JSON value instead of a stream.

## Consistency options

`GetItemRequest`, `QueryRequest`, and `ScanRequest` each carry a `Consistency` enum:

```proto
enum Consistency {
  CONSISTENCY_UNSPECIFIED = 0;
  CONSISTENCY_EVENTUAL    = 1; // local read on whichever node served the call
  CONSISTENCY_STRONG      = 2; // routed to the leader + barrier
}
```

Over gRPC, set the field directly. Over HTTP via the gRPC gateway, the same field surfaces as a string in the JSON body:

```sh
# Eventual (default — field omitted)
curl -X POST localhost:8080/v1/GetItem \
  -d '{"table":"events","key":{"user_id":{"S":"alice"},"ts":{"N":"100"}}}'

# Strong read
curl -X POST localhost:8080/v1/GetItem \
  -d '{"table":"events","key":{"user_id":{"S":"alice"},"ts":{"N":"100"}},"consistency":"CONSISTENCY_STRONG"}'
```
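The same two requests can be built from Go with only the standard library. This sketch constructs the request without sending it. The `getItemBody` struct and function name are hypothetical, the JSON field names are taken from the curl examples above, and the `Content-Type` header is an assumption because the curl examples do not set one.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// getItemBody mirrors the JSON body used in the curl examples above.
type getItemBody struct {
	Table       string         `json:"table"`
	Key         map[string]any `json:"key"`
	Consistency string         `json:"consistency,omitempty"` // omitted means eventual
}

// newGetItemRequest builds a POST to /v1/GetItem and returns the request
// together with the encoded payload for inspection.
func newGetItemRequest(baseURL, table string, key map[string]any, strong bool) (*http.Request, []byte, error) {
	body := getItemBody{Table: table, Key: key}
	if strong {
		body.Consistency = "CONSISTENCY_STRONG"
	}
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/GetItem", bytes.NewReader(payload))
	if err != nil {
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json") // assumption, see lead-in
	return req, payload, nil
}

func main() {
	key := map[string]any{
		"user_id": map[string]any{"S": "alice"},
		"ts":      map[string]any{"N": "100"},
	}
	req, payload, err := newGetItemRequest("http://localhost:8080", "events", key, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(string(payload))
}
```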

From the Go client the choice is a per-call option:

```go
item, err := c.GetItem(ctx, "events", key, client.GetOptions{Strong: true})

rows, err := c.Scan(ctx, "events", client.ScanOptions{Strong: true, Limit: 1000})

iter := c.Query("events").Pk(types.S("alice")).Strong().Stream(ctx)
```

Writes do not take a consistency option — they always travel to the shard leader and are acknowledged after a quorum of voters commits the entry. A write that hits a follower returns `client.ErrNotLeader`; the client honours the leader hint published by the cluster surface and the `-raft-http-peers` redirect map.

See [Concepts: Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the server-side tuning knobs.
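The write-routing behaviour described above, where a follower rejects the write and the client retries against the hinted leader, follows a common pattern that can be sketched in isolation. This is not the client's actual implementation: `errNotLeader` stands in for `client.ErrNotLeader`, and the `notLeaderError` type, its `LeaderHint` field, and the hop limit are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotLeader is a local stand-in for client.ErrNotLeader.
var errNotLeader = errors.New("not leader")

// notLeaderError is a hypothetical error carrying a leader hint.
type notLeaderError struct{ LeaderHint string }

func (e *notLeaderError) Error() string { return "not leader; try " + e.LeaderHint }
func (e *notLeaderError) Unwrap() error { return errNotLeader }

// writeFunc performs one write attempt against a single endpoint.
type writeFunc func(endpoint string) error

// writeWithRedirect follows leader hints for at most maxHops redirects and
// returns the endpoint that finally accepted the write.
func writeWithRedirect(start string, maxHops int, write writeFunc) (string, error) {
	endpoint := start
	for hop := 0; hop <= maxHops; hop++ {
		err := write(endpoint)
		if err == nil {
			return endpoint, nil
		}
		var nl *notLeaderError
		if !errors.As(err, &nl) || nl.LeaderHint == "" {
			return endpoint, err // not a redirect, or no hint to follow
		}
		endpoint = nl.LeaderHint
	}
	return endpoint, fmt.Errorf("gave up after %d redirects: %w", maxHops, errNotLeader)
}

func main() {
	leader := "n2:9090" // simulated cluster state
	write := func(endpoint string) error {
		if endpoint != leader {
			return &notLeaderError{LeaderHint: leader}
		}
		return nil
	}
	served, err := writeWithRedirect("n1:9090", 3, write)
	fmt.Println(served, err)
}
```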


## Operations Overview

Rendered: https://docs.cefasdb.com/docs/Operations-Overview
Markdown: https://docs.cefasdb.com/wiki/Operations-Overview.md

# Operations Overview

Operating CefasDB means managing data directories, configuration, identity, backups, metrics, logs, traces, raft membership, benchmark evidence, and upgrades. The database is a single binary, but production safety comes from the surrounding discipline.

## Operational baseline

- Pin binary or image versions.
- Use persistent storage for the data directory.
- Enable authentication outside trusted development networks.
- Scrape metrics.
- Keep logs centralized.
- Test backup and restore.
- Test restart and failover behavior, and verify the [consistency model](Concepts-Storage-And-Replication#consistency-model) under leader loss.
- Document the active deployment mode.
- Keep benchmark results tied to reproducible commands and deployment shape.

## Data directory

The data directory contains the durable database state. In Docker and Kubernetes it must be mounted on persistent storage. In raft mode, raft state also needs stable storage.

## Config sources

CefasDB accepts flags, environment variables, and YAML config. The practical order is:

1. Use config files for stable environment-level settings.
2. Use environment variables for deployment-time injection.
3. Use flags for local development and explicit overrides.

## Incident triage

Start with:

```sh
cefas cluster status
cefas list-tables
cefas describe-table --table-name <table>
```

Then check:

- Server logs.
- Metrics endpoint.
- Disk capacity and I/O latency.
- Raft leader and peer health.
- Backup availability.
- Auth errors and token issuer/audience mismatch.

## Related pages

Read [Configuration](Operations-Configuration), [Backup And Restore](Operations-Backup-And-Restore), [Observability](Operations-Observability), [Benchmark Results](Operations-Benchmark-Results), and [Security And Privacy](Operations-Security-And-Privacy).


## Operations: Configuration

Rendered: https://docs.cefasdb.com/docs/Operations-Configuration
Markdown: https://docs.cefasdb.com/wiki/Operations-Configuration.md

# Operations: Configuration

Configuration controls storage paths, HTTP and gRPC listeners, raft, identity, metrics, tracing, and TLS.

## Common server flags

| Flag | Purpose |
| --- | --- |
| `-data` | Pebble data directory. |
| `-http` | HTTP listen address. |
| `-grpc` | gRPC listen address. |
| `-fsync` | Fsync on commit. |
| `-config` | YAML config file. |
| `-metrics-disabled` | Disable metrics endpoint. |
| `-tracing-endpoint` | OTLP/gRPC collector endpoint. |

## Storage profile flags

Storage profiles set Pebble defaults for local cache, memtable, compaction, L0, bytes-per-sync, and WAL bytes-per-sync behavior.

| Flag | Environment | Purpose |
| --- | --- | --- |
| `-storage-profile` | `CEFAS_STORAGE_PROFILE` | Select `default`, `balanced`, or `write-heavy`. |
| `-raft-storage-profile` | `CEFAS_RAFT_STORAGE_PROFILE` | Select the profile for the separate raft log/stable store. |
| `-storage-backpressure` | `CEFAS_STORAGE_BACKPRESSURE_ENABLED` | Enable LSM-metric based write backpressure. |
| `-storage-backpressure-reject-critical` | `CEFAS_STORAGE_BACKPRESSURE_REJECT_CRITICAL` | Reject writes when the storage pressure state is critical. |

The Docker Compose cluster uses `balanced` by default because it is stable under common Docker Desktop memory limits. For performance runs on machines with a larger Docker VM memory allocation, opt into `write-heavy`:

```sh
STORAGE_PROFILE=write-heavy \
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Use `balanced` for portable local development and repeatable CI-like checks. Use `write-heavy` for sustained ingest benchmarks when the host has enough memory for larger Pebble caches and memtables.

## Raft flags

| Flag | Purpose |
| --- | --- |
| `-raft-bind` | Raft TCP bind address. |
| `-raft-id` | Stable raft server ID. |
| `-raft-path` | Raft state path. |
| `-raft-bootstrap` | Bootstrap a new cluster. |
| `-raft-peers` | Comma-separated `id=addr` peers. |
| `-raft-http-peers` | Comma-separated peer HTTP URLs. |

## Multi-raft flags

| Flag | Purpose |
| --- | --- |
| `-shards` | Number of shards. |
| `-mux` | Shared mux transport address. |

## Identity flags

| Flag | Purpose |
| --- | --- |
| `-identity-jwks-url` | JWKS endpoint. |
| `-identity-issuer` | Expected issuer. |
| `-identity-audience` | Expected audience. |
| `-identity-clock-skew` | Allowed clock skew. |

## TLS flags

| Flag | Purpose |
| --- | --- |
| `-tls-cert` | gRPC TLS certificate. |
| `-tls-key` | gRPC TLS private key. |
| `-mtls-ca` | Client CA bundle for mTLS. |

## Consistency and durability tuning

CefasDB exposes the consistency surface to operators in three places: the per-call enum on `GetItem`/`Query`/`Scan` (clients control this — see [Interfaces HTTP and gRPC](Interfaces-HTTP-And-GRPC#consistency-options)), the durability lever, and the raft tunable block. This section covers the server-side knobs only.

### Durability

| Knob | Default | Effect |
| --- | --- | --- |
| `-fsync` | `false` | When `true`, every commit calls fsync on the WAL before acknowledging. Trades write throughput for durability of acknowledged commits across a crash. Helm chart value: `cluster.fsyncOnCommit`. |
| `-storage-bytes-per-sync` | `0` (profile-driven) | Pebble `BytesPerSync`: how many bytes are written to an SSTable between incremental syncs from the OS page cache to disk. Smaller values sync more often, which smooths write bursts and leaves less unsynced data in the page cache at the cost of more frequent disk I/O. |
| `-storage-wal-bytes-per-sync` | `0` (profile-driven) | Same idea, scoped to the WAL. |

### Raft timeouts (internal/replication)

The `Config` struct in `internal/replication/db.go` ships with conservative defaults. They are not exposed as CLI flags today; they are configured via the YAML `cluster` block when present, otherwise the defaults apply.

| Knob | Default | Effect |
| --- | --- | --- |
| `HeartbeatMS` | 1000 ms | How often the leader sends a heartbeat. Lower = faster failure detection, more CPU and network. |
| `ElectionMS` | 1000 ms | How long a follower waits without a heartbeat before starting an election. |
| `LeaderLeaseMS` | 500 ms | Lease validity used for leadership steady-state checks. |
| `CommitMS` | 50 ms | Maximum time the leader waits before applying a batch of committed entries. |
| `SnapshotEntries` | 8192 | Raft log entries between snapshots. Larger = fewer snapshot writes, longer log replay on restart. |

### Membership

Quorum size is `floor(N/2) + 1`, where `N` is the voter count. The CLI manages voter membership through:

```sh
cefas cluster add-voter    --id node-d --addr node-d:9091
cefas cluster remove-server --id node-c
cefas cluster status
```

Per-shard scoping for the same calls is available via `--shard-id <n>` or `--all-shards`.
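The quorum formula above also determines how many voters can be lost before writes stop. A short worked sketch, with hypothetical function names, makes the arithmetic concrete and shows why an even voter count adds no extra fault tolerance.

```go
package main

import "fmt"

// quorum returns floor(N/2) + 1 for N voters.
func quorum(voters int) int { return voters/2 + 1 }

// tolerated returns how many voters can fail while a quorum remains.
func tolerated(voters int) int { return voters - quorum(voters) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

With three voters the quorum is two and one failure is tolerated. Adding a fourth voter raises the quorum to three and still tolerates only one failure.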

### Storage profile

The `-storage-profile` selector (`default`, `balanced`, `write-heavy`) presizes block cache, memtable, and concurrent-compaction limits. For raft-backed clusters, set `-raft-storage-profile=raft` to keep raft metadata in a separate, smaller-cache Pebble instance — this isolates raft log churn from the table working set.

### Helm chart values

The chart at `dist/helm/cefas/` exposes the consistency-relevant defaults under `cluster`:

```yaml
replicaCount: 1

cluster:
  shards: 1
  bootstrap: true
  fsyncOnCommit: false
```

For a three-node cluster, set `replicaCount: 3`. Set `cluster.bootstrap: true` only on the first install; subsequent pods join the cluster via the standard StatefulSet identity. Set `cluster.fsyncOnCommit: true` when crash safety is more important than write throughput.

## Config advice

Keep local commands explicit. Keep production config declarative. Avoid relying on implicit defaults for data paths, identity, TLS, and raft membership.


## Operations: Backup And Restore

Rendered: https://docs.cefasdb.com/docs/Operations-Backup-And-Restore
Markdown: https://docs.cefasdb.com/wiki/Operations-Backup-And-Restore.md

# Operations: Backup And Restore

Backups are named recovery points. They let operators checkpoint one or more tables and restore a source table into a target table name.

## Create a backup

```sh
cefas create-backup \
  --backup-name nightly \
  --table-name Users
```

Multiple tables can be included when the command supports repeated table names.

## List backups

```sh
cefas list-backups
```

The response includes backup names and metadata needed to identify recovery points.

## Restore a table

```sh
cefas restore-table-from-backup \
  --backup-name nightly \
  --source-table-name Users \
  --target-table-name Users_restored
```

Restoring into a new target table is safer than overwriting a live table. Verify the restored data before cutting application traffic over.

## Runbook

1. Confirm the source table and backup name.
2. Restore into a new target table.
3. Describe the restored table.
4. Query known keys.
5. Run application-level verification.
6. Switch traffic or export only after verification.

## Backup policy

Choose backup frequency based on recovery point objective. Store backups on durable storage and test restore regularly. A backup that has never been restored is not a proven backup.

Backups capture a Pebble checkpoint of the node they ran on. In raft mode the safest place to take a scheduled backup is the leader, or a follower whose applied log is known to be caught up; a backup taken from a stale follower will be missing whatever entries had not yet been applied locally. The restore path replays the checkpoint into a new table, so restore-time consistency reflects the source checkpoint's position in the raft log — see [Concepts: Storage and Replication](Concepts-Storage-And-Replication#consistency-model).


## Operations: Observability

Rendered: https://docs.cefasdb.com/docs/Operations-Observability
Markdown: https://docs.cefasdb.com/wiki/Operations-Observability.md

# Operations: Observability

CefasDB exposes logs, metrics, and tracing hooks. Use them together: logs explain what happened, metrics show whether it is still happening, and traces show where time was spent.

## Logs

The server logs startup configuration, listener setup, storage errors, raft events, and request-level failures. In containers, collect stdout and stderr with the platform log collector.

## Metrics

Metrics are served on the HTTP listener unless disabled. Prometheus configuration examples live under:

```text
deploy/prometheus/
```

Grafana dashboard examples live under:

```text
deploy/grafana/
```

Track at least:

- Request rate.
- Request latency.
- Error rate.
- Storage write latency.
- Storage read latency.
- Raft leader and replication health.
- Backup and restore outcomes.
- Process memory and file descriptors.

## Tracing

Tracing is configured with an OTLP/gRPC endpoint. Use it when request latency must be broken down across API handlers, storage, raft, and plugin behavior.

## Debug workflow

1. Identify the failing operation.
2. Check logs for direct errors.
3. Check metrics for saturation or spikes.
4. Use tracing to isolate slow layers.
5. Use `cefas explain` for query-specific issues.
6. Use `cefas cluster status` for raft or membership issues.


## Operations: Benchmark Results

Rendered: https://docs.cefasdb.com/docs/Operations-Benchmark-Results
Markdown: https://docs.cefasdb.com/wiki/Operations-Benchmark-Results.md

# Operations: Benchmark Results

This page records reproducible local benchmark results for CefasDB. The goal is to separate functional smoke tests from sustained load evidence and to make every result traceable to a command, deployment mode, and workload shape.

## Test environment

The results below were captured on June 10, 2026 on a local development machine running Docker Desktop. The cluster tests used three CefasDB server containers in raft mode, each with its own persistent Docker volume.

Original container VM snapshot:

```text
Architecture: arm64
CPUs: 16
Memory: 7.653 GiB
Deployment: Docker Compose, 3 raft voters
Client path: local gRPC over localhost port mappings
```

The storage-profile comparison below also includes a later run on the same machine after Docker Desktop memory was raised to about 63.42 GiB.

These numbers are local benchmark results, not a hosted-service SLA. They are useful for validating the engine path, raft replication path, client behavior, and the shape of tail latency under controlled local pressure.

## Load tester

The benchmark client uses the Go gRPC client directly. It supports:

- Leader discovery across multiple gRPC endpoints.
- Batched writes through `BatchWriteItem`.
- Point reads through `GetItem`.
- Fixed-volume runs.
- Duration-based runs.
- Optional target rates for soak tests.
- Latency sampling for long-running workloads.
- JSON summary output.

The main flags used by these tests were:

```text
-addrs localhost:9191,localhost:9192,localhost:9193
-batch-size <items-per-write-rpc>
-workers <write-workers>
-read-workers <read-workers>
-payload-bytes <payload-size>
-write-duration <duration>
-read-duration <duration>
-write-rate <items-per-second>
-read-rate <reads-per-second>
-json-output <path>
```

## Single-node direct gRPC baseline

The first direct gRPC test was run against a single local CefasDB server. It used one table, 1,000,000 writes, 200,000 point reads, batch writes of 500 items, 64 write workers, and 64 read workers.

Result:

```text
write units:      1,000,000
write RPCs:       2,000
write elapsed:    4.93s
write throughput: 202,830 items/s
write errors:     0
write p50:        114.974ms
write p95:        405.899ms
write p99:        459.007ms

read units:       200,000
read elapsed:     2.026s
read throughput:  98,709 reads/s
read errors:      0
read p50:         611us
read p95:         1.044ms
read p99:         1.285ms
read found:       200,000/200,000
```

Conclusion: the direct gRPC path removes CLI subprocess overhead and measures the engine itself. Reads were especially strong, with a 611us p50 and a 1.285ms p99 in the baseline run.

## Three-node raft cluster bulk run

The three-node raft cluster test used Docker Compose with three raft voters. The load tester discovered the current leader and wrote through that node. The workload used 2,000,000 writes, 500,000 point reads, batch writes of 500 items, 64 write workers, 64 read workers, and a 256-byte payload attribute.

Result:

```text
write units:      2,000,000
write RPCs:       4,000
write elapsed:    21.55s
write throughput: 92,809 items/s
write errors:     0
write p50:        270.691ms
write p95:        705.782ms
write p99:        1.175971s

read units:       500,000
read elapsed:     7.822s
read throughput:  63,919 reads/s
read errors:      0
read p50:         907us
read p95:         1.513ms
read p99:         3.385ms
read found:       500,000/500,000
```

Follower read validation:

```text
n1 follower read throughput: 52,857 reads/s, found 100,000/100,000
n3 follower read throughput: 53,995 reads/s, found 100,000/100,000
```

Conclusion: with raft replication enabled, CefasDB sustained about 92,800 write items/s through the leader and still served point reads at a 3.385ms p99. The write p99 reflects batch RPC latency, not per-item latency; each write RPC in this run carried 500 items.

## Failover validation

The current leader was stopped, the remaining nodes elected a new leader, and the client then ran a smaller write/read workload against the surviving cluster.

Result:

```text
new leader:       n3
write units:      100,000
write elapsed:    412ms
write throughput: 242,772 items/s
write errors:     0
write p50:        29.483ms
write p95:        50.120ms
write p99:        52.528ms

read units:       10,000
read throughput:  48,242 reads/s
read errors:      0
read p50:         619us
read p95:         1.042ms
read p99:         1.335ms
read found:       10,000/10,000
```

Conclusion: the cluster accepted writes after leader loss and preserved read correctness for the tested keyspace.

## Controlled soak test

The controlled soak test ran for ten minutes total: five minutes of writes followed by five minutes of reads. It intentionally used target rates rather than maximum pressure so the run could verify sustained stability without turning the test into only disk saturation.

Workload:

```text
cluster:             3 raft voters
table:               ClusterSoak5m_20260610
write duration:      5m
read duration:       5m
write target rate:   10,000 items/s
read target rate:    20,000 reads/s
batch size:          250
write workers:       32
read workers:        64
payload bytes:       64
latency sample rate: 1 sample per 20 RPCs
```

Write result:

```text
write units:      3,000,000
write RPCs:       12,000
write elapsed:    300.001s
write throughput: 9,999.96 items/s
write errors:     0
write p50:        6.811ms
write p95:        10.687ms
write p99:        13.920ms
write max:        25.694ms
```

Read result:

```text
read units:       6,000,000
read RPCs:        6,000,000
read elapsed:     300.026s
read throughput:  19,998.26 reads/s
read errors:      0
read found:       6,000,000/6,000,000
read p50:         270us
read p95:         453us
read p99:         665us
read max:         6.807ms
```

Post-soak validation:

```text
read units:       10,000
read throughput:  9,991 reads/s
read errors:      0
read found:       10,000/10,000
read p50:         206us
read p95:         337us
read p99:         1.828ms
```

The JSON report for this run was written locally as:

```text
/tmp/cefas-bench/soak_5m_20260610.json
```

Conclusion: the controlled soak passed cleanly. CefasDB sustained the target write and read rates for the full duration, returned every sampled key, and kept read tail latency below 1 ms p99 during the read phase.

## Storage profile comparison

The storage profile tests were run after adding explicit Pebble tuning profiles, LSM backpressure, raft/data store separation, admin compaction, and load-test scripts. The workload used a three-node raft cluster, 64 write workers, 64 read workers, 500-item write batches, and a 256-byte payload attribute.

The two practical scenarios are:

| Scenario | Docker memory | Profile | Outcome |
| --- | ---: | --- | --- |
| Portable local default | 7.653 GiB | `balanced` | Completed full bulk and 30-minute soak. |
| Performance workstation | 63.42 GiB | `write-heavy` | Completed full bulk and 30-minute soak with better write latency. |

The failed case was also useful: with only 7.653 GiB available to Docker, `write-heavy` completed the bulk phase but the third raft node was OOM-killed during snapshot pressure in the soak phase. The error was environmental memory pressure, not a data correctness failure; the bulk phase had zero write/read errors and found all 500,000 keys.

### Balanced profile on Docker Desktop memory defaults

Command shape:

```sh
STORAGE_PROFILE=balanced \
PROJECT=cefas-loadtest-balanced-013 \
RESET_CLUSTER=1 \
RESULT_DIR=/tmp/cefas-bench/load-balanced-0.1.3-20260610T164320Z \
scripts/bench_cluster.sh
```

Bulk result:

```text
write units:      2,000,000
write elapsed:    9.528s
write throughput: 209,900 items/s
write errors:     0
write p50:        146.717ms
write p95:        207.670ms
write p99:        258.980ms

read units:       500,000
read elapsed:     7.627s
read throughput:  65,559 reads/s
read errors:      0
read p50:         934us
read p95:         1.482ms
read p99:         1.968ms
read found:       500,000/500,000
```

Soak result:

```text
write duration:   15m
write units:      13,500,000
write throughput: 14,999.97 items/s
write errors:     0
write p50:        17.125ms
write p95:        27.144ms
write p99:        41.636ms

read duration:    15m
read units:       18,000,000
read throughput:  19,996.97 reads/s
read errors:      0
read p50:         287us
read p95:         702us
read p99:         2.160ms
read found:       18,000,000/18,000,000
```

Final memory snapshot:

```text
n1:     528.9 MiB
n2:     1.621 GiB
n3:     1.141 GiB
leader: n2
```

Conclusion: `balanced` is the right portable default. It completed the sustained benchmark under the local Docker memory budget and kept read latency low.

### Write-heavy profile with a 64 GiB Docker VM

Command shape:

```sh
STORAGE_PROFILE=write-heavy \
PROJECT=cefas-loadtest-writeheavy-64g \
RESET_CLUSTER=1 \
RESULT_DIR=/tmp/cefas-bench/load-writeheavy-64g-0.1.3-20260610T180514Z \
scripts/bench_cluster.sh
```

Bulk result:

```text
write units:      2,000,000
write elapsed:    8.739s
write throughput: 228,855 items/s
write errors:     0
write p50:        134.849ms
write p95:        189.938ms
write p99:        231.759ms

read units:       500,000
read elapsed:     7.442s
read throughput:  67,183 reads/s
read errors:      0
read p50:         925us
read p95:         1.417ms
read p99:         1.731ms
read found:       500,000/500,000
```

Soak result:

```text
write duration:   15m
write units:      13,500,000
write throughput: 14,999.97 items/s
write errors:     0
write p50:        15.480ms
write p95:        20.760ms
write p99:        38.888ms

read duration:    15m
read units:       17,999,999
read throughput:  19,997.01 reads/s
read errors:      0
read p50:         273us
read p95:         900us
read p99:         1.364ms
read found:       17,999,999/17,999,999
```

Final memory snapshot:

```text
n1:     2.590 GiB
n2:     2.651 GiB
n3:     4.436 GiB
leader: n3
```

Conclusion: `write-heavy` is the faster ingest profile when Docker has enough memory. In this run it improved bulk write throughput and reduced sustained write p95 compared with `balanced`, while keeping all nodes alive.

### Operational choice

Use `balanced` when:

- The cluster must run on default Docker Desktop memory limits.
- The run is a developer smoke test, CI-like check, or portable reproduction.
- Stability matters more than maximum ingest throughput.

Use `write-heavy` when:

- Docker or the target host has enough memory for larger Pebble caches and memtables.
- The workload is sustained ingest or benchmark-oriented.
- Resource monitoring is available for memory, compaction, snapshots, and raft health.

## Overall conclusion

The current implementation is no longer just passing smoke tests. It has early evidence of a strong operational database core:

- Direct gRPC single-node writes exceeded 200,000 items/s in a short maximum-pressure run.
- A three-node raft cluster sustained more than 200,000 replicated write items/s in the latest bulk profile runs.
- A controlled raft soak sustained 15,000 writes/s and about 20,000 reads/s for thirty minutes total with zero client-visible errors.
- Point reads remained consistently low-latency in both bulk and soak tests.
- Follower reads returned the expected data after raft replication.
- The cluster accepted new writes after manual leader loss and re-election.

The strongest result is read stability: point reads remained sub-millisecond at both p50 and p99 in the ten-minute controlled soak (270us and 665us), and read p99 stayed under 2.2ms in the 30-minute profile soaks. The highest-pressure replicated write test also showed strong throughput, but its p99 must be interpreted as batch RPC latency because each write RPC carried 500 items.

The next benchmark milestones are:

- One-hour and eight-hour soak runs.
- Mixed read/write workloads running at the same time.
- Runs with `-fsync` enabled.
- Larger payload sizes and larger table cardinality.
- Compaction and snapshot pressure tracking.
- Remote-client tests instead of loopback-only clients.
- Resource charts for CPU, memory, disk writes, and raft replication lag.


## Operations: Security And Privacy

Rendered: https://docs.cefasdb.com/docs/Operations-Security-And-Privacy
Markdown: https://docs.cefasdb.com/wiki/Operations-Security-And-Privacy.md

# Operations: Security And Privacy

Security in CefasDB has three layers: transport, identity, and operation authorization. Privacy-sensitive audience workflows add a fourth layer: avoid exporting raw member identity when aggregate answers are enough.

## Transport

Use TLS for production gRPC. Plaintext `--insecure` is for local development and trusted test networks only.

## Identity

Configure JWKS, issuer, and audience so requests must present a valid bearer token. Keep clock skew small and monitor auth failures after identity-provider changes.

## Authorization

Use scoped tokens. Separate table read, table write, table admin, plugin, backup, and cluster operations. Do not give broad admin tokens to application services.

## Audience privacy guarantees

The audience workflow is designed around server-side selection and aggregate reporting:

- There is no general `list-audience-members` reporting surface.
- Approximate reach uses HyperLogLog.
- Dedup and frequency cap return boolean verdicts, not stored keys or counters.
- Aggregation supports `--min-group-size` to avoid small cohort disclosure.
- Plugin index state stays server-side.
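The `--min-group-size` control in the list above amounts to a suppression threshold on aggregate output. The sketch below illustrates the idea with a hypothetical `groupCount` type. Whether CefasDB counts distinct members or rows, and whether the threshold is inclusive, are assumptions made for this example.

```go
package main

import "fmt"

// groupCount is a hypothetical aggregate row for one group.
type groupCount struct {
	Key         string
	Members     int // distinct members contributing to the group
	Impressions int
}

// suppressSmallGroups drops every group with fewer than minGroupSize
// members, so small cohorts never appear in the report.
func suppressSmallGroups(groups []groupCount, minGroupSize int) []groupCount {
	kept := make([]groupCount, 0, len(groups))
	for _, g := range groups {
		if g.Members >= minGroupSize {
			kept = append(kept, g)
		}
	}
	return kept
}

func main() {
	groups := []groupCount{
		{Key: "campaign-1", Members: 250, Impressions: 1200},
		{Key: "campaign-2", Members: 40, Impressions: 95},
		{Key: "campaign-3", Members: 100, Impressions: 310},
	}
	for _, g := range suppressSmallGroups(groups, 100) {
		fmt.Printf("%s members=%d impressions=%d\n", g.Key, g.Members, g.Impressions)
	}
}
```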

## Threat model

CefasDB reduces accidental raw identity export from audience workflows. It does not provide formal differential privacy, does not eliminate all timing side channels, and cannot prevent linkage attacks against external datasets. Treat privacy floors as a minimum operational control, not a mathematical guarantee.

## Practical checklist

- Enable TLS.
- Enable bearer-token validation.
- Use least-privilege scopes.
- Keep admin operations off public networks.
- Set `--min-group-size` for audience reporting.
- Avoid building raw export commands for cohorts.
- Audit logs around backup, restore, and admin actions.


## API Reference

Rendered index: https://docs.cefasdb.com/docs/api
Machine index: https://docs.cefasdb.com/api/index.json

- [pkg/client](https://docs.cefasdb.com/docs/api/pkg-client)
- [pkg/plugin](https://docs.cefasdb.com/docs/api/pkg-plugin)
- [pkg/plugin/distancecontract](https://docs.cefasdb.com/docs/api/pkg-plugin-distancecontract)
- [pkg/plugin/testharness](https://docs.cefasdb.com/docs/api/pkg-plugin-testharness)
- [pkg/protocol](https://docs.cefasdb.com/docs/api/pkg-protocol)
- [pkg/types](https://docs.cefasdb.com/docs/api/pkg-types)