# CefasDB Documentation Full Context

Canonical docs site: https://docs.cefasdb.com
Source of truth: https://github.com/CefasDB/cefasdb-docs/wiki

This file is generated from the GitHub Wiki markdown copied during the documentation build. Use canonical rendered links when citing user-facing docs.

## CefasDB

Rendered: https://docs.cefasdb.com/docs/Home
Markdown: https://docs.cefasdb.com/wiki/Home.md

# CefasDB

CefasDB is a high-performance NoSQL key-value and document database written in Go. It is designed for predictable millisecond-class access, horizontal scale, and a small operational footprint while giving teams direct control over deployment, storage, replication, and extensions.

The engine combines primary-key access, typed document attributes, SQL and PartiQL-style querying, geospatial indexes, similarity search, plugin-backed indexes, raft replication, backup and restore, and a CLI built around table, item, query, plugin, and cluster operations.

The project is intentionally split into a small core and a broad plugin surface. The core owns tables, items, conditional writes, TTL, streams, secondary index lifecycle, query planning, storage, replication, and API transport. Plugins own specialized search and ads-workflow behavior: bloom filters, trigram text search, vector LSH, geohash, HyperLogLog, Count-Min Sketch, distance operators, deduplication, frequency caps, and privacy-aware aggregation.

## Read by section

The wiki is organized into five sections. Each section starts with an overview page and then moves into focused pages that can be read independently.

The [Get Started](Get-Started-Overview) section shows how to build and run CefasDB locally, in Docker, with Docker Compose, and on Kubernetes. Start there if you want a server process running before reading internals.

The [Concepts and Architecture](Concepts-Overview) section explains the data model, storage layout, query planner, indexing model, replication path, deployment modes, and security model.
Read it before deciding how CefasDB should fit into an application or platform.

The [Plugins](Plugins-Overview) section documents the plugin boundary and every built-in plugin family. It covers how to write a plugin, how the import graph is enforced, how index plugins are configured, and how audience workflows are composed.

The [Interfaces](Interfaces-Overview) section covers the CLI, the SQL and PartiQL surface, the HTTP and gRPC APIs, and the Go client package. Read it when integrating CefasDB from another process.

The [Operations](Operations-Overview) section covers configuration, backup and restore, observability, benchmark results, security, privacy, and production runbooks.

## Quick paths

| Intent | Start here |
| --- | --- |
| I want to run CefasDB on my laptop | [Run Locally](Get-Started-Run-Locally) |
| I want a one-container server | [Run In Docker](Get-Started-Run-In-Docker) |
| I want a local multi-node cluster | [Run With Docker Compose](Get-Started-Run-With-Docker-Compose) |
| I want Kubernetes manifests | [Run In Kubernetes](Get-Started-Run-In-Kubernetes) |
| I want the architecture in one pass | [Architecture Overview](Concepts-Architecture-Overview) |
| I want to model data | [Data Model](Concepts-Data-Model) |
| I want query and index behavior | [Query And Indexes](Concepts-Query-And-Indexes) |
| I want to author a plugin | [Plugin Authoring](Plugins-Authoring) |
| I want CLI commands | [CLI](Interfaces-CLI) |
| I want backup and restore | [Backup And Restore](Operations-Backup-And-Restore) |
| I want benchmark evidence | [Benchmark Results](Operations-Benchmark-Results) |
| I want ads audience privacy details | [Security And Privacy](Operations-Security-And-Privacy) |

## Source code and issues

The source code lives at [github.com/CefasDB/cefasdb-core](https://github.com/CefasDB/cefasdb-core). Issues and pull requests are tracked there.
The repository intentionally does not keep long-form documentation in `docs/`; the GitHub Wiki is the canonical documentation surface.

## Get Started: Overview

Rendered: https://docs.cefasdb.com/docs/Get-Started-Overview
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Overview.md

# Get Started: Overview

This section is for readers who want CefasDB running before they study internals. By the end of the section you can start a single node, create a table, write an item, query it, run the server in Docker, run a small replicated topology with Docker Compose, and understand how the Helm chart maps those same concepts onto Kubernetes.

CefasDB ships as two Go binaries:

- `cefasdb`, the database process.
- `cefas`, the CLI that talks to the server over gRPC and exposes table, item, query, plugin, and cluster operations.

The topology changes how CefasDB is operated, not how the data model works. A local process, a Docker container, a three-node Compose cluster, and a Kubernetes StatefulSet all expose the same table, item, query, plugin, backup, and cluster concepts.

## How to install

Three install paths are supported. The server image and the npm CLI package deliver the same binaries that a source build produces.

Pull the server image:

```sh
docker pull ghcr.io/cefasdb/cefasdb:latest
```

Install the CLI from npm:

```sh
npm install -g @cefasdb/cefas
```

Build both binaries from source. Go 1.25+ is required; the Makefile drives the build:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make build   # produces ./bin/cefasdb and ./bin/cefas
```

The server source lives at `cmd/cefasdb` and the CLI source at `cmd/cefasctl` (the built CLI binary is still named `cefas`). `make help` lists every developer target.

## What you will know after this section

You will know which ports matter. The HTTP API defaults to `:8080`. The gRPC API is enabled with `-grpc`, commonly `:9090`. Raft is enabled with `-raft-bind`, and multi-raft deployments use a shared mux listener with `-mux`. Metrics are served on the HTTP listener unless disabled.

You will know the first table workflow.
A table has a partition key and an optional sort key. Items are typed attribute maps, so documents can evolve without a migration for every new field.

You will know where data lives. In single-node mode CefasDB stores table metadata, items, secondary indexes, TTL state, backups, and raft state under the configured data directory. Docker and Kubernetes deployments must mount that directory on persistent storage if data should survive process or node restarts.

You will know which deployment mode fits the current task. Local binary is best for development. Single Docker is best for demos and local integration. Docker Compose is the smallest useful replicated lab. Kubernetes is the production-oriented path when you need managed restarts, persistent volumes, service discovery, and operational policy.

You will know which consistency guarantee each deployment mode gives. Single-node and single-container Docker offer local durability only. Raft, multi-raft, and Kubernetes-backed deployments commit to a quorum and let clients opt into strong reads per call. See the per-mode comparison in [Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees).

## Reading order

Read [Run Locally](Get-Started-Run-Locally) first. It introduces the data model and CLI without Docker or raft.

Read [Run In Docker](Get-Started-Run-In-Docker) next if you need a container image. It focuses on volume mounts and server flags.

Read [Run With Docker Compose](Get-Started-Run-With-Docker-Compose) when you want raft behavior and leader failover on a laptop.

Read [Run In Kubernetes](Get-Started-Run-In-Kubernetes) when you need the Helm chart and production-shaped configuration.

## Prerequisites

For local development you need a Unix-like shell, Go 1.25 or newer, and optionally `jq` for readable JSON examples. For Docker you need a working Docker daemon. For Docker Compose you need Docker Compose v2 and enough disk for multiple data directories.
For Kubernetes you need `kubectl`, Helm 3, and a cluster that can provision persistent volumes.

## Get Started: Run Locally

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-Locally
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-Locally.md

# Get Started: Run Locally

This page starts a single CefasDB node on a laptop and exercises the table and item lifecycle. It is the best first run because it keeps every moving part visible: one process, one data directory, one HTTP listener, and one gRPC listener.

## Build the binaries

Pick either path. The Makefile is the canonical entry point and produces both binaries under `./bin`:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make build   # ./bin/cefasdb (server) + ./bin/cefas (CLI)
```

Or install only the CLI from npm and run the server from the public image (see [Run In Docker](Get-Started-Run-In-Docker)):

```sh
npm install -g @cefasdb/cefas
```

If you installed the CLI from npm, replace `./bin/cefas` with `cefas` in the examples below.

The server entry point is `cmd/cefasdb/main.go`. The CLI entry point is `cmd/cefasctl/main.go`, and the root command is registered in `cmd/cefasctl/cmd/root.go`.

## Start a single node

```sh
rm -rf ./cefas-data   # start from an empty data directory
./bin/cefasdb \
  -data ./cefas-data \
  -http :8080 \
  -grpc :9090
```

The server creates the data directory if it does not exist. HTTP requests go to `localhost:8080`. CLI commands use gRPC, so the examples below point the CLI at `127.0.0.1:9090` and use `--insecure` to mark the connection as plaintext.

## Create a table

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure create-table \
  --table-name Users \
  --attribute-definitions AttributeName=pk,AttributeType=S \
  --attribute-definitions AttributeName=sk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH \
  --key-schema AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```

CefasDB accepts `--billing-mode` as an ignored compatibility flag for existing command scripts. The engine is self-hosted and does not bill by request.
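The key schema passed to `create-table` declares which attributes form the primary key and which type each one has. As a minimal sketch, assuming a simplified string-only item representation, the Go code below shows the check this implies for every written item. `AttrDef`, `KeySchema`, `Item`, and `validateKey` are hypothetical names for illustration, not CefasDB APIs.

```go
package main

import "fmt"

// AttrDef mirrors one --attribute-definitions entry (hypothetical type).
type AttrDef struct {
	Name string
	Type string // "S", "N", or "B"
}

// KeySchema mirrors the --key-schema entries: HASH is the partition key,
// RANGE is the optional sort key (empty when the table has none).
type KeySchema struct {
	PartitionKey string
	SortKey      string
}

// Item is a simplified typed attribute map: name -> {type tag: value}.
type Item map[string]map[string]string

// validateKey checks that every key attribute is declared and that the item
// carries it with exactly the declared type tag.
func validateKey(defs []AttrDef, ks KeySchema, item Item) error {
	declared := make(map[string]string, len(defs))
	for _, d := range defs {
		declared[d.Name] = d.Type
	}
	keys := []string{ks.PartitionKey}
	if ks.SortKey != "" {
		keys = append(keys, ks.SortKey)
	}
	for _, k := range keys {
		want, ok := declared[k]
		if !ok {
			return fmt.Errorf("key attribute %q has no attribute definition", k)
		}
		val, ok := item[k]
		if !ok {
			return fmt.Errorf("item is missing key attribute %q", k)
		}
		if _, ok := val[want]; !ok || len(val) != 1 {
			return fmt.Errorf("key attribute %q must be of type %s", k, want)
		}
	}
	return nil
}

func main() {
	defs := []AttrDef{{Name: "pk", Type: "S"}, {Name: "sk", Type: "S"}}
	ks := KeySchema{PartitionKey: "pk", SortKey: "sk"}

	good := Item{"pk": {"S": "USER#1"}, "sk": {"S": "PROFILE"}, "name": {"S": "Ova"}}
	bad := Item{"pk": {"N": "1"}, "sk": {"S": "PROFILE"}}

	fmt.Println(validateKey(defs, ks, good)) // <nil>
	fmt.Println(validateKey(defs, ks, bad))  // key attribute "pk" must be of type S
}
```

Non-key attributes such as `name` are not checked here, which reflects the schemaless document side of the model: only the key attributes are constrained by the table definition.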
## Write and read an item

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure put-item \
  --table-name Users \
  --item '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"},"name":{"S":"Ova"},"city":{"S":"Santos"}}'

./bin/cefas --endpoint 127.0.0.1:9090 --insecure get-item \
  --table-name Users \
  --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}'
```

The wire shape is a typed attribute map. Strings are `{ "S": "..." }`, numbers are `{ "N": "..." }`, maps are `{ "M": { ... } }`, and lists are `{ "L": [ ... ] }`.

## Query and scan

Use `query` when you know the partition key or have a predicate the planner can route. Use `scan` when you deliberately want to stream the table.

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure query \
  --table-name Users \
  --where "pk = 'USER#1'"

./bin/cefas --endpoint 127.0.0.1:9090 --insecure scan \
  --table-name Users
```

## Stop and restart

Stop the server with `Ctrl-C`, then restart it with the same `-data` path. The table and item still exist because they were persisted under `./cefas-data`.

```sh
./bin/cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

If you delete `./cefas-data`, you reset the local node.

## A note on durability and consistency

Single-node mode has no raft and no replicas. Writes go through Pebble's WAL on the local disk. The `--consistency` flag on `get-item`, `query`, and `scan` is still accepted, but `STRONG` and `EVENTUAL` both read the same local store, so the answer is identical.

The one knob that matters here is `-fsync`. With the default `false`, the WAL flushes asynchronously and a process crash can lose a few in-flight writes. Add `-fsync` to the start command if you want every acknowledged write to be on disk before the response returns:

```sh
./bin/cefasdb -data ./cefas-data -http :8080 -grpc :9090 -fsync
```

For replication and the per-call consistency knob, move to [Run With Docker Compose](Get-Started-Run-With-Docker-Compose#consistency-model) or [Run In Kubernetes](Get-Started-Run-In-Kubernetes).
## Next steps

Read [Interfaces CLI](Interfaces-CLI) for the full command list. Read [Data Model](Concepts-Data-Model) before designing a schema. Read [Query And Indexes](Concepts-Query-And-Indexes) before adding secondary, spatial, or plugin-backed indexes.

## Get Started: Run In Docker

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-In-Docker
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-In-Docker.md

# Get Started: Run In Docker

The Docker path runs the `cefasdb` binary inside a minimal image. Use it when you want a disposable integration target or when a service needs a local database dependency without building Go locally.

## Pull the public image

The server image is published to GHCR per release with the tags ``, `v`, and `latest`.

```sh
docker pull ghcr.io/cefasdb/cefasdb:latest
```

Pin to a specific release for production:

```sh
docker pull ghcr.io/cefasdb/cefasdb:0.8.5
```

## Build the image locally (alternative)

If you need a custom build, the repository keeps its Dockerfile at `deploy/Dockerfile`:

```sh
docker build -f deploy/Dockerfile -t cefasdb:local .
```

The build stage compiles `./cmd/cefasdb`. The runtime image runs as a non-root user and exposes HTTP and gRPC ports.

## Start a container

```sh
docker volume create cefas-data

docker run --rm --name cefasdb \
  -p 8080:8080 \
  -p 9090:9090 \
  -v cefas-data:/var/lib/cefasdb \
  ghcr.io/cefasdb/cefasdb:latest \
  -data /var/lib/cefasdb \
  -http :8080 \
  -grpc :9090 \
  -grpc-reflection
```

The important part is the volume mount. Without it, the data directory lives in the container's writable layer and is deleted when the container exits, because `--rm` removes the container.
## Connect with the CLI

Install the CLI from npm on the host:

```sh
npm install -g @cefasdb/cefas
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

Or, if you prefer not to use npm, build the CLI from source:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
make cli   # produces ./bin/cefas
./bin/cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

## Health and logs

The server logs to stdout and stderr. In Docker, read them with the container name passed to `--name`:

```sh
docker logs -f cefasdb
```

Metrics are exposed on the HTTP listener unless disabled. If Prometheus is scraping the container directly, scrape the mapped HTTP port.

## When to use this mode

Single-container Docker is suitable for demos, local development, integration tests, and small non-critical environments. It is not highly available and it has no replication: if the host or the volume is lost, the data is lost, and the per-call `CONSISTENCY_STRONG` setting behaves exactly like `CONSISTENCY_EVENTUAL` because there is only one node to read from. For replicated behavior and meaningful strong reads, read [Run With Docker Compose](Get-Started-Run-With-Docker-Compose#consistency-model) or [Run In Kubernetes](Get-Started-Run-In-Kubernetes).

## Get Started: Run With Docker Compose

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-With-Docker-Compose
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-With-Docker-Compose.md

# Get Started: Run With Docker Compose

Docker Compose is the smallest useful way to observe a replicated CefasDB deployment. It lets you run multiple `cefasdb` processes, persistent volumes, raft listeners, and a client-facing endpoint on a single machine.

## Start from the repository compose file

The repository keeps a single-node observability Compose template at `deploy/docker-compose.yml`.
```sh
docker compose -f deploy/docker-compose.yml up --build
```

For the three-node raft cluster used by local load tests, use `deploy/docker-compose.cluster.yml`:

```sh
# Tear down any previous run, including its named volumes, then start fresh.
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml down -v
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Default host endpoints:

```text
n1  HTTP: localhost:18081  gRPC: localhost:9191
n2  HTTP: localhost:18082  gRPC: localhost:9192
n3  HTTP: localhost:18083  gRPC: localhost:9193
```

The exact service names may change as the Compose file evolves, but the topology is stable: each node needs its own data directory, HTTP listener, gRPC listener, raft identity, raft bind address, and peer list.

## Storage profiles for Compose

The cluster Compose file supports storage profiles through the `STORAGE_PROFILE` environment variable. Use the default `balanced` profile when Docker Desktop has its usual local memory limit, or when the goal is a stable developer cluster:

```sh
docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Use `write-heavy` when the Docker VM has enough memory allocated for larger Pebble caches and memtables:

```sh
STORAGE_PROFILE=write-heavy \
  docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

On a local Docker Desktop VM with about 7.65 GiB available, `write-heavy` can run the bulk test but may run a follower out of memory (OOM) during raft snapshot pressure. With Docker Desktop increased to about 64 GiB, the same `write-heavy` workload completed the full 30-minute benchmark with all three nodes alive.

Keep `balanced` as the portable default. Treat `write-heavy` as an opt-in performance profile for larger local machines or production-like test hosts.
Host ports can be changed without editing the file:

```sh
CEFAS_NODE1_GRPC_PORT=29491 \
CEFAS_NODE2_GRPC_PORT=29492 \
CEFAS_NODE3_GRPC_PORT=29493 \
  docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

## What Compose demonstrates

Compose demonstrates three production concerns that a single process cannot:

1. **Membership**: every node has a stable raft ID.
2. **Replication**: writes go through raft before they are acknowledged.
3. **Failure behavior**: if the leader exits, a remaining quorum can elect a new leader.

The same storage engine applies underneath raft. Raft changes when a write is acknowledged; it does not change table or item semantics.

## Client behavior

The CLI talks to the gRPC listener. If a node is not the leader for a write, the server returns `ErrNotLeader` plus the leader hint published by the cluster surface, and the client retries against the leader. With the default Compose ports, n1's gRPC listener is `127.0.0.1:9191`:

```sh
cefas --endpoint 127.0.0.1:9191 --insecure cluster status
cefas --endpoint 127.0.0.1:9191 --insecure list-tables
```

## Consistency model

With three voters, the cluster commits a write once two of them (the leader counts as one) have the entry in their raft log (quorum = `floor(3/2)+1` = 2). The third node catches up via `AppendEntries`. The CLI does not wait on the slowest follower.

Reads default to eventual:

```sh
cefas --endpoint 127.0.0.1:9192 --insecure get-item \
  --table-name Users --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}'
```

If the node behind that port is a follower, the read above may return data that is a few heartbeats behind the leader. For read-after-write correctness, opt into a strong read on that one call. The request is then routed to the leader and passes a raft barrier first:

```sh
cefas --endpoint 127.0.0.1:9192 --insecure get-item \
  --table-name Users --key '{"pk":{"S":"USER#1"},"sk":{"S":"PROFILE"}}' \
  --consistency strong
```

When the leader is killed (see the Failover exercise below), the cluster pauses writes for at most a few hundred milliseconds while a new leader is elected.
Reads keep returning eventual data from the survivors during the gap. Once two voters can talk to each other, writes resume.

If you stop a second container while the first is still down, quorum is gone (only one voter alive). Writes start failing fast with `ErrNotLeader` or quorum-loss errors; eventual reads keep working off the lone survivor.

See [Concepts: Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees) for the per-mode comparison table and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the tuning knobs.

## Failover exercise

1. Create a table and write an item.
2. Stop the leader container.
3. Wait for a new leader.
4. Read the item from another node.
5. Restart the old leader and verify it catches up.

The point of the exercise is not just uptime. It proves the storage layer applies replicated batches through the raft FSM and that committed writes survive a process failure.

## When to move beyond Compose

Compose is a lab, not an orchestrator. Use it to understand the topology and reproduce bugs. Use Kubernetes or another scheduler when you need node placement, volume lifecycle, service discovery, restart policy, and operational controls.

## Get Started: Run In Kubernetes

Rendered: https://docs.cefasdb.com/docs/Get-Started-Run-In-Kubernetes
Markdown: https://docs.cefasdb.com/wiki/Get-Started-Run-In-Kubernetes.md

# Get Started: Run In Kubernetes

The Kubernetes path packages CefasDB as a StatefulSet-oriented deployment with stable identity and persistent storage. Use it when you want CefasDB managed by the same control plane that manages the services using it.

## Chart location

The Helm chart lives under:

```text
dist/helm/cefas/
```

Important files:

| File | Purpose |
| --- | --- |
| `Chart.yaml` | Chart metadata. |
| `values.yaml` | Default values for image, ports, storage, raft, and config. |
| `templates/statefulset.yaml` | Server pods, volumes, and command flags. |
| `templates/service.yaml` | Network identity for clients and peers. |
| `templates/configmap.yaml` | Runtime configuration. |

## Install

Clone the core repository and install the chart from its bundled path:

```sh
git clone https://github.com/CefasDB/cefasdb-core
cd cefasdb-core
helm upgrade --install cefas ./dist/helm/cefas \
  --namespace cefas \
  --create-namespace
```

The chart `values.yaml` already sets `image.repository: ghcr.io/cefasdb/cefasdb`. For a local cluster such as `kind` or `minikube`, make sure a default StorageClass exists. For production, set storage class, requested size, resource limits, and image tag explicitly in a values file.

## Connect

Install the CLI on your workstation:

```sh
npm install -g @cefasdb/cefas
```

Port-forward the gRPC service for a quick test. `kubectl port-forward` keeps running in the foreground, so run the CLI from a second terminal:

```sh
# Terminal 1: forward local port 9090 to port 9090 of the cefas Service.
kubectl -n cefas port-forward svc/cefas 9090:9090

# Terminal 2:
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

In a real deployment, services inside the cluster should connect through the Kubernetes Service DNS name instead of port-forwarding.

## Storage

Kubernetes must provide persistent volumes for the data directory. Treat the volume as the durable database state for the pod. If a pod is rescheduled with its PVC intact, CefasDB can reopen the local Pebble store and raft state.

A pod restart on its existing PVC is safe. Losing the PVC is equivalent to losing a voter from the raft cluster; the remaining replicas keep serving as long as a quorum is healthy. Writes pause when quorum is lost. See the per-mode breakdown in [Deployment Modes](Concepts-Deployment-Modes#consistency-guarantees) and the operational knob inventory in [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning).

## Operational checklist

- Pin the image tag. Do not run production on `latest`.
- Set resource requests and limits based on workload.
- Use persistent volumes with enough IOPS for write-heavy workloads.
- Configure identity and TLS before exposing the gRPC listener beyond a trusted network.
- Scrape metrics from the HTTP listener.
- Test backup and restore before storing production data.

## Where to go next

Read [Deployment Modes](Concepts-Deployment-Modes) for topology tradeoffs and [Operations Overview](Operations-Overview) for production runbooks.

## Concepts Overview

Rendered: https://docs.cefasdb.com/docs/Concepts-Overview
Markdown: https://docs.cefasdb.com/wiki/Concepts-Overview.md

# Concepts Overview

CefasDB is a high-performance NoSQL key-value and document database server. It accepts table, item, query, plugin, backup, and cluster operations over HTTP and gRPC; persists them in an embedded Pebble LSM tree; optionally replicates writes through raft; and exposes specialized query behavior through a plugin registry.

In one sentence: CefasDB stores typed document items in partition-keyed tables, adds SQL and PartiQL query surfaces, and lets plugins supply indexes, distance functions, estimators, and audience workflows without coupling those plugins to engine internals.

## What this section covers

[Architecture Overview](Concepts-Architecture-Overview) shows the full request path: CLI or SDK, gRPC handler, catalog, storage, planner, plugin registry, raft, metrics, and tracing.

[Data Model](Concepts-Data-Model) explains tables, partition keys, sort keys, attribute maps, conditions, TTL, backups, and streams.

[Storage And Replication](Concepts-Storage-And-Replication) explains how Pebble stores catalog and item keys, how write batches are committed, how raft wraps writes, and how multi-shard deployments distribute ownership.

[Query And Indexes](Concepts-Query-And-Indexes) explains built-in secondary indexes, spatial indexes, plugin-backed indexes, distance operators, candidate sets, top-k search, explain plans, and query planning.
[Deployment Modes](Concepts-Deployment-Modes) compares local, Docker, Compose, raft, multi-raft, and Kubernetes topologies.

[Authentication And Authorization](Concepts-Authentication-And-Authorization) covers bearer-token validation, identity provider configuration, and per-operation scopes.

## What CefasDB is not

CefasDB is not a broker. It does not model topics, consumer groups, or offsets. It is not an analytics warehouse. It is designed for operational workloads where an application needs low-latency primary-key access, conditional mutation, secondary lookup, spatial matching, or similarity search close to the write path.

CefasDB is also not a plugin marketplace. Built-in plugins compile into the server. The plugin boundary exists to keep specialized logic out of the storage engine, not to load arbitrary untrusted code at runtime.

## The core vocabulary

**Table** means a named collection of items with a key schema.

**Item** means a map of attribute names to typed attribute values.

**Partition key** means the required key component used for identity and distribution.

**Sort key** means the optional second key component used for ordered ranges within a partition.

**Index** means either a built-in GSI/LSI/spatial index or a plugin-backed index descriptor.

**Plugin** means a Go implementation of an index, distance, estimator, or audience interface registered in-process.

**Raft** means the optional consensus layer used to replicate write batches.

**Shard** means a partition of the keyspace owned by a manager and optionally replicated by its own raft group.

## Concepts: Architecture Overview

Rendered: https://docs.cefasdb.com/docs/Concepts-Architecture-Overview
Markdown: https://docs.cefasdb.com/wiki/Concepts-Architecture-Overview.md

# Concepts: Architecture Overview

CefasDB is structured as a set of narrow layers. The transport layer accepts requests. The catalog describes tables. The storage layer commits items and indexes. The query planner chooses operators.
The plugin registry supplies specialized behavior. Raft and multi-raft wrap the write path when replication is enabled.

```mermaid
flowchart LR
  subgraph Clients
    CLI[cefas CLI]
    SDK[Go client]
    HTTP[HTTP clients]
  end
  subgraph Server[cefasdb]
    API[pkg/api gRPC and HTTP]
    Catalog[internal/catalog]
    Storage[internal/storage Pebble]
    Planner[internal/core/query]
    Registry[pkg/plugin registry]
    Raft[internal/replication]
    Cluster[internal/cluster]
    Metrics[internal/metrics]
    Tracing[internal/tracing]
  end
  subgraph Plugins
    Index[Index plugins]
    Distance[Distance plugins]
    Estimator[Estimator plugins]
    Audience[Audience plugin]
  end
  CLI --> SDK --> API
  HTTP --> API
  API --> Catalog
  API --> Storage
  API --> Planner
  API --> Registry
  Storage --> Raft
  Storage --> Cluster
  Planner --> Registry
  Registry --> Index
  Registry --> Distance
  Registry --> Estimator
  Registry --> Audience
  API --> Metrics
  API --> Tracing
```

## Write lifecycle

```mermaid
sequenceDiagram
  participant Client
  participant API as gRPC/HTTP handler
  participant Catalog
  participant Storage
  participant Raft
  participant Plugins
  Client->>API: PutItem / UpdateItem / DeleteItem
  API->>Catalog: load table descriptor
  API->>Storage: validate key, condition, and mutation
  Storage->>Raft: replicate batch when raft is attached
  Raft-->>Storage: majority-applied batch
  Storage->>Plugins: update index hooks
  API-->>Client: response
```

The important invariant is that the table mutation and the built-in index mutation are part of the same storage batch. A committed write cannot update the primary record without updating the built-in indexes that describe it.
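To illustrate the single-batch invariant, here is a minimal in-memory sketch in Go. It is not Pebble and not CefasDB's storage code; `Batch`, `Store`, `PutUser`, and the `item:` / `idx:city:` key prefixes are hypothetical. It shows why writing the primary record and its index entry in one atomically applied batch keeps them consistent, including removal of the stale index entry when an indexed attribute changes.

```go
package main

import (
	"fmt"
	"sync"
)

// op is one mutation inside a batch: a set, or a delete when del is true.
type op struct {
	key, val string
	del      bool
}

// Batch collects mutations that must become visible together.
type Batch struct{ ops []op }

func (b *Batch) Set(k, v string) { b.ops = append(b.ops, op{key: k, val: v}) }
func (b *Batch) Delete(k string) { b.ops = append(b.ops, op{key: k, del: true}) }

// Store is a toy in-memory key-value store. Commit applies a whole batch
// under one write lock, so a reader never sees a primary record without the
// index entry written alongside it.
type Store struct {
	writers sync.Mutex   // serializes read-modify-write writers
	mu      sync.RWMutex // guards kv
	kv      map[string]string
}

func NewStore() *Store { return &Store{kv: map[string]string{}} }

func (s *Store) Commit(b *Batch) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, o := range b.ops {
		if o.del {
			delete(s.kv, o.key)
		} else {
			s.kv[o.key] = o.val
		}
	}
}

func (s *Store) Get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.kv[k]
	return v, ok
}

// PutUser writes the primary record and a hypothetical "by city" index entry
// in one batch, deleting the stale index entry if the city changed.
// The key prefixes are illustrative, not CefasDB's on-disk layout.
func PutUser(s *Store, pk, city string) {
	s.writers.Lock()
	defer s.writers.Unlock()

	b := &Batch{}
	if old, ok := s.Get("item:" + pk); ok && old != city {
		b.Delete("idx:city:" + old + ":" + pk)
	}
	b.Set("item:"+pk, city)
	b.Set("idx:city:"+city+":"+pk, "")
	s.Commit(b)
}

func main() {
	s := NewStore()
	PutUser(s, "USER#1", "Santos")
	PutUser(s, "USER#1", "Lisbon")

	_, stale := s.Get("idx:city:Santos:USER#1")
	_, fresh := s.Get("idx:city:Lisbon:USER#1")
	fmt.Println(stale, fresh) // false true
}
```

In the engine described above, the batch is also replicated through raft before it is applied; this sketch models only the local atomicity.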
## Read lifecycle

```mermaid
sequenceDiagram
  participant Client
  participant API
  participant Planner
  participant Registry
  participant Index
  participant Storage
  Client->>API: Query / Scan / ExecuteStatement / TopK
  API->>Planner: parse and plan predicate
  Planner->>Registry: resolve plugin operators
  Planner->>Index: candidate set when available
  Planner->>Storage: primary or index-backed reads
  API-->>Client: rows or streamed items
```

Reads prefer the cheapest available route. A primary-key lookup is direct. A partition query is range-oriented. A secondary index query follows pointers. A plugin-backed query can ask a plugin for candidates before applying exact filters.

## Consistency model

Each call carries a single consistency knob, the `Consistency` enum from `cefas.v1.Cefas`:

- `CONSISTENCY_EVENTUAL`: the read is served locally by whichever node received the call. This is the cheapest path on the diagrams above: API → Planner → Storage on the same node. It may trail the leader by a few raft heartbeats.
- `CONSISTENCY_STRONG`: the read is routed to the shard leader, which applies a raft barrier before answering. On the diagrams this adds one hop (`Client → API → Leader API → Planner → Storage`) and a small linearization wait. The result reflects every previously acknowledged write on that shard.

Writes never use this knob; they always travel to the shard leader, get replicated to a quorum of voters, and are acknowledged only after commit. A write that arrives at a follower returns `client.ErrNotLeader` (Go sentinel), and the client transparently retries via the `-raft-http-peers` redirect map.

The choice is per call, not per session. A high-throughput cohort scan can stay eventual; the credit check that follows can opt into `Strong()`.

See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Interfaces HTTP and gRPC](Interfaces-HTTP-And-GRPC#consistency-options) for the wire shape.
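The write redirect can be sketched as a small client loop. This is a hedged illustration, not the CefasDB Go client: `NotLeaderError`, `sendFunc`, and `writeWithRedirect` are hypothetical, and the leader hint is modeled as a field on the error rather than the `client.ErrNotLeader` sentinel plus `-raft-http-peers` map described above.

```go
package main

import (
	"errors"
	"fmt"
)

// NotLeaderError is a hypothetical error carrying the leader hint that a
// follower returns for a write (the real client exposes client.ErrNotLeader).
type NotLeaderError struct{ LeaderHint string }

func (e *NotLeaderError) Error() string { return "not leader; leader hint: " + e.LeaderHint }

// sendFunc performs one write against a node address (hypothetical transport).
type sendFunc func(addr string) error

// writeWithRedirect follows leader hints for at most maxHops redirects and
// returns the address that finally handled (or rejected) the write.
func writeWithRedirect(addr string, send sendFunc, maxHops int) (string, error) {
	for hop := 0; ; hop++ {
		err := send(addr)
		var nl *NotLeaderError
		if !errors.As(err, &nl) {
			return addr, err // success (nil) or a failure that is not a redirect
		}
		if hop >= maxHops || nl.LeaderHint == "" {
			return addr, fmt.Errorf("giving up after %d redirect(s): %w", hop, err)
		}
		addr = nl.LeaderHint
	}
}

func main() {
	leader := "n2:9090"
	send := func(addr string) error {
		if addr != leader {
			return &NotLeaderError{LeaderHint: leader}
		}
		return nil // the leader accepts the write
	}
	addr, err := writeWithRedirect("n1:9090", send, 3)
	fmt.Println(addr, err) // n2:9090 <nil>
}
```

Bounding the number of hops keeps a client from looping forever when hints are stale, for example while an election is still in progress.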
## Why the layers are separate

The storage engine should not know how Levenshtein distance works. The audience plugin should not know how Pebble encodes primary keys. The HTTP API should not know the in-memory representation of a trigram index. That separation is the point of `internal/core` and `pkg/plugin`.

Import-graph tests enforce the boundary. If plugin code imports engine internals, tests fail. If core code imports concrete plugins, tests fail. This makes it possible to add search and audience behavior without turning the database kernel into a pile of feature-specific branches.

## Concepts: Data Model

Rendered: https://docs.cefasdb.com/docs/Concepts-Data-Model
Markdown: https://docs.cefasdb.com/wiki/Concepts-Data-Model.md

# Concepts: Data Model

CefasDB models operational data as tables containing typed document items. Each table has a key schema, optional indexes, optional TTL configuration, and optional plugin-backed descriptors.

## Tables

A table descriptor includes:

- Table name.
- Partition key name.
- Optional sort key name.
- Global secondary index descriptors.
- Local secondary index descriptors.
- Spatial index descriptors.
- TTL configuration.
- Plugin-backed index descriptors.

The table descriptor is persisted in the catalog. API handlers load the descriptor before validating keys, conditions, index routing, or TTL behavior.

## Items

An item is a map from attribute name to typed value. The supported attribute types are:

| Type | Meaning |
| --- | --- |
| `S` | String |
| `N` | Number encoded as a string |
| `B` | Binary |
| `BOOL` | Boolean |
| `NULL` | Null marker |
| `SS`, `NS`, `BS` | String, number, and binary sets |
| `L` | List |
| `M` | Map |

The CLI accepts JSON in this shape. The Go SDK uses generated protobuf types and helper codecs.

## Primary key

Every table has a partition key. A table may also have a sort key. The partition key, or the partition key plus the sort key when the table has one, identifies a single item.
Good partition keys distribute writes across the keyspace and support the most common lookup path. Good sort keys encode range semantics: timestamp, version, event ID, account-local sequence, or another ordered value.

## Conditional writes

Conditional writes evaluate a predicate against the existing item before applying a mutation. They are used for optimistic concurrency, insert-if-absent, compare-and-set, and safe deletes.

Examples:

```sql
attribute_not_exists(pk)
version = :expected
status IN ('pending', 'active')
```

The condition evaluator lives under `internal/core/condition`, and the storage layer applies it before committing the batch.

Under raft, the condition is evaluated on the shard leader as part of the write batch, and the batch is replicated to a quorum before acknowledgement. Conditional puts on the same partition key are therefore linearizable: if two clients race on `attribute_not_exists(pk)`, one succeeds and the other receives a condition-failure error, regardless of which node each one connected to. See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model).

## TTL

TTL lets a table nominate an attribute containing an expiration timestamp. The engine indexes TTL buckets, and a reaper can remove expired items without scanning the full table.

TTL is not a hard real-time deadline; it is a cleanup contract. Use TTL for session records, temporary campaign state, dedup windows, and short-lived operational records.

## Streams

The stream abstraction under `internal/core/stream` represents change events. It is the seam for change-data-capture behavior and for plugin and index hooks that need to observe mutations.

## Backups

Backups are named checkpoints. They can cover one or more tables and are tracked under the admin backup namespace. Restore can recreate a table from a backup into a target table name.
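To illustrate the insert-if-absent race from the Conditional writes section, the sketch below serializes condition evaluation and mutation behind one lock, which plays the role of the shard leader's write path. `leader`, `PutIfAbsent`, `race`, and `ErrConditionFailed` are hypothetical names; in CefasDB the serialization point is raft ordering on the leader, not an in-process mutex. However many goroutines race, exactly one insert wins.

```go
package main

import (
	"errors"
	"sync"
)

// ErrConditionFailed stands in for the condition-failure error (hypothetical name).
var ErrConditionFailed = errors.New("conditional check failed")

// leader evaluates the condition and applies the mutation as one step,
// like a shard leader evaluating the condition inside the write batch.
type leader struct {
	mu    sync.Mutex
	items map[string]string
}

// PutIfAbsent applies the put only when attribute_not_exists(pk) holds.
func (l *leader) PutIfAbsent(pk, value string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, exists := l.items[pk]; exists {
		return ErrConditionFailed
	}
	l.items[pk] = value
	return nil
}

// race runs n concurrent inserts on the same key and counts outcomes.
func race(n int) (wins, failures int) {
	l := &leader{items: map[string]string{}}
	var wg sync.WaitGroup
	var mu sync.Mutex // guards the counters
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			err := l.PutIfAbsent("USER#1", "profile")
			mu.Lock()
			defer mu.Unlock()
			if err == nil {
				wins++
			} else if errors.Is(err, ErrConditionFailed) {
				failures++
			}
		}()
	}
	wg.Wait()
	return wins, failures
}
```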
## Concepts: Storage And Replication

Rendered: https://docs.cefasdb.com/docs/Concepts-Storage-And-Replication
Markdown: https://docs.cefasdb.com/wiki/Concepts-Storage-And-Replication.md

# Concepts: Storage And Replication

CefasDB uses Pebble as its embedded storage engine. Pebble is an LSM-tree, so write-heavy workloads are committed as ordered key-value batches and compacted in the background.

## Namespaces

The storage layer uses prefixes to separate logical data:

| Prefix | Contents |
| --- | --- |
| `cefas/catalog/` | Table descriptor JSON. |
| `cefas/data/
/...` | Primary item records. |
| `cefas/gsi/
//...` | Global secondary index pointers. |
| `cefas/lsi/
//...` | Local secondary index pointers. |
| `cefas/spatial/
//...` | Geohash and Z-order index pointers. |
| `cefas/ttl/
//...` | TTL bucket entries. |
| `cefas/admin/backups/` | Backup metadata. |

Plugin-backed indexes own their internal format. Built-in v1 plugins mostly keep state in memory, with persistence seams documented in the plugin pages.

## Write batches

A write batch groups all changes for one operation. For a `PutItem`, the batch can contain:

- Primary item write.
- GSI pointer updates.
- LSI pointer updates.
- Spatial pointer updates.
- TTL bucket updates.
- Backup or stream metadata as needed.

The batch is the atomic unit: either every key in the batch becomes visible, or none of them do.

## Group commit

Single-node mode commits through the storage layer's group-commit path. Group commit amortizes sync and write overhead across concurrent producers. For latency-sensitive workloads, fsync behavior is configurable.

## Raft replication

When raft is attached, the storage write batch is replicated before it is applied. A majority of voters must agree on the log entry; the FSM then applies the batch to Pebble.

This changes durability and availability. A single-node write is durable to one disk. A raft write is durable to a majority of raft members.

## Consistency model

Replication and read consistency are decoupled. Raft delivers a single, ordered commit history per shard; the read path chooses how current the visible commits must be at the moment of the call.

Per-call consistency is a single enum on `GetItemRequest`, `QueryRequest`, and `ScanRequest`:

| Option | Where to set it | Behaviour |
| --- | --- | --- |
| `CONSISTENCY_EVENTUAL` (default) | Omit on the request, or send `CONSISTENCY_UNSPECIFIED`. Go client: default `GetOptions` / `ScanOptions`. | Local read on whichever node received the call. May trail the leader by a few raft heartbeats. |
| `CONSISTENCY_STRONG` | Set on the gRPC enum. Go client: `client.GetOptions{Strong: true}`, `client.ScanOptions{Strong: true}`, or `QueryBuilder.Strong()`. | Routed to the shard leader. The leader applies a read barrier so the call sees every previously acknowledged write on that shard. |

Writes always go through the shard leader and are acknowledged only after a quorum of voters has the entry in their raft logs. A write against a follower returns `client.ErrNotLeader`; the client redirects using the `-raft-http-peers` map.

`fsync` is the durability lever:

- With `-fsync=false` (default), the WAL flushes asynchronously and group commit batches multiple writes per disk sync. Throughput is higher, at the cost of a small crash-loss window on the leader before the entry reaches stable storage.
- With `-fsync=true`, every commit reaches disk before acknowledgement. Throughput drops, but a crashed leader has nothing in flight.

See [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the full list of consistency and durability knobs, including the raft timeout block (heartbeat, election, leader lease, commit, snapshot threshold) and the Pebble storage profile selectors.

## Multi-shard mode

Multi-shard mode partitions tables by key hash. Each shard can have its own raft group. This keeps per-shard ordering and replication while allowing more parallelism across partitions.

The important operational detail is that data placement follows the partition key. If the partition key is skewed, shard load is skewed.

## Backups and restore

Backups use Pebble checkpoint semantics and CefasDB metadata to create named recovery points. Restore reads the backup and writes a new target table, preserving the source data without overwriting the original table by default.

## Concepts: Query And Indexes

Rendered: https://docs.cefasdb.com/docs/Concepts-Query-And-Indexes
Markdown: https://docs.cefasdb.com/wiki/Concepts-Query-And-Indexes.md

# Concepts: Query And Indexes

CefasDB has three query layers:

1. Primary-key and range access.
2. Built-in secondary and spatial indexes.
3. Plugin-backed candidate generation, distance operators, estimators, and top-k ranking.
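One way to picture the layering is as an ordered fallthrough: the planner takes the cheapest route the predicate and table descriptor allow, and a full scan is the last resort. The sketch below is hypothetical (`Route`, `Predicate`, and `choose` are illustrative, and a real planner inspects parsed predicates and index descriptors rather than booleans); the route order follows the planner list on the SQL and PartiQL page.

```go
package main

// Route is an access path, declared cheapest first.
type Route int

const (
	PrimaryKeyLookup Route = iota
	SortKeyRange
	SecondaryIndex
	SpatialIndex
	PluginCandidates
	TableScan
)

func (r Route) String() string {
	return [...]string{"primary-key", "sort-key-range", "secondary-index",
		"spatial-index", "plugin-candidates", "table-scan"}[r]
}

// Predicate summarizes what a query constrains (hypothetical shape).
type Predicate struct {
	FullKey       bool // every key attribute bound by equality
	PKEquals      bool // partition key bound by equality
	IndexedAttr   bool // equality on an attribute covered by a GSI or LSI
	SpatialFilter bool // location predicate with a spatial index available
	PluginIndexed bool // distance predicate with a matching plugin index
}

// choose returns the first (cheapest) route the predicate allows.
func choose(p Predicate) Route {
	switch {
	case p.FullKey:
		return PrimaryKeyLookup
	case p.PKEquals:
		return SortKeyRange // partition read, optionally bounded on the sort key
	case p.IndexedAttr:
		return SecondaryIndex
	case p.SpatialFilter:
		return SpatialIndex
	case p.PluginIndexed:
		return PluginCandidates
	default:
		return TableScan
	}
}
```

`cefas explain` remains the way to see which route the real planner picked.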
## Primary access

Primary access is the cheapest path. `GetItem` resolves one key. `Query` can read a partition and optionally filter within it.

Data modeling should start here: choose a partition key and sort key that make the most common read path direct.

## Built-in indexes

Global secondary indexes and local secondary indexes persist pointers alongside primary writes. They are part of the same write batch as the item mutation, so an acknowledged write and its built-in index entries move together.

Spatial indexes support geohash and Z-order access patterns for location-aware data. They narrow a search to cells or ranges before exact filtering.

## Plugin-backed indexes

Plugin-backed indexes are descriptors that route candidate generation to a registered plugin. Examples:

- `trigram` for fuzzy text candidate sets.
- `minhash` for set similarity.
- `simhash` for near-duplicate detection.
- `vectorlsh` for approximate nearest neighbors.
- `geohash` for spatial candidates.
- `bloom`, `cbloom`, and `cuckoo` for membership tests.

The query planner can combine candidate generation with exact distance evaluation.

## Distance operators

Distance operators return a scalar where smaller is closer. That convention makes predicates consistent:

```sql
levenshtein(name, 'habibs') <= 2
cosine(embedding, :query) <= 0.25
haversine(loc, :center) <= 1500
```

Distance operators are plugins. They do not own storage. They evaluate typed values and can be paired with an index plugin that narrows the candidate set.

## Top-k

Top-k search ranks candidates by a distance expression:

```sh
cefas top-k \
  --table Documents \
  --by "cosine(embedding, :query)" \
  --k 20 \
  --query '{"L":[{"N":"0.1"},{"N":"0.2"}]}'
```

The best top-k plans use an index plugin to avoid scanning the full table, then apply the exact distance operator to rank the survivors.
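The rank-the-survivors step can be sketched with a bounded max-heap: keep the k closest candidates seen so far and evict the current worst whenever a closer one arrives. The candidate slice stands in for what an index plugin such as `vectorlsh` would return. `topK` and `cosineDistance` are illustrative helpers, with cosine distance taken as `1 - cosine similarity` so that smaller is closer; whether CefasDB's `cosine` operator uses exactly this formula is an assumption.

```go
package main

import (
	"container/heap"
	"math"
	"sort"
)

// Candidate is an item ID with its embedding, as returned by candidate generation.
type Candidate struct {
	ID  string
	Vec []float64
}

type scored struct {
	id   string
	dist float64
}

// maxHeap keeps the worst (largest distance) survivor at the root.
type maxHeap []scored

func (h maxHeap) Len() int           { return len(h) }
func (h maxHeap) Less(i, j int) bool { return h[i].dist > h[j].dist }
func (h maxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)        { *h = append(*h, x.(scored)) }
func (h *maxHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// cosineDistance returns 1 - cos(a, b); zero vectors are treated as maximally far.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// topK returns the IDs of the k candidates closest to query, closest first.
func topK(query []float64, cands []Candidate, k int) []string {
	h := &maxHeap{}
	for _, c := range cands {
		d := cosineDistance(query, c.Vec)
		if h.Len() < k {
			heap.Push(h, scored{c.ID, d})
		} else if k > 0 && d < (*h)[0].dist {
			(*h)[0] = scored{c.ID, d} // replace the current worst survivor
			heap.Fix(h, 0)
		}
	}
	out := []scored(*h)
	sort.Slice(out, func(i, j int) bool { return out[i].dist < out[j].dist })
	ids := make([]string, len(out))
	for i, s := range out {
		ids[i] = s.id
	}
	return ids
}
```

The heap keeps memory at O(k) regardless of how many candidates the index returns, which is why the exact distance pass stays cheap once an index has narrowed the set.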
## Explain

`explain` prints the plan tree:

```sh
cefas explain --table Merchants --where "levenshtein(name, 'habibs') <= 2"
```

Use explain before adding an index, after adding an index, and after rebuilding an index. It is the fastest way to verify whether the planner can use the path you expect.

## Concepts: Deployment Modes

Rendered: https://docs.cefasdb.com/docs/Concepts-Deployment-Modes
Markdown: https://docs.cefasdb.com/wiki/Concepts-Deployment-Modes.md

# Concepts: Deployment Modes

CefasDB can run in several topologies. The API is stable across them; the difference is durability, failure behavior, and operational complexity.

## Local binary

Run `cefasdb` directly on a developer machine. This is best for development and debugging because logs, flags, data files, and binaries are all local.

Use this mode when:

- You are learning CefasDB.
- You are debugging a CLI or SDK workflow.
- You want a disposable local database.

Do not use it when a process or machine failure must be transparent.

## Single Docker container

Docker packages the server and its runtime environment. It is useful for integration tests and demos.

Use this mode when:

- You want repeatable local setup.
- Another service needs CefasDB as a local dependency.
- You want to test image packaging.

Mount the data directory on a volume if the data should survive container removal.

## Raft cluster

Raft replicates writes across members and acknowledges only after a majority has the entry. This is the first high-availability topology.

Use this mode when:

- One process or host must be able to fail without losing acknowledged writes.
- Operators can provide stable node IDs and network addresses.
- Write latency can include raft coordination.

## Multi-raft sharding

Multi-raft partitions the keyspace across independent raft groups. It is intended for scale-out write throughput and failure-domain isolation.

Use this mode when:

- One raft group is not enough throughput.
- Partition-key distribution is understood.
- Operational automation can manage multiple shard groups.

## Kubernetes

Kubernetes wraps the server in StatefulSets, Services, ConfigMaps, Secrets, and PersistentVolumeClaims.

Use this mode when:

- You already operate workloads on Kubernetes.
- You need managed restarts and declarative config.
- You can provision persistent storage and monitor the pods.

## Choosing a mode

Start with the simplest mode that proves the workload. Move to Docker when packaging matters. Move to raft when availability matters. Move to Kubernetes when operations and scheduling matter. Move to multi-raft when throughput and data placement matter.

## Consistency guarantees

Every mode exposes the same per-call `CONSISTENCY_EVENTUAL` / `CONSISTENCY_STRONG` knob. What changes between modes is the underlying replication shape, and therefore what each guarantee means under failure.

| Mode | Quorum size | Write guarantee on success | Read with `Strong` | Behaviour under leader loss |
| --- | --- | --- | --- | --- |
| Local binary | n/a (1 node) | Write reached the local Pebble batch. With `-fsync=true`, also durable on disk. | Local read; same answer as eventual. | Process death = downtime. Data survives if the data directory is intact. |
| Single Docker | n/a (1 container) | Same as local binary. | Same as local binary. | Volume loss = data loss. |
| Single-shard raft | Majority of voters (e.g. 2 of 3) | Replicated to a quorum of voters. | Routed to the leader, which reads after a barrier and sees every acknowledged write. | New leader elected after `ElectionMS`; writes pause during the gap. Loss of quorum (e.g. 2 of 3 unreachable) means no new writes until quorum returns. |
| Multi-raft sharding | Majority **per shard** | Same as single-shard raft, but scoped to the shard that owns the key. | Same; the leader is per shard. | Loss only affects shards whose quorum is unhealthy; other partitions keep serving. |
| Kubernetes (StatefulSet + raft) | Majority of replicas | Same as raft. PVC tied to pod identity. | Same as raft. | Pod restart preserves data via the PVC; pod loss without a PVC = node loss for that voter. |

Two practical consequences:

1. **Strong reads cost a leader hop.** In any raft-backed mode, `CONSISTENCY_STRONG` adds round-trip and barrier latency. Use it where read-after-write correctness matters; leave eventual for high-volume scans and cohort scoring.
2. **Quorum loss is a write halt, not corruption.** When a shard cannot reach quorum, writes are rejected (`client.ErrNotLeader` or a transport error). Reads on followers still return eventual data. Restoring a node, or evicting one via `cefas cluster remove-server`, re-forms quorum.

See [Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the tuning knobs.

## Concepts: Authentication And Authorization

Rendered: https://docs.cefasdb.com/docs/Concepts-Authentication-And-Authorization
Markdown: https://docs.cefasdb.com/wiki/Concepts-Authentication-And-Authorization.md

# Concepts: Authentication And Authorization

CefasDB can run open in a trusted development environment or validate bearer tokens against an identity provider. Production deployments should enable token validation and configure per-operation authorization scopes.

## Identity provider configuration

The server exposes flags for identity configuration:

- JWKS URL.
- Expected issuer.
- Expected audience.
- Allowed clock skew.

When the JWKS configuration is empty, the server can run in open development mode. When it is configured, requests must carry a bearer token that validates against the issuer and audience.

## Scope model

Scopes are operation- and resource-oriented. Depending on token claims, a caller can be allowed to read one table, write another, manage plugins, or perform admin operations.

Examples of scope shapes:

```text
cefas:item:read:
cefas:item:write:
cefas:table:admin:
cefas:cluster:admin
```

The exact scope checks live in `internal/auth` and the API handlers.

## CLI authentication

The CLI can receive a token directly, from a token file, from the environment, or from a profile config. Common flags:

```sh
cefas --token "$TOKEN" ...
cefas --token-file ./token.txt ...
cefas --profile prod ...
```

## Transport security

For local examples, `--insecure` means plaintext gRPC. Production deployments should use TLS, configure a CA bundle where needed, and restrict network access to the gRPC and HTTP listeners.

## Operational posture

Do not expose an open CefasDB server to untrusted networks. Token validation and TLS should be part of the deployment baseline, not a later hardening pass.

## Plugins Overview

Rendered: https://docs.cefasdb.com/docs/Plugins-Overview
Markdown: https://docs.cefasdb.com/wiki/Plugins-Overview.md

# Plugins Overview

CefasDB plugins are in-process Go implementations registered against `plugin.Default`. They provide specialized behavior without coupling that behavior to the storage engine.

## Plugin kinds

| Kind | Used for |
| --- | --- |
| Index | Candidate generation, membership tests, search indexes. |
| Distance | Scalar similarity or distance evaluation. |
| Estimator | Approximate aggregates such as cardinality or frequency. |
| Audience | Composite ads workflows such as geo select, dedup, frequency cap, and privacy aggregation. |

## Why plugins exist

Search and similarity features evolve faster than the database kernel. A geohash selector, a trigram inverted index, and a Count-Min Sketch have different state, configuration, and evaluation behavior. Putting all of that directly into storage would make the core hard to reason about.

The plugin boundary keeps the kernel narrow. Core code defines stable interfaces and data structures. Plugins implement those interfaces.
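The interface-plus-registry pattern can be sketched in plain Go. `Distance`, `Registry`, `MustRegister`, and `Lookup` below are simplified, hypothetical stand-ins for the real `plugin.Default` contract (which also carries manifests and plugin kinds); the point is that the core resolves an operator by name through an interface and never imports the concrete plugin. The example plugin is a byte-wise Hamming distance.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Distance is a simplified distance-plugin contract: smaller means closer.
type Distance interface {
	Name() string
	Eval(a, b string) float64
}

// Registry maps operator names to implementations.
type Registry struct {
	mu     sync.RWMutex
	byName map[string]Distance
}

func NewRegistry() *Registry { return &Registry{byName: map[string]Distance{}} }

// MustRegister panics on a duplicate name so wiring mistakes surface at init time.
func (r *Registry) MustRegister(d Distance) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.byName[d.Name()]; dup {
		panic(fmt.Sprintf("plugin %q already registered", d.Name()))
	}
	r.byName[d.Name()] = d
}

// Lookup resolves an operator name found in a query predicate.
func (r *Registry) Lookup(name string) (Distance, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	d, ok := r.byName[name]
	return d, ok
}

// hamming counts differing bytes between equal-length strings.
type hamming struct{}

func (hamming) Name() string { return "hamming" }

func (hamming) Eval(a, b string) float64 {
	if len(a) != len(b) {
		return math.Inf(1) // undefined for unequal lengths in this sketch
	}
	n := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			n++
		}
	}
	return float64(n)
}
```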
## Built-in families

Membership and approximate set plugins:

- `bloom`
- `cbloom`
- `cuckoo`
- `hll`
- `cms`

Search and similarity indexes:

- `radix`
- `trigram`
- `minhash`
- `simhash`
- `vectorlsh`
- `geohash`
- `roaring`

Distance operators:

- `hamming`
- `levenshtein`
- `damerau`
- `jaro_winkler`
- `jaccard`
- `cosine`
- `euclidean`
- `manhattan`
- `haversine`

Audience workflows:

- Geo radius selection.
- Approximate reach estimation.
- Dedup with TTL.
- Sliding-window frequency cap.
- Privacy-aware aggregation.
- Composite eligibility.

## Reading order

Read [Core Boundaries](Plugins-Core-Boundaries) first if you will change code. Read [Index Examples](Plugins-Index-Examples) if you will use plugins from the CLI. Read [Plugin Authoring](Plugins-Authoring) if you will add a new plugin.

## Plugins: Core Boundaries

Rendered: https://docs.cefasdb.com/docs/Plugins-Core-Boundaries
Markdown: https://docs.cefasdb.com/wiki/Plugins-Core-Boundaries.md

# Plugins: Core Boundaries

CefasDB keeps plugin code away from engine internals. This is enforced by tests, not convention.

```mermaid
flowchart LR
  Server[pkg/api and internal packages] --> Core[internal/core]
  Server --> Plugin[pkg/plugin]
  Plugin --> Core
  Core -. forbidden .-> Plugin
  Core -. forbidden .-> Server
  Plugin -. forbidden .-> Server
```

## Core packages

Core packages define stable concepts:

| Concept | Package |
| --- | --- |
| Model aliases and item types | `internal/core/model` |
| Conditions | `internal/core/condition` |
| TTL service | `internal/core/ttl` |
| Change streams | `internal/core/stream` |
| Index lifecycle | `internal/core/index` |
| Query planner and top-k | `internal/core/query` |

Core code does not import concrete plugin packages.

## Plugin packages

Plugin packages live under `pkg/plugin/`. They can depend on core packages and shared helpers under `pkg/plugin/internal`, but they must not import `internal/storage`, `pkg/api`, `pkg/client`, or the SQL executor directly.
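The forbidden-import rule above can be checked mechanically by parsing only the import declarations of each file with the standard library's `go/parser` in `ImportsOnly` mode. This is a sketch in the spirit of the boundary tests, not their actual code: `violations` and the `forbidden` list are assumptions, and the SQL executor is left out because its package path is not given here.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

const module = "github.com/CefasDB/cefasdb-core/"

// forbidden lists engine packages a plugin must not import (per the rule above).
var forbidden = []string{
	module + "internal/storage",
	module + "pkg/api",
	module + "pkg/client",
}

// violations parses only the import block of one Go source file and returns
// every import path equal to, or nested under, a forbidden package.
func violations(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, p := range forbidden {
			if path == p || strings.HasPrefix(path, p+"/") {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}
```

Because `ImportsOnly` stops after the import block, the check is fast and does not require the file to type-check.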
The server wires built-in plugins through blank imports in `pkg/plugin/builtins`.

## Boundary tests

Run:

```sh
go test ./internal/core/... -run CoreHasNoEngineImports
go test ./pkg/plugin/... -run PluginHasNoEngineImports
```

These tests parse imports and fail if a package crosses the boundary. That makes the architecture reviewable in CI.

## Practical rule

If a plugin needs a capability from the engine, do not import the engine. Add a small interface or data type to `internal/core`, make the engine implement it, and keep the plugin dependent only on that core contract.

## Plugins: Authoring

Rendered: https://docs.cefasdb.com/docs/Plugins-Authoring
Markdown: https://docs.cefasdb.com/wiki/Plugins-Authoring.md

# Plugins: Authoring

A CefasDB plugin is a Go type that implements one of the plugin contracts and registers itself during package initialization.

## Pick the plugin kind

| Kind | Use it when |
| --- | --- |
| Index | You maintain searchable state and return candidate item IDs. |
| Distance | You evaluate two typed values and return a numeric distance. |
| Estimator | You observe values and return approximate aggregate answers. |
| Audience | You compose selection, reach, dedup, frequency cap, and aggregation workflows. |

## Minimal distance plugin

```go
package mydistance

import (
	"github.com/CefasDB/cefasdb-core/internal/core/model"
	"github.com/CefasDB/cefasdb-core/pkg/plugin"
)

// Op is the operator type; this skeleton always reports a distance of 0.
type Op struct{}

// Manifest describes the plugin: name, kind, version, and a short description.
func (Op) Manifest() plugin.Manifest {
	return plugin.Manifest{
		Name:        "mydistance",
		Kind:        plugin.KindDistance,
		Version:     "1",
		Description: "example distance operator",
	}
}

// Name is the operator name used in predicates, e.g. mydistance(a, b) <= 1.
func (Op) Name() string { return "mydistance" }

// Supports restricts the operator to string-vs-string comparisons.
func (Op) Supports(a, b model.AttrType) bool {
	return a == model.AttrS && b == model.AttrS
}

// Eval returns the distance between a and b; replace the stub with real logic.
func (Op) Eval(a, b model.AttributeValue) (float64, error) {
	return 0, nil
}

// init registers the operator with the default registry when the package is imported.
func init() {
	plugin.Default.MustRegister(Op{})
}
```

## Index configuration

Index plugins receive an opaque JSON config.
Decode it into a typed config and validate it before building state.

```go
type Config struct {
	Field string `json:"field"`
	K     int    `json:"k,omitempty"`
}
```

Every index plugin should reject missing required fields with a clear error. Defaults should be explicit and tested.

## Per-index state

Index plugins usually maintain state per `(table, indexName)` descriptor. Use a stable key such as `table + "/" + name`, and guard the state with a mutex or another concurrency primitive.

## Tests

Use `pkg/plugin/testharness` instead of booting a full server. A plugin unit test should seed model items, build the plugin state, and assert query or estimate behavior directly.

## Wire the plugin into the server

Add a blank import in `pkg/plugin/builtins/builtins.go`:

```go
import (
	_ "github.com/CefasDB/cefasdb-core/pkg/plugin/mydistance"
)
```

After rebuilding `cefasdb`, the CLI can show it:

```sh
cefas list-plugins
cefas describe-plugin --name mydistance
```

## Boundary checklist

- Do not import `internal/*` packages other than `internal/core` (the minimal plugin above imports `internal/core/model`, which is allowed).
- Do not import API handlers.
- Do not import the SQL executor.
- Put shared types in `internal/core`.
- Add unit tests for manifest validation, configuration, and core behavior.

## Plugins: Index Examples

Rendered: https://docs.cefasdb.com/docs/Plugins-Index-Examples
Markdown: https://docs.cefasdb.com/wiki/Plugins-Index-Examples.md

# Plugins: Index Examples

This page shows the built-in index and estimator plugins with CLI examples. Assume `cefasdb` is running and the CLI can reach it.

## Bloom filter

Use `bloom` for membership checks where false positives are acceptable and deletes are not required.

```sh
cefas create-index \
  --table Users \
  --name email_bloom \
  --type bloom \
  --config '{"field":"email","m":16384,"k":6}'
```

## Counting Bloom filter

Use `cbloom` when membership checks need delete support.
```sh
cefas create-index \
  --table Sessions \
  --name session_cbloom \
  --type cbloom \
  --config '{"field":"session_id","m":4096,"k":5,"width":4}'
```

## Cuckoo filter

Use `cuckoo` for membership checks with deletes and compact fingerprints.

```sh
cefas create-index \
  --table Orders \
  --name order_cuckoo \
  --type cuckoo \
  --config '{"field":"order_id","buckets":2048,"fingerprint_bits":12}'
```

## Roaring bitmap

Use `roaring` for cohorts over numeric or stable integer-like identifiers.

```sh
cefas cohort create \
  --table Users \
  --cohort high_value \
  --field user_id \
  --where "spend >= :floor" \
  --binds '{":floor":{"N":"1000"}}'
```

## HyperLogLog

Use `hll` for approximate distinct counts.

```sh
cefas cohort estimate --table Events --field user_id
```

## Count-Min Sketch

Use `cms` for approximate frequency estimates. It is useful when exact counters are too expensive or too large.

## Radix

Use `radix` for prefix search and autocomplete-style access.

```sh
cefas create-index \
  --table Cities \
  --name name_prefix \
  --type radix \
  --config '{"field":"name"}'
```

## Trigram

Use `trigram` for fuzzy text candidate generation.

```sh
cefas create-index \
  --table Merchants \
  --name merchant_name_trigram \
  --type trigram \
  --field name

cefas query \
  --table-name Merchants \
  --where "levenshtein(name, 'habibs') <= 2"
```

## MinHash

Use `minhash` for set similarity.

```sh
cefas create-index \
  --table Users \
  --name tag_sim \
  --type minhash \
  --config '{"field":"tags","k":128,"r":8}'
```

## SimHash

Use `simhash` for near-duplicate text or document detection.

```sh
cefas create-index \
  --table Docs \
  --name dedupe \
  --type simhash \
  --config '{"field":"body","prefix_bits":16,"max_radius":3}'
```

## Vector LSH

Use `vectorlsh` for approximate vector candidate generation.
```sh
cefas create-index \
  --table Documents \
  --name emb_lsh \
  --type vectorlsh \
  --config '{"field":"embedding","dim":768,"sketches":8,"bits_per_sketch":12}'
```

## Geohash

Use `geohash` for spatial candidate generation.

```sh
cefas create-index \
  --table Stores \
  --name loc_geo \
  --type geohash \
  --config '{"field":"loc","precision":7}'
```

## Plugins: Distance Operators

Rendered: https://docs.cefasdb.com/docs/Plugins-Distance-Operators
Markdown: https://docs.cefasdb.com/wiki/Plugins-Distance-Operators.md

# Plugins: Distance Operators

Distance plugins return a scalar where smaller means closer. This lets every operator fit the same predicate shape:

```sql
operator(left, right) <= threshold
```

## Operator table

| Operator | Inputs | Typical use |
| --- | --- | --- |
| `hamming` | Equal-length strings or binary | SimHash post-filtering. |
| `levenshtein` | String vs string | Fuzzy text matching. |
| `damerau` | String vs string | Fuzzy text with adjacent transpositions. |
| `jaro_winkler` | String vs string | Names and short labels. |
| `jaccard` | Sets or shingled strings | Tag or set similarity. |
| `cosine` | Numeric vectors | Embeddings. |
| `euclidean` | Numeric vectors | Spatial or vector distance. |
| `manhattan` | Numeric vectors | L1 vector distance. |
| `haversine` | `{lat, lon}` maps | Earth distance in meters. |

## Query examples

```sh
cefas query \
  --table-name Merchants \
  --where "levenshtein(name, 'habibs') <= 2"
```

```sh
cefas top-k \
  --table Documents \
  --by "cosine(embedding, :query)" \
  --k 20 \
  --query '{"L":[{"N":"0.1"},{"N":"0.2"},{"N":"0.3"}]}'
```

```sh
cefas geo audience \
  --table Stores \
  --center "-23.9608,-46.3336" \
  --radius 1500m
```

## Choosing an operator

Use edit distance for strings where typos matter. Use Jaro-Winkler for names. Use Jaccard for sets. Use cosine for normalized embeddings. Use Haversine for latitude and longitude.

At scale, pair distance operators with an index plugin.
A trigram index can narrow candidates before Levenshtein. Vector LSH can narrow candidates before cosine. Geohash can narrow candidates before Haversine.

## Plugins: Audience Workflows

Rendered: https://docs.cefasdb.com/docs/Plugins-Audience-Workflows
Markdown: https://docs.cefasdb.com/wiki/Plugins-Audience-Workflows.md

# Plugins: Audience Workflows

The audience plugin composes geo selection, approximate reach, deduplication, frequency capping, eligibility checks, and privacy-aware aggregation. It is designed for campaign and audience workflows where raw identity should stay server-side.

## Setup

Create a table with store locations:

```sh
cefas create-table \
  --table-name Stores \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

Seed items with a location map:

```sh
cefas put-item --table-name Stores --item '{
  "id":{"S":"s1"},
  "loc":{"M":{"lat":{"N":"-23.5510"},"lon":{"N":"-46.6340"}}}
}'
```

Create a geohash index:

```sh
cefas create-index \
  --table Stores \
  --name loc_geo \
  --type geohash \
  --config '{"field":"loc","precision":7}'
```

## Select an audience

```sh
cefas geo audience \
  --table Stores \
  --index loc_geo \
  --center "-23.5505,-46.6333" \
  --radius 2000m
```

The geohash plugin returns candidates from the center cell and neighboring cells. Haversine then removes false positives at cell boundaries.

## Estimate reach

```sh
cefas cohort estimate \
  --table Stores \
  --field id
```

HyperLogLog estimates distinct reach without returning member identity.

## Dedup

```sh
cefas dedup put \
  --scope campaign-123 \
  --key USER#1 \
  --ttl 168h
```

The response is a boolean verdict. The stored dedup key does not round-trip to the caller as a list.

## Frequency cap

```sh
cefas freqcap check \
  --scope merchant-456 \
  --key USER#1 \
  --limit 3 \
  --window 168h
```

The plugin increments and checks the sliding window server-side.
## Privacy-aware aggregation

```sh
cefas aggregate \
  --table CampaignEvents \
  --group-by campaign_id,geohash5 \
  --metrics impressions,clicks,redemptions \
  --min-group-size 100
```

If any group is below the privacy floor, the operation fails instead of returning a partial small group.

## Interfaces Overview

Rendered: https://docs.cefasdb.com/docs/Interfaces-Overview
Markdown: https://docs.cefasdb.com/wiki/Interfaces-Overview.md

# Interfaces Overview

CefasDB exposes four main interfaces:

- CLI for operators and scripts.
- HTTP/JSON for simple clients and compatibility.
- gRPC for typed clients and streaming.
- SQL and PartiQL for query-oriented access.

The interfaces are different entry points into the same engine. A table created through the CLI can be queried through HTTP, read through gRPC, and inspected through SQL.

## CLI

The CLI binary is `cefas`. It groups the operational surface into table, item, query, plugin, and cluster commands:

```sh
cefas create-table ...
cefas put-item ...
cefas query ...
cefas execute-statement ...
```

The CLI also exposes CefasDB-specific operations for plugins, cohorts, top-k, audience selection, backups, and cluster membership.

## HTTP

The HTTP API is useful for curl, simple integrations, and diagnostics. It exposes table, item, and query routes over JSON.

## gRPC

The gRPC API is the primary typed transport. The protobuf definition lives at `pkg/protocol/cefas.proto`, and generated Go code lives beside it.

## SQL and PartiQL

The SQL layer parses and plans a useful subset of `SELECT`, `INSERT`, `UPDATE`, `DELETE`, conditions, scalar functions, and spatial/similarity predicates. PartiQL-style commands are exposed through `execute-statement`.

## Go client

The Go client package lives under `pkg/client`. It wraps gRPC calls and encodes the typed item model.
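The typed item model is the same attribute-value JSON shape the CLI accepts (`{"S":"alice"}`, `{"N":"100"}`, and so on). The sketch below encodes a subset of those families with `encoding/json`. The `AttributeValue` struct and the `S`/`N`/`Bool`/`M` constructors are hypothetical hand-written types, not the generated protobuf types the Go client actually uses, and they cover only `S`, `N`, `BOOL`, and `M`.

```go
package main

import "encoding/json"

// AttributeValue mirrors the wire shape for a subset of attribute families.
// Exactly one field should be set; omitempty keeps unset families out of the JSON
// (note: an empty map in M is also omitted).
type AttributeValue struct {
	S    *string                   `json:"S,omitempty"`
	N    *string                   `json:"N,omitempty"` // numbers travel as strings
	BOOL *bool                     `json:"BOOL,omitempty"`
	M    map[string]AttributeValue `json:"M,omitempty"`
}

func S(v string) AttributeValue                     { return AttributeValue{S: &v} }
func N(v string) AttributeValue                     { return AttributeValue{N: &v} }
func Bool(v bool) AttributeValue                    { return AttributeValue{BOOL: &v} }
func M(m map[string]AttributeValue) AttributeValue { return AttributeValue{M: m} }

// Item is a map from attribute name to typed value.
type Item map[string]AttributeValue

// encode renders an item as attribute-value JSON (map keys come out sorted).
func encode(it Item) (string, error) {
	b, err := json.Marshal(it)
	return string(b), err
}
```

The encoded output of the store item from the audience setup matches the `--item` JSON passed to `cefas put-item`, modulo key order and whitespace.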
## Interfaces: CLI

Rendered: https://docs.cefasdb.com/docs/Interfaces-CLI
Markdown: https://docs.cefasdb.com/wiki/Interfaces-CLI.md

# Interfaces: CLI

The `cefas` CLI is the operational surface for local development, scripting, plugin inspection, backup and restore, and cluster administration.

Global flags:

| Flag | Purpose |
| --- | --- |
| `--config` | Config file path. |
| `--profile` | Named profile. |
| `--endpoint` | gRPC endpoint `host:port`. |
| `--token` | Bearer token. |
| `--token-file` | File containing a bearer token. |
| `--ca` | TLS CA bundle. |
| `--insecure` | Use plaintext gRPC. |
| `--output` | `json`, `table`, or `text`. |
| `--timeout` | Per-call timeout. |

## Table management

```sh
cefas list-tables
cefas describe-table --table-name Users
cefas create-table --table-name Users ...
cefas delete-table --table-name Users
cefas update-time-to-live --table-name Users ...
cefas describe-time-to-live --table-name Users
```

## Item operations

```sh
cefas put-item --table-name Users --item '{...}'
cefas get-item --table-name Users --key '{...}'
cefas update-item --table-name Users --key '{...}' --update-expression "SET #n = :v"
cefas delete-item --table-name Users --key '{...}'
```

## Query operations

```sh
cefas query --table-name Users --where "pk = 'USER#1'"
cefas scan --table-name Users
cefas execute-statement --statement "SELECT * FROM Users WHERE pk = 'USER#1'"
```

## Batch and transaction operations

```sh
cefas batch-get-item --request-items '{...}'
cefas batch-write-item --request-items '{...}'
cefas transact-get-items --transact-items '[...]'
cefas transact-write-items --transact-items '[...]'
```

## Plugin and query planning operations

```sh
cefas list-plugins
cefas describe-plugin --name trigram
cefas create-index --table Merchants --name merchant_name_trigram --type trigram --field name
cefas explain --table Merchants --where "levenshtein(name, 'habibs') <= 2"
cefas top-k --table Documents --by "cosine(embedding, :query)" --k 20 --query '{...}'
```

## Audience operations

```sh
cefas geo audience --table Stores --center "-23.9608,-46.3336" --radius 1500m
cefas dedup put --scope campaign-123 --key USER#1 --ttl 168h
cefas freqcap check --scope merchant-456 --key USER#1 --limit 3 --window 168h
cefas aggregate --table CampaignEvents --group-by campaign_id --metrics impressions --min-group-size 100
```

## Backup and cluster operations

```sh
cefas create-backup --backup-name nightly --table-name Users
cefas list-backups
cefas restore-table-from-backup --backup-name nightly --source-table-name Users --target-table-name Users_restored
cefas cluster status
cefas cluster add-voter --id node-b --addr 10.0.0.2:9001
cefas cluster remove-server --id node-b
```

## Interfaces: SQL And PartiQL

Rendered: https://docs.cefasdb.com/docs/Interfaces-SQL-And-PartiQL
Markdown: https://docs.cefasdb.com/wiki/Interfaces-SQL-And-PartiQL.md

# Interfaces: SQL And PartiQL

CefasDB includes a SQL parser, planner, and executor for operational queries. It is not intended to be a full relational database; it is a pragmatic query surface over the item model and index system.

## Supported statement families

- `SELECT`
- `INSERT`
- `UPDATE`
- `DELETE`
- `RETURNING` on supported mutations
- PartiQL-style execution through `execute-statement`

## Predicates

Predicates can include key conditions, scalar comparisons, boolean composition, and supported functions. Examples:

```sql
SELECT * FROM Users WHERE pk = 'USER#1'
SELECT * FROM Merchants WHERE levenshtein(name, 'habibs') <= 2
SELECT * FROM Stores WHERE haversine(loc, :center) <= 1500
```

## Update expressions

The CLI update path can be translated into CefasDB SQL update behavior. This keeps compatibility with expression-based mutation scripts while letting the engine use one mutation path internally.
## Parameters

Parameters are represented as typed attribute values:

```sh
cefas execute-statement \
  --statement "SELECT * FROM Stores WHERE haversine(loc, :center) <= :radius" \
  --parameters '[{":center":{"M":{"lat":{"N":"-23.55"},"lon":{"N":"-46.63"}}}}, {":radius":{"N":"1500"}}]'
```

## Planner behavior

The planner tries to push work to the cheapest available source:

- Primary key lookup.
- Sort-key range.
- Built-in secondary index.
- Spatial index.
- Plugin candidate set.
- Table scan when no better path exists.

Use `cefas explain` to inspect the plan before assuming an index is active.

## Interfaces: HTTP And gRPC

Rendered: https://docs.cefasdb.com/docs/Interfaces-HTTP-And-GRPC
Markdown: https://docs.cefasdb.com/wiki/Interfaces-HTTP-And-GRPC.md

# Interfaces: HTTP And gRPC

CefasDB exposes both HTTP/JSON and gRPC. HTTP is useful for simple clients and diagnostics. gRPC is the main typed API and the transport used by the CLI and Go client.

## HTTP

The HTTP listener is configured with `-http`, defaulting to `:8080`. Common local pattern:

```sh
cefasdb -data ./cefas-data -http :8080 -grpc :9090
```

Example table creation over HTTP:

```sh
curl -X POST localhost:8080/v1/tables \
  -d '{"name":"events","keySchema":{"pk":"user_id","sk":"ts"}}'
```

Example write:

```sh
curl -X POST localhost:8080/v1/PutItem \
  -d '{"table":"events","item":{"user_id":{"S":"alice"},"ts":{"N":"100"},"event":{"S":"login"}}}'
```

## gRPC

The gRPC listener is enabled with `-grpc`. The protobuf definition lives at:

```text
pkg/protocol/cefas.proto
```

The Go client wraps generated gRPC calls under:

```text
pkg/client
```

## TLS and auth

For development, clients commonly use plaintext:

```sh
cefas --endpoint 127.0.0.1:9090 --insecure list-tables
```

For production, configure TLS and bearer-token authentication. Use `--ca`, `--token`, or `--token-file` from the CLI as needed.

## Streaming

Some operations can stream result rows.
The CLI can buffer streams into a single response with `--no-stream` when scripts need one JSON value instead of a stream.

## Consistency options

`GetItemRequest`, `QueryRequest`, and `ScanRequest` each carry a `Consistency` enum:

```proto
enum Consistency {
  CONSISTENCY_UNSPECIFIED = 0;
  CONSISTENCY_EVENTUAL = 1; // local read on whichever node served the call
  CONSISTENCY_STRONG = 2;   // routed to the leader + barrier
}
```

Over gRPC, set the field directly. Over HTTP via the gRPC gateway, the same field surfaces as a string in the JSON body:

```sh
# Eventual (default: field omitted)
curl -X POST localhost:8080/v1/GetItem \
  -d '{"table":"events","key":{"user_id":{"S":"alice"},"ts":{"N":"100"}}}'

# Strong read
curl -X POST localhost:8080/v1/GetItem \
  -d '{"table":"events","key":{"user_id":{"S":"alice"},"ts":{"N":"100"}},"consistency":"CONSISTENCY_STRONG"}'
```

From the Go client, the choice is a per-call option:

```go
// Strong point read.
item, err := c.GetItem(ctx, "events", key, client.GetOptions{Strong: true})
// Strong scan with a limit of 1000.
rows, err := c.Scan(ctx, "events", client.ScanOptions{Strong: true, Limit: 1000})
// Strong streaming query on partition key "alice".
iter := c.Query("events").Pk(types.S("alice")).Strong().Stream(ctx)
```

Writes do not take a consistency option: they always travel to the shard leader and are acknowledged after a quorum of voters commits the entry. A write that hits a follower returns `client.ErrNotLeader`; the client honours the leader hint published by the cluster surface and the `-raft-http-peers` redirect map.

See [Concepts: Storage and Replication](Concepts-Storage-And-Replication#consistency-model) for the replication path and [Operations Configuration](Operations-Configuration#consistency-and-durability-tuning) for the server-side tuning knobs.
## Operations Overview

Rendered: https://docs.cefasdb.com/docs/Operations-Overview
Markdown: https://docs.cefasdb.com/wiki/Operations-Overview.md

# Operations Overview

Operating CefasDB means managing data directories, configuration, identity, backups, metrics, logs, traces, raft membership, benchmark evidence, and upgrades. The database is a single binary, but production safety comes from the surrounding discipline.

## Operational baseline

- Pin binary or image versions.
- Use persistent storage for the data directory.
- Enable authentication outside trusted development networks.
- Scrape metrics.
- Keep logs centralized.
- Test backup and restore.
- Test restart and failover behavior, and verify the [consistency model](Concepts-Storage-And-Replication#consistency-model) under leader loss.
- Document the active deployment mode.
- Keep benchmark results tied to reproducible commands and deployment shape.

## Data directory

The data directory contains the durable database state. In Docker and Kubernetes it must be mounted on persistent storage. In raft mode, raft state also needs stable storage.

## Config sources

CefasDB accepts flags, environment variables, and YAML config. The practical order is:

1. Use config files for stable environment-level settings.
2. Use environment variables for deployment-time injection.
3. Use flags for local development and explicit overrides.

## Incident triage

Start with:

```sh
cefas cluster status
cefas list-tables
cefas describe-table --table-name
```

Then check:

- Server logs.
- Metrics endpoint.
- Disk capacity and I/O latency.
- Raft leader and peer health.
- Backup availability.
- Auth errors and token issuer/audience mismatch.

## Related pages

Read [Configuration](Operations-Configuration), [Backup And Restore](Operations-Backup-And-Restore), [Observability](Operations-Observability), [Benchmark Results](Operations-Benchmark-Results), and [Security And Privacy](Operations-Security-And-Privacy).

## Operations: Configuration

Rendered: https://docs.cefasdb.com/docs/Operations-Configuration
Markdown: https://docs.cefasdb.com/wiki/Operations-Configuration.md

# Operations: Configuration

Configuration controls storage paths, HTTP and gRPC listeners, raft, identity, metrics, tracing, and TLS.

## Common server flags

| Flag | Purpose |
| --- | --- |
| `-data` | Pebble data directory. |
| `-http` | HTTP listen address. |
| `-grpc` | gRPC listen address. |
| `-fsync` | Fsync on commit. |
| `-config` | YAML config file. |
| `-metrics-disabled` | Disable metrics endpoint. |
| `-tracing-endpoint` | OTLP/gRPC collector endpoint. |

## Storage profile flags

Storage profiles set Pebble defaults for local cache, memtable, compaction, L0, bytes-per-sync, and WAL bytes-per-sync behavior.

| Flag | Environment | Purpose |
| --- | --- | --- |
| `-storage-profile` | `CEFAS_STORAGE_PROFILE` | Select `default`, `balanced`, or `write-heavy`. |
| `-raft-storage-profile` | `CEFAS_RAFT_STORAGE_PROFILE` | Select the profile for the separate raft log/stable store. |
| `-storage-backpressure` | `CEFAS_STORAGE_BACKPRESSURE_ENABLED` | Enable LSM-metric based write backpressure. |
| `-storage-backpressure-reject-critical` | `CEFAS_STORAGE_BACKPRESSURE_REJECT_CRITICAL` | Reject writes when the storage pressure state is critical. |

The Docker Compose cluster uses `balanced` by default because it is stable under common Docker Desktop memory limits.
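
The storage profile table pairs each flag with an environment variable, but this page does not say which one wins when both are set. The sketch below assumes that an explicit flag overrides the environment variable, which in turn overrides a built-in `default`, in line with the Config sources advice to use flags for explicit overrides. It ignores the YAML config file, and `resolveProfile` is a hypothetical helper, not CefasDB's startup code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validProfiles lists the -storage-profile values documented on this page.
var validProfiles = map[string]bool{"default": true, "balanced": true, "write-heavy": true}

// resolveProfile picks the storage profile under an ASSUMED precedence:
// non-empty flag value, then non-empty environment value, then "default".
func resolveProfile(flagVal, envVal string) (string, error) {
	p := "default"
	switch {
	case flagVal != "":
		p = flagVal
	case envVal != "":
		p = envVal
	}
	if !validProfiles[p] {
		return "", fmt.Errorf("unknown storage profile %q", p)
	}
	return p, nil
}

func main() {
	profile := flag.String("storage-profile", "", "default, balanced, or write-heavy")
	flag.Parse()
	p, err := resolveProfile(*profile, os.Getenv("CEFAS_STORAGE_PROFILE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("storage profile:", p)
}
```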
For performance runs on machines with a larger Docker VM memory allocation, opt into `write-heavy`:

```sh
STORAGE_PROFILE=write-heavy \
  docker compose -p cefas-cluster -f deploy/docker-compose.cluster.yml up --build -d
```

Use `balanced` for portable local development and repeatable CI-like checks. Use `write-heavy` for sustained ingest benchmarks when the host has enough memory for larger Pebble caches and memtables.

## Raft flags

| Flag | Purpose |
| --- | --- |
| `-raft-bind` | Raft TCP bind address. |
| `-raft-id` | Stable raft server ID. |
| `-raft-path` | Raft state path. |
| `-raft-bootstrap` | Bootstrap a new cluster. |
| `-raft-peers` | Comma-separated `id=addr` peers. |
| `-raft-http-peers` | Comma-separated peer HTTP URLs. |

## Multi-raft flags

| Flag | Purpose |
| --- | --- |
| `-shards` | Number of shards. |
| `-mux` | Shared mux transport address. |

## Identity flags

| Flag | Purpose |
| --- | --- |
| `-identity-jwks-url` | JWKS endpoint. |
| `-identity-issuer` | Expected issuer. |
| `-identity-audience` | Expected audience. |
| `-identity-clock-skew` | Allowed clock skew. |

## TLS flags

| Flag | Purpose |
| --- | --- |
| `-tls-cert` | gRPC TLS certificate. |
| `-tls-key` | gRPC TLS private key. |
| `-mtls-ca` | Client CA bundle for mTLS. |

## Consistency and durability tuning

CefasDB exposes the consistency surface to operators in three places: the per-call enum on `GetItem`/`Query`/`Scan` (clients control this; see [Interfaces HTTP and gRPC](Interfaces-HTTP-And-GRPC#consistency-options)), the durability lever, and the raft tunable block. This section covers the server-side knobs only.

### Durability

| Knob | Default | Effect |
| --- | --- | --- |
| `-fsync` | `false` | When `true`, every commit calls fsync on the WAL before acknowledging. Trades throughput for crash-immediate durability. Helm chart value: `cluster.fsyncOnCommit`. |
| `-storage-bytes-per-sync` | `0` (profile-driven) | Pebble `BytesPerSync`: how many bytes of sstable data Pebble writes before issuing a background sync to disk. Smaller values sync more often, smoothing out write bursts at the cost of more frequent disk I/O. Pebble documents this as latency smoothing, not a durability guarantee. |
| `-storage-wal-bytes-per-sync` | `0` (profile-driven) | Same idea, scoped to the WAL. Commit durability is controlled by `-fsync`. |

### Raft timeouts (internal/replication)

The `Config` struct in `internal/replication/db.go` ships with conservative defaults. They are not exposed as CLI flags today; they are configured through the YAML `cluster` block when present, and the defaults apply otherwise.

| Knob | Default | Effect |
| --- | --- | --- |
| `HeartbeatMS` | 1000 ms | How often the leader sends a heartbeat. Lower = faster failure detection, more CPU and network. |
| `ElectionMS` | 1000 ms | How long a follower waits without a heartbeat before starting an election. |
| `LeaderLeaseMS` | 500 ms | Lease validity used for leadership steady-state checks. |
| `CommitMS` | 50 ms | Maximum time the leader waits before applying a batch of committed entries. |
| `SnapshotEntries` | 8192 | Raft log entries between snapshots. Larger = fewer snapshot writes, longer log replay on restart. |

### Membership

Quorum size is `floor(N/2) + 1`, where `N` is the voter count. The CLI manages membership through:

```sh
cefas cluster add-voter --id node-d --addr node-d:9091
cefas cluster remove-server --id node-c
cefas cluster status
```

Per-shard scoping for the same calls is available via `--shard-id ` or `--all-shards`.

### Storage profile

The `-storage-profile` selector (`default`, `balanced`, `write-heavy`) presizes block cache, memtable, and concurrent-compaction limits. For raft-backed clusters, set `-raft-storage-profile=raft` to keep raft metadata in a separate, smaller-cache Pebble instance; this isolates raft log churn from the table working set.
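
The `floor(N/2) + 1` rule from the Membership subsection also determines how many voter failures a cluster can absorb while still committing writes. The helper names below are illustrative; only the formula comes from this page.

```go
package main

import "fmt"

// quorum returns the number of voters that must acknowledge an entry:
// floor(n/2) + 1. Go's integer division already floors for positive n.
func quorum(voters int) int { return voters/2 + 1 }

// tolerated returns how many voters can fail while a quorum still remains.
func tolerated(voters int) int { return voters - quorum(voters) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

The pattern that falls out (1 and 2 voters tolerate no failures, 3 and 4 tolerate one, 5 tolerates two) is why adding a fourth voter with `cefas cluster add-voter` raises the quorum without improving fault tolerance.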
### Helm chart values

The chart at `dist/helm/cefas/` exposes the consistency-relevant defaults under `cluster`:

```yaml
replicaCount: 1
cluster:
  shards: 1
  bootstrap: true
  fsyncOnCommit: false
```

Set `replicaCount: 3` and `cluster.bootstrap: true` only on the first install; subsequent pods join the cluster via the standard StatefulSet identity. Flip `cluster.fsyncOnCommit: true` when crash safety is more important than write throughput.

## Config advice

Keep local commands explicit. Keep production config declarative. Avoid relying on implicit defaults for data paths, identity, TLS, and raft membership.

## Operations: Backup And Restore

Rendered: https://docs.cefasdb.com/docs/Operations-Backup-And-Restore
Markdown: https://docs.cefasdb.com/wiki/Operations-Backup-And-Restore.md

# Operations: Backup And Restore

Backups are named recovery points. They let operators checkpoint one or more tables and restore a source table into a target table name.

## Create a backup

```sh
cefas create-backup \
  --backup-name nightly \
  --table-name Users
```

Multiple tables can be included when the command supports repeated table names.

## List backups

```sh
cefas list-backups
```

The response includes backup names and metadata needed to identify recovery points.

## Restore a table

```sh
cefas restore-table-from-backup \
  --backup-name nightly \
  --source-table-name Users \
  --target-table-name Users_restored
```

Restoring into a new target table is safer than overwriting a live table. Verify the restored data before cutting application traffic over.

## Runbook

1. Confirm the source table and backup name.
2. Restore into a new target table.
3. Describe the restored table.
4. Query known keys.
5. Run application-level verification.
6. Switch traffic or export only after verification.

## Backup policy

Choose backup frequency based on your recovery point objective (RPO). Store backups on durable storage and test restore regularly. A backup that has never been restored is not a proven backup.
Backups capture a Pebble checkpoint of the node they ran on. In raft mode the safest place to take a scheduled backup is the leader, or a follower whose applied log is known to be caught up; a backup taken from a stale follower will be missing whatever entries had not yet been applied locally. The restore path replays the checkpoint into a new table, so restore-time consistency reflects the source checkpoint's position in the raft log. See [Concepts: Storage and Replication](Concepts-Storage-And-Replication#consistency-model).

## Operations: Observability

Rendered: https://docs.cefasdb.com/docs/Operations-Observability
Markdown: https://docs.cefasdb.com/wiki/Operations-Observability.md

# Operations: Observability

CefasDB exposes logs, metrics, and tracing hooks. Use them together: logs explain what happened, metrics show whether it is still happening, and traces show where time was spent.

## Logs

The server logs startup configuration, listener setup, storage errors, raft events, and request-level failures. In containers, collect stdout and stderr with the platform log collector.

## Metrics

Metrics are served on the HTTP listener unless disabled. Prometheus configuration examples live under:

```text
deploy/prometheus/
```

Grafana dashboard examples live under:

```text
deploy/grafana/
```

Track at least:

- Request rate.
- Request latency.
- Error rate.
- Storage write latency.
- Storage read latency.
- Raft leader and replication health.
- Backup and restore outcomes.
- Process memory and file descriptors.

## Tracing

Tracing is configured with an OTLP/gRPC endpoint. Use it when request latency must be broken down across API handlers, storage, raft, and plugin behavior.

## Debug workflow

1. Identify the failing operation.
2. Check logs for direct errors.
3. Check metrics for saturation or spikes.
4. Use tracing to isolate slow layers.
5. Use `cefas explain` for query-specific issues.
6. Use `cefas cluster status` for raft or membership issues.
## Operations: Benchmark Results

Rendered: https://docs.cefasdb.com/docs/Operations-Benchmark-Results
Markdown: https://docs.cefasdb.com/wiki/Operations-Benchmark-Results.md

# Operations: Benchmark Results

This page records reproducible local benchmark results for CefasDB. The goal is to separate functional smoke tests from sustained load evidence and to make every result traceable to a command, deployment mode, and workload shape.

## Test environment

The results below were captured on June 10, 2026 on a local development machine running Docker Desktop. The cluster tests used three CefasDB server containers in raft mode, each with its own persistent Docker volume.

Original container VM snapshot:

```text
Architecture: arm64
CPUs: 16
Memory: 7.653 GiB
Deployment: Docker Compose, 3 raft voters
Client path: local gRPC over localhost port mappings
```

The storage-profile comparison below also includes a later run on the same machine after Docker Desktop memory was raised to about 63.42 GiB.

These numbers are local benchmark results, not a hosted-service SLA. They are useful for validating the engine path, raft replication path, client behavior, and the shape of tail latency under controlled local pressure.

## Load tester

The benchmark client uses the Go gRPC client directly. It supports:

- Leader discovery across multiple gRPC endpoints.
- Batched writes through `BatchWriteItem`.
- Point reads through `GetItem`.
- Fixed-volume runs.
- Duration-based runs.
- Optional target rates for soak tests.
- Latency sampling for long-running workloads.
- JSON summary output.

The main flags used by these tests were:

```text
-addrs localhost:9191,localhost:9192,localhost:9193
-batch-size
-workers
-read-workers
-payload-bytes
-write-duration
-read-duration
-write-rate
-read-rate
-json-output
```

## Single-node direct gRPC baseline

The first direct gRPC test was run against a single local CefasDB server.
It used one table, 1,000,000 writes, 200,000 point reads, batch writes of 500 items, 64 write workers, and 64 read workers.

Result:

```text
write units: 1,000,000
write RPCs: 2,000
write elapsed: 4.93s
write throughput: 202,830 items/s
write errors: 0
write p50: 114.974ms
write p95: 405.899ms
write p99: 459.007ms
read units: 200,000
read elapsed: 2.026s
read throughput: 98,709 reads/s
read errors: 0
read p50: 611us
read p95: 1.044ms
read p99: 1.285ms
read found: 200,000/200,000
```

Conclusion: the direct gRPC path removes the CLI subprocess overhead and exposes a much stronger engine baseline. Reads were especially strong, with sub-millisecond p50 and low-millisecond p99 in the baseline run.

## Three-node raft cluster bulk run

The three-node raft cluster test used Docker Compose with three raft voters. The load tester discovered the current leader and wrote through that node. The workload used 2,000,000 writes, 500,000 point reads, batch writes of 500 items, 64 write workers, 64 read workers, and a 256-byte payload attribute.

Result:

```text
write units: 2,000,000
write RPCs: 4,000
write elapsed: 21.55s
write throughput: 92,809 items/s
write errors: 0
write p50: 270.691ms
write p95: 705.782ms
write p99: 1.175971s
read units: 500,000
read elapsed: 7.822s
read throughput: 63,919 reads/s
read errors: 0
read p50: 907us
read p95: 1.513ms
read p99: 3.385ms
read found: 500,000/500,000
```

Follower read validation:

```text
n1 follower read throughput: 52,857 reads/s, found 100,000/100,000
n3 follower read throughput: 53,995 reads/s, found 100,000/100,000
```

Conclusion: with raft replication enabled, CefasDB sustained a high write rate and still served low-latency point reads. The write p99 reflects batch RPC latency, not per-item latency; each write RPC in this run carried 500 items.

## Failover validation

The current leader was stopped, the remaining nodes elected a new leader, and the client then ran a smaller write/read workload against the surviving cluster.
Result:

```text
new leader: n3
write units: 100,000
write elapsed: 412ms
write throughput: 242,772 items/s
write errors: 0
write p50: 29.483ms
write p95: 50.120ms
write p99: 52.528ms
read units: 10,000
read throughput: 48,242 reads/s
read errors: 0
read p50: 619us
read p95: 1.042ms
read p99: 1.335ms
read found: 10,000/10,000
```

Conclusion: the cluster accepted writes after leader loss and preserved read correctness for the tested keyspace.

## Controlled soak test

The controlled soak test ran for ten minutes total: five minutes of writes followed by five minutes of reads. It intentionally used target rates rather than maximum pressure so the run could verify sustained stability instead of turning into a pure disk-saturation test.

Workload:

```text
cluster: 3 raft voters
table: ClusterSoak5m_20260610
write duration: 5m
read duration: 5m
write target rate: 10,000 items/s
read target rate: 20,000 reads/s
batch size: 250
write workers: 32
read workers: 64
payload bytes: 64
latency sample rate: 1 sample per 20 RPCs
```

Write result:

```text
write units: 3,000,000
write RPCs: 12,000
write elapsed: 300.001s
write throughput: 9,999.96 items/s
write errors: 0
write p50: 6.811ms
write p95: 10.687ms
write p99: 13.920ms
write max: 25.694ms
```

Read result:

```text
read units: 6,000,000
read RPCs: 6,000,000
read elapsed: 300.026s
read throughput: 19,998.26 reads/s
read errors: 0
read found: 6,000,000/6,000,000
read p50: 270us
read p95: 453us
read p99: 665us
read max: 6.807ms
```

Post-soak validation:

```text
read units: 10,000
read throughput: 9,991 reads/s
read errors: 0
read found: 10,000/10,000
read p50: 206us
read p95: 337us
read p99: 1.828ms
```

The JSON report for this run was written locally as:

```text
/tmp/cefas-bench/soak_5m_20260610.json
```

Conclusion: the controlled soak passed cleanly. CefasDB sustained the target write and read rates for the full duration, found every key it read (6,000,000/6,000,000), and kept read tail latency below 1 ms p99 during the read phase.
## Storage profile comparison

The storage profile tests were run after adding explicit Pebble tuning profiles, LSM backpressure, raft/data store separation, admin compaction, and load-test scripts. The workload used a three-node raft cluster, 64 write workers, 64 read workers, 500-item write batches, and a 256-byte payload attribute.

The two practical scenarios are:

| Scenario | Docker memory | Profile | Outcome |
| --- | ---: | --- | --- |
| Portable local default | 7.653 GiB | `balanced` | Completed full bulk and 30-minute soak. |
| Performance workstation | 63.42 GiB | `write-heavy` | Completed full bulk and 30-minute soak with better write latency. |

The failed case was also useful: with only 7.653 GiB available to Docker, `write-heavy` completed the bulk phase but the third raft node was OOM-killed during snapshot pressure in the soak phase. The error was environmental memory pressure, not a data correctness failure; the bulk phase had zero write/read errors and found all 500,000 keys.
### Balanced profile on Docker Desktop memory defaults

Command shape:

```sh
STORAGE_PROFILE=balanced \
PROJECT=cefas-loadtest-balanced-013 \
RESET_CLUSTER=1 \
RESULT_DIR=/tmp/cefas-bench/load-balanced-0.1.3-20260610T164320Z \
scripts/bench_cluster.sh
```

Bulk result:

```text
write units: 2,000,000
write elapsed: 9.528s
write throughput: 209,900 items/s
write errors: 0
write p50: 146.717ms
write p95: 207.670ms
write p99: 258.980ms
read units: 500,000
read elapsed: 7.627s
read throughput: 65,559 reads/s
read errors: 0
read p50: 934us
read p95: 1.482ms
read p99: 1.968ms
read found: 500,000/500,000
```

Soak result:

```text
write duration: 15m
write units: 13,500,000
write throughput: 14,999.97 items/s
write errors: 0
write p50: 17.125ms
write p95: 27.144ms
write p99: 41.636ms
read duration: 15m
read units: 18,000,000
read throughput: 19,996.97 reads/s
read errors: 0
read p50: 287us
read p95: 702us
read p99: 2.160ms
read found: 18,000,000/18,000,000
```

Final memory snapshot:

```text
n1: 528.9 MiB
n2: 1.621 GiB
n3: 1.141 GiB
leader: n2
```

Conclusion: `balanced` is the right portable default. It completed the sustained benchmark under the local Docker memory budget and kept read latency low.
### Write-heavy profile with a 64 GiB Docker VM

Command shape:

```sh
STORAGE_PROFILE=write-heavy \
PROJECT=cefas-loadtest-writeheavy-64g \
RESET_CLUSTER=1 \
RESULT_DIR=/tmp/cefas-bench/load-writeheavy-64g-0.1.3-20260610T180514Z \
scripts/bench_cluster.sh
```

Bulk result:

```text
write units: 2,000,000
write elapsed: 8.739s
write throughput: 228,855 items/s
write errors: 0
write p50: 134.849ms
write p95: 189.938ms
write p99: 231.759ms
read units: 500,000
read elapsed: 7.442s
read throughput: 67,183 reads/s
read errors: 0
read p50: 925us
read p95: 1.417ms
read p99: 1.731ms
read found: 500,000/500,000
```

Soak result:

```text
write duration: 15m
write units: 13,500,000
write throughput: 14,999.97 items/s
write errors: 0
write p50: 15.480ms
write p95: 20.760ms
write p99: 38.888ms
read duration: 15m
read units: 17,999,999
read throughput: 19,997.01 reads/s
read errors: 0
read p50: 273us
read p95: 900us
read p99: 1.364ms
read found: 17,999,999/17,999,999
```

Final memory snapshot:

```text
n1: 2.590 GiB
n2: 2.651 GiB
n3: 4.436 GiB
leader: n3
```

Conclusion: `write-heavy` is the faster ingest profile when Docker has enough memory. In this run it improved bulk write throughput and reduced sustained write p95 compared with `balanced`, while keeping all nodes alive.

### Operational choice

Use `balanced` when:

- The cluster must run on default Docker Desktop memory limits.
- The run is a developer smoke, CI-like check, or portable reproduction.
- Stability matters more than maximum ingest throughput.

Use `write-heavy` when:

- Docker or the target host has enough memory for larger Pebble caches and memtables.
- The workload is sustained ingest or benchmark-oriented.
- Resource monitoring is available for memory, compaction, snapshots, and raft health.

## Overall conclusion

The current implementation is no longer just passing smoke tests.
It has early evidence of a strong operational database core:

- Direct gRPC single-node writes exceeded 200,000 items/s in a short maximum-pressure run.
- A three-node raft cluster sustained more than 200,000 replicated write items/s in the latest bulk profile runs.
- A controlled raft soak sustained 15,000 writes/s for fifteen minutes, followed by about 20,000 reads/s for fifteen minutes (thirty minutes total), with zero client-visible errors.
- Point reads remained consistently low-latency in both bulk and soak tests.
- Follower reads returned the expected data after raft replication.
- The cluster accepted new writes after manual leader loss and re-election.

The strongest result is read stability: in the ten-minute controlled soak, point reads stayed sub-millisecond at both p50 (270us) and p99 (665us); in the thirty-minute storage-profile soaks, read p99 was 1.364ms with `write-heavy` and 2.160ms with `balanced`. The highest-pressure replicated write tests also showed strong throughput, but their p99 must be interpreted as batch RPC latency because each write RPC in those runs carried 500 items.

The next benchmark milestones are:

- One-hour and eight-hour soak runs.
- Mixed read/write workloads running at the same time.
- Runs with `-fsync` enabled.
- Larger payload sizes and larger table cardinality.
- Compaction and snapshot pressure tracking.
- Remote-client tests instead of loopback-only clients.
- Resource charts for CPU, memory, disk writes, and raft replication lag.

## Operations: Security And Privacy

Rendered: https://docs.cefasdb.com/docs/Operations-Security-And-Privacy
Markdown: https://docs.cefasdb.com/wiki/Operations-Security-And-Privacy.md

# Operations: Security And Privacy

Security in CefasDB has three layers: transport, identity, and operation authorization. Privacy-sensitive audience workflows add a fourth layer: avoid exporting raw member identity when aggregate answers are enough.

## Transport

Use TLS for production gRPC. Plaintext `--insecure` is for local development and trusted test networks only.

## Identity

Configure JWKS, issuer, and audience so requests must present a valid bearer token.
Keep clock skew small and monitor auth failures after identity-provider changes.

## Authorization

Use scoped tokens. Separate table read, table write, table admin, plugin, backup, and cluster operations. Do not give broad admin tokens to application services.

## Audience privacy guarantees

The audience workflow is designed around server-side selection and aggregate reporting:

- There is no general `list-audience-members` reporting surface.
- Approximate reach uses HyperLogLog.
- Dedup and frequency cap return boolean verdicts, not stored keys or counters.
- Aggregation supports `--min-group-size` to avoid small-cohort disclosure.
- Plugin index state stays server-side.

## Threat model

CefasDB reduces accidental raw identity export from audience workflows. It does not provide formal differential privacy, does not eliminate all timing side channels, and cannot prevent linkage attacks against external datasets. Treat privacy floors as a minimum operational control, not a mathematical guarantee.

## Practical checklist

- Enable TLS.
- Enable bearer-token validation.
- Use least-privilege scopes.
- Keep admin operations off public networks.
- Set `--min-group-size` for audience reporting.
- Avoid building raw export commands for cohorts.
- Audit logs around backup, restore, and admin actions.

## API Reference

Rendered index: https://docs.cefasdb.com/docs/api
Machine index: https://docs.cefasdb.com/api/index.json

- [pkg/client](https://docs.cefasdb.com/docs/api/pkg-client):