ClickHouse connector for feature-flag event stores: Design patterns and query examples

Blueprint for building a ClickHouse connector for feature-flag events: schemas, ingestion, SQL patterns, and performance tips for rapid analysis.

Stop guessing: get answers from your flags in minutes

Feature flags and experiments generate huge volumes of events every minute. Without a focused OLAP design you face slow queries, incomplete audit trails, and risky rollouts. This blueprint shows how to build a ClickHouse connector for feature-flag events that delivers sub-second analytics, reliable audits, and pre-aggregates for dashboards — using proven 2026 patterns for ingestion, schema, and SQL.

What you'll get (most important first)

  • A compact event schema for flag evaluations, experiment exposures, and audits.
  • An ingestion blueprint using ClickHouse Kafka engine (or HTTP/Vector) and Materialized Views to populate MergeTree tables.
  • SQL patterns: raw exploration, cohort queries, conversion uplift, uniqueness, time series, and audit retrieval.
  • Performance & maintenance guidance: partitioning, ORDER BY, TTL/tiering, projections, and pre-aggregations.

Context: Why ClickHouse for feature-flag events in 2026

ClickHouse continues to expand as the default OLAP backbone for event analytics. In late 2025 the company raised major growth capital, reflecting accelerated adoption across observability, product analytics, and experimentation workloads. For 2026, ClickHouse offers cloud-native tiering, projections, and tighter streaming connectors that make it a practical choice for real-time flag telemetry and auditability.

“An OLAP-first approach for flag events lets teams answer production-impact questions fast: exposure rates, rollback impact, and experiment deltas.”

High-level architecture

Keep ingestion simple and idempotent. Common pipeline:

  1. SDKs emit events (flag-eval, exposure, metric, audit) to a streaming bus (Kafka/Kinesis) or HTTP collector (Vector/Fluent Bit).
  2. A ClickHouse Kafka table or an ingestion agent consumes the stream and writes to a staging table.
  3. Materialized Views transform/enrich events and populate finalized MergeTree tables with proper partitioning.
  4. Optional materialized aggregates and projections power dashboards and SLA queries.

Schema design

Design three core event tables. Keep raw JSON for forward compatibility, but project the key fields into typed columns for query speed.

1) flag_evaluations (every SDK evaluation)

CREATE TABLE flag_evaluations_raw (
    payload String  -- one raw JSON document per message; the MV below parses it
) ENGINE = Kafka SETTINGS
    kafka_broker_list = 'kafka1:9092',
    kafka_topic_list = 'flags-eval',
    kafka_group_name = 'clickhouse-flag-evals',
    kafka_format = 'JSONAsString';

CREATE TABLE flag_evaluations (
    event_id UUID,
    flag_key String,
    flag_version UInt64,
    user_id String,
    user_properties Map(String, String),
    evaluated_value String,
    is_enabled UInt8,
    reason_code String,
    event_time DateTime64(3),
    ingestion_time DateTime DEFAULT now()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (flag_key, user_id, event_time);

Keep user_properties as a Map so you can filter or run analytics without shredding every property upfront.
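
For example, you can filter on an ad-hoc property directly; a quick sketch (the 'country' key is illustrative, not part of the schema above):

-- count evaluations for users with a given property, no dedicated column needed
SELECT count() AS evals
FROM flag_evaluations
WHERE flag_key = 'checkout_v2'
  AND user_properties['country'] = 'DE';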

2) experiment_exposures (variant-level exposures)

CREATE TABLE experiment_exposures (
    exposure_id UUID,
    experiment_key String,
    variant String,
    user_id String,
    eligible UInt8,
    event_time DateTime64(3),
    context Map(String, String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (experiment_key, variant, user_id, event_time);

3) flag_audits (configuration changes & admin actions)

CREATE TABLE flag_audits (
    audit_id UUID,
    flag_key String,
    actor_id String,
    action String,
    change_delta String, -- JSON blob describing the change
    version UInt64,
    created_at DateTime64(3)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (flag_key, version, created_at);

Design note: keep audits immutable. That gives you an exact rollback trail for compliance and postmortems.

Ingestion patterns: Kafka Engine + Materialized Views

Use ClickHouse's native Kafka engine to stream raw JSON rows into ClickHouse and a Materialized View to transform them into typed rows. This reduces end-to-end latency and centralizes enrichment logic.

-- raw kafka table already shown above
CREATE MATERIALIZED VIEW mv_flags_evals
TO flag_evaluations
AS SELECT
  toUUID(JSONExtractString(payload, 'event_id')) AS event_id,
  JSONExtractString(payload, 'flag_key') AS flag_key,
  JSONExtractUInt(payload, 'flag_version') AS flag_version,
  JSONExtractString(payload, 'user_id') AS user_id,
  JSONExtract(payload, 'user_properties', 'Map(String, String)') AS user_properties,
  JSONExtractString(payload, 'evaluated_value') AS evaluated_value,
  toUInt8(JSONExtractBool(payload, 'is_enabled')) AS is_enabled,
  JSONExtractString(payload, 'reason_code') AS reason_code,
  parseDateTime64BestEffort(JSONExtractString(payload, 'event_time')) AS event_time,
  now() AS ingestion_time
FROM flag_evaluations_raw;

Alternative: ingest via Vector/HTTP directly to the typed MergeTree tables to avoid an extra MV if you have strong guarantees on event structure.

Deduplication and idempotency

Streams can deliver duplicates. Implement dedupe in one of these ways:

  • At the producer: include event_id (UUID) and generate deterministic IDs for retries.
  • Staging + ReplacingMergeTree: write to a ReplacingMergeTree keyed by event_id and use the latest version to dedupe periodically.
  • Consumer-side dedupe: use a small in-memory cache (Redis) in the consumer to drop duplicates before ClickHouse insertion for high-volume real-time paths.

Example: staging dedupe using ReplacingMergeTree

CREATE TABLE flag_evals_staging (
  event_id UUID,
  event_time DateTime64(3),
  payload String,
  ingestion_ts DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(ingestion_ts)
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_id);
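
ReplacingMergeTree collapses duplicates at merge time, not at insert time. To read with duplicates already collapsed, query with FINAL; a sketch (FINAL trades query speed for correctness, so keep it off hot dashboard paths):

-- collapse duplicate event_ids at query time, before background merges run
SELECT event_id, payload
FROM flag_evals_staging FINAL
WHERE event_time >= now() - INTERVAL 1 DAY;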
  

Pre-aggregates and performance patterns

Raw event tables are great for ad-hoc analysis but slow for dashboard queries. Create pre-aggregates using Materialized Views into SummingMergeTree/AggregatingMergeTree.

-- target table first; AggregatingMergeTree merges the stored aggregate states
CREATE TABLE daily_flag_aggregates_table (
  flag_key String,
  day Date,
  is_enabled UInt8,
  unique_users AggregateFunction(uniqCombined, String),
  eval_count SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (flag_key, day, is_enabled);

CREATE MATERIALIZED VIEW daily_flag_aggregates
TO daily_flag_aggregates_table
AS SELECT
  flag_key,
  toDate(event_time) AS day,
  is_enabled,
  uniqCombinedState(user_id) AS unique_users,
  count() AS eval_count
FROM flag_evaluations
GROUP BY flag_key, day, is_enabled;

Use uniqCombined() for large-scale distinct counts; only use uniqExact for small datasets or verification because it’s memory heavy.
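
At read time, merge the stored states back into numbers; a sketch of the dashboard query this table serves:

-- finalize aggregate states with the -Merge combinator
SELECT
  flag_key,
  day,
  uniqCombinedMerge(unique_users) AS unique_users,
  sum(eval_count) AS evals
FROM daily_flag_aggregates_table
WHERE flag_key = 'checkout_v2'
GROUP BY flag_key, day
ORDER BY day;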

SQL patterns: exploration to production queries

1) Basic counts and uniques

-- total evaluations and unique users for a flag in last 7 days
SELECT
  count() AS evals,
  uniqCombined(user_id) AS unique_users
FROM flag_evaluations
WHERE flag_key = 'checkout_v2'
  AND event_time >= now() - INTERVAL 7 DAY;
  

2) Exposure % over time (time series)

SELECT
  toStartOfInterval(event_time, INTERVAL 1 HOUR) AS ts,
  sum(is_enabled) / count() AS pct_enabled
FROM flag_evaluations
WHERE flag_key = 'checkout_v2'
  AND event_time >= now() - INTERVAL 2 DAY
GROUP BY ts
ORDER BY ts;

3) Audit trail lookup

SELECT *
FROM flag_audits
WHERE flag_key = 'checkout_v2'
ORDER BY created_at DESC
LIMIT 100;
  

Include actor_id and change_delta to answer “who changed what and when”.
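
Because change_delta stores raw JSON, individual fields can be pulled out at query time; a sketch (the 'rollout_percent' key is illustrative):

-- surface a specific field from the JSON delta alongside the actor
SELECT created_at, actor_id, action, version,
       JSONExtractRaw(change_delta, 'rollout_percent') AS rollout_change
FROM flag_audits
WHERE flag_key = 'checkout_v2'
ORDER BY created_at DESC
LIMIT 20;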

4) Experiment conversion and uplift (z-test)

Join exposures to metric events (assume metric_events table). This SQL computes conversion rates and a z-test p-value as an approximate significance check.

WITH stats AS (
  SELECT
    variant,
    count() AS exposures,
    sum(converted) AS conversions
  FROM (
    -- one row per exposed user, so repeat purchases don't inflate counts;
    -- with the default join_use_nulls = 0, unmatched rows get '' for m.user_id
    SELECT e.user_id, e.variant, max(m.user_id != '') AS converted
    FROM experiment_exposures AS e
    LEFT JOIN metric_events AS m
      ON e.user_id = m.user_id
      AND m.metric = 'purchase'
      AND m.event_time BETWEEN e.event_time AND e.event_time + INTERVAL 7 DAY
    WHERE e.experiment_key = 'pricing_test' AND e.event_time >= now() - INTERVAL 14 DAY
    GROUP BY e.user_id, e.variant
  )
  GROUP BY variant
)
SELECT
  variant,
  exposures,
  conversions,
  conversions / exposures AS conv_rate,
  -- z-test components
  conversions / exposures AS p_hat,
  (conversions / exposures) * (1 - conversions / exposures) / exposures AS var_hat,
  (conversions / exposures - (SELECT conversions / exposures FROM stats WHERE variant = 'control')) /
    sqrt(var_hat + (SELECT (conversions / exposures) * (1 - conversions / exposures) / exposures FROM stats WHERE variant = 'control')) AS z_score
FROM stats;

For production you should use a statistics library for p-values and corrections, but ClickHouse can get you near real-time approximations for triage.
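
That said, an approximate two-sided p-value is possible without leaving SQL, since the normal CDF can be written with erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))). A minimal sketch:

-- two-sided p-value for a z-score: p = 2 * (1 - Phi(|z|)) = 1 - erf(|z| / sqrt(2))
SELECT 1 - erf(abs(1.96) / sqrt(2)) AS p_value; -- ~0.05 at |z| = 1.96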

5) Rollback detection: how many users saw a previously-enabled flag

SELECT
  flag_key,
  count() AS impacted_users
FROM (
  SELECT user_id, flag_key, max(is_enabled) AS ever_enabled
  FROM flag_evaluations
  WHERE event_time >= now() - INTERVAL 30 DAY
  GROUP BY user_id, flag_key
  HAVING ever_enabled = 1
)
GROUP BY flag_key
ORDER BY impacted_users DESC;
  

Operational & cost controls (2026 best practices)

  • Partitioning: partition by month (toYYYYMM(event_time)) for most workloads; move to daily partitions only at extremely high event volumes.
  • ORDER BY: choose keys that accelerate your most common filters: flag_key, experiment_key, user_id and event_time.
  • Tiering & TTL: move raw events older than 90 days to object storage via ClickHouse's disk policies, and use TTL rules to delete or recompress (see the sketch after this list).
  • Projections: use ClickHouse projections for fast, automatic precomputation on hot tables (also sketched below).
  • Dictionaries: load static flag metadata into ClickHouse dictionaries for fast enrichment without joins.
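
A sketch of the TTL and projection bullets above; the 's3_cold' volume name and the by_user projection are illustrative, so adapt them to your storage policy and hottest filters:

-- move raw evaluations to a cold volume after 90 days; drop them after 13 months
-- (assumes a storage policy that exposes a volume named 's3_cold')
ALTER TABLE flag_evaluations
  MODIFY TTL event_time + INTERVAL 90 DAY TO VOLUME 's3_cold',
             event_time + INTERVAL 13 MONTH DELETE;

-- precompute a user-ordered copy of the table for fast user_id lookups
ALTER TABLE flag_evaluations
  ADD PROJECTION by_user (SELECT * ORDER BY user_id, event_time);
ALTER TABLE flag_evaluations MATERIALIZE PROJECTION by_user;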

ETL and backfill patterns

For backfills and schema evolution, follow this two-phase approach:

  1. Ingest historical files into a staging table using INSERT FORMAT JSONEachRow.
  2. Deduplicate with SELECT ... GROUP BY event_id and write into the final table via INSERT SELECT, as shown below. Fall back to time-window dedupe when event IDs are not present.
-- Load historical files into staging (payload elided)
INSERT INTO flag_evals_staging FORMAT JSONEachRow
{...}

-- Deduplicate by event_id, parse the payload, and insert into the final table
INSERT INTO flag_evaluations
SELECT
  event_id,
  any(JSONExtractString(payload, 'flag_key')) AS flag_key,
  any(JSONExtractUInt(payload, 'flag_version')) AS flag_version,
  any(JSONExtractString(payload, 'user_id')) AS user_id,
  any(JSONExtract(payload, 'user_properties', 'Map(String, String)')) AS user_properties,
  any(JSONExtractString(payload, 'evaluated_value')) AS evaluated_value,
  any(toUInt8(JSONExtractBool(payload, 'is_enabled'))) AS is_enabled,
  any(JSONExtractString(payload, 'reason_code')) AS reason_code,
  min(event_time) AS event_time,
  now() AS ingestion_time
FROM flag_evals_staging
GROUP BY event_id;

Observability, SLA and security

  • Emit ingestion metrics (lag, insert failures) to Prometheus and correlate with ClickHouse system tables (system.metrics, system.parts).
  • Protect audit tables: grant read-only roles for auditors and use RBAC for write operations.
  • Encrypt data at rest and use network policies when ClickHouse is self-managed; as of 2026, managed ClickHouse clouds offer built-in key management and IAM.
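
A minimal sketch of the system-table check mentioned above, using the standard system.parts table:

-- active part counts and row totals per table: a rough ingestion health check
SELECT table, count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table
ORDER BY active_parts DESC;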

Example: from SDK to dashboard in under a minute

  1. User hits feature-flag SDK; SDK emits evaluation JSON to Kafka.
  2. ClickHouse Kafka table ingests raw JSON; Materialized View writes typed row to flag_evaluations (sub-second).
  3. Pre-aggregate MV updates daily_flag_aggregates_table for dashboards.
  4. Data analyst runs a conversion uplift query joining exposures with metric_events — results returned in seconds.

Common pitfalls and how to avoid them

  • Storing everything as JSON: parse and type the fields you query frequently to gain 10-100x speedups.
  • No dedupe plan: duplicates inflate exposure counts; include event_id or use dedupe MVs.
  • Poor ORDER BY choices: queries that rely on user_id scans will be slow — pick order keys aligned with filters.
  • Missing audit immutability: never overwrite audit rows in-place; append-only gives forensics for production incidents.

Trends to watch

Expect three trends to change how teams use ClickHouse for feature flags:

  1. Stronger native streaming connectors (Kafka, Kinesis, Pulsar) and managed ClickHouse offerings reducing operational load.
  2. Wider adoption of projections and per-table tiering, making raw-event retention cheaper and queries faster.
  3. More integrations between feature management platforms and ClickHouse ecosystems — enabling first-class experiment analytics with lower latency.

These trends are driven by enterprise demand for low-latency experimentation analytics and recent 2025 momentum in OLAP investment.

Actionable checklist (implement in 1-2 sprints)

  1. Instrument SDKs to emit typed event_id and event_time for all events.
  2. Deploy a Kafka/HTTP ingestion path and create a raw Kafka table in ClickHouse.
  3. Create typed MergeTree tables for flag_evaluations, experiment_exposures and flag_audits.
  4. Build Materialized Views to transform raw events into typed rows and pre-aggregates for dashboards.
  5. Implement deduplication (staging + ReplacingMergeTree) and a TTL/tiering policy for cold data.
  6. Create standard SQL reports: exposure rate, experiment uplift, audit history, rollback impact.

Closing thoughts

By 2026, ClickHouse is the pragmatic OLAP choice for feature-flag event stores: it delivers low-latency answers, scales to billions of evaluations, and integrates well with streaming ecosystems. The patterns above — typed tables, materialized views, pre-aggregates, and audit immutability — give you a production-ready blueprint for fast analysis, safe rollouts, and clear audit trails.

Next steps (call to action)

Start with the schema and ingestion MV above in a staging cluster. Run a 2-week pilot with one high-traffic flag and a single experiment. If you want a ready-made connector blueprint or a review of your schema and ORDER BY choices, reach out to our engineering team to get a tailored implementation plan.
