Using ClickHouse for fast feature-flag analytics: Architecting observability at scale


toggle
2026-01-24 12:00:00
10 min read

Pipeline flag evaluation events into ClickHouse for low-latency analytics, segmentation and near-real-time experiment metrics.

Ship faster, measure instantly: why feature-flag analytics need a real OLAP backbone

Pain point: product, QA, and engineering teams are blind to whether a feature flag is actually changing user behavior, and experiments take hours (or days) to produce reliable numbers. If rollback is manual and slow, risk increases.

By 2026, teams expect near-real-time experiment metrics, low-latency segmentation queries, and scalable event pipelines that keep toggle metadata and evaluation events auditable. ClickHouse — now a dominant OLAP choice after major growth in late 2024–2025 — is uniquely positioned to serve feature-flag observability at scale. This article shows a production-ready architecture and practical SQL + DDL to pipeline flag evaluation events into ClickHouse for low-latency analytics, segmentation, and experiment metrics.

Executive summary (most important first)

  • Architecture: SDKs → streaming (Kafka/Kinesis/Pulsar) → ClickHouse Kafka engine → materialized views → Aggregating / MergeTree tables → Distributed query layer.
  • Latency: sub-second inserts and second-level aggregated metrics using materialized views and pre-aggregations; minute-level rollups for dashboards.
  • Scalability: use ReplicatedMergeTree + Distributed tables; partition by date; use projections or aggregated tables to control query cost.
  • Accuracy vs cost: keep per-event raw logs for 7–30 days, maintain minute- and hour-level pre-aggregations for reporting, and keep long-term rollups for compliance.

2026 context: why ClickHouse is the right choice now

In late 2025 ClickHouse continued to accelerate as an enterprise OLAP choice after significant funding and broader cloud adoption. The market trend is clear: teams are consolidating analytics and observability workloads on fast, columnar OLAP databases to get real-time insights without separate batch warehouses. For feature-flag analytics, that means:

  • High ingest rates for per-evaluation events (millions/sec at scale).
  • Sub-second to second query response for ad-hoc segmentation and experiment reports.
  • Cost controls through compression, partitioning and rollups.

Core design principles

  1. Store every evaluation event for short-term audit — raw events enable debugging, A/A checks, and dedup verification.
  2. Pre-aggregate aggressively into minute/hour buckets per flag & variant to serve dashboards and experiment metrics.
  3. Use idempotency and deduplication to protect from retries and duplicates from SDKs and network layers.
  4. Partition and order for your most common queries — queries by flag and time should read narrow primary-key ranges via the sparse index, not scan the whole table.
  5. Protect PII — hash user identifiers and use privacy-aware aggregation windows for compliance.

Event schema

Below are the fields we recommend capturing for each flag evaluation. Store raw events as JSONEachRow for flexibility, then transform them into typed columns in ClickHouse.

-- Raw event DDL (schematic)
CREATE TABLE flags_events_kafka (
  evaluation_id String,         -- unique per evaluation (UUID)
  timestamp DateTime64(3),      -- event time
  environment String,           -- prod/staging
  flag_key String,
  flag_version UInt64,
  variant String,               -- "on" | "off" | "blue" | ...
  user_id_hashed String,        -- hashed (SHA256/Blake2)
  user_props Nested(            -- optional properties for segmentation
    key String,
    value String
  ),
  sdk_name String,
  sdk_version String,
  evaluation_reason String,     -- e.g. "cache-hit" / "server-eval"
  sample_rate Float32
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'flags-evaluations',
         kafka_group_name = 'ch-flags-consumer',
         kafka_format = 'JSONEachRow';

Notes: JSONEachRow keeps SDK payloads flexible. Keep the Kafka-engine table a plain mirror of the message; derived fields such as event_date and any defaults are applied in the materialized view, and the strict typed schema lives in the downstream tables for performance.
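
For reference, a single message on the flags-evaluations topic might look like the line below. The values are invented for illustration; with ClickHouse's default flatten_nested behavior, the Nested user_props column is read from JSONEachRow as two parallel arrays.

{"evaluation_id":"6f1c2a0e-0b7d-4c44-9f1a-1d2e3c4b5a69","timestamp":"2026-01-24 12:00:00.123","environment":"prod","flag_key":"checkout_experiment","flag_version":7,"variant":"on","user_props.key":["country","plan"],"user_props.value":["US","pro"],"user_id_hashed":"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08","sdk_name":"web-js","sdk_version":"4.2.0","evaluation_reason":"server-eval","sample_rate":1.0}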

Ingestion patterns and deduplication

Network retries and client retries cause duplicate evaluation events. ClickHouse can dedupe using a ReplacingMergeTree or CollapsingMergeTree. We prefer ReplacingMergeTree with a version column (timestamp) so the latest row wins, plus a materialized view that transforms Kafka inserts into the typed table. Because ReplacingMergeTree collapses rows that share the full sorting key, evaluation_id must be part of ORDER BY.

-- Target raw table with deduplication
CREATE TABLE flags_events_raw (
  evaluation_id String,
  timestamp DateTime64(3),
  event_date Date,
  environment String,
  flag_key String,
  flag_version UInt64,
  variant String,
  user_id_hashed String,
  sdk_name String,
  sdk_version String,
  evaluation_reason String,
  sample_rate Float32
) ENGINE = ReplacingMergeTree(timestamp)
PARTITION BY toYYYYMMDD(event_date)
-- evaluation_id closes the sorting key so retried events collapse to one row on merge
ORDER BY (flag_key, event_date, timestamp, evaluation_id)
SETTINGS index_granularity = 8192;

-- Materialized view to move from Kafka engine to typed table
CREATE MATERIALIZED VIEW mv_flags_kafka_to_raw TO flags_events_raw AS
SELECT
  evaluation_id,
  timestamp,
  toDate(timestamp) AS event_date,
  environment,
  flag_key,
  flag_version,
  variant,
  user_id_hashed,
  sdk_name,
  sdk_version,
  evaluation_reason,
  sample_rate
FROM flags_events_kafka;

This pattern decouples ingestion from storage and uses ClickHouse's Kafka engine to consume streams with very low operational overhead.
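
Merges run asynchronously, so a duplicate can remain visible until ReplacingMergeTree compacts the part it landed in. For audits and spot checks (not dashboards), you can count duplicates explicitly or force deduplication at read time with FINAL; both queries below run against the tables defined above.

-- Spot-check deduplication: evaluation_ids seen more than once in the last hour
-- (expected to be near zero once background merges have run).
SELECT count() AS duplicated_ids
FROM
(
  SELECT evaluation_id
  FROM flags_events_raw
  WHERE timestamp >= now() - INTERVAL 1 HOUR
  GROUP BY evaluation_id
  HAVING count() > 1
);

-- Or apply deduplication at query time (slower; reserve for audits, not dashboards):
SELECT count() AS deduplicated_evals
FROM flags_events_raw FINAL
WHERE event_date = today();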

Near-real-time metrics: materialized aggregations

Per-event raw logs are heavy to query. Create minute-level aggregations for each flag and variant using AggregatingMergeTree. These feed dashboards and experiment analysis with second-level freshness.

-- Aggregated table (minute granularity)
CREATE TABLE flags_agg_minute (
  event_minute DateTime,
  flag_key String,
  environment String,
  variant String,
  users_agg AggregateFunction(uniqCombined, String),
  -- SimpleAggregateFunction(sum, ...) keeps re-summing these counters as parts merge;
  -- plain UInt64 columns would lose rows during background merges.
  evals SimpleAggregateFunction(sum, UInt64),
  conversions SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(event_minute)
-- environment is in the sorting key so per-environment rows are never merged together
ORDER BY (flag_key, environment, event_minute, variant);

-- Materialized view that populates the aggregation
CREATE MATERIALIZED VIEW mv_flags_to_agg TO flags_agg_minute AS
SELECT
  toStartOfMinute(timestamp) AS event_minute,
  flag_key,
  environment,
  variant,
  uniqCombinedState(user_id_hashed) AS users_agg,
  count() AS evals,
  sum(if(evaluation_reason = 'conversion_event', 1, 0)) AS conversions
FROM flags_events_raw
GROUP BY event_minute, flag_key, environment, variant;

Key points:

  • Use uniqCombined (or HLL types) for approximate unique user counts at scale.
  • Store both raw and aggregated so you can backfill, verify, and run A/A checks.
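
A lightweight way to exercise that guarantee is a periodic cross-check between the raw table and the rollup for the same flag and window; the query below is a sketch using the tables defined above (small drift at the window edges is expected because the rollup is bucketed by minute).

-- Cross-check: do raw events and the minute rollup agree over the last hour?
SELECT
  r.evals_raw,
  a.evals_agg,
  toInt64(r.evals_raw) - toInt64(a.evals_agg) AS drift
FROM
(
  SELECT count() AS evals_raw
  FROM flags_events_raw
  WHERE flag_key = 'checkout_experiment'
    AND environment = 'prod'
    AND timestamp >= now() - INTERVAL 1 HOUR
) AS r
CROSS JOIN
(
  SELECT sum(evals) AS evals_agg
  FROM flags_agg_minute
  WHERE flag_key = 'checkout_experiment'
    AND environment = 'prod'
    AND event_minute >= now() - INTERVAL 1 HOUR
) AS a;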

Distributed architecture and HA

For production scale use ReplicatedMergeTree on each shard and a Distributed table to present a single query surface:

CREATE TABLE flags_events_raw_replica ON CLUSTER prod_cluster AS flags_events_raw
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/flags_events_raw', '{replica}', timestamp)
PARTITION BY toYYYYMMDD(event_date)
ORDER BY (flag_key, event_date, timestamp, evaluation_id);

-- The Distributed table points at the replicated local table on each shard
CREATE TABLE flags_events_raw_dist ON CLUSTER prod_cluster AS flags_events_raw
ENGINE = Distributed(prod_cluster, currentDatabase(), flags_events_raw_replica, rand());

This provides horizontal scaling of storage and queries while keeping ingestion parallel.
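
The same pattern applies to the rollup. The sketch below assumes you also created the minute table as a replicated table on each shard, here named flags_agg_minute_replica (an illustrative name, not defined above):

-- Cluster-wide query surface for the minute rollup
-- (flags_agg_minute_replica is the assumed per-shard replicated table).
CREATE TABLE flags_agg_minute_dist ON CLUSTER prod_cluster AS flags_agg_minute
ENGINE = Distributed(prod_cluster, currentDatabase(), flags_agg_minute_replica, rand());

-- Dashboards query the Distributed table exactly like the local one:
SELECT variant, sum(evals) AS total_evals
FROM flags_agg_minute_dist
WHERE flag_key = 'checkout_experiment'
  AND environment = 'prod'
  AND event_minute >= now() - INTERVAL 1 HOUR
GROUP BY variant;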

Query examples: segmentation, experiment metrics, and p-values

Below are practical SQL queries you can use directly. They assume you have the aggregated minute table and a conversion event marker.

1) Conversion rate by variant (last 24 hours)

SELECT
  variant,
  sum(conversions) AS total_conversions,
  uniqCombinedMerge(users_agg) AS unique_users,
  round(total_conversions / unique_users, 6) AS conversion_rate
FROM flags_agg_minute
WHERE flag_key = 'checkout_experiment'
  AND environment = 'prod'
  AND event_minute >= now() - INTERVAL 24 HOUR
GROUP BY variant
ORDER BY conversion_rate DESC;

2) Quick significance test (z-test approximation)

This SQL computes a two-sided p-value using a normal approximation. It's suitable when counts are large; for small counts use exact or Bayesian methods.

WITH
  s AS (
    SELECT
      variant,
      sum(conversions) AS conv,
      uniqCombinedMerge(users_agg) AS users
    FROM flags_agg_minute
    WHERE flag_key = 'checkout_experiment'
      AND environment = 'prod'
      AND event_minute >= now() - INTERVAL 24 HOUR
    GROUP BY variant
  )
SELECT
  a.variant AS variant_a,
  b.variant AS variant_b,
  a.conv AS conv_a,
  b.conv AS conv_b,
  a.users AS n_a,
  b.users AS n_b,
  (a.conv / a.users) AS p_a,
  (b.conv / b.users) AS p_b,
  ( (a.conv / a.users) - (b.conv / b.users) ) AS diff,
  sqrt( (p_a * (1 - p_a) / a.users) + (p_b * (1 - p_b) / b.users) ) AS se,
  ( (p_a - p_b) / se ) AS z_score,
  2 * (1 - (0.5 * (1 + erf(abs(z_score) / sqrt(2))))) AS p_value
FROM s AS a
CROSS JOIN s AS b
WHERE a.variant = 'on' AND b.variant = 'off';

Explanation: compute conversion rates and standard error, derive z-score, then p-value via the error function (erf). ClickHouse supports erf in SQL for normal CDF calculations.

3) Segmentation: conversion rate by country and variant (assumes the rollup carries a country dimension; a sketch for adding one follows the query)

SELECT
  country,
  variant,
  sum(conversions) AS conv,
  uniqCombinedMerge(users_agg) AS users,
  conv / users AS conv_rate
FROM flags_agg_minute
WHERE flag_key = 'checkout_experiment'
  AND environment = 'prod'
  AND event_minute >= now() - INTERVAL 7 DAY
  AND country IN ('US','CA','GB','DE')
GROUP BY country, variant
ORDER BY country, conv_rate DESC;
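
The flags_agg_minute table defined earlier has no country column, so a rollup like the one sketched below is needed first. It assumes the SDK sends country in user_props under the key 'country' and that user_props is carried into flags_events_raw (or that you attach this view to the Kafka table instead); the table and view names are illustrative.

-- Minute rollup with a country dimension extracted from user_props
-- (assumes user_props reaches the raw table; 'country' is the assumed property key).
CREATE TABLE flags_agg_minute_by_country (
  event_minute DateTime,
  flag_key String,
  environment String,
  country LowCardinality(String),
  variant String,
  users_agg AggregateFunction(uniqCombined, String),
  evals SimpleAggregateFunction(sum, UInt64),
  conversions SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(event_minute)
ORDER BY (flag_key, environment, country, event_minute, variant);

CREATE MATERIALIZED VIEW mv_flags_to_agg_by_country TO flags_agg_minute_by_country AS
SELECT
  toStartOfMinute(timestamp) AS event_minute,
  flag_key,
  environment,
  -- pull the 'country' property out of the parallel key/value arrays
  user_props.value[indexOf(user_props.key, 'country')] AS country,
  variant,
  uniqCombinedState(user_id_hashed) AS users_agg,
  count() AS evals,
  sum(if(evaluation_reason = 'conversion_event', 1, 0)) AS conversions
FROM flags_events_raw
GROUP BY event_minute, flag_key, environment, country, variant;

With that in place, the segmentation query above targets flags_agg_minute_by_country rather than flags_agg_minute.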

Operational considerations

Retention and cost

Keep raw events for a short window (7–30 days) and store pre-aggregated rollups for longer (90–365 days). Use TTL expressions on tables to automatically drop raw partitions:

ALTER TABLE flags_events_raw
MODIFY TTL event_date + INTERVAL 30 DAY;   -- date-typed TTL key; whole daily partitions age out together

For long-term compliance, push daily aggregated snapshots to a cheaper data lake or warehouse with further compression; a daily rollup inside ClickHouse (sketched below) works well as the staging point for those exports.
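
The table and view below are a sketch of that daily rollup, following the same pattern as the minute rollup; the names are illustrative.

-- Daily rollup for long-term reporting; ages out after a year.
CREATE TABLE flags_agg_daily (
  event_date Date,
  flag_key String,
  environment String,
  variant String,
  users_agg AggregateFunction(uniqCombined, String),
  evals SimpleAggregateFunction(sum, UInt64),
  conversions SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (flag_key, environment, event_date, variant)
TTL event_date + INTERVAL 365 DAY;

-- Fold minute buckets into days; the unique-user sketches merge without recounting.
CREATE MATERIALIZED VIEW mv_minute_to_daily TO flags_agg_daily AS
SELECT
  toDate(event_minute) AS event_date,
  flag_key,
  environment,
  variant,
  uniqCombinedMergeState(users_agg) AS users_agg,
  sum(evals) AS evals,
  sum(conversions) AS conversions
FROM flags_agg_minute
GROUP BY event_date, flag_key, environment, variant;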

Handling high cardinality user properties

User properties (email, plan, custom segments) can explode cardinality. Use nested keys sparingly in raw logs and pre-compute the small set of properties you frequently query (country, plan, cohort) into the aggregated table. For ad-hoc exploration, use raw events but set expectations on query cost.
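
For that ad-hoc path, the Nested properties can be filtered with array functions on the raw events. A sketch, assuming user_props is retained on flags_events_raw and using 'plan' as an example property key:

-- Ad-hoc segmentation on raw events via the user_props key/value arrays
-- (assumes user_props is kept on the raw table; 'plan' is an illustrative key).
SELECT
  variant,
  count() AS evals,
  uniqCombined(user_id_hashed) AS users
FROM flags_events_raw
WHERE flag_key = 'checkout_experiment'
  AND environment = 'prod'
  AND event_date >= today() - 7
  AND user_props.value[indexOf(user_props.key, 'plan')] = 'pro'
GROUP BY variant;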

Backfills and consistency

When you backfill data (for example, reprocess a day of events), drop the affected rollup partitions and regenerate them from raw events with the same deterministic grouping keys, so the backfill can be re-run safely. Materialized views can be temporarily detached during large backfills, as sketched below.
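
A minimal version of that flow, using the tables defined above and an example date:

-- Optionally pause the live view while re-ingesting raw data for that day:
-- DETACH TABLE mv_flags_to_agg;   (and ATTACH TABLE mv_flags_to_agg; afterwards)

-- Rebuild one day of the minute rollup from raw events.
ALTER TABLE flags_agg_minute DROP PARTITION 20260123;

INSERT INTO flags_agg_minute
SELECT
  toStartOfMinute(timestamp) AS event_minute,
  flag_key,
  environment,
  variant,
  uniqCombinedState(user_id_hashed) AS users_agg,
  count() AS evals,
  sum(if(evaluation_reason = 'conversion_event', 1, 0)) AS conversions
FROM flags_events_raw
WHERE event_date = toDate('2026-01-23')
GROUP BY event_minute, flag_key, environment, variant;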

Monitoring and alerting

  • Track Kafka consumer lag, ClickHouse insert latencies, and materialized view insertion errors — instrumenting these is part of modern observability.
  • Alert on sudden drops in unique users or conversions per variant to catch instrumentation regressions.
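
ClickHouse's system tables cover part of this. As one example (a sketch, assuming the tables live in the default database), insert latency and failures touching the flag tables can be pulled from system.query_log; Kafka consumer lag itself is better tracked on the Kafka side or through your existing metrics stack.

-- Insert latency and failures for the flag tables over the last hour
-- (adjust the database prefix if the tables are not in 'default').
SELECT
  type,                                        -- QueryFinish vs ExceptionWhileProcessing
  count() AS inserts,
  quantile(0.95)(query_duration_ms) AS p95_ms
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND query_kind = 'Insert'
  AND type != 'QueryStart'
  AND hasAny(tables, ['default.flags_events_raw', 'default.flags_agg_minute'])
GROUP BY type;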

Privacy and compliance

By 2026, regulations have tightened. Hash user identifiers at the SDK level (salted) and store only hashed IDs in ClickHouse. For sensitive cohorts, compute aggregated metrics at the edge or use differential privacy mechanisms before storage.
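
Hashing belongs in the SDK, but the expected shape of the identifier is easy to pin down with ClickHouse's own hash functions. A minimal illustration with a placeholder salt (keep the real salt out of ClickHouse entirely):

-- Illustration of the salted, hashed identifier format SDKs should emit.
-- 'per-project-salt' and 'user-12345' are placeholders.
SELECT lower(hex(SHA256(concat('per-project-salt', 'user-12345')))) AS user_id_hashed;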

Emerging patterns in 2026

  • Edge evaluation + stream-first metrics: evaluate flags client-side or edge, emit compact evaluation events (hashed user id, variant, reason) and rely on streaming into ClickHouse. This reduces central evaluation latency and lets analytics stay consistent — tied to ideas in privacy-first personalization.
  • Projections & vectorized execution: use ClickHouse projections to pre-compute common GROUP BYs and accelerate queries further — combine with observability to track projection health.
  • Experimentation-as-a-product: many teams in 2026 treat experimentation as a product; central ClickHouse aggregates become the single source of truth for experiments across squads.
  • Hybrid pipelines: combine ClickHouse for near-real-time analytics and a data warehouse for complex, long-running joins — use a data catalog to manage schemas and retention.

Example production story (pattern you can replicate)

A mid-size fintech had 500M monthly evaluations and needed 1-minute experiment metrics. They implemented:

  1. SDKs emitting JSONEachRow to Kafka (evaluation_id, hashed user id, variant, timestamp, conversion marker).
  2. ClickHouse Kafka engine + materialized view into a ReplacingMergeTree raw table.
  3. Minute-level AggregatingMergeTree pre-aggregations feeding dashboards and an API for product teams.

Result: dashboards updated within 60s, A/B tests could be validated in hours instead of days, and rollback decisions were data-driven. Cost control came from 30-day raw retention plus compressed rollups kept for 365 days.

Checklist: launching ClickHouse-backed feature analytics

  • Define the event payload and standardize evaluation_id and hashed user id.
  • Choose a streaming layer (Kafka/Kinesis/Pulsar) and configure ClickHouse Kafka engine.
  • Create raw ReplacingMergeTree table with TTL for raw events.
  • Create minute/hour AggregatingMergeTree rollups via materialized views.
  • Partition by date and order by flag_key + time for fast queries.
  • Enable replicas and Distributed tables for HA and scale.
  • Set up monitoring for lag, inserts, and unusual metric deltas — invest in observability.

Actionable takeaways

  • Start with a small pipeline: instrument one critical flag, stream events to Kafka, and build a minute aggregation. Validate end-to-end before wide rollout.
  • Pre-aggregate for dashboards: never query raw events for production dashboards; use minute/hour aggregates to guarantee SLAs. Fast metrics tie into low-latency guidance in the latency playbook.
  • Deduplicate aggressively: use ReplacingMergeTree with a deterministic evaluation_id to avoid inflated counts.
  • Protect PII: hash identifiers at the edge and limit raw retention.

Further reading & references (2024–2026 context)

ClickHouse has continued to expand enterprise adoption and funding, driving ecosystem improvements across streaming connectors and managed services in 2025 and into 2026. For deeper technical references, consult the ClickHouse docs for Kafka engine, AggregatingMergeTree, ReplacingMergeTree, and projections. Also review current experimentation best practices from the industry leaders implementing streaming-first analytics.

"Streaming feature-flag evaluations into a fast OLAP engine is the most pragmatic way to get real-time guardrails and run safe experiments at scale." — engineering practitioners in 2026

Next steps: prototype plan (48–72 hours)

  1. Take one high-impact flag and modify the SDK to emit evaluation events to your streaming topic with evaluation_id and hashed user id.
  2. Provision a ClickHouse cluster (or managed service) and create the Kafka-engine table + materialized view to a ReplacingMergeTree raw table.
  3. Create the minute AggregatingMergeTree and a dashboard that queries the aggregated table.
  4. Validate A/A consistency and run an A/B test to confirm metrics line up with expectations.

Call to action

If your team is wrestling with slow experiment feedback loops, toggle sprawl, or lack of visibility, build a streaming-first ClickHouse pipeline for feature analytics. Start with one flag, follow the patterns above, and measure the impact within days — not months. Want an architecture review or a template repo with DDL, Kafka configs, and dashboard panels to get you running? Contact our team to get a production-ready starter pack and best-practice checklist tailored to your scale and compliance requirements. Or use a template repo generator to scaffold automation for integration and dashboards.
