# Ingestion Architecture
This page covers the end-to-end path telemetry takes from raw files in object storage through to optimized, queryable Parquet segments.
## Ingestion Flow
```
S3 / GCS / Azure Blob (raw)
  otel-raw/logs/  otel-raw/metrics/  otel-raw/traces/
        │ object-created notifications
        ▼
PubSub adapters (SQS / GCP Pub/Sub / Azure Event Grid / HTTP)
        │
        ▼
process-{logs,metrics,traces}
  - reads raw objects → normalizes telemetry
  - writes Parquet segments → registers them in lrdb
  - compacts small segments into larger ones
  - produces time-aggregated rollups (metrics)
        │ reads & writes
        ▼
S3 / GCS / Azure Blob (cooked)
  db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet
        │
        ▼
PostgreSQL (lrdb)
  segment metadata: time bounds, org, instance, frequency
```
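The cooked-segment key layout can be sketched as a small path builder. This is illustrative only: `segment_path` is a hypothetical helper, and the exact date and hour formats are assumptions, not taken from the actual codebase.

```python
from datetime import datetime, timezone

def segment_path(org: str, collector: str, dataset: str,
                 ts: datetime, segment_id: int) -> str:
    """Build a cooked-segment object key following the layout
    db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet.
    Date/hour formatting here is an assumption for illustration."""
    date = ts.strftime("%Y-%m-%d")
    hour = ts.strftime("%H")
    return f"db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet"

key = segment_path("acme", "edge", "logs",
                   datetime(2024, 5, 1, 13, 42, tzinfo=timezone.utc), 9001)
# → "db/acme/edge/2024-05-01/logs/13/tbl_9001.parquet"
```

Partitioning by org, collector, date, dataset, and hour lets queries prune whole prefixes before opening any Parquet files.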
## How it works
1. **Raw data lands in object storage.** Collectors write OTel-format files to well-known prefixes (`otel-raw/logs/`, `otel-raw/metrics/`, `otel-raw/traces/`).
2. **Notifications trigger processing.** Object-created events flow through a PubSub adapter (SQS, GCP Pub/Sub, Azure Event Grid, or a simple HTTP callback) to the appropriate signal processor.
3. **`process-{signal}` handles everything.** A single worker per signal type — `process-logs`, `process-metrics`, `process-traces` — performs all processing:
   - **Ingestion** — reads raw objects, normalizes telemetry, writes Parquet segments, and registers them in the segment index (lrdb).
   - **Compaction** — merges small segments into larger ones to reduce file count and improve query efficiency.
   - **Rollups (metrics only)** — produces time-aggregated versions of metric data at coarser granularities.
4. **Results are stored durably.** Cooked Parquet segments are written to object storage and their metadata is tracked in PostgreSQL.
## Key Design Points
- **One process, three responsibilities.** Ingestion, compaction, and rollups all run inside `process-{signal}`, keeping the deployment simple and eliminating coordination overhead between separate stages.
- **Workers read and write object storage directly.** Raw telemetry never passes through intermediate messaging — workers process it in place.
- **Horizontally scalable.** Add worker replicas to handle more volume.
- **Object storage is immutable state.** Workers only append new segments — they never modify existing ones. PostgreSQL tracks the mutable planning metadata.
- **Background maintenance.** Compaction and rollups run continuously in the background to keep segment counts and query costs bounded.