
Ingestion Architecture

This page covers the end-to-end path telemetry takes from raw files in object storage through to optimized, queryable Parquet segments.

Important: Kafka is used exclusively for shuttling file notifications and work coordination messages between stages — raw telemetry data never flows through Kafka. Workers read and write telemetry data directly from object storage.
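To make this concrete, here is a sketch of what a notification message on an ingest topic might look like. The field names and values are illustrative assumptions, not Lakerunner's real schema — the point is that the message carries a file path, never the telemetry payload itself:

```python
import json

# Hypothetical shape of an object-created notification as it might
# travel on an objstore.ingest.{signal} topic. Field names here are
# assumptions for illustration only.
notification = {
    "bucket": "telemetry-landing",            # assumed bucket name
    "key": "otel-raw/logs/org-1234/part-0001.json.gz",
    "signal": "logs",
    "size_bytes": 1048576,
}

encoded = json.dumps(notification).encode("utf-8")

# The message stays tiny no matter how large the referenced object is.
print(len(encoded) < 1024)  # prints True
```

The worker that consumes this message fetches the object at `key` directly from object storage; the megabyte (or gigabyte) of telemetry it points at never transits Kafka.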

Ingestion Flow

  1. Object storage (S3 / GCS / Azure Blob) holds raw telemetry under prefixes such as otel-raw/logs/, otel-raw/metrics/, and otel-raw/traces/. Each upload emits an object-created notification.
  2. PubSub adapters (SQS / GCP / Azure / HTTP) translate those notifications into Kafka messages.
  3. Kafka topic objstore.ingest.{signal} carries the notifications (file paths only, NOT raw telemetry data).
  4. boxer-ingest-{signal} batches and groups notifications by org, collector, and time window.
  5. Kafka topic segments.{signal}.ingest carries the batched work units to the workers.
  6. The ingest-{signal} worker reads the raw objects, normalizes the telemetry, writes a cooked Parquet segment back to object storage, and registers it in lrdb.
  7. Cooked segments land in object storage under db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet.
  8. PostgreSQL (lrdb) stores the segment metadata: time bounds, org, instance, and frequency.
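Following the cooked-segment path template literally, a small helper for computing where a segment lands might look like this (the function name and argument shapes are illustrative, not Lakerunner's actual code):

```python
def cooked_segment_path(org: str, collector: str, date: str,
                        dataset: str, hour: str, segment_id: int) -> str:
    """Build the object key for a cooked Parquet segment, following the
    db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet
    layout. Illustrative sketch, not the project's real implementation."""
    return f"db/{org}/{collector}/{date}/{dataset}/{hour}/tbl_{segment_id}.parquet"

path = cooked_segment_path("org-1234", "collector-a", "2024-06-01", "logs", "14", 42)
print(path)  # db/org-1234/collector-a/2024-06-01/logs/14/tbl_42.parquet
```

Because org, collector, date, and hour are all encoded in the key, queries can prune whole prefixes without touching the metadata database.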

Compaction and Rollup

After ingest, background stages merge and downsample segments to keep query costs stable as data grows.

Compaction
  1. Kafka topic boxer.{signal}.compact delivers compaction work notifications.
  2. boxer-compact-{signal} groups segments for merging.
  3. Kafka topic segments.{signal}.compact carries the grouped batches.
  4. The compact-{signal} worker reads the input segments, writes a merged segment, and updates the segment index.
Rollup (metrics only)
  1. Kafka topic boxer.metrics.rollup delivers rollup work notifications.
  2. boxer-rollup-metrics groups segments into rollup batches.
  3. Kafka topic segments.metrics.rollup carries the batches.
  4. The rollup-metrics worker reads segments from the finer-resolution tier, writes a downsampled segment to the coarser tier, and updates the segment index.
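At its core, the rollup step is time-bucket aggregation: datapoints from a finer tier are grouped into coarser windows. A minimal sketch, assuming (timestamp, value) pairs and sum as the aggregation (real rollups typically also keep min/max/count):

```python
from collections import defaultdict

def rollup(points, src_step_s, dst_step_s):
    """Downsample (timestamp_s, value) pairs from a fine resolution
    (src_step_s) into coarser buckets of dst_step_s, summing values.
    Sum is an assumed aggregation for illustration."""
    assert dst_step_s % src_step_s == 0  # coarse step must align with fine step
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts - ts % dst_step_s] += value  # snap to bucket start
    return sorted(buckets.items())

# Four 10-second datapoints rolled into two 20-second buckets.
fine = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0)]
print(rollup(fine, 10, 20))  # [(0, 3.0), (20, 7.0)]
```

Because each rolled-up segment is written as a new object and registered in the segment index, queries over long time ranges can read the coarse tier instead of scanning every fine-grained segment.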

Key Design Points

  1. Kafka carries notifications, not data. Kafka topics contain file paths and work coordination messages only. Raw and cooked telemetry lives exclusively in object storage — workers read and write it directly.
  2. Horizontally scalable. Add workers and Kafka partitions to handle more volume.
  3. Boxer decouples fanout from compute. High-volume notification streams are batched and grouped before reaching heavier worker stages.
  4. Object storage is immutable state. Workers only append new segments — they never modify existing ones. PostgreSQL tracks the mutable planning metadata.
  5. Compaction and rollup run continuously in the background to keep segment counts and query costs bounded.
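The batching in point 3 can be sketched as grouping incoming notifications under a (org, collector, time window) key and emitting one work unit per group. Field names and the window size are assumptions for illustration:

```python
from collections import defaultdict

def group_notifications(notifications, window_s=60):
    """Group file notifications into work units keyed by
    (org, collector, time window), as the boxer stage does before
    handing batches to workers. Field names are illustrative."""
    groups = defaultdict(list)
    for n in notifications:
        window = n["ts"] - n["ts"] % window_s  # snap to window start
        groups[(n["org"], n["collector"], window)].append(n["key"])
    return groups

batch = [
    {"org": "o1", "collector": "c1", "ts": 5,  "key": "otel-raw/logs/a"},
    {"org": "o1", "collector": "c1", "ts": 42, "key": "otel-raw/logs/b"},
    {"org": "o2", "collector": "c1", "ts": 7,  "key": "otel-raw/logs/c"},
]
units = group_notifications(batch)
print(len(units))  # 2 work units: one per distinct (org, collector, window)
```

Grouping this way means each downstream worker invocation handles files that share a destination prefix, so its output lands in a single segment rather than scattering small files across many orgs and hours.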