
OpenTelemetry Collectors

Lakerunner ingests telemetry from OpenTelemetry Collectors that write data to your S3-compatible object storage bucket. This guide covers the recommended three-tier collector architecture for monitoring Kubernetes clusters.

Architecture Overview

The collector stack uses three components, each with a distinct role:

| Component | Deployment | Purpose |
|---|---|---|
| Agent | DaemonSet (one per node) | Receives OTLP from workloads, scrapes kubelet stats, enriches with Kubernetes attributes, forwards to gateway |
| Poller | Deployment (single replica) | Watches cluster-level Kubernetes objects (pods, nodes, deployments, HPAs) and emits cluster metrics |
| Gateway | Deployment (2+ replicas) | Aggregates data from agents and pollers, generates service graph metrics from traces, exports to S3 |
```
 Workloads (OTLP)        External OTLP
        │                      │
        ▼                      ▼
 ┌────────────┐         ┌─────────────┐
 │   Agent    │────────►│             │
 │ (per node) │         │   Gateway   │──► S3 Bucket
 └────────────┘    ┌───►│    (2x)     │
 ┌────────────┐    │    └─────────────┘
 │   Poller   │────┘
 │    (1x)    │
 └────────────┘
```

Agent

The agent runs as a DaemonSet on every node with hostNetwork: true. It:

  • Receives OTLP on ports 4317 (gRPC) and 4318 (HTTP) from workloads on the same node
  • Scrapes kubelet stats every 10 seconds for node, pod, container, and volume metrics
  • Enriches all telemetry with Kubernetes attributes (pod name, namespace, deployment, labels, etc.)
  • Sets service.name from the owning controller (deployment, daemonset, statefulset, cronjob, or job)
  • Converts cumulative metrics to delta before forwarding to the gateway
  • Forwards all data to the gateway over internal OTLP/HTTP (port 24318)
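Because the agent binds the node's network namespace, workloads can reach it at the node IP. A minimal sketch of the workload side, using the standard OpenTelemetry SDK environment variable and the Kubernetes downward API (the variable name `NODE_IP` is illustrative):

```yaml
# Point a workload's OTLP exporter at the node-local agent.
# The agent runs with hostNetwork: true, so status.hostIP reaches
# ports 4317 (gRPC) / 4318 (HTTP) on the same node.
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4318"
```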

Agent Configuration

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 10s
    endpoint: "${env:HOST_IP}:10250"
    insecure_skip_verify: true
    node: "${env:K8S_NODE_NAME}"
    metric_groups: [node, pod, container, volume]

processors:
  k8sattributes:
    extract:
      labels:
        - from: pod
          key_regex: "(.*)"
          tag_name: "$1"
      metadata:
        - k8s.node.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.replicaset.name
        - k8s.daemonset.name
        - k8s.statefulset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.pod.name
        - k8s.pod.ip
        - k8s.container.name
        - container.id
        - container.image.name
        - container.image.tag
    filter:
      node_from_env_var: K8S_NODE_NAME
    pod_association:
      - sources: [{ from: connection }]
      - sources: [{ from: resource_attribute, name: k8s.pod.uid }]
      - sources: [{ from: resource_attribute, name: k8s.pod.ip }]
  resource/core:
    attributes:
      - { action: upsert, from_attribute: k8s.deployment.name, key: service.name }
      - { action: upsert, from_attribute: k8s.daemonset.name, key: service.name }
      - { action: upsert, from_attribute: k8s.statefulset.name, key: service.name }
      - { action: upsert, from_attribute: k8s.cronjob.name, key: service.name }
      - { action: upsert, from_attribute: k8s.job.name, key: service.name }
      - { action: upsert, key: k8s.cluster.name, value: "${env:K8S_CLUSTER_NAME}" }
  cumulativetodelta:
    max_staleness: 15m
  batch:
    send_batch_max_size: 30000
    send_batch_size: 10000
    timeout: 10s

exporters:
  otlphttp/upstream:
    endpoint: "http://collector-gateway-interproc:24318"
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [k8sattributes, resource/core, batch]
      exporters: [otlphttp/upstream]
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [k8sattributes, resource/core, cumulativetodelta, batch]
      exporters: [otlphttp/upstream]
    traces:
      receivers: [otlp]
      processors: [k8sattributes, resource/core, batch]
      exporters: [otlphttp/upstream]
```

Agent Resources

| Resource | Request | Limit |
|---|---|---|
| CPU | 1 | 1 |
| Memory | 500Mi | 500Mi |

Poller

The poller is a single-replica deployment that watches cluster-level Kubernetes objects and emits metrics about their state. It monitors:

  • Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure
  • Allocatable resources: CPU, memory, ephemeral-storage, storage
  • Object counts: Pods, deployments, daemonsets, statefulsets, jobs, HPAs, and more

Poller Configuration

```yaml
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    node_conditions_to_report: [Ready, MemoryPressure, DiskPressure, PIDPressure]
    allocatable_types_to_report: [cpu, memory, ephemeral-storage, storage]

processors:
  resource/core:
    attributes:
      - { action: upsert, key: k8s.cluster.name, value: "${env:K8S_CLUSTER_NAME}" }
  cumulativetodelta:
    max_staleness: 15m
  batch:
    send_batch_max_size: 30000
    send_batch_size: 10000
    timeout: 10s

exporters:
  otlphttp/upstream:
    endpoint: "http://collector-gateway-interproc:24318"
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [k8s_cluster]
      processors: [resource/core, cumulativetodelta, batch]
      exporters: [otlphttp/upstream]
```

Poller Resources

| Resource | Request | Limit |
|---|---|---|
| CPU | 1 | 1 |
| Memory | 500Mi | 500Mi |

Gateway

The gateway is the central aggregation point. It receives data from agents and pollers on an internal port, and can also accept external OTLP data directly. All data is exported to your S3 bucket.

Key features:

  • Service graph generation — Extracts span-derived metrics (call counts, latency) from traces, grouped by k8s.cluster.name and k8s.namespace.name
  • Load-balanced metric aggregation — External cumulative metrics are load-balanced across gateway pods by stream ID before delta conversion
  • S3 export — Writes all telemetry to S3 under otel-raw/{org_id}/{cluster_name}/
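An external collector outside the cluster can forward into the gateway's external OTLP receiver (ports 4317/4318 from the gateway configuration). A minimal sketch of the sender side; the hostname `collector-gateway` is an assumption about how the gateway service is exposed:

```yaml
# External OTLP sender pointing at the gateway's otlp/external receiver.
# Replace the endpoint with however you expose the gateway (LoadBalancer,
# Ingress, etc.); TLS settings depend on your environment.
exporters:
  otlp:
    endpoint: collector-gateway:4317
    tls:
      insecure: true
```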

Gateway Configuration

```yaml
connectors:
  servicegraph:
    dimensions: [k8s.cluster.name, k8s.namespace.name]
    metrics_flush_interval: 10s
    store:
      ttl: 10s

receivers:
  otlp/interproc:
    protocols:
      http:
        endpoint: 0.0.0.0:24318
  otlp/external:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  awss3:
    marshaler: otlp_proto
    s3uploader:
      compression: gzip
      endpoint: "${env:AWS_S3_ENDPOINT}"
      region: "${env:AWS_REGION}"
      role_arn: "${env:AWS_ROLE_ARN}"
      s3_bucket: "${env:AWS_S3_BUCKET}"
      s3_prefix: "otel-raw/${env:LAKERUNNER_ORGANIZATION_ID}/${env:K8S_CLUSTER_NAME}"

processors:
  cumulativetodelta:
    max_staleness: 15m
  batch:
    send_batch_max_size: 30000
    send_batch_size: 10000
    timeout: 10s

service:
  pipelines:
    logs:
      receivers: [otlp/interproc]
      processors: [batch]
      exporters: [awss3]
    metrics:
      receivers: [otlp/interproc]
      processors: [batch]
      exporters: [awss3]
    traces:
      receivers: [otlp/interproc]
      processors: [batch]
      exporters: [servicegraph, awss3]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [cumulativetodelta, batch]
      exporters: [awss3]
```

Gateway Resources

| Resource | Request | Limit |
|---|---|---|
| CPU | 2 | 2 |
| Memory | 2Gi | 2Gi |

Deploying

The collector manifests are designed to be deployed with Kustomize. The online documentation includes a deployment wizard that generates a kustomize overlay setting all required environment variables and credentials in one place; alternatively, edit the base manifests directly using the reference table below.


Reference: Required Configuration

If you prefer to edit the base manifests directly, the following table lists every REPLACE_ME placeholder:

| Variable | Manifest(s) | Description |
|---|---|---|
| K8S_CLUSTER_NAME | agent/daemonset.yaml, poller/deployment.yaml, gateway/deployment.yaml | Cluster identifier. Must match across all three components — it's stamped as a resource attribute by the agent and poller, and used as the S3 prefix segment by the gateway. |
| LAKERUNNER_ORGANIZATION_ID | gateway/deployment.yaml | The organization UUID provisioned during Lakerunner installation. This is the same value generated by the Organization ID field in the Installation Guide wizard. Telemetry is written to otel-raw/{LAKERUNNER_ORGANIZATION_ID}/{K8S_CLUSTER_NAME}/, so the value here must match the org ID configured in your Lakerunner deployment. |
| AWS_REGION | gateway/deployment.yaml | Region of the S3 bucket. |
| AWS_S3_BUCKET | gateway/deployment.yaml | Target bucket for telemetry. |
| AWS_S3_ENDPOINT | gateway/deployment.yaml | S3 endpoint URL. Leave blank for AWS S3; set to your endpoint (e.g. https://s3.us-west-2.amazonaws.com or your MinIO/GCS/R2 URL) for S3-compatible stores. |
| AWS_ROLE_ARN | gateway/deployment.yaml | Optional. Set when using IRSA or cross-account role assumption; otherwise leave as the default empty string. |

AWS credentials for static-key mode live in the aws-credentials secret in gateway/secrets.yaml (keys: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). With IRSA, leave the secret blank and set AWS_ROLE_ARN on the gateway deployment.
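For static-key mode, the secret looks roughly like the sketch below. The key names and secret name come from the manifests described above; the `apiVersion`/`metadata` framing is an assumed illustration of the file's shape, not a copy of `gateway/secrets.yaml`:

```yaml
# Illustrative shape of the aws-credentials secret (static-key mode).
# Replace the REPLACE_ME values with your access key pair; with IRSA,
# leave these blank and set AWS_ROLE_ARN instead.
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: collector
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "REPLACE_ME"
  AWS_SECRET_ACCESS_KEY: "REPLACE_ME"
```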

```shell
kubectl create namespace collector
kubectl apply -k base-collector-manifests/
```
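If you manage the values with an overlay rather than editing the base manifests, a kustomization along these lines works. This is a hedged sketch: the deployment name `collector-gateway` matches the manifests referenced above, but the container name `otel-collector` is an assumption — check the base deployment for the actual name:

```yaml
# kustomization.yaml — overlay sketch that patches gateway env vars
# via a strategic merge patch (env entries merge by name).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: collector
resources:
  - ../base-collector-manifests
patches:
  - target:
      kind: Deployment
      name: collector-gateway
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: collector-gateway
      spec:
        template:
          spec:
            containers:
              - name: otel-collector   # assumed container name
                env:
                  - name: K8S_CLUSTER_NAME
                    value: "prod-us-west-2"
                  - name: AWS_S3_BUCKET
                    value: "my-telemetry-bucket"
```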

Using Grafana Alloy

If you already run Grafana Alloy in your cluster, Alloy can write directly to the same S3 bucket and object layout that Lakerunner ingests. This lets you skip the Cardinal gateway entirely and use Alloy as your sole collector.

How it works

Lakerunner’s S3 ingest is format-driven, not source-driven. Any process that writes OTLP-protobuf files to the right prefix, with the right filename shape, gets picked up:

  • Prefix: s3://<bucket>/otel-raw/<organization-uuid>/<cluster-name>/...
  • Filename: must begin with logs_, metrics_, or traces_ (used for signal routing).
  • Format: OTLP protobuf (.binpb), gzip-compressed (.binpb.gz). JSON is not accepted.

The upstream otelcol.exporter.awss3 component produces this layout by default when configured with marshaler = "otlp_proto" and compression = "gzip" — the logs_ / metrics_ / traces_ filename prefix is emitted automatically per signal. One shared exporter instance handles all three.

Tradeoffs

| | Cardinal gateway | Alloy direct-to-S3 |
|---|---|---|
| Infrastructure | Dedicated gateway deployment + agent/poller | Single Alloy fleet |
| Logs | ✓ | ✓ |
| Metrics | ✓ | ✓ (cumulative-to-delta should be done in Alloy) |
| Traces | ✓ | ✓ (raw span storage only) |
| Span-derived RED metrics (service graph) | ✓ | ✗ — not produced |
| Operational surface | Cardinal-maintained build | Your existing Alloy stack |

If you need span-derived service graph metrics, stay with the gateway. For logs, metrics, and raw trace storage, Alloy direct-to-S3 has the same ingest semantics as the gateway’s interproc pipeline.

Example Alloy configuration

The snippet below receives OTLP from workloads, stamps the cluster name as a resource attribute, enriches with Kubernetes metadata, converts cumulative metrics to delta, and writes to S3 in the Lakerunner-compatible layout. Substitute your own values for the four environment variables.

Do not skip the otelcol.processor.batch stage, and do not shrink its timeout or send_batch_size below the values shown. The awss3 exporter issues one S3 PUT per batch it receives. With the OTel batch defaults (timeout: 200ms), you would produce thousands of small PUTs per minute per Alloy instance per signal — which is both expensive on S3 request pricing and likely to trigger throttling (SlowDown responses) on high-volume buckets. The 10s / 10k / 30k values below match the Cardinal gateway and are the minimum recommended. See Notes and caveats on replica-count amplification.

```
// Required environment variables (set via the Alloy pod spec):
//   LAKERUNNER_ORGANIZATION_ID — the same UUID as your Lakerunner install
//   K8S_CLUSTER_NAME           — cluster identifier
//   AWS_REGION                 — S3 bucket region
//   AWS_S3_BUCKET              — target bucket

otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    logs    = [otelcol.processor.k8sattributes.default.input]
    metrics = [otelcol.processor.k8sattributes.default.input]
    traces  = [otelcol.processor.k8sattributes.default.input]
  }
}

otelcol.processor.k8sattributes "default" {
  extract {
    metadata = [
      "k8s.namespace.name",
      "k8s.pod.name",
      "k8s.pod.uid",
      "k8s.deployment.name",
      "k8s.daemonset.name",
      "k8s.statefulset.name",
      "k8s.node.name",
      "k8s.container.name",
    ]
  }
  output {
    logs    = [otelcol.processor.transform.cluster.input]
    metrics = [otelcol.processor.transform.cluster.input]
    traces  = [otelcol.processor.transform.cluster.input]
  }
}

otelcol.processor.transform "cluster" {
  error_mode = "ignore"
  metric_statements {
    context    = "resource"
    statements = [`set(attributes["k8s.cluster.name"], "${K8S_CLUSTER_NAME}")`]
  }
  log_statements {
    context    = "resource"
    statements = [`set(attributes["k8s.cluster.name"], "${K8S_CLUSTER_NAME}")`]
  }
  trace_statements {
    context    = "resource"
    statements = [`set(attributes["k8s.cluster.name"], "${K8S_CLUSTER_NAME}")`]
  }
  output {
    logs    = [otelcol.processor.batch.default.input]
    metrics = [otelcol.processor.cumulativetodelta.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.cumulativetodelta "default" {
  max_staleness = "15m"
  output {
    metrics = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  send_batch_size     = 10000
  send_batch_max_size = 30000
  timeout             = "10s"
  output {
    logs    = [otelcol.exporter.awss3.lakerunner.input]
    metrics = [otelcol.exporter.awss3.lakerunner.input]
    traces  = [otelcol.exporter.awss3.lakerunner.input]
  }
}

otelcol.exporter.awss3 "lakerunner" {
  marshaler {
    type = "otlp_proto"
  }
  s3_uploader {
    region              = sys.env("AWS_REGION")
    s3_bucket           = sys.env("AWS_S3_BUCKET")
    s3_prefix           = "otel-raw/" + sys.env("LAKERUNNER_ORGANIZATION_ID") + "/" + sys.env("K8S_CLUSTER_NAME")
    s3_force_path_style = true
    compression         = "gzip"
  }
}
```

Objects land at keys like otel-raw/<org>/<cluster>/year=.../minute=.../logs_<rand>.binpb.gz (and metrics_, traces_), which matches what Lakerunner’s pubsub handler expects.

Notes and caveats

  • Batching is load-bearing, not optional. Every batch handed to the awss3 exporter becomes a single PutObject request. The snippet batches every 10 seconds or every 10k records (whichever comes first), which caps PUT rate at roughly 6/min/signal per Alloy instance. If you remove the batch processor, shrink the timeout, or otherwise send many small batches, you will generate thousands of tiny S3 PUTs — which S3 prices per request and which will trigger SlowDown throttling on busy accounts. Keep the 10s / 10k / 30k values or larger.
  • Replica count multiplies PUT volume linearly. A 200-node Alloy DaemonSet at the defaults above produces ~3600 PUTs/minute across the three signals — tolerable but not free. For high-node-count clusters, prefer a two-tier topology: a thin Alloy DaemonSet that forwards OTLP to a small central Alloy Deployment (2–4 replicas) which does the batching and S3 export. This keeps per-instance batches full and the PUT rate proportional to traffic, not node count.
  • Cumulative-to-delta is essential for external Prometheus-style counters. Lakerunner stores temporality as-is, so cumulative counters without conversion will produce confusing rate() results across collector restarts. The otelcol.processor.cumulativetodelta step in the snippet above handles this — don’t remove it.
  • Kubernetes attributes: the k8sattributes processor needs cluster-scoped RBAC to list pods. Grant it a ServiceAccount with get/list/watch on pods, namespaces, and the replicaset/deployment hierarchy.
  • Credentials: the awss3 exporter picks up AWS credentials through the default SDK chain, so IRSA, EKS Pod Identity, or AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars all work without additional exporter config.
  • S3-compatible stores (MinIO, R2, etc.): set endpoint = "https://..." in the s3_uploader block. AWS S3 uses the region default.
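The RBAC requirement in the list above can be sketched as follows. Resource names (`alloy` ServiceAccount, `collector` namespace, ClusterRole name) are assumptions to adapt to your deployment; the verbs and resources mirror what the caveat describes:

```yaml
# Cluster-scoped RBAC sketch for the k8sattributes processor:
# get/list/watch on pods, namespaces, and the replicaset/deployment
# hierarchy, bound to the Alloy ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alloy-k8sattributes
rules:
  - apiGroups: [""]
    resources: [pods, namespaces]
    verbs: [get, list, watch]
  - apiGroups: [apps]
    resources: [replicasets, deployments]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alloy-k8sattributes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alloy-k8sattributes
subjects:
  - kind: ServiceAccount
    name: alloy            # assumed ServiceAccount name
    namespace: collector   # assumed namespace
```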

Reach out to support@cardinalhq.io for support or to ask questions not answered in our documentation.
