OpenTelemetry Collectors
Lakerunner ingests telemetry from OpenTelemetry Collectors that write data to your S3-compatible object storage bucket. This guide covers the recommended three-tier collector architecture for monitoring Kubernetes clusters.
Architecture Overview
The collector stack uses three components, each with a distinct role:
| Component | Deployment | Purpose |
|---|---|---|
| Agent | DaemonSet (one per node) | Receives OTLP from workloads, scrapes kubelet stats, enriches with Kubernetes attributes, forwards to gateway |
| Poller | Deployment (single replica) | Watches cluster-level Kubernetes objects (pods, nodes, deployments, HPAs) and emits cluster metrics |
| Gateway | Deployment (2+ replicas) | Aggregates data from agents and pollers, generates service graph metrics from traces, exports to S3 |
Workloads (OTLP) External OTLP
│ │
▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ Agent │ │ Poller │ │ │
│ (per │ │ (1x) │──│ Gateway │──► S3 Bucket
│ node) │──│ │ │ (2x) │
└──────────┘ └──────────┘ └──────────────┘

Agent
The agent runs as a DaemonSet on every node with `hostNetwork: true`. It:
- Receives OTLP on ports 4317 (gRPC) and 4318 (HTTP) from workloads on the same node
- Scrapes kubelet stats every 10 seconds for node, pod, container, and volume metrics
- Enriches all telemetry with Kubernetes attributes (pod name, namespace, deployment, labels, etc.)
- Sets `service.name` from the owning controller (deployment, daemonset, statefulset, cronjob, or job)
- Converts cumulative metrics to delta before forwarding to the gateway
- Forwards all data to the gateway over internal OTLP/HTTP (port 24318)
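The delta conversion the agent performs can be illustrated with a simplified sketch. This is not the collector's implementation; it models the `cumulativetodelta` processor's core behavior of diffing successive samples and treating a decrease as a counter reset:

```python
def cumulative_to_delta(samples):
    """Convert a cumulative counter series to deltas.

    Simplified model: the first sample is emitted as-is, each later
    sample becomes (current - previous), and a decrease is treated as
    a counter reset (the new value is the delta since the reset).
    """
    deltas, prev = [], None
    for value in samples:
        if prev is None or value < prev:  # first sample, or counter reset
            deltas.append(value)
        else:
            deltas.append(value - prev)
        prev = value
    return deltas

# A counter that resets (e.g. the workload restarted) between 20 and 5:
print(cumulative_to_delta([10, 20, 5, 15]))  # [10, 10, 5, 10]
```

Without this step, a restart would show up downstream as a large negative swing instead of a small positive delta.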
Agent Configuration
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
kubeletstats:
auth_type: serviceAccount
collection_interval: 10s
endpoint: "${env:HOST_IP}:10250"
insecure_skip_verify: true
node: "${env:K8S_NODE_NAME}"
metric_groups: [node, pod, container, volume]
processors:
k8sattributes:
extract:
labels:
- from: pod
key_regex: "(.*)"
tag_name: "$1"
metadata:
- k8s.node.name
- k8s.namespace.name
- k8s.deployment.name
- k8s.replicaset.name
- k8s.daemonset.name
- k8s.statefulset.name
- k8s.cronjob.name
- k8s.job.name
- k8s.pod.name
- k8s.pod.ip
- k8s.container.name
- container.id
- container.image.name
- container.image.tag
filter:
node_from_env_var: K8S_NODE_NAME
pod_association:
- sources: [{ from: connection }]
- sources: [{ from: resource_attribute, name: k8s.pod.uid }]
- sources: [{ from: resource_attribute, name: k8s.pod.ip }]
resource/core:
attributes:
- { action: upsert, from_attribute: k8s.deployment.name, key: service.name }
- { action: upsert, from_attribute: k8s.daemonset.name, key: service.name }
- { action: upsert, from_attribute: k8s.statefulset.name, key: service.name }
- { action: upsert, from_attribute: k8s.cronjob.name, key: service.name }
- { action: upsert, from_attribute: k8s.job.name, key: service.name }
- { action: upsert, key: k8s.cluster.name, value: "${env:K8S_CLUSTER_NAME}" }
cumulativetodelta:
max_staleness: 15m
batch:
send_batch_max_size: 30000
send_batch_size: 10000
timeout: 10s
exporters:
otlphttp/upstream:
endpoint: "http://collector-gateway-interproc:24318"
tls:
insecure: true
service:
pipelines:
logs:
receivers: [otlp]
processors: [k8sattributes, resource/core, batch]
exporters: [otlphttp/upstream]
metrics:
receivers: [otlp, kubeletstats]
processors: [k8sattributes, resource/core, cumulativetodelta, batch]
exporters: [otlphttp/upstream]
traces:
receivers: [otlp]
processors: [k8sattributes, resource/core, batch]
exporters: [otlphttp/upstream]

Agent Resources
| Resource | Request | Limit |
|---|---|---|
| CPU | 1 | 1 |
| Memory | 500Mi | 500Mi |
Poller
The poller is a single-replica deployment that watches cluster-level Kubernetes objects and emits metrics about their state. It monitors:
- Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure
- Allocatable resources: CPU, memory, ephemeral-storage, storage
- Object counts: Pods, deployments, daemonsets, statefulsets, jobs, HPAs, and more
Poller Configuration
receivers:
k8s_cluster:
auth_type: serviceAccount
node_conditions_to_report: [Ready, MemoryPressure, DiskPressure, PIDPressure]
allocatable_types_to_report: [cpu, memory, ephemeral-storage, storage]
processors:
resource/core:
attributes:
- { action: upsert, key: k8s.cluster.name, value: "${env:K8S_CLUSTER_NAME}" }
cumulativetodelta:
max_staleness: 15m
batch:
send_batch_max_size: 30000
send_batch_size: 10000
timeout: 10s
exporters:
otlphttp/upstream:
endpoint: "http://collector-gateway-interproc:24318"
tls:
insecure: true
service:
pipelines:
metrics:
receivers: [k8s_cluster]
processors: [resource/core, cumulativetodelta, batch]
exporters: [otlphttp/upstream]

Poller Resources
| Resource | Request | Limit |
|---|---|---|
| CPU | 1 | 1 |
| Memory | 500Mi | 500Mi |
Gateway
The gateway is the central aggregation point. It receives data from agents and pollers on an internal port, and can also accept external OTLP data directly. All data is exported to your S3 bucket.
Key features:
- Service graph generation — Extracts span-derived metrics (call counts, latency) from traces, grouped by `k8s.cluster.name` and `k8s.namespace.name`
- Load-balanced metric aggregation — External cumulative metrics are load-balanced across gateway pods by stream ID before delta conversion
- S3 export — Writes all telemetry to S3 under `otel-raw/{org_id}/{cluster_name}/`
Gateway Configuration
connectors:
servicegraph:
dimensions: [k8s.cluster.name, k8s.namespace.name]
metrics_flush_interval: 10s
store:
ttl: 10s
receivers:
otlp/interproc:
protocols:
http:
endpoint: 0.0.0.0:24318
otlp/external:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
awss3:
marshaler: otlp_proto
s3uploader:
compression: gzip
endpoint: "${env:AWS_S3_ENDPOINT}"
region: "${env:AWS_REGION}"
role_arn: "${env:AWS_ROLE_ARN}"
s3_bucket: "${env:AWS_S3_BUCKET}"
s3_prefix: "otel-raw/${env:LAKERUNNER_ORGANIZATION_ID}/${env:K8S_CLUSTER_NAME}"
processors:
cumulativetodelta:
max_staleness: 15m
batch:
send_batch_max_size: 30000
send_batch_size: 10000
timeout: 10s
service:
pipelines:
logs:
receivers: [otlp/interproc]
processors: [batch]
exporters: [awss3]
metrics:
receivers: [otlp/interproc]
processors: [batch]
exporters: [awss3]
traces:
receivers: [otlp/interproc]
processors: [batch]
exporters: [servicegraph, awss3]
metrics/servicegraph:
receivers: [servicegraph]
processors: [cumulativetodelta, batch]
exporters: [awss3]

Gateway Resources
| Resource | Request | Limit |
|---|---|---|
| CPU | 2 | 2 |
| Memory | 2Gi | 2Gi |
Deploying
The collector manifests are designed to be deployed with Kustomize. Use the wizard below to generate a kustomize overlay that sets all required environment variables and credentials in one place.
Deployment Wizard
Identity
The organization ID must match the UUID you set when installing Lakerunner (the Organization ID field in the installation wizard). The gateway writes telemetry to `otel-raw/{organizationId}/{clusterName}/`.
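The resulting prefix can be sketched in a few lines; this is an illustrative helper (`telemetry_prefix` is not part of Lakerunner) that also catches a malformed organization UUID before anything is deployed:

```python
import uuid

def telemetry_prefix(organization_id: str, cluster_name: str) -> str:
    """Build the S3 prefix the gateway writes under, validating the org ID."""
    uuid.UUID(organization_id)  # raises ValueError if not a well-formed UUID
    return f"otel-raw/{organization_id}/{cluster_name}/"

print(telemetry_prefix("123e4567-e89b-12d3-a456-426614174000", "prod-east"))
# otel-raw/123e4567-e89b-12d3-a456-426614174000/prod-east/
```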
Reference: Required Configuration
If you prefer to edit the base manifests directly, the following table lists every `REPLACE_ME` placeholder:
| Variable | Manifest(s) | Description |
|---|---|---|
| `K8S_CLUSTER_NAME` | agent/daemonset.yaml, poller/deployment.yaml, gateway/deployment.yaml | Cluster identifier. Must match across all three components — it’s stamped as a resource attribute by the agent and poller, and used as the S3 prefix segment by the gateway. |
| `LAKERUNNER_ORGANIZATION_ID` | gateway/deployment.yaml | The organization UUID provisioned during Lakerunner installation. This is the same value generated by the Organization ID field in the Installation Guide wizard. Telemetry is written to `otel-raw/{LAKERUNNER_ORGANIZATION_ID}/{K8S_CLUSTER_NAME}/`, so the value here must match the org ID configured in your Lakerunner deployment. |
| `AWS_REGION` | gateway/deployment.yaml | Region of the S3 bucket. |
| `AWS_S3_BUCKET` | gateway/deployment.yaml | Target bucket for telemetry. |
| `AWS_S3_ENDPOINT` | gateway/deployment.yaml | S3 endpoint URL. Leave blank for AWS S3; set to your endpoint (e.g. https://s3.us-west-2.amazonaws.com or your MinIO/GCS/R2 URL) for S3-compatible stores. |
| `AWS_ROLE_ARN` | gateway/deployment.yaml | Optional. Set when using IRSA or cross-account role assumption; otherwise leave as the default empty string. |
AWS credentials for static-key mode live in the `aws-credentials` secret in gateway/secrets.yaml (keys: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`). With IRSA, leave the secret blank and set `AWS_ROLE_ARN` on the gateway deployment.
kubectl create namespace collector
kubectl apply -k base-collector-manifests/

Using Grafana Alloy
If you already run Grafana Alloy in your cluster, Alloy can write directly to the same S3 bucket and object layout that Lakerunner ingests. This lets you skip the Cardinal gateway entirely and use Alloy as your sole collector.
How it works
Lakerunner’s S3 ingest is format-driven, not source-driven. Any process that writes OTLP-protobuf files to the right prefix, with the right filename shape, gets picked up:
- Prefix: `s3://<bucket>/otel-raw/<organization-uuid>/<cluster-name>/...`
- Filename: must begin with `logs_`, `metrics_`, or `traces_` (used for signal routing).
- Format: OTLP protobuf (`.binpb`), gzip-compressed (`.binpb.gz`). JSON is not accepted.
The upstream `otelcol.exporter.awss3` component produces this layout by default when configured with `marshaler = "otlp_proto"` and `compression = "gzip"` — the `logs_` / `metrics_` / `traces_` filename prefix is emitted automatically per signal. One shared exporter instance handles all three.
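As a sanity check, the naming rules can be encoded in a small validator. This is illustrative only — the regex is an assumption built from the layout described above, not Lakerunner's actual ingest code:

```python
import re

# Expected shape: otel-raw/<org-uuid>/<cluster>/.../{logs|metrics|traces}_*.binpb.gz
KEY_PATTERN = re.compile(
    r"^otel-raw/"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/"  # org UUID
    r"[^/]+/"                                      # cluster name
    r"(?:[^/]+/)*"                                 # optional partition dirs (year=..., etc.)
    r"(?:logs|metrics|traces)_[^/]+\.binpb\.gz$"   # signal-routed filename
)

def is_ingestible(key: str) -> bool:
    """Return True if an S3 key matches the layout described above."""
    return KEY_PATTERN.match(key) is not None

good = "otel-raw/123e4567-e89b-12d3-a456-426614174000/prod-east/year=2024/metrics_abc.binpb.gz"
bad  = "otel-raw/123e4567-e89b-12d3-a456-426614174000/prod-east/events_abc.binpb.gz"
print(is_ingestible(good), is_ingestible(bad))  # True False
```

Running something like this against a listing of your bucket is a quick way to confirm an Alloy deployment is writing keys Lakerunner will actually pick up.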
Tradeoffs
| Cardinal gateway | Alloy direct-to-S3 | |
|---|---|---|
| Infrastructure | Dedicated gateway deployment + agent/poller | Single Alloy fleet |
| Logs | ✓ | ✓ |
| Metrics | ✓ | ✓ (cumulative-to-delta should be done in Alloy) |
| Traces | ✓ | ✓ (raw span storage only) |
| Span-derived RED metrics (service graph) | ✓ | ✗ — not produced |
| Operational surface | Cardinal-maintained build | Your existing Alloy stack |
If you need span-derived service graph metrics, stay with the gateway. For logs, metrics, and raw trace storage, Alloy direct-to-S3 has the same ingest semantics as the gateway’s interproc pipeline.
Example Alloy configuration
The snippet below receives OTLP from workloads, stamps the cluster name as a resource attribute, enriches with Kubernetes metadata, converts cumulative metrics to delta, and writes to S3 in the Lakerunner-compatible layout. Substitute your own values for the four environment variables.
Do not skip the `otelcol.processor.batch` stage, and do not shrink its `timeout` or `send_batch_size` below the values shown. The awss3 exporter issues one S3 PUT per batch it receives. With the OTel batch defaults (`timeout: 200ms`), you would produce thousands of small PUTs per minute per Alloy instance per signal — which is both expensive on S3 request pricing and likely to trigger throttling (`SlowDown` responses) on high-volume buckets. The `10s / 10k / 30k` values below match the Cardinal gateway and are the minimum recommended. See Notes and caveats on replica-count amplification.
// Required environment variables (set via the Alloy pod spec):
// LAKERUNNER_ORGANIZATION_ID — the same UUID as your Lakerunner install
// K8S_CLUSTER_NAME — cluster identifier
// AWS_REGION — S3 bucket region
// AWS_S3_BUCKET — target bucket
otelcol.receiver.otlp "default" {
grpc { endpoint = "0.0.0.0:4317" }
http { endpoint = "0.0.0.0:4318" }
output {
logs = [otelcol.processor.k8sattributes.default.input]
metrics = [otelcol.processor.k8sattributes.default.input]
traces = [otelcol.processor.k8sattributes.default.input]
}
}
otelcol.processor.k8sattributes "default" {
extract {
metadata = [
"k8s.namespace.name", "k8s.pod.name", "k8s.pod.uid",
"k8s.deployment.name", "k8s.daemonset.name", "k8s.statefulset.name",
"k8s.node.name", "k8s.container.name",
]
}
output {
logs = [otelcol.processor.transform.cluster.input]
metrics = [otelcol.processor.transform.cluster.input]
traces = [otelcol.processor.transform.cluster.input]
}
}
otelcol.processor.transform "cluster" {
error_mode = "ignore"
// Alloy does not expand ${VAR} inside strings; build the OTTL
// statement with sys.env(), matching the exporter block below.
metric_statements {
context = "resource"
statements = ["set(attributes[\"k8s.cluster.name\"], \"" + sys.env("K8S_CLUSTER_NAME") + "\")"]
}
log_statements {
context = "resource"
statements = ["set(attributes[\"k8s.cluster.name\"], \"" + sys.env("K8S_CLUSTER_NAME") + "\")"]
}
trace_statements {
context = "resource"
statements = ["set(attributes[\"k8s.cluster.name\"], \"" + sys.env("K8S_CLUSTER_NAME") + "\")"]
}
output {
logs = [otelcol.processor.batch.default.input]
metrics = [otelcol.processor.cumulativetodelta.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
otelcol.processor.cumulativetodelta "default" {
max_staleness = "15m"
output {
metrics = [otelcol.processor.batch.default.input]
}
}
otelcol.processor.batch "default" {
send_batch_size = 10000
send_batch_max_size = 30000
timeout = "10s"
output {
logs = [otelcol.exporter.awss3.lakerunner.input]
metrics = [otelcol.exporter.awss3.lakerunner.input]
traces = [otelcol.exporter.awss3.lakerunner.input]
}
}
otelcol.exporter.awss3 "lakerunner" {
marshaler {
type = "otlp_proto"
}
s3_uploader {
region = sys.env("AWS_REGION")
s3_bucket = sys.env("AWS_S3_BUCKET")
s3_prefix = "otel-raw/" + sys.env("LAKERUNNER_ORGANIZATION_ID") + "/" + sys.env("K8S_CLUSTER_NAME")
s3_force_path_style = true
compression = "gzip"
}
}

Objects land at keys like `otel-raw/<org>/<cluster>/year=.../minute=.../logs_<rand>.binpb.gz` (and `metrics_`, `traces_`), which matches what Lakerunner’s pubsub handler expects.
Notes and caveats
- Batching is load-bearing, not optional. Every batch handed to the awss3 exporter becomes a single `PutObject` request. The snippet batches every 10 seconds or every 10k records (whichever comes first), which caps the PUT rate at roughly 6/min/signal per Alloy instance. If you remove the batch processor, shrink the timeout, or otherwise send many small batches, you will generate thousands of tiny S3 PUTs — which S3 prices per request and which will trigger `SlowDown` throttling on busy accounts. Keep the `10s / 10k / 30k` values or larger.
- Replica count multiplies PUT volume linearly. A 200-node Alloy DaemonSet at the defaults above produces ~3600 PUTs/minute across the three signals — tolerable but not free. For high-node-count clusters, prefer a two-tier topology: a thin Alloy DaemonSet that forwards OTLP to a small central Alloy Deployment (2–4 replicas) which does the batching and S3 export. This keeps per-instance batches full and the PUT rate proportional to traffic, not node count.
- Cumulative-to-delta is essential for external Prometheus-style counters. Lakerunner stores temporality as-is, so cumulative counters without conversion will produce confusing `rate()` results across collector restarts. The `otelcol.processor.cumulativetodelta` step in the snippet above handles this — don’t remove it.
- Kubernetes attributes: the `k8sattributes` processor needs cluster-scoped RBAC to list pods. Grant it a ServiceAccount with `get`/`list`/`watch` on `pods`, `namespaces`, and the replicaset/deployment hierarchy.
- Credentials: the `awss3` exporter picks up AWS credentials through the default SDK chain, so IRSA, EKS Pod Identity, or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars all work without additional exporter config.
- S3-compatible stores (MinIO, R2, etc.): set `endpoint = "https://..."` in the `s3_uploader` block. AWS S3 uses the region default.
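The PUT-rate figures above can be reproduced with a quick back-of-the-envelope calculation. This is a sketch under the stated assumptions (time-based flushes only, three signals per instance); real rates also depend on size-triggered flushes:

```python
# Each batch flush becomes one S3 PutObject. Worst case for a
# time-based flush: one PUT per batch timeout, per signal, per instance.
def puts_per_minute(replicas: int, signals: int = 3, batch_timeout_s: float = 10.0) -> float:
    flushes_per_min = 60.0 / batch_timeout_s
    return replicas * signals * flushes_per_min

print(puts_per_minute(1))    # 18.0 -> ~6/min/signal for a single instance
print(puts_per_minute(200))  # 3600.0 -> the 200-node DaemonSet figure above

# The OTel batch default timeout (200ms) on the same fleet:
print(round(puts_per_minute(200, batch_timeout_s=0.2)))  # 180000
```

The 50x gap between the last two numbers is why the batch settings are called load-bearing.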
Reach out to support@cardinalhq.io for support or to ask questions not answered in our documentation.