
Polling OpenShift Prometheus Metrics from Outside the Cluster

This guide configures an external OpenTelemetry Collector to poll OpenShift’s in-cluster prometheus-k8s instance through the /federate endpoint. Use it when your collector runs in a central Kubernetes cluster, on a VM, or anywhere else that can reach the OpenShift apps domain over HTTPS.

The example starts with the debug exporter so you can validate the scrape path first. Replace that exporter with S3, OTLP, Prometheus remote write, or another destination after the pull works.

Prerequisites

- OpenShift admin access: permission to create a ServiceAccount, ClusterRoleBinding, and token Secret in `openshift-monitoring`.
- External OTel Collector: an `otelcol-contrib` build or vendor distribution with the `prometheusreceiver` and your destination exporter.
- Route reachability: HTTPS access from the collector to the OpenShift `prometheus-k8s-federate` Route.

Installation

Step 1: Create a federation reader in OpenShift

OpenShift protects /federate with oauth-proxy. Create a ServiceAccount with the cluster-monitoring-view ClusterRole and a token Secret for the external collector.

```yaml
# ocp-federate-reader.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-federate-reader
  namespace: openshift-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-federate-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
  - kind: ServiceAccount
    name: prometheus-federate-reader
    namespace: openshift-monitoring
---
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-federate-reader-token
  namespace: openshift-monitoring
  annotations:
    kubernetes.io/service-account.name: prometheus-federate-reader
type: kubernetes.io/service-account-token
```
```shell
oc apply -f ocp-federate-reader.yaml
```

Step 2: Extract and test the token

Extract the token to a local file:

```shell
oc -n openshift-monitoring \
  extract secret/prometheus-federate-reader-token \
  --keys=token \
  --to=- > ocp-federate.token
```

Validate the token against the federate Route. The Route normally follows this pattern — substitute your cluster’s apps base domain:

```
https://prometheus-k8s-federate-openshift-monitoring.apps.<cluster-base-domain>/federate
```

```shell
TOKEN=$(cat ocp-federate.token)
URL="https://prometheus-k8s-federate-openshift-monitoring.apps.<cluster-base-domain>/federate"
curl -skG \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'match[]={__name__="up"}' \
  "$URL" | head
```

You should see `up{...} 1 <timestamp>` lines.

| HTTP status | Meaning |
| --- | --- |
| 401 | No bearer token, malformed token, or the token Secret is not populated yet. |
| 403 | The token reached oauth-proxy, but the ServiceAccount does not have the expected RBAC. |
| 200 with no samples | The `match[]` selector matched nothing. Retry with `match[]={__name__="up"}`. |
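If you want the check to report the likely cause directly, a small helper can map the status code to the cases above. This is a sketch for this guide (`explain_status` is a hypothetical helper, not part of any tool); the commented `curl` shows how to capture only the status code.

```shell
#!/bin/sh
# Map the HTTP status from the federate check to its likely cause.
explain_status() {
  case "$1" in
    200) echo "OK: check that the response body contains samples" ;;
    401) echo "Auth failed: token missing, malformed, or Secret not populated" ;;
    403) echo "RBAC: ServiceAccount lacks cluster-monitoring-view" ;;
    *)   echo "Unexpected status $1: check the Route and network path" ;;
  esac
}

# Capture only the status code from the federate endpoint:
# STATUS=$(curl -skG -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $TOKEN" \
#   --data-urlencode 'match[]={__name__="up"}' "$URL")
# explain_status "$STATUS"
explain_status 401   # prints the 401 explanation
```
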

Step 3: Make the token available to the collector

The collector config below expects the token at /etc/ocp/token.

For a collector running as a Kubernetes Deployment outside the OpenShift cluster, create a Secret from the extracted token:

```shell
kubectl -n <collector-namespace> create secret generic ocp-federate-token \
  --from-file=token=./ocp-federate.token
```

Mount it into the collector pod:

```yaml
# Excerpt from the collector Deployment spec
volumeMounts:
  - name: ocp-federate-token
    mountPath: /etc/ocp
    readOnly: true
volumes:
  - name: ocp-federate-token
    secret:
      secretName: ocp-federate-token
      defaultMode: 0400
```

For a collector on a VM or bare host, place the same token file at /etc/ocp/token with permissions restricted to the collector process.
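The permission handling can be sketched locally like this (a stand-in demo using temporary paths; on a real host the destination is /etc/ocp/token, owned by the collector's user, and the commands assume GNU coreutils):

```shell
# Local sketch: copy a token file into place with owner-read-only mode.
TOKEN_SRC=$(mktemp)               # stands in for ./ocp-federate.token
printf 'example-token' > "$TOKEN_SRC"
DEST=$(mktemp -d)                 # stands in for /etc/ocp
install -m 0400 "$TOKEN_SRC" "$DEST/token"
stat -c '%a' "$DEST/token"        # prints 400 (owner read-only)
```
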

Step 4: Add the Prometheus receiver scrape job

Add a prometheusreceiver scrape job that polls /federate, sends the ServiceAccount token, and passes selected metrics into the metrics pipeline.

```yaml
receivers:
  prometheus/openshift:
    config:
      scrape_configs:
        - job_name: openshift-federate
          scheme: https
          metrics_path: /federate
          scrape_interval: 30s
          scrape_timeout: 25s
          honor_labels: true
          honor_timestamps: true
          tls_config:
            insecure_skip_verify: true
          authorization:
            type: Bearer
            credentials_file: /etc/ocp/token
          params:
            "match[]":
              - '{__name__="up"}'
              - '{job="node-exporter"}'
              - '{job="kube-state-metrics"}'
              - '{__name__=~"cluster_version.*|cluster_operator.*"}'
          static_configs:
            - targets:
                - prometheus-k8s-federate-openshift-monitoring.apps.<cluster-base-domain>:443

processors:
  resource/openshift:
    attributes:
      - { action: insert, key: k8s.cluster.name, value: "<your-ocp-cluster-name>" }
  batch:
    send_batch_size: 10000
    send_batch_max_size: 30000
    timeout: 10s

exporters:
  debug:
    verbosity: basic

service:
  pipelines:
    metrics/openshift:
      receivers: [prometheus/openshift]
      processors: [resource/openshift, batch]
      exporters: [debug]
```

Use `insecure_skip_verify: true` only if you do not have the OpenShift ingress CA available to the collector. For a stricter configuration, mount the CA bundle and replace `insecure_skip_verify: true` with a `ca_file` entry.
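The stricter `tls_config` might look like the following sketch. The mount path is an assumption; on OpenShift 4.x the ingress CA bundle is typically published in the `default-ingress-cert` ConfigMap in `openshift-config-managed`.

```yaml
tls_config:
  ca_file: /etc/ocp/ingress-ca.crt   # hypothetical mount path for the ingress CA bundle
  # insecure_skip_verify removed once the CA is trusted
```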

Replace debug with your destination exporter when the scrape is working:

| Exporter | Use when |
| --- | --- |
| `awss3` | Writing OTLP-proto metrics to S3 for Lakerunner. |
| `otlphttp` or `otlp` | Sending to a vendor endpoint, central gateway, or another collector. |
| `prometheusremotewrite` | Forwarding to a Prometheus-compatible backend such as Mimir, Cortex, or VictoriaMetrics. |
| `debug` | Checking that samples reach the collector. |

If your destination expects delta-temporality counters, add the cumulativetodelta processor before batch. Do not use that processor for prometheusremotewrite.
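That pipeline change can be sketched like this, using the contrib `cumulativetodelta` processor with its default settings:

```yaml
processors:
  cumulativetodelta:

service:
  pipelines:
    metrics/openshift:
      receivers: [prometheus/openshift]
      processors: [resource/openshift, cumulativetodelta, batch]
      exporters: [<your-destination-exporter>]
```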

Step 5: Roll out and confirm

Restart the collector so it picks up the token mount and config:

```shell
kubectl -n <collector-namespace> rollout restart deployment <collector-deployment>
```

Check the collector logs:

```shell
kubectl -n <collector-namespace> logs deploy/<collector-deployment> | \
  grep -iE 'prometheus|openshift-federate|exporter|error|fail'
```

Normal startup includes the Prometheus receiver starting its discovery and scrape managers. There should be no 401, 403, TLS, timeout, or exporter errors.

Choose Metric Selectors

The match[] selectors decide what the external collector pulls. Start with a small set, confirm it downstream, then add the metric families you need.

```yaml
"match[]":
  # Connectivity check
  - '{__name__="up"}'
  # OpenShift operator health
  - '{__name__=~"cluster_version.*|cluster_operator.*"}'
  # Node and Kubernetes object state
  - '{job="node-exporter"}'
  - '{job="kube-state-metrics"}'
  # API server SLI metrics
  - '{__name__=~"apiserver_request_(total|duration_seconds_(bucket|count|sum))"}'
  # Workload CPU and memory
  - '{__name__=~"container_cpu_usage_seconds_total|container_memory_working_set_bytes"}'
```

Kubelet/cAdvisor metrics and API server histograms are usually the largest selectors. Add them intentionally, and avoid broad selectors such as {__name__=~".+"} unless you have sized the destination for the full cluster.

To estimate the metric-name breadth of a selector before adding it:

```shell
curl -skG \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'match[]=<your-selector>' \
  "$URL" | awk '/^[a-zA-Z_:]/{sub(/[{ ].*/,""); print}' | sort -u | wc -l
```
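To see what the `awk` stage extracts before pointing it at a live cluster, you can feed it a few sample federate lines. This is a local sketch; the series below are made up:

```shell
# Sample lines in the Prometheus exposition format that /federate returns.
sample='up{job="node-exporter",instance="n1"} 1 1700000000000
up{job="kube-state-metrics",instance="k1"} 1 1700000000000
cluster_version{type="current"} 1 1700000000000'

# Same extraction as the curl pipeline: strip labels and values,
# then list the distinct metric names.
printf '%s\n' "$sample" | awk '/^[a-zA-Z_:]/{sub(/[{ ].*/,""); print}' | sort -u
# prints:
# cluster_version
# up
```
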

Polling Interval and Replicas

OpenShift cluster-monitoring commonly scrapes these jobs every 30s. Set the federate scrape interval to the same cadence unless your OpenShift monitoring configuration uses a different interval.

Polling faster usually repeats samples with the same upstream timestamp. Polling slower skips samples. Keep scrape_timeout lower than scrape_interval, such as 25s for a 30s interval.

Run one collector replica for this federate scrape job. Multiple replicas polling the same match[] selectors emit duplicate samples with the same timestamps. If the collector also handles scalable OTLP ingest or other traffic, put this federation job in its own single-replica deployment.
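A dedicated single-replica Deployment can be sketched as follows. The name is hypothetical, and `Recreate` avoids a brief overlap of old and new pollers during a rollout, which `RollingUpdate` would allow:

```yaml
# Excerpt from a dedicated federation collector Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-ocp-federate   # hypothetical name
spec:
  replicas: 1               # exactly one poller for the federate job
  strategy:
    type: Recreate          # RollingUpdate briefly runs two pollers
```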

Troubleshooting

| Symptom | Likely cause |
| --- | --- |
| 401 from /federate | Token missing, malformed, or not populated yet. Re-run the token extract command and confirm the file is non-empty. |
| 403 from /federate | The ServiceAccount is missing the `cluster-monitoring-view` binding. |
| Empty 200 OK | The selector matched no series. Test with `match[]={__name__="up"}`. |
| `tls: certificate signed by unknown authority` | The collector does not trust the OpenShift apps-domain certificate. Mount the ingress CA as `ca_file`, or use `insecure_skip_verify` while testing. |
| `context deadline exceeded` | The scrape took longer than `scrape_timeout`. Narrow `match[]`, or raise both timeout and interval. |
| Collector memory climbs during scrape | The selector set is too broad for the collector limit. Start by removing cAdvisor `container_*` metrics or API histogram buckets. |
| Duplicate downstream samples | More than one collector replica is polling the same selector set. |
| Samples appear timestamped in the past | Expected. Federation preserves the upstream Prometheus sample timestamp. |

Reach out to support@cardinalhq.io for support or to ask questions not answered in our documentation.
