Managed Service for Prometheus (MSP) can collect GKE metrics, but it’s surprisingly easy to misconfigure and miss crucial data.

Let’s see it in action. Imagine you have a simple frontend deployment in GKE:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - name: web # Named so a PodMonitoring can reference this port as "web"
          containerPort: 80

To collect metrics from this, you’d typically set up a PodMonitoring resource. This tells MSP which pods to scrape and how to scrape them. Here’s a basic example:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: frontend-pm
  labels:
    app: frontend # Label on the PodMonitoring object itself (not a selector)
spec:
  selector:
    matchLabels:
      app: frontend # Selector to find the pods to scrape
  endpoints:
  - port: web # Name of a container port declared in the Pod spec
    interval: 30s
    path: /metrics # The path where metrics are exposed (default for many apps)

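When you apply this, MSP’s collectors (with managed collection enabled, they run as a DaemonSet in the gmp-system namespace of your GKE cluster) discover pods labeled app: frontend in the PodMonitoring’s namespace. They then scrape /metrics on the port named web every 30 seconds and send the samples to your Google Cloud project. The port name only resolves if the container declares a port with that name, so the frontend Deployment must name its container port 80 as web. Also note that a stock nginx image does not serve Prometheus metrics at /metrics; in practice you would run an exporter alongside it or scrape an application that exposes them. You can then query the data in Cloud Monitoring or Grafana.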

The core problem MSP solves is providing a fully managed, scalable, Prometheus-compatible metrics collection service within Google Cloud, eliminating the need to run your own Prometheus servers, Thanos, or Cortex. It integrates tightly with GKE: collectors discover scrape targets through the label selectors in your PodMonitoring or ClusterPodMonitoring resources. You define what to collect, and MSP handles the rest: scraping, storage, and querying.

The exact levers you control are primarily within the PodMonitoring and ClusterPodMonitoring resources. You define:

  • selector: Which pods the PodMonitoring applies to.
  • endpoints:
    • port: The name of the port on the pod to scrape from. This must match the name given to an entry in the container’s ports list (the entry that carries containerPort); a port without a name cannot be referenced this way.
    • path: The HTTP path where the metrics are exposed (e.g., /metrics).
    • interval: How often to scrape.
    • scheme: http or https.
    • metricRelabeling: Relabeling rules that rewrite, keep, or drop metrics and their labels after a scrape but before the samples are sent to MSP. (Labels attached to the target itself are controlled separately, through spec.targetLabels.)
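
As a rough sketch of how these fields fit together, the PodMonitoring below sets each endpoint option from the list explicitly. The resource name frontend-pm-explicit and the go_gc_.* drop rule are illustrative assumptions, and the relabeling field is written as metricRelabeling, which is the spelling the monitoring.googleapis.com/v1 CRD appears to use; verify it against the CRD installed in your cluster before relying on it.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: frontend-pm-explicit # Hypothetical name for this sketch
spec:
  selector:
    matchLabels:
      app: frontend # Pods to scrape, in this resource's namespace
  endpoints:
  - port: web       # Must match a named container port on the pod
    scheme: http    # Stated explicitly; http is the default
    path: /metrics
    interval: 30s
    metricRelabeling: # Assumed field name; runs after the scrape, before samples are sent
    - action: drop    # Illustrative rule: discard Go garbage-collector metrics
      sourceLabels: [__name__]
      regex: go_gc_.*
```

Dropping noisy series at this stage keeps them out of storage entirely, which is usually the main reason to reach for a relabeling rule.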

A common pitfall is overlooking the scheme parameter. By default, MSP scrapes over http. If your application serves metrics only over https (rare for internal service metrics, but possible) and the endpoint either keeps the default scheme or lacks the TLS settings needed to trust the server’s certificate, the scrape fails. Often the only visible symptom is missing data; the underlying TLS handshake error surfaces only in the collector logs or target status.
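
A minimal sketch of an endpoint that scrapes over HTTPS might look like the following. The resource name frontend-pm-tls and the port name web-tls are hypothetical, and the example assumes the pod presents a certificate the collector can already validate; any extra TLS settings depend on how your certificates are issued and are not shown here.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: frontend-pm-tls # Hypothetical name for this sketch
spec:
  selector:
    matchLabels:
      app: frontend
  endpoints:
  - port: web-tls  # Hypothetical named container port that serves HTTPS
    scheme: https  # Overrides the http default
    path: /metrics
    interval: 30s
```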

Most people don’t realize that PodMonitoring resources are namespaced, while ClusterPodMonitoring resources are cluster-scoped. If you create a PodMonitoring in the default namespace, it will only find pods in the default namespace that match its selector. To scrape pods across all namespaces, you need to use ClusterPodMonitoring or create PodMonitoring resources in each relevant namespace. This is a frequent source of "why am I not seeing metrics for my pods in namespace X?" questions.
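
For comparison, here is a sketch of the cluster-scoped variant. It mirrors the frontend PodMonitoring but, having no namespace of its own, matches pods labeled app: frontend in every namespace. The name frontend-cpm is a hypothetical choice for illustration.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: frontend-cpm # Hypothetical name; cluster-scoped, so no namespace field
spec:
  selector:
    matchLabels:
      app: frontend # Matches pods with this label in all namespaces
  endpoints:
  - port: web
    interval: 30s
    path: /metrics
```

Because the selector applies cluster-wide, it is worth keeping the label match specific enough that it does not pick up unrelated pods in other namespaces.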

The next concept to explore is advanced metric filtering and transformation using relabeling rules.
