The New Relic Kubernetes Explorer doesn’t just show you what’s happening in your cluster; it shows you how your cluster is feeling about what’s happening.
Let’s see it in action. Imagine you’ve got a deployment named my-app running across several nodes.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: nginx:latest
          ports:
            - containerPort: 80
```
When New Relic’s Kubernetes integration is set up, it starts collecting metrics from the Kubernetes API server, the kubelet, and the container runtime. This data isn’t just raw numbers; it’s contextualized. For example, the kubelet reports CPU and memory usage for each pod, and the Explorer aggregates those figures by deployment, by namespace, and by node.
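To make the aggregation step concrete, here is a minimal sketch of rolling per-pod figures up by deployment and by node. The sample records, field names, and the `aggregate` helper are all hypothetical; this illustrates the idea and is not New Relic’s data model or implementation.

```python
from collections import defaultdict

# Hypothetical per-pod samples. Field names and values are illustrative only,
# not New Relic's schema.
samples = [
    {"pod": "my-app-a", "deployment": "my-app", "namespace": "default",
     "node": "node-1", "cpu_cores": 0.20, "memory_mib": 120},
    {"pod": "my-app-b", "deployment": "my-app", "namespace": "default",
     "node": "node-2", "cpu_cores": 0.25, "memory_mib": 130},
    {"pod": "my-app-c", "deployment": "my-app", "namespace": "default",
     "node": "node-2", "cpu_cores": 0.90, "memory_mib": 410},
]

def aggregate(samples, key):
    """Sum CPU and memory across pods that share the same value for `key`."""
    totals = defaultdict(lambda: {"cpu_cores": 0.0, "memory_mib": 0, "pods": 0})
    for s in samples:
        bucket = totals[s[key]]
        bucket["cpu_cores"] += s["cpu_cores"]
        bucket["memory_mib"] += s["memory_mib"]
        bucket["pods"] += 1
    return dict(totals)

by_deployment = aggregate(samples, "deployment")
by_node = aggregate(samples, "node")
```

The same function works for `"namespace"`, since each grouping is just a different key over the same pod-level records.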
Here’s what you might see in the Explorer’s UI for my-app:
- Cluster Overview: A high-level view of all nodes, their resource utilization (CPU, memory, network I/O), and the status of pods and deployments. You’d see your my-app deployment listed, alongside others.
- Workload View: Drilling into my-app, you’d see the individual pods that make up the deployment. If one pod is struggling, say with high CPU, it would be visually distinct. You’d see its resource consumption compared to the other pods in the deployment and the available resources on its node.
- Node View: Examining the node where a struggling my-app pod resides, you’d see the overall health of that node. Is it running out of memory? Is its disk I/O saturated? This helps differentiate between a pod-specific problem and a node-wide issue.
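The pod-versus-node distinction from the Workload and Node views can be sketched as a small decision function. The thresholds (`factor`, `node_pressure_pct`) and both helper names are assumptions chosen for illustration; the Explorer’s actual logic is not documented here.

```python
from statistics import median

def is_outlier(pod_cpu, peer_cpus, factor=2.0):
    """True when the pod uses more than `factor` times the median of its peers."""
    if not peer_cpus:
        return False  # nothing to compare against
    return pod_cpu > factor * median(peer_cpus)

def diagnose(pod_cpu, peer_cpus, node_memory_used_pct, node_pressure_pct=90.0):
    """Classify a struggling pod as a pod-specific or node-wide problem."""
    pod_hot = is_outlier(pod_cpu, peer_cpus)
    node_hot = node_memory_used_pct >= node_pressure_pct
    if pod_hot and node_hot:
        return "pod and node both under pressure"
    if pod_hot:
        return "pod-specific problem"
    if node_hot:
        return "node-wide issue"
    return "no obvious resource problem"
```

For example, `diagnose(0.9, [0.2, 0.25], 40.0)` classifies a pod that is far above its peers on a lightly loaded node as a pod-specific problem.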
The real power is in how New Relic stitches this together. It’s not just about seeing a spike in CPU for a pod. It’s seeing that spike, then being able to click through to the node to see if other critical system processes are also consuming resources, or if the node’s network bandwidth is maxed out, potentially impacting pod communication. The Explorer presents resource metrics (CPU, memory, network, disk) alongside Kubernetes-native events (pod restarts, OOMKilled events, deployment rollouts) and application performance metrics (if you have APM agents running).
This correlation allows you to answer questions like: "Is my application slow because the underlying node is overloaded?" or "Did a recent deployment cause a cascade of pod failures due to resource contention?"
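One simple way to picture this correlation is to line up Kubernetes events with a metric spike by time. The sketch below assumes a hypothetical list of event records and a five-minute window; both the record shape and the `events_near` helper are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical event records; the "kind" values mirror the event types
# mentioned above, but the structure is illustrative.
events = [
    {"time": datetime(2024, 1, 1, 12, 0), "kind": "DeploymentRollout", "object": "my-app"},
    {"time": datetime(2024, 1, 1, 12, 3), "kind": "OOMKilled", "object": "my-app-c"},
    {"time": datetime(2024, 1, 1, 9, 0), "kind": "PodRestart", "object": "other-pod"},
]

def events_near(spike_time, events, window=timedelta(minutes=5)):
    """Return events within `window` of the spike, oldest first."""
    nearby = [e for e in events if abs(e["time"] - spike_time) <= window]
    return sorted(nearby, key=lambda e: e["time"])

spike = datetime(2024, 1, 1, 12, 4)
related = events_near(spike, events)
```

Here a rollout followed minutes later by an OOMKilled event next to the spike would point toward the second question: a deployment that triggered resource contention.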
The Explorer uses the data collected by the New Relic infrastructure agent, which typically runs as a DaemonSet in your cluster. This agent queries the Kubernetes API for object states and metrics, and also scrapes metrics from kubelet (usually via its /metrics/cadvisor endpoint) and the container runtime. This data is then sent to New Relic’s platform for aggregation and visualization. You can configure which metrics are collected and how often, balancing detail with the load on your cluster.
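To give a feel for what a scrape of /metrics/cadvisor returns, the sketch below parses a hand-written sample in the Prometheus text exposition format. The sample lines are simplified (real output carries more labels and often a trailing timestamp), and the `parse` function is a deliberately minimal assumption-laden parser, not what the New Relic agent does. For real use, an established parser such as the one in the `prometheus_client` package is a safer choice.

```python
import re

# Hand-written, simplified sample; not captured from a real kubelet.
SAMPLE = '''\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="my-app-container",namespace="default",pod="my-app-a"} 12.5
container_memory_working_set_bytes{container="my-app-container",namespace="default",pod="my-app-a"} 1.2e+08
'''

# metric_name{label="value",...} value
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def parse(text):
    """Parse labeled samples; comments and unlabeled series are skipped."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            continue  # unlabeled series are out of scope for this sketch
        rows.append({
            "name": m["name"],
            "labels": dict(LABEL.findall(m["labels"])),
            "value": float(m["value"]),
        })
    return rows

rows = parse(SAMPLE)
```

Each row carries the `namespace` and `pod` labels, which is what makes it possible to attach a raw container metric to the Kubernetes object it belongs to.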
The most surprising thing is how much noise you can filter out by understanding the intent of a Kubernetes object. A pod having high CPU isn’t inherently bad; it’s only bad if it’s causing the deployment to fail its readiness or liveness probes, or if it’s starving other critical pods on the same node. The Explorer’s value isn’t in listing every single metric, but in highlighting the metrics that deviate from expected healthy behavior based on the Kubernetes object’s role and its peers. It’s about understanding the signals versus the noise in your cluster’s resource consumption.
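The rule of thumb in the paragraph above can be written down as a small filter. The field names (`cpu_high`, `ready`, `live`, `critical`, `cpu_throttled`) and the `is_actionable` function are hypothetical; the sketch only encodes the reasoning that high CPU matters when probes fail or critical neighbors are starved.

```python
def is_actionable(pod, neighbors):
    """High CPU is a signal only if probes fail or critical neighbors starve."""
    if not pod["cpu_high"]:
        return False
    probes_failing = not (pod["ready"] and pod["live"])
    starving_neighbors = any(
        n["critical"] and n["cpu_throttled"] for n in neighbors
    )
    return probes_failing or starving_neighbors

# A busy but healthy pod next to an unaffected neighbor: noise.
busy_pod = {"cpu_high": True, "ready": True, "live": True}
calm_neighbor = {"critical": True, "cpu_throttled": False}
noise = is_actionable(busy_pod, [calm_neighbor])

# The same pod failing its readiness probe: signal.
failing_pod = {"cpu_high": True, "ready": False, "live": True}
signal = is_actionable(failing_pod, [calm_neighbor])
```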
Understanding the underlying resource constraints on a node is the next logical step when a workload appears healthy but isn’t performing as expected.