Linkerd’s proxy injects itself into your application pods, and Traefik, as an ingress controller, needs to send traffic through those proxies to reach your services.
The Problem: Traefik Can’t Find Your Services
When Traefik tries to route traffic to a service that’s part of your Linkerd mesh, it often fails. You’ll see errors in Traefik’s logs like `service not found` or `endpoint unavailable`, even though `kubectl get endpoints <your-service>` shows healthy IPs. This happens because Traefik, by default, resolves Service endpoints through the Kubernetes API and connects to the pod IPs directly; it has no awareness of the Linkerd proxies that now sit in the traffic path.
Here’s how to fix it:
1. Traefik Not Discovering Kubernetes Services:
- Diagnosis: Check Traefik’s logs for `service not found` or similar errors, and verify that Traefik is configured to use the Kubernetes CRD or Ingress provider.

      kubectl logs <traefik-pod-name> -n <traefik-namespace>

- Cause: Traefik’s Kubernetes provider isn’t enabled or configured correctly.
- Fix: Ensure your Traefik deployment has the `--providers.kubernetescrd=true` and `--providers.kubernetesingress=true` flags (or the equivalent in your Helm values or static configuration).

      # Traefik Helm chart values
      providers:
        kubernetesCRD:
          enabled: true
        kubernetesIngress:
          enabled: true

- Why it works: This tells Traefik to actively watch for Kubernetes `Service` and `Ingress` (or `IngressRoute` if using CRDs) resources.
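The `enabled: true` keys above follow the shape of the Traefik Helm chart’s values. In a hand-written static configuration file, a provider is switched on by the presence of its key. A minimal sketch, assuming a file named `traefik.yaml` and no provider options beyond the defaults:

```yaml
# traefik.yaml (static configuration file, assumed file name)
# An empty mapping enables the provider with its default settings.
providers:
  kubernetesCRD: {}
  kubernetesIngress: {}
```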
2. Linkerd Proxies Not Being Targeted:
- Diagnosis: When Traefik does find the service, the traffic still fails. `kubectl describe service <your-service>` shows the correct `selector`, and the `endpoints` list contains IPs that look like your pods, but traffic doesn’t reach the application.
- Cause: The Service’s `targetPort` doesn’t match a port the application container actually listens on. Linkerd does not take over or renumber the application port: its init container (or CNI plugin) installs iptables rules that redirect inbound connections to the proxy’s inbound listener (4143 by default), and the proxy then forwards them to the application on the original port.
- Fix: Keep the Kubernetes `Service` pointed at the application’s own port. Don’t set `targetPort` to a Linkerd proxy port such as 4143 (inbound) or 4140 (outbound); the redirection is transparent.

      apiVersion: v1
      kind: Service
      metadata:
        name: my-app-service
        namespace: my-namespace
      spec:
        selector:
          app: my-app
        ports:
          - protocol: TCP
            port: 80 # The port Traefik will target
            targetPort: 80 # The application container's port; Linkerd intercepts traffic to it transparently

- Why it works: Traefik connects to the pod IP on `targetPort`, Linkerd’s iptables rules hand that connection to the proxy, and the proxy forwards it to the application container on the same port.
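To see how the ports line up, here is a sketch of a Deployment that could sit behind `my-app-service`. The Deployment name, image, and replica count are placeholders, not taken from any real setup. The `linkerd.io/inject: enabled` annotation asks Linkerd to add its proxy to the pod; the container still declares port 80, which is the number the Service’s `targetPort` refers to.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # hypothetical name
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app                   # must match the Service selector
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        linkerd.io/inject: enabled  # Linkerd adds its proxy sidecar at admission time
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0   # hypothetical image
          ports:
            - containerPort: 80           # same number as the Service targetPort
```

After applying a manifest like this, the pod should list an extra `linkerd-proxy` container next to the application container.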
3. Linkerd Service Profile Mismatch:
- Diagnosis: Intermittent failures, timeouts, or incorrect routing for specific requests, especially if you’re using Linkerd’s advanced features like retries or traffic splitting.
- Cause: Linkerd’s `ServiceProfile` might not be correctly configured for the service: its name may not match the Service’s fully qualified DNS name, or its route conditions may not match the requests Traefik actually forwards.
- Fix: Ensure the `ServiceProfile` is named after the Service’s fully qualified name, lives in the Service’s namespace, and that each route’s `condition` (method and path regex) matches real requests. Routes carry no port of their own; the profile applies to the Service as a whole.

      apiVersion: linkerd.io/v1alpha2
      kind: ServiceProfile
      metadata:
        name: my-app-service.my-namespace.svc.cluster.local
        namespace: my-namespace
      spec:
        routes:
          - name: GET /api/users
            condition:
              method: GET
              pathRegex: /api/users
            responseClasses:
              - condition:
                  status:
                    min: 200
                    max: 299
                isFailure: false
            # ... other route configurations

- Why it works: The `ServiceProfile` tells Linkerd how to interpret traffic for that service. When the profile name and route conditions line up with the Service and its requests, Linkerd can correctly apply its policies and telemetry.
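Since the diagnosis mentions retries, here is a hedged sketch of how a route opts into them in a `ServiceProfile`. It assumes the `GET /api/users` route is idempotent and safe to retry; the timeout and retry budget values are illustrative, not recommendations.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-app-service.my-namespace.svc.cluster.local
  namespace: my-namespace
spec:
  routes:
    - name: GET /api/users
      condition:
        method: GET
        pathRegex: /api/users
      isRetryable: true        # only mark idempotent routes as retryable
      timeout: 300ms           # illustrative per-request timeout
  retryBudget:
    retryRatio: 0.2            # retries may add at most 20% extra load
    minRetriesPerSecond: 10
    ttl: 10s
```

If intermittent failures appear only after retries are enabled, temporarily removing `isRetryable` is a quick way to check whether the retry configuration is involved.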
4. Traefik IP Filter / Rate Limiting Blocking Mesh Traffic:
- Diagnosis: Legitimate traffic from Traefik to your services is being dropped or rejected. You might see 403 errors or connection resets originating from Traefik’s IP address.
- Cause: Traefik’s security middleware (like IP filtering or rate limiting) is configured to allow only specific source ranges, and those ranges don’t cover the addresses that requests appear to come from once Linkerd’s proxies are in the path.
- Fix: Adjust Traefik’s middleware configuration to allow the Kubernetes pod CIDR range. This often means updating `middlewares.yaml` or the `Middleware` referenced by your `IngressRoute` definitions.

      # Example Traefik IngressRoute with IP filter
      apiVersion: traefik.containo.us/v1alpha1
      kind: IngressRoute
      metadata:
        name: my-app-ingress
        namespace: my-namespace
      spec:
        entryPoints:
          - websecure
        routes:
          - match: Host(`my-app.example.com`)
            kind: Rule
            services:
              - name: my-app-service
                port: 80
            middlewares:
              - name: ip-whitelist # Assuming you have an IPAllowList middleware
                namespace: traefik
      # Ensure the referenced IPAllowList middleware is configured correctly.
      # For example, to allow traffic from the cluster's pod network
      # (dynamic configuration, e.g. a dedicated Middleware CRD):
      # ipAllowList:
      #   sourceRange:
      #     - 10.244.0.0/16 # Replace with your cluster's pod CIDR

- Why it works: This explicitly permits source addresses in the pod network, so requests that pass through Linkerd’s proxies are no longer rejected by the filter.
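The IngressRoute above refers to a middleware named `ip-whitelist` in the `traefik` namespace without showing it. A minimal sketch of that resource follows, assuming a Traefik release that supports `ipAllowList` (older v2 releases use `ipWhiteList` with the same `sourceRange` field) and a pod CIDR of `10.244.0.0/16`. Because the IngressRoute lives in a different namespace, the CRD provider’s `allowCrossNamespace` option would also need to be enabled.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: ip-whitelist        # name referenced by the IngressRoute
  namespace: traefik
spec:
  ipAllowList:
    sourceRange:
      - 10.244.0.0/16       # assumed pod CIDR; replace with your cluster's value
```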
5. DNS Resolution Issues:
- Diagnosis: Traefik can’t resolve the Kubernetes service name, even though the Service exists. Logs might show `no such host` or DNS lookup failures.
- Cause: The Traefik pod isn’t using the cluster’s DNS service (like CoreDNS), for example because it runs with `hostNetwork: true` or overrides its DNS settings.
- Fix: Ensure the Traefik pod uses the cluster’s DNS. This is the default for pods running inside Kubernetes (`dnsPolicy: ClusterFirst`); pods on the host network need `ClusterFirstWithHostNet`. If you’re overriding DNS settings, explicitly point the pod at your cluster’s DNS IP (e.g., `10.43.0.10` for CoreDNS in kube-system).

      # In the Traefik Deployment's pod template
      spec:
        template:
          spec:
            dnsPolicy: ClusterFirst # use ClusterFirstWithHostNet if hostNetwork: true
            # Only if you must override: set dnsPolicy to "None" and list servers explicitly
            # dnsConfig:
            #   nameservers:
            #     - "10.43.0.10" # Replace with your cluster's CoreDNS IP

- Why it works: This guarantees Traefik uses the same DNS resolution mechanism as the rest of your Kubernetes cluster, correctly finding internal service names.
6. Linkerd l5d-dst-canonical Header Issues:
- Diagnosis: Traffic reaches the Linkerd proxy, but the proxy doesn’t know how to route it to the correct application container, leading to 503 errors from the proxy itself.
- Cause: Traefik might be adding or modifying headers that interfere with Linkerd’s routing. `l5d-dst-canonical` is set by Linkerd’s own proxy, so an ingress should never set it. The header Linkerd documents for ingress controllers is `l5d-dst-override`.
- Fix: Configure Traefik not to add or modify `l5d-dst-canonical`: make sure no `Middleware` or other configuration sets it. If the proxy needs to be told the destination explicitly, set `l5d-dst-override` to the Service’s fully qualified name and port instead.

      # Example IngressRoute - ensure no custom headers are set that conflict
      apiVersion: traefik.containo.us/v1alpha1
      kind: IngressRoute
      metadata:
        name: my-app-ingress
        namespace: my-namespace
      spec:
        entryPoints:
          - web
        routes:
          - match: Host(`my-app.example.com`)
            kind: Rule
            services:
              - name: my-app-service
                port: 80
            # IMPORTANT: Do NOT attach a Middleware here that sets l5d-dst-canonical

- Why it works: With `l5d-dst-canonical` left untouched, Linkerd’s proxy can identify the target service itself and route the request without misinterpretation.
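Linkerd’s ingress documentation describes a separate header, `l5d-dst-override`, that an ingress controller can set to tell the proxy which Service a request is meant for. The sketch below assumes Traefik 2.x CRDs and that Traefik’s own pod is meshed in Linkerd’s ingress mode (the `linkerd.io/inject: ingress` annotation). The middleware name `l5d-header-middleware` is hypothetical.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: l5d-header-middleware     # hypothetical name
  namespace: my-namespace
spec:
  headers:
    customRequestHeaders:
      # Service FQDN and port that Linkerd should route to
      l5d-dst-override: "my-app-service.my-namespace.svc.cluster.local:80"
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: my-app-ingress
  namespace: my-namespace
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`my-app.example.com`)
      kind: Rule
      middlewares:
        - name: l5d-header-middleware   # same namespace, so no cross-namespace reference
      services:
        - name: my-app-service
          port: 80
```

Keeping the middleware in the same namespace as the IngressRoute avoids any dependency on cross-namespace references.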
After applying these fixes, you should see traffic flowing correctly from Traefik through the Linkerd mesh to your applications. The next error you’ll likely encounter is a `linkerd.io/v1alpha2.ServiceProfile not found` for a specific route if you haven’t defined one for more granular control.