LangChain’s LLM caching can drastically reduce latency and cost by storing previous LLM responses, but it often fails to cache anything at all.

The core issue is almost always a cache-key mismatch. LangChain derives the key from the exact prompt text plus a string describing the model and its parameters, so two calls that look identical to you will still miss the cache if the prompt differs by a single character or if a parameter such as temperature or top_p changes.
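To make the mechanism concrete, here is a minimal sketch of how a prompt-plus-parameters key behaves. It assumes the key is a hash of the prompt concatenated with a settings string; `make_cache_key` and the settings string are illustrative names, not LangChain's own API, and the real key format depends on the backend and version.

```python
import hashlib

def make_cache_key(prompt: str, llm_string: str) -> str:
    """Hypothetical key function: hash the prompt together with the model settings."""
    return hashlib.sha256((prompt + llm_string).encode("utf-8")).hexdigest()

settings = "model=example-model,temperature=0.7"  # assumed settings string

key_a = make_cache_key("Translate this to French:\nHello", settings)
key_b = make_cache_key("Translate this to French:\nHello ", settings)  # trailing space
key_c = make_cache_key("Translate this to French:\nHello", settings)

print(key_a == key_c)  # identical inputs give the same key
print(key_a == key_b)  # one extra space gives a different key
```

Because the key is a hash, there is no "close enough": any difference in either input produces an unrelated key.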

Here are the most common reasons your LangChain LLM cache might not be working, and how to fix them:

1. Inconsistent Prompt Formatting: Even a single extra space or a newline character can change the cache key.

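  • Diagnosis: Log the exact prompt with repr() just before each call and compare the output across calls. If you’re using a RedisCache, you can also list the stored keys with redis-cli KEYS "*" (prefer SCAN on a busy server). The key format depends on your LangChain version, and keys are typically hashes, so you usually won’t see the whitespace itself. What you will see is a new key appearing each time you repeat what you believe is the same prompt.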
  • Fix: Standardize your prompt creation. Use .strip() on all prompt strings before passing them to the LLM. If you’re using f-strings or templates, ensure consistent newline handling. For example, instead of:
    prompt = f"Translate this to French:\n{text}"
    
    Use:
    prompt = f"Translate this to French:\n{text.strip()}".strip()
    
  • Why it works: This ensures that identical input text always results in the same string representation, producing a consistent cache key.
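If prompts are assembled in several places, a single normalization helper may be easier to enforce than scattered .strip() calls. The sketch below is one possible approach; `normalize_prompt` is a hypothetical helper, and it assumes that runs of spaces and trailing whitespace carry no meaning in your prompts (which is not true for code or tabular input).

```python
import re

def normalize_prompt(text: str) -> str:
    """Hypothetical helper: make whitespace consistent without changing the words."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(lines).strip()

a = normalize_prompt("Translate this to French:\nHello   world \r\n")
b = normalize_prompt("  Translate this to French:\nHello world")
print(a == b)
```

Calling the helper at the last step before the LLM call means every code path produces the same string for the same logical prompt.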

2. Varying LLM Parameters: temperature, top_p, max_tokens, and the other LLM parameters are part of the cache key. If any of them changes between calls, even slightly, the cache won’t hit.

  • Diagnosis: Cache keys are typically hashed, so you usually can’t read the parameters out of them. Log the parameters used for each call instead. If the same prompt ran once with temperature=0.9 and then with temperature=0.8, that’s the problem.
  • Fix: Define your LLM parameters once and reuse them for all calls that should share a cache.
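    # Import paths vary by version; newer releases use langchain_openai and langchain_community.cache.
    from langchain.llms import OpenAI
    from langchain.cache import RedisCache
    import redis
    
    redis_client = redis.Redis(host='localhost', port=6379, db=0)
    # Passing a cache object via cache= needs a recent langchain-core; on older
    # versions, call set_llm_cache(RedisCache(redis_client)) from langchain.globals instead.
    # text-davinci-003 has been retired, so a current completion model is used here.
    cached_llm = OpenAI(temperature=0.7, model_name="gpt-3.5-turbo-instruct", cache=RedisCache(redis_client))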
    
    Then, use cached_llm for all subsequent calls.
  • Why it works: By fixing LLM parameters to a specific set of values, the generated cache key will consistently include those exact same parameters, allowing for cache hits on repeated identical requests.
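One way to keep parameters from drifting is to define them in a single immutable object and build every model from it. The sketch below is only an illustration: `LLMSettings`, its field values, and `fingerprint` are hypothetical names, and the fingerprint merely stands in for the parameter string that feeds the cache key.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the parameters that should share one cache."""
    model_name: str = "example-model"
    temperature: float = 0.7
    max_tokens: int = 256

    def fingerprint(self) -> str:
        # Sorted keys so the string does not depend on field order.
        return json.dumps(asdict(self), sort_keys=True)

SHARED = LLMSettings()

same = LLMSettings().fingerprint() == SHARED.fingerprint()
drifted = LLMSettings(temperature=0.8).fingerprint() == SHARED.fingerprint()
print(same, drifted)
```

Because the dataclass is frozen, code elsewhere cannot adjust the temperature on the shared object by accident; it has to create a new settings object, which makes the change visible in review.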

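3. Using Different LLM Instances: Creating a new LLM object for every request is a common source of misses. Instances that share a backend and have identical parameters should produce the same keys, but if each instance gets its own in-memory cache, or is built with slightly different parameters, nothing is shared between them.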

  • Diagnosis: If you’re seeing cache misses despite identical prompts and parameters, and you’ve checked formatting, it might be that you’re using different OpenAI or ChatOpenAI objects.
  • Fix: Instantiate your cached LLM object once and reuse that single instance throughout your application.
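    from langchain.chat_models import ChatOpenAI  # langchain_openai.ChatOpenAI in newer releases
    from langchain.cache import RedisCache
    
    # Instantiate once (redis_client is the connection created in the earlier example)
    llm_with_cache = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.5, cache=RedisCache(redis_client))
    
    # Use the same instance for all calls
    response1 = llm_with_cache.invoke("What is 2+2?")
    response2 = llm_with_cache.invoke("What is 2+2?") # This should hit the cache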
    
  • Why it works: One shared instance means one set of parameters and one cache backend, so every call builds its key the same way and checks the same store.
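A small factory function is one common way to guarantee a single instance across modules. The sketch below uses a stand-in class so it runs without LangChain; `FakeLLM` and `get_llm` are hypothetical names, and in real code the factory body would build your cached model instead.

```python
from functools import lru_cache

class FakeLLM:
    """Stand-in for a real model class so the sketch runs without LangChain."""
    def __init__(self, model_name: str, temperature: float) -> None:
        self.model_name = model_name
        self.temperature = temperature

@lru_cache(maxsize=None)
def get_llm() -> FakeLLM:
    # Built on the first call; later calls return the same object.
    return FakeLLM(model_name="example-model", temperature=0.5)

first = get_llm()
second = get_llm()
print(first is second)
```

Importing `get_llm` wherever a model is needed avoids passing the instance through every function signature.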

4. Cache Backend Connectivity or Configuration Issues: Redis might not be running, or the connection details could be wrong, preventing LangChain from writing to or reading from the cache.

  • Diagnosis: Check your Redis server status (redis-cli ping). Look for connection errors in your application logs.
  • Fix: Ensure Redis is running and accessible from your application. Verify host, port, and db parameters in redis.Redis(). If using password authentication, ensure it’s provided.
    redis_client = redis.Redis(host='your_redis_host', port=6379, db=0, password='your_password')
    cached_llm = OpenAI(cache=RedisCache(redis_client))
    
  • Why it works: A stable and correctly configured connection to Redis ensures that LangChain can reliably store and retrieve cached responses, making the caching mechanism functional.
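Before debugging LangChain itself, it can help to confirm that the Redis port is reachable at all from the machine running your application. The sketch below uses only the standard library; `can_reach` is a hypothetical helper, and it checks TCP reachability only, not authentication or the selected database.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Assumed defaults; replace with your own Redis host and port.
    print(can_reach("localhost", 6379))
```

If this returns False, the problem is networking or the server itself. If it returns True and caching still fails, check credentials and the db index next.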

5. Complex Prompt Chains or Agents: When using LangChain Expression Language (LCEL) or agents, the input to the LLM might not be a simple string. It could be a dictionary, a list of messages, or an object. The default caching might not serialize these complex inputs into consistent keys.

  • Diagnosis: Log the serialized input that actually reaches the model (for chat models, the full message list) on two calls you expect to match, then diff them. Look for values that change per call, such as timestamps, request IDs, dictionaries built in a different order, or differing additional_kwargs. If Redis gains a new key on every call, the input isn’t being serialized consistently.
  • Fix: Implement custom serialization for your complex inputs when generating cache keys. You can achieve this by creating a custom BaseCache or by ensuring your complex objects have a deterministic __str__ or __repr__ method. A simpler approach for dictionaries is to sort keys before serialization.
    import json
    from typing import Any
    
    from langchain.schema import BaseMessage
    
    def _normalize(obj: Any) -> Any:
        # Reduce obj to plain JSON-compatible data so json.dumps can sort it.
        if isinstance(obj, BaseMessage):
            return {
                "type": type(obj).__name__,  # e.g. HumanMessage or AIMessage
                "content": _normalize(obj.content),
                "additional_kwargs": _normalize(obj.additional_kwargs),
            }
        if isinstance(obj, dict):
            return {str(k): _normalize(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [_normalize(item) for item in obj]
        if obj is None or isinstance(obj, (str, int, float, bool)):
            return obj
        return str(obj)  # fallback; only deterministic if the object's __str__ is
    
    def deterministic_serialize(obj: Any) -> str:
        return json.dumps(_normalize(obj), sort_keys=True)
    
    # Example usage within a custom cache or before passing to LLM
    complex_input = {"query": "What is the capital of France?", "history": ["User: Hello"]}
    serialized_input = deterministic_serialize(complex_input)
    # Use serialized_input to generate cache key or pass to LLM
    
  • Why it works: Deterministic serialization ensures that complex input structures, regardless of internal order or minor variations in object representation, are consistently converted into the same string, thus producing identical cache keys.

6. Cache Invalidation Logic: You might have set up caching, but the logic controlling when to use the cache or when to update it is flawed. This is less about the cache not working and more about it not behaving as expected.

  • Diagnosis: This is harder to diagnose with Redis keys alone. You’d need to trace the execution flow in your application to see if the cache lookup is even being attempted before the LLM call.
  • Fix: Ensure your LLM or ChatModel instance is actually wired to a cache, either through the cache parameter or globally with set_llm_cache from langchain.globals, and check that nothing sets cache=False on the instance. For finer control over hits and misses, subclass BaseCache and put your logic in its lookup and update methods.
  • Why it works: Making the caching path explicit guarantees that the cache is consulted under the conditions you intend, and a custom BaseCache gives you one place to log or override hits and misses.
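To see whether lookups are happening at all, a cache that counts hits and misses can be useful. The sketch below is a dict-backed stand-in that mirrors the lookup, update, and clear method names of LangChain's BaseCache so it runs without the library; `CountingCache` is a hypothetical name, and a real version would subclass BaseCache and could delegate to another cache for storage.

```python
from typing import Any, Dict, Optional, Tuple

class CountingCache:
    """Dict-backed stand-in that mirrors BaseCache's lookup/update/clear methods."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Any] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        value = self._store.get((prompt, llm_string))
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        self._store[(prompt, llm_string)] = return_val

    def clear(self, **kwargs: Any) -> None:
        self._store.clear()

cache = CountingCache()
cache.lookup("What is 2+2?", "temperature=0.5")        # miss: nothing stored yet
cache.update("What is 2+2?", "temperature=0.5", ["4"])
cache.lookup("What is 2+2?", "temperature=0.5")        # hit
cache.lookup("What is 2+2?", "temperature=0.6")        # miss: settings differ
print(cache.hits, cache.misses)
```

If both counters stay at zero during a run, the cache is never being consulted, which points to a wiring problem rather than a key mismatch.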

The next error you’ll hit after fixing caching issues is likely a redis.exceptions.ConnectionError if your Redis instance is down or misconfigured, or a KeyError if you’ve implemented custom cache key logic that doesn’t handle all input types.
