LangChain applications, when moved beyond local development, often hit a wall when it comes to packaging and serving.
Let’s see LangChain in action, not as a theoretical concept, but as a running service. Imagine we have a simple RAG (Retrieval Augmented Generation) application.
# main.py
import os

from fastapi import FastAPI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from pydantic import BaseModel

# --- Configuration ---
# Read the key from the environment (e.g. `docker run -e OPENAI_API_KEY=...`)
# rather than hardcoding it in source.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
PERSIST_DIR = "./chroma_db"
COLLECTION_NAME = "my_rag_collection"
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Helpful Answer:"""

# --- Initialization ---
# Initialize LLM
llm = OpenAI(temperature=0.7, openai_api_key=OPENAI_API_KEY)

# Initialize Embeddings
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Load Vector Store (the collection is created if it does not exist yet)
vector_store = Chroma(
    collection_name=COLLECTION_NAME,
    persist_directory=PERSIST_DIR,
    embedding_function=embeddings,
)

# Create Retriever
retriever = vector_store.as_retriever()

# Create Prompt
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])

# Create QA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" is simple, but others exist like "map_reduce"
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,  # Useful for debugging and understanding
)

# --- FastAPI App ---
app = FastAPI()


class AskRequest(BaseModel):
    # Declaring a model makes FastAPI read `question` from the JSON body;
    # a bare `question: str` parameter would be treated as a query parameter.
    question: str


@app.post("/ask/")
def ask_question(request: AskRequest):
    # A plain `def` endpoint runs in FastAPI's thread pool, so the blocking
    # chain call does not stall the event loop.
    result = qa_chain.invoke({"query": request.question})
    return {
        "answer": result["result"],
        "source_documents": [doc.page_content for doc in result["source_documents"]],
    }


# --- Helper to build the vector store (run this once) ---
# For demonstration, we'll add some dummy data. In production, this would be a separate script
# or a background process that ingests documents.
def build_vector_db():
    # The Chroma constructor above always creates the collection, so check
    # whether it already holds documents to avoid re-indexing.
    if vector_store.get(limit=1)["ids"]:
        print(f"Collection '{COLLECTION_NAME}' already has documents. Skipping DB build.")
        return
    print(f"Collection '{COLLECTION_NAME}' is empty. Building DB...")

    # Load documents
    with open("sample_doc.txt", "w") as f:
        f.write("LangChain is a framework for developing applications powered by language models. It enables applications that are context-aware and able to interact with their environment.")
    loader = TextLoader("sample_doc.txt")
    documents = loader.load()

    # Split documents
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    # Add to the store the retriever already uses; Chroma writes to PERSIST_DIR
    vector_store.add_documents(texts)
    print(f"Vector DB built and persisted to {PERSIST_DIR}")


if __name__ == "__main__":
    build_vector_db()  # Ensure DB is ready before starting the server
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
And here’s a Dockerfile to package it:
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Expose the port FastAPI will run on
EXPOSE 8000
# Run main.py directly so build_vector_db() executes before uvicorn starts;
# launching with "uvicorn main:app" would skip the __main__ block and leave the DB empty.
CMD ["python", "main.py"]
And a requirements.txt:
fastapi
uvicorn
langchain
langchain-core
langchain-community
langchain-openai
langchain-text-splitters
chromadb
python-dotenv # Good practice for managing keys
To run this locally:
- Save the Python code as main.py.
- Save the Dockerfile as Dockerfile.
- Create requirements.txt with the listed packages.
- Create an empty file named sample_doc.txt.
- Build the Docker image: docker build -t langchain-app .
- Run the Docker container: docker run -p 8000:8000 -e OPENAI_API_KEY="sk-..." langchain-app (replace sk-... with your actual key).
Now you can send a POST request to http://localhost:8000/ask/ with a JSON body like {"question": "What is LangChain?"} and receive a JSON response.
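As a minimal sketch of that request from Python, assuming the container is listening on localhost:8000 as in the run command above: the helper name build_ask_request is hypothetical, and only the standard library is used. The request is built but not sent, so the snippet runs even when the server is down.

```python
import json
import urllib.request


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /ask/ endpoint."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request("What is LangChain?")

# To actually send it (requires the container to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The response body should contain the answer and source_documents keys returned by the endpoint.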
The core problem LangChain addresses is orchestrating complex LLM workflows. It provides abstractions for components like LLMs, prompt templates, document loaders, text splitters, vector stores, and chains. A "chain" is the fundamental concept here – it’s a sequence of calls, often involving an LLM, that can take an input and produce an output. The RetrievalQA chain, for example, first retrieves relevant documents from a vector store based on a query and then passes those documents along with the original query to an LLM to generate an answer.
Internally, the RetrievalQA chain in our example does this:
- It takes the user's question.
- It uses the retriever (which is configured to query the Chroma vector store) to find documents semantically similar to the question.
- It formats these retrieved documents and the original question into a prompt string using the PromptTemplate.
- It sends this formatted prompt to the llm (OpenAI).
- It receives the result from the LLM and returns it, along with the source_documents.
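The steps above can be sketched in plain Python. This is an illustration of the retrieve-then-stuff flow, not LangChain's actual implementation: the word-overlap retriever stands in for vector similarity search, fake_llm stands in for the OpenAI call, and every name here (DOCUMENTS, retrieve, stuff_prompt, fake_llm, answer) is hypothetical.

```python
from typing import Callable, Dict, List

# Toy corpus standing in for the Chroma collection.
DOCUMENTS = [
    "LangChain is a framework for developing applications powered by language models.",
    "Chroma stores embeddings on disk so they survive restarts.",
    "FastAPI is a web framework for building APIs with Python.",
]

PROMPT_TEMPLATE = "Context: {context}\nQuestion: {question}\nHelpful Answer:"


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the question (stand-in for similarity search)."""
    question_words = set(question.lower().replace("?", "").split())
    scored = [(len(question_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def stuff_prompt(question: str, docs: List[str]) -> str:
    """'Stuff' every retrieved document into a single prompt string."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(docs), question=question)


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub; a real chain would call the model here."""
    return f"(model output for a {len(prompt)}-character prompt)"


def answer(question: str, llm: Callable[[str], str]) -> Dict[str, object]:
    source_documents = retrieve(question, DOCUMENTS)   # retrieve
    prompt = stuff_prompt(question, source_documents)  # format the prompt
    result = llm(prompt)                               # call the LLM
    return {"result": result, "source_documents": source_documents}


response = answer("What is LangChain?", fake_llm)
```

The returned dictionary mirrors the result and source_documents keys that the FastAPI endpoint reads from the chain's output.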
You control the behavior through various parameters:
- temperature on the OpenAI LLM: Controls randomness. Higher means more creative, lower means more deterministic.
- chain_type in RetrievalQA: Determines how documents are processed. "stuff" packs all documents into a single prompt, which is simple but can hit token limits. "map_reduce" runs the LLM over each document separately and then combines the per-document outputs in a final call, which suits larger contexts.
- chunk_size and chunk_overlap in CharacterTextSplitter: Dictate how your source documents are broken down before being embedded and stored. Crucial for retrieval quality.
- persist_directory for Chroma: Where your vector embeddings are stored on disk. Essential for not re-indexing every time.
- The PromptTemplate itself: This is your primary tool for guiding the LLM's response.
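To make chunk_size and chunk_overlap concrete, here is a simplified fixed-window splitter. It is an assumption-laden sketch: CharacterTextSplitter splits on a separator and then merges pieces up to chunk_size, so its output will differ; the function split_text below is hypothetical and only illustrates how the two parameters interact.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


no_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap produces more chunks and more stored embeddings, but text that straddles a boundary is less likely to be cut off from the context it needs.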
A less obvious property of LangChain is that its "chains" can be nested and composed, so a larger workflow forms a directed acyclic graph (DAG) of operations. Agents go a step further: rather than following a fixed graph, the agent LLM chooses at run time which tool to use (calling another API, running code, or performing a database lookup), executes it, observes the result, and then decides on the next step. You define the tools; the LLM decides when to call them. This makes LangChain far more powerful than a simple sequential processing pipeline.
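The choose, execute, observe cycle can be sketched without any LLM at all. In this illustration a scripted policy function stands in for the agent LLM's decision-making, and the tools, the TOOLS registry, and run_agent are all hypothetical names; it shows the control flow only, not LangChain's agent API.

```python
from typing import Callable, Dict, List, Tuple


def add(expression: str) -> str:
    """Hypothetical tool: sum whitespace-separated integers."""
    return str(sum(int(token) for token in expression.split()))


def lookup(key: str) -> str:
    """Hypothetical tool: a tiny in-memory 'database' lookup."""
    return {"langchain": "a framework for LLM applications"}.get(key.lower(), "unknown")


TOOLS: Dict[str, Callable[[str], str]] = {"add": add, "lookup": lookup}


def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the agent LLM: returns (action, action_input)."""
    if not observations:
        return ("add", "2 3")  # first step: decide to call a tool
    return ("finish", f"The result is {observations[-1]}")  # then answer


def run_agent(
    question: str,
    policy: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, action_input = policy(question, observations)  # choose
        if action == "finish":
            return action_input
        observations.append(tools[action](action_input))       # execute, observe
    raise RuntimeError("agent did not finish within max_steps")


final_answer = run_agent("What is 2 + 3?", scripted_policy, TOOLS)
print(final_answer)  # The result is 5
```

The max_steps cap is a design choice worth keeping in a real agent too, since a model-driven policy has no guarantee of ever choosing to finish.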
The next logical step after getting your LangChain app running in Docker is managing its state, especially the vector database, across container restarts.