The Map-Reduce summarization technique in LangChain doesn't actually "reduce" your document's complexity. It distributes the summarization work across many independent LLM calls and then merges their results.
Let’s see this in action. Imagine you have a long PDF, say, my_long_document.pdf. We’ll break it down, summarize each piece independently, and then combine those summaries.
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

# 1. Load the document (PyPDFLoader returns one Document per page)
loader = PyPDFLoader("my_long_document.pdf")
docs = loader.load()

# 2. Split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)

# 3. Initialize the LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# 4. Load the Map-Reduce chain
# The 'map_reduce' type is key here.
chain = load_summarize_chain(llm, chain_type="map_reduce")

# 5. Run the chain. chain.run() is deprecated in recent LangChain releases;
# invoke() takes the chunks under "input_documents" and returns the summary under "output_text".
result = chain.invoke({"input_documents": split_docs})
print(result["output_text"])
```
This script loads a PDF, splits it into manageable chunks (each at most about 1000 characters, with up to 200 characters of overlap between neighboring chunks to preserve context across boundaries), and then feeds these chunks to an LLM for summarization. The `load_summarize_chain` function, when called with `chain_type="map_reduce"`, orchestrates this process.
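To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of the plain fixed-window version of the idea. `split_with_overlap` is a hypothetical helper, not LangChain code: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, so its real chunks are often shorter than `chunk_size` and the overlap is at most `chunk_overlap` rather than exactly that.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

With the article's settings, each chunk starts 800 characters after the previous one, so a 2,500-character text yields three chunks: characters 0 to 1000, 800 to 1800, and 1600 to 2500.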
Here’s the mental model:
- The Problem: LLMs have context-window (token) limits. You can't just shove a 100-page book into a single prompt. Processing the text piece by piece in strict sequence avoids the limit but is slow, because each step has to wait for the one before it.
- The Solution: Map-Reduce, which works in two phases:
  - Map Phase: LangChain takes each document chunk (a "document" in LangChain's terminology, even if it's only a piece of a larger file) and sends it to the LLM with a prompt like "Summarize this text: [chunk content]". The chunks are independent of one another, so these calls can run in parallel. Each call produces a short summary of its assigned chunk.
  - Reduce Phase: Once all chunk summaries are generated, LangChain collects them and feeds them to the LLM again, this time with a prompt like "Combine these summaries into a coherent whole: [summary1] [summary2] …". If the collected summaries are too long to fit in one prompt (LangChain's `token_max` setting, 3000 tokens by default), they are first collapsed in batches and the batch results are combined again, until a single final summary remains. The number of reduce rounds therefore depends on how many chunks you have, how long their summaries are, and how the chain is configured.
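The two phases can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: `fake_summarize` is a hypothetical stand-in for an LLM call (it just keeps the first word of each line), and the thread pool is one possible way to run the independent map calls concurrently. In a real chain, each call would send the chunk, or the joined summaries, through the map or combine prompt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str], max_workers: int = 4) -> str:
    # Map phase: chunks are independent, so their summaries can be produced concurrently.
    # ThreadPoolExecutor.map returns results in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_summaries = list(pool.map(summarize, chunks))
    # Reduce phase: join the partial summaries and summarize them once more.
    return summarize("\n".join(partial_summaries))


def fake_summarize(text: str) -> str:
    # Hypothetical stand-in for an LLM call: keep the first word of every non-empty line.
    return " ".join(line.split()[0] for line in text.splitlines() if line.split())


if __name__ == "__main__":
    chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta"]
    print(map_reduce_summarize(chunks, fake_summarize))  # alpha delta zeta
```

Because the map function only ever sees one chunk, swapping the thread pool for sequential calls changes the running time but not the result.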
The key levers you control are:
- `chunk_size` and `chunk_overlap` in `RecursiveCharacterTextSplitter`: These determine how the document is initially broken down. Larger chunks mean fewer map calls but risk exceeding the LLM's context window if you are not careful. Smaller chunks mean more map calls, potentially increasing cost and time, but ensure each piece fits. Overlap helps maintain continuity between chunks.
- `llm`: The model you choose affects the quality and cost of both the map and reduce phases.
- `chain_type="map_reduce"`: This is the core setting that dictates the strategy. Other options include `stuff` (crams everything into one prompt; only suitable for short documents) and `refine` (iteratively refines a running summary; good for sequential processing).
- `map_prompt` and `combine_prompt` (if you pass custom prompts to `load_summarize_chain`): These directly control what the LLM is asked to do in each phase.
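A quick way to reason about the chunk-size trade-off is to count map calls. Assuming fixed-window splitting (`RecursiveCharacterTextSplitter` breaks on separators, so treat this as a rough estimate), each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so a document of `N` characters that is longer than one chunk needs `ceil((N - chunk_overlap) / (chunk_size - chunk_overlap))` chunks. `estimated_map_calls` is a hypothetical helper, and the 100,000-character document is purely illustrative.

```python
import math


def estimated_map_calls(doc_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Number of fixed-window chunks (and thus map-phase LLM calls) for a document."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if doc_chars <= 0:
        return 0
    if doc_chars <= chunk_size:
        return 1  # the whole document fits in a single chunk
    step = chunk_size - chunk_overlap
    return math.ceil((doc_chars - chunk_overlap) / step)


if __name__ == "__main__":
    # Illustrative 100,000-character document, overlap fixed at 200 characters.
    for size in (500, 1000, 2000):
        print(size, estimated_map_calls(100_000, size, 200))
    # 500 333
    # 1000 125
    # 2000 56
```

Doubling `chunk_size` from 1000 to 2000 cuts the map calls by more than half, but each of those calls then carries a prompt roughly twice as long.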
The "reduce" part of Map-Reduce is where the partial summaries come together, but it's also where you can lose nuance if you are not careful. The `combine_prompt` is crucial: by default it simply asks the LLM for a concise summary of the intermediate summaries. If there are many intermediate summaries, the LLM may struggle to retain the specific details from each "map" output. When the summaries together exceed the reduce step's token budget, LangChain performs multiple "reduce" steps: it combines a batch of intermediate summaries, then combines those results, and so on. This hierarchical reduction is what allows the chain to handle a large number of initial chunks without overwhelming the LLM in the final stage.
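The hierarchical reduction can be sketched as a loop: while the joined summaries exceed a budget, pack them into batches that fit, summarize each batch, and repeat; then run one final combine. This is a simplified illustration, not LangChain's `ReduceDocumentsChain`: it measures length in characters rather than tokens, and `collapse_and_reduce`, `max_len`, and `fake_summarize` are hypothetical names.

```python
from typing import Callable


def joined_len(items: list[str]) -> int:
    # Length of the text the LLM would actually receive for these items.
    return len("\n".join(items))


def collapse_and_reduce(summaries: list[str], summarize: Callable[[str], str], max_len: int) -> str:
    """Collapse summaries in batches until they fit in max_len, then combine them once."""
    while joined_len(summaries) > max_len:
        # Greedily pack consecutive summaries into batches that stay within the budget.
        batches: list[list[str]] = []
        current: list[str] = []
        for s in summaries:
            if current and joined_len(current + [s]) > max_len:
                batches.append(current)
                current = []
            current.append(s)
        batches.append(current)
        collapsed = [summarize("\n".join(batch)) for batch in batches]
        # Guard against an endless loop if summarizing stops shrinking the text.
        if joined_len(collapsed) >= joined_len(summaries):
            raise ValueError("collapse step did not shrink the summaries")
        summaries = collapsed
    # Final reduce: everything now fits in a single combine call.
    return summarize("\n".join(summaries))


def fake_summarize(text: str) -> str:
    # Hypothetical stand-in for an LLM call: keep the first word of every non-empty line.
    return " ".join(line.split()[0] for line in text.splitlines() if line.split())


if __name__ == "__main__":
    parts = ["aaaa bbbb", "cccc dddd", "eeee ffff", "gggg hhhh"]
    print(collapse_and_reduce(parts, fake_summarize, max_len=20))   # aaaa eeee
    print(collapse_and_reduce(parts, fake_summarize, max_len=100))  # aaaa cccc eeee gggg
```

Note how the collapsed path drops `cccc` and `gggg`, which the single-pass path kept: every extra reduce round is another chance for detail to be squeezed out, which is why the `combine_prompt` deserves care.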
The next step is often exploring how to handle the output of the reduce phase, especially when dealing with extremely large numbers of documents where even the final combined summary might need further processing or analysis.