LangChain Output Parsers: Extract Structured Data from LLMs (2026)

Output parsers in LangChain turn the freeform, often messy text that a Large Language Model (LLM) produces into structured data your program can actually use.

Let’s see one in action. Imagine we want to extract a user’s name and age from a sentence.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Extract the name and age from the user's message."),
    ("user", "{user_input}")
])

model = ChatOpenAI(model="gpt-3.5-turbo")
output_parser = StrOutputParser()

chain = prompt | model | output_parser

response = chain.invoke({"user_input": "My name is John Doe and I am 30 years old."})
print(response)

This will likely output something like:

Name: John Doe
Age: 30

Now, that’s just raw text. What if we wanted a typed Python object or a dictionary instead? That’s where parsers such as StructuredOutputParser and PydanticOutputParser come in.

Here’s how you’d define a schema for that data using Pydantic:

from typing import Optional
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="The person's full name.")
    age: int = Field(description="The person's age.")
    city: Optional[str] = Field(None, description="The person's city of residence.")

Then, you’d use PydanticOutputParser to guide the LLM:

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import Optional

class Person(BaseModel):
    name: str = Field(description="The person's full name.")
    age: int = Field(description="The person's age.")
    city: Optional[str] = Field(None, description="The person's city of residence.")

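# Create the parser first so the prompt can reuse its format instructions.
parser = PydanticOutputParser(pydantic_object=Person)

# Keep {format_instructions} as a template variable and fill it with partial().
# Embedding the instructions with an f-string would break the template, because
# the JSON schema they contain uses curly braces.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the relevant information from the user's message.\n{format_instructions}"),
    ("user", "{user_input}")
]).partial(format_instructions=parser.get_format_instructions())

model = ChatOpenAI(model="gpt-3.5-turbo")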

chain = prompt | model | parser

response = chain.invoke({"user_input": "Alice Smith is 25 years old and lives in New York."})
print(response)

This would give you a Person object:

name='Alice Smith' age=25 city='New York'
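
If what you really want is the plain dictionary mentioned earlier, the Person object converts to one directly. Here is a minimal sketch, assuming Pydantic v2 (where the method is called model_dump). It builds the Person by hand instead of calling the chain, so it runs without an API key:

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="The person's full name.")
    age: int = Field(description="The person's age.")
    city: Optional[str] = Field(None, description="The person's city of residence.")


# Stand-in for the object that chain.invoke(...) would return.
person = Person(name="Alice Smith", age=25, city="New York")

# Fields are ordinary typed attributes.
next_birthday = person.age + 1

# model_dump() converts the model into a plain dictionary.
as_dict = person.model_dump()
print(as_dict)
```

Attribute access keeps the type checking; the dictionary form is handy when you need to serialize the result or pass it to code that doesn’t know about Person.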

The core problem LangChain output parsers solve is the inherent ambiguity and variability of LLM text generation. LLMs are trained to predict the next token, not to conform to rigid data structures. Without explicit guidance and parsing, extracting specific, usable data reliably is a significant challenge. Parsers provide this structure by:

Instructing the LLM: They generate specific instructions that are added to the prompt, telling the LLM how to format its output (e.g., "Output as JSON," "Use the following schema").
Validating and Cleaning: After the LLM generates text, the parser attempts to convert that text into the desired structured format. If the LLM deviates, the parser raises an error. Wrapper parsers such as OutputFixingParser and RetryOutputParser can then send the failed output back to an LLM and ask for a corrected version.
Enforcing Schemas: Parsers like PydanticOutputParser leverage schema definitions (like Pydantic models) to define the expected fields, their types, and descriptions. This allows for robust validation and ensures the output conforms to your application’s needs.
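
To make those three roles concrete, here is a deliberately simplified sketch that uses only the standard library. It is not how LangChain implements its parsers. SCHEMA, format_instructions, parse, extract, and fake_model are all hypothetical names invented for this illustration, and the "model" is a stand-in that returns canned replies:

```python
import json
from typing import Callable

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def format_instructions(schema: dict) -> str:
    """Role 1: build the text that gets appended to the prompt."""
    fields = ", ".join(f'"{key}" ({typ.__name__})' for key, typ in schema.items())
    return f"Reply with one JSON object containing the keys: {fields}."


def parse(text: str, schema: dict) -> dict:
    """Roles 2 and 3: convert the reply to a dict, then check it against the schema."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, typ in schema.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} is missing or is not of type {typ.__name__}")
    return data


def extract(ask_model: Callable[[str], str], user_input: str,
            schema: dict, max_attempts: int = 2) -> dict:
    """Prompt, parse, and re-prompt with the error message if parsing fails."""
    prompt = f"{user_input}\n{format_instructions(schema)}"
    for _ in range(max_attempts):
        reply = ask_model(prompt)
        try:
            return parse(reply, schema)
        except ValueError as err:
            prompt = f"{prompt}\nYour previous reply was rejected ({err}). Please try again."
    raise ValueError("the model never produced valid output")


# Stand-in for an LLM: the first reply is free text, the second is valid JSON.
canned_replies = iter(["Name: John Doe, Age: 30", '{"name": "John Doe", "age": 30}'])


def fake_model(prompt: str) -> str:
    return next(canned_replies)


result = extract(fake_model, "My name is John Doe and I am 30 years old.", SCHEMA)
print(result)  # {'name': 'John Doe', 'age': 30}
```

The first canned reply fails JSON decoding, so the loop re-prompts with the error attached and the second reply passes both the parsing and the type checks.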

Think of it as framing a painting. The LLM is a powerful but often undisciplined artist. The output parser is the frame and matting that turns the art into something you can hang on your wall and categorize. StrOutputParser is the simplest frame, just giving you the canvas. PydanticOutputParser is a custom-built, museum-quality frame that demands the art fit specific dimensions and styles.

The PydanticOutputParser works by first asking the LLM, through its format instructions, to output a JSON string that conforms to the Pydantic model’s schema. It then takes that JSON string, parses it into a Python dictionary, and finally instantiates the Pydantic model with that dictionary. If the LLM’s output isn’t valid JSON, or if the JSON doesn’t match the schema (e.g., a string where an integer is expected), the parser raises an OutputParserException, which can be caught and handled. Handling it often involves sending the error back to the LLM with instructions to fix its output.

One of the most powerful, yet often overlooked, aspects of PydanticOutputParser is that it inherits Pydantic’s handling of optional fields and default values. When you define a field as Optional[str] with a default of None, or give it a default value like city: str = "Unknown", validation still succeeds when the LLM doesn’t provide that information. No error is raised; the field is either left as None or assigned the default value, making your downstream code more resilient. Note that in Pydantic v2 an Optional[str] annotation without a default is still a required field, so the default has to be spelled out. This matters because LLMs don’t always "see" or extract every piece of information you might want, especially if it’s not explicitly stated or if the LLM prioritizes other information.
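
The behavior comes from Pydantic itself, so you can check it without a model or a parser. The sketch below assumes Pydantic v2 (it uses model_validate). The country field and the StrictPerson class are hypothetical additions for illustration:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    city: Optional[str] = None   # missing -> None
    country: str = "Unknown"     # missing -> default value (hypothetical field)


# Simulates an LLM reply that omitted both optional fields.
partial = Person.model_validate({"name": "Alice Smith", "age": 25})
print(partial.city, partial.country)


class StrictPerson(BaseModel):
    name: str
    city: Optional[str]  # no default: None is allowed, but the key must be present


try:
    StrictPerson.model_validate({"name": "Alice Smith"})
    missing_city_rejected = False
except ValidationError:
    missing_city_rejected = True

print(missing_city_rejected)
```

The second class shows the common pitfall: Optional only widens the accepted type, and it is the default value that makes a field safe to omit.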

The next step after reliably extracting structured data is often performing actions based on that data, which leads into LangChain Agents and Tools.