The most surprising thing about maintaining chat history with Gemini is that the model doesn’t "remember" anything between calls; instead, the conversation so far is resent to the model with each new turn, either by you or by the SDK on your behalf.

Let’s see this in action. Imagine a simple back-and-forth:

import google.generativeai as genai

# Configure your API key (replace with your actual key)
genai.configure(api_key="YOUR_API_KEY")

# Initialize the model
model = genai.GenerativeModel('gemini-1.5-flash')

# Start a chat session
chat = model.start_chat(history=[])

# First turn
response1 = chat.send_message("What's the capital of France?")
print(f"User: What's the capital of France?\nGemini: {response1.text}\n")

# Second turn - notice how we're not explicitly passing history here,
# but the 'chat' object manages it internally.
response2 = chat.send_message("And what is its main river?")
print(f"User: And what is its main river?\nGemini: {response2.text}\n")

# Third turn - the chat object automatically includes the previous turns.
response3 = chat.send_message("How many people live there?")
print(f"User: How many people live there?\nGemini: {response3.text}\n")

When you run this, you should see Gemini answer "Paris" to the first question, then "The Seine" to the second (understanding that "its" refers to Paris), and finally the population of Paris to the third. The magic isn’t the model’s inherent memory, but how the ChatSession object returned by start_chat constructs the input for each API call made through send_message.

Internally, the chat.send_message() method doesn’t just send your new prompt. It takes the existing history stored within the chat object, appends your new message, and sends the entire sequence of messages (user and model turns) to the Gemini API. The API then processes this full context to generate the next response. This is why you need to be mindful of the total token count as the conversation grows.
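To make the resend behavior concrete, here is a minimal sketch that needs neither the SDK nor an API key. ToyChat and fake_model are hypothetical stand-ins invented for this illustration, not part of google.generativeai; they only mimic the idea that every call carries the whole conversation.

```python
from typing import Callable


class ToyChat:
    """Hypothetical stand-in for a chat session; not the SDK's class."""

    def __init__(self, generate: Callable[[list], str]):
        self._generate = generate      # stateless "model": sees only what it is sent
        self.history: list = []        # alternating user/model messages
        self.payload_sizes: list = []  # number of messages sent on each call

    def send_message(self, text: str) -> str:
        user_msg = {"role": "user", "parts": [{"text": text}]}
        payload = self.history + [user_msg]   # full history plus the new prompt
        self.payload_sizes.append(len(payload))
        reply = self._generate(payload)
        model_msg = {"role": "model", "parts": [{"text": reply}]}
        self.history = payload + [model_msg]
        return reply


def fake_model(messages: list) -> str:
    # Pretend model: it just reports how much context it was handed.
    return f"I received {len(messages)} message(s)."


toy_chat = ToyChat(fake_model)
for prompt in [
    "What's the capital of France?",
    "And what is its main river?",
    "How many people live there?",
]:
    print(toy_chat.send_message(prompt))

print(toy_chat.payload_sizes)
```

The payload grows by two messages per turn (your prompt plus the previous reply), which is the same growth pattern that drives up token usage in a real session.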

The history attribute of the chat object is a list of protos.Content objects, where each entry has a role (either 'user' or 'model') and parts (which hold the actual content, such as text). When you call send_message, your new prompt is added as a 'user' entry, and the model’s response is then added as a 'model' entry. This updated history is what gets sent to the model on the next turn.
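The role/parts shape is easy to see when you walk a history and print it. The sketch below assumes dict-shaped entries (the same shape used for pre-populated histories) and hard-coded sample replies; render_history is a hypothetical helper, not an SDK function.

```python
def render_history(history: list) -> str:
    """Return one 'role: text' line per entry (dict-shaped entries assumed)."""
    lines = []
    for entry in history:
        # An entry can hold several parts; join the text of each one.
        text = "".join(part.get("text", "") for part in entry["parts"])
        lines.append(f"{entry['role']}: {text}")
    return "\n".join(lines)


# Sample data written by hand for illustration, not real model output.
sample_history = [
    {"role": "user", "parts": [{"text": "What's the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris."}]},
    {"role": "user", "parts": [{"text": "And what is its main river?"}]},
    {"role": "model", "parts": [{"text": "The Seine."}]},
]

transcript = render_history(sample_history)
print(transcript)
```

With a live session the entries in chat.history are Content objects rather than dicts, so the equivalent access would be entry.role and part.text.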

You have direct control over this history. You can inspect it, modify it, or even provide a pre-populated history when starting a chat. For instance, if you wanted to resume a conversation or inject specific context, you could do this:

# Example of starting a chat with existing history
initial_history = [
    {"role": "user", "parts": [{"text": "What is the main function of a CPU?"}]},
    {"role": "model", "parts": [{"text": "The main function of a Central Processing Unit (CPU) is to execute instructions from a computer program."}]}
]

# The history needs to be in the correct protobuf format for the API.
# The SDK often handles this conversion implicitly when you use `start_chat`.
# For explicit creation:
from google.generativeai.protos import Content, Part
history_proto = [
    # Build one Part per entry in each message's "parts" list.
    Content(role=h["role"], parts=[Part(text=p["text"]) for p in h["parts"]])
    for h in initial_history
]

chat_resumed = model.start_chat(history=history_proto)

response_resumed = chat_resumed.send_message("Can you elaborate on what 'executing instructions' means?")
print(f"User: Can you elaborate on what 'executing instructions' means?\nGemini: {response_resumed.text}\n")

This allows for sophisticated control, like creating AI agents that have a persona or memory of past interactions without needing to re-explain everything from scratch. The key levers you control are the content of the messages and the order in which they appear in the history. You can also prune the history to manage token limits.
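Pruning can be as simple as keeping the newest turns that fit a budget. The following is a rough sketch under stated assumptions: entries are dict-shaped, the history is made of complete user/model pairs, and tokens are estimated at about four characters each. prune_history, estimate_tokens, and make_entry are hypothetical helpers, and the four-characters heuristic is only an approximation.

```python
def estimate_tokens(entry: dict) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    chars = sum(len(part.get("text", "")) for part in entry["parts"])
    return max(1, chars // 4)


def prune_history(history: list, max_tokens: int) -> list:
    """Keep the most recent user/model pairs that fit within max_tokens.

    Returns a new list and leaves the input untouched. A trailing entry
    without a partner is ignored.
    """
    kept: list = []
    budget = max_tokens
    usable = len(history) - len(history) % 2
    # Walk backwards over complete (user, model) pairs, newest first.
    for i in range(usable - 2, -1, -2):
        pair = history[i:i + 2]
        cost = sum(estimate_tokens(entry) for entry in pair)
        if cost > budget:
            break
        kept = pair + kept
        budget -= cost
    return kept


def make_entry(role: str, n_chars: int) -> dict:
    # Synthetic filler text so the sizes are easy to reason about.
    return {"role": role, "parts": [{"text": "x" * n_chars}]}


history = [
    make_entry("user", 40), make_entry("model", 400),  # oldest pair, ~110 tokens
    make_entry("user", 40), make_entry("model", 200),  # middle pair, ~60 tokens
    make_entry("user", 40), make_entry("model", 80),   # newest pair, ~30 tokens
]

pruned = prune_history(history, max_tokens=100)
print(len(history), "->", len(pruned))
```

Dropping whole pairs keeps the user/model alternation intact. The pruned list could then seed a fresh session through start_chat(history=...), and for real numbers the SDK’s model.count_tokens is a better measure than a character heuristic.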

What most people don’t realize is that chat.history is a mutable list of Content objects that grows in place as the conversation proceeds, so any variable pointing at that same list sees every new turn. Whether the list you originally passed to start_chat is that same object depends on how the SDK version you use converts its input, so don’t rely on the original variable either staying pristine or staying in sync. If you need an untouched snapshot, pass in a copy; if you need the current state, read chat.history.
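Whichever way a given SDK version behaves, a defensive copy sidesteps the question. Here is a minimal sketch of the pattern, using a hypothetical ToySession that deliberately keeps a reference to the list it receives; it is a stand-in for illustration, not the SDK’s ChatSession.

```python
import copy


class ToySession:
    """Hypothetical session that keeps a reference to the list it is given."""

    def __init__(self, history: list):
        self.history = history  # no copy: caller and session share one list

    def send_message(self, text: str) -> None:
        # Append the user turn and a placeholder model turn in place.
        self.history.append({"role": "user", "parts": [{"text": text}]})
        self.history.append({"role": "model", "parts": [{"text": "(reply)"}]})


seed = [
    {"role": "user", "parts": [{"text": "What is the main function of a CPU?"}]},
    {"role": "model", "parts": [{"text": "It executes program instructions."}]},
]

# Shared list: the caller's variable grows along with the session.
aliased_seed = copy.deepcopy(seed)
ToySession(aliased_seed).send_message("Can you elaborate?")
print(len(aliased_seed))

# Defensive copy: the session gets its own list, so seed stays as written.
safe_session = ToySession(copy.deepcopy(seed))
safe_session.send_message("Can you elaborate?")
print(len(seed), len(safe_session.history))
```

copy.deepcopy is used rather than list(seed) because the entries are themselves mutable dicts; a shallow copy would still share them.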

The next challenge you’ll face is effectively managing the token budget of the conversation as the history grows, preventing 400 Bad Request errors due to exceeding model limits.
