Send Concurrent Requests to the Gemini API with Async Python (2026)

Calls to the Gemini API spend most of their time waiting on the network, which makes them a good fit for Python’s asyncio library: many requests can be in flight at once on a single thread.

Here’s how you can send 10 concurrent requests to the Gemini API using Python’s asyncio and the google-generativeai library.

import asyncio
import google.generativeai as genai
import os

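# Configure the API key; exit with a clear message if it is missing,
# since os.environ["GEMINI_API_KEY"] would raise a bare KeyError here
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("Error: GEMINI_API_KEY environment variable not set.")
genai.configure(api_key=api_key)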

async def generate_content_async(prompt: str):
    """Asynchronously generates content using the Gemini API."""
    model = genai.GenerativeModel('gemini-1.5-flash')
    try:
        response = await model.generate_content_async(prompt)
        return response.text
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

async def main():
    """Main function to orchestrate concurrent API calls."""
    prompts = [
        "Tell me a short, funny story about a robot learning to cook.",
        "Explain the concept of quantum entanglement in simple terms.",
        "Write a haiku about a rainy day.",
        "What are the benefits of a plant-based diet?",
        "Describe the feeling of nostalgia.",
        "Give me a recipe for chocolate chip cookies.",
        "Summarize the plot of 'Pride and Prejudice'.",
        "What is the capital of Australia?",
        "Write a Python function to reverse a string.",
        "Explain the difference between a black hole and a wormhole."
    ]

    # Build one coroutine object per prompt (nothing runs until they are awaited)
    tasks = [generate_content_async(prompt) for prompt in prompts]

    # Run tasks concurrently and gather results
    results = await asyncio.gather(*tasks)

    # Print the results
    for i, result in enumerate(results, start=1):
        header = f"--- Response {i} ---"
        print(header)
        print(result if result else "Failed to get a response.")
        print("-" * len(header))

if __name__ == "__main__":
    # Ensure you have your GEMINI_API_KEY set in your environment variables
    if "GEMINI_API_KEY" not in os.environ:
        print("Error: GEMINI_API_KEY environment variable not set.")
    else:
        asyncio.run(main())

To run this code:

Install the libraries (python-dotenv is only needed for the .env option below):

pip install google-generativeai python-dotenv

Get an API Key: Obtain your Gemini API key from Google AI Studio.
Set the API Key:
- Option 1 (Environment Variable):
```
export GEMINI_API_KEY='YOUR_API_KEY'
```
  (Replace YOUR_API_KEY with your actual key)
- Option 2 (.env file): Create a file named .env in the same directory as your Python script and add:
```
GEMINI_API_KEY=YOUR_API_KEY
```
  Then, at the beginning of your script, add:
```
from dotenv import load_dotenv
load_dotenv()
```

This script defines an async function, generate_content_async, that makes a single call to the Gemini API. The main function calls it once per prompt, which produces a list of coroutine objects without running any of them yet, and passes that list to asyncio.gather. asyncio.gather schedules the coroutines as tasks, runs them concurrently, waits for all of them to complete, and returns their results in the order the coroutines were passed in, regardless of which one finished first.
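The ordering guarantee of asyncio.gather can be checked without touching the API. The sketch below is only an illustration: fake_request is a hypothetical stand-in that sleeps instead of calling Gemini, and the delays are arbitrary values chosen so that the first coroutine finishes last.

```python
import asyncio

# Records the order in which the fake requests actually finish
completion_order: list[str] = []

async def fake_request(label: str, delay: float) -> str:
    """Hypothetical stand-in for an API call: sleeps, then returns a label."""
    await asyncio.sleep(delay)
    completion_order.append(label)
    return f"response to {label}"

async def gather_in_order() -> list[str]:
    completion_order.clear()
    # "A" is the slowest request, yet its result still comes back first
    return await asyncio.gather(
        fake_request("A", 0.3),
        fake_request("B", 0.1),
        fake_request("C", 0.2),
    )

if __name__ == "__main__":
    results = asyncio.run(gather_in_order())
    print(results)           # results follow input order: A, B, C
    print(completion_order)  # requests finished in the order B, C, A
```

Because the results line up with the inputs, a result can be paired with its prompt by position, for example with zip(prompts, results).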

asyncio is built on cooperative multitasking. When a coroutine reaches an await on an operation that cannot complete immediately (such as await model.generate_content_async(prompt), which waits on the network), it suspends and hands control back to the event loop. The event loop then runs whichever other task is ready. In this case, while one request is waiting for a response from the Gemini server, the remaining requests can be sent and their responses processed. The total run time therefore approaches that of the slowest single request rather than the sum of all ten, provided the API does not throttle the requests.
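The difference between sequential and concurrent awaiting can be measured with a small timing sketch. It makes no API calls: fake_request is a hypothetical placeholder that sleeps for a fixed 0.2 seconds, an arbitrary value standing in for network latency.

```python
import asyncio
import time

async def fake_request(delay: float = 0.2) -> None:
    """Hypothetical stand-in for one API call; only waits."""
    await asyncio.sleep(delay)

async def run_sequential(n: int) -> float:
    """Await n requests one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_request()
    return time.perf_counter() - start

async def run_concurrent(n: int) -> float:
    """Await n requests together with gather and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return time.perf_counter() - start

async def compare(n: int = 5) -> tuple[float, float]:
    sequential = await run_sequential(n)
    concurrent = await run_concurrent(n)
    return sequential, concurrent

if __name__ == "__main__":
    sequential, concurrent = asyncio.run(compare())
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

With five requests, the sequential version takes roughly five times the per-request delay, while the concurrent version takes roughly one delay. Real API calls vary in latency, so the actual gain depends on the slowest response.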

The string passed to genai.GenerativeModel selects the model. gemini-1.5-flash is a fast, efficient model suited to many common tasks; a different model such as 'gemini-1.5-pro' can be used by changing that string. Available model names change over time, so check the current model list before running the script.

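The try...except block keeps one failed request from discarding the results of the others. If a call raises, for example because of a network problem or an API error, the function prints the error and returns None for that prompt, and main reports it as a failed response. Without the block, asyncio.gather would propagate the first exception to main and the script would crash. Catching Exception is deliberately broad here; in production code, catch the specific exception types you expect.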
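An alternative to catching errors inside each coroutine is to let them propagate and pass return_exceptions=True to asyncio.gather, which puts each raised exception into the result list at the position of the call that failed. The sketch below assumes a hypothetical flaky_request that fails on an empty prompt; it is not part of the script above and does not call the Gemini API.

```python
import asyncio

async def flaky_request(prompt: str) -> str:
    """Hypothetical stand-in for an API call; raises on an empty prompt."""
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()

async def gather_with_errors(prompts: list[str]) -> list:
    # With return_exceptions=True, exceptions are returned as results
    # instead of being raised out of gather
    return await asyncio.gather(
        *(flaky_request(p) for p in prompts),
        return_exceptions=True,
    )

def split_results(results: list) -> tuple[list, list]:
    """Separate successful values from exceptions."""
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed

if __name__ == "__main__":
    results = asyncio.run(gather_with_errors(["first", "", "third"]))
    ok, failed = split_results(results)
    print(ok)      # successful responses
    print(failed)  # exception objects for the failed calls
```

This keeps the original exception objects available for inspection or retry, where returning None discards them.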

The asyncio.run(main()) call at the end is the entry point: it creates an event loop, runs the main coroutine to completion, and then closes the loop.

The next concept to explore is handling rate limits and implementing backoff strategies for more robust API interactions when dealing with a high volume of requests.
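As a starting point for that topic, here is a minimal sketch of two common techniques: capping the number of in-flight requests with asyncio.Semaphore, and retrying with exponential backoff plus jitter. Everything API-related in it is an assumption made for illustration: RateLimitError and make_flaky are hypothetical, and the limits and delays are arbitrary. A real script would catch whatever exception the client library raises when the API rate-limits a request.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Hypothetical error standing in for a rate-limit response."""

async def call_with_backoff(make_call, *, max_attempts: int = 4, base_delay: float = 0.05):
    """Await make_call(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps retries from all firing at the same moment
            await asyncio.sleep(delay + random.uniform(0, delay))

async def run_limited(calls, max_in_flight: int = 3) -> list:
    """Run all calls concurrently, with at most max_in_flight active at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(make_call):
        async with semaphore:
            return await call_with_backoff(make_call)

    return await asyncio.gather(*(guarded(c) for c in calls))

def make_flaky(value, failures: int = 1):
    """Hypothetical call factory: fails `failures` times, then returns value."""
    remaining = failures

    async def call():
        nonlocal remaining
        await asyncio.sleep(0.01)
        if remaining > 0:
            remaining -= 1
            raise RateLimitError("simulated rate limit")
        return value

    return call

if __name__ == "__main__":
    calls = [make_flaky(i) for i in range(5)]
    print(asyncio.run(run_limited(calls, max_in_flight=2)))
```

In this sketch a call keeps its semaphore slot while it backs off, so retries also reduce the overall request rate. Releasing the slot during the sleep is an equally reasonable choice when throughput matters more.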