Serverless functions often tear down their connections to PostgreSQL mid-transaction, leaving PgBouncer in a state of confusion.
Let’s see PgBouncer in action with serverless functions. Imagine a typical scenario:
A serverless function, triggered by an event, needs to read some data from our database.
# Example AWS Lambda function (Python)
import psycopg2
import os
def lambda_handler(event, context):
conn = None
try:
conn = psycopg2.connect(
host=os.environ['DB_HOST'],
database=os.environ['DB_NAME'],
user=os.environ['DB_USER'],
password=os.environ['DB_PASSWORD'],
port=os.environ['DB_PORT']
)
cursor = conn.cursor()
# Start a transaction
cursor.execute("BEGIN;")
cursor.execute("SELECT COUNT(*) FROM users;")
result = cursor.fetchone()
print(f"User count: {result[0]}")
# Simulate some work
import time
time.sleep(5) # This sleep is critical for the problem
# No COMMIT or ROLLBACK before the function exits
# The connection might be closed by the serverless platform here
return {
'statusCode': 200,
'body': 'Data fetched successfully!'
}
except Exception as e:
print(f"Error: {e}")
return {
'statusCode': 500,
'body': 'Error fetching data.'
}
finally:
if conn:
conn.close() # This might be called after the platform has already cleaned up
In this code, the time.sleep(5) simulates work. Crucially, the serverless platform (like AWS Lambda) might decide to terminate the execution environment and close the underlying network connection after this sleep, but before our application code explicitly calls conn.commit() or conn.rollback().
This is where PgBouncer, acting as a connection pooler, gets into trouble. It expects connections to be properly managed within transactions. When a serverless function abruptly disconnects mid-transaction, PgBouncer might not immediately realize the connection it lent out is now stale or has lost its transactional state.
The Problem: Transaction Mode Mismatch
PgBouncer has several pooling modes, but the most common ones for this issue are session and transaction.
- Session Pooling: A client connection is held by PgBouncer for the entire duration of the client’s connection to PgBouncer. When the client disconnects from PgBouncer, the connection is returned to the pool. This works fine for traditional applications where connections are long-lived.
- Transaction Pooling: A server connection is held by PgBouncer only for the duration of a single transaction. Once a transaction is committed or rolled back, the server connection is returned to the pool and can be reused for another client’s transaction. This is ideal for serverless functions because it minimizes idle connections and maximizes resource utilization.
The issue arises when serverless functions, designed for ephemeral execution, operate within a transaction pooling mode. If a function starts a transaction, does some work, and then the underlying serverless environment prematurely terminates the connection (often due to idle timeouts or scaling down), PgBouncer might still consider that connection "in-transaction" or in an unexpected state. When PgBouncer tries to reuse this connection for a new client, it might send the new client’s query to a connection that’s already in an uncommitted transaction, or worse, a connection that’s been implicitly rolled back by the database due to the abrupt disconnect.
The Fix: server_reset_query and pool_mode = session
The primary way to mitigate this is by configuring PgBouncer to be more resilient to these abrupt disconnections, specifically by using transaction pooling mode but with a crucial safeguard.
-
server_reset_query: This is the most important setting. It’s a query that PgBouncer executes on a server connection before it lends that connection to a client. Its purpose is to bring the connection back to a clean, known state. For serverless, a common choice isDISCARD ALL;.- Diagnosis:
Check your
pgbouncer.inifile for theserver_reset_querysetting. If it’s not set, or set to something other thanDISCARD ALL;, this is a likely culprit. - Fix:
In your
pgbouncer.inifile, under the[databases]section for your specific database, add or modify the line:[databases] mydb = host=your_db_host dbname=your_db_name user=your_pgbouncer_user password=your_pgbouncer_password [pgbouncer] # ... other settings server_reset_query = DISCARD ALL; - Why it works:
DISCARD ALL;resets the connection to its default state by clearing all temporary settings, prepared statements, and pending transaction states. This effectively cleans up any lingering effects from a prematurely terminated previous transaction before the new client query is sent.
- Diagnosis:
Check your
-
pool_mode = transaction(withserver_reset_query): Whilesessionpooling is safer if you have long-lived serverless functions or if you cannot guaranteeserver_reset_query’s effectiveness,transactionpooling is generally preferred for serverless for efficiency. The key is thattransactionpooling must be paired with a robustserver_reset_query.- Diagnosis:
Verify your
pgbouncer.inifile’s[databases]section haspool_mode = transaction. - Fix:
Ensure your
pgbouncer.inihas:[databases] mydb = host=your_db_host dbname=your_db_name user=your_pgbouncer_user password=your_pgbouncer_password pool_mode=transaction [pgbouncer] # ... other settings server_reset_query = DISCARD ALL; - Why it works:
transactionpooling reuses connections more aggressively.DISCARD ALL;ensures that this aggressive reuse doesn’t lead to corrupted states because each reused connection is thoroughly reset.
- Diagnosis:
Verify your
-
max_client_connanddefault_pool_size: Serverless functions can scale up very rapidly, creating many concurrent client connections to PgBouncer. If PgBouncer’s configured pool size is too small, and client connections outpace the available server connections, clients will queue up and eventually time out.- Diagnosis:
Monitor PgBouncer’s
show poolcommand (e.g.,psql -d pgbouncer -c "show pool;"). Look for high numbers in thecl_waitingcolumn. Also, check PgBouncer logs for messages like "server connection limit reached." - Fix:
Increase
default_pool_sizeinpgbouncer.inito a value that can handle your peak serverless concurrency. For example, if you expect 500 concurrent serverless invocations at peak, you might setdefault_pool_size = 500.[pgbouncer] # ... other settings default_pool_size = 500 max_client_conn = 1000 # Ensure this is also sufficiently high - Why it works: A larger pool means PgBouncer has more server connections ready to be handed out, reducing the chance that clients have to wait or that connections are dropped due to unavailability.
- Diagnosis:
Monitor PgBouncer’s
-
pool_timeout: This setting dictates how long a client connection will wait for a server connection to become available. If it’s too low, serverless functions might time out before PgBouncer can even establish a connection.- Diagnosis: Check PgBouncer logs for client connection timeouts. If your serverless function execution timeout is longer than this pool timeout, it’s a mismatch.
- Fix:
Increase
pool_timeoutinpgbouncer.ini. A value of60(seconds) is often a good starting point, but it depends on your function’s expected latency.[pgbouncer] # ... other settings pool_timeout = 60 - Why it works: Gives serverless functions more time to acquire a connection from the pool, especially during bursts of activity.
-
Database User Permissions: The user PgBouncer uses to connect to PostgreSQL needs appropriate permissions. While less common for transaction state issues, incorrect permissions can cause connection failures that might be misinterpreted.
- Diagnosis: Check PgBouncer logs for authentication errors or permission denied errors related to the database user.
- Fix:
Ensure the PostgreSQL user configured in
pgbouncer.inihasCONNECTprivileges on the target database andUSAGEon the schema, plus any necessarySELECT,INSERT,UPDATE,DELETEon the tables your functions access. - Why it works: Guarantees that PgBouncer can establish a valid connection and perform basic operations, preventing spurious errors.
-
Network Stability and Firewalls: Intermittent network issues or overly aggressive firewall rules can terminate connections unexpectedly, mimicking the serverless platform’s behavior.
- Diagnosis: Monitor network traffic between PgBouncer and PostgreSQL. Check firewall logs for dropped connections or connection reset packets.
- Fix: Ensure stable network routes. Configure firewalls to allow persistent connections between PgBouncer and PostgreSQL, potentially increasing TCP keep-alive settings on the network path if necessary.
- Why it works: Prevents external factors from prematurely breaking connections that PgBouncer is managing.
The next error you might encounter after fixing these issues is related to PgBouncer’s own connection limits or database-level connection limits being reached, forcing you to tune max_client_conn, default_pool_size, or the PostgreSQL max_connections parameter.