The Neon compute that processes your SQL queries was not being kept warm, so the first query after a period of inactivity had to wait for the compute to start.
Here are the common reasons for this and how to fix them:
1. Compute Auto-Suspend Timeout Too Short
Diagnosis: Check your Neon project settings for the "Auto-suspend" timeout. This is usually found in the project’s general settings or compute configuration.
Fix: Increase the "Auto-suspend" timeout to a value that better matches your expected query inactivity period. For example, if you typically have a query every 15 minutes, set it to 30 minutes or 1 hour.
# Example using the Neon CLI (replace <your-project-id>; the timeout is in minutes).
# Confirm the exact subcommand and flag names with `neonctl --help` before running.
neonctl update project <your-project-id> --autosuspend-timeout 60
Why it works: This directly tells the Neon control plane to wait longer before suspending the compute instance, ensuring it remains active and ready for incoming queries.
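The rule of thumb above (a query every 15 minutes, so a timeout of 30 minutes or more) can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the function names and the 2x safety factor are illustrative, and the URL path and the `suspend_timeout_seconds` field are assumptions about the Neon API v2 that you should check against the current API reference. The request is only built, never sent.

```python
import json
import urllib.request


def recommended_timeout_seconds(query_interval_minutes: float,
                                safety_factor: float = 2.0) -> int:
    """Return a timeout comfortably longer than the usual gap between queries."""
    if query_interval_minutes <= 0:
        raise ValueError("query_interval_minutes must be positive")
    return int(query_interval_minutes * safety_factor * 60)


def build_update_request(project_id: str, endpoint_id: str,
                         timeout_seconds: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets the auto-suspend timeout.

    ASSUMPTION: the path and field name below follow the Neon API v2;
    verify both before sending this request.
    """
    url = (
        "https://console.neon.tech/api/v2/"
        f"projects/{project_id}/endpoints/{endpoint_id}"
    )
    body = json.dumps(
        {"endpoint": {"suspend_timeout_seconds": timeout_seconds}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

With a query every 15 minutes, `recommended_timeout_seconds(15)` gives 1800 seconds (30 minutes), the lower end of the range suggested above. Passing the built request to `urllib.request.urlopen` would send it.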
2. No Periodic "Keep-Alive" Queries
Diagnosis: You haven’t implemented any mechanism to periodically ping your Neon database to prevent it from suspending.
Fix: Implement a scheduled job (e.g., using cron on a server, a cloud scheduler, or a dedicated monitoring tool) that runs a simple, low-resource query against your Neon database at regular intervals. A SELECT 1; is perfectly sufficient. Set the schedule to run more frequently than your auto-suspend timeout.
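Example cron entry that runs every 10 minutes. Cron cannot answer a password prompt, so psql must find the password in ~/.pgpass or the PGPASSWORD environment variable:
*/10 * * * * psql "postgresql://<your-user>@<your-neon-host>:5432/<your-db>?sslmode=require" -c "SELECT 1;" > /dev/null 2>&1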
Why it works: Any query execution, even a trivial one, resets the inactivity timer for the compute instance, preventing it from entering the suspended state.
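If you prefer a script to a one-line cron entry, the same keep-alive can be sketched in Python. This is a hedged sketch that assumes psql is installed and can authenticate non-interactively; `interval_is_safe` and `keep_alive_once` are illustrative names, and the `run` parameter exists only so the function can be exercised without a database.

```python
import subprocess


def interval_is_safe(ping_interval_minutes: float,
                     autosuspend_minutes: float) -> bool:
    """The ping must fire more often than the auto-suspend timeout."""
    return 0 < ping_interval_minutes < autosuspend_minutes


def keep_alive_once(dsn: str, run=subprocess.run) -> bool:
    """Run SELECT 1 through psql; return True when psql exits with status 0."""
    result = run(
        ["psql", dsn, "-c", "SELECT 1;"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=30,   # give up instead of hanging the scheduler
        check=False,  # report failure through the return value
    )
    return result.returncode == 0
```

A scheduler would call `keep_alive_once(dsn)` with the connection string read from its own configuration and alert when it returns False. `interval_is_safe(10, 60)` confirms that a 10-minute schedule stays under a 60-minute timeout.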
3. Application-Level Connection Pooling Issues
Diagnosis: Your application’s connection pool is configured to aggressively close idle connections, and Neon’s compute is suspending because it sees no active connections.
Fix: Adjust your application’s connection pool settings to have a longer idle_timeout or max_lifetime. The exact parameter names vary by pooling library (e.g., HikariCP, pgxpool). Ensure these timeouts are longer than your Neon compute’s auto-suspend timeout.
Example HikariCP configuration (in application.properties or similar):
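# Set both to values longer than Neon's auto-suspend timeout (values are in milliseconds).
# idle-timeout: 10 minutes, max-lifetime: 30 minutes. In a .properties file a comment must sit on its own line.
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000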
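Why it works: Longer-lived pooled connections mean the application reconnects less often, so fewer requests pay the connection setup cost. Whether an open but idle connection counts as activity depends on how Neon measures inactivity, so if the compute still suspends, combine this with the keep-alive query from section 2.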
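The comparison between pool timeouts and the auto-suspend timeout can be checked mechanically. The sketch below parses simple `key=value` lines and flags values that are not longer than the auto-suspend timeout. It also flags values that are not plain numbers, which happens when a trailing `# comment` shares a line with the value, because a .properties value runs to the end of the line. The function names are illustrative and the parser deliberately ignores escapes and line continuations.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; lines starting with # or ! are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def pool_timeout_problems(props: dict, autosuspend_ms: int) -> list:
    """Return human-readable problems with the HikariCP timeout settings."""
    keys = (
        "spring.datasource.hikari.idle-timeout",
        "spring.datasource.hikari.max-lifetime",
    )
    problems = []
    for key in keys:
        value = props.get(key)
        if value is None:
            continue  # unset: the pool falls back to its default
        if not value.isdigit():
            problems.append(f"{key}: not a plain number of milliseconds: {value!r}")
        elif int(value) <= autosuspend_ms:
            problems.append(f"{key}: {value} ms is not longer than the auto-suspend timeout")
    return problems
```

Running `pool_timeout_problems(parse_properties(text), 300000)` against your configuration returns an empty list when both settings exceed a 5-minute auto-suspend timeout.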
4. Incorrect Compute Branch Configuration
Diagnosis: Computes in Neon belong to a branch. You may have tuned the compute of one branch while your application traffic actually goes to the compute of a different, less frequently used branch, and that one is suspending.
Fix: Ensure your application is connecting to the compute endpoint associated with the correct, actively used branch. If you have multiple branches, verify that the branch intended for your primary application is configured with an appropriate auto-suspend timeout and potentially a keep-alive mechanism if needed.
Why it works: This is about directing traffic to the right place. If the wrong compute is being kept warm, your application will still experience cold starts.
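One quick way to see where traffic is going is to read the endpoint ID out of the connection string your application uses and compare it with the endpoint shown for the intended branch. The sketch below assumes the hostname starts with the endpoint ID (for example `ep-example-123456`) and that pooled hostnames append `-pooler` to it. That naming scheme is an assumption to confirm in your Neon console, and the function names are illustrative.

```python
from urllib.parse import urlparse


def endpoint_id_from_dsn(dsn: str) -> str:
    """Return the first hostname label, without a trailing '-pooler'."""
    host = urlparse(dsn).hostname or ""
    first_label = host.split(".", 1)[0]
    suffix = "-pooler"
    if first_label.endswith(suffix):
        first_label = first_label[: -len(suffix)]
    return first_label


def connects_to_expected_endpoint(dsn: str, expected_endpoint_id: str) -> bool:
    """True when the connection string points at the expected compute."""
    return endpoint_id_from_dsn(dsn) == expected_endpoint_id
```

Calling `connects_to_expected_endpoint(app_dsn, "<endpoint-id-of-your-main-branch>")` at startup, or in a deployment check, catches a connection string that still points at an old or secondary branch.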
5. Network Intermediaries Dropping Idle TCP Connections
Diagnosis: Firewalls, load balancers, or other network infrastructure between your application and Neon might be configured to drop idle TCP connections after a certain period. This can lead to your application thinking a connection is still open when Neon has lost it, and then a new connection attempt hits a suspended compute.
Fix: Raise the idle-connection timeout on any intermediate network devices so that it exceeds your application's pool timeouts, or enable TCP keep-alives on the client so that the connection never looks idle to those devices. Alternatively, ensure your application's connection pool validates connections before use so it can detect broken connections and re-establish them gracefully.
Why it works: This ensures that the TCP connection itself remains alive through the network path, allowing the application to maintain a persistent logical connection to the Neon compute.
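For clients built on libpq (psql and drivers that wrap it), TCP keep-alives can be requested through the connection string itself using the `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` parameters. The helper below is a minimal sketch that adds them to an existing URL; the function name and the default values are illustrative, and other drivers (JDBC, for example) use their own option names.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_tcp_keepalives(dsn: str, idle_seconds: int = 60,
                        interval_seconds: int = 10, count: int = 3) -> str:
    """Return the DSN with libpq TCP keep-alive parameters added or replaced."""
    parts = urlparse(dsn)
    params = dict(parse_qsl(parts.query))  # keep existing options such as sslmode
    params.update(
        {
            "keepalives": "1",
            "keepalives_idle": str(idle_seconds),        # idle time before the first probe
            "keepalives_interval": str(interval_seconds),  # gap between probes
            "keepalives_count": str(count),              # failed probes before giving up
        }
    )
    return urlunparse(parts._replace(query=urlencode(params)))
```

Choose `idle_seconds` shorter than the shortest idle timeout on the network path, so that a probe is sent before any device decides the connection is dead.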
6. Neon Compute Instance Size Too Small (Less Common for Cold Start)
Diagnosis: This is not a cold start problem as such. If your compute is consistently under heavy load and slow to respond even when warm, the delay can be mistaken for a cold start.
Fix: Scale up your Neon compute instance size. This allocates more CPU and memory, allowing it to process queries faster once it’s active.
Example using Neon CLI:
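# Example: scaling to a larger compute size. "medium" is a placeholder; Neon sizes computes in compute units (CU).
# Confirm the exact subcommand and flag names with `neonctl --help` before running.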
neonctl update compute <your-compute-id> --size medium
Why it works: A more powerful compute instance can handle incoming queries with lower latency once it has been activated.
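To tell a cold start apart from a compute that is simply slow, time several identical queries in a row after a period of inactivity: a cold start shows one slow first call followed by fast ones, while an undersized compute stays slow throughout. The sketch below assumes you supply a `run_query` callable that executes something like `SELECT 1`; the function names and both thresholds are arbitrary illustrative choices, not Neon recommendations.

```python
import statistics
import time


def time_calls(run_query, attempts: int = 5, clock=time.perf_counter) -> list:
    """Call run_query repeatedly and return each call's latency in seconds."""
    latencies = []
    for _ in range(attempts):
        start = clock()
        run_query()
        latencies.append(clock() - start)
    return latencies


def classify(latencies: list, cold_ratio: float = 5.0,
             slow_seconds: float = 0.5) -> str:
    """Label the pattern: 'cold start', 'slow when warm', or 'healthy'."""
    if len(latencies) < 2:
        raise ValueError("need at least two measurements")
    first, rest = latencies[0], latencies[1:]
    warm = statistics.median(rest)
    if warm >= slow_seconds:
        return "slow when warm"   # later calls are slow too: look at compute size
    if first >= cold_ratio * warm:
        return "cold start"       # only the first call paid the startup cost
    return "healthy"
```

For example, latencies of `[2.0, 0.05, 0.04, 0.06]` seconds classify as "cold start", while `[1.2, 1.1, 1.3]` classify as "slow when warm".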
The next error you might encounter is "connection refused", which can appear if your keep-alive queries are also being blocked or if the compute instance fails to start after suspension.
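A short retry with exponential backoff lets the application ride out a refused connection while the compute is starting. This is a minimal sketch assuming you pass in your own `connect` callable; the function name, the defaults, and the choice to retry on `OSError` (which covers `ConnectionRefusedError` and timeouts) are illustrative. Database drivers often raise their own exception types, so adjust `retry_on` to match yours.

```python
import time


def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits are 0.5, 1, 2 and 4 seconds, so the call gives up after roughly 7.5 seconds of waiting. If the error persists beyond that, the cause is more likely a blocked network path or a compute that cannot start than a normal cold start.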