InfluxDB is, at its core, optimized for ingesting time-series data, and how you send that data has a massive impact on how much of it you can get in. The key to maximizing InfluxDB write throughput often lies less in tuning InfluxDB itself than in how you package the data before it even hits the wire.
Let’s see this in action. Imagine we’re sending metrics from a fleet of IoT devices. Without batching, each individual metric might look like this:
curl -XPOST 'http://localhost:8086/write?db=iot_data' \
--data-binary 'cpu,host=server01,region=us-west value=0.98'
This works, but it’s incredibly inefficient. Every single write pays for a new HTTP connection, a full set of request headers, and a trip through InfluxDB’s write path, all for just one point.
Now, let’s batch these writes. Instead of sending one point at a time, we can send multiple points in a single HTTP request. Here’s how that looks using the InfluxDB Line Protocol within a single curl command:
curl -XPOST 'http://localhost:8086/write?db=iot_data' \
--data-binary 'cpu,host=server01,region=us-west value=0.98
memory,host=server01,region=us-west value=0.55
disk,host=server01,region=us-west value=0.82'
Notice how each point is on its own line. This single request is far more efficient: we’ve amortized the connection and processing overhead across multiple data points.
The fundamental problem batching solves is the overhead per write. Each write request to InfluxDB involves network latency, TCP handshake, TLS negotiation (if enabled), HTTP request parsing, and InfluxDB’s internal write path. When you send millions of individual points, this per-write overhead quickly becomes the bottleneck, not InfluxDB’s ability to store the data itself. Batching collapses many individual write operations into a single, larger operation, drastically reducing the total overhead.
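To make the amortization argument concrete, here is a toy cost model, assuming each request pays a fixed overhead plus a small per-point cost. The function name and the numbers in the usage are hypothetical, chosen only to show the shape of the curve; they are not measured InfluxDB figures.

```python
def points_per_second(batch_size: int, request_overhead_ms: float,
                      per_point_ms: float) -> float:
    """Throughput under a toy model: each request costs a fixed overhead
    plus a per-point cost. Both costs are hypothetical inputs, not measurements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    request_ms = request_overhead_ms + batch_size * per_point_ms
    # points per millisecond, converted to points per second
    return batch_size / request_ms * 1000.0


if __name__ == "__main__":
    # Hypothetical costs: 2 ms of per-request overhead, 0.01 ms per point.
    for n in (1, 10, 100, 1000, 5000):
        print(f"batch_size={n:>5}: {points_per_second(n, 2.0, 0.01):>10.0f} points/s")
```

Under this model throughput climbs steeply at first and then flattens toward `1000 / per_point_ms`, which is why batching helps enormously up to a point and then stops paying off.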
The InfluxDB Line Protocol is designed for this. It’s a simple, text-based format where each line represents a single data point. Multiple lines, separated by newline characters (\n), can be sent in a single HTTP POST request body.
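As a minimal sketch of what a client does under the hood, the helpers below encode points as line protocol and join them into one request body. The names `to_line` and `to_body` are hypothetical, not part of any official client. The sketch handles float fields only (integers, strings, and booleans use different encodings), escapes commas, equals signs, and spaces with a backslash, and sorts tags by key, which InfluxDB’s documentation recommends for write performance.

```python
from typing import Dict, Iterable, Optional


def _escape(s: str, chars: str) -> str:
    # Prefix each special character with a backslash, as line protocol requires.
    for ch in chars:
        s = s.replace(ch, "\\" + ch)
    return s


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, float],
            timestamp_ns: Optional[int] = None) -> str:
    """Encode one point as a single line protocol line (float fields only)."""
    m = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ',= ')}={float(v)!r}" for k, v in fields.items()
    )
    line = f"{m}{tag_part} {field_part}"
    if timestamp_ns is not None:
        # Without a timestamp, InfluxDB assigns the server's current time.
        line += f" {timestamp_ns}"
    return line


def to_body(lines: Iterable[str]) -> str:
    # Multiple points in one request body are separated by newline characters.
    return "\n".join(lines)


if __name__ == "__main__":
    tags = {"host": "server01", "region": "us-west"}
    print(to_body([
        to_line("cpu", tags, {"value": 0.98}),
        to_line("memory", tags, {"value": 0.55}),
        to_line("disk", tags, {"value": 0.82}),
    ]))
```

Run as a script, this prints the same three-line body used in the batched curl request.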
The primary lever you control is the size of your batches. InfluxDB client libraries such as the official Go client, as well as agents like Telegraf, can handle batching automatically. You typically configure parameters like:
- Batch Size: The number of data points to accumulate before sending a single write request.
- Flush Interval: The maximum time to wait before sending a batch, even if it hasn’t reached the configured batch size.
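The interaction between these two parameters can be sketched as a small buffer that flushes on whichever limit is hit first. The `LineBatcher` class, its `send` callback, and the injectable clock are hypothetical. Real clients and Telegraf implement this internally (typically with a background timer), so this only illustrates the logic.

```python
import time
from typing import Callable, List


class LineBatcher:
    """Buffer line protocol lines; flush when batch_size lines are buffered
    or flush_interval seconds have passed since the first buffered line."""

    def __init__(self, send: Callable[[str], None], batch_size: int = 5000,
                 flush_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send                  # e.g. a function that POSTs to /write
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.clock = clock
        self._lines: List[str] = []
        self._first_added = 0.0

    def add(self, line: str) -> None:
        if not self._lines:
            self._first_added = self.clock()
        self._lines.append(line)
        if len(self._lines) >= self.batch_size:
            self.flush()                  # size limit reached
        else:
            self.maybe_flush()

    def maybe_flush(self) -> None:
        # Call this periodically (e.g. from a timer) so a slow trickle of
        # points is still sent within flush_interval.
        if self._lines and self.clock() - self._first_added >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        # Join the buffered lines into one request body and hand it off.
        if self._lines:
            body = "\n".join(self._lines)
            self._lines = []
            self.send(body)
```

Calling `flush()` on shutdown matters too; otherwise the last partial batch is lost.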
For example, in Telegraf, you might configure your output plugin like this:
[[outputs.influxdb]]
urls = ["http://localhost:8086"]
database = "iot_data"
# Number of metrics to buffer before writing
metric_batch_size = 5000
# Maximum time to wait before flushing metrics
flush_interval = "10s"
These settings are crucial. Too small a batch and you’re back to high per-request overhead. Too large a batch, and you increase memory usage on the client, risk hitting InfluxDB’s request size limits, and delay individual points until their batch is finally sent. Finding the sweet spot usually takes experimentation. A common starting point is a batch size of 5,000 to 10,000 metrics and a flush interval of around 5 to 10 seconds.
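When a backlog has built up (for example after a network outage), it helps to split it so that no single request exceeds either a point count or a byte budget. This is a minimal sketch; `chunk_lines`, `max_points`, and `max_bytes` are hypothetical names, and the default of 10 MB is an arbitrary placeholder, since the actual request size limit depends on your InfluxDB server configuration.

```python
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], max_points: int = 5000,
                max_bytes: int = 10_000_000) -> Iterator[str]:
    """Yield newline-joined request bodies holding at most max_points lines
    and at most max_bytes of UTF-8 payload (a single oversized line is
    still emitted on its own)."""
    batch: List[str] = []
    size = 0
    for line in lines:
        # +1 for the joining newline; this overcounts the last line by one
        # byte, which errs on the safe side.
        line_bytes = len(line.encode("utf-8")) + 1
        if batch and (len(batch) >= max_points or size + line_bytes > max_bytes):
            yield "\n".join(batch)
            batch, size = [], 0
        batch.append(line)
        size += line_bytes
    if batch:
        yield "\n".join(batch)
```

Each yielded body can then be sent as its own write request.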
When InfluxDB receives a batched write, it processes the entire payload. It parses every line protocol entry, appends the points to its write-ahead log (WAL), and adds them to an in-memory cache. The cache is periodically snapshotted to disk as TSM (Time-Structured Merge Tree) files, which are memory-mapped for efficient querying. By sending larger batches, you let InfluxDB amortize WAL appends and disk syncs across many more points, which reduces disk seeks and write amplification.
What most people don’t realize is that the Content-Encoding header can also play a role, especially for very large batches. If you’re sending many thousands of points, the raw payload can be substantial. Using gzip compression can significantly reduce the amount of data transferred over the network, which can be a bottleneck in itself. Most InfluxDB client libraries support transparently gzipping payloads. You’d typically see this in the HTTP request headers: Content-Encoding: gzip. This reduces network bandwidth usage and can indirectly improve throughput by speeding up the transfer.
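Here is a minimal sketch of preparing a gzip-compressed write request with only the Python standard library. The helper name `build_gzip_write_request` is hypothetical, and the URL reuses the `iot_data` example from above. The function builds the request but does not send it; passing the result to `urllib.request.urlopen` would, against a running InfluxDB 1.x server. Whether compression pays off for your payloads is something to measure.

```python
import gzip
import urllib.request


def build_gzip_write_request(
        body: str,
        url: str = "http://localhost:8086/write?db=iot_data",
) -> urllib.request.Request:
    """Compress a line protocol body and wrap it in a POST request that
    declares the encoding via the Content-Encoding header."""
    payload = gzip.compress(body.encode("utf-8"))
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Encoding": "gzip",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )
```

Highly repetitive line protocol (the same measurement, tag keys, and tag values on every line) tends to compress well, which is exactly the situation large batches produce.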
The next problem you’ll likely encounter is managing cardinality, which is the number of unique series in your database.