InfluxDB has no user-managed indexes of the kind relational databases offer: there is no CREATE INDEX statement. Instead, it gets its performance from how data is organized on disk and from the predicates you use in queries.

Let’s see how this plays out in practice. Imagine you have a time-series dataset storing CPU metrics reported by a fleet of servers. A single point looks like this:

{
  "measurement": "cpu_usage",
  "tags": {
    "host": "server01",
    "region": "us-east-1"
  },
  "fields": {
    "usage_user": 25.5,
    "usage_system": 10.2
  },
  "timestamp": "2023-10-27T10:00:00Z"
}
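As a rough sketch, the point above could be serialized to InfluxDB line protocol, the text format used for writes, along these lines. The helper names (to_line_protocol, _escape) and the dict layout are illustrative assumptions, not part of any client library, and the sketch only handles float fields and second-precision timestamps.

```python
from datetime import datetime, timezone

def _escape(text, chars):
    # Backslash-escape the characters that line protocol treats as separators.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def to_line_protocol(point):
    # Hypothetical helper: builds "measurement,tags fields timestamp_ns".
    measurement = _escape(point["measurement"], ", ")
    # Tags are sorted by key; tag keys and values escape comma, equals, space.
    tags = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}"
        for k, v in sorted(point["tags"].items())
    )
    # Only float fields are handled in this sketch.
    fields = ",".join(
        f"{_escape(k, ',= ')}={float(v)}" for k, v in point["fields"].items()
    )
    ts = datetime.strptime(point["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    ns = int(ts.replace(tzinfo=timezone.utc).timestamp()) * 1_000_000_000
    return f"{measurement}{tags} {fields} {ns}"

point = {
    "measurement": "cpu_usage",
    "tags": {"host": "server01", "region": "us-east-1"},
    "fields": {"usage_user": 25.5, "usage_system": 10.2},
    "timestamp": "2023-10-27T10:00:00Z",
}
line = to_line_protocol(point)
print(line)
```

The tags end up as part of the series key (the text before the first space), while the fields are the values stored for that series at the given timestamp.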

You want to query the CPU usage for server01 in us-east-1 over a specific time range. Without optimizing, a query might look like this:

SELECT "usage_user", "usage_system"
FROM "cpu_usage"
WHERE time >= '2023-10-27T09:00:00Z' AND time < '2023-10-27T11:00:00Z'
  AND "host" = 'server01' AND "region" = 'us-east-1'

InfluxDB stores data in shards, which are time-based partitions. The query engine first filters by time, then by tags. The efficiency here comes from how InfluxDB organizes data within these shards.
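To make the time-based pruning concrete, here is a toy model of it. It assumes fixed one-day shard windows aligned to the Unix epoch; the real duration depends on the retention policy, and the function names are hypothetical rather than anything InfluxDB exposes.

```python
from datetime import datetime, timedelta, timezone

# Assumed shard window length; the real value depends on the retention policy.
SHARD_DURATION = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def shard_start(ts):
    # Truncate a timestamp to the start of the window that contains it.
    return EPOCH + ((ts - EPOCH) // SHARD_DURATION) * SHARD_DURATION

def shards_for_range(start, end):
    # Return the start times of all windows overlapping [start, end).
    windows = []
    current = shard_start(start)
    while current < end:
        windows.append(current)
        current += SHARD_DURATION
    return windows

utc = timezone.utc
windows = shards_for_range(
    datetime(2023, 10, 27, 9, 0, tzinfo=utc),
    datetime(2023, 10, 27, 11, 0, tzinfo=utc),
)
print(len(windows))
```

Under these assumptions the two-hour query touches a single window, no matter how many months of data exist outside it. A query with no time bounds would have to visit every window.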

The key to "indexing" in InfluxDB lies in how you structure your data and the predicates you use in your queries. Tags are the primary way to filter data efficiently. Tag keys and values are indexed, so a predicate on a tag resolves quickly to the matching series. Fields are not indexed: a predicate on a field value forces InfluxDB to scan the points in the candidate series and compare values one by one.

Consider the WHERE clause. The time filter is always applied first because shards are organized by time. This drastically reduces the number of shards that need to be scanned. After time, InfluxDB uses its tag index to quickly locate the data points matching your specific tag values.
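Conceptually, the tag lookup behaves like an inverted index that maps each tag key-value pair to the series containing it. The class below is a toy model of that idea, not InfluxDB’s actual implementation; TagIndex and its methods are hypothetical names.

```python
from collections import defaultdict

class TagIndex:
    """Toy inverted index: (measurement, tag key, tag value) -> series keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_series(self, measurement, tags):
        # The series key is the measurement plus its sorted tag set.
        series_key = measurement + "".join(
            f",{k}={v}" for k, v in sorted(tags.items())
        )
        for k, v in tags.items():
            self._postings[(measurement, k, v)].add(series_key)
        return series_key

    def lookup(self, measurement, **predicates):
        # Intersect the posting sets of every tag predicate (logical AND).
        sets = [
            self._postings.get((measurement, k, v), set())
            for k, v in predicates.items()
        ]
        return set.intersection(*sets) if sets else set()

index = TagIndex()
index.add_series("cpu_usage", {"host": "server01", "region": "us-east-1"})
index.add_series("cpu_usage", {"host": "server02", "region": "us-east-1"})
index.add_series("cpu_usage", {"host": "server03", "region": "eu-west-1"})

matches = index.lookup("cpu_usage", host="server01", region="us-east-1")
print(matches)
```

The lookup cost depends on the size of the posting sets rather than on the number of stored points, which is why tag predicates are cheap compared with scanning field values.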

If you have high-cardinality tags (tags with many unique values, like user_id or request_id), InfluxDB can still handle them, but every new combination of tag values creates a new series, so the index grows and memory use and query latency tend to degrade compared to low-cardinality tags (like region or environment). In such cases, it’s often better to consider whether those high-cardinality identifiers should be fields instead of tags, or whether you can pre-aggregate data.
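A small sketch of why this matters: series cardinality is the number of distinct measurement and tag-set combinations, so moving an identifier from a tag to a field collapses the series count. The requests measurement and the series_cardinality helper are made up for illustration.

```python
def series_cardinality(points):
    # Count distinct (measurement, tag set) combinations; fields are ignored.
    return len(
        {(p["measurement"], tuple(sorted(p["tags"].items()))) for p in points}
    )

hosts = ["server01", "server02", "server03"]

# Schema A: request_id stored as a tag.
as_tag = [
    {"measurement": "requests", "tags": {"host": h, "request_id": str(i)}, "fields": {"latency": 1.0}}
    for h in hosts
    for i in range(1000)
]

# Schema B: request_id stored as a field.
as_field = [
    {"measurement": "requests", "tags": {"host": h}, "fields": {"request_id": str(i), "latency": 1.0}}
    for h in hosts
    for i in range(1000)
]

tag_count = series_cardinality(as_tag)
field_count = series_cardinality(as_field)
print(tag_count, field_count)
```

Both schemas hold the same 3,000 points, but the first creates one series per request while the second keeps one series per host.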

The most surprising truth is that InfluxDB doesn’t expose a separate, explicit index for tags that you create or manage yourself. Instead, in the TSM-based storage engine (InfluxDB 1.x and 2.x), a series index is maintained automatically alongside the TSM (Time-Structured Merge tree) files that hold the field values. Depending on version and configuration, that index lives in memory or in on-disk TSI (Time Series Index) files. When you write data, the measurement and tag key-value pairs of each new series are added to it, which is what allows efficient lookups during query execution. It’s less about creating an index and more about writing data in a way that leverages InfluxDB’s built-in indexing.

The "lever" you control most directly for performance is your schema design: what you choose to make a tag versus a field. Tags are for dimensions you’ll filter or group on (e.g., host, region, device_id). Fields are for the actual values you’re measuring (e.g., usage_user, temperature, latency). If you find yourself frequently filtering on a field value, it’s a strong indicator that it should be a tag, provided its cardinality stays bounded. Keep in mind that tag values are always stored as strings, while fields keep their numeric, boolean, or string type.
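One way to keep that decision explicit in ingestion code is to declare the tag keys up front and split each incoming record accordingly. This is a minimal sketch; split_record and the flat record layout are assumptions for illustration.

```python
def split_record(record, tag_keys):
    # Dimensions become tags (always strings); everything else is a field.
    tags = {k: str(v) for k, v in record.items() if k in tag_keys}
    fields = {k: v for k, v in record.items() if k not in tag_keys}
    if not fields:
        # InfluxDB requires at least one field per point.
        raise ValueError("a point needs at least one field")
    return tags, fields

record = {
    "host": "server01",
    "region": "us-east-1",
    "usage_user": 25.5,
    "usage_system": 10.2,
}
tags, fields = split_record(record, tag_keys={"host", "region"})
print(tags, fields)
```

Keeping the tag keys in one place makes it easier to review the schema for cardinality problems before data is written.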

Another aspect is which predicates your query includes. The order in which conditions appear in the WHERE clause has little practical effect, because the engine does not evaluate them in the order written. For example, if you have a query filtering on time, region, and host, InfluxDB will always start with time to select shards, then use its tag index for region and host, whichever of the two you list first. What matters far more is that the predicates are present: a bounded time range so shards can be pruned, and tag predicates so the index can narrow the set of series.

The next problem you’ll likely encounter is optimizing queries that involve aggregations over very large time ranges or across many series.
