InfluxDB can aggregate time-series data by window, but its approach differs fundamentally from SQL window functions.

Let’s look at a concrete example. Imagine we’re tracking server CPU usage. We have a cpu measurement with a usage_user field and a host tag. The raw points look like this:

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "cpu",
          "columns": [
            "time",
            "usage_user",
            "host"
          ],
          "values": [
            ["2023-10-27T10:00:00Z", 10.5, "server_a"],
            ["2023-10-27T10:01:00Z", 12.1, "server_a"],
            ["2023-10-27T10:02:00Z", 11.8, "server_a"],
            ["2023-10-27T10:03:00Z", 13.5, "server_a"],
            ["2023-10-27T10:00:00Z", 5.2, "server_b"],
            ["2023-10-27T10:01:00Z", 6.0, "server_b"],
            ["2023-10-27T10:02:00Z", 5.5, "server_b"],
            ["2023-10-27T10:03:00Z", 6.8, "server_b"]
          ]
        }
      ]
    }
  ]
}

We want to calculate the average CPU usage for each server over 2-minute windows.

InfluxQL Approach:

InfluxQL uses the GROUP BY time() clause. This is not a sliding window in the SQL sense. Instead, it defines discrete, non-overlapping time buckets.

SELECT mean("usage_user") FROM "cpu" WHERE time >= '2023-10-27T10:00:00Z' AND time < '2023-10-27T10:04:00Z' GROUP BY time(2m), "host"

Run against the sample data for the time range 2023-10-27T10:00:00Z to 2023-10-27T10:04:00Z, the query returns one series per host (timestamps shown in RFC 3339 format):

name: cpu
tags: host=server_a
time                 mean
----                 ----
2023-10-27T10:00:00Z 11.3
2023-10-27T10:02:00Z 12.65

name: cpu
tags: host=server_b
time                 mean
----                 ----
2023-10-27T10:00:00Z 5.6
2023-10-27T10:02:00Z 6.15

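Each bucket is a half-open interval. The first covers 10:00:00Z up to, but not including, 10:02:00Z and holds the points at 10:00 and 10:01. The second covers 10:02:00Z up to, but not including, 10:04:00Z and holds the points at 10:02 and 10:03. The time in the output is the start of each bucket, and each value is the mean of the two points in it, for example (10.5 + 12.1) / 2 = 11.3 for server_a. This is a fixed-interval aggregation.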
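The bucketing arithmetic can be reproduced outside the database. The following is a minimal Python sketch, not InfluxDB's implementation: tumbling_mean and its helpers are hypothetical names, and the sketch assumes bucket starts are found by truncating each timestamp to a multiple of the interval counted from the Unix epoch.

```python
from collections import defaultdict
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"

# Sample points from the cpu measurement: (time, usage_user, host)
POINTS = [
    ("2023-10-27T10:00:00Z", 10.5, "server_a"),
    ("2023-10-27T10:01:00Z", 12.1, "server_a"),
    ("2023-10-27T10:02:00Z", 11.8, "server_a"),
    ("2023-10-27T10:03:00Z", 13.5, "server_a"),
    ("2023-10-27T10:00:00Z", 5.2, "server_b"),
    ("2023-10-27T10:01:00Z", 6.0, "server_b"),
    ("2023-10-27T10:02:00Z", 5.5, "server_b"),
    ("2023-10-27T10:03:00Z", 6.8, "server_b"),
]


def to_epoch(ts: str) -> int:
    """Parse a UTC RFC 3339 timestamp into whole seconds since the Unix epoch."""
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())


def to_rfc3339(sec: int) -> str:
    """Format epoch seconds as a UTC RFC 3339 timestamp."""
    return datetime.fromtimestamp(sec, tz=timezone.utc).strftime(FMT)


def tumbling_mean(points, every_s: int):
    """Average values per (host, bucket); each bucket is [start, start + every_s)."""
    buckets = defaultdict(list)
    for ts, value, host in points:
        t = to_epoch(ts)
        start = t - t % every_s  # truncate to the bucket start
        buckets[(host, start)].append(value)
    return {
        (host, to_rfc3339(start)): round(sum(vals) / len(vals), 2)
        for (host, start), vals in sorted(buckets.items())
    }


result = tumbling_mean(POINTS, every_s=120)
for (host, start), mean in result.items():
    print(start, host, mean)
# 2023-10-27T10:00:00Z server_a 11.3
# 2023-10-27T10:02:00Z server_a 12.65
# 2023-10-27T10:00:00Z server_b 5.6
# 2023-10-27T10:02:00Z server_b 6.15
```

Every point lands in exactly one bucket, which is what makes this a fixed-interval rather than a sliding aggregation.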

Flux Approach:

Flux provides more flexibility with its window() function, which can implement sliding windows. With only every set, however, window() creates fixed, non-overlapping intervals just like InfluxQL’s GROUP BY time(), because period defaults to the value of every. To get a sliding window, set period to a duration longer than every.

Here’s how you’d do a 2-minute aggregation with Flux, first as fixed intervals, then as a sliding window:

Fixed Intervals in Flux (similar to InfluxQL):

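// from() already groups rows by series (measurement, field, and tags such as host)
from(bucket: "my_bucket")
  |> range(start: 2023-10-27T10:00:00Z, stop: 2023-10-27T10:04:00Z)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
  |> window(every: 2m) // split each series into 2m windows; adds _start/_stop to the group key
  |> mean() // one row per host per window; drops _time
  |> duplicate(column: "_start", as: "_time") // use the window start as the row's timestamp
  |> window(every: inf) // unwindow: merge each host's rows back into one table

// Expected output for the sample data (_start and _stop columns omitted):
// _time                 _value  _field      _measurement  host
// --------------------  ------  ----------  ------------  --------
// 2023-10-27T10:00:00Z  11.3    usage_user  cpu           server_a
// 2023-10-27T10:02:00Z  12.65   usage_user  cpu           server_a
// 2023-10-27T10:00:00Z  5.6     usage_user  cpu           server_b
// 2023-10-27T10:02:00Z  6.15    usage_user  cpu           server_b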

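In this Flux query, window(every: 2m) creates windows whose boundaries fall on multiples of 2 minutes counted from the Unix epoch, not from the start of your range. Use the offset parameter to shift the boundaries. Windows that cross the range boundaries are truncated to them. The mean() function then aggregates the points within each window.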
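To make the boundary alignment concrete, here is a small Python sketch of the arithmetic. It is an illustration only, not Flux's implementation: window_start is a hypothetical helper, and the sketch assumes UTC timestamps and ignores time zones and truncation at the range boundaries.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def window_start(ts: str, every_s: int, offset_s: int = 0) -> str:
    """Return the start of the epoch-aligned window that contains ts."""
    t = int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())
    start = t - (t - offset_s) % every_s  # boundaries sit at offset + k * every
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime(FMT)


# A point at 10:01 belongs to the 10:00 window, wherever the query range starts.
print(window_start("2023-10-27T10:01:00Z", 120))      # 2023-10-27T10:00:00Z
# Shifting the boundaries by 1m moves the same point into a window starting at 10:01.
print(window_start("2023-10-27T10:01:00Z", 120, 60))  # 2023-10-27T10:01:00Z
# With the same shift, the 10:00 point falls into the window that started at 09:59.
print(window_start("2023-10-27T10:00:00Z", 120, 60))  # 2023-10-27T09:59:00Z
```

The offset_s argument plays the role of the offset parameter of window() in Flux and of the optional second argument of GROUP BY time() in InfluxQL.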

Sliding Window in Flux:

To create a true sliding window, where the aggregation window moves forward by a smaller increment than its duration, you configure window() with every and period. every defines how often a new window starts, and period defines the duration of each window.

Let’s say we want a 2-minute window that slides forward every 30 seconds:

from(bucket: "my_bucket")
  |> range(start: 2023-10-27T10:00:00Z, stop: 2023-10-27T10:04:00Z)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
  |> window(every: 30s, period: 2m) // a new window starts every 30s and spans 2m
  |> mean() // one row per host per window, keyed by _start and _stop
  |> duplicate(column: "_start", as: "_time") // use the window start as the row's timestamp
  |> window(every: inf) // unwindow: merge each host's rows back into one table

// Rows for the full 2m windows over the sample data. The points are 1m apart, so
// neighboring windows can hold the same two points. Windows that cross the range
// boundaries are truncated to them and appear as additional rows (not shown).
// _time                 _value  _field      _measurement  host
// --------------------  ------  ----------  ------------  --------
// 2023-10-27T10:00:00Z  11.3    usage_user  cpu           server_a  // points at 10:00, 10:01
// 2023-10-27T10:00:30Z  11.95   usage_user  cpu           server_a  // points at 10:01, 10:02
// 2023-10-27T10:01:00Z  11.95   usage_user  cpu           server_a  // points at 10:01, 10:02
// 2023-10-27T10:00:00Z  5.6     usage_user  cpu           server_b
// 2023-10-27T10:00:30Z  5.75    usage_user  cpu           server_b
// 2023-10-27T10:01:00Z  5.75    usage_user  cpu           server_b
// ... and so on

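After an aggregate such as mean(), each output row is identified by the _start and _stop columns of its window; the original _time column is dropped by the aggregation. If you need a _time column, copy it from _start (or _stop) with duplicate(). Knowing which boundary a timestamp represents is crucial for understanding which interval the aggregated value covers.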

The most surprising thing about InfluxDB’s aggregation is that GROUP BY time() in InfluxQL and window(every: X) in Flux do not create overlapping windows by default; they create discrete, contiguous buckets. Achieving a true sliding window requires a deliberate configuration of every and period in Flux.

The window() function in Flux, when period is longer than every, creates a series of overlapping data partitions. The _start and _stop columns generated by window() represent the boundaries of each partition. Aggregation functions like mean(), sum(), etc., then operate on the points whose timestamps are at or after _start and before _stop, so a single point can contribute to several windows.
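The overlap is easier to see with the numbers worked out. The following Python sketch mimics a 2-minute window that advances every 30 seconds over the server_a points. It is a simplified model under stated assumptions, not Flux's implementation: sliding_mean and its parameters are hypothetical, windows are taken as starting on 30-second boundaries, and truncation at the range boundaries is ignored.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"

# server_a points from the sample data: (time, usage_user)
SERVER_A = [
    ("2023-10-27T10:00:00Z", 10.5),
    ("2023-10-27T10:01:00Z", 12.1),
    ("2023-10-27T10:02:00Z", 11.8),
    ("2023-10-27T10:03:00Z", 13.5),
]


def to_epoch(ts: str) -> int:
    """Parse a UTC RFC 3339 timestamp into whole seconds since the Unix epoch."""
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())


def to_rfc3339(sec: int) -> str:
    """Format epoch seconds as a UTC RFC 3339 timestamp."""
    return datetime.fromtimestamp(sec, tz=timezone.utc).strftime(FMT)


def sliding_mean(points, every_s, period_s, first_start, last_start):
    """Mean of the points in each window [start, start + period_s), stepping by every_s."""
    rows = []
    start = to_epoch(first_start)
    end = to_epoch(last_start)
    while start <= end:
        stop = start + period_s
        values = [v for ts, v in points if start <= to_epoch(ts) < stop]
        if values:  # skip windows that contain no points
            mean = round(sum(values) / len(values), 2)
            rows.append((to_rfc3339(start), to_rfc3339(stop), mean))
        start += every_s
    return rows


rows = sliding_mean(
    SERVER_A,
    every_s=30,
    period_s=120,
    first_start="2023-10-27T10:00:00Z",
    last_start="2023-10-27T10:02:00Z",
)
for row in rows:
    print(*row)
# 2023-10-27T10:00:00Z 2023-10-27T10:02:00Z 11.3
# 2023-10-27T10:00:30Z 2023-10-27T10:02:30Z 11.95
# 2023-10-27T10:01:00Z 2023-10-27T10:03:00Z 11.95
# 2023-10-27T10:01:30Z 2023-10-27T10:03:30Z 12.65
# 2023-10-27T10:02:00Z 2023-10-27T10:04:00Z 12.65
```

The point at 10:01 appears in the first three windows listed here, which is the overlap that fixed buckets never produce. Because the sample points are a minute apart, adjacent windows frequently contain the same points and yield the same mean.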

The next concept you’ll likely encounter is how to handle missing data points within these windows, which often involves using fill() or other data imputation strategies.
