InfluxDB’s query languages, Flux and InfluxQL, aren’t just different syntaxes; they represent fundamentally different approaches to time-series data manipulation, with Flux offering a far more powerful and flexible paradigm than its predecessor.
Let’s see Flux in action. Imagine you have temperature and humidity data from a sensor, and you want to find the average temperature and humidity over 5-minute intervals, but only when the humidity is above 60%.
// Assumes `data` is an input stream defined earlier,
// e.g. data = from(bucket: "example-bucket") (hypothetical bucket name).
base = data
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "environment")
    |> filter(fn: (r) => r.sensor_id == "sensor-01")

// Mean temperature per 5-minute window
temperature = base
    |> filter(fn: (r) => r._field == "temperature")
    |> aggregateWindow(every: 5m, fn: mean)

// Mean humidity per 5-minute window
humidity = base
    |> filter(fn: (r) => r._field == "humidity")
    |> aggregateWindow(every: 5m, fn: mean)

// join() suffixes non-key columns with the table name, e.g. _value_humidity
join(tables: {temperature: temperature, humidity: humidity}, on: ["_time"])
    |> filter(fn: (r) => r._value_humidity > 60.0)
    |> group() // Ungroup for final output
This Flux query demonstrates the language’s functional, pipeline-based nature. Data flows through a series of transformations, each step refining the dataset. First, we select the relevant data (range, then filter by measurement, sensor, and field). Then we aggregate temperature and humidity separately over 5-minute windows using aggregateWindow. The join function combines the two aggregated streams on their _time values, so each output row carries both means for one window. Finally, we keep only the windows where mean humidity is above 60 and call group() with no arguments to merge the result into a single table.
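InfluxQL, on the other hand, is more SQL-like. It has no HAVING clause, so filtering on an aggregated value requires a subquery. The equivalent query would look something like this:
SELECT "mean_temperature", "mean_humidity"
FROM (
    SELECT
        mean("temperature") AS "mean_temperature",
        mean("humidity") AS "mean_humidity"
    FROM "environment"
    WHERE "sensor_id" = 'sensor-01'
        AND time > now() - 1h
    GROUP BY time(5m)
)
WHERE "mean_humidity" > 60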
While concise for simple aggregations, InfluxQL struggles with more complex operations. It has no join, so it cannot relate series from different measurements row by row, and filtering or computing on aggregated values means nesting subqueries that quickly become unmanageable. Flux, with its explicit data manipulation functions, handles these scenarios more cleanly.
The core problem Flux solves is the inherent complexity of time-series data. Unlike typical relational data, time-series data has a temporal dimension that’s crucial to its meaning. Flux’s design addresses this directly by treating time as a first-class citizen and providing a rich set of functions for time-based operations.
At its core, Flux is a functional language. Data is transformed through a series of functions, each of which takes a stream of tables as input and returns a new stream without modifying its input. This makes queries more predictable, testable, and easier to reason about, especially as they grow in complexity. The |> operator is the pipe-forward operator: it passes the output of one function as the input to the next, chaining the functions together.
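To make the pipeline idea concrete, here is a minimal sketch of a user-defined transformation chained with |>. The sample rows and the toFahrenheit name are hypothetical and exist only for illustration; array.from builds an in-memory stream so the snippet does not depend on any bucket.

```flux
import "array"

// Hypothetical sample rows standing in for real sensor data.
readings =
    array.from(
        rows: [
            {_time: 2024-01-01T00:00:00Z, _value: 20.0},
            {_time: 2024-01-01T00:01:00Z, _value: 22.0},
        ],
    )

// A custom pipeline function: `tables=<-` receives the piped-in stream.
// It returns a new stream and leaves its input untouched.
toFahrenheit = (tables=<-) =>
    tables
        |> map(fn: (r) => ({r with _value: r._value * 9.0 / 5.0 + 32.0}))

readings
    |> toFahrenheit()
```

Because toFahrenheit only maps its input to a new stream, it can be reused anywhere in a pipeline, the same way the built-in functions are.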
The key levers you control in Flux are the range function to define your time window, filter to narrow down your data by tags and fields, aggregateWindow for temporal aggregations (like mean, sum, max), and join to combine data streams. Beyond these, Flux offers powerful functions for windowing, sampling, and even statistical analysis, allowing for sophisticated data analysis directly within InfluxDB.
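As a small, self-contained sketch of these levers working together, the snippet below runs range, filter, and aggregateWindow over a few made-up rows. The timestamps and values are assumptions chosen for illustration, and max is used in place of mean only to show that the aggregate function is interchangeable.

```flux
import "array"

// Hypothetical inline readings standing in for a real bucket.
array.from(
    rows: [
        {_time: 2024-01-01T00:01:00Z, _field: "temperature", _value: 20.0},
        {_time: 2024-01-01T00:03:00Z, _field: "temperature", _value: 22.0},
        {_time: 2024-01-01T00:06:00Z, _field: "temperature", _value: 24.0},
        {_time: 2024-01-01T00:07:00Z, _field: "humidity", _value: 65.0},
    ],
)
    // range() bounds the time window and adds the _start and _stop columns
    // that aggregateWindow() relies on.
    |> range(start: 2024-01-01T00:00:00Z, stop: 2024-01-01T00:10:00Z)
    // Keep only the temperature field.
    |> filter(fn: (r) => r._field == "temperature")
    // Highest temperature in each 5-minute window.
    |> aggregateWindow(every: 5m, fn: max)
```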
What many users don’t immediately grasp is how Flux’s group() function interacts with subsequent operations. Flux has no separate ungroup() function; grouping and ungrouping are both done with group(). Calling group(columns: [...]) partitions the stream into one table per unique combination of values in the listed columns, and subsequent operations such as aggregateWindow, mean, or join then operate on each table separately. Calling group() without arguments clears the group key and merges the data back into a single table, which is often what you want before final output or before an aggregate that should span all series.
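The effect of grouping on a later aggregate can be seen in a short sketch. The rows and the yield names below are hypothetical; array.from is used so the example does not need a bucket.

```flux
import "array"

// Hypothetical readings from two sensors.
readings =
    array.from(
        rows: [
            {sensor_id: "sensor-01", _value: 20.0},
            {sensor_id: "sensor-01", _value: 22.0},
            {sensor_id: "sensor-02", _value: 30.0},
        ],
    )

// Grouped by sensor_id: mean() runs once per table, one row per sensor.
readings
    |> group(columns: ["sensor_id"])
    |> mean()
    |> yield(name: "per_sensor")

// group() with no arguments: a single table, so mean() returns one row.
readings
    |> group()
    |> mean()
    |> yield(name: "overall")
```

With these rows, the per_sensor result should hold a mean of 21.0 for sensor-01 and 30.0 for sensor-02, while overall holds the single value 24.0.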
The next logical step after mastering basic Flux queries is to explore its more advanced statistical and forecasting functions, such as holtWinters().