InfluxDB doesn’t actually store data points in chronological order by default, which is why getting the "first" and "last" requires a specific approach.
Let’s see it in action. Imagine we have a measurement called cpu with a field usage_user and a tag host. We want to find the earliest and latest recorded usage for host: server01.
-- First, let's look at the data we have
SELECT "usage_user" FROM "cpu" WHERE "host" = 'server01'
-- This might return something like:
-- time usage_user
-- ------------------- ----------
-- 2023-10-27T10:00:00Z 0.5
-- 2023-10-27T10:01:00Z 0.6
-- 2023-10-27T10:02:00Z 0.55
-- 2023-10-27T10:03:00Z 0.7
-- 2023-10-27T10:04:00Z 0.65
-- To get the first data point (earliest time)
SELECT first("usage_user") FROM "cpu" WHERE "host" = 'server01'
-- To get the last data point (latest time)
SELECT last("usage_user") FROM "cpu" WHERE "host" = 'server01'
The first() and last() functions in InfluxQL are designed to operate on the time-ordered series of data points within a query’s result set. When you run a query without an explicit ORDER BY time DESC, InfluxDB’s default behavior is to return data points ordered by time ascending. So, first() on an unordered query will pick the point with the earliest timestamp, and last() will pick the point with the latest timestamp.
This makes sense when you’re analyzing a single time series, like a specific host’s CPU usage. But what if you want the first and last data points across multiple series? For example, the first data point recorded by any host and the last data point recorded by any host?
If you group by a tag, like host, the first() and last() functions will operate independently on each group.
SELECT first("usage_user"), last("usage_user") FROM "cpu" WHERE "host" = 'server01' GROUP BY "host"
This query will give you the first and last usage_user for server01, but the output is still tied to that specific host.
The crucial insight is that first() and last() are aggregation functions. They don’t scan the entire database for the absolute earliest or latest timestamp across all data. Instead, they look at the set of points returned by your WHERE clause and, if you’re grouping, within each group. By default, InfluxDB returns data ordered by time ascending, so first() grabs the first point in that ordered set and last() grabs the last.
If you need the absolute first and last data point across your entire database, you’d typically achieve this by querying a broad time range and then using first() and last() without any GROUP BY clause and with the default time ordering.
-- This will give you the absolute earliest and latest recorded value for 'usage_user' across all hosts
SELECT first("usage_user"), last("usage_user") FROM "cpu"
However, running this query on a very large dataset can be inefficient because InfluxDB might still need to scan a significant number of points to determine the true first and last. For performance on large datasets, it’s often better to query a specific, known time range where you expect the first/last data to exist, or to use downsampling and aggregate functions that inherently track min/max over time.
The real trick is understanding that first() and last() are sensitive to the GROUP BY clause. If you group by host, you get the first and last for each host. If you don’t group, and InfluxDB returns data ordered by time ascending, you get the overall first and last from the returned subset.
The next logical step is to figure out how to get the absolute minimum and maximum timestamp values from a measurement, not just the first/last point associated with a specific field or value.