InfluxDB’s time-series data deletion isn’t about freeing up space; it’s about making specific data points invisible to queries.

Let’s watch this happen. Imagine you have a measurement called cpu_usage in a bucket named my_bucket within your InfluxDB v2.x instance. You want to remove all cpu_usage records from before January 1st, 2023.

First, we need to construct the API call. We’ll use curl for this example, assuming your InfluxDB URL is http://localhost:8086 and you have an API token YOUR_API_TOKEN.

curl -X POST "http://localhost:8086/api/v2/delete?org=YOUR_ORG_ID&bucket=my_bucket" \
  -H "Authorization: Token YOUR_API_TOKEN" \
  -H "Content-type: application/json" \
  -d '{
    "start": "1970-01-01T00:00:00Z",
    "stop": "2023-01-01T00:00:00Z",
    "predicate": "_measurement=\"cpu_usage\""
  }'

Here’s what’s happening:

  • POST /api/v2/delete: This is the endpoint for initiating a delete operation.
  • org=YOUR_ORG_ID&bucket=my_bucket: We specify the organization and bucket where the data resides.
  • Authorization: Token YOUR_API_TOKEN: Your authentication token is crucial.
  • Content-type: application/json: We’re sending JSON data.
  • "start": "1970-01-01T00:00:00Z": This defines the beginning of the time range to consider for deletion. We’ve set it to the epoch start to ensure we cover all data up to the stop time.
  • "stop": "2023-01-01T00:00:00Z": This is the exclusive end of the time range. Data points with timestamps exactly at this time will not be deleted. Data points before this time will be.
  • "predicate": "_measurement=\"cpu_usage\"": This is the filter. It tells InfluxDB which data points within the specified time range to target. Here, we’re targeting points whose measurement name is exactly cpu_usage. The measurement is addressed through the _measurement column, and the equality operator is a single = (the only other comparison operator is !=). Predicates can also match on tag values.
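The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: build_delete_request is a hypothetical helper name, and the URL, organization, bucket, and token are the placeholder values from the curl example. The function only builds the request; sending it is left as a commented-out step.

```python
import json
import urllib.parse
import urllib.request


def build_delete_request(base_url, org, bucket, token, start, stop, predicate):
    """Build (but do not send) a POST request for the /api/v2/delete endpoint."""
    # org and bucket travel in the query string, as in the curl example.
    query = urllib.parse.urlencode({"org": org, "bucket": bucket})
    # start, stop, and predicate travel in the JSON body.
    body = json.dumps({"start": start, "stop": stop, "predicate": predicate})
    return urllib.request.Request(
        url=f"{base_url}/api/v2/delete?{query}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values taken from the curl example above.
req = build_delete_request(
    base_url="http://localhost:8086",
    org="YOUR_ORG_ID",
    bucket="my_bucket",
    token="YOUR_API_TOKEN",
    start="1970-01-01T00:00:00Z",
    stop="2023-01-01T00:00:00Z",
    predicate='_measurement="cpu_usage"',
)

# To send it against a running instance (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keeping construction separate from sending makes it easy to inspect the URL and body before anything is deleted.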

The system’s internal model for this is based on a "soft delete" mechanism. When you issue a delete command, InfluxDB doesn’t immediately shred the data blocks. Instead, it marks the relevant data points as deleted. These marked points are then ignored by subsequent queries. This has performance implications, as InfluxDB still has to scan over these "deleted" blocks during queries, even though it won’t return any data from them. Eventually, InfluxDB’s compaction processes will reclaim the physical storage space occupied by these marked points.

What this gives you is the ability to remove specific data surgically, without affecting the overall data structure or requiring a full data rebuild. This is essential for compliance, correcting erroneous data, or managing data lifecycle policies. You can target data by time range, measurement, and tag values. The predicate cannot filter on field values (_value) or on _time; the time range comes only from start and stop. For instance, to delete all cpu_usage data for a specific host named server-01 during January 2023:

{
  "start": "2023-01-01T00:00:00Z",
  "stop": "2023-02-01T00:00:00Z",
  "predicate": "_measurement=\"cpu_usage\" AND host=\"server-01\""
}
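Hand-writing predicates with escaped quotes is error-prone, so a small helper can assemble them. The sketch below is illustrative only: build_predicate and quote_value are hypothetical names, and it deliberately rejects values containing double quotes rather than guessing at an escaping rule.

```python
def quote_value(value):
    """Wrap a value in double quotes; refuse values that would need escaping."""
    if '"' in value:
        # Assumption: rather than guess how embedded quotes are escaped,
        # this sketch simply rejects such values.
        raise ValueError(f"value contains a double quote: {value!r}")
    return f'"{value}"'


def build_predicate(measurement, tags=None):
    """Join a measurement check and tag equality checks with AND."""
    clauses = [f"_measurement={quote_value(measurement)}"]
    for key, value in (tags or {}).items():
        clauses.append(f"{key}={quote_value(value)}")
    return " AND ".join(clauses)


predicate = build_predicate("cpu_usage", {"host": "server-01"})
print(predicate)  # _measurement="cpu_usage" AND host="server-01"
```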

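The predicate uses InfluxQL-like syntax, but only a small subset of it: the comparison operators are = and !=, and conditions can be chained only with AND. OR and regular expressions are not supported, so a single predicate cannot name two hosts. To delete cpu_usage data from server-01 and server-02 within a specific hour, send one request per host. For server-01:

{
  "start": "2023-01-15T10:00:00Z",
  "stop": "2023-01-15T11:00:00Z",
  "predicate": "_measurement=\"cpu_usage\" AND host=\"server-01\""
}

Then repeat the request with host=\"server-02\" in the predicate.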
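When several hosts are involved, one AND-only request body per host keeps each predicate simple and avoids relying on OR. A minimal sketch, assuming a hypothetical helper named payloads_per_host and the host tag used in the examples above:

```python
import json


def payloads_per_host(measurement, hosts, start, stop):
    """Return one delete request body per host, each using an AND-only predicate."""
    return [
        {
            "start": start,
            "stop": stop,
            "predicate": f'_measurement="{measurement}" AND host="{host}"',
        }
        for host in hosts
    ]


payloads = payloads_per_host(
    "cpu_usage",
    ["server-01", "server-02"],
    start="2023-01-15T10:00:00Z",
    stop="2023-01-15T11:00:00Z",
)

# Each body would be sent as its own POST to /api/v2/delete.
for payload in payloads:
    print(json.dumps(payload))
```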

It’s important to understand that the stop time is exclusive. If you want to delete data up to and including January 31st, 2023, your stop time would be 2023-02-01T00:00:00Z.
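Because the stop time is exclusive, an "up to and including this day" deletion needs the following midnight as its stop value. The sketch below computes that timestamp in UTC; exclusive_stop is a hypothetical helper name, and it assumes day boundaries are meant in UTC.

```python
from datetime import date, datetime, time, timedelta, timezone


def exclusive_stop(last_day):
    """Return the RFC 3339 UTC timestamp for the midnight that follows last_day."""
    next_day = last_day + timedelta(days=1)
    midnight = datetime.combine(next_day, time.min, tzinfo=timezone.utc)
    return midnight.strftime("%Y-%m-%dT%H:%M:%SZ")


stop = exclusive_stop(date(2023, 1, 31))
print(stop)  # 2023-02-01T00:00:00Z
```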

The practical consequence is that disk usage does not drop right after a delete. The points are hidden from queries, but they still occupy disk space and can affect query performance until compaction runs. The actual physical removal is a background process.

The next logical step after managing data deletion is understanding data retention policies, which automate the process of removing old data based on predefined rules.
