InfluxDB Cloud Serverless doesn’t actually store data in the traditional sense; it’s a materialized view over data that lives in object storage.
Let’s watch how this plays out with a simple write and query. First, we’ll set up a local InfluxDB OSS instance to simulate the source.
# Install InfluxDB OSS (example for Ubuntu)
wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.7.5-linux_amd64.tar.gz
tar xvfz influxdb2-2.7.5-linux_amd64.tar.gz
sudo mv influxdb2-2.7.5-linux_amd64/usr/bin/influxd /usr/local/bin/
sudo mv influxdb2-2.7.5-linux_amd64/etc/influxdb /etc/
sudo mv influxdb2-2.7.5-linux_amd64/usr/lib/influxdb /usr/lib/
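# Assumes a systemd unit named influxdb is installed; otherwise start the server by running influxd directly
sudo systemctl start influxdb
Now, let’s write some data to this local instance. We’ll use the influx CLI, which has been a separate download from the server since InfluxDB 2.1.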
# Initialize InfluxDB and create a user/org (if not already done)
influx setup \
  --username admin \
  --password your_admin_password \
  --org your_org_name \
  --bucket your_bucket_name \
  --token your_admin_token
# Set up environment variables for easier access
export INFLUX_HOST="http://localhost:8086"
export INFLUX_TOKEN="your_admin_token"
export INFLUX_ORG="your_org_name"
export INFLUX_BUCKET="your_bucket_name"
# Write some sample data
influx write \
  --bucket "$INFLUX_BUCKET" \
  --org "$INFLUX_ORG" \
  --precision s \
  'cpu,host=server01,region=us-east-1 usage_user=10.5,usage_system=5.2'
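The write above carries no timestamp, so the server assigns one on arrival. As a minimal sketch of the same record with an explicit timestamp in seconds (matching `--precision s`), the hypothetical helper `make_cpu_point` below assembles one line of line protocol. The helper name and the sample timestamp are illustrative assumptions, not part of the influx CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds one line-protocol record for the "cpu" measurement.
# Layout: measurement,tag_set field_set timestamp
make_cpu_point() {
  local host="$1" region="$2" user="$3" system="$4" ts="$5"
  printf 'cpu,host=%s,region=%s usage_user=%s,usage_system=%s %s\n' \
    "$host" "$region" "$user" "$system" "$ts"
}

# Example: the same point as above, stamped with an explicit epoch-second time.
make_cpu_point server01 us-east-1 10.5 5.2 1700000000
```

The resulting line can be passed to `influx write` as its final argument, for example through command substitution.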
With that data in place, we can query it.
# influx query has no --bucket flag; the bucket is named inside the Flux script
influx query \
  --org "$INFLUX_ORG" \
  'from(bucket: "your_bucket_name") |> range(start: -5m) |> filter(fn: (r) => r._measurement == "cpu")'
This is the basic flow: write data, query data. Now, how does Cloud Serverless differ?
InfluxDB Cloud Serverless uses a "change data capture" (CDC) mechanism to stream data from your OSS instance to its cloud infrastructure. The data lands in object storage (such as Amazon S3, Google Cloud Storage, or Azure Blob Storage), and Cloud Serverless builds a queryable index over it. The "materialized view" part means that Cloud Serverless keeps this index continuously updated so it can query the data as if it lived in a traditional database, while the raw data itself stays in object storage.
This architectural shift means you need to configure a streaming export from your OSS instance. You’ll set up an InfluxDB v2 task that reads data from your OSS bucket and pushes it to a Kafka topic. Then, you’ll configure your InfluxDB Cloud Serverless instance to consume from that Kafka topic.
Here’s a simplified conceptual view of the migration process:
- Set up InfluxDB Cloud Serverless: Create an InfluxDB Cloud account and a Serverless instance. You’ll get an InfluxDB Cloud URL and an API token.
- Set up Kafka: Provision a Kafka cluster (e.g., Confluent Cloud, Amazon MSK, or a self-hosted instance).
- Configure InfluxDB OSS to stream to Kafka:
  - Create a Kafka output in your InfluxDB OSS influxdb.conf.
  - Create an InfluxDB v2 task that uses the kafka output to stream data from your source bucket.
- Configure InfluxDB Cloud Serverless to consume from Kafka:
  - In your InfluxDB Cloud Serverless UI, navigate to "Data buckets," then "Add data," and select "Kafka."
  - Provide your Kafka connection details, topic name, and credentials.
  - Map the incoming Kafka data to your Cloud Serverless bucket.
The actual data transfer isn’t a direct copy-paste. Instead, InfluxDB OSS captures changes (writes, updates, deletes) and serializes them into messages that are sent to Kafka. InfluxDB Cloud Serverless then reads these messages and indexes them for querying. This is why it’s "Serverless" – you don’t manage the underlying storage or compute for the data ingestion path; InfluxDB handles it, abstracting away the Kafka and object storage layers.
The key point is that the Cloud Serverless query engine always operates on data that is eventually consistent with your source OSS instance, because that data arrives through a streaming pipeline. What you query in Cloud Serverless is a projection of the data that has been successfully processed from Kafka and indexed, so the most recent writes to OSS may not be visible yet.
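One way to make "eventually consistent" concrete is to track how far the target trails the source. The sketch below assumes you have already obtained the newest point timestamp on each side as Unix epoch seconds (how you fetch them is left open). `replication_lag` is a hypothetical helper, not part of any InfluxDB tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds by which the target trails the source,
# given the newest point timestamp on each side as Unix epoch seconds.
replication_lag() {
  local source_latest="$1" target_latest="$2"
  local lag=$(( source_latest - target_latest ))
  # A target that is level with or ahead of the source is reported as zero lag.
  if [ "$lag" -lt 0 ]; then
    lag=0
  fi
  echo "$lag"
}

# Example: the source's newest point is 60 seconds newer than the target's.
replication_lag 1700000060 1700000000
```

A lag that stays small and bounded suggests the pipeline is keeping up. A lag that grows steadily suggests a stalled or overloaded stage somewhere between the source and the index.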
When migrating, you’ll typically run both instances in parallel for a period. Writes go to your OSS instance, and the CDC task streams them to Kafka, which Cloud Serverless then picks up. You’ll then point your read applications to the Cloud Serverless endpoint. Once you’re confident all data is replicated and queries are correct, you can stop writes to OSS and decommission it.
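Before cutting over, it helps to have a mechanical check that both sides hold the same data. The following is a rough sketch that assumes each side's query result has been exported as annotated CSV (on the OSS side, for example, with `influx query --raw`). The helpers `count_rows` and `same_row_count` and the file names are hypothetical, and equal row counts are only a coarse signal, not proof that the contents match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts data rows in annotated CSV read from stdin.
count_rows() {
  awk '
    { sub(/\r$/, "") }          # annotated CSV may use CRLF line endings
    /^#/ || /^$/ { next }       # skip annotation rows and blank lines
    /^,result,table/ { next }   # skip the header row of each table
    { n++ }
    END { print n + 0 }
  '
}

# Hypothetical helper: succeeds only when both exports contain the same number of rows.
same_row_count() {
  [ "$(count_rows < "$1")" -eq "$(count_rows < "$2")" ]
}

# Example usage (file names are placeholders):
# same_row_count oss.csv cloud.csv && echo "row counts match"
```

Running the check over a fixed, closed time range avoids false mismatches caused by writes that are still in flight.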
The critical configuration for the InfluxDB v2 task is the source bucket, the target Kafka topic, and the serialization format. A task might look something like the following. This is a simplified conceptual example: actual task setup goes through the InfluxDB v2 Tasks API, the every value controls how often the task checks for new data, and serialization settings are left out because they depend on the Kafka output in use. The Flux script assumes the standard-library kafka package and its kafka.to() function are available in your OSS build.
{
  "id": "your-kafka-export-task-id",
  "name": "Stream OSS data to Kafka",
  "every": "10s",
  "status": "active",
  "flux": "import \"kafka\"\n\noption task = {name: \"Stream OSS data to Kafka\", every: 10s}\n\nfrom(bucket: \"your_bucket_name\")\n    |> range(start: -task.every)\n    |> filter(fn: (r) => r._measurement == \"cpu\")\n    |> kafka.to(brokers: [\"kafka-broker1:9092\", \"kafka-broker2:9092\"], topic: \"influxdb-stream\")"
}
The "materialized view" aspect is particularly interesting: while the raw data resides in object storage, InfluxDB Cloud Serverless performs indexing and query optimization on top of it. This allows for fast queries without requiring you to manage traditional database infrastructure. The data you query is the result of Cloud Serverless processing the stream and building its internal structures.
A subtle but important point is how InfluxDB Cloud Serverless handles schema evolution and data types during the streaming process. While InfluxDB is schema-on-write, the streaming pipeline needs to be robust enough to handle variations. The default JSON encoding for Kafka messages is quite flexible, but if your OSS instance has wildly different schemas being written to the same bucket over time, you might encounter issues during ingestion into Cloud Serverless if the target schema mapping isn’t flexible enough.
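One concrete form such variation can take is a field whose type changes between writes, for example a float in one record and an integer in another. As a hedged sketch, the hypothetical helper `find_type_conflicts` below scans line protocol on stdin and reports fields whose inferred type changes within a measurement. It is deliberately simplified: it assumes no escaped spaces or commas and no spaces inside string values, so treat it as a pre-flight check rather than a full parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads line protocol on stdin and reports any field
# whose inferred type changes within the same measurement.
# Simplification: no escaped spaces/commas, no spaces inside string values.
find_type_conflicts() {
  awk '
    function field_type(v) {
      if (v ~ /^".*"$/) return "string"
      if (v ~ /^-?[0-9]+i$/) return "integer"
      if (v ~ /^[0-9]+u$/) return "unsigned"
      if (v ~ /^(t|T|true|True|TRUE|f|F|false|False|FALSE)$/) return "boolean"
      return "float"
    }
    /^#/ { next }                   # skip comment lines
    NF >= 2 {
      split($1, head, ",")          # head[1] is the measurement name
      n = split($2, fields, ",")    # $2 is the field set
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        key = head[1] "." substr(fields[i], 1, eq - 1)
        ftype = field_type(substr(fields[i], eq + 1))
        if (key in seen && seen[key] != ftype)
          printf "%s: %s then %s\n", key, seen[key], ftype
        else
          seen[key] = ftype
      }
    }
  '
}

# Example: usage_user is written as a float, then as an integer.
printf '%s\n' \
  'cpu,host=server01 usage_user=10.5 1700000000' \
  'cpu,host=server01 usage_user=11i 1700000010' | find_type_conflicts
```

Running a check like this against a sample of source data before enabling the stream can surface mismatches early, while they are still cheap to fix at the writer.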
Once the streaming pipeline is running and the data in Cloud Serverless has been verified, the next step is to optimize query performance. That requires understanding how the Serverless query engine interacts with the underlying object storage and its indexing mechanisms.