SharedArray in k6 allows you to load test data without duplicating it in memory across VUs, saving resources and enabling larger datasets.

import { SharedArray } from 'k6/data';
import http from 'k6/http';

// Load data from a JSON file
const users = new SharedArray('users', function() {
    // This function is executed once by the k6 process, not by each VU.
    // It should return the data to be shared.
    return JSON.parse(open('./users.json'));
});

export default function () {
    // Access the shared data. Each VU gets a reference to the same array.
    const user = users[__VU % users.length]; // Simple distribution

    http.get(`https://example.com/api/users/${user.id}`);
}

Let’s say you have a users.json file like this:

[
  {"id": 1, "name": "Alice"},
  {"id": 2, "name": "Bob"},
  {"id": 3, "name": "Charlie"}
]

When k6 starts, the function passed to SharedArray is executed once. It reads users.json, parses it, and returns the resulting JavaScript array, which k6 then keeps in memory as a single read-only instance. When VUs start, they don’t get their own copy of the array; they all read from that one instance, and k6 hands back a copy of only the element being accessed. This is the core of the memory efficiency. Imagine a massive CSV file with millions of user IDs. Loading it into a standard JavaScript array within each VU would quickly exhaust your system’s RAM. SharedArray sidesteps this by ensuring only one copy of the full dataset exists, regardless of the number of VUs.

The primary problem SharedArray solves is the memory overhead of loading large datasets for load testing. In typical scenarios you need to simulate users with unique credentials, product IDs, or other data points. If you load this data into a plain JavaScript array in your k6 script, every virtual user (VU) ends up with its own copy, because each VU runs the init code in its own JavaScript runtime. With hundreds or thousands of VUs, memory use grows in proportion to the VU count and can crash the k6 process or the test machine itself. SharedArray avoids this by loading and holding the data only once, with all VUs reading from that single instance.
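To make the overhead concrete, here is a rough back-of-the-envelope sketch. The 50 MB dataset and 1,000 VUs are made-up figures, and the estimate deliberately ignores per-VU runtime overhead and the small element copies SharedArray makes on access.

```javascript
// Rough estimate only: assumes each VU-local copy costs about as much as
// the parsed dataset itself. The figures below are hypothetical.
function estimateDatasetMemoryMB(datasetMB, vus, shared) {
    // With SharedArray the data is held once; without it, once per VU.
    return shared ? datasetMB : datasetMB * vus;
}

const perVuCopies = estimateDatasetMemoryMB(50, 1000, false);
const sharedCopy = estimateDatasetMemoryMB(50, 1000, true);

console.log(`plain array per VU: ~${perVuCopies} MB`);
console.log(`SharedArray:        ~${sharedCopy} MB`);
```

Under these assumptions the per-VU approach needs on the order of 50 GB for the data alone, while the shared approach stays near the size of the dataset.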

The internal mechanism is straightforward: the k6/data module exports a constructor, SharedArray. You instantiate it in the init context (the top level of your script), providing a unique name for the array and a function. This function is critical: it’s the factory that produces your data, and it must return an array. Inside it, you use k6’s open() to read a file and then parse the contents, for example with JSON.parse() for JSON or a CSV parser such as Papa Parse from jslib for CSV. Crucially, this function runs only once, not once per VU. Its result is then stored a single time in memory that all VUs share. When a VU needs the data, it doesn’t load it again; it simply reads elements from the already-loaded array.
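As an illustration of what a factory body might return, here is a minimal sketch that turns CSV text into an array of objects. parseSimpleCsv and the users.csv file name are hypothetical, and the parser is intentionally naive (no quoted fields or embedded commas); for real CSV files a proper parser is the safer choice.

```javascript
// Hypothetical helper: turns simple CSV text (header row, no quoted fields
// or embedded commas) into an array of objects keyed by the header names.
function parseSimpleCsv(text) {
    const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
    if (lines.length === 0) return [];
    const headers = lines[0].split(',').map((h) => h.trim());
    return lines.slice(1).map((line) => {
        const cells = line.split(',');
        const row = {};
        headers.forEach((header, i) => {
            row[header] = (cells[i] || '').trim();
        });
        return row;
    });
}

// In a k6 script the factory would look roughly like this (init context only):
//   const users = new SharedArray('users', function () {
//       return parseSimpleCsv(open('./users.csv'));
//   });

const sample = 'id,name\n1,Alice\n2,Bob\n';
console.log(parseSimpleCsv(sample));
```

Note that every value comes back as a string here, so an id read from CSV is '1', not 1.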

Consider data distribution. If you have 1000 VUs and 100 users in your SharedArray, you’ll want to spread the users across the VUs. A common pattern is the modulo operator: const user = users[__VU % users.length];. Because __VU starts at 1, VU 1 gets users[1], VU 2 gets users[2], …, VU 99 gets users[99], VU 100 gets users[0], VU 101 gets users[1] again, and so on, cycling through your dataset. For more complex scenarios, you might shuffle the data inside the factory function before returning it (the SharedArray itself is read-only, so it can’t be shuffled afterwards), or use a utility library like lodash inside the factory if you need more advanced data manipulation.
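The two ideas above can be sketched as plain helper functions. indexForVU and shuffled are hypothetical names, not part of k6. This variant subtracts 1 from the VU number so that VU 1 maps to the first record, which is a small departure from the __VU % users.length form.

```javascript
// __VU in k6 is 1-based, so subtract 1 to make VU 1 map to index 0.
function indexForVU(vu, length) {
    if (length <= 0) throw new Error('dataset must not be empty');
    return (vu - 1) % length;
}

// Fisher-Yates shuffle that returns a new array and leaves the input alone.
// The rng parameter is only there so the behaviour can be made repeatable.
function shuffled(items, rng = Math.random) {
    const copy = items.slice();
    for (let i = copy.length - 1; i > 0; i--) {
        const j = Math.floor(rng() * (i + 1));
        [copy[i], copy[j]] = [copy[j], copy[i]];
    }
    return copy;
}

// In a k6 script (names as in the example at the top of this article):
//   const users = new SharedArray('users', function () {
//       return shuffled(JSON.parse(open('./users.json')));
//   });
//   export default function () {
//       const user = users[indexForVU(__VU, users.length)];
//   }

const demo = ['alice', 'bob', 'charlie'];
console.log([1, 2, 3, 4].map((vu) => demo[indexForVU(vu, demo.length)]));
```

Since the factory runs once, every VU sees the same shuffled order; the shuffle changes which record each VU lands on, not how many VUs share a record.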

The power of SharedArray lies in its simplicity and its direct attack on a common load testing bottleneck: memory. It’s not just about "loading data"; it’s about loading large amounts of data efficiently. This allows you to test scenarios with millions of unique identifiers, complex request bodies, or vast lookup tables without your load generator becoming the performance bottleneck due to its own resource constraints.

A lesser-known but useful aspect is that the function passed to SharedArray can perform arbitrary data transformations during initialization: filtering, deduplicating, or reshaping records so that VUs only ever see the fields they need. It cannot make network calls, though, because it runs in the init context, where k6 does not allow HTTP requests. If you need, say, a list of active product IDs from a separate API, fetch it once in setup() and use the value k6 passes to the default function, or export it to a file beforehand and load that file into a SharedArray, rather than hitting that API repeatedly within your test logic. Doing this preprocessing once can save significant time and resources during the actual test execution.
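A minimal sketch of such a preprocessing step, written as a plain function that a factory could call. prepareUsers and the active field are assumptions for illustration; the sample users.json shown earlier has no such field.

```javascript
// Hypothetical preprocessing step for the SharedArray factory: keep active
// records, drop duplicate ids, and keep only the fields the test needs.
function prepareUsers(rawUsers) {
    const seen = new Set();
    const result = [];
    for (const user of rawUsers) {
        if (!user.active || seen.has(user.id)) continue;
        seen.add(user.id);
        result.push({ id: user.id, name: user.name });
    }
    return result;
}

// In a k6 script:
//   const users = new SharedArray('users', function () {
//       return prepareUsers(JSON.parse(open('./users.json')));
//   });

const raw = [
    { id: 1, name: 'Alice', active: true, notes: 'unused field' },
    { id: 2, name: 'Bob', active: false },
    { id: 1, name: 'Alice', active: true },
    { id: 3, name: 'Charlie', active: true },
];
console.log(prepareUsers(raw));
```

Trimming records down to the fields the test uses also shrinks the shared data itself, which compounds the memory savings.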

The next logical step after efficiently loading data is to understand how to manage and process that data dynamically during your test.
