Node.js Worker Threads and Child Processes both let you run JavaScript in parallel, but they’re fundamentally different beasts.
Here’s a quick look at a typical Node.js application handling a computationally intensive task, like image resizing, without parallelism.
const sharp = require('sharp');

async function resizeImage(inputPath, outputPath, width) {
  try {
    await sharp(inputPath)
      .resize(width)
      .toFile(outputPath);
    console.log(`Image resized and saved to ${outputPath}`);
  } catch (error) {
    console.error(`Error resizing image: ${error}`);
  }
}

// Imagine this is called many times concurrently
resizeImage('large.jpg', 'small.jpg', 200);
Now, let’s see how a worker thread handles this.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const sharp = require('sharp');

if (isMainThread) {
  console.log('Main thread started.');

  // Run this same file again, this time as a worker
  const worker = new Worker(__filename, {
    workerData: { inputPath: 'large.jpg', outputPath: 'small_worker.jpg', width: 200 }
  });

  worker.on('message', (msg) => {
    console.log(`Message from worker: ${msg}`);
  });
  worker.on('error', (err) => {
    console.error('Worker error:', err);
  });
  worker.on('exit', (code) => {
    if (code !== 0) {
      console.error(`Worker stopped with exit code ${code}`);
    } else {
      console.log('Worker finished successfully.');
    }
  });
} else {
  // This is the worker thread
  async function resizeImageWorker() {
    const { inputPath, outputPath, width } = workerData;
    // No try/catch here: if sharp rejects, the rejection goes unhandled, and by
    // default on Node.js 15+ that terminates the worker and emits 'error' in the parent.
    await sharp(inputPath)
      .resize(width)
      .toFile(outputPath);
    parentPort.postMessage(`Image resized to ${outputPath}`);
  }

  resizeImageWorker();
}
And here’s the same task using a child process.
const { fork } = require('child_process');
const path = require('path');

console.log('Parent process started.');

// Arguments after the script path arrive in the child as process.argv[2], [3], ...
const child = fork(path.join(__dirname, 'child.js'), ['large.jpg', 'small_child.jpg', '200']);

child.on('message', (msg) => {
  console.log(`Message from child: ${msg}`);
});
child.on('error', (err) => {
  console.error('Child process error:', err);
});
child.on('exit', (code) => {
  if (code !== 0) {
    console.error(`Child process stopped with exit code ${code}`);
  } else {
    console.log('Child process finished successfully.');
  }
});

// child.js content:
// const sharp = require('sharp');
//
// async function resizeImageChild() {
//   const [inputPath, outputPath, widthStr] = process.argv.slice(2);
//   const width = parseInt(widthStr, 10);
//   try {
//     await sharp(inputPath)
//       .resize(width)
//       .toFile(outputPath);
//     process.send(`Image resized to ${outputPath}`);
//   } catch (error) {
//     console.error(`Error resizing image in child: ${error}`);
//     process.exit(1); // Indicate an error
//   }
// }
// resizeImageChild();
Worker Threads are designed for CPU-bound tasks that can be broken down into smaller, independent pieces. They run inside the same Node.js process as the main thread, but each worker gets its own V8 isolate, event loop, and JavaScript heap, so ordinary objects are not shared between threads. Think of them as lightweight threads within one process: starting a worker is cheaper than starting a new process, and messages never have to cross a process boundary. Memory is shared only when you opt in with a SharedArrayBuffer, and that comes with a caveat: you need to guard against race conditions if multiple threads modify the same data. For most CPU-bound tasks that don’t involve heavy data sharing, like complex calculations or image processing, worker threads keep the main event loop responsive with lower overhead than child processes.
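To make the CPU-bound case concrete, here is a minimal sketch that moves a tight loop onto a worker while a timer keeps firing on the main thread. The helper name countPrimesInWorker and the prime-counting workload are illustrative assumptions, not part of the examples above, and the worker source is inlined with eval: true only so the sketch fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined (eval: true) only to keep the sketch in one file.
const primeWorkerSource = `
  const { workerData, parentPort } = require('worker_threads');
  let count = 0;
  for (let n = 2; n < workerData.limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  parentPort.postMessage(count);
`;

// Hypothetical helper: counts the primes below `limit` on a worker thread.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(primeWorkerSource, {
      eval: true,
      workerData: { limit },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// The interval keeps firing while the worker is busy, because the loop
// above runs on another thread.
const ticker = setInterval(() => console.log('main thread is still responsive'), 50);

countPrimesInWorker(2000000).then((count) => {
  clearInterval(ticker);
  console.log(`Primes below 2,000,000: ${count}`);
});
```

Running the same loop directly on the main thread would delay every timer callback until the loop finished.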
Child Processes, on the other hand, are separate operating-system processes. A child created with fork() is a full-blown Node.js instance with its own V8 instance, its own event loop, and its own memory, while spawn() and exec() can run any program. This isolation makes them a good fit when you need to run external commands, or code that might be unstable and could otherwise crash the main process. Plain I/O-bound work rarely needs either mechanism, because Node.js already performs I/O asynchronously. Communication between a child process and its parent is done via message passing over an IPC channel, which is more heavyweight than shared memory but safer due to the isolation. If you’re running a separate Node.js script, executing shell commands, or need robust isolation for potentially crashing code, child processes are the way to go.
The key difference lies in memory management and communication. Worker threads can use SharedArrayBuffer for true shared memory, allowing threads to operate on the same bytes directly. Data that isn’t shared is copied with the structured clone algorithm, the same mechanism postMessage uses in web workers. Child processes, however, have no shared memory by default. All communication happens through inter-process communication (IPC) channels, typically using process.send() and process.on('message') in the child and child.send() and child.on('message') in the parent. This means every message must be serialized and deserialized (as JSON by default), adding overhead.
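As a rough illustration of the shared-memory path, the sketch below has several workers increment one counter that lives in a SharedArrayBuffer. The helper name runSharedCounter and the inline worker source are assumptions made for the example. Atomics.add is used so that concurrent increments don’t race.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined (eval: true) only to keep the sketch in one file.
const counterWorkerSource = `
  const { workerData, parentPort } = require('worker_threads');
  // A view over the same memory the parent allocated; nothing is copied.
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write
  }
  parentPort.postMessage('done');
`;

// Hypothetical helper: starts `workerCount` workers that each add
// `increments` to one shared counter, then resolves with the final value.
function runSharedCounter(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const jobs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(counterWorkerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    })
  );

  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

runSharedCounter(4, 10000).then((total) => {
  console.log(`Final counter value: ${total}`);
});
```

With a plain `counter[0]++` in place of Atomics.add, increments from different workers could overwrite each other, which is the race condition mentioned earlier.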
When you use worker_threads, the workerData option is a way to pass initial data to the worker. This data is cloned, not shared. For actual shared memory, you’d use SharedArrayBuffer. The parentPort is how the worker communicates back to the main thread. The worker.on('message') listener on the main thread receives these messages.
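A small sketch can show that workerData really is a copy. Here the worker changes a field on the object it receives, and the parent’s original stays as it was. The function name demonstrateCloning and the settings object are made up for the illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined (eval: true) only to keep the sketch in one file.
const cloneWorkerSource = `
  const { workerData, parentPort } = require('worker_threads');
  workerData.settings.width = 999; // changes the worker's private copy only
  parentPort.postMessage(workerData.settings);
`;

// Hypothetical helper: resolves with the width seen by each side.
function demonstrateCloning() {
  const settings = { width: 200 };
  return new Promise((resolve, reject) => {
    const worker = new Worker(cloneWorkerSource, {
      eval: true,
      workerData: { settings },
    });
    worker.once('message', (fromWorker) => {
      resolve({ parentWidth: settings.width, workerWidth: fromWorker.width });
    });
    worker.once('error', reject);
  });
}

demonstrateCloning().then((result) => {
  console.log(result); // parentWidth stays 200, workerWidth is 999
});
```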
With child_process.fork(), you’re essentially spawning a new Node.js process. The fork method is a special case of spawn that provides an IPC channel, making it easier to communicate. The child script typically uses process.send() to send messages back to the parent, and the parent listens on child.on('message'). Command-line arguments are passed via process.argv in the child.
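The relationship between fork and spawn can be sketched by building the IPC channel by hand. The example below is an assumption-laden demo rather than a recommended pattern: it calls spawn with an 'ipc' entry in stdio and inlines the child script with node -e so everything fits in one file, where real code would normally call fork('child.js'). The helper name roundTrip is hypothetical. It also shows the cost of serialization: with the default JSON serialization, a Date arrives in the child as a string.

```javascript
const { spawn } = require('child_process');

// Child script, inlined via `node -e` only to keep the sketch in one file.
const childSource = `
  process.on('message', (job) => {
    process.send({ width: job.width, createdAtType: typeof job.createdAt });
  });
`;

// Hypothetical helper: sends one job to a child and resolves with its reply.
function roundTrip(job) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      // The fourth entry creates the IPC channel that fork() sets up for you.
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('error', reject);
    child.once('message', (reply) => {
      child.disconnect(); // closing the channel lets the child exit
      resolve(reply);
    });
    child.send(job);
  });
}

roundTrip({ width: 200, createdAt: new Date(0) }).then((reply) => {
  console.log(reply); // createdAtType is 'string': the Date was JSON-serialized
});
```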
A common surprise with worker threads is that, although they can share memory through SharedArrayBuffer, this isn’t the default or the most common mode of operation for typical task distribution. Most often, data passed to a worker is cloned and messages are sent back and forth, much like with child processes, but with lower overhead because everything stays inside one process. Expecting shared memory by default leads to confusion.
If you’re running a long, CPU-intensive JavaScript task that doesn’t require much data exchange, worker threads will likely be the better choice because they start faster and pass messages more cheaply. If your task involves running external programs, needs strong isolation, or runs code whose crash must not take down the main process, child processes are more appropriate.
The next thing to consider is how to manage and orchestrate multiple worker threads or child processes efficiently, especially when dealing with dynamic workloads.