The most surprising thing about MongoDB sharding is how much depends on something other than the shards. Your documents live on the shards, but the map of which shard owns which range of data lives on the config servers, and nothing can be routed without it.

Let’s see it in action. Imagine you have a users collection.

{
  "_id": ObjectId("..."),
  "name": "Alice",
  "email": "alice@example.com",
  "city": "New York"
}

When you shard this collection, you pick a shard key, say city. You don’t assign cities to shards by hand. MongoDB partitions the shard key space into contiguous ranges and assigns each range to a shard, so all users with the same city value end up on the same shard, but you don’t choose which one. The config servers hold the metadata that maps each range to its shard. When a query comes in for users where city is "New York", the mongos (the query router) looks up that metadata, finds the shard responsible for the range containing "New York", and directs the query there. The shard then retrieves the matching documents from its local storage.
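
The range lookup described above can be sketched in a few lines of plain JavaScript. This is a toy model, not MongoDB’s implementation: the chunk boundaries, the shard names (shardA, shardB, shardC) and the findShard helper are all invented for illustration, and null stands in for the open-ended lowest and highest bounds.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and is owned by exactly one shard. null models an open-ended bound.
const chunks = [
  { min: null, max: "London", shard: "shardA" },
  { min: "London", max: "Paris", shard: "shardB" },
  { min: "Paris", max: null, shard: "shardC" },
];

// Return the shard whose chunk range contains the given shard key value.
function findShard(chunkTable, keyValue) {
  for (const chunk of chunkTable) {
    const aboveMin = chunk.min === null || keyValue >= chunk.min;
    const belowMax = chunk.max === null || keyValue < chunk.max;
    if (aboveMin && belowMax) return chunk.shard;
  }
  throw new Error(`no chunk covers ${keyValue}`);
}

console.log(findShard(chunks, "New York")); // shardB
console.log(findShard(chunks, "London"));   // shardB (lower bound is inclusive)
console.log(findShard(chunks, "Berlin"));   // shardA
```

Every document with city "New York" falls into the same range, which is why a single city can never be spread over several shards when city alone is the shard key.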

This distribution is managed by three key components:

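  1. Config Servers: A replica set that stores the cluster’s metadata. This includes the mapping of shard key ranges to specific shards. If the config server replica set loses its primary, the metadata becomes read-only: you can still read and write data on the shards, but chunk splits and migrations stop. If every config server is unavailable, the cluster can become inoperable.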
  2. Mongos (Query Routers): These are lightweight processes that act as the interface for your applications. They receive queries, consult the config servers to determine which shard(s) to query, and then aggregate the results before returning them to the application. Your application connects to the mongos instances, not directly to the shards.
  3. Shards: These are the actual MongoDB instances (or replica sets) that store portions of your sharded data. Each shard holds a subset of the data based on the shard key.
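
To make the router’s role concrete, here is a rough sketch of how a mongos-like router might decide between sending a query to one shard and broadcasting it to all of them. The routingTable object and the targetShards function are hypothetical, and the sketch only handles simple equality filters.

```javascript
// Hypothetical routing table for a collection sharded on `city`.
const routingTable = {
  shardKey: "city",
  chunks: [
    { min: null, max: "London", shard: "shardA" },
    { min: "London", max: "Paris", shard: "shardB" },
    { min: "Paris", max: null, shard: "shardC" },
  ],
};

// Decide which shards must receive a query with simple equality filters.
function targetShards(table, filter) {
  const allShards = [...new Set(table.chunks.map((c) => c.shard))];
  // Without the shard key in the filter, every shard has to be asked.
  if (!(table.shardKey in filter)) return allShards;
  const value = filter[table.shardKey];
  const owner = table.chunks.find(
    (c) => (c.min === null || value >= c.min) && (c.max === null || value < c.max)
  );
  return [owner.shard];
}

console.log(targetShards(routingTable, { city: "New York" })); // [ 'shardB' ]
console.log(targetShards(routingTable, { name: "Alice" }));    // all three shards
```

Queries that include the shard key touch one shard; queries that omit it fan out to every shard and the router merges the results, which is one reason the shard key should match your common query patterns.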

The core problem sharding solves is scaling. As your dataset grows and your read/write load increases, a single MongoDB instance eventually hits its limits. Sharding allows you to distribute this load across multiple machines, effectively increasing your cluster’s capacity for both storage and throughput. It’s horizontal scaling, as opposed to vertical scaling (buying a bigger, more powerful server).

The magic behind how data gets distributed is the shard key. This is a field (or fields) in your documents that MongoDB uses to decide which shard a particular document belongs to. Choosing a good shard key is crucial for performance and even distribution. A good shard key has high cardinality (many unique values) and is frequently used in your queries. If your shard key is poorly chosen, you can end up with "hot shards" – a few shards that handle a disproportionate amount of traffic, negating the benefits of sharding.
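
A quick way to get a feel for cardinality is to measure a candidate key against a sample of documents. The keyStats helper below is a hypothetical illustration, not a MongoDB tool. It relies on one fact: a chunk cannot be split below a single shard key value, so the number of distinct values caps the number of chunks.

```javascript
// Summarize how well a candidate shard key field spreads a sample of documents.
function keyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const largest = Math.max(...counts.values());
  return {
    distinctValues: counts.size,         // upper bound on the number of chunks
    largestShare: largest / docs.length, // share of docs stuck in one unsplittable range
  };
}

// Made-up sample data.
const sample = [
  { userId: 1, city: "New York" },
  { userId: 2, city: "New York" },
  { userId: 3, city: "New York" },
  { userId: 4, city: "London" },
];

console.log(keyStats(sample, "city"));   // { distinctValues: 2, largestShare: 0.75 }
console.log(keyStats(sample, "userId")); // { distinctValues: 4, largestShare: 0.25 }
```

In this sample, city would leave three quarters of the documents in a range that can never be divided, while userId spreads them evenly.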

When you shard a collection, MongoDB doesn’t move all existing data at once. You typically enable sharding on the database (sh.enableSharding), then shard the collection with your chosen shard key (sh.shardCollection). MongoDB then divides the data into chunks. A "chunk" is a contiguous range of shard key values with an inclusive lower bound and an exclusive upper bound. For example, if your shard key is userId (a number), a chunk might hold all documents where userId is at least 1000 and less than 5000. Existing data starts out on the database’s primary shard, and chunks are migrated to the other shards in the background. As more data is inserted, MongoDB splits chunks and redistributes them to keep the load balanced.
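
The two steps map onto two mongosh helpers, sh.enableSharding and sh.shardCollection. The sketch below wraps them in a function that takes the sh helper as a parameter, so it can be pasted into mongosh and called as shardUsersCollection(sh). The function name, the database name "app" and the namespace "app.users" are placeholders, not anything from a real deployment.

```javascript
// Runs the two sharding steps against a mongosh-style `sh` helper.
// "app" and "app.users" are placeholder names for illustration.
function shardUsersCollection(sh) {
  sh.enableSharding("app");                       // enable sharding for the database
  sh.shardCollection("app.users", { userId: 1 }); // range-shard the collection on userId
}
```

On recent MongoDB versions (6.0 and later, as far as the documentation states) the enableSharding step is no longer required before sharding a collection, but running it is harmless.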

The mongos process is stateless. It doesn’t store any data itself. Its job is to be smart about where to send requests. When it receives a query, it checks its cached metadata about chunk distribution. If the cache is missing or stale, it reloads the metadata from the config servers. This is why config server availability is paramount. A mongos with a warm cache can keep routing for a while, but if the config servers are overloaded or unavailable, a mongos that needs to refresh (or one that has just started) can’t figure out where to send queries, and your application will see errors.
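
The cache-then-refresh behavior can be modeled with a small sketch. Everything here is hypothetical (the configServer object, the Router class, the shardAccepts check and the "low"/"high" range names); it only illustrates the idea that a router keeps using its cached map until a version mismatch tells it the map is out of date.

```javascript
// Hypothetical stand-in for the config servers: the authoritative chunk map.
const configServer = {
  version: 1,
  owner: { low: "shard1", high: "shard2" }, // named range -> owning shard
};

// Stand-in for the shard-side check: reject requests routed with an old map.
function shardAccepts(config, routerVersion) {
  return routerVersion === config.version;
}

class Router {
  constructor(config) {
    this.config = config;
    this.refreshCount = 0;
    this.refresh(); // a starting router must load the map once
  }

  // Copy the current map from the config servers into the local cache.
  refresh() {
    this.cache = { version: this.config.version, owner: { ...this.config.owner } };
    this.refreshCount += 1;
  }

  // Route using the cache; refresh first if the cached version is rejected.
  send(range) {
    if (!shardAccepts(this.config, this.cache.version)) {
      this.refresh();
    }
    return this.cache.owner[range];
  }
}

const router = new Router(configServer);
console.log(router.send("high"), router.refreshCount); // shard2 1

// Simulate a chunk migration recorded on the config servers.
configServer.owner.high = "shard3";
configServer.version = 2;
console.log(router.send("high"), router.refreshCount); // shard3 2
```

The first request is served entirely from the cache. Only after the map changes does the router go back to the config servers, which is the moment their availability matters.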

Consider the process of inserting a document: db.mycollection.insertOne({ userId: 1234, data: "..." }). The mongos receives this. It looks at the shard key userId. It checks its routing table (cached from the config servers, and refreshed from them if stale) to find out which shard is responsible for the range containing 1234. Let’s say it’s shard1. The mongos then forwards the insert operation directly to shard1 (or its replica set primary if shard1 is a replica set). The actual data is written to shard1.

The "chunks" are the fundamental units of data distribution. When a chunk grows too large, MongoDB splits it into two. The metadata about this split, including which shard holds each half, is updated on the config servers. If the distribution becomes uneven (e.g., one shard has too many chunks or too much data), MongoDB’s balancer process kicks in. The balancer is a background process that, since MongoDB 3.4, runs on the primary of the config server replica set (in older versions it ran on a mongos instance). It monitors chunk distribution and, if necessary, migrates entire chunks from one shard to another to even out the load. This migration happens in the background and is transparent to your application, though it can consume network bandwidth and I/O.
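
Splitting and balancing can be illustrated with a deliberately simplified model. The splitChunk and balance functions below are hypothetical, and the balancing rule (move one chunk at a time until chunk counts differ by at most one) is an assumption made for the sketch. The real balancer uses its own thresholds, and recent releases compare data size rather than chunk counts.

```javascript
// Split one chunk [min, max) at splitPoint into two chunks on the same shard.
function splitChunk(chunk, splitPoint) {
  if (!(splitPoint > chunk.min && splitPoint < chunk.max)) {
    throw new Error("split point outside chunk");
  }
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint, max: chunk.max, shard: chunk.shard },
  ];
}

// Move one chunk at a time from the most loaded shard to the least loaded
// shard until the chunk counts differ by at most one. Returns a new table.
function balance(chunks, shards) {
  const result = chunks.map((c) => ({ ...c }));
  const count = (s) => result.filter((c) => c.shard === s).length;
  for (;;) {
    const sorted = [...shards].sort((a, b) => count(a) - count(b));
    const least = sorted[0];
    const most = sorted[sorted.length - 1];
    if (count(most) - count(least) <= 1) return result;
    result.find((c) => c.shard === most).shard = least; // "migrate" one chunk
  }
}

// One big chunk on shard1 is split twice, then balanced across two shards.
const [left, right] = splitChunk({ min: 0, max: 1000, shard: "shard1" }, 500);
const table = [...splitChunk(left, 250), ...splitChunk(right, 750)];
const balanced = balance(table, ["shard1", "shard2"]);
console.log(balanced.map((c) => `${c.min}-${c.max}:${c.shard}`).join(" "));
// 0-250:shard2 250-500:shard2 500-750:shard1 750-1000:shard1
```

Note that splitting only changes metadata (both halves stay on the same shard), while balancing is what actually moves data between shards.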

One aspect that often trips people up is how MongoDB handles uniqueness on sharded collections. A unique index on a sharded collection must include the full shard key as a prefix, because each shard can only check uniqueness against its own documents. For example, if you shard by city, you can’t create a unique index on email alone: two documents with the same email could land on different shards, and neither shard would see the conflict. You can have a unique index on userId if userId is your shard key, or a compound unique index such as { city: 1, email: 1 }, though the latter only guarantees that an email is unique within a city. The _id index is the one exception: it always exists, but when _id is not the shard key, MongoDB enforces its uniqueness only per shard, so keeping _id values globally unique is the application’s responsibility.
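
The prefix rule is easy to express as a check. The uniqueIndexAllowed function below is a hypothetical helper written for this article, not a MongoDB API, and it does not model the special case of the _id index.

```javascript
// Return true if a unique index's fields start with the shard key fields,
// in the same order. The _id index exception is not modeled here.
function uniqueIndexAllowed(shardKeyFields, indexFields) {
  if (indexFields.length < shardKeyFields.length) return false;
  return shardKeyFields.every((field, i) => indexFields[i] === field);
}

console.log(uniqueIndexAllowed(["city"], ["email"]));         // false
console.log(uniqueIndexAllowed(["city"], ["city", "email"])); // true
console.log(uniqueIndexAllowed(["userId"], ["userId"]));      // true
```

If you need a globally unique email while sharding on something else, the usual options are to make email part of the shard key or to enforce uniqueness outside the sharded collection.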

The next concept you’ll grapple with is how to effectively monitor your sharded cluster’s health and performance, particularly identifying potential hot shards or unbalanced data distribution.
