Deploy Hugging Face Models in Air-Gapped Environments (2026)

Deploying Hugging Face models in an air-gapped environment is surprisingly straightforward once you understand the core constraint: no internet access.

Let’s see a model in action. Imagine you have a pre-trained sentiment analysis model, distilbert-base-uncased-finetuned-sst-2-english. In a connected environment, you’d just:

from transformers import pipeline

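# Name the model explicitly rather than relying on the task's default
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")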
result = classifier("This is a great day!")
print(result)

Output:

[{'label': 'POSITIVE', 'score': 0.9998708963394165}]

In an air-gapped setup, this direct download fails. The solution is to bring the model and its dependencies to your secure environment.

The fundamental problem air-gapped deployments solve is data isolation. Sensitive data or proprietary models must never touch the public internet. This means we can’t rely on pip or transformers to fetch resources on demand. Instead, we need to pre-package everything.

Here’s how it works internally: a Hugging Face model is a set of plain files in a Hub repository. These are the weights (model.safetensors or the older pytorch_model.bin for PyTorch, tf_model.h5 for TensorFlow), a config.json that describes the architecture, and the tokenizer files. When you load a model by name, transformers downloads these files from the Hugging Face Hub into a local cache and loads them from there. In an air-gapped scenario, you perform that download yourself on a connected machine and then transfer the files.

The primary lever you control is the source of the model files. You can download specific versions of models and their associated tokenizer files, or you can download entire pre-built Python packages.

Let’s get specific. The most common approach involves downloading the model and tokenizer files directly. On a connected machine:

Download the model and tokenizer:

# Create a directory to store the model artifacts
mkdir /tmp/my_sentiment_model
cd /tmp/my_sentiment_model

# Download the model weights and configuration
huggingface-cli download distilbert-base-uncased-finetuned-sst-2-english --local-dir . --local-dir-use-symlinks False

# The above command downloads files like:
# config.json
# pytorch_model.bin
# tokenizer_config.json
# vocab.txt
# special_tokens_map.json
# etc.

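This command tells huggingface-cli to fetch every file in the specified model repository and place it in the current directory (.). The --local-dir-use-symlinks False flag ensures actual files are copied rather than symbolic links into the cache, which is crucial for offline transfers. Recent huggingface_hub releases always write real files to --local-dir and treat this flag as deprecated, so drop it if your version warns about it or rejects it.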
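The same download can be scripted from Python, which makes it easier to pin an exact revision so that every transfer contains the same files. The following is a minimal sketch, assuming huggingface_hub is installed on the connected machine. snapshot_download and its repo_id, revision, local_dir, and allow_patterns parameters come from that library; the helper names build_download_kwargs and download_model and the pattern list are illustrative choices, not part of any API.

```python
from pathlib import Path

# Patterns covering config, tokenizer, and PyTorch weight files.
# This list is an assumption for illustration; adjust it per model.
PYTORCH_PATTERNS = ["*.json", "*.txt", "*.safetensors", "*.bin"]


def build_download_kwargs(repo_id, target_dir, revision="main"):
    """Collect the snapshot_download arguments in one place."""
    return {
        "repo_id": repo_id,
        "revision": revision,  # branch name, tag, or full commit hash
        "local_dir": str(Path(target_dir)),
        "allow_patterns": PYTORCH_PATTERNS,
    }


def download_model(repo_id, target_dir, revision="main"):
    """Download a model snapshot. Run this on the connected machine only."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, target_dir, revision))


# Example call (needs network access):
# download_model("distilbert-base-uncased-finetuned-sst-2-english",
#                "/tmp/my_sentiment_model")
```

Passing a commit hash as revision instead of "main" keeps repeated downloads reproducible even if the Hub repository changes later.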

Transfer these files (via USB drive, secure network share, etc.) to your air-gapped machine.
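Large weight files can be truncated or corrupted on removable media, so it may be worth recording checksums before the transfer and verifying them afterwards. The sketch below uses only the Python standard library. The manifest file name and the function names are hypothetical choices for illustration, not a Hugging Face convention.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.sha256.json"  # hypothetical file name


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }


def write_manifest(model_dir):
    """Run on the connected machine, before the transfer."""
    manifest = build_manifest(model_dir)
    (Path(model_dir) / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(model_dir):
    """Run on the air-gapped machine. Returns files that are missing or altered."""
    root = Path(model_dir)
    expected = json.loads((root / MANIFEST_NAME).read_text())
    actual = build_manifest(root)
    return sorted(name for name, digest in expected.items() if actual.get(name) != digest)
```

Call write_manifest on the download directory before copying it, then verify_manifest on the copied directory; an empty list means every recorded file arrived intact.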

Load the model locally: On the air-gapped machine, point the transformers library to the directory containing the downloaded files.

from transformers import pipeline
import os

# Assuming you've copied the model files to /opt/airgapped_models/my_sentiment_model
model_dir = "/opt/airgapped_models/my_sentiment_model"

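# Ensure the directory contains the model config and at least one weights file
weight_files = ("model.safetensors", "pytorch_model.bin")
if not os.path.exists(os.path.join(model_dir, "config.json")) or not any(
    os.path.exists(os.path.join(model_dir, name)) for name in weight_files
):
    raise FileNotFoundError(f"Model files not found in {model_dir}")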

classifier = pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir)
result = classifier("This is a fantastic solution!")
print(result)

Output:

[{'label': 'POSITIVE', 'score': 0.9998742461204529}]

By passing the model_dir path to the model and tokenizer arguments, you instruct transformers to load from your local filesystem instead of the Hugging Face Hub.
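To make it explicit that nothing should reach for the network, the libraries can also be put into offline mode. The sketch below assumes transformers and PyTorch are installed on the air-gapped machine. HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE are environment variables documented by Hugging Face, and local_files_only is a from_pretrained argument; the function name load_offline_classifier is illustrative.

```python
import os

# Set these before importing transformers so no network call is attempted.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_offline_classifier(model_dir):
    """Build a sentiment pipeline strictly from files in model_dir."""
    # Imported here so the environment variables above are already in effect.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, local_files_only=True
    )
    return pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)


# Example (the path is the one assumed earlier in this article):
# classifier = load_offline_classifier("/opt/airgapped_models/my_sentiment_model")
# print(classifier("This is a fantastic solution!"))
```

With offline mode set, a missing file should surface as an immediate local error rather than a connection attempt that has to time out first.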

The air-gapped machine also needs the transformers library itself. If it is not already installed there, pre-package transformers and its dependencies along with your model. This is often done using pip wheel and transferring the resulting .whl files.

On a connected machine:

# Create a directory for wheels
mkdir /tmp/airgapped_wheels
cd /tmp/airgapped_wheels

# Download or build wheels for transformers, PyTorch, and all of their dependencies.
# Run this with the same Python version, OS, and CPU architecture as the
# air-gapped machine, because many wheels (PyTorch included) are platform-specific.
pip wheel "transformers[torch]" --wheel-dir .

# You'll also need to explicitly download your model files as shown above.
# Create a separate directory for the model files.
mkdir /tmp/airgapped_model_files
huggingface-cli download distilbert-base-uncased-finetuned-sst-2-english --local-dir /tmp/airgapped_model_files --local-dir-use-symlinks False

This process collects all .whl files needed by transformers (and PyTorch in this case) and places them in /tmp/airgapped_wheels.
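Wheels for packages such as PyTorch are tied to a Python version and platform, so it can help to inspect the collected files before transferring them. The following sketch relies only on the standard wheel file naming scheme (name-version-pythontag-abitag-platformtag.whl, with an optional build tag after the version) and the standard library. The function names are illustrative.

```python
from pathlib import Path


def parse_wheel_name(filename):
    """Split a wheel file name into its name, version, and compatibility tags."""
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are always the compatibility tags.
    head, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    name, version = head.split("-")[:2]  # ignores the optional build tag
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def summarize_wheels(wheel_dir):
    """Parse every .whl file in a directory."""
    return [parse_wheel_name(p.name) for p in sorted(Path(wheel_dir).glob("*.whl"))]


def platform_specific(wheels):
    """Wheels that only install on a matching OS and architecture."""
    return [w for w in wheels if w["platform"] != "any"]


# Example:
# for wheel in platform_specific(summarize_wheels("/tmp/airgapped_wheels")):
#     print(wheel["name"], wheel["python"], wheel["platform"])
```

Any wheel reported by platform_specific needs to match the interpreter and operating system on the air-gapped machine, otherwise pip will refuse to install it there.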

Transfer the contents of /tmp/airgapped_wheels and /tmp/airgapped_model_files to your air-gapped environment.

On the air-gapped machine:

# Navigate to the directory containing the wheels
cd /opt/airgapped_packages/wheels

# Install transformers and its dependencies using the local wheels
pip install "transformers[torch]" --no-index --find-links .

Then, in Python, load your model as before, pointing to the transferred model files:

from transformers import pipeline

model_dir = "/opt/airgapped_packages/model_files/my_sentiment_model" # Adjust path
classifier = pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir)
result = classifier("This is a fantastic solution!")
print(result)

The key here is the pair of flags passed to pip install: --no-index tells pip not to query PyPI, and --find-links . tells it to resolve packages only from the current directory, where your transferred wheels are.

Many users don’t realize that the pipeline function, by default, tries to load both the model weights and the tokenizer files. If you only transfer the weights (model.safetensors, pytorch_model.bin, or tf_model.h5) but not the tokenizer files (tokenizer.json, tokenizer_config.json, vocab.txt, etc.), transformers will fail to initialize the tokenizer, even though the model weights are present. Ensure you download all files associated with the model from the Hub.
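A quick pre-flight check on the air-gapped machine can catch an incomplete transfer before transformers raises a less obvious error. This is a minimal sketch using only the standard library. The file groups below are a heuristic assumption based on the file names mentioned in this article; some models use other tokenizer files or sharded weights, so adjust the lists to the model you deploy.

```python
from pathlib import Path

# For each group, at least one of the listed files must be present.
# These lists are a heuristic for illustration, not an official specification.
REQUIRED_GROUPS = {
    "model config": ("config.json",),
    "model weights": ("model.safetensors", "pytorch_model.bin", "tf_model.h5"),
    "tokenizer vocabulary": ("tokenizer.json", "vocab.txt"),
}


def check_model_dir(model_dir):
    """Return a list of human-readable problems; an empty list means the check passed."""
    root = Path(model_dir)
    problems = []
    for label, candidates in REQUIRED_GROUPS.items():
        if not any((root / name).is_file() for name in candidates):
            problems.append(
                f"missing {label}: expected one of {', '.join(candidates)}"
            )
    return problems


# Example:
# for problem in check_model_dir("/opt/airgapped_models/my_sentiment_model"):
#     print(problem)
```

Running the check right after the transfer keeps the failure close to its cause, which is usually a file that was left behind on the connected machine.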

The next hurdle is often managing updates or deploying different models, which requires repeating this entire packaging and transfer process.