Deploying Hugging Face models in an air-gapped environment is surprisingly straightforward once you understand the core constraint: no internet access.
Let’s see a model in action. Imagine you have a pre-trained sentiment analysis model, distilbert-base-uncased-finetuned-sst-2-english. In a connected environment, you’d just run:

    from transformers import pipeline

    # Naming the model explicitly; it is downloaded from the Hugging Face Hub on first use
    classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
    result = classifier("This is a great day!")
    print(result)

Output:

    [{'label': 'POSITIVE', 'score': 0.9998708963394165}]
In an air-gapped setup, this direct download fails. The solution is to bring the model and its dependencies to your secure environment.
The fundamental problem air-gapped deployments solve is data isolation. Sensitive data or proprietary models must never touch the public internet. This means we can’t rely on pip or transformers to fetch resources on demand. Instead, we need to pre-package everything.
Here’s how it works internally: a Hugging Face model is a set of files in a repository (PyTorch or TensorFlow weights, a model configuration, and tokenizer files). When you load a model by name, transformers downloads these files from the Hugging Face Hub and caches them locally. In an air-gapped scenario, you perform this download manually on a connected machine and then transfer the files.
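Because the files travel between machines by hand, it can help to record what was downloaded and confirm that it arrived intact. The following is a minimal sketch using only the standard library; `sha256_of`, `build_manifest`, and `verify_manifest` are hypothetical helper names, not part of transformers or huggingface_hub.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: str) -> dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(model_dir: str, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    root = Path(model_dir)
    problems = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

One way to use it: call `build_manifest` on the connected machine, save the result (for example as JSON) next to the model files, and call `verify_manifest` after the transfer. An empty list means every recorded file is present and unchanged.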
The primary lever you control is the source of the model files. You can download specific versions of models and their associated tokenizer files, or you can download entire pre-built Python packages.
Let’s get specific. The most common approach involves downloading the model and tokenizer files directly. On a connected machine:
- Download the model and tokenizer:

      # Create a directory to store the model artifacts
      mkdir /tmp/my_sentiment_model
      cd /tmp/my_sentiment_model

      # Download the model weights and configuration
      huggingface-cli download distilbert-base-uncased-finetuned-sst-2-english --local-dir . --local-dir-use-symlinks False

      # The above command downloads files like:
      # config.json
      # pytorch_model.bin
      # tokenizer_config.json
      # vocab.txt
      # special_tokens_map.json
      # etc.

  This command tells huggingface-cli to fetch all files for the specified model and place them in the current directory (.). --local-dir-use-symlinks False ensures actual files are copied, not symbolic links into the cache, which is crucial for offline transfers. Recent huggingface_hub releases always write real files into --local-dir and treat this flag as deprecated, so it can be dropped there.

- Transfer these files (via USB drive, secure network share, etc.) to your air-gapped machine.

- Load the model locally: On the air-gapped machine, point the transformers library to the directory containing the downloaded files.

      from transformers import pipeline
      import os

      # Assuming you've copied the model files to /opt/airgapped_models/my_sentiment_model
      model_dir = "/opt/airgapped_models/my_sentiment_model"

      # Ensure the directory exists and contains the model files.
      # config.json is checked because the weights file name varies
      # (pytorch_model.bin, model.safetensors, ...).
      if not os.path.isfile(os.path.join(model_dir, "config.json")):
          raise FileNotFoundError(f"Model files not found in {model_dir}")

      classifier = pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir)
      result = classifier("This is a fantastic solution!")
      print(result)

  Output:

      [{'label': 'POSITIVE', 'score': 0.9998742461204529}]

  By passing the model_dir path to the model and tokenizer arguments, you instruct transformers to load from your local filesystem instead of the Hugging Face Hub.
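To make sure nothing on the air-gapped machine attempts a network call, you can also switch the libraries into offline mode. The sketch below sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables and passes `local_files_only=True` when loading. `load_local_classifier` is a hypothetical helper name, and the directory you pass is whatever path you copied the model files to.

```python
import os

# Set these before importing transformers so they are in place when it loads.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: do not contact the Hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: rely on local files only


def load_local_classifier(model_dir: str):
    """Build a sentiment pipeline strictly from the files under model_dir.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    # Imported lazily so the environment variables above are already set.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    # local_files_only=True raises an error instead of falling back to a download.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, local_files_only=True
    )
    return pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
```

With this in place, a missing file surfaces as an immediate local error rather than as a connection attempt that hangs until it times out.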
A more complete approach also pre-packages the transformers library and its dependencies along with your model, which is necessary when the air-gapped machine does not already have them installed. This is often done using pip wheel and transferring the resulting .whl files.
- On a connected machine:

      # Create a directory for wheels
      mkdir /tmp/airgapped_wheels
      cd /tmp/airgapped_wheels

      # Build or collect wheels for transformers, PyTorch, and all of their dependencies
      pip wheel "transformers[torch]" --wheel-dir .

      # You'll also need to explicitly download your model files as shown above.
      # Create a separate directory for the model files.
      mkdir /tmp/airgapped_model_files
      huggingface-cli download distilbert-base-uncased-finetuned-sst-2-english --local-dir /tmp/airgapped_model_files --local-dir-use-symlinks False

  This process collects all .whl files needed by transformers (and PyTorch in this case) and places them in /tmp/airgapped_wheels. Run it with the same Python version, operating system, and CPU architecture as the air-gapped machine, because wheels for packages such as PyTorch are platform-specific.

- Transfer the contents of /tmp/airgapped_wheels and /tmp/airgapped_model_files to your air-gapped environment.

- On the air-gapped machine:

      # Navigate to the directory containing the wheels
      cd /opt/airgapped_packages/wheels

      # Install transformers and its dependencies using the local wheels
      pip install "transformers[torch]" --no-index --find-links .

  Now, load your model as before, pointing to the downloaded model files:

      from transformers import pipeline

      model_dir = "/opt/airgapped_packages/model_files/my_sentiment_model"  # Adjust path
      classifier = pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir)
      result = classifier("This is a fantastic solution!")
      print(result)

  The key here is pip install --no-index --find-links . where --no-index tells pip not to look at PyPI, and --find-links . tells it to look for packages in the current directory (where your transferred wheels are). Together they restrict pip to the wheels you transferred.
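Wheels for compiled packages are tied to a Python version and platform, so a wheel built on the wrong machine will be rejected at install time. As a rough pre-transfer check, the sketch below splits wheel filenames into their standard components (name, version, Python tag, ABI tag, platform tag) and lists the platform-specific ones. `WheelInfo`, `parse_wheel_filename`, and `platform_specific_wheels` are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class WheelInfo(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename into its components.

    Layout: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelInfo(name, version, python_tag, abi_tag, platform_tag)


def platform_specific_wheels(wheel_dir: str) -> list[WheelInfo]:
    """List wheels whose platform tag is something other than 'any'."""
    infos = (
        parse_wheel_filename(path.name)
        for path in sorted(Path(wheel_dir).glob("*.whl"))
    )
    return [info for info in infos if info.platform_tag != "any"]
```

Running `platform_specific_wheels` on the wheel directory before the transfer shows which packages need their Python and platform tags compared against the target machine. Pure-Python wheels tagged `any` install anywhere.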
A common pitfall: the pipeline function needs both the model weights and the tokenizer files, and by default it tries to download whichever are missing. If you copy only the weights (pytorch_model.bin or tf_model.h5) but not the tokenizer files (tokenizer.json, tokenizer_config.json, vocab.txt, etc.), transformers will fail to initialize the tokenizer, even though the model weights are present. Ensure you download all files associated with the model from the Hub.
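A quick completeness check before the transfer can catch this. The sketch below looks for a configuration file, at least one weights file, and at least one tokenizer file. The file-name lists are assumptions covering common cases and are not exhaustive (sharded weights, for example, use different names), and `missing_components` is a hypothetical helper.

```python
from pathlib import Path

# Assumed, non-exhaustive file names; adjust them for your model family.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin", "tf_model.h5")
TOKENIZER_FILES = ("tokenizer.json", "vocab.txt", "vocab.json", "spiece.model")


def missing_components(model_dir: str) -> list[str]:
    """Return a description of each component group absent from model_dir."""
    root = Path(model_dir)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config")
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append("weights")
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        missing.append("tokenizer")
    return missing
```

An empty result means each group has at least one matching file. It does not guarantee that the model loads, only that nothing obvious was left behind.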
The next hurdle is often managing updates or deploying different models, which requires repeating this entire packaging and transfer process.