You can access private and gated models on the Hugging Face Hub by generating an access token and using it to authenticate your requests.
The most surprising thing about gated models is that they aren’t actually "gated" in the sense of a physical barrier; they’re gated by your agreement to terms. When you click "Agree and Access" on a model card, you’re not unlocking a secret vault; you’re digitally signing a license agreement. From then on, Hugging Face’s systems check whether the account behind your authentication token has accepted that agreement.
Let’s see this in action. Imagine you want to use a model that requires accepting its terms, like meta-llama/Llama-2-7b-chat-hf.
First, you need to accept the terms on the model’s page in your browser. Go to https://huggingface.co/meta-llama/Llama-2-7b-chat-hf and find the "Agree and Access" button. Click it.
Now, you need an access token. Head to your Hugging Face settings: https://huggingface.co/settings/tokens. Click "New token" and give it a descriptive name, like llama2_access. Choose a role; read is typically sufficient for downloading models. Click "Generate a token" and copy the long string of characters that appears. Keep this token secret!
To use this token programmatically, you’ll typically use the huggingface_hub Python library.
from huggingface_hub import login
# Replace 'your_hf_token_here' with the token you just generated
login(token="your_hf_token_here")
Once logged in, you can load the model as usual.
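from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID of the gated model whose terms you accepted in the browser
model_id = "meta-llama/Llama-2-7b-chat-hf"

# Both calls download files from the Hub, sending the token stored by login()
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
print("Model and tokenizer loaded successfully!")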
If you try to load the model without being logged in or without having accepted the terms, you’ll get an error like:
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json
"You don't have permission to access this model. Make sure you are logged in and have accepted the terms of service for this model."
Under the hood, the transformers library (like other Hugging Face libraries) makes HTTP requests to the Hugging Face Hub. When you call AutoTokenizer.from_pretrained or AutoModelForCausalLM.from_pretrained, the library constructs URLs to download the model’s configuration, weights, and tokenizer files. Before sending each request, it looks for a token, either one saved by huggingface_hub.login() or one supplied another way, such as the HF_TOKEN environment variable. If a token is found, it is sent in the Authorization: Bearer YOUR_TOKEN header of the HTTP request.
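To make the request shape concrete, here is a minimal sketch that builds such a request with Python’s standard library but never sends it. The helper name build_hub_request and the token value are made up for illustration; the URL pattern is the one visible in the error message above. This is not how huggingface_hub is implemented internally, only what the resulting request looks like.

```python
from urllib.request import Request

HUB_ENDPOINT = "https://huggingface.co"


def build_hub_request(repo_id, filename, token=None, revision="main"):
    """Build (but do not send) a GET request for one file in a Hub repository.

    Hypothetical helper for illustration only.
    """
    url = f"{HUB_ENDPOINT}/{repo_id}/resolve/{revision}/{filename}"
    headers = {}
    if token:
        # The token travels as a standard HTTP bearer credential.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


# "hf_example_token" is a placeholder, not a real token.
request = build_hub_request(
    "meta-llama/Llama-2-7b-chat-hf", "config.json", token="hf_example_token"
)
print(request.full_url)
print(request.get_header("Authorization"))
```

Without a token, the same request is built with no Authorization header, which is the case that fails for gated and private repositories.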
The Hugging Face Hub’s API then checks two things:
- Authentication: Is the provided token valid and associated with an account?
- Authorization: Has the account associated with the token agreed to the terms of service for this specific model?
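If either of these checks fails, the server returns an error: typically 401 Unauthorized when the token is missing or invalid, and 403 Forbidden when the token is valid but the account has not been granted access. The transformers library translates either response into a user-friendly error message.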
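The two checks can be pictured with a toy model. Everything in this sketch is hypothetical: the function check_access, the in-memory dictionaries, and the exact mapping of failures to status codes are assumptions made for illustration, not the Hub’s actual server logic.

```python
def check_access(token, accounts_by_token, agreed_accounts):
    """Return the HTTP status a gated repository might plausibly answer with.

    Toy model only: the real Hub's checks are not public in this form.
    """
    # Authentication: does the token map to a known account?
    account = accounts_by_token.get(token)
    if account is None:
        return 401
    # Authorization: has that account accepted this model's terms?
    if account not in agreed_accounts:
        return 403
    return 200


# Made-up accounts and tokens for the demonstration.
accounts_by_token = {"hf_alice_token": "alice", "hf_bob_token": "bob"}
agreed_accounts = {"alice"}

for token in (None, "hf_bob_token", "hf_alice_token"):
    print(token, check_access(token, accounts_by_token, agreed_accounts))
```

The point of the model is the ordering: the agreement is looked up per account, so it only matters once the token has identified who is asking.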
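For private models, the process is identical, but there are no terms to accept: access depends purely on who holds the token and what the token is permitted to do. A request succeeds only if the token belongs to an account that can see the repository (usually the owner, or someone the owner has given access to) and the token’s own permissions cover that repository.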
It’s important to understand that your token acts as your digital identity when interacting with the Hub. Anyone who has your token can access models on your behalf, including private ones and those for which you’ve accepted terms. Treat it like a password. If you suspect your token has been compromised, go to your settings and revoke it immediately, then generate a new one.
Furthermore, the huggingface_hub library manages a cache of downloaded models. When you load a model, it looks in your local cache first and downloads only the files that are missing or out of date. Once you’ve successfully downloaded a gated model after authenticating, subsequent loads reuse the cached files until the cache is cleared or the model is updated. Note that by default the library still contacts the Hub to check for newer versions, so for fully offline use you should set the HF_HUB_OFFLINE=1 environment variable or pass local_files_only=True to from_pretrained.
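If you want to see whether a model is already in the cache, you can inspect the cache directory yourself. The sketch below assumes the default cache location (~/.cache/huggingface/hub) and the models--{org}--{name}/snapshots/{revision} folder layout; both can differ if you have customized the cache, and the helper names are made up for this example. It only reads the filesystem and makes no network requests.

```python
from pathlib import Path


def cached_snapshots(repo_id, cache_dir=None):
    """Return snapshot directories already present locally for a model repo."""
    if cache_dir is None:
        # Assumed default location; a customized cache will live elsewhere.
        cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
    repo_folder = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_folder / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


def is_file_cached(repo_id, filename, cache_dir=None):
    """True if any cached snapshot of the repo contains the given file."""
    return any(
        (snapshot / filename).exists()
        for snapshot in cached_snapshots(repo_id, cache_dir)
    )


print(is_file_cached("meta-llama/Llama-2-7b-chat-hf", "config.json"))
```

A result of True only tells you that some revision of the file is on disk, not that it matches the latest version on the Hub.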
The login() function, when called without arguments, prompts you interactively for your token and saves it to a file (~/.cache/huggingface/token by default). This is a convenient way to avoid hardcoding your token in scripts. For automated systems, it is more common to set the HF_TOKEN environment variable or to pass the token explicitly from a secrets store.
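As a rough illustration of how these two sources fit together, the sketch below looks for a token in the HF_TOKEN environment variable first and falls back to the saved token file. The function resolve_token is hypothetical and deliberately simplified; the real library handles more cases (for example, a relocated cache directory), so treat this as a mental model rather than its actual lookup code.

```python
import os
from pathlib import Path

# Default location mentioned above; it moves if the cache is relocated.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def resolve_token(env=None, token_path=DEFAULT_TOKEN_PATH):
    """Return a token from HF_TOKEN if set, else from the saved file, else None.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return token
    path = Path(token_path)
    if path.is_file():
        saved = path.read_text().strip()
        return saved or None
    return None


# Report only whether a token exists; never print the token itself.
token = resolve_token()
print("token found" if token else "no token configured")
```

Reading the token from the environment keeps it out of your source code and version control, which matters because anyone who sees the token can act as you on the Hub.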
The next hurdle you’ll likely encounter is dealing with models that require specific hardware or software dependencies, or those that have very large file sizes requiring efficient download and loading strategies.