Hugging Face Articles

Accelerate Hugging Face CPU Inference with Optimum and ONNX

Hugging Face's Optimum library can accelerate CPU inference for transformers by converting models to ONNX format, enabling them to run on the ONNX Runtime.

2 min read

Rerank RAG Results with Hugging Face Cross-Encoders

Hugging Face Cross-Encoders can rerank your RAG results by treating the query and each retrieved document as a single input pair, allowing for a much mo.

4 min read

Build Custom Data Collators for Hugging Face Training

Hugging Face's Trainer is surprisingly flexible when it comes to how it batches data, often making you think it's magic.

3 min read

Stream Large Datasets in Hugging Face Without Loading into RAM

The Hugging Face datasets library can stream data larger than your RAM, but it doesn't actually stream data in the way you might expect; it streams pointers.

3 min read

Load and Process Large Datasets with Hugging Face Datasets

The Hugging Face datasets library is often thought of as just a way to download and use pre-made datasets, but its real power lies in its ability to eff.

3 min read

Align LLMs with DPO and RLHF Using Hugging Face TRL

Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) are not just different flavors of fine-tuning; they represent a f.

4 min read

Build Semantic Search with Hugging Face Embedding Models

The most surprising thing about semantic search is that it doesn't actually "understand" your query; it just finds text that is statistically similar in.

2 min read

Benchmark Hugging Face Models with the Evaluate Library

The evaluate library’s primary superpower is its ability to standardize and simplify model evaluation, letting you swap out metrics as easily as changin.

2 min read

Speed Up Transformer Training with Flash Attention 2

FlashAttention 2 doesn't just make attention faster; it fundamentally changes how attention is computed by fusing operations and optimizing memory acces.

3 min read

Control Hugging Face LLM Output with GenerationConfig and Sampling

GenerationConfig and sampling parameters are how you tell a Hugging Face transformers model how to generate text, not what text to generate.

3 min read

Load GGUF Quantized Models with Hugging Face Transformers

The most surprising thing about loading GGUF quantized models with Hugging Face Transformers is that you're not actually using Hugging Face Transformers.

3 min read

Save GPU Memory During Hugging Face Training with Gradient Checkpointing

Gradient checkpointing lets you trade compute for memory by recomputing activations during the backward pass instead of storing them all during the forw.

3 min read

Access Private and Gated Models on the Hugging Face Hub

You can access private and gated models on the Hugging Face Hub by generating an access token and using it to authenticate your requests.

3 min read

Fine-Tune ViT for Image Classification with Hugging Face

Vision Transformers (ViTs) can learn to classify images with surprising effectiveness, even when trained on datasets much smaller than those typically use.

2 min read

Deploy Hugging Face Models to Production with Inference Endpoints

Hugging Face Inference Endpoints actually makes deploying models to production easier than running them locally in many cases.

2 min read

Format Chat Templates for Instruction-Following Fine-Tuning

The most surprising truth about fine-tuning LLMs for instruction following is that the model often doesn't "understand" instructions in the way humans d.

3 min read

Fine-Tune Llama and Mistral Models with Hugging Face TRL

Fine-tuning Llama and Mistral models with Hugging Face TRL is surprisingly less about "teaching" the model and more about "guiding" its existing knowled.

2 min read

Merge Hugging Face Models with MergeKit for Combined Capabilities

A practical guide to merging Hugging Face models with MergeKit, covering Hugging Face setup, configuration, and troubleshooting with ...

2 min read

Write Hugging Face Model Cards That Pass Hub Review

Model cards are your chance to make your Hugging Face Model Hub submission shine, but getting them to pass review can feel like a black box.

3 min read

Push and Load Models on the Hugging Face Hub

Hugging Face Hub models aren't just static files; they're dynamic entities you can push to and pull from, effectively acting as a versioned, collaborati.

3 min read

Shard Large Hugging Face Models Across Multiple GPUs

Hugging Face's accelerate library is your best friend here, and it's not just for distributed training; it's for inference too, and it does the heavy li.

3 min read

Build Vision-Language Apps with LLaVA and Hugging Face

LLaVA doesn't just understand images; it can actually reason about them in natural language. Let's see LLaVA in action, pulling it all together with Hug.

2 min read

Fine-Tune LLMs Efficiently with PEFT and LoRA on Hugging Face

PEFT and LoRA allow you to fine-tune massive language models on consumer-grade hardware by only training a tiny fraction of the model's parameters.

4 min read

Run Inference with the Hugging Face Pipeline API in 5 Lines

The Hugging Face pipeline API is a black box that actually lets you run models locally without needing to understand PyTorch or TensorFlow.

2 min read

Deploy Hugging Face Models in Air-Gapped Environments

Deploying Hugging Face models in an air-gapped environment is surprisingly straightforward once you understand the core constraint: no internet access.

3 min read

Fine-Tune LLMs on Consumer GPUs with QLoRA and 4-bit Quantization

QLoRA lets you fine-tune massive language models on consumer-grade GPUs by quantizing the frozen base model's weights to 4 bits and training small LoRA adapters on top of them.

4 min read

Fine-Tune a Question Answering Model with Hugging Face

The surprising truth about fine-tuning large language models is that you're not teaching them a new language, but rather how to perform a specific task wi.

3 min read

Train a Reward Model from Human Preferences with Hugging Face TRL

Training a reward model from human preferences is a surprisingly effective way to align large language models with desired behaviors, even when those be.

3 min read

Save and Load Hugging Face Models Safely with SafeTensors

The most surprising thing about safetensors is that it's not just about security; it's fundamentally a faster, more efficient way to serialize and deser.

3 min read

Build Summarization and Translation Models with Seq2Seq Architectures

Seq2Seq models don't actually "understand" language; they're just incredibly sophisticated pattern matchers that learn to map input sequences to output .

2 min read

Deploy a Hugging Face Model Demo with Gradio Spaces

Gradio Spaces can host your Hugging Face models, turning them into interactive web demos with zero infrastructure management.

2 min read

Speed Up Hugging Face Inference with Speculative Decoding

Speculative decoding in Hugging Face isn't just a performance trick; it's a fundamental shift in how we generate text, allowing models to "guess" ahead .

3 min read

Run Stable Diffusion Inference with Hugging Face Diffusers

Stable Diffusion can run inference on a single consumer-grade GPU that costs under $500, making high-quality image generation accessible to anyone.

2 min read

Generate Synthetic Training Data with Hugging Face Models

The most surprising thing about generating synthetic data with Hugging Face models is that you're not just creating more data, you're actively shaping t.

3 min read

Deploy High-Throughput Text Embeddings with TEI Server

The TEI server can embed more text per second than you'd think, but not if you're doing it wrong. Let's see TEI in action.

3 min read

Deploy LLMs at Scale with Hugging Face Text Generation Inference

Deploying large language models (LLMs) at scale often involves a surprisingly simple underlying principle: treat the model itself as a stateful service th.

3 min read

Fine-Tune Named Entity Recognition Models with Hugging Face

Fine-tuning a Named Entity Recognition (NER) model isn't about teaching it new words; it's about teaching it a new context for recognizing existing entiti.

4 min read

Train Custom Tokenizers with Hugging Face Tokenizers Library

The Hugging Face tokenizers library is a high-performance tokenization library written in Rust with Python bindings, designed to be fast and flexible for modern NLP tasks.

3 min read

Train Production Models with the Hugging Face Trainer API

The Hugging Face Trainer API is a surprisingly opinionated, yet incredibly flexible, tool for training PyTorch models, abstracting away v.

2 min read

Fine-Tune Transformers on a Custom Dataset Step by Step

Fine-tuning a transformer on your own data is less about teaching it a new language and more about teaching it a new accent.

3 min read

Fine-Tune Whisper for Accurate Speech Recognition in Your Language

The most surprising thing about fine-tuning Whisper is that you don't actually need to fine-tune it at all to get it to understand your language better.

3 min read

Classify Text Without Training Data Using Zero-Shot Classification

Zero-shot classification, in its current popular form, doesn't actually classify text; it finds the most relevant description of text from a predefined .

2 min read

Scale Training to Multiple GPUs with Hugging Face Accelerate

Training large models across multiple GPUs is a common bottleneck, and Hugging Face Accelerate is the go-to library for making this seamless.

2 min read

Load Any Model Architecture from the Hugging Face Hub

The Hugging Face Hub isn't just a model repository; it's a dynamic registry where model architectures and their weights are versioned and linked, allowi.

2 min read

Maximize Throughput for Hugging Face Batch Inference

Batch inference on Hugging Face models can be surprisingly tricky to optimize, and the most impactful gains often come from understanding how the model'.

3 min read

Generate Text Embeddings with BERT and Sentence Transformers

BERT and Sentence Transformers can generate text embeddings, but the most surprising thing is that they don't actually "understand" text in the way huma.

3 min read

Quantize Hugging Face Models to 4-bit and 8-bit with BitsAndBytes

Hugging Face models often boast impressive performance, but their sheer size can be a major hurdle for deployment, especially on resource-constrained ha.

3 min read

Log Hugging Face Training Runs to Weights & Biases

Hugging Face's Trainer class can log directly to Weights & Biases, but it's not just a simple wandb.init call; the Trainer needs to be explicitly told .

2 min read

Automate Hugging Face Fine-Tuning Pipelines for Production

Fine-tuning a Hugging Face model for production isn't about fitting more data into a pre-trained network; it's about strategically teaching a model to s.

2 min read

Cut LLM Inference Costs by Self-Hosting Hugging Face Models

Self-hosting Hugging Face models can dramatically slash LLM inference costs, but the real magic isn't just saving money; it's gaining control over your .

2 min read