RAPIDS cuDF and cuML let you run your entire data science workflow on the GPU, not just the final model training.

Here’s a common scenario: you’ve got a massive dataset, and you want to do some feature engineering and then train a machine learning model. Traditionally, you’d load your data into a Pandas DataFrame, do all your transformations on the CPU, and then convert the result into NumPy arrays or similar structures to feed into a GPU-accelerated framework like PyTorch. Every hand-off copies the data from host memory (where Pandas works) to GPU memory (where the model trains), and for large datasets that transfer becomes a huge bottleneck.

RAPIDS, specifically cuDF and cuML, changes this. cuDF is a DataFrame library that mimics the Pandas API but runs on the GPU. cuML is a machine learning library with a scikit-learn-like API, also running on the GPU. The key point is that cuML models accept cuDF DataFrames directly, so the data never has to make a round trip through host memory between feature engineering and training.

Let’s see it in action. Imagine you have a CSV file with millions of rows and want to perform some common data science tasks: loading, filtering, grouping, and then training a simple logistic regression model.

import cudf
import cuml
from cuml.model_selection import train_test_split
from cuml.metrics import accuracy_score
import numpy as np

# 1. Load data directly onto the GPU
# Assume 'large_dataset.csv' is your file.
# For demonstration, let's create a dummy one.
data = {
    'feature1': np.random.rand(10000000) * 100,
    'feature2': np.random.rand(10000000) * 50,
    'category': np.random.randint(0, 5, 10000000),
    'target': np.random.randint(0, 2, 10000000)
}
dummy_df = cudf.DataFrame(data)
dummy_df.to_csv('large_dataset.csv', index=False)

# Now load it with cuDF
try:
    df = cudf.read_csv('large_dataset.csv')
    print("Data loaded onto GPU.")
except Exception as e:
    # Check that the file path is correct and that the GPU has enough
    # free memory, then re-raise: nothing below can run without df.
    print(f"Error loading data: {e}")
    raise

# 2. Feature Engineering on the GPU using cuDF
# Example: Create a new feature by combining existing ones
df['feature3'] = df['feature1'] * df['feature2']

# Example: Filter rows based on a condition
df_filtered = df[df['feature1'] > 50.0]

# Example: Group by a category and calculate mean
grouped_data = df_filtered.groupby('category').agg({'feature2': 'mean'})
print("\nGrouped data sample:")
print(grouped_data.head())

# 3. Prepare data for cuML
# Separate features and target
X = df_filtered[['feature1', 'feature2', 'feature3']]
y = df_filtered['target']

# Split data for training and testing
# cuML's train_test_split also operates on the GPU
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"\nTraining data shape: X={X_train.shape}, y={y_train.shape}")
print(f"Testing data shape: X={X_test.shape}, y={y_test.shape}")

# 4. Train a model on the GPU using cuML
# Initialize a Logistic Regression model
model = cuml.LogisticRegression(max_iter=1000)

print("\nTraining Logistic Regression model on GPU...")
model.fit(X_train, y_train)
print("Model training complete.")

# 5. Make predictions on the GPU
y_pred = model.predict(X_test)

# 6. Evaluate the model on the GPU
accuracy = accuracy_score(y_test, y_pred)
print(f"\nModel Accuracy: {accuracy:.4f}")

The core problem RAPIDS solves is the CPU-GPU data transfer bottleneck that plagues traditional data science workflows when dealing with large datasets. By keeping both the data and the computation on the GPU, it removes the repeated host-to-device copies between preprocessing and training. cuDF provides a familiar Pandas-like interface, making the transition smoother. It stores columns in GPU memory using the Apache Arrow columnar format, which is designed for zero-copy sharing of data between libraries. cuML, in turn, accepts these GPU-resident cuDF DataFrames directly.
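To make the "Pandas-like interface" point concrete, here is a minimal sketch in which one function runs unchanged against either library. The helper name summarize and the tiny inline data are illustrative assumptions, and the fallback to Pandas is there only so the snippet can be tried on a machine without RAPIDS installed.

```python
# Use cuDF when RAPIDS is installed; otherwise fall back to pandas so the
# sketch still runs on a CPU-only machine (the fallback is for illustration).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def summarize(df):
    """Hypothetical helper: derive a feature, filter rows, then aggregate."""
    df = df.copy()  # avoid mutating the caller's frame
    df["feature3"] = df["feature1"] * df["feature2"]
    kept = df[df["feature1"] > 50.0]
    # sort=True is passed explicitly because the two libraries have
    # different defaults for the ordering of group keys.
    return kept.groupby("category", sort=True).agg(
        {"feature2": "mean", "feature3": "max"}
    )


frame = xdf.DataFrame({
    "feature1": [10.0, 60.0, 80.0, 90.0],
    "feature2": [1.0, 2.0, 4.0, 6.0],
    "category": [0, 0, 1, 1],
})
summary = summarize(frame)
print(summary)
```

The body of summarize uses only calls that exist in both APIs, which is what lets existing Pandas feature-engineering code move to the GPU with few changes. Not every Pandas method has a cuDF equivalent, so treat this as a starting point rather than a guarantee.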

The groupby().agg() operation in cuDF, for instance, is highly optimized for parallel execution across GPU cores. Similarly, cuML’s LogisticRegression implementation is a GPU-native algorithm that avoids transferring data back to the CPU for each iteration of the optimization process. When you call model.fit(X_train, y_train) in the example above, X_train is a cuDF DataFrame and y_train is a cuDF Series, both already resident in GPU memory, so the training algorithm runs without copying the training data back to system RAM.

A subtle but powerful aspect of cuDF is its integration with the broader GPU Python ecosystem. You can convert cuDF DataFrames to CuPy arrays (NumPy-like arrays on the GPU) using .values or .to_cupy(), which lets you use CuPy for custom mathematical operations that might not be directly available in cuDF. This means you can perform complex array manipulations on your GPU-resident data before feeding it back into cuDF or cuML.
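The round trip described above can be sketched as follows. The column names and the per-row Euclidean norm are illustrative assumptions, chosen only as an example of array math that is easier to write against an array than a DataFrame. The Pandas/NumPy fallback is again there only so the snippet runs without a GPU.

```python
# Prefer cuDF + CuPy; fall back to pandas + NumPy on machines without RAPIDS.
try:
    import cudf as xdf
    import cupy as xp
except ImportError:
    import pandas as xdf
    import numpy as xp

df = xdf.DataFrame({
    "feature1": [3.0, 4.0, 6.0],
    "feature2": [4.0, 3.0, 8.0],
})

# .values returns a CuPy array for a cuDF frame (a NumPy array under the
# pandas fallback); the selected columns share one dtype here.
arr = df[["feature1", "feature2"]].values

# Custom array math: Euclidean norm of each row.
row_norm = xp.linalg.norm(arr, axis=1)

# Wrap the result in a Series that reuses the frame's index, then attach it
# as a new column so it can flow on into further cuDF or cuML steps.
df["row_norm"] = xdf.Series(row_norm, index=df.index)
print(df)
```

With cuDF and CuPy installed, every intermediate object in this sketch lives in GPU memory; only the final print brings values back to the host for display.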

The next step in accelerating your data science journey would be to explore distributed training with RAPIDS, allowing you to scale your workflows across multiple GPUs or even multiple machines.
