Advanced Search Algorithms
Ray Tune supports sophisticated search algorithms beyond random/grid search:
Bayesian Optimization (Hyperopt)
```python
from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch

# Create a Bayesian-optimization searcher
hyperopt_search = HyperOptSearch(
    metric="accuracy",
    mode="max"
)

# Run with Bayesian optimization (metric/mode are already set on the searcher,
# so don't pass them to tune.run as well)
analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=50,
    search_alg=hyperopt_search
)
```
Bayesian optimization builds a model of which hyperparameter regions look promising and concentrates subsequent trials there, making it far more sample-efficient than random search.
Optuna (Another Bayesian Method)
```python
from ray.tune.search.optuna import OptunaSearch

optuna_search = OptunaSearch(
    metric="accuracy",
    mode="max"
)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=50,
    search_alg=optuna_search
)
```
Optuna is excellent and has strong pruning capabilities.
Population-Based Training (PBT)
```python
from ray.tune.schedulers import PopulationBasedTraining

# PBT scheduler
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="accuracy",
    mode="max",
    perturbation_interval=4,
    hyperparam_mutations={
        "lr": tune.loguniform(1e-4, 1e-1),
        "momentum": [0.8, 0.9, 0.95, 0.99]
    }
)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=20,
    scheduler=pbt
)
```
PBT is remarkably effective: it trains a population of models in parallel, periodically "exploiting" good performers by copying their weights into weaker trials and "exploring" by perturbing the copied hyperparameters. DeepMind has used it in many papers.
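To build intuition for the exploit/explore loop, here is a toy, pure-Python sketch. The population, the objective (which arbitrarily favors learning rates near 0.01), and the perturbation factors are all made up for illustration; Ray's real implementation also checkpoints and restores model weights.

```python
# Toy sketch of PBT's exploit/explore loop (illustrative, not Ray's internals)
import random

random.seed(1)

# A population of 4 "trials": each has one hyperparameter and a running score
population = [{"lr": lr, "score": 0.0} for lr in (0.1, 0.01, 0.001, 0.0001)]

def step(trial):
    # Pretend trials with lr near 0.01 learn fastest (hypothetical objective)
    trial["score"] += 1.0 - abs(0.01 - trial["lr"]) / 0.1

for iteration in range(1, 13):
    for t in population:
        step(t)
    if iteration % 4 == 0:  # perturbation_interval
        population.sort(key=lambda t: t["score"])
        worst, best = population[0], population[-1]
        worst["lr"] = best["lr"] * random.choice([0.8, 1.2])  # explore
        worst["score"] = best["score"]  # exploit: copy the winner's state

print(sorted(t["lr"] for t in population))
# The population drifts toward the well-performing region around lr = 0.01
```

After three perturbation rounds every surviving learning rate sits within a small multiplicative band of the best one, which is exactly the behavior PBT exploits at scale.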
BOHB (Bayesian Optimization + HyperBand)
```python
from ray.tune.search.bohb import TuneBOHB
from ray.tune.schedulers import HyperBandForBOHB

# HyperBand scheduler paired with the BOHB searcher
bohb_hyperband = HyperBandForBOHB(
    time_attr="training_iteration",
    metric="accuracy",
    mode="max"
)

bohb_search = TuneBOHB(
    metric="accuracy",
    mode="max"
)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=50,
    search_alg=bohb_search,
    scheduler=bohb_hyperband
)
```
BOHB combines Bayesian optimization’s intelligence with HyperBand’s early stopping efficiency. Often the best choice for expensive training.
Early Stopping (Stop Wasting Compute)
Early stopping schedulers terminate unpromising trials before they consume their full training budget:
ASHA (Async Successive Halving)
```python
from ray.tune.schedulers import ASHAScheduler

# ASHA scheduler
asha = ASHAScheduler(
    time_attr="training_iteration",
    metric="accuracy",
    mode="max",
    max_t=100,           # Maximum training iterations per trial
    grace_period=10,     # Minimum iterations before a trial can be stopped
    reduction_factor=3   # Promote roughly the top 1/3 of trials at each rung
)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=100,
    scheduler=asha
)
```
ASHA stops unpromising trials early. If a trial performs poorly after 10 epochs, it gets killed. Massive time savings.
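The arithmetic behind those savings is easy to sketch. Assuming the configuration above (100 trials, `grace_period=10`, `reduction_factor=3`, `max_t=100`) and a simplified synchronous view of ASHA's successive halving:

```python
# Back-of-the-envelope view of successive halving (real ASHA promotes
# asynchronously, but the budget math is the same)
def asha_rungs(num_trials, grace_period, reduction_factor, max_t):
    """Return (iteration, surviving_trials) at each promotion rung."""
    rung, survivors = grace_period, num_trials
    schedule = []
    while rung <= max_t:
        schedule.append((rung, survivors))
        survivors = max(1, survivors // reduction_factor)  # top 1/3 promoted
        rung *= reduction_factor  # next rung needs 3x the training budget
    return schedule

print(asha_rungs(100, 10, 3, 100))  # [(10, 100), (30, 33), (90, 11)]
```

All 100 trials run 10 iterations, but only 33 reach iteration 30 and 11 reach iteration 90, so most of the compute goes to the most promising configurations.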
Median Stopping Rule
```python
from ray.tune.schedulers import MedianStoppingRule

# Stop trials running below the median performance
median_stop = MedianStoppingRule(
    time_attr="training_iteration",
    metric="accuracy",
    mode="max",
    grace_period=5,
    min_samples_required=3
)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=50,
    scheduler=median_stop
)
```
Trials whose performance falls below the median of all trials at the same point in training are stopped. Simple but effective.
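A toy version of the rule, assuming trials report accuracy at each step (illustrative only, not Ray's implementation):

```python
# Median stopping in miniature: stop a trial whose result at a step falls
# below the median of all trials' results at that step
import statistics

def should_stop(trial_result, all_results, grace_period, step, min_samples=3):
    """Return True if the trial should be stopped at this step."""
    if step < grace_period or len(all_results) < min_samples:
        return False
    return trial_result < statistics.median(all_results)

# Five trials report accuracy at step 6 (past a grace period of 5)
results = [0.62, 0.71, 0.55, 0.68, 0.73]
print(should_stop(0.55, results, grace_period=5, step=6))  # True: below median
print(should_stop(0.71, results, grace_period=5, step=6))  # False
```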
Checkpointing (Resume from Failures)
Save trial progress to resume interrupted searches:
```python
from ray import tune
import torch

def trainable_with_checkpointing(config, checkpoint_dir=None):
    """Training function with checkpointing."""
    model = create_model(config)
    optimizer = create_optimizer(config, model)

    # Load checkpoint if one exists
    start_epoch = 0
    if checkpoint_dir:
        checkpoint = torch.load(f"{checkpoint_dir}/checkpoint.pt")
        model.load_state_dict(checkpoint["model_state"])
        optimizer.load_state_dict(checkpoint["optimizer_state"])
        start_epoch = checkpoint["epoch"] + 1

    # Training loop
    for epoch in range(start_epoch, config["num_epochs"]):
        train_loss = train_epoch(model, optimizer)
        val_accuracy = validate(model)

        # Save checkpoint
        with tune.checkpoint_dir(step=epoch) as checkpoint_dir:
            torch.save({
                "epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()
            }, f"{checkpoint_dir}/checkpoint.pt")

        # Report metrics
        tune.report(loss=train_loss, accuracy=val_accuracy)

# Run with checkpointing (with the function API, checkpoint frequency is
# controlled inside the function - here, every epoch)
analysis = tune.run(
    trainable_with_checkpointing,
    config=search_space,
    num_samples=20,
    keep_checkpoints_num=2  # Keep only the 2 most recent checkpoints
)
```
If a trial fails or the search is interrupted, Ray Tune can resume from the last checkpoint; pass `resume=True` (with the same experiment `name`) to `tune.run` to pick up where an interrupted experiment left off.
Distributed Tuning Across Multiple Machines
Scale to a cluster with minimal code changes:
```python
import ray
from ray import tune

# Connect to an existing Ray cluster
ray.init(address="auto")  # Connects to the cluster head node

# The same tune.run() call automatically uses the whole cluster
analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=200,  # Many more trials
    resources_per_trial={"cpu": 4, "gpu": 1}
)
```
Set up a Ray cluster:
```bash
# On the head node
ray start --head --port=6379

# On each worker node
ray start --address='<head-node-ip>:6379'

# Then run the tuning script
python tune_distributed.py
```
Ray Tune automatically distributes trials across the cluster. What would take days on one machine now takes hours across many.
Integration with Popular Frameworks
Ray Tune integrates seamlessly with major frameworks:
PyTorch Lightning
```python
from ray.tune.integration.pytorch_lightning import TuneReportCallback
import pytorch_lightning as pl

def train_pl(config):
    model = MyLightningModule(config)
    trainer = pl.Trainer(
        max_epochs=10,
        callbacks=[
            TuneReportCallback(
                metrics={"loss": "val_loss", "accuracy": "val_acc"},
                on="validation_end"
            )
        ]
    )
    trainer.fit(model)

analysis = tune.run(
    train_pl,
    config=search_space,
    num_samples=50
)
```
Keras/TensorFlow
```python
from ray.tune.integration.keras import TuneReportCallback

def train_keras(config):
    model = build_model(config)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    model.fit(
        train_data,
        validation_data=val_data,
        epochs=10,
        callbacks=[TuneReportCallback({"loss": "val_loss"})]
    )

analysis = tune.run(train_keras, config=search_space, num_samples=50)
```
Scikit-learn
```python
from sklearn.ensemble import RandomForestClassifier

def train_sklearn(config):
    model = RandomForestClassifier(
        n_estimators=config["n_estimators"],
        max_depth=config["max_depth"],
        min_samples_split=config["min_samples_split"]
    )
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    tune.report(accuracy=accuracy)

search_space = {
    "n_estimators": tune.randint(10, 200),
    "max_depth": tune.randint(2, 20),
    "min_samples_split": tune.randint(2, 20)
}

analysis = tune.run(train_sklearn, config=search_space, num_samples=100)
```
Logging and Visualization
Ray Tune integrates with experiment tracking tools:
TensorBoard
```python
from ray.tune.logger import TBXLoggerCallback

analysis = tune.run(
    train_model,
    config=search_space,
    callbacks=[TBXLoggerCallback()],
    num_samples=50
)

# View results:
#   tensorboard --logdir ~/ray_results
```
Weights & Biases
```python
from ray.tune.integration.wandb import WandbLoggerCallback

analysis = tune.run(
    train_model,
    config=search_space,
    callbacks=[
        WandbLoggerCallback(
            project="ray-tune-optimization",
            api_key="<your-key>"
        )
    ],
    num_samples=50
)
```
MLflow
```python
from ray.tune.integration.mlflow import mlflow_mixin

@mlflow_mixin
def train_model(config):
    # Training code
    pass

analysis = tune.run(train_model, config=search_space, num_samples=50)
```
Best Practices and Patterns
Pattern 1: Resource-Efficient Trial Packing
```python
analysis = tune.run(
    train_model,
    resources_per_trial={"cpu": 2, "gpu": 0.25},  # 4 trials share each GPU
    num_samples=100
)
```
Maximizes GPU utilization by running multiple trials simultaneously.
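The concurrency you get falls out of simple division over available resources. A quick sketch, assuming a hypothetical node with 16 CPUs and 2 GPUs:

```python
# How many trials can run at once given per-trial resource requests
def max_concurrent_trials(node, per_trial):
    return min(int(node[res] / amount) for res, amount in per_trial.items())

node = {"cpu": 16, "gpu": 2}
print(max_concurrent_trials(node, {"cpu": 2, "gpu": 0.25}))  # 8
print(max_concurrent_trials(node, {"cpu": 2, "gpu": 1}))     # 2: GPUs bottleneck
```

Requesting a whole GPU per trial caps you at 2 concurrent trials on this node; fractional GPUs lift that to 8, provided each trial's model fits in its share of GPU memory.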
Pattern 2: Smart Search Space Design
```python
# Good - log scale for the learning rate, discrete choices where appropriate
search_space = {
    "lr": tune.loguniform(1e-5, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
    "dropout": tune.uniform(0.0, 0.5)
}

# Bad - linear scale for the learning rate
search_space = {
    "lr": tune.uniform(0.00001, 0.1)  # Samples are biased toward larger values
}
```
Use appropriate distributions for each hyperparameter type.
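To see why the linear version is biased, here is a quick stdlib simulation standing in for `tune.uniform` and `tune.loguniform`:

```python
# Why log scale matters for learning rates: compare how often each
# distribution samples a "small" learning rate (below 1e-3)
import math
import random

random.seed(0)
N = 10_000
low, high = 1e-5, 1e-1

uniform_draws = [random.uniform(low, high) for _ in range(N)]
loguniform_draws = [
    math.exp(random.uniform(math.log(low), math.log(high))) for _ in range(N)
]

frac_small_uniform = sum(x < 1e-3 for x in uniform_draws) / N
frac_small_loguniform = sum(x < 1e-3 for x in loguniform_draws) / N

# Uniform sampling almost never tries small learning rates (~1% of draws),
# while log-uniform spends about half its draws below 1e-3
print(frac_small_uniform, frac_small_loguniform)
```

Over the range 1e-5 to 1e-1, two of the four decades lie below 1e-3, so log-uniform covers them with ~50% of its samples; linear-uniform gives them ~1%.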
Pattern 3: Combining Search Algorithm with Scheduler
```python
from ray.tune.search.hyperopt import HyperOptSearch
from ray.tune.schedulers import ASHAScheduler

# Best of both worlds
hyperopt = HyperOptSearch(metric="accuracy", mode="max")
asha = ASHAScheduler(metric="accuracy", mode="max", grace_period=5)

analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=100,
    search_alg=hyperopt,  # Smart search
    scheduler=asha        # Early stopping
)
```
Intelligent search + early stopping = maximum efficiency.
Common Mistakes to Avoid
Learn from these common Ray Tune mistakes:
Mistake 1: Not Using Early Stopping
```python
# Bad - every trial runs to completion, even hopeless ones
analysis = tune.run(train_model, config=search_space, num_samples=100)

# Good - kills bad trials early
asha = ASHAScheduler(metric="accuracy", mode="max")
analysis = tune.run(
    train_model,
    config=search_space,
    num_samples=100,
    scheduler=asha
)
```
Early stopping can save 50–80% of compute time. Always use it.
Mistake 2: Wrong Resource Allocation
```python
# Bad - each trial monopolizes a whole GPU
resources_per_trial = {"gpu": 1}

# Good - pack multiple trials onto each GPU
resources_per_trial = {"gpu": 0.25}  # 4 trials per GPU
```
Pack trials onto GPUs when memory allows. In my experience, this alone quadrupled search throughput.
Mistake 3: Not Checkpointing
Long searches without checkpointing are risky. One failure loses everything. Always checkpoint.
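The idea is cheap to implement even outside any framework. A minimal, framework-agnostic sketch using `pickle` and a hypothetical file layout:

```python
# Persist enough state to resume, and load it back if it exists
import os
import pickle

CKPT = "checkpoint.pkl"

def save_checkpoint(path, epoch, state):
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)

def load_checkpoint(path):
    if not os.path.exists(path):
        return 0, None  # no checkpoint: start from scratch
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]

save_checkpoint(CKPT, epoch=7, state={"lr": 0.01})
start_epoch, state = load_checkpoint(CKPT)
print(start_epoch, state)  # 8 {'lr': 0.01}
os.remove(CKPT)
```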
Mistake 4: Ignoring Search Algorithm Choice
```python
# Basic - plain random search
tune.run(train_model, config=search_space, num_samples=100)

# Better - Bayesian optimization
from ray.tune.search.hyperopt import HyperOptSearch

tune.run(
    train_model,
    config=search_space,
    num_samples=100,
    search_alg=HyperOptSearch(metric="accuracy", mode="max")
)
```
Smart algorithms find good hyperparameters with fewer trials. Random search is fine for small searches (<20 trials) but wasteful for larger ones.
The Bottom Line
Ray Tune transforms hyperparameter optimization from sequential, manual, and slow to parallel, intelligent, and fast. It’s not just about speed — it’s about finding better hyperparameters more efficiently while using all available compute.
Use Ray Tune when:
- Hyperparameter tuning takes significant time
- You have multiple GPUs/CPUs available
- You want intelligent search algorithms
- Scaling to clusters makes sense
- Early stopping could save compute
Skip Ray Tune when:
- Search space is tiny (<10 trials)
- Single hyperparameter testing
- Learning ML basics
- Resources are extremely limited
For serious ML work involving hyperparameter optimization, Ray Tune should be in your stack. The parallelization alone justifies it, but add intelligent search algorithms and early stopping, and it’s a no-brainer.
Installation:
```bash
pip install "ray[tune]" hyperopt optuna
```
Stop running hyperparameter searches sequentially on one GPU. Start using Ray Tune to parallelize across all available compute with intelligent algorithms. What takes days becomes hours. What was impossible on one machine becomes feasible across a cluster. That’s the difference between amateur hyperparameter tuning and professional ML optimization. :)