### Logging Custom Plots

```python
import matplotlib.pyplot as plt
import wandb

fig, ax = plt.subplots()
ax.plot(epochs, train_losses, label='Train')
ax.plot(epochs, val_losses, label='Val')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()

# Log the figure to wandb
wandb.log({"loss_curves": wandb.Image(fig)})
plt.close(fig)
```
### Logging Confusion Matrix

```python
import wandb

# Get predictions
y_true = ...
y_pred = ...

# Log confusion matrix
wandb.log({
    "confusion_matrix": wandb.plot.confusion_matrix(
        probs=None,
        y_true=y_true,
        preds=y_pred,
        class_names=class_names
    )
})
```
## Hyperparameter Sweeps (Game-Changer)

W&B makes hyperparameter sweeps almost too easy:

### Define the Sweep Config

```python
import wandb

sweep_config = {
    'method': 'bayes',
    'metric': {
        'name': 'val_accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'learning_rate': {
            'distribution': 'log_uniform_values',
            'min': 1e-5,
            'max': 1e-1
        },
        'batch_size': {
            'values': [16, 32, 64, 128]
        },
        'dropout': {
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.5
        },
        'optimizer': {
            'values': ['adam', 'sgd', 'adamw']
        }
    }
}

# Initialize the sweep
sweep_id = wandb.sweep(sweep_config, project="hyperparameter-search")
```
### Training Function for the Sweep

```python
import wandb
from torch import optim

def train():
    # wandb.init is called for each trial by the sweep agent
    run = wandb.init()

    # Get the hyperparameters chosen by the sweep
    config = wandb.config

    # Build the model with the sweep hyperparameters
    model = create_model(config.dropout)

    if config.optimizer == 'adam':
        optimizer = optim.Adam(model.parameters(), lr=config.learning_rate)
    elif config.optimizer == 'sgd':
        optimizer = optim.SGD(model.parameters(), lr=config.learning_rate, momentum=0.9)
    else:
        optimizer = optim.AdamW(model.parameters(), lr=config.learning_rate)

    # Training loop (same as before)
    for epoch in range(10):
        train_loss, train_acc = train_epoch(model, optimizer, train_loader, config.batch_size)
        val_loss, val_acc = validate(model, val_loader)

        wandb.log({
            'train_loss': train_loss,
            'train_accuracy': train_acc,
            'val_loss': val_loss,
            'val_accuracy': val_acc,
            'epoch': epoch
        })

# Run the sweep: 50 experiments
wandb.agent(sweep_id, train, count=50)
```
W&B automatically:
- Runs 50 different hyperparameter combinations
- Uses Bayesian optimization to find best settings
- Visualizes all results
- Identifies optimal hyperparameters
I’ve replaced entire weeks of manual hyperparameter tuning with W&B sweeps. It’s legitimately one of the best features.
## Organizing Experiments: Projects, Groups, Tags

Keep experiments organized as they grow:

```python
import wandb

# Projects: top-level organization
wandb.init(project="image-classification")

# Groups: related runs (e.g., same architecture)
wandb.init(
    project="image-classification",
    group="resnet-experiments"
)

# Tags: flexible categorization
wandb.init(
    project="image-classification",
    group="resnet-experiments",
    tags=["baseline", "augmentation", "lr-sweep"]
)

# Notes: describe the run
wandb.init(
    project="image-classification",
    notes="Testing new augmentation strategy"
)
```
This structure makes finding specific experiments later actually possible.
## Comparing Runs (Finally Useful)

The dashboard makes comparing runs visual:

- Select multiple runs in the UI
- View a parallel coordinates plot
- Compare metrics side-by-side
- Filter by hyperparameters
- Identify what changed between runs

No more manually maintained spreadsheets or scattered Jupyter notebooks. Everything's visual and interactive.
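The same comparisons are available programmatically: `wandb.Api()` returns run objects whose `.config` and `.summary` behave like dicts. As a runnable sketch over plain dicts (the run records below are made up for illustration), picking the best run and diffing hyperparameters looks roughly like this:

```python
def best_run(runs, metric="val_accuracy"):
    """Pick the run record with the highest value for `metric`."""
    return max(runs, key=lambda r: r["summary"].get(metric, float("-inf")))

def changed_params(run_a, run_b):
    """Report hyperparameters that differ between two runs."""
    keys = set(run_a["config"]) | set(run_b["config"])
    return {k: (run_a["config"].get(k), run_b["config"].get(k))
            for k in sorted(keys)
            if run_a["config"].get(k) != run_b["config"].get(k)}

# Toy run records shaped like wandb.Api() run objects
runs = [
    {"name": "run-a", "config": {"lr": 1e-3, "batch_size": 32}, "summary": {"val_accuracy": 0.81}},
    {"name": "run-b", "config": {"lr": 1e-4, "batch_size": 32}, "summary": {"val_accuracy": 0.87}},
]
print(best_run(runs)["name"])            # run-b
print(changed_params(runs[0], runs[1]))  # only 'lr' differs
```

With real data you would replace the toy list with something like `wandb.Api().runs("entity/project")` (the entity/project path is a placeholder).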
## Integrations with Popular Frameworks

W&B integrates with everything:

### PyTorch Lightning

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="lightning-runs")
trainer = pl.Trainer(logger=wandb_logger, max_epochs=10)
trainer.fit(model)
```
Automatic logging of everything Lightning tracks.
### Keras/TensorFlow

```python
from wandb.keras import WandbCallback

model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=10,
    callbacks=[WandbCallback()]
)
```
### Hugging Face Transformers

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./results',
    report_to='wandb',  # Enable wandb logging
    run_name='bert-finetuning'
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)
trainer.train()
```
### Scikit-learn

```python
from sklearn.ensemble import RandomForestClassifier
import wandb

wandb.init(project="sklearn-models")

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Log metrics
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
wandb.log({
    "train_accuracy": train_score,
    "test_accuracy": test_score
})

# Log classifier plots (the plotting helper needs predictions and probabilities)
y_pred = model.predict(X_test)
y_probas = model.predict_proba(X_test)
wandb.sklearn.plot_classifier(model, X_train, X_test, y_train, y_test, y_pred, y_probas, labels=class_names)
```
## Artifact Tracking (Models, Datasets, etc.)

Track versions of models and datasets:

```python
import wandb

run = wandb.init(project="artifacts-demo")

# Save the model
model.save('model.h5')

# Log it as an artifact
artifact = wandb.Artifact('model', type='model')
artifact.add_file('model.h5')
run.log_artifact(artifact)

# Later: download the artifact
run = wandb.init(project="artifacts-demo")
artifact = run.use_artifact('model:latest')
artifact_dir = artifact.download()
```

Track dataset versions:

```python
artifact = wandb.Artifact('cifar10', type='dataset')
artifact.add_dir('data/cifar10')
run.log_artifact(artifact)

# Use a specific dataset version
artifact = run.use_artifact('cifar10:v0')
artifact_dir = artifact.download()
```
This creates lineage: which model was trained on which dataset version with which code version.
## Common Patterns and Best Practices

### Pattern 1: Resume Training from a Checkpoint

```python
import wandb

run = wandb.init(project="my-project")
run_id = run.id

# Later: resume the same run
run = wandb.init(
    project="my-project",
    id=run_id,
    resume="must"  # Fail unless an existing run with this id is found
)
```
### Pattern 2: Log System Metrics

```python
import wandb

wandb.init(
    project="my-project",
    config=config,
    settings=wandb.Settings(
        _disable_stats=False  # System stats are on by default
    )
)
```
W&B automatically tracks GPU/CPU usage, memory, network, etc.
### Pattern 3: Conditional Logging

```python
# Log batch-level metrics only occasionally
if batch_idx % 100 == 0:
    wandb.log({"batch_loss": loss.item()})

# Always log epoch metrics
wandb.log({"epoch_loss": epoch_loss}, step=epoch)
```
Reduces logging overhead while keeping important metrics.
### Pattern 4: Offline Mode

```python
import os

# Work offline (syncs later); set the mode before importing wandb
os.environ["WANDB_MODE"] = "offline"

import wandb
wandb.init(project="my-project")
# ... train normally ...
wandb.finish()

# Later, sync the offline runs from the command line:
# wandb sync <run_directory>
```
Perfect for running on compute without internet.
## Common Mistakes to Avoid

Learn from these W&B failures:

### Mistake 1: Not Calling wandb.finish()

```python
# Bad - the run stays active
wandb.init()
train()
# Script ends without wandb.finish()

# Good - explicitly finish
wandb.init()
train()
wandb.finish()

# Also good - the run object is a context manager that finishes itself
with wandb.init() as run:
    train()
```

Always call wandb.finish() or use the run as a context manager. IMO, unfinished runs are annoying to clean up.
### Mistake 2: Logging Too Much

```python
# Bad - logs every single batch
for batch in train_loader:
    loss = train_step(batch)
    wandb.log({"loss": loss})

# Good - log periodically
for i, batch in enumerate(train_loader):
    loss = train_step(batch)
    if i % 100 == 0:
        wandb.log({"loss": loss})
```
Logging every batch creates massive overhead and noisy charts.
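If you still want a signal from every batch, buffer and average instead of dropping values. A minimal sketch of such a helper (the actual `wandb.log` call is left as a comment so the snippet runs anywhere):

```python
class PeriodicLogger:
    """Buffer values and emit their mean once every `every` calls."""

    def __init__(self, every=100):
        self.every = every
        self._buffer = []
        self.emitted = []  # stands in for the wandb.log calls

    def log(self, value):
        self._buffer.append(value)
        if len(self._buffer) == self.every:
            mean = sum(self._buffer) / self.every
            self.emitted.append(mean)  # real code: wandb.log({"loss": mean})
            self._buffer.clear()

logger = PeriodicLogger(every=4)
for loss in [1.0, 2.0, 3.0, 4.0, 5.0]:
    logger.log(loss)
print(logger.emitted)  # [2.5] - one averaged point, not five noisy ones
```

This cuts the number of log calls by the window size while keeping the chart faithful to the average trend.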
### Mistake 3: Not Logging Config

```python
# Bad - no hyperparameter tracking
wandb.init(project="my-project")

# Good - log all hyperparameters
wandb.init(
    project="my-project",
    config={
        "learning_rate": 0.001,
        "batch_size": 32,
        "architecture": "ResNet18"
    }
)
```
Config is crucial for reproducing results.
### Mistake 4: Not Using Descriptive Names

```python
# Bad - meaningless names
wandb.init(project="project1", name="run1")

# Good - descriptive names
wandb.init(
    project="image-classification",
    name="resnet18-lr001-batch32",
    tags=["baseline", "no-augmentation"]
)
```
Future you will thank present you for descriptive names. FYI, I’ve wasted hours finding specific runs with bad names. :/
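One way to keep names consistent is to derive them from the config instead of typing them by hand. A tiny sketch (the config keys here are assumptions matching the examples above):

```python
def run_name(config):
    """Build a descriptive run name from key hyperparameters."""
    return "{}-lr{:g}-batch{}".format(
        config["architecture"].lower(),
        config["learning_rate"],
        config["batch_size"],
    )

cfg = {"architecture": "ResNet18", "learning_rate": 0.001, "batch_size": 32}
print(run_name(cfg))  # resnet18-lr0.001-batch32
```

Passed as `wandb.init(name=run_name(cfg), config=cfg)`, every run is then searchable by its own settings and the name can never drift from the config.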
## Free vs. Paid: What You Actually Need

The free tier includes:
- Unlimited runs
- 100GB storage
- Public/private projects
- All visualization features
- Basic collaboration
Paid tiers add:
- More storage
- Team features
- Advanced security
- Priority support
- Custom deployment
For individuals and small teams, the free tier is completely sufficient. I've used it for years without hitting its limits.
## The Bottom Line
Weights & Biases transforms experiment tracking from “I think I used these hyperparameters” to “here’s the exact configuration that produced that result.” It’s not just logging — it’s reproducibility, comparison, and collaboration made trivial.
Use W&B when:
- Running multiple experiments
- Need to reproduce results
- Comparing different approaches
- Working in a team
- Doing hyperparameter sweeps
Skip W&B when:
- Running one-off scripts
- Learning ML basics (focus on fundamentals first)
- Offline-only requirements (though offline mode exists)
For any serious ML work, W&B should be in your stack from day one. The time saved finding old experiments, comparing runs, and reproducing results pays for itself immediately.
Installation:

```bash
pip install wandb
wandb login
```
Stop tracking experiments in your head, spreadsheets, or scattered jupyter notebooks. Start using W&B. Your experiments will be reproducible, comparable, and actually useful for learning what works. The difference between “I can’t remember what I did” and “here’s every detail of every experiment” is the difference between amateur experimentation and professional ML engineering.
Now go log some experiments. Your future self will thank you when you can actually reproduce that great result you got last month. :)