Weights & Biases is incorporated directly into the PyTorch Lightning library via the `WandbLogger`.
Integrate with Lightning
- PyTorch Logger
- Fabric Logger
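A minimal sketch of the PyTorch Logger path, assuming the lightning>=2.0 package layout (`lightning.pytorch`); with older releases the same classes are importable from `pytorch_lightning`:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger

# Create a W&B logger; "my-project" is a placeholder project name
wandb_logger = WandbLogger(project="my-project")

# Pass the logger to the Trainer so metrics logged in your
# LightningModule flow to W&B
trainer = Trainer(logger=wandb_logger)
# trainer.fit(model, train_dataloader)  # your LightningModule and data
```

The Fabric variant is sketched under "Use PyTorch Lightning’s WandbLogger" below.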
Using wandb.log(): The `WandbLogger` logs to W&B using the Trainer’s `global_step`. If you make additional calls to `wandb.log` directly in your code, do not use the `step` argument in `wandb.log()`. Instead, log the Trainer’s `global_step` like your other metrics:
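For example, a sketch assuming an active run and a `Trainer` instance named `trainer`; the metric name and value are placeholders:

```python
import wandb

# Log the Trainer's global_step as just another metric rather than
# passing it to wandb.log's `step` argument
wandb.log({"accuracy": 0.95, "trainer/global_step": trainer.global_step})
```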
Sign up and create an API key
An API key authenticates your machine to W&B. You can generate an API key from your user profile. For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.
- Click your user profile icon in the upper right corner.
- Select User Settings, then scroll to the API Keys section.
- Click Reveal. Copy the displayed API key. To hide the API key, reload the page.
Install the wandb library and log in
To install the `wandb` library locally and log in:
- Command Line
- Python
- Python notebook
- Set the `WANDB_API_KEY` environment variable to your API key.
- Install the `wandb` library and log in.
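A sketch of the Python variant, assuming the library has already been installed (for example with pip) and you already have an API key:

```python
import os
import wandb

# Option 1: provide the key via the environment variable
os.environ["WANDB_API_KEY"] = "<your-api-key>"  # placeholder

# Option 2: log in interactively; prompts for the key if it is not set
wandb.login()
```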
Use PyTorch Lightning’s WandbLogger
PyTorch Lightning has multiple `WandbLogger` classes to log metrics, model weights, media, and more.
To integrate with Lightning, instantiate the `WandbLogger` and pass it to Lightning’s `Trainer` or `Fabric`.
- PyTorch Logger
- Fabric Logger
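The Trainer path is shown in the quickstart above. For Fabric, here is a sketch that assumes the Fabric-compatible `WandbLogger` shipped in recent versions of the wandb package (`wandb.integration.lightning.fabric`); the project name and metric are placeholders:

```python
import lightning as L
from wandb.integration.lightning.fabric import WandbLogger

# Fabric accepts a list of loggers
wandb_logger = WandbLogger(project="my-project")
fabric = L.Fabric(loggers=[wandb_logger])
fabric.launch()

# Metrics go through Fabric's log/log_dict calls
fabric.log_dict({"important_metric": 0.5})
```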
Common logger arguments
Below are some of the most used parameters in `WandbLogger`. Review the PyTorch Lightning documentation for details about all logger arguments.
| Parameter | Description |
| --- | --- |
| `project` | Define what wandb Project to log to |
| `name` | Give a name to your wandb run |
| `log_model` | Log all models if `log_model="all"` or at end of training if `log_model=True` |
| `save_dir` | Path where data is saved |
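For instance, a sketch that combines these arguments; all values are placeholders:

```python
from lightning.pytorch.loggers import WandbLogger

wandb_logger = WandbLogger(
    project="my-project",     # W&B Project to log to
    name="my-run",            # display name of the run
    log_model="all",          # upload every checkpoint as an Artifact
    save_dir="./wandb_logs",  # local directory where data is saved
)
```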
Log your hyperparameters
- PyTorch Logger
- Fabric Logger
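A sketch of the PyTorch Logger variant: calling `save_hyperparameters()` in your `LightningModule`’s `__init__` records the constructor arguments and sends them to the run config. The class name and arguments are placeholders:

```python
from lightning.pytorch import LightningModule


class MyLitModule(LightningModule):
    def __init__(self, lr: float = 1e-3, batch_size: int = 32):
        super().__init__()
        # Stores lr and batch_size in self.hparams and logs them
        # to the attached logger (here, W&B) as the run config
        self.save_hyperparameters()
```

With Fabric, the logger’s `log_hyperparams` method serves the same purpose, for example `wandb_logger.log_hyperparams({"epochs": 4, "batch_size": 32})` (assuming the Fabric `WandbLogger` mentioned above).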
Log additional config parameters
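A sketch, assuming a `WandbLogger` instance named `wandb_logger`: its `experiment` attribute exposes the underlying wandb run, whose config you can extend. The keys and values are placeholders:

```python
# Add a single parameter to the run config
wandb_logger.experiment.config["key"] = "value"

# Add several parameters at once
wandb_logger.experiment.config.update({"param_1": 1, "param_2": 2})
```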
Log gradients, parameter histogram and model topology
You can pass your model object to `wandb_logger.watch()` to monitor your model’s gradients and parameters as you train. See the PyTorch Lightning `WandbLogger` documentation for details.
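For example, assuming a `WandbLogger` named `wandb_logger` and a `model` that is an `nn.Module` or `LightningModule`:

```python
# Log gradients, parameter histograms, and the model graph
wandb_logger.watch(model, log="all", log_freq=100, log_graph=True)
```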
Log metrics
- PyTorch Logger
- Fabric Logger
You can log your metrics to W&B when using the `WandbLogger` by calling `self.log('my_metric_name', metric_value)` within your `LightningModule`, such as in your `training_step` or `validation_step` methods.
The code snippet below shows how to define your `LightningModule` to log your metrics and your `LightningModule` hyperparameters. This example uses the `torchmetrics` library to calculate your metrics.
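A sketch of such a `LightningModule`, assuming a 10-class image classification task; the class name, layer sizes, and optimizer are illustrative only:

```python
import torch
from torch import nn
from torch.nn import functional as F
from torchmetrics.functional import accuracy
from lightning.pytorch import LightningModule


class MyLitModule(LightningModule):
    def __init__(self, n_classes: int = 10, hidden: int = 128, lr: float = 1e-3):
        super().__init__()
        # Log the constructor arguments as the run's hyperparameters
        self.save_hyperparameters()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        acc = accuracy(logits, y, task="multiclass", num_classes=self.hparams.n_classes)
        # self.log sends these metrics to the attached WandbLogger
        self.log("train_loss", loss)
        self.log("train_acc", acc)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        acc = accuracy(logits, y, task="multiclass", num_classes=self.hparams.n_classes)
        self.log("val_accuracy", acc)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```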
Log the min/max of a metric
Using wandb’s `define_metric` function, you can define whether you’d like your W&B summary metric to display the min, max, mean, or best value for that metric. If `define_metric` isn’t used, then the last value logged will appear in your summary metrics. See the `define_metric` reference docs and the guide for more details.
To tell W&B to keep track of the max validation accuracy in the W&B summary metric, call `wandb.define_metric` only once, at the beginning of training:
- PyTorch Logger
- Fabric Logger
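A minimal sketch: `define_metric` needs an active run, so call it once the run exists (for example right after training starts, or guarded with `if self.trainer.global_step == 0:` inside `validation_step`). The metric name matches the `val_accuracy` logged in the example above:

```python
import wandb

# Keep the maximum value ever logged for "val_accuracy"
# in the run's summary instead of the last value
wandb.define_metric("val_accuracy", summary="max")
```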
Checkpoint a model
To save model checkpoints as W&B Artifacts, use Lightning’s `ModelCheckpoint` callback and set the `log_model` argument in the `WandbLogger`:
- PyTorch Logger
- Fabric Logger
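A sketch of the PyTorch Logger variant; the monitored metric name is a placeholder:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.loggers import WandbLogger

# log_model="all" uploads every checkpoint as a W&B Artifact;
# log_model=True uploads only at the end of training
wandb_logger = WandbLogger(project="my-project", log_model="all")

# Save a checkpoint whenever the monitored metric improves
checkpoint_callback = ModelCheckpoint(monitor="val_accuracy", mode="max")

trainer = Trainer(logger=wandb_logger, callbacks=[checkpoint_callback])
```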
You can retrieve a logged checkpoint either through the logger or through the wandb API:
- Via Logger
- Via wandb
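A sketch of the wandb-API route; the artifact reference is a placeholder of the form entity/project/artifact-name:alias, which you can copy from the run’s Artifacts panel:

```python
import wandb

# The alias can be a version such as "v2", or "latest"/"best"
checkpoint_reference = "ENTITY/PROJECT/model-RUN_ID:best"

run = wandb.init(project="my-project")
artifact = run.use_artifact(checkpoint_reference, type="model")
artifact_dir = artifact.download()  # local path to the checkpoint files
```

For the logger route, recent Lightning releases expose a `download_artifact` helper on `WandbLogger` that accepts the same reference; check your installed version.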
Once you have downloaded a checkpoint, you can load it back into your `LightningModule`:
- PyTorch Logger
- Fabric Logger
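A sketch, assuming the `artifact_dir` path from the previous step and a module class named `MyLitModule`; the checkpoint file name inside the artifact may differ:

```python
from pathlib import Path

# Restore the weights and hyperparameters stored in the checkpoint
model = MyLitModule.load_from_checkpoint(Path(artifact_dir) / "model.ckpt")
```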
Log images, text, and more
The `WandbLogger` has `log_image`, `log_text`, and `log_table` methods for logging media.
You can also directly call `wandb.log` or `trainer.logger.experiment.log` to log other media types such as Audio, Molecules, Point Clouds, 3D Objects, and more.
- Log Images
- Log Text
- Log Tables
Using `WandbLogger.log_image`, in this example we log a sample of our validation images and predictions:
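A sketch, assuming a `WandbLogger` named `wandb_logger` plus placeholder `val_images` (tensors, NumPy arrays, or PIL Images) and `preds`:

```python
# Log a sample of validation images with predictions as captions
wandb_logger.log_image(
    key="validation_samples",
    images=list(val_images[:8]),
    caption=[f"pred: {p}" for p in preds[:8]],
)
```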
Use multiple GPUs with Lightning and W&B
PyTorch Lightning has multi-GPU support through its DDP interface. However, PyTorch Lightning’s design requires you to be careful about how you instantiate your GPUs. Lightning assumes that each GPU (or rank) in your training loop must be instantiated in exactly the same way, with the same initial conditions. However, only the rank 0 process gets access to the `wandb.run` object; for non-zero rank processes, `wandb.run` is `None`. This could cause your non-zero rank processes to fail, and such a situation can put you in a deadlock: the rank 0 process waits for the non-zero rank processes to join, but they have already crashed.
For this reason, be careful about how you set up your training code. The recommended way is to have your code be independent of the `wandb.run` object.
Examples
You can follow along in a video tutorial with a Colab notebook.
Frequently Asked Questions
How does W&B integrate with Lightning?
The core integration is based on the Lightning `loggers` API, which lets you write much of your logging code in a framework-agnostic way. `Logger`s are passed to the Lightning `Trainer` and are triggered based on that API’s rich hook-and-callback system. This keeps your research code well-separated from engineering and logging code.
What does the integration log without any additional code?
We’ll save your model checkpoints to W&B, where you can view them or download them for use in future runs. We’ll also capture system metrics, like GPU usage and network I/O; environment information, like hardware and OS information; code state, including the git commit and diff patch, notebook contents, and session history; and anything printed to standard out.
What if I need to use `wandb.run` in my training setup?
In that case, you need to expand the scope of that variable yourself; in other words, make sure that the initial conditions are the same on all processes.
For example, you could use `os.environ["WANDB_DIR"]` to set up the model checkpoints directory. This way, any non-zero rank process can use the same directory without needing access to `wandb.run`.
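A sketch of that pattern, with placeholder paths; every rank reads the same environment variable, so the checkpoint directory does not depend on `wandb.run` (which is `None` on non-zero ranks):

```python
import os
from lightning.pytorch.callbacks import ModelCheckpoint

checkpoint_dir = os.environ.get("WANDB_DIR", "./checkpoints")
checkpoint_callback = ModelCheckpoint(dirpath=checkpoint_dir)
```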