Original training script
Suppose you have a Python script that trains a model (see below). Your goal is to find the hyperparameters that maximize the validation accuracy (`val_acc`).
In your Python script, you define two functions: `train_one_epoch` and `evaluate_one_epoch`. The `train_one_epoch` function simulates training for one epoch and returns the training accuracy and loss. The `evaluate_one_epoch` function simulates evaluating the model on the validation data set and returns the validation accuracy and loss.
You define a configuration dictionary (`config`) that contains hyperparameter values such as the learning rate (`lr`), batch size (`batch_size`), and number of epochs (`epochs`). The values in the configuration dictionary control the training process.
Next, you define a function called `main` that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets.
This code is a mock training script. It does not train a model, but simulates the training process by generating random accuracy and loss values. The purpose of this code is to demonstrate how to integrate W&B into your training script.
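Below is a minimal sketch of such a mock training script, consistent with the description above. The exact formulas used to generate the random accuracy and loss values are illustrative assumptions:

```python
import random


def train_one_epoch(epoch, lr, batch_size):
    # Simulate one training epoch: no real model is trained; the
    # accuracy and loss are random values that drift with the epoch.
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    # Simulate evaluating the model on the validation data set.
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


# Hyperparameter values that control the training process.
config = {"lr": 0.0001, "batch_size": 16, "epochs": 5}


def main():
    lr = config["lr"]
    batch_size = config["batch_size"]
    epochs = config["epochs"]

    for epoch in range(1, epochs + 1):
        train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
        val_acc, val_loss = evaluate_one_epoch(epoch)
        print(f"epoch: {epoch}")
        print(f"train_acc: {train_acc:.3f}, train_loss: {train_loss:.3f}")
        print(f"val_acc: {val_acc:.3f}, val_loss: {val_loss:.3f}")


if __name__ == "__main__":
    main()
```

Running the script prints the simulated metrics for each epoch; `val_acc` is the value a sweep will later optimize.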
Training script with W&B Python SDK
How you integrate W&B into your Python script or notebook depends on how you manage sweeps: you can start a sweep job from within a Python script or notebook, or from the command line (CLI). The following steps cover the Python script or notebook workflow.
Add the following to your Python script:
1. Create a dictionary object where the key-value pairs define a sweep configuration. The sweep configuration defines the hyperparameters you want W&B to explore on your behalf, along with the metric you want to optimize. Continuing from the previous example, the batch size (`batch_size`), epochs (`epochs`), and learning rate (`lr`) are the hyperparameters to vary during each sweep. Because you want to maximize the validation accuracy, set `"goal": "maximize"` and the name of the variable you want to optimize, in this case `val_acc` (`"name": "val_acc"`).
2. Pass the sweep configuration dictionary to `wandb.sweep()`. This initializes the sweep and returns a sweep ID (`sweep_id`). For more information, see Initialize sweeps.
3. At the top of your script, import the W&B Python SDK (`wandb`).
4. Within your `main` function, use the `wandb.init()` API to generate a background process that syncs and logs data as a W&B Run. Pass the project name as a parameter to the `wandb.init()` method. If you do not pass a project name, W&B uses the default project name.
5. Fetch the hyperparameter values from the `wandb.Run.config` object. This lets you use the hyperparameter values defined in the sweep configuration dictionary instead of hard-coded values.
6. Log the metric you are optimizing for to W&B using `wandb.Run.log()`. You must log the metric defined in your configuration. For example, if you define the metric to optimize as `val_acc`, you must log `val_acc`. If you do not log the metric, W&B does not know what to optimize for. Within the configuration dictionary (`sweep_configuration` in this example), you define the sweep to maximize the `val_acc` value.
7. Start the sweep with `wandb.agent()`. Provide the sweep ID and the name of the function the sweep executes (`function=main`), and limit the sweep to a maximum of four runs (`count=4`). The sketch after this list puts all of these steps together.
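A sketch of the instrumented script might look like the following. The project name (`my-first-sweep`), the sweep search method, and the parameter ranges are illustrative assumptions; `wandb.sweep()`, `wandb.init()`, `wandb.Run.config`, `wandb.Run.log()`, and `wandb.agent()` are the SDK APIs named in the steps above:

```python
import random

import wandb

# Step 1: Define the sweep configuration: maximize val_acc while
# varying the batch size, number of epochs, and learning rate.
sweep_configuration = {
    "method": "random",
    "name": "sweep",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "lr": {"max": 0.1, "min": 0.0001},
    },
}


def train_one_epoch(epoch, lr, batch_size):
    # Simulated training, as in the original script.
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    # Simulated validation, as in the original script.
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main():
    # Step 4: Start a W&B Run ("my-first-sweep" is an example name).
    run = wandb.init(project="my-first-sweep")

    # Step 5: Fetch hyperparameter values from run.config instead of
    # hard-coding them, so each sweep run gets its own values.
    lr = run.config["lr"]
    batch_size = run.config["batch_size"]
    epochs = run.config["epochs"]

    for epoch in range(1, epochs + 1):
        train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
        val_acc, val_loss = evaluate_one_epoch(epoch)

        # Step 6: Log "val_acc" at the top level, matching the metric
        # name in the sweep configuration.
        run.log(
            {
                "epoch": epoch,
                "train_acc": train_acc,
                "train_loss": train_loss,
                "val_acc": val_acc,
                "val_loss": val_loss,
            }
        )


# Step 2: Initialize the sweep and get a sweep ID.
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

# Step 7: Run the sweep agent for at most four runs.
wandb.agent(sweep_id, function=main, count=4)
```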
Logging metrics to W&B in a sweep
You must log the metric you define and are optimizing for in both your sweep configuration and with `wandb.Run.log()`. For example, if you define the metric to optimize as `val_acc` within your sweep configuration, you must also log `val_acc` to W&B. If you do not log the metric, W&B does not know what to optimize for.
The following is an incorrect example of logging the metric to W&B. The metric that is optimized for in the sweep configuration is `val_acc`, but the code logs `val_acc` within a nested dictionary under the key `validation`. You must log the metric directly, not within a nested dictionary.
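A sketch of that mistake, assuming `run` is the W&B Run returned by `wandb.init()` as in the script above:

```python
# INCORRECT: the sweep is configured to maximize "val_acc", but here
# "val_acc" is nested under "validation", so the sweep cannot see it.
run.log({"validation": {"val_acc": val_acc, "val_loss": val_loss}})

# CORRECT: log "val_acc" as a top-level key so the sweep can read it.
run.log({"val_acc": val_acc, "val_loss": val_loss})
```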