This feature requires python>=3.8
Import data from MLFlow
W&B supports importing data from MLFlow, including experiments, runs, artifacts, metrics, and other metadata. Install the dependencies, then construct the importer as sketched below.

importer.collect_runs() collects all runs from the MLFlow server. If you prefer to upload a specific subset, you can construct your own iterable of runs and pass it to the importer.
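A minimal sketch of the import flow, assuming the importer is exposed as MlflowImporter under wandb.apis.importers.mlflow with collect_runs() and import_runs() methods (check the importer API reference for the exact names):

```python
# Assumed install command: pip install "wandb[importers]" mlflow
from wandb.apis.importers.mlflow import MlflowImporter  # import path is an assumption

importer = MlflowImporter(mlflow_tracking_uri="http://localhost:5000")

runs = importer.collect_runs()  # or build your own iterable of runs
importer.import_runs(runs)
```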
You might need to configure the Databricks CLI first if you import from Databricks MLFlow. Set mlflow-tracking-uri="databricks" in the previous step.

To skip importing artifacts, pass artifacts=False. To remap imported runs to a specific W&B entity and project, pass a Namespace, as sketched below.
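Continuing the sketch above, skipping artifacts and remapping the destination might look like this; the artifacts and namespace keyword arguments and the Namespace import path are assumptions to verify against the importer API reference:

```python
from wandb.apis.importers import Namespace  # import path is an assumption

# skip importing artifacts
importer.import_runs(runs, artifacts=False)

# send everything to a specific W&B entity and project
importer.import_runs(runs, namespace=Namespace("<entity>", "<project>"))
```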
Export Data
Use the Public API to export or update data that you have saved to W&B. Before using this API, log data from your script. Check the Quickstart for more details.

Use Cases for the Public API
- Export Data: Pull down a dataframe for custom analysis in a Jupyter Notebook. Once you have explored the data, you can sync your findings by creating a new analysis run and logging results, for example: wandb.init(job_type="analysis")
- Update Existing Runs: You can update the data logged in association with a W&B run. For example, you might want to update the config of a set of runs to include additional information, like the architecture or a hyperparameter that wasn’t originally logged.
Create an API key
An API key authenticates your machine to W&B. You can generate an API key from your user profile. For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.
- Click your user profile icon in the upper right corner.
- Select User Settings, then scroll to the API Keys section.
- Click Reveal. Copy the displayed API key. To hide the API key, reload the page.
Find the run path
To use the Public API, you’ll often need the run path, which is <entity>/<project>/<run_id>. In the app UI, open a run page and click the Overview tab to get the run path.
Export Run Data
Download data from a finished or active run. Common usage includes downloading a dataframe for custom analysis in a Jupyter notebook, or using custom logic in an automated environment.

| Attribute | Meaning |
|---|---|
| run.config | A dictionary of the run’s configuration information, such as the hyperparameters for a training run or the preprocessing methods for a run that creates a dataset Artifact. Think of these as the run’s inputs. |
| run.history() | A list of dictionaries meant to store values that change while the model is training, such as loss. The command run.log() appends to this object. |
| run.summary | A dictionary of information that summarizes the run’s results. This can be scalars like accuracy and loss, or large files. By default, run.log() sets the summary to the final value of a logged time series. The contents of the summary can also be set directly. Think of the summary as the run’s outputs. |
The Api object keeps a local cache of runs. If a run’s data may have changed while your script is executing, call api.flush() to get updated values.
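For example, a minimal sketch of pulling these attributes for a single run with the Public API (the run path is a placeholder):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

print(run.config)        # inputs: hyperparameters, preprocessing settings, ...
print(run.summary)       # outputs: final or manually set summary values
history = run.history()  # sampled metric history (a pandas DataFrame if pandas is installed)
print(history.head())

api.flush()  # clear the local cache if the run may have changed since Api() was created
```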
Understanding different run attributes
The following code snippet shows how to create a run, log some data, and then access the run’s attributes such as run.config and run.summary:
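A minimal sketch of that snippet; the project name and logged values are placeholders:

```python
import wandb

with wandb.init(project="<project>", config={"lr": 0.01, "epochs": 5}) as run:
    for epoch in range(run.config["epochs"]):
        run.log({"loss": 1.0 / (epoch + 1)})  # appends to the run's history

    print(run.config)           # {'lr': 0.01, 'epochs': 5} -- the run's inputs
    print(run.summary["loss"])  # final value of the logged "loss" series
```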
Sampling
The default history method samples the metrics to a fixed number of samples (the default is 500; you can change this with the samples argument). If you want to export all of the data on a large run, you can use the run.scan_history() method. For more details see the API Reference.
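For example, a sketch contrasting the two (the run path is a placeholder):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

sampled = run.history(samples=1000)  # at most 1000 sampled rows
full = list(run.scan_history())      # every logged row, unsampled
print(len(sampled), len(full))
```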
Querying Multiple Runs
The examples below show two styles of querying: building DataFrames and CSVs, and MongoDB-style filters.
This example script finds a project and outputs a CSV of runs with name, configs, and summary stats. Replace <entity> and <project> with your W&B entity and the name of your project, respectively.

api.runs returns a Runs object that is iterable and acts like a list. By default the object loads 50 runs at a time in sequence as required, but you can change the number loaded per page with the per_page keyword argument.
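A sketch of that script (the project path is a placeholder):

```python
import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("<entity>/<project>")

summary_list, config_list, name_list = [], [], []
for run in runs:
    # run.summary._json_dict holds the summary values as a plain dict
    summary_list.append(run.summary._json_dict)
    # drop internal keys that start with an underscore
    config_list.append({k: v for k, v in run.config.items() if not k.startswith("_")})
    name_list.append(run.name)

runs_df = pd.DataFrame(
    {"summary": summary_list, "config": config_list, "name": name_list}
)
runs_df.to_csv("project.csv")
```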
api.runs also accepts an order keyword argument. The default order is -created_at. To order results ascending, specify +created_at. You can also sort by config or summary values, for example summary.val_acc or config.experiment_name.
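For example, a sketch using order and per_page; the summary field name follows the text above and assumes you logged a metric called val_acc:

```python
import wandb

api = wandb.Api()

# oldest runs first, loading 100 runs per request
runs = api.runs("<entity>/<project>", order="+created_at", per_page=100)
for run in runs:
    print(run.name, run.created_at)

# highest validation accuracy first
best_first = api.runs("<entity>/<project>", order="-summary.val_acc")
```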
Error Handling
If errors occur while talking to W&B servers, a wandb.CommError will be raised. The original exception can be introspected via the exc attribute.
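For example, a minimal sketch of catching the error:

```python
import wandb

api = wandb.Api()
try:
    run = api.run("<entity>/<project>/<run_id>")
    print(run.summary)
except wandb.CommError as err:
    # the original exception is available on the exc attribute
    print("Error talking to the W&B servers:", err.exc)
```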
Get the latest git commit through the API
In the UI, click on a run and then click the Overview tab on the run page to see the latest git commit. It’s also in the file wandb-metadata.json. Using the public API, you can get the git hash with run.commit.
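For example:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
print(run.commit)  # latest git commit hash recorded for this run
```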
Get a run’s name and ID during a run
After calling wandb.init(), you can access the random run ID or the human-readable run name from your script like this:
- Unique run ID (8 character hash): run.id
- Random run name (human readable): run.name
- Run ID: leave it as the generated hash. This needs to be unique across runs in your project.
- Run name: This should be something short, readable, and preferably unique so that you can tell the difference between different lines on your charts.
- Run notes: This is a great place to put a quick description of what you’re doing in your run. You can set this with wandb.init(notes="your notes here")
- Run tags: Track things dynamically in run tags, and use filters in the UI to filter your table down to just the runs you care about. You can set tags from your script and then edit them in the UI, both in the runs table and the overview tab of the run page. See the detailed instructions here.
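Putting these together, a minimal sketch; the project, notes, and tags are placeholders:

```python
import wandb

run = wandb.init(
    project="<project>",
    notes="baseline run with default hyperparameters",  # quick description of the run
    tags=["baseline", "paper1"],                        # filterable later in the UI
)
print(run.id)    # unique 8-character run ID
print(run.name)  # human-readable run name
run.finish()
```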
Public API Examples
Export data to visualize in matplotlib or seaborn
Check out our API examples for some common export patterns. You can also click the download button on a custom plot or on the expanded runs table to download a CSV from your browser.
Read metrics from a run
This example outputs timestamp and accuracy saved with run.log({"accuracy": acc}) for a run saved to "<entity>/<project>/<run_id>".
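A sketch of that example; it assumes the run logged a metric named accuracy:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

if run.state == "finished":
    for _, row in run.history().iterrows():
        print(row["_timestamp"], row["accuracy"])
```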
Filter runs
You can filter runs by using the MongoDB Query Language.
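For example, a sketch filtering on a config value; the filter keys below (config.experiment_name, state) are assumptions about what you logged and about the filterable field names:

```python
import wandb

api = wandb.Api()
runs = api.runs(
    "<entity>/<project>",
    filters={"config.experiment_name": "baseline", "state": "finished"},
)
print(f"Found {len(runs)} matching runs")
```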
Read specific metrics from a run
To pull specific metrics from a run, use the keys argument. The default number of samples when using run.history() is 500. Logged steps that do not include a specific metric will appear in the output dataframe as NaN. The keys argument will cause the API to sample steps that include the listed metric keys more frequently.
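For example (the metric names are placeholders):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

# only sample steps that logged these metrics
history = run.history(keys=["accuracy", "loss"], samples=500)
print(history.head())
```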
Compare two runs
This will output the config parameters that are different between run1 and run2.
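A sketch using pandas to diff the two configs (the run paths are placeholders):

```python
import pandas as pd
import wandb

api = wandb.Api()
run1 = api.run("<entity>/<project>/<run_id_1>")
run2 = api.run("<entity>/<project>/<run_id_2>")

df = pd.DataFrame([run1.config, run2.config]).transpose()
df.columns = [run1.name, run2.name]
# keep only the config keys where the two runs differ
print(df[df[run1.name] != df[run2.name]])
```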
Update metrics for a run, after the run has finished
This example sets the accuracy of a previous run to 0.9. It also modifies the accuracy histogram of a previous run to be the histogram of numpy_array.
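A sketch of that update; numpy_array stands in for your own array:

```python
import numpy as np
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

numpy_array = np.random.random(100)  # placeholder for your own data
run.summary["accuracy"] = 0.9
run.summary["accuracy_histogram"] = wandb.Histogram(numpy_array)
run.summary.update()  # persist the changes to W&B
```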
Rename a metric in a completed run
This example renames a summary column in your tables. Renaming a column only applies to tables; charts will still refer to metrics by their original names.
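A sketch; old_name and new_name are placeholders for your summary keys:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

run.summary["new_name"] = run.summary["old_name"]
del run.summary["old_name"]
run.summary.update()
```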
Update config for an existing run
This example updates one of your configuration settings.
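For example (the key and value are placeholders):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

run.config["architecture"] = "resnet50"  # placeholder key and value
run.update()  # persist the change to W&B
```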
Export system resource consumptions to a CSV file
The snippet below finds the system resource consumption metrics for a run and saves them to a CSV.
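A sketch, assuming the history stream named "system" returns the machine metrics; check the history() reference for the exact stream name:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

# stream="system" is assumed to select machine metrics (CPU, GPU, memory, ...)
system_metrics = run.history(stream="system")
system_metrics.to_csv("system_metrics.csv")
```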
Get unsampled metric data
When you pull data from history, by default it’s sampled to 500 points. Get all the logged data points using run.scan_history(). Here’s an example downloading all the loss data points logged in history.
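A sketch of that example:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

history = run.scan_history(keys=["loss"])
losses = [row["loss"] for row in history]
print(f"Downloaded {len(losses)} loss values")
```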
Get paginated data from history
If metrics are being fetched slowly on our backend or API requests are timing out, you can try lowering the page size in scan_history so that individual requests don’t time out. The default page size is 500, so you can experiment with different sizes to see what works best:
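For example:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

# smaller pages mean more, but quicker, requests
history = run.scan_history(keys=["loss"], page_size=100)
losses = [row["loss"] for row in history]
```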
Export metrics from all runs in a project to a CSV file
This script pulls down the runs in a project and produces a dataframe and a CSV of runs including their names, configs, and summary stats. Replace <entity> and <project> with your W&B entity and the name of your project, respectively.
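This follows the same pattern as the Querying Multiple Runs sketch above; a compact version:

```python
import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("<entity>/<project>")

rows = [
    {
        "name": run.name,
        "config": {k: v for k, v in run.config.items() if not k.startswith("_")},
        "summary": run.summary._json_dict,
    }
    for run in runs
]
pd.DataFrame(rows).to_csv("project_runs.csv")
```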
Get the starting time for a run
This code snippet retrieves the time at which the run was created.
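For example, via the created_at attribute:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
print(run.created_at)  # timestamp at which the run was created
```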
Upload files to a finished run
The code snippet below uploads a selected file to a finished run.
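A sketch; analysis.ipynb is a placeholder for your local file:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
run.upload_file("analysis.ipynb")  # placeholder local file to attach to the run
```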
Download a file from a run
This finds the file “model-best.h5” associated with run ID uxte44z7 in the cifar project and saves it locally.
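For example (only the entity is a placeholder; the project and run ID follow the text above):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/cifar/uxte44z7")
run.file("model-best.h5").download()
```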
Download all files from a run
This finds all files associated with a run and saves them locally.
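For example:

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")
for file in run.files():
    file.download()  # saves each file to the current directory
```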
Get runs from a specific sweep
This snippet downloads all the runs associated with a particular sweep.
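For example (the sweep path is a placeholder of the form <entity>/<project>/<sweep_id>):

```python
import wandb

api = wandb.Api()
sweep = api.sweep("<entity>/<project>/<sweep_id>")
print(f"Sweep has {len(sweep.runs)} runs")
for run in sweep.runs:
    print(run.name, run.state)
```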
Get the best run from a sweep
The following snippet gets the best run from a given sweep. best_run is the run with the best metric as defined by the metric parameter in the sweep config.
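For example:

```python
import wandb

api = wandb.Api()
sweep = api.sweep("<entity>/<project>/<sweep_id>")

best_run = sweep.best_run()  # best according to the metric in the sweep config
print(best_run.name, best_run.summary)
```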
Download the best model file from a sweep
This snippet downloads the model file with the highest validation accuracy from a sweep with runs that saved model files to model.h5.
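A sketch, assuming each run logged a summary metric named val_acc and saved its model to model.h5:

```python
import wandb

api = wandb.Api()
sweep = api.sweep("<entity>/<project>/<sweep_id>")

# pick the run with the highest validation accuracy; "val_acc" is an assumed summary key
runs = sorted(sweep.runs, key=lambda run: run.summary.get("val_acc", 0), reverse=True)
best_run = runs[0]

best_run.file("model.h5").download(replace=True)
print("Best model saved to model.h5")
```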