- Create and use a W&B Artifact.
- Use and create Registered Models in W&B Registry.
- Run training jobs on dedicated compute using W&B Launch.
- Use the wandb client in ops and assets.
The integration provides two main components:
- `wandb_resource`: a Dagster resource used to authenticate and communicate with the W&B API.
- `wandb_artifacts_io_manager`: a Dagster IO Manager used to consume W&B Artifacts.
Before you get started
You will need the following resources to use Dagster within W&B:
- W&B API key.
- W&B entity (user or team): An entity is a username or team name where you send W&B Runs and Artifacts. Make sure to create your account or team entity in the W&B App UI before you log runs. If you do not specify an entity, the run will be sent to your default entity, which is usually your username. Change your default entity in your settings under Project Defaults.
- W&B project: The name of the project where W&B Runs are stored.
How to get an API key
- Log in to W&B. Note: if you are using W&B Server, ask your admin for the instance host name.
- Collect your API key by navigating to the authorize page or in your user/team settings. For a production environment, we recommend using a service account to own that key.
- Set an environment variable for that API key: `export WANDB_API_KEY=YOUR_KEY`.
Configure the integration by passing a `wandb_config` nested dictionary. You can pass different `wandb_config` values to different ops/assets if you want to use a different W&B Project. For more information about possible keys you can pass, see the Configuration section below.
- Config for `@job`
- Config for `@repository` using assets

Example: configuration for `@job`
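A minimal sketch of a `@job` configuration, based on the resources described in the Configuration section below; the entity, project, and placeholder op are illustrative:

```python
from dagster import job, make_values_resource, op
from dagster_wandb import wandb_artifacts_io_manager, wandb_resource

@op
def example_op():
    # Placeholder op so the job has at least one step.
    pass

@job(
    resource_defs={
        # The wandb_config nested dictionary described above.
        "wandb_config": make_values_resource(
            entity=str,
            project=str,
        ),
        # Authenticates against the W&B API with the WANDB_API_KEY environment variable.
        "wandb_resource": wandb_resource.configured(
            {"api_key": {"env": "WANDB_API_KEY"}}
        ),
        # IO Manager used to write and read W&B Artifacts.
        "io_manager": wandb_artifacts_io_manager,
    },
    config={
        "resources": {
            "wandb_config": {
                "config": {
                    "entity": "my_entity",    # placeholder: your W&B entity
                    "project": "my_project",  # placeholder: your W&B project
                }
            }
        }
    },
)
def example_job():
    example_op()
```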
Configuration
The following configuration options are used as settings on the W&B-specific Dagster resource and IO Manager provided by the integration.

`wandb_resource`: Dagster resource used to communicate with the W&B API. It automatically authenticates using the provided API key. Properties:
- `api_key`: (str, required) A W&B API key necessary to communicate with the W&B API.
- `host`: (str, optional) The API host server you wish to use. Only required if you are using W&B Server. It defaults to the Public Cloud host, `https://api.wandb.ai`.
`wandb_artifacts_io_manager`: Dagster IO Manager to consume W&B Artifacts. Properties:
- `base_dir`: (str, optional) Base directory used for local storage and caching. W&B Artifacts and W&B Run logs will be written to and read from that directory. By default, it uses the `DAGSTER_HOME` directory.
- `cache_duration_in_minutes`: (int, optional) Defines how long W&B Artifacts and W&B Run logs should be kept in local storage. Only files and directories that were not opened for that amount of time are removed from the cache. Cache purging happens at the end of an IO Manager execution. You can set it to 0 if you want to turn off caching completely. Caching improves speed when an Artifact is reused between jobs running on the same machine. It defaults to 30 days.
- `run_id`: (str, optional) A unique ID for this run, used for resuming. It must be unique in the project, and if you delete a run you can't reuse the ID. Use the `name` field for a short descriptive name, or `config` for saving hyperparameters to compare across runs. The ID cannot contain the following special characters: `/\#?%:`. You need to set the Run ID when you are doing experiment tracking inside Dagster to allow the IO Manager to resume the run. By default it is set to the Dagster Run ID, e.g. `7e4df022-1bf2-44b5-a383-bb852df4077e`.
- `run_name`: (str, optional) A short display name for this run to help you identify it in the UI. By default, it is a string with the following format: `dagster-run-[first 8 characters of the Dagster Run ID]`. For example, `dagster-run-7e4df022`.
- `run_tags`: (list[str], optional) A list of strings which will populate the list of tags on this run in the UI. Tags are useful for organizing runs together or applying temporary labels like `baseline` or `production`. It's easy to add and remove tags in the UI, or filter down to just runs with a specific tag. Any W&B Run used by the integration will have the `dagster_wandb` tag.
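As a minimal sketch, assuming you want to override the defaults above, the IO Manager can be configured with Dagster's standard `configured` API; the directory and duration values are illustrative:

```python
from dagster_wandb import wandb_artifacts_io_manager

# Keep cached Artifacts for one hour and store them under a custom directory.
customized_io_manager = wandb_artifacts_io_manager.configured(
    {
        "base_dir": "/tmp/dagster_wandb_cache",
        "cache_duration_in_minutes": 60,
    }
)
```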
Use W&B Artifacts
The integration with W&B Artifacts relies on a Dagster IO Manager. IO Managers are user-provided objects that are responsible for storing the output of an asset or op and loading it as input to downstream assets or ops. For example, an IO Manager might store and load objects from files on a filesystem. The integration provides an IO Manager for W&B Artifacts. This allows any Dagster `@op` or `@asset` to create and consume W&B Artifacts natively. Here's a simple example of an `@asset` producing a W&B Artifact of type dataset containing a Python list.
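A minimal sketch of that example; the asset name and list contents are illustrative, and the wandb IO Manager is assumed to be configured as shown earlier:

```python
from dagster import asset

@asset(
    metadata={
        "wandb_artifact_configuration": {
            "type": "dataset",
        }
    },
)
def my_dataset_artifact():
    # The returned list is pickled and stored in a W&B Artifact of type "dataset".
    return [1, 2, 3]
```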
You can annotate your `@op`, `@asset`, and `@multi_asset` with a metadata configuration in order to write Artifacts. Similarly, you can also consume W&B Artifacts even if they were created outside Dagster.
Write W&B Artifacts
Before continuing, we recommend you have a good understanding of how to use W&B Artifacts. Consider reading the Guide on Artifacts.

Return an object from a Python function to write a W&B Artifact. The following objects are supported by W&B:
- Python objects (int, dict, list…)
- W&B objects (Table, Image, Graph…)
- W&B Artifact objects

The following examples demonstrate writing each supported object type from an `@op` or `@asset`:
- Python objects
- W&B Object
- W&B Artifact
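For instance, a minimal sketch of the W&B object case, returning a `wandb.Table` from an asset (the table contents are illustrative):

```python
import wandb
from dagster import asset

@asset(
    metadata={
        "wandb_artifact_configuration": {
            "type": "dataset",
        }
    },
)
def my_table_artifact():
    # W&B objects such as Table, Image, or Graph are added to the Artifact as-is.
    return wandb.Table(columns=["a", "b"], data=[[1, 2], [3, 4]])
```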
Anything that can be serialized with the pickle module is pickled and added to an Artifact created by the integration. The content is unpickled when you read that Artifact inside Dagster (see Read artifacts for more details). W&B supports multiple Pickle-based serialization modules (pickle, dill, cloudpickle, joblib). You can also use more advanced serialization like ONNX or PMML. Please refer to the Serialization section for more information.
Configuration
A configuration dictionary called `wandb_artifact_configuration` can be set on an `@op`, `@asset`, or `@multi_asset`. This dictionary must be passed in the decorator arguments as metadata. This configuration is required to control how the IO Manager reads and writes W&B Artifacts.

For `@op`, it's located in the output metadata through the `Out` metadata argument.
For `@asset`, it's located in the `metadata` argument on the asset.
For `@multi_asset`, it's located in each output metadata through the `AssetOut` metadata arguments.
The following code examples demonstrate how to configure the dictionary on `@op`, `@asset`, and `@multi_asset` computations:
- Example for `@op`
- Example for `@asset`
- Example for `@multi_asset`
Example for `@op`:
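A minimal sketch for an `@op`, assuming the job is wired with the resources shown earlier; the artifact name and alias are placeholders:

```python
from dagster import Out, op

@op(
    out=Out(
        metadata={
            "wandb_artifact_configuration": {
                "name": "my_op_artifact",   # required for @op
                "type": "dataset",
                "description": "Example artifact written from an @op.",
                "aliases": ["my_alias"],    # "latest" is added automatically
            }
        }
    )
)
def write_artifact():
    return [1, 2, 3]
```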
Supported properties:
- `name`: (str) A human-readable name for this artifact, which is how you can identify this artifact in the UI or reference it in `use_artifact` calls. Names can contain letters, numbers, underscores, hyphens, and dots. The name must be unique across a project. Required for `@op`.
- `type`: (str) The type of the artifact, which is used to organize and differentiate artifacts. Common types include `dataset` or `model`, but you can use any string containing letters, numbers, underscores, hyphens, and dots. Required when the output is not already an Artifact.
- `description`: (str) Free text that offers a description of the artifact. The description is rendered as Markdown in the UI, so this is a good place to put tables, links, etc.
- `aliases`: (list[str]) An array containing one or more aliases you want to apply to the Artifact. The integration will also add the "latest" alias to that list whether it's set or not. This is an effective way for you to manage versioning of models and datasets.
- `add_dirs`: (list[dict[str, Any]]) An array containing a configuration for each local directory to include in the Artifact.
- `add_files`: (list[dict[str, Any]]) An array containing a configuration for each local file to include in the Artifact.
- `add_references`: (list[dict[str, Any]]) An array containing a configuration for each external reference to include in the Artifact.
- `serialization_module`: (dict) Configuration of the serialization module to use. Refer to the Serialization section for more information.
  - `name`: (str) Name of the serialization module. Accepted values: `pickle`, `dill`, `cloudpickle`, `joblib`. The module needs to be available locally.
  - `parameters`: (dict[str, Any]) Optional arguments passed to the serialization function. It accepts the same parameters as the `dump` method of that module. For example, `{"compress": 3, "protocol": 4}`.
The integration automatically enriches the Artifact with useful metadata:
- W&B side: the source integration name and version, the Python version used, the pickle protocol version, and more.
- Dagster side:
  - Dagster Run ID
  - W&B Run: ID, name, path, URL
  - W&B Artifact: ID, name, type, version, size, URL
  - W&B Entity
  - W&B Project




If you use a static type checker like mypy, import the configuration type definition object using:
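A plausible import, assuming the configuration type is exported from the top-level `dagster_wandb` package; verify the name against your installed version:

```python
# Assumed export name; check your installed dagster_wandb version.
from dagster_wandb import WandbArtifactConfiguration
```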
Using partitions
The integration natively supports Dagster partitions. The following is an example of a partitioned asset using `DailyPartitionsDefinition`.
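A minimal sketch of a daily-partitioned asset; the start date and returned content are illustrative:

```python
from dagster import DailyPartitionsDefinition, asset

@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),
    metadata={
        "wandb_artifact_configuration": {
            "type": "dataset",
        }
    },
)
def my_daily_partitioned_asset(context):
    partition_key = context.asset_partition_key_for_output()
    context.log.info(f"Creating partitioned asset for {partition_key}")
    # Any pickle-able value works; here we simply return the partition key.
    return partition_key
```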
This produces one W&B Artifact per partition, named after the asset with the partition key appended: for example, `my_daily_partitioned_asset.2023-01-01`, `my_daily_partitioned_asset.2023-01-02`, or `my_daily_partitioned_asset.2023-01-03`. Assets that are partitioned across multiple dimensions show each dimension in dot-delimited format, for example `my_asset.car.blue`.
The integration does not allow for the materialization of multiple partitions within one run. You will need to carry out multiple runs to materialize your assets. This can be executed in Dagit when you’re materializing your assets.

Advanced usage
Read W&B Artifacts
Reading W&B Artifacts is similar to writing them. A configuration dictionary called `wandb_artifact_configuration` can be set on an `@op` or `@asset`. The only difference is that the configuration must be set on the input instead of the output.

For `@op`, it's located in the input metadata through the `In` metadata argument. You need to explicitly pass the name of the Artifact.
For `@asset`, it's located in the input metadata through the `AssetIn` metadata argument. You should not pass an Artifact name because the name of the parent asset should match it.
If you want to have a dependency on an Artifact created outside the integration, use `SourceAsset`. It will always read the latest version of that asset.
The following examples demonstrate how to read an Artifact from various ops.
- From an `@op`
- Created by another `@asset`
- Artifact created outside Dagster

Reading an artifact from an `@op`:
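A minimal sketch of reading an Artifact from an `@op`; the Artifact name here is a placeholder and must be passed explicitly:

```python
from dagster import In, op

@op(
    ins={
        "artifact": In(
            metadata={
                "wandb_artifact_configuration": {
                    "name": "my_op_artifact",  # name of the Artifact to read
                }
            }
        )
    }
)
def read_artifact(context, artifact):
    # The IO Manager downloads the Artifact and unpickles its content.
    context.log.info(artifact)
```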
Configuration
The following configuration is used to indicate what the IO Manager should collect and provide as inputs to the decorated functions. The following read patterns are supported (see the sketch after this list):
- To get a named object contained within an Artifact, use `get`.
- To get the local path of a downloaded file contained within an Artifact, use `get_path`.
- To get the entire Artifact object (with the content downloaded locally), pass neither.

Supported keys:
- `get`: (str) Gets the W&B object located at the artifact relative name.
- `get_path`: (str) Gets the path to the file located at the artifact relative name.
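A minimal sketch combining these read patterns on two `@op` inputs; the Artifact, object, and file names are placeholders:

```python
from dagster import In, op

@op(
    ins={
        # Collect the W&B object (e.g. a Table) stored under "my_table" in the Artifact.
        "table": In(
            metadata={
                "wandb_artifact_configuration": {
                    "name": "my_op_artifact",
                    "get": "my_table",
                }
            }
        ),
        # Collect the local path of the file "model.onnx" contained in the Artifact.
        "model_path": In(
            metadata={
                "wandb_artifact_configuration": {
                    "name": "my_op_artifact",
                    "get_path": "model.onnx",
                }
            }
        ),
    }
)
def use_artifact_content(context, table, model_path):
    context.log.info(f"table: {table}, model file at: {model_path}")
```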
Serialization configuration
By default, the integration will use the standard pickle module, but some objects are not compatible with it. For example, functions with yield will raise an error if you try to pickle them. We support more Pickle-based serialization modules (dill, cloudpickle, joblib). You can also use more advanced serialization like ONNX or PMML by returning a serialized string or creating an Artifact directly. The right choice will depend on your use case; please refer to the available literature on this subject.

Pickle-based serialization modules
Pickling is known to be insecure. If security is a concern, please only use W&B objects. We recommend signing your data and storing the hash keys in your own systems. For more complex use cases don't hesitate to contact us; we will be happy to help.
Serialization is configured with the `serialization_module` dictionary in the `wandb_artifact_configuration`. Please make sure the module is available on the machine running Dagster.

The integration will automatically know which serialization module to use when you read that Artifact.

The currently supported modules are `pickle`, `dill`, `cloudpickle`, and `joblib`.
Here’s a simplified example where we create a “model” serialized with joblib and then use it for inference.
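A minimal sketch of that example, showing only the configuration shape; the returned "model" is a stand-in for whatever object joblib can serialize in your use case:

```python
from dagster import AssetIn, asset

@asset(
    metadata={
        "wandb_artifact_configuration": {
            "type": "model",
            "serialization_module": {
                "name": "joblib",
                # Optional arguments forwarded to joblib.dump
                "parameters": {"compress": 3},
            },
        }
    },
)
def my_joblib_serialized_model():
    # Stand-in "model": any object joblib can dump.
    return {"weights": [0.1, 0.2, 0.3]}

@asset(
    ins={"my_joblib_serialized_model": AssetIn()},
    metadata={
        "wandb_artifact_configuration": {
            "type": "results",
        }
    },
)
def inference_result(context, my_joblib_serialized_model):
    # The integration deserializes the model with joblib when reading it back.
    weights = my_joblib_serialized_model["weights"]
    context.log.info(f"running inference with weights: {weights}")
    return sum(weights)
```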
Advanced serialization formats (ONNX, PMML)
It's common to use interchange file formats like ONNX and PMML. The integration supports those formats, but it requires a bit more work than for Pickle-based serialization. There are two different methods to use those formats:
- Convert your model to the selected format, then return the string representation of that format as if it were a normal Python object. The integration will pickle that string. You can then rebuild your model using that string (a sketch of this approach follows this list).
- Create a new local file with your serialized model, then build a custom Artifact with that file using the `add_files` configuration.
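A minimal sketch of the first method, assuming a hypothetical `convert_my_model_to_onnx()` helper that produces an ONNX `ModelProto`:

```python
from dagster import asset

@asset(
    metadata={
        "wandb_artifact_configuration": {
            "type": "model",
        }
    },
)
def my_onnx_model():
    # Hypothetical helper: convert your trained model to an ONNX ModelProto.
    onnx_model = convert_my_model_to_onnx()
    # Return the serialized bytes; the integration pickles this value into the Artifact.
    # Rebuild the model later from these bytes with the onnx API.
    return onnx_model.SerializeToString()
```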
Using partitions
The integration natively supports Dagster partitions. You can selectively read one, multiple, or all partitions of an asset. All partitions are provided in a dictionary, with the key and value representing the partition key and the Artifact content, respectively.
- Read all partitions
- Read specific partitions
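A minimal sketch of reading all partitions of an upstream partitioned asset; the asset names are placeholders:

```python
from dagster import AssetIn, asset

@asset(
    ins={"my_daily_partitioned_asset": AssetIn()},
    metadata={
        "wandb_artifact_configuration": {
            "type": "dataset",
        }
    },
)
def read_all_partitions(context, my_daily_partitioned_asset):
    # The upstream partitions arrive as a dict: {partition_key: artifact_content}
    for partition_key, content in my_daily_partitioned_asset.items():
        context.log.info(f"partition {partition_key}: {content}")
    return len(my_daily_partitioned_asset)
```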
This reads all partitions of the upstream `@asset`, which are given as a dictionary. In this dictionary, the key and value correlate to the partition key and the Artifact content, respectively.

To read specific partitions, the `metadata` configuration controls how W&B interacts with different artifact partitions in your project.
The object `metadata` contains a key named `wandb_artifact_configuration`, which further contains a nested object `partitions`.
The `partitions` object maps the name of each partition to its configuration. The configuration for each partition can specify how to retrieve data from it. These configurations can contain different keys, namely `get`, `version`, and `alias`, depending on the requirements of each partition.
Configuration keys
- `get`: The `get` key specifies the name of the W&B Object (Table, Image…) from which to fetch the data.
- `version`: The `version` key is used when you want to fetch a specific version of the Artifact.
- `alias`: The `alias` key allows you to get the Artifact by its alias.

The wildcard `"*"` stands for all non-configured partitions. This provides a default configuration for partitions that are not explicitly mentioned in the `partitions` object.
For example, a wildcard entry can specify that data for all partitions not explicitly configured is fetched from the table named `default_table_name`.
Specific partition configuration
You can override the wildcard configuration for specific partitions by providing their specific configurations using their keys.
For example, for the partition named `yellow`, data will be fetched from the table named `custom_table_name`, overriding the wildcard configuration.
Versioning and aliasing
For versioning and aliasing purposes, you can provide specific `version` and `alias` keys in your configuration.
For versions, you can fetch a specific Artifact version, for example version `v0` of the `orange` Artifact partition.
For aliases, you can fetch the table `default_table_name` from the Artifact partition with the alias `special_alias` (referred to as `blue` in the configuration).
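Putting these pieces together, a minimal sketch of a `partitions` configuration combining the wildcard, `yellow`, `orange`, and `blue` cases described above; the upstream asset name is a placeholder:

```python
from dagster import AssetIn, asset

@asset(
    ins={
        "upstream_partitioned_asset": AssetIn(
            metadata={
                "wandb_artifact_configuration": {
                    "partitions": {
                        # Default for every partition not listed below.
                        "*": {"get": "default_table_name"},
                        # Override: fetch a different table for the "yellow" partition.
                        "yellow": {"get": "custom_table_name"},
                        # Pin a specific Artifact version for the "orange" partition.
                        "orange": {"get": "default_table_name", "version": "v0"},
                        # Fetch the "blue" partition's Artifact by alias.
                        "blue": {"get": "default_table_name", "alias": "special_alias"},
                    }
                }
            }
        ),
    },
    metadata={
        "wandb_artifact_configuration": {
            "type": "dataset",
        }
    },
)
def read_specific_partitions(context, upstream_partitioned_asset):
    # Each requested partition arrives as {partition_key: fetched_content}.
    for partition_key, content in upstream_partitioned_asset.items():
        context.log.info(f"partition {partition_key}: {content}")
    return list(upstream_partitioned_asset.keys())
```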
Advanced usage
To view advanced usage of the integration, please refer to the following full code examples:

Using W&B Launch
Beta product in active development
Interested in Launch? Reach out to your account team to talk about joining the customer pilot program for W&B Launch.
Pilot customers need to use AWS EKS or SageMaker to qualify for the beta program. We ultimately plan to support additional platforms.
The integration supports:
- Running one or multiple Launch agents in your Dagster instance.
- Executing local Launch jobs within your Dagster instance.
- Remote Launch jobs on-prem or in a cloud.
Launch agents
The integration provides an importable `@op` called `run_launch_agent`. It starts a Launch Agent and runs it as a long-running process until stopped manually.

Agents are processes that poll launch queues and execute the jobs (or dispatch them to external services to be executed) in order.

Refer to the Launch page.
You can also view useful descriptions for all properties in Launchpad.
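A minimal sketch of a Dagster job that runs a Launch agent; the import path of `run_launch_agent` is an assumption, so check the package for your installed version:

```python
from dagster import job, make_values_resource
from dagster_wandb import wandb_resource
from dagster_wandb.launch.ops import run_launch_agent  # assumed import path

@job(
    resource_defs={
        "wandb_config": make_values_resource(entity=str, project=str),
        "wandb_resource": wandb_resource.configured(
            {"api_key": {"env": "WANDB_API_KEY"}}
        ),
    },
)
def run_launch_agent_example():
    # Agent properties (e.g. the queues to poll) are passed as op config;
    # see Launchpad for the available properties.
    run_launch_agent()
```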

Launch jobs
The integration provides an importable `@op` called `run_launch_job`. It executes your Launch job.

A Launch job is assigned to a queue in order to be executed. You can create a queue or use the default one. Make sure you have an active agent listening to that queue. You can run an agent inside your Dagster instance, but you can also consider using a deployable agent in Kubernetes.
Refer to the Launch page.
You can also view useful descriptions for all properties in Launchpad.
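A minimal sketch of a Dagster job that executes a Launch job; as above, the import path of `run_launch_job` is an assumption, so check the package for your installed version:

```python
from dagster import job, make_values_resource
from dagster_wandb import wandb_resource
from dagster_wandb.launch.ops import run_launch_job  # assumed import path

@job(
    resource_defs={
        "wandb_config": make_values_resource(entity=str, project=str),
        "wandb_resource": wandb_resource.configured(
            {"api_key": {"env": "WANDB_API_KEY"}}
        ),
    },
)
def run_launch_job_example():
    # Job properties (e.g. the target queue) are passed as op config;
    # see Launchpad for the available properties.
    run_launch_job()
```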

Best practices
- Use the IO Manager to read and write Artifacts. Avoid using `Artifact.download()` or `Run.log_artifact()` directly. Those methods are handled by the integration. Instead, return the data you want to store in the Artifact and let the integration do the rest. This approach provides better lineage for the Artifact.
- Only build an Artifact object yourself for complex use cases. Python objects and W&B objects should be returned from your ops/assets. The integration handles bundling the Artifact. For complex use cases, you can build an Artifact directly in a Dagster job. We recommend you pass an Artifact object to the integration for metadata enrichment, such as the source integration name and version, the Python version used, the pickle protocol version, and more.
- Add files, directories, and external references to your Artifacts through the metadata. Use the integration's `wandb_artifact_configuration` object to add any file, directory, or external reference (Amazon S3, GCS, HTTP…). See the advanced example in the Artifact configuration section for more information.
- Use an `@asset` instead of an `@op` when an Artifact is produced. Artifacts are assets. It is recommended to use an asset when Dagster maintains that asset. This will provide better observability in the Dagit Asset Catalog.
- Use a SourceAsset to consume an Artifact created outside Dagster. This allows you to take advantage of the integration to read externally created Artifacts. Otherwise, you can only use Artifacts created by the integration.
- Use W&B Launch to orchestrate training on dedicated compute for large models. You can train small models inside your Dagster cluster and you can run Dagster in a Kubernetes cluster with GPU nodes. We recommend using W&B Launch for large model training. This will prevent overloading your instance and provide access to more adequate compute.
- When doing experiment tracking within Dagster, set your W&B Run ID to the value of your Dagster Run ID. We recommend that you both make the Run resumable and set the W&B Run ID to the Dagster Run ID or to a string of your choice. Following this recommendation ensures your W&B metrics and W&B Artifacts are stored in the same W&B Run when you train models inside of Dagster.
- Only collect data you need with get or get_path for large W&B Artifacts. By default, the integration will download an entire Artifact. If you are using very large artifacts you might want to only collect the specific files or objects you need. This will improve speed and resource utilization.
- For Python objects adapt the pickling module to your use case. By default, the W&B integration will use the standard pickle module. But some objects are not compatible with it. For example, functions with yield will raise an error if you try to pickle them. W&B supports other Pickle-based serialization modules (dill, cloudpickle, joblib).