How to construct an artifact
Construct a W&B Artifact in three steps:
1. Create an artifact Python object with wandb.Artifact()
Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:
- Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. Use an artifact's name both to identify the artifact in the W&B App UI and to reference it when you want to use that artifact.
- Type: Provide a type. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.
The “name” and “type” you provide are used to create a directed acyclic graph. This means you can view the lineage of an artifact in the W&B App. See Explore and traverse artifact graphs for more information.
Artifacts cannot have the same name, even if you specify a different type for the type parameter. In other words, you cannot create an artifact named cats of type dataset and another artifact with the same name of type model.
For more information, see the wandb.Artifact class definition in the Python SDK Reference Guide.
For example, to create a dataset artifact, pass a name and type='dataset' to wandb.Artifact().
2. Add one or more files to the artifact
Add files, directories, external URI references (such as Amazon S3), and more with artifact methods. For example, to add a single text file, use the add_file method. To add multiple files at once, use the add_dir method. For more ways to add files, see Update an artifact.
3. Save your artifact to the W&B server
Finally, save your artifact to the W&B server. Artifacts are associated with a run. Therefore, use a run object's log_artifact() method to save the artifact.
When to use Artifact.save() or wandb.Run.log_artifact()
- Use Artifact.save() to update an existing artifact without creating a new run.
- Use wandb.Run.log_artifact() to create a new artifact and associate it with a specific run.
Calls to log_artifact are performed asynchronously for performant uploads. This can cause surprising behavior when logging artifacts in a loop. For example, if each artifact logged in a loop stores its loop index in its metadata, the artifact version v0 is NOT guaranteed to have an index of 0 in its metadata, as the artifacts may be logged in an arbitrary order.
Add files to an artifact
The following sections demonstrate how to construct artifacts with different file types and from parallel runs. For the following examples, assume you have a project directory that contains a model.h5 file, a checkpoints directory with its own model.h5 file, and an images directory.
Add a single file
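Use the add_file method to add a single local file to your artifact. For example, suppose you have a file called 'file.txt' in your local working directory. After you add it, the artifact contains file.txt. Optionally, pass the desired path within the artifact for the name parameter.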
API Call | Resulting artifact |
---|---|
artifact.add_file('model.h5') | model.h5 |
artifact.add_file('checkpoints/model.h5') | model.h5 |
artifact.add_file('model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
Add multiple files
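Use the add_dir method to add an entire local directory to your artifact. Optionally, pass the name parameter to place the files under that path within the artifact.
API Call | Resulting artifact |
---|---|
artifact.add_dir('images') | The files in images, at the artifact root |
artifact.add_dir('images', name='images') | The files in images, under images/ |
artifact.new_file('hello.txt') | hello.txt |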
Add a URI reference
Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle. Add an external URI reference to an artifact with the add_reference method, passing your own URI as the uri argument. Optionally, pass the desired path within the artifact for the name parameter. Supported URI schemes include:
- http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
- s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
- gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
API call | Resulting artifact contents |
---|---|
artifact.add_reference('s3://my-bucket/model.h5') | model.h5 |
artifact.add_reference('s3://my-bucket/checkpoints/model.h5') | model.h5 |
artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5') | models/mymodel.h5 |
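artifact.add_reference('s3://my-bucket/images') | The objects under the images prefix, at the artifact root |
artifact.add_reference('s3://my-bucket/images', name='images') | The objects under the images prefix, under images/ |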