Overview
Bring your own bucket (BYOB) allows you to store W&B artifacts and other related sensitive data in your own cloud or on-premises infrastructure. For Dedicated Cloud or Multi-tenant Cloud, data that you store in your bucket is not copied to the W&B managed infrastructure.
- Communication between the W&B SDK, CLI, or UI and your buckets occurs using pre-signed URLs.
- W&B uses a garbage collection process to delete W&B Artifacts. For more information, see Deleting Artifacts.
- You can specify a sub-path when configuring a bucket, to ensure that W&B does not store any files in a folder at the root of the bucket. This can help you conform to your organization’s bucket governance policy.
Data stored in the central database vs buckets
When you use BYOB, some types of data are stored in the W&B central database and other types are stored in your bucket.
Database
- Metadata for users, teams, artifacts, experiments, and projects
- Reports
- Experiment logs
- System metrics
- Console logs
Buckets
- Experiment files and metrics
- Artifact files
- Media files
- Run files
- Exported history metrics and system events in Parquet format
Bucket scopes
You can configure a storage bucket at one of two scopes:
Scope | Description |
---|---|
Instance level | In Dedicated Cloud and Self-Managed, any user with the required permissions within your organization or instance can access files stored in your instance’s storage bucket. Not applicable to Multi-tenant Cloud. |
Team level | If a W&B Team is configured to use a Team level storage bucket, team members can access files stored in it. Team level storage buckets allow greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements. Team level storage can help different business units or departments sharing an instance to efficiently use the infrastructure and administrative resources. It can also allow separate project teams to manage AI workflows for separate customer engagements. Available for all deployment types. You configure team level BYOB when setting up the team. |
- The same bucket can be used for the instance and one or more teams.
- Each team can use a separate bucket, some teams can choose to write to the instance bucket, or multiple teams can share a bucket by writing to subpaths.
- Buckets for different teams can be hosted in different cloud infrastructure environments or regions, and can be managed by different storage admin teams.
Availability matrix
W&B can connect to the following storage providers:
- CoreWeave AI Object Storage is a high-performance, S3-compatible object storage service optimized for AI workloads.
- Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
- Google Cloud Storage is a managed service for storing unstructured data at scale.
- Azure Blob Storage is a cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
- S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.
W&B deployment type | Instance level | Team level | Additional information |
---|---|---|---|
Dedicated Cloud | ✓ | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or on-premises infrastructure. |
Multi-tenant Cloud | Not Applicable | ✓ | Team level BYOB is supported for CoreWeave AI Object Storage, Amazon S3, and GCP Storage. |
Self-Managed | ✓ | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises. |
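The availability matrix can also be expressed as a small lookup table, which may be handy in a pre-flight script. The following is a minimal sketch that transcribes the table above. The dictionary, the provider keys, and the `is_supported` function are illustrative names, not part of any W&B API.

```python
# Support matrix transcribed from the availability table.
# All identifiers here are hypothetical and exist only for this sketch.
ALL_PROVIDERS = frozenset({"coreweave", "s3", "gcs", "azure", "s3-compatible"})
MULTI_TENANT_TEAM_PROVIDERS = frozenset({"coreweave", "s3", "gcs"})

BYOB_SUPPORT = {
    "dedicated-cloud": {"instance": ALL_PROVIDERS, "team": ALL_PROVIDERS},
    # Instance level BYOB is not applicable to Multi-tenant Cloud.
    "multi-tenant-cloud": {"instance": frozenset(), "team": MULTI_TENANT_TEAM_PROVIDERS},
    "self-managed": {"instance": ALL_PROVIDERS, "team": ALL_PROVIDERS},
}


def is_supported(deployment: str, scope: str, provider: str) -> bool:
    """Return True if the table lists `provider` for this deployment and scope."""
    return provider in BYOB_SUPPORT[deployment][scope]
```

For example, `is_supported("multi-tenant-cloud", "team", "azure")` evaluates to `False`, because the table lists only CoreWeave, Amazon S3, and GCP Storage for team level BYOB on Multi-tenant Cloud.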
Provision your bucket
After verifying availability, you are ready to provision your storage bucket, including its access policy and CORS configuration. Select a tab to continue.
- CoreWeave
- AWS
- GCP
- Azure
- S3-compatible
Requirements:
- Multi-tenant Cloud, or
- Dedicated Cloud v0.73.0 or above, or
- Self-Managed v0.73.0 or above deployed with v0.33.14+ of the Helm chart
- A CoreWeave account with AI Object Storage enabled and with permission to create buckets, API access keys, and secret keys.
- Your W&B instance must be able to connect to CoreWeave network endpoints.
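The version requirements above can be checked with a short script. This is a sketch only: the function names and deployment keys are hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings with an optional leading `v` (pre-release suffixes are not handled). It compares versions as integer tuples, because a plain string comparison would incorrectly rank `0.33.9` above `0.33.14`.

```python
from typing import Optional, Tuple

MIN_SERVER_VERSION = (0, 73, 0)   # Dedicated Cloud and Self-Managed
MIN_HELM_CHART_VERSION = (0, 33, 14)  # Self-Managed only


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v0.73.0' or '0.73.0' into (0, 73, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def meets_coreweave_requirements(
    deployment: str,
    server_version: Optional[str] = None,
    helm_chart_version: Optional[str] = None,
) -> bool:
    """Check the documented version requirements for CoreWeave storage."""
    if deployment == "multi-tenant-cloud":
        return True  # no version requirement is listed for Multi-tenant Cloud
    if deployment not in ("dedicated-cloud", "self-managed"):
        raise ValueError(f"unknown deployment type: {deployment}")
    if server_version is None or parse_version(server_version) < MIN_SERVER_VERSION:
        return False
    if deployment == "dedicated-cloud":
        return True
    # Self-Managed additionally requires a minimum Helm chart version.
    return (
        helm_chart_version is not None
        and parse_version(helm_chart_version) >= MIN_HELM_CHART_VERSION
    )
```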
- Multi-tenant Cloud: Obtain your organization ID, which is required for your bucket policy.
  - Log in to the W&B App.
  - In the left navigation, click Create a new team.
  - In the drawer that opens, copy the W&B organization ID, which is located above Invite team members.
  - Leave this page open. You will use it to configure W&B.
- Dedicated Cloud / Self-Managed: Obtain your customer namespace, which is required for your bucket policy.
  - In the W&B App, click your user profile icon, then click System Console.
  - Click the Authentication tab.
  - At the bottom of the page, copy the value for Customer Namespace. Keep this value for configuring the bucket policy.
  - You can close the System Console.
- In CoreWeave, create the bucket with a name of your choice in your preferred CoreWeave availability zone. Optionally create a folder for W&B to use as a sub-path for all W&B files. Make a note of the bucket name, availability zone, API access key, secret key, and sub-path.
- Set the following Cross-origin resource sharing (CORS) policy for the bucket:
  CoreWeave storage is S3-compatible. For details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.
- Configure a bucket policy that grants your W&B deployment the permissions it requires to access the bucket and to generate pre-signed URLs, which AI workloads in your cloud infrastructure or user browsers use to access the bucket. Refer to Bucket Policy Reference in the CoreWeave documentation.
  The clause beginning with "Sid": "AllowUsersInOrg" grants users in your organization direct access to the bucket. If you don’t need this ability, you can omit the clause from your policy.
- In the bucket policy, replace placeholders:
  - <cw-bucket>: your bucket name.
  - <cw-wandb-principal>:
    - Multi-tenant Cloud: arn:aws:iam::wandb:static/wandb-integration-public
    - Dedicated Cloud or Self-Managed: arn:aws:iam::wandb:static/wandb-integration
  - <wb-org-id>:
    - Multi-tenant Cloud: the W&B organization ID that you copied earlier.
- Dedicated Cloud: Contact support to complete additional steps.
- Self-Managed: Update your W&B deployment to set the environment variable GORILLA_SUPPORTED_FILE_STORES to the exact string cw:// and restart W&B. Otherwise, CoreWeave will not appear as an option when you configure team storage.
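Filling in the bucket policy placeholders by hand is error-prone, so a small helper can do the substitution and flag anything left unfilled. The sketch below is illustrative: `fill_policy_placeholders` and the deployment keys are hypothetical names, the principal ARNs are the ones listed above, and the policy template itself must come from the CoreWeave and W&B documentation, since it is not reproduced here.

```python
import re

# Principal ARNs as listed in the placeholder instructions.
WANDB_PRINCIPALS = {
    "multi-tenant-cloud": "arn:aws:iam::wandb:static/wandb-integration-public",
    "dedicated-cloud": "arn:aws:iam::wandb:static/wandb-integration",
    "self-managed": "arn:aws:iam::wandb:static/wandb-integration",
}


def fill_policy_placeholders(template: str, bucket: str, deployment: str, org_id: str) -> str:
    """Substitute the documented placeholders in a policy template string."""
    values = {
        "<cw-bucket>": bucket,
        "<cw-wandb-principal>": WANDB_PRINCIPALS[deployment],
        "<wb-org-id>": org_id,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    # Fail loudly if any <cw-...> or <wb-...> placeholder is still present.
    leftover = re.findall(r"<(?:cw|wb)-[a-z-]+>", template)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return template
```

A usage example with a stand-in string (not a real policy): `fill_policy_placeholders("<cw-bucket>|<cw-wandb-principal>", "my-bucket", "self-managed", "org-123")` returns `"my-bucket|arn:aws:iam::wandb:static/wandb-integration"`.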
Determine the storage address
This section explains the syntax to use to connect a W&B Team to a BYOB storage bucket. In the examples, replace placeholder values between angle brackets (<>) with your bucket’s details.
Select a tab for detailed instructions.
- CoreWeave
- AWS
- GCP
- Azure
- S3-compatible
This section is relevant only for team level BYOB on Dedicated Cloud or Self-Managed. For instance level BYOB or for Multi-tenant Cloud, you are ready to Configure W&B.
Determine the full bucket path using the following format. Replace placeholders between angle brackets (<>) with the bucket’s values.
Bucket format:
The cwobject.com HTTPS endpoint is supported. TLS 1.3 is required. Contact support to express interest in other CoreWeave endpoints.
Configure W&B
After you provision your bucket and determine its address, you are ready to configure BYOB at the instance level or team level.
Plan your storage bucket layout carefully. After you configure a storage bucket for W&B, migrating its data to another bucket is complex and requires the assistance of W&B. This applies to storage for Dedicated Cloud and Self-Managed, as well as team-level storage for Multi-tenant Cloud. For questions, contact support.
Instance level BYOB
For CoreWeave AI Object Storage at the instance level, contact W&B support instead of following these instructions. Self-service configuration is not yet supported.
- Log in to W&B as a user with the admin role.
- Click the user icon at the top, then click System Console.
- Go to Settings > System Connections.
- In the Bucket Storage section, ensure the identity in the Identity field is granted access to the new bucket.
- Select the Provider.
- Enter the Bucket Name.
- Optionally, enter the Path to use in the new bucket.
- Click Save.
For Self-Managed, W&B recommends using the Terraform module managed by W&B to provision a storage bucket along with the necessary access mechanism and related IAM permissions:
- AWS
- GCP
- Azure - Instance level BYOB or Team level BYOB
Team level BYOB
You can configure team level BYOB while creating a team using the W&B App. You have two options:
- Use an existing bucket: First, determine the storage location for your bucket.
- Create a new bucket (Multi-tenant Cloud only): W&B can automatically create a bucket in your cloud provider when you create the team. This is supported for CoreWeave, AWS, and GCP.
- After a team is created, its storage cannot be changed.
- For Instance level BYOB, refer to Instance level BYOB instead.
- If you plan to configure CoreWeave storage for the team, review the CoreWeave requirements and contact support to verify that your bucket is configured correctly in CoreWeave and to validate your team’s configuration, since the storage details cannot be changed after the team is created.
- Dedicated Cloud / Self-Managed
- Multi-tenant Cloud
- Dedicated Cloud: You must provide the bucket path to your account team so that they can add it to your instance’s supported file stores before following the rest of these steps to use the storage bucket for a team.
- Self-Managed: You must add the bucket path to the GORILLA_SUPPORTED_FILE_STORES environment variable and then restart W&B before following the rest of these steps to use the storage bucket for a team.
- Log in to W&B as a user with the admin role, click the icon at the top left to open the left navigation, then click Create a team to collaborate.
- Provide a name for the team.
- Set Storage Type to External storage.
  To use the instance level storage for team storage (regardless of whether it is internal or external), leave Storage Type set to Internal, even if the instance level bucket is configured for BYOB. To use separate external storage for the team, set Storage Type for the team to External and configure the bucket details in the next step.
- Click Bucket location.
- To use an existing bucket, select it from the list. To add a new bucket, click Add bucket at the bottom, then provide the bucket’s details. Click Cloud provider and select CoreWeave, AWS, GCP, or Azure. If the cloud provider is not listed, ensure that you have followed step 1 to add the bucket path to the supported file stores for your instance. If the storage provider is still not listed, contact support for assistance.
- Specify the bucket details.
  - For CoreWeave, provide only the bucket name.
  - For Amazon S3, GCP, or S3-compatible storage, provide the full bucket path you determined earlier.
  - For Azure on Dedicated Cloud or Self-Managed, set Account name to the Azure account and Container name to the Azure blob storage container.
  - Optionally, provide additional connection settings:
    - If applicable, set Path to the bucket sub-path.
    - CoreWeave: No additional connection settings required.
    - AWS: Set KMS key ARN to the ARN of your KMS encryption key.
    - GCP: No additional connection settings required.
    - Azure: Specify values for Tenant ID and Managed Identity Client ID. These fields are mandatory unless you configured the connection string with GORILLA_SUPPORTED_FILE_STORES.
- Click Create team.
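Because a team's storage cannot be changed after the team is created, it can help to check the bucket details before you click Create team. The following sketch encodes the per-provider fields described in the steps above. The field names (`bucket_name`, `bucket_path`, and so on) and the `missing_fields` function are hypothetical labels chosen to mirror the form, not W&B API parameters.

```python
from typing import Dict, List

# Fields each provider needs, per the "Specify the bucket details" step.
# Key and field names are illustrative only.
REQUIRED_FIELDS = {
    "coreweave": ["bucket_name"],
    "aws": ["bucket_path"],
    "gcp": ["bucket_path"],
    "s3-compatible": ["bucket_path"],
    "azure": ["account_name", "container_name"],
}


def missing_fields(
    provider: str,
    settings: Dict[str, str],
    azure_connection_string_configured: bool = False,
) -> List[str]:
    """Return the names of required fields that are absent or empty."""
    required = list(REQUIRED_FIELDS[provider])
    # Tenant ID and Managed Identity Client ID are mandatory for Azure unless
    # the connection string was set through GORILLA_SUPPORTED_FILE_STORES.
    if provider == "azure" and not azure_connection_string_configured:
        required += ["tenant_id", "managed_identity_client_id"]
    return [name for name in required if not settings.get(name)]
```

An empty list means every field the steps call for is present. Optional settings such as Path and KMS key ARN are deliberately left out of the check.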