
Weights & Biases Deep Dive — ML Experiment Tracking


Opening

On March 4, 2025, CoreWeave announced its acquisition of Weights & Biases (W&B) for approximately $1.7 billion. The deal drew serious attention in the AI community — W&B is the de facto standard tool for ML experiment tracking, used by over 1 million developers, with customers including NVIDIA, AstraZeneca, and OpenAI. CoreWeave is a GPU cloud computing company that went public on Nasdaq in 2025.

An MLOps tool acquired by a GPU cloud — this isn't your typical big-fish-eats-small-fish deal. What's the logic behind it? How strong is W&B's product? And can it maintain its independence post-acquisition?

What Problem They Solve

Training an ML model involves a massive number of experiments: different hyperparameters, different datasets, different model architectures. A team might run hundreds of experiments in a single week. Without tracking tools, ML engineers manage experiments through Excel spreadsheets, file naming conventions (model_v2_final_final_v3.pt), and Slack messages — the chaos is easy to imagine.

Specific pain points:

  • Experiment results aren't reproducible. You get a great result but forget which parameters you used
  • Team members can't compare experiments. Everyone's experiment logs are in a different format
  • When a deployed model breaks, there's no way to trace back to which training version used which data
  • GPUs are expensive ($2-8/hour), and rerunning wasted experiments is a direct financial loss

W&B's solution: integrate experiment tracking with just a few lines of code, automatically logging every training run's hyperparameters, metrics, system resources, and output files. All experiments can be compared and visualized in a single web interface.

Target customers: teams doing ML/AI model training, from academic researchers to enterprise ML engineering teams.

Product Matrix

Core Products

Experiments (Experiment Tracking): W&B's flagship product. Add wandb.init() and wandb.log() to your training code, and it automatically tracks all metrics, hyperparameters, GPU utilization, and model weights. Experiment results are visualized and compared in the W&B Dashboard.
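A sketch of what that integration looks like. The training loop below is a toy (it just decays a fake loss), and the project name is invented; the wandb lines are left commented so the sketch runs without an account — uncomment them after `pip install wandb` and `wandb login`:

```python
import math

# Toy training loop showing where the two W&B calls go. The wandb lines are
# commented out so this runs standalone; the loop just decays a fake loss so
# there is something to log.

config = {"learning_rate": 0.1, "epochs": 5}

# import wandb
# run = wandb.init(project="demo", config=config)  # one call to start tracking

losses = []
loss = 1.0
for epoch in range(config["epochs"]):
    loss *= math.exp(-config["learning_rate"])  # stand-in for a real training step
    losses.append(loss)
    # wandb.log({"epoch": epoch, "loss": loss})  # streams metrics to the dashboard

# run.finish()
print(f"final loss: {losses[-1]:.4f}")
```

With the wandb lines uncommented, every logged value, the config dict, and system metrics (GPU utilization, memory) appear in the dashboard without further code.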

Sweeps (Hyperparameter Search): Automated hyperparameter tuning. Supports Grid Search, Random Search, and Bayesian Optimization. No need to write your own search logic — W&B handles scheduling and resource allocation.
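A sweep is defined declaratively. The parameter names and ranges below are invented for illustration, and launching requires a W&B account, so those calls are left commented:

```python
# Hypothetical sweep configuration: Bayesian optimization over two
# hyperparameters, minimizing validation loss. W&B also accepts the same
# structure as a YAML file.
sweep_config = {
    "method": "bayes",  # or "grid" / "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
}

# Launching (commented; needs `wandb login` and a train() function that
# reads wandb.config and logs "val_loss"):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="demo")
# wandb.agent(sweep_id, function=train, count=20)
```

The agent pulls the next parameter combination from W&B's scheduler, so multiple agents on different machines can work the same sweep in parallel.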

Artifacts (Dataset Version Management): Tracks versions of training datasets, model weights, and preprocessing pipelines. Solves the "which version of the data was this model trained on?" problem.
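The usage pattern is sketched below with the W&B calls commented (artifact and file names are invented). The runnable part only illustrates the underlying idea — artifacts are content-addressed, so re-logging unchanged data does not create a new version:

```python
import hashlib

# Commented W&B artifact flow (requires an account):
# import wandb
# run = wandb.init(project="demo")
# art = wandb.Artifact("training-data", type="dataset")
# art.add_file("data.csv")
# run.log_artifact(art)   # stored as training-data:v0, then v1, v2, ...
# path = run.use_artifact("training-data:latest").download()

# Content addressing in miniature: identical bytes yield an identical digest,
# so a version bump only happens when the data actually changes.
v0 = hashlib.sha256(b"x,y\n1,2\n").hexdigest()
v0_again = hashlib.sha256(b"x,y\n1,2\n").hexdigest()
v1 = hashlib.sha256(b"x,y\n1,2\n3,4\n").hexdigest()
print(v0 == v0_again, v0 == v1)
```

Because a run records which artifact versions it consumed and produced, the "which data trained this model?" question becomes a lineage lookup rather than an archaeology project.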

Models (Model Registry): Production-grade model management, including model versioning, approval workflows, and deployment tracking. A complete pipeline from experiment to production.

Weave (LLM Development Tool): Launched in 2024, designed specifically for LLM application development. Tracks prompt versions, LLM call chains, and evaluation results. Extends W&B's positioning from "ML experiment tracking" into "LLM application development."
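Weave's core pattern is decorating functions so their inputs, outputs, and nesting are recorded as a call trace. A minimal sketch — the "LLM" is faked, and the weave lines are commented since tracing needs a W&B login:

```python
# import weave
# weave.init("llm-demo")

# @weave.op()   # uncommented, this records every call with inputs and outputs
def build_prompt(question: str) -> str:
    return f"Answer concisely: {question}"

# @weave.op()
def answer(question: str) -> str:
    prompt = build_prompt(question)          # nested ops appear as a call tree
    return f"[fake LLM reply to: {prompt}]"  # stand-in for a real model call

print(answer("What does W&B Weave track?"))
```

The decorator approach means existing application code stays unchanged apart from the annotations — the same low-friction philosophy as the `wandb.log()` integration.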

Tables (Data Visualization): Interactive data tables for exploratory analysis of training data. View a model's predictions side by side with ground truth labels directly within W&B.

Technical Differentiation

W&B's moat is developer experience. Integration takes just 5 lines of code and supports PyTorch, TensorFlow, JAX, Hugging Face, and every other major ML framework. The dashboard's interaction design is best-in-class — drag-and-drop experiment metric comparisons, auto-generated reports, one-click sharing with your team.

Compared to MLflow (open source), W&B's cloud hosting and collaboration features are more mature. Compared to Neptune.ai and Comet ML, W&B's user base (1M+) creates community network effects — W&B screenshots are ubiquitous in Kaggle competitions and academic papers.

Business Model

Pricing Strategy

| Plan | Price | Target Customer |
|---|---|---|
| Free (Individual) | $0 | Individual developers, students |
| Teams | ~$50/user/month | Small teams |
| Enterprise | Custom | Large enterprises; supports private deployment |

The Free tier is remarkably generous — experiment tracking, visualization, and 100GB of storage are all included. This is the engine behind W&B's growth: individuals use it for free, bring it into their teams, and then upgrade to the paid version.

Revenue Model

SaaS subscription. 2024 revenue was approximately $13.6 million (based on public estimates), with 1,400 enterprise customers. Revenue scale is still modest, but growing fast. Post-CoreWeave acquisition, W&B's monetization strategy may shift — from an independent SaaS to a component of CoreWeave's AI cloud platform.

Funding & Acquisition

| Event | Date | Amount / Valuation |
|---|---|---|
| Seed | 2018 | - |
| Series A | 2020 | $45M |
| Series B | 2021 | $135M |
| Series C | 2023 | $50M at a $1.25B valuation |
| CoreWeave acquisition | March–May 2025 | ~$1.7B |

Total funding: $250 million. Investors: Insight Partners, Felicis Ventures, Trinity Ventures. The $1.7 billion acquisition price represents a roughly 36% premium over the last round's $1.25 billion valuation.

Customers & Market

Marquee Customers

  • NVIDIA: Experiment management for GPU training workflows
  • AstraZeneca: ML experiment tracking in drug discovery
  • Toyota Research: Training management for autonomous driving models
  • OpenAI (early stage): Experiment tracking during the research phase

1 million+ individual developers, 1,400 enterprise customers. Academic penetration is extremely high — many top-tier conference papers use W&B for their experiment visualizations.

Market Size

The MLOps market is projected at roughly $40-60 billion in 2026. Experiment tracking is a subset, worth about $10-15 billion. W&B's Weave product expands its TAM into the LLM development tools market (approximately $20-30 billion).

Competitive Landscape

| Dimension | W&B | MLflow (Databricks) | Neptune.ai | Comet ML | TensorBoard |
|---|---|---|---|---|---|
| Experiment Tracking | Strong | Strong | Strong | Strong | Moderate |
| Developer Experience | Best | Good | Good | Good | Basic |
| LLM Tools | Strong (Weave) | Moderate | Weak | Moderate | None |
| Open Source | No | Yes | No | No | Yes |
| Collaboration | Strong | Moderate | Strong | Moderate | Weak |
| Pricing | Moderate | Bundled with Databricks | Moderate | Moderate | Free |
| User Base | 1M+ | Large (Databricks users) | Moderate | Moderate | Large |
| Parent Company | CoreWeave | Databricks | Independent | Independent | Google |

Key observation: The biggest risk for W&B post-CoreWeave acquisition is neutrality. Previously, W&B ran on any cloud — AWS, GCP, Azure, on-prem. Now it belongs to CoreWeave. CoreWeave has pledged to maintain W&B's cross-platform compatibility, but customer concerns are legitimate: in the long run, CoreWeave is incentivized to make W&B work best on its own GPU cloud. This mirrors what happened when MLflow was absorbed into Databricks — nominally open-source and neutral, but in practice it runs best on Databricks.

What I've Actually Seen

The good: W&B genuinely offers the best developer experience among MLOps tools. I've used it on my own ML projects — from wandb.init() to seeing the Dashboard takes under 5 minutes. The experiment comparison feature is especially useful — overlay loss curves from 10 training runs and you can instantly spot which hyperparameter set performed best. The Free tier has virtually no restrictions, making it very accessible to individual developers and academic researchers.

The complicated: CoreWeave's acquisition changes W&B's positioning. Before, it was a neutral MLOps platform; now it's a GPU cloud ecosystem tool. For teams already on AWS SageMaker or GCP Vertex AI, whether to keep using W&B needs a fresh evaluation — your experiment data might flow into CoreWeave's ecosystem. Also, W&B's enterprise revenue (estimated at $13.6 million) is small relative to its 1 million user base, suggesting the conversion rate from individual free users to enterprise paid customers has room to improve.

The reality: The experiment tracking category is being encroached on by platform players. Databricks has MLflow built in, AWS has SageMaker Experiments, Google has Vertex AI Experiments. As an independent tool, W&B needs Weave (LLM development tools) and deeper collaboration features to maintain differentiation. Whether W&B can sustain its iteration speed and product independence post-CoreWeave acquisition is the key variable determining its future trajectory.

My Take

  • Recommended: Teams actively training ML models that need a plug-and-play experiment tracking tool. W&B is the best option with the fastest ramp-up.
  • Recommended: Academic researchers and individual developers. The Free tier's feature and storage limits are very generous.
  • Recommended: Teams developing LLM applications that need to track prompt versions and evaluation results. Weave is worth trying.
  • Skip if: You're already all-in on the Databricks ecosystem. MLflow comes bundled with your Databricks subscription and gets the job done.
  • Skip if: You have concerns about neutrality post-CoreWeave acquisition. Consider watching for 6-12 months to see if W&B's platform strategy shifts.

In one line: W&B is the de facto standard for ML experiment tracking — best product, largest community, but its independence post-CoreWeave acquisition is the biggest unknown.

Discussion

What does your ML team use for experiment tracking? W&B, MLflow, or a homegrown solution? Would CoreWeave's acquisition influence your decision about W&B?