BioEmu - A Biomolecular Emulator

Microsoft's breakthrough in protein dynamics prediction reshapes computational biology

Figure 1: Actin's ATP/ADP-regulated open-close transition illustrates how protein function depends on conformational dynamics. ADP-bound (red) and ATP-bound (green) states show the domain motion essential for muscle fiber formation. Source: Lewis et al., Science 2025, Figure 1A.

BioEmu marks the most significant advance in protein conformational prediction since AlphaFold, generating thousands of thermodynamically accurate protein structures per hour on a single GPU—100,000x faster than traditional molecular dynamics simulations while achieving ~1 kcal/mol free energy accuracy. Published in Science in August 2025 by Sarah Lewis, Tim Hempel, Cecilia Clementi, Frank Noé, and colleagues at Microsoft Research AI for Science, this generative deep learning system integrates over 200 milliseconds of MD simulation data, static structures, and experimental stability measurements through novel training algorithms. The community response has been overwhelmingly positive, with experts like Martin Steinegger declaring that “protein dynamics is the next frontier in discovery” and BioEmu “a significant step in this direction.” However, thoughtful critiques have emerged questioning whether computationally predicted ensembles can truly represent functionally relevant protein conformations in their native thermodynamic environments.


The architecture builds on diffusion foundations with novel training innovations

Figure 2: (Left) BioEmu workflow: from protein sequence through AlphaFold embedding and diffusion model to equilibrium ensemble generation. (Right) Detailed architecture showing sequence encoder (Evoformer), denoising diffusion model with timestep injection, and score model producing backbone frames. Source: Lewis et al., Science 2025, Figure 1B,C.

BioEmu is an E(3)-equivariant conditional diffusion model trained to sample residue-level protein conformations from a learned equilibrium distribution. Each protein is represented using a backbone frame parameterization where residue orientations are constructed via Gram-Schmidt orthogonalization of $\text{N–Cα}$ and $\text{C–Cα}$ bonds. The forward noising process operates as paired stochastic differential equations on positions (variance-preserving Gaussian) and orientations ($\text{IGSO(3)}$ distribution on $\text{SO(3)}$ manifold), ensuring the model learns orientation-equivariant denoising updates.
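The frame construction is simple enough to sketch directly. Below is a minimal numpy illustration of the Gram-Schmidt orthogonalization described above; the function name, atom order, and axis convention are illustrative and may differ from the paper's exact parameterization:

```python
import numpy as np

def backbone_frame(n, ca, c):
    """Build a residue orientation frame from backbone atom positions
    via Gram-Schmidt orthogonalization of the C-Calpha and N-Calpha
    bond vectors. n, ca, c: (3,) coordinate arrays for N, CA, C atoms.
    Returns a rotation matrix in SO(3) plus the CA translation."""
    v1 = c - ca                          # C-Calpha bond vector
    v2 = n - ca                          # N-Calpha bond vector
    e1 = v1 / np.linalg.norm(v1)         # first orthonormal axis
    u2 = v2 - np.dot(e1, v2) * e1        # remove component along e1
    e2 = u2 / np.linalg.norm(u2)         # second orthonormal axis
    e3 = np.cross(e1, e2)                # right-handed third axis
    R = np.stack([e1, e2, e3], axis=1)   # orientation (rotation matrix)
    return R, ca                         # frame = (rotation, translation)
```

Because R is built from two orthonormalized vectors and their cross product, it is guaranteed to be a proper rotation (orthogonal with determinant +1), which is what allows the IGSO(3) noising process to act on it.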

Figure 3: Detailed pseudocode for the score model s_θ(x, h, z, t) showing layer normalization, IPA blocks, and separate prediction heads for translation and rotation scores in local coordinate frames. Source: Lewis et al., Science 2025, Supplementary Algorithm 1 (page 46) - Score model architecture.

The key architectural innovation is a three-stage training pipeline. First, pre-training on clustered AlphaFold Database structures with data augmentation encourages the model to generate diverse conformations from sequence. Second, continued training on more than 200 milliseconds of aggregate MD simulation data—reweighted using Markov state models to better approximate equilibrium populations—teaches realistic dynamics. Third, and most novel, Property Prediction Fine-Tuning (PPFT) aligns the learned distribution with experimental free energies from the MEGAscale protein stability dataset without requiring structural information for the measured proteins. This enables BioEmu to predict folding stability changes for mutations with remarkable accuracy.


Figure 4: (Left) Three-stage training: AFDB pretraining with augmentation, MD finetuning with reweighting, and PPFT on experimental stabilities. (Right) AFDB preprocessing workflow showing sequence clustering at 80%/30% identity and structure clustering to create diverse training data. Source: Lewis et al., Science 2025, Figure 1D,E.
Figure 5: Comprehensive list of MD datasets totaling 172.2 ms simulation time (216.0 ms effective counting chains independently), including DESRES fast-folders, Folding@home data, and MSR-generated datasets with force fields and system counts. Source: Lewis et al., Science 2025, Table S1.

BioEmu leverages AlphaFold2’s Evoformer to compute single and pair representations, invoked only once per protein, then uses ~100 denoising steps with a second-order integration scheme (DPM-Solver) to generate structures. Performance benchmarks show 85% coverage of domain motions, 72-74% coverage of local unfolding states, and 49-85% coverage of cryptic pocket formation—all at ~1.9 GPU-seconds per sample compared to AlphaFlow’s ~32 GPU-seconds.
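The second-order integration idea behind DPM-Solver can be illustrated with its simplest relative, Heun's method, which the supplement compares against DPM-Solver. This toy sketch integrates a generic ODE dx/dt = f(x, t), standing in for the probability-flow ODE driven by the learned score; it is a didactic stand-in, not BioEmu's actual sampler:

```python
import numpy as np

def heun_integrate(f, x0, ts):
    """Generic second-order Heun integrator for dx/dt = f(x, t).
    BioEmu uses DPM-Solver, a related second-order scheme, to reach
    good samples in fewer denoising steps; plain Heun is shown here
    only to illustrate the predictor-corrector idea."""
    x = x0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        k1 = f(x, t0)                 # slope at the start of the step
        x_pred = x + h * k1           # Euler predictor
        k2 = f(x_pred, t1)            # slope at the predicted endpoint
        x = x + 0.5 * h * (k1 + k2)   # trapezoidal corrector
    return x

# Toy check: dx/dt = -x has the exact solution exp(-t).
ts = np.linspace(0.0, 1.0, 31)
x1 = heun_integrate(lambda x, t: -x, 1.0, ts)
```

The second-order correction is what lets such schemes converge in tens rather than hundreds of steps, which is central to the ~1.9 GPU-seconds-per-sample figure quoted above.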

Figure 6: Comparison of Heun vs DPM-Solver across different denoising step counts (15-90 steps) for CATH domain and LAO-binding protein, showing free energy landscape convergence and fraction of valid samples. DPM-Solver achieves better quality with fewer steps. Source: Lewis et al., Science 2025, Figure S11.

Technical comparisons reveal BioEmu’s unique positioning

BioEmu occupies a distinctive niche in the rapidly evolving landscape of protein conformational ensemble methods. AlphaFlow (Jing et al., ICML 2024) uses flow matching rather than diffusion, fine-tuning AlphaFold2/ESMFold with a polymer-structured harmonic prior. While AlphaFlow achieves excellent Pearson correlation (r=0.92 with templates) on the ATLAS benchmark, it was trained on only ~82 proteins and lacks explicit thermodynamic validation. BioEmu’s training scale of thousands of proteins with >200ms of MD data provides substantially broader coverage.

Figure 7: Comprehensive comparison of BioEmu vs AlphaFlow, AFCluster, MSA subsampling, and DiG across all benchmarks (OOD60, domain motions, cryptic pockets, local unfolding). Shows coverage curves and recall scatter plots with BioEmu consistently matching or exceeding baselines. Source: Lewis et al., Science 2025, Figure S7.

Distributional Graphormer (DiG) (Zheng et al., Nature Machine Intelligence 2024) provides BioEmu’s architectural foundation, using diffusion inspired by thermodynamic annealing and Physics-Informed Diffusion Pre-training (PIDP) for data-scarce cases. DiG demonstrated ~72% coverage of SARS-CoV-2 RBD conformational space but was not systematically validated for free energy accuracy. BioEmu extends DiG with the multi-stage training pipeline and experimental data integration.

AFCluster (Wayment-Steele et al., Nature 2024) represents a fundamentally different approach—clustering MSA sequences with DBSCAN to separate evolutionary signals encoding different conformational states, then running AlphaFold2 on each cluster. While successful for metamorphic proteins like KaiB and RfaH, AFCluster provides no thermodynamic predictions and requires diverse MSAs. Notably, recent work by Schafer et al. (Nature 2025) challenged AFCluster’s claims, arguing it underperforms random sequence sampling.

Boltzmann generators (Noé et al., Science 2019) from Frank Noé’s earlier work use normalizing flows for one-shot equilibrium sampling with tractable probability densities enabling exact reweighting. However, they remain limited to small systems (alanine dipeptide, BPTI at 58 residues) due to coordinate transformation challenges. BioEmu represents the conceptual successor—trading exact probability densities for scalability to hundreds of residues.

Method            Architecture        Training Data             Free Energy      Speed (GPU-sec)
BioEmu            Diffusion           AFDB + 200 ms MD + Exp.   ~1 kcal/mol      1.9
AlphaFlow         Flow matching       PDB/ATLAS                 Not quantified   32.0
DiG               Diffusion           PDB + MD                  Not systematic   Fast
AFCluster         MSA perturbation    None (AF2 pretrained)     None             Fast
Boltzmann Gen.    Normalizing flows   Energy functions          Exact            Fast (small systems)

The broader literature positions BioEmu as the culmination of rapid progress

The past two years have seen explosive growth in diffusion and flow-matching models for proteins. RFdiffusion (Watson et al., Nature 2023) demonstrated that fine-tuning RoseTTAFold on denoising tasks enables outstanding protein backbone design, achieving picomolar-affinity binders. FrameDiff (Yim et al., ICML 2023) established theoretical foundations for SE(3)-invariant diffusion on multiple frames using Invariant Point Attention modules from AlphaFold2, generating designable monomers up to 500 amino acids without pretrained structure prediction networks.

Flow matching approaches have emerged as efficient alternatives. AlphaFlow demonstrated that fine-tuning structure prediction models under custom flow matching yields faster wall-clock convergence to equilibrium properties than MD. The MIT course “Flow Matching and Diffusion Models” (diffusion.csail.mit.edu) provides comprehensive tutorials explaining that “diffusion models and Gaussian flow matching are the same” mathematically but differ in network output specifications and sampling schedules.

SE(3)-equivariant architectures underpin all modern protein generative models. The e3nn library and tutorials (e3nn_tutorial) provide foundations for implementing spherical tensor operations. NequIP demonstrated remarkable data efficiency for interatomic potentials, while EGNN (E(n) Equivariant Graph Neural Networks) offers computationally efficient implementations without full spherical harmonics machinery.

For training with MD data, Markov state model reweighting has become essential. The approach recovers unbiased equilibrium populations from biased or short simulations through likelihood reweighting or path reweighting methods. BioEmu’s use of MSM-reweighted MD data represents the most ambitious application of this approach, aggregating over 200 milliseconds of simulation time across thousands of protein systems.
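The core of MSM reweighting, recovering equilibrium populations from transition statistics, can be sketched in a few lines of numpy. This is a deliberately minimal, non-reversible estimator for illustration only; production pipelines such as Deeptime use reversible maximum-likelihood estimators, and BioEmu's actual reweighting is more involved:

```python
import numpy as np

def msm_equilibrium(dtraj, n_states, lag=1):
    """Estimate equilibrium state populations from a discretized
    trajectory with a simple Markov state model: count transitions
    at the given lag, row-normalize to a transition matrix, and
    return its stationary distribution (leading left eigenvector)."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0                        # transition counts
    T = C / C.sum(axis=1, keepdims=True)      # transition matrix
    evals, evecs = np.linalg.eig(T.T)         # left eigenvectors of T
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = np.abs(pi) / np.abs(pi).sum()        # normalize to populations
    return pi
```

The stationary distribution obtained this way is what lets short, possibly biased trajectories be reweighted toward equilibrium populations before they are used as training targets.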


Property Prediction Fine-Tuning (PPFT): A novel training paradigm

Figure 8: PPFT workflow showing how the model is fine-tuned on experimental folding free energies by: (1) denoising to intermediate time step, (2) extrapolating to clean structures, (3) classifying as folded/unfolded, (4) computing foldedness from ensemble, (5) backpropagating prediction error to match experimental ΔG. Source: Lewis et al., Science 2025, Figure 4A.
Figure 9: Detailed implementation of PPFT showing partial denoising (8 of 35 steps), clean sample extrapolation, cross-target matching loss to prevent mode collapse, and selective backpropagation through final steps only. Source: Lewis et al., Science 2025, Supplementary Algorithm 2 (page 47) - PPFT pseudocode.

The PPFT innovation enables training on experimental observables without structural data. For the MEGAscale dataset (500,000+ stability measurements), BioEmu generates small ensembles (M~8-16 samples) using rapid approximate sampling, classifies each as folded/unfolded based on fraction of native contacts, computes ensemble-averaged foldedness, and minimizes squared error against the target derived from experimental ΔG via Boltzmann weighting.
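Under a two-state assumption, the Boltzmann-weighted target and the squared-error objective take a simple form. The sketch below assumes the stability convention ΔG = G_unfolded − G_folded (positive for stable proteins); function names are illustrative, and the cross-target pairing trick listed next is omitted here:

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol*K)

def target_foldedness(dG, T=300.0):
    """Two-state Boltzmann target: fraction folded implied by an
    experimental folding free energy dG = G_unfolded - G_folded
    (kcal/mol; positive means the folded state is favored)."""
    return 1.0 / (1.0 + np.exp(-dG / (R * T)))

def ppft_loss(folded_flags, dG_exp, T=300.0):
    """Squared error between ensemble-averaged foldedness (fraction of
    M samples classified as folded, e.g. by native-contact fraction)
    and the Boltzmann target implied by the experimental dG."""
    p_model = np.mean(folded_flags)
    return (p_model - target_foldedness(dG_exp, T)) ** 2
```

At ΔG = 0 the target is exactly 0.5 (folded and unfolded equally populated), and a few kcal/mol of stability already pushes the target foldedness close to 1, which is why ~1 kcal/mol accuracy is the meaningful scale for these predictions.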

Key technical tricks prevent mode collapse and reduce computational cost:

  1. Cross-target matching loss: Instead of minimizing $\mathbb{E}[(f(x) - f_{\text{target}})^2]$, minimize $(\mathbb{E}[f(x)] - f_{\text{target}})^2$ using pairs of samples, avoiding the variance penalty that would collapse the distribution
  2. Partial denoising: Only 8 of 35 denoising steps executed, with clean sample extrapolation via reparameterization trick
  3. Selective backpropagation: Gradients computed only through final 3-5 steps, treating earlier steps as frozen
  4. Parameter freezing: Only layers 1 and 8 of the score model updated during PPFT, preserving conformational diversity learned in earlier stages
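Trick 1 is worth seeing numerically: for independent samples x1, x2, the pairwise product (f(x1) − t)(f(x2) − t) has expectation (E[f] − t)², so averaging it estimates the squared mean error without the variance term that the naive loss would minimize by collapsing the ensemble. A minimal sketch with illustrative function names:

```python
import numpy as np

def naive_loss(f_vals, target):
    """E[(f(x) - t)^2] = (E[f] - t)^2 + Var[f]: penalizes variance,
    so minimizing it would collapse the conformational ensemble."""
    return np.mean((f_vals - target) ** 2)

def cross_target_loss(f_vals, target):
    """(E[f(x)] - t)^2 estimated without the variance term by pairing
    independent samples: E[(f(x1)-t)(f(x2)-t)] = (E[f]-t)^2."""
    half = len(f_vals) // 2
    a, b = f_vals[:half], f_vals[half:2 * half]  # disjoint sample pairs
    return np.mean((a - target) * (b - target))
```

With samples whose mean already matches the target, the cross-target loss sits near zero while the naive loss stays pinned at the ensemble variance, which is exactly the mode-collapse pressure the trick removes.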
Figure 10: Complete training settings for each stage including optimizer configuration, learning rate schedules, batch sizes (2048 residues for pretraining, 1440 for finetuning), GPU counts, and total training duration. Source: Lewis et al., Science 2025, Supplementary Table S3 (page 44) - Training hyperparameters.

Emulating molecular dynamics equilibrium distributions

Figure 11: Leave-one-out cross-validation on 12 fast-folding proteins shows BioEmu accurately reproduces MD free energy surfaces. Examples show BBA (β-β-α protein), Protein G, and Homeodomain with representative structures, TIC projections, secondary structure propensities, and computational cost comparisons (4-5 orders of magnitude speedup). Source: Lewis et al., Science 2025, Figure 3A.
Figure 12: Complete results for all 12 fast-folders including Villin, Trp-cage, BBL, α3D, Chignolin, WW domain, NTL9, λ-repressor showing MD vs pretrained vs finetuned BioEmu free energy landscapes and secondary structure predictions. Source: Lewis et al., Science 2025, Figure S9.

For the DESRES benchmark, BioEmu achieved 0.74 kcal/mol mean absolute error in free energy differences between states, comparable to differences between classical MD force fields. Critically, BioEmu predicts not just folded and unfolded basins but also folding intermediates visible in 2D projections—for Protein G, both MD and BioEmu sample intermediates with partial β-sheet formation.
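Free energy differences between states follow directly from state populations via ΔG = −RT ln(p_a / p_b), which is how equilibrium samples are turned into the kcal/mol numbers quoted above. A small sketch with illustrative function names and hypothetical populations:

```python
import numpy as np

R, T = 1.987e-3, 300.0  # gas constant in kcal/(mol*K); temperature in K

def free_energy_diff(p_a, p_b):
    """Free energy difference between two conformational states from
    their equilibrium populations: dG = -RT * ln(p_a / p_b), kcal/mol.
    The less populated state sits higher in free energy."""
    return -R * T * np.log(p_a / p_b)

def state_mae(p_model, p_ref):
    """Mean absolute error of per-state free energies (relative to
    state 0) between model and reference populations: the kind of
    metric quoted for the DESRES benchmark, on hypothetical data."""
    g_m = -R * T * np.log(np.asarray(p_model))
    g_r = -R * T * np.log(np.asarray(p_ref))
    return np.mean(np.abs((g_m - g_m[0]) - (g_r - g_r[0])))
```

At 300 K, RT ≈ 0.6 kcal/mol, so a ~0.74 kcal/mol error corresponds to getting relative state populations right to within roughly a factor of three.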

Figure 13: Validation on 17 CATH domains with >100 μs simulation time each, showing representative free energy surfaces, secondary structure propensities, macrostate MAE, and data scaling analysis. BioEmu achieves 0.9 kcal/mol error with systematic improvement as training data increases. Source: Lewis et al., Science 2025, Figure 3B.
Figure 14: Complete free energy landscapes and secondary structure comparisons for all 17 CATH test systems, comparing MD ground truth with BioEmu predictions on folded state samples (FNC > 0.5 filtering applied). Source: Lewis et al., Science 2025, Figure S10.

The CATH benchmark demonstrates generalization beyond training proteins. For 1040 CATH domains spanning diverse structural topologies, BioEmu was trained on varying fractions (1%, 10%, 100%) and evaluated on held-out domains. Free energy MAE decreased from ~1.5 to ~1.0 kcal/mol as training data increased, while state coverage improved from ~0.5 to ~0.75, suggesting continued improvement with more simulation data.

Figure 15: (Left) Complexin II (IDP): BioEmu samples flexible ensemble matching known secondary structure elements (central/accessory helices) with radius of gyration distributions compared to ff14sb and ff99sb-disp MD. (Right) CD9: BioEmu predicts widely open and closed states with similar SEL-LEL contact distributions to published MD, sampling structures close to experimental 6k4j (1.9Å RMSD) vs MD best match (4.6Å). Source: Lewis et al., Science 2025, Figure 3C-D.

Multi-conformation sampling captures functional transitions

Figure 16: BioEmu samples functionally distinct conformations across three benchmark classes: (A) Domain motions (adenylate kinase, LAO-binding protein, c-di-GMP receptor) with global RMSD landscapes; (B) Local unfolding (Ras p21, rhomboid protease, CaM kinase II) with fraction of native contacts; (C) Cryptic pockets (sialic acid binding factor, fascin, Glu PRPP amidotransferase) with local RMSD to apo/holo structures. Source: Lewis et al., Science 2025, Figure 2.

Performance varies by transition type: domain motions are covered most completely (~85%), local unfolding states at 72-74%, and cryptic pocket formation at 49-85% depending on the direction of the transition.

The cryptic pocket asymmetry reveals a bias toward bound states, likely reflecting PDB composition (proteins are often crystallized with ligands but have comparatively few deposited apo structures). This suggests future training data curation should emphasize apo-state diversity.

Figure 17: Multi-conformation performance (coverage and recall) as a function of maximum sequence similarity to training set. Domain motions show learning plateau at ~35% similarity, while other benchmarks show minimal dependence, indicating limited memorization beyond baseline similarity. Source: Lewis et al., Science 2025, Figure S5.

Predicting protein stability from equilibrium ensembles

Figure 18: MEGAscale stability predictions. (B) Absolute folding free energies for test proteins show 0.9 kcal/mol MAE and Spearman r~0.6, with performance stratified by sequence similarity to training set. (C) ΔΔG predictions for point mutants achieve 0.8 kcal/mol MAE and r>0.6, with systematic improvement at higher sequence similarity thresholds (>50% vs <40%). Source: Lewis et al., Science 2025, Figure 4B-C.
Figure 19: Stability validation. (D) Very stable proteins from ProThermDB (ΔG < -8 kcal/mol) consistently predicted with high fraction of native contacts (FNC > 0.65). (E) Intrinsically disordered proteins from CALVADOS dataset show radius of gyration predictions correlating with experimental measurements and matching random coil Flory scaling. Source: Lewis et al., Science 2025, Figure 4D-E.
Figure 20: Mechanistic interpretation. Structural explanation of destabilizing mutations: (Left) HHH_rd1_0335 I7P mutation (ΔΔG_pred=1.8 vs ΔΔG_exp=2.1 kcal/mol) shows decreased helicity in first helix. (Right) 2JWS I24D mutation (ΔΔG_pred=2.1 vs ΔΔG_exp=2.9 kcal/mol) shows partial unfolding where hydrophobic-to-charged substitution disrupts core packing. Source: Lewis et al., Science 2025, Figure 4F.

The key advantage over black-box ΔΔG predictors (ThermoMPNN, ESM-1v, ProteinMPNN) is interpretability—BioEmu explains why mutations destabilize by showing structural changes in the ensemble, enabling rational design iterations.


Community reception combines enthusiasm with thoughtful critique

The scientific community has responded with significant enthusiasm. Martin Steinegger (Seoul National University) stated that “protein dynamics is the next frontier in discovery” and BioEmu “marks a significant step in this direction by enabling blazing-fast sampling of the free-energy landscape.” Zhidian Zhang (MIT) praised BioEmu for predicting “the distribution of different conformations, which is a much more difficult problem” than static structure prediction. Alberto Perez (University of Florida) expressed intention to “use BioEmu in my own work” and appreciation for the open-source release.

Nature Methods published a research highlight noting the “urgent need for methods that predict protein structural changes at scale.” Chemical & Engineering News positioned BioEmu as going “beyond AlphaFold.” Microsoft CEO Satya Nadella mentioned BioEmu in a July 2025 post about reducing protein motion analysis from years to hours.

Papers citing BioEmu have already emerged. eRMSF (Arantes et al., J Chem Inf Model 2025) provides a Python package specifically designed for ensemble-based RMSF analysis including BioEmu-generated ensembles. ESMDynamic (Kleiman et al., bioRxiv 2025) benchmarks against BioEmu, claiming to match or outperform it for transient contact prediction while offering orders-of-magnitude faster inference.

However, Sarfaraz K. Niazi (University of Illinois) published a formal critique in Science on August 26, 2025, titled “The Quantum Paradox of Protein Characterization Invalidates BioEmu’s Structure-Based Ensemble Modeling,” arguing that BioEmu’s “central premise—that computationally predicted ensembles reliably represent functionally relevant protein conformations—stands on a misleading epistemology.” He invoked the “quantum paradox of protein characterization,” positing that measurement or prediction inherently disrupts the thermodynamic environments essential to function (Niazi 2025). The authors themselves acknowledge clear limitations: soluble proteins only, single chains at a fixed 300 K, no ligand or membrane interactions, and empirically learned rather than physics-based distributions.


Drug discovery applications center on cryptic pockets and protein stability

BioEmu’s ability to sample conformational changes has immediate drug discovery relevance. Cryptic pocket prediction—identifying binding sites absent in ground-state structures—could potentially double the druggable proteome. PocketMiner (Meller et al., Nature Communications 2023) achieved ROC-AUC of 0.87 for cryptic pocket identification, but BioEmu enables direct sampling of pocket-forming conformations rather than prediction from static features.

Traditional MD simulations require 100 microseconds to 10 milliseconds of simulation time for comprehensive conformational discovery, achievable only with special-purpose supercomputers (D.E. Shaw’s Anton) or massive distributed computing (Folding@home). BioEmu generates equilibrium ensemble snapshots in minutes, enabling proteome-scale analysis. The critical trade-off: BioEmu generates statistical samples from the equilibrium distribution, not time-ordered trajectories, so kinetic pathways between states cannot be modeled.

For protein stability prediction, BioEmu’s PPFT-derived thermodynamic accuracy complements existing methods. ThermoMPNN (Dieckhaus et al., PNAS 2024) achieves state-of-the-art benchmark performance through transfer learning, while RaSP provides deep learning predictions at >10,000x the speed of Rosetta. BioEmu’s unique contribution is explaining structure-stability relationships by analyzing generated ensembles—revealing mechanistic causes of mutant destabilization.

Industry adoption is accelerating. Relay Therapeutics uses its Dynamo platform for protein dynamics in drug discovery. Recursion conducts 2M+ experiments weekly with AI integration. Cradle raised $73M in 2024 for AI-powered protein engineering with major pharma partnerships. The investment thesis is compelling: AI-discovered drugs show 80-90% Phase 1 success rates compared to industry averages.


Technical resources enable practitioners to implement these methods

Yang Song’s foundational blog post provides the definitive introduction to score-based generative models, with Google Colab tutorials in JAX and PyTorch. For SE(3) equivariance, the official e3nn tutorial covers spherical tensor data types and practical implementation. The MIT course on flow matching (diffusion.csail.mit.edu) offers systematic development with hands-on Colab labs.

BioEmu’s code, model weights, and the largest sequence-diverse protein simulation dataset publicly available (>200ms of trajectories) have been released through Microsoft Research. The bioRxiv preprint provides full methodological details. For MSM reweighting, the Girsanov reweighting implementation guide (Schäfer and Keller, J Phys Chem B 2024) offers integration with OpenMM and the Deeptime time-series analysis package.


Conclusion: A transformative tool with clearly defined boundaries

BioEmu represents a genuine breakthrough in computational biology—not as a replacement for MD simulation, but as a complementary tool enabling new scales of hypothesis generation. The combination of AlphaFold-derived sequence encoding, diffusion-based structure generation, and experimental data integration through PPFT achieves what neither pure physics-based simulation nor pure data-driven learning could accomplish alone.

The key insight is amortization: the enormous cost of generating 200+ milliseconds of MD trajectories and half a million stability measurements is paid once during training, then amortized across infinite future predictions. For any new protein sequence, BioEmu provides rapid access to thermodynamically calibrated conformational ensembles without new simulations.

Critical limitations deserve emphasis. BioEmu generates equilibrium snapshots, not dynamics—kinetic mechanisms remain inaccessible. The restriction to soluble, single-chain proteins at 300K excludes membrane proteins, protein complexes, and temperature-dependent phenomena crucial for many therapeutic targets. The empirical learning of energy landscapes, rather than physics-based potential energy functions, means extrapolation beyond training data may be unreliable.

These boundaries are clearly defined, enabling appropriate use. For high-throughput screening of conformational changes across protein families, cryptic pocket discovery, and stability prediction, BioEmu offers transformative capability. For detailed mechanistic understanding of specific drug-target interactions, traditional MD with explicit ligands and membranes remains essential. The future likely lies in hybrid workflows combining both approaches—ML for rapid exploration, physics for detailed characterization.


Main Reference

Lewis et al., Scalable emulation of protein equilibrium ensembles with generative deep learning, Science 2025.
