Google DeepMind and NVIDIA Partner With EMBL to Release Millions of AI Predicted Protein Complex Structures for Global Health Research

EMBL, Google DeepMind, and NVIDIA release the largest-ever dataset of AI-predicted protein complexes to accelerate global health research.

By: AXL Media

Published: Mar 18, 2026, 8:50 AM EDT

Source: Information for this report was sourced from European Molecular Biology Laboratory


Visualizing the Building Blocks of Molecular Interaction

The landscape of digital biology has shifted markedly with the release of millions of AI-predicted protein complex structures into the public domain. The initiative is a collaboration between Google DeepMind, NVIDIA, EMBL’s European Bioinformatics Institute (EMBL-EBI), and Seoul National University. It addresses a central challenge in modern biology: understanding how proteins interact to carry out essential functions in living cells. Individual protein structures provide a blueprint, but it is their interactions within complexes that drive cell behavior and disease progression. By making these interactions visible, the partnership aims to give scientists the tools to pinpoint molecular malfunctions at a scale that would be impractical to reach through traditional experimental methods alone.

Technical Innovation and Computational Scaling

The scale of this release was made possible by major speedups in deep learning inference and sequence alignment. NVIDIA and the Steinegger Lab at Seoul National University developed an approach that optimized Google DeepMind’s AlphaFold system. That approach made it feasible to process a workload that would otherwise have required 17 million GPU-hours. Because the predictions are precomputed and openly available, researchers without access to large supercomputing clusters can download and analyze high-confidence structures directly. This move toward "open science at scale" lets the global research community focus on biological discovery rather than on the logistical burden of generating the data.

Focusing on Global Health and Priority Pathogens

To maximize the immediate impact on human welfare, the newly released dataset focuses on 20 of the most heavily studied species, including humans, and a specific list of bacterial priority pathogens identified by the World Health Organization. The database now includes 1.7 million high-confidence "homodimer" predictions—complexes formed by two identical proteins—which are essential for understanding human metabolism and regulation. By prioritizing these specific targets, the collaboration provides an immediate resource for scientists working on the next generation of antibiotics and treatments for chronic conditions, effectively lowering t...
