STAT+: AbbVie, J&J to add proprietary data to AI protein model in bid to accelerate drug discovery

Imagine standing on a vast, dark plain. Without light, you cannot see dips and rolls in the grass or make out hills and valleys. Even if there’s a city off in the distance to your right, it does nothing to illuminate the darkness on your left, unless there are pinpricks of light there that might indicate a mountain or level ground. So it is with the vast, unexplored drug-hunting territory of chemical space, which is waiting to be illuminated by data’s light.
Every AI model trained for biology can see only what is illuminated by the data points it is trained on. AlphaFold succeeded in predicting protein structures because the 200,000 or so known protein structures in the Protein Data Bank covered enough of the limited ways amino acids can combine for the model to learn what almost the entire protein structure space looked like. But ask the PDB for only the structures where proteins are hugging other proteins or, even rarer, interacting with drug-like molecules, and there’s nowhere near enough illumination for AI biology models to understand what the topography of those plains looks like, much less make useful predictions for drug discovery.
Life sciences data company Apheris on Thursday announced an effort to boost the capabilities of protein AI models by pooling several pharmaceutical companies’ proprietary data. Apheris’ consortium of pharma companies is partnering with OpenFold3, Columbia professor Mohammed AlQuraishi’s open-source replica of AlphaFold3, to train the model on AbbVie’s and Johnson & Johnson’s vast stores of structural data. The collaboration will focus on structures relevant to drug discovery, such as small molecule-protein and antibody-antigen interactions.