Accelerating Structure-Guided Drug Development via a Hybrid Generative AI Model for de Novo Protein Structure Prediction Integrating Experimental Cryo-EM Data
Organisations involved
Main Participant: PUXANO is a Belgian SME specialising in AI-driven protein design and cryo-electron microscopy, supporting clients in the pharmaceutical, biotechnology and agricultural sectors.
Technology Expert: CSIC, a leading Spanish research centre, develops advanced computational tools for protein structure reconstruction using cryo-EM imaging.
HPC Provider: IT4I, a major Czech research and innovation centre, operates the Karolina supercomputer and provides high-performance computing expertise for large-scale AI training.
The challenge
Developing new medicines is slow, costly and uncertain, with investment often exceeding US$2.6 billion per research programme. This slows the delivery of new treatments and leaves many patient groups without effective options. Structure-guided drug development can accelerate early research, but current AI-driven protein prediction tools are still not accurate enough to replace labour-intensive experimental methods.
A potential solution is to combine AI models with sparse experimental data so that predictions become both fast and reliable. Drawing on its expertise in cryo-EM sample preparation, PUXANO identified cryo-electron microscopy as the best source of structural constraints. However, no available dataset or workflow could combine AI-generated structures with cryo-EM information at scale.
To address this gap, the team needed to build a new training dataset that blended standard protein structure data with terabytes of experimental and simulated cryo-EM images. They also had to develop an ultrafast alignment algorithm that could run millions of times during model training, and to retrain the Boltz model so that it could learn to incorporate cryo-EM constraints directly into its predictions. This work required far more GPU power than PUXANO's internal computing capacity could provide.
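The case study does not say how HARMONY's simulated cryo-EM images were produced. As a minimal sketch under the simplest possible image model, a simulated image can be generated from atomic coordinates by treating each atom as a 3D Gaussian blob, projecting along the viewing axis and adding white noise at a chosen signal-to-noise ratio. The function `simulate_projection` and its parameters (`box`, `pixel_size`, `sigma`, `snr`) are hypothetical. Realistic simulators also model the contrast transfer function, solvent and random particle orientations, all of which this sketch leaves out.

```python
import numpy as np


def simulate_projection(coords, box=64, pixel_size=1.0, sigma=1.5, snr=0.1, rng=None):
    """Hypothetical helper: noisy 2D projection of atoms viewed along z.

    Each atom is modelled as an isotropic 3D Gaussian of width `sigma` (in Å).
    Integrating such a Gaussian along z gives a 2D Gaussian, so the clean
    projection is a sum of 2D Gaussians (normalisation constants are dropped).
    Returns (clean, noisy) arrays of shape (box, box).
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    rng = np.random.default_rng() if rng is None else rng
    coords = np.asarray(coords, dtype=float).reshape(-1, 3)
    # Pixel positions (in Å) on a grid centred on the origin.
    axis = (np.arange(box) - box // 2) * pixel_size
    yy, xx = np.meshgrid(axis, axis, indexing="ij")  # rows = y, columns = x
    clean = np.zeros((box, box))
    for x, y, _z in coords:  # the z coordinate is integrated out
        clean += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma**2))
    # Additive white Gaussian noise whose variance makes
    # var(signal) / var(noise) equal the requested SNR.
    noise_std = np.sqrt(clean.var() / snr)
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return clean, noisy
```

To simulate other viewing directions, the coordinates can be rotated before calling the function; low `snr` values such as the default 0.1 mimic noisy single-particle images.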
The solution
HARMONY is a hybrid model that combines AI-based protein prediction with sparse cryo-electron microscopy data to deliver accurate and cost-effective structures, up to 10–15 times cheaper than standard cryo-EM. The workflow begins with an AI-generated draft, which is refined using cryo-EM images to correct errors and improve reliability. A key breakthrough was the optimisation of alignment algorithms, making training about 100 times faster.
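The optimised alignment algorithm itself is not described here. One standard way to make a single alignment step cheap is to find the translational offset between a projection of the draft model and an experimental image using the cross-correlation theorem, which replaces an O(N²) search over all shifts with a few O(N log N) FFTs. The sketch below assumes that approach; `align_translation` is a hypothetical helper, and a full cryo-EM alignment would also search over in-plane rotations and 3D orientations.

```python
import numpy as np


def align_translation(reference: np.ndarray, image: np.ndarray) -> tuple[int, int]:
    """Hypothetical helper, not HARMONY's actual algorithm.

    Returns the integer (dy, dx) such that
    image ≈ np.roll(reference, (dy, dx), axis=(0, 1)).
    """
    if reference.shape != image.shape:
        raise ValueError("images must have the same shape")
    # Subtract means so the peak reflects structure, not overall brightness.
    ref = reference - reference.mean()
    img = image - image.mean()
    # Circular cross-correlation via the FFT: IFFT(F(img) * conj(F(ref))).
    xcorr = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    dy, dx = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak indices past the midpoint correspond to negative shifts (wrap-around).
    ny, nx = xcorr.shape
    if dy > ny // 2:
        dy -= ny
    if dx > nx // 2:
        dx -= nx
    return int(dy), int(dx)
```

Because the FFTs dominate the cost, this pattern can be batched over many images at once, which is one reason FFT-based alignment maps well onto GPUs.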
Developing HARMONY required analysing terabytes of cryo-EM data and 200,000 GPU hours of training, far beyond PUXANO's internal capacity. Using the EuroHPC Karolina supercomputer, PUXANO and its partners trained a tailored model for de novo structure prediction. HARMONY will be integrated into PUXANO's services and offered as a software package to accelerate structure-guided drug development.
Impact
HARMONY strengthens PUXANO's position as a platform-based contract research organisation (CRO) by enabling faster, more reliable protein structure determination at a fraction of the cost of traditional methods. Access to HPC resources allowed the company to develop a proprietary model that blends AI and experimental evidence, giving clients a practical alternative to time-consuming cryo-EM workflows.
For the pharmaceutical, biotechnology and agricultural sectors, the ability to generate accurate structures more quickly supports better decision-making in early-stage drug development. Clients can screen more antibody variants, vaccine candidates or small molecules within the same budget, improving project success rates and reducing risks.
Social impact stems from the potential acceleration of new therapeutics, which may help patient groups who currently face limited treatment options. The efficiency gains also reduce laboratory resource use, lowering long-term energy and material consumption during early research phases.
Benefits
- 20× faster alignment cuts per-iteration time from 30 s to 1.5 s, enabling large-scale AI training on cryo-EM data.
- Hybrid AI–cryo-EM workflow reduces structure-determination costs by 10–15× versus standard cryo-EM methods for clients.
- Screening capacity rises from 1–3 to 12–24 variants per session, improving early drug discovery decisions and project success rates.
- New HPC and data-processing know-how strengthens PUXANO’s platform and supports future AI-driven protein design services.
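The first benefit can be checked with simple arithmetic: 30 s / 1.5 s is a 20× speedup. The sketch below also scales both per-iteration times to a hypothetical one million iterations, the order of magnitude suggested by the "millions of times" mentioned in the challenge description, assuming the iterations run one after another on a single worker. The helper names `speedup` and `wall_hours` are illustrative only.

```python
# Per-iteration times from the Benefits list; the iteration count is hypothetical.
BEFORE_S = 30.0  # seconds per iteration before optimisation
AFTER_S = 1.5    # seconds per iteration after optimisation


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of old to new per-iteration time."""
    return before_s / after_s


def wall_hours(per_iter_s: float, iterations: int) -> float:
    """Total serial compute time in hours for a given number of iterations."""
    return per_iter_s * iterations / 3600.0


if __name__ == "__main__":
    n = 1_000_000  # hypothetical iteration count, assumed to run serially
    print(f"speedup: {speedup(BEFORE_S, AFTER_S):.0f}x")
    print(f"before: {wall_hours(BEFORE_S, n):,.0f} h, after: {wall_hours(AFTER_S, n):,.0f} h")
```

Under these assumptions, one million iterations shrink from roughly 8,333 hours to about 417 hours of serial compute; in practice the work is spread across many GPUs, so wall-clock times would be shorter.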