Mohammed AlQuraishi

@MoAlQuraishi

13 tweets · Jun 23, 2022
We have successfully trained OpenFold from scratch, our trainable PyTorch implementation of AlphaFold2. The new OpenFold (OF) (slightly) outperforms AlphaFold2 (AF2). I believe this is the first publicly available reproduction of AF2. We learned a lot. A 🧵 1/12
Back to the model: as this scatterplot shows (GDT_TS scores on a CAMEO-based validation set), OF accuracy is very comparable to AF2's but slightly higher on average, perhaps because of our slightly larger training set. 3/12
A key finding is that AF2/OF accuracy climbs very sharply, then tapers off into a long, gradual increase. While total training took ~100K A100-hours, 90% of final accuracy could be achieved in ~3K hours. This has important implications for training AF2/OF variants. 4/12
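The compute trade-off above can be checked with quick arithmetic (the ~100K and ~3K figures are the thread's approximate numbers):

```python
# Approximate figures from the thread: ~100K A100-hours total training,
# ~3K A100-hours to reach ~90% of final accuracy.
total_hours = 100_000
early_hours = 3_000

fraction = early_hours / total_hours
print(f"{fraction:.1%} of the compute")  # -> 3.0% of the compute

# So roughly 97% of the A100-hours go toward the last ~10% of accuracy.
remaining = 1 - fraction
print(f"{remaining:.0%} spent on the final stretch")  # -> 97% spent on the final stretch
```

That ~3% figure is what makes cheaper AF2/OF variants plausible: a model trained to the knee of the curve costs a small fraction of a full run.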
Our PyTorch implementation has some advantages over the publicly available JAX implementation from DeepMind, beyond the obvious one of being trainable. 5/12
1st is speed: OF inference is up to 2x faster on short proteins, even excluding JAX compilation time. On longer proteins the advantage lessens, until AF2 begins to OOM (see 2nd point). Inference speed is key when coupled with fast MSA schemes like MMseqs2. 6/12
2nd is memory: we use less due to optimizations and custom CUDA kernels, enabling inference of much longer sequences. In general we get up to ~4,600 residues on a 40GB A100 and we believe we can optimize further. 7/12
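As a rough back-of-the-envelope check (illustrative only, not OpenFold's actual memory profile), the headline numbers above imply an average per-residue memory budget of:

```python
# Rough average implied by the thread's numbers: ~4,600 residues fit
# on a 40 GB A100. Real usage is not linear in sequence length
# (attention-style modules grow superlinearly), so this is only an
# illustrative average, not a model of OpenFold's memory behavior.
gpu_memory_gb = 40
max_residues = 4_600

mb_per_residue = gpu_memory_gb * 1024 / max_residues
print(f"~{mb_per_residue:.1f} MB per residue on average")  # -> ~8.9 MB per residue on average
```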
Preprint coming soon, with more details about what we learned during training and lots of ablation studies. 8/12
This was a big effort within the lab and with many external collaborators. Internally credit goes to the OF team led by @gahdritz (w/@SachinKadyan99, Luna Xia, Will Gerecke) and co-advised by @NazimBouatta and me. 9/12
Externally our collaborators at @nyuniversity (@dabkiel1), @ArzedaCo (Andrew Ban), @cyrusbiotech (@lucas_nivon), @nvidiahealth (@ItsRor, Abe Stern, Venkatesh Mysore, Marta Stepniewska-Dziubinska and Arkadiusz Nowaczynski), ... 10/12
โ€ฆ @OutpaceBio (@BrianWeitzner) and @PrescientDesign (@amw_stanford, @RichBonneauNYU) were pivotal in getting this off the ground and making it a reality. Thank you all! 11/12
This is far from the end of our OpenFold efforts; in fact it is only the beginning. Stay tuned for an exciting announcement soon! 12/12
Some folks I mindlessly forgot to acknowledge: @milot_mirdita, @thesteinegger, and @sokrypton have been incredibly helpful with working out MSA/mmseqs2 issues and providing early feedback on OpenFold.