Misha Laskin
Staff Research Scientist @DeepMind. Previously @berkeley_ai. YC alum.
GPT has been a core part of the unsupervised learning revolution that's been happening in NLP. In part 2 of the transformer series, we'll build GPT from the ground up. This thread...
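The mechanism that distinguishes GPT from a plain transformer encoder is causal self-attention: each position may only attend to itself and earlier positions. Below is a minimal numpy sketch of that masking step, not the thread's actual implementation; the variable names are illustrative.

```python
import numpy as np

T = 4  # sequence length

# upper-triangular mask: True where position j is in the future of position i
mask = np.triu(np.ones((T, T)), k=1).astype(bool)

# toy attention scores; masked entries are set to -inf before the softmax,
# so they contribute exp(-inf) = 0 probability mass
scores = np.zeros((T, T))
scores[mask] = -np.inf

weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
# row i is a distribution over positions 0..i only, e.g. row 0 is [1, 0, 0, 0]
```

Because the mask zeroes out future positions before normalization, each row of `weights` still sums to 1 while attending only to the past.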
Transformers are arguably the most impactful deep learning architecture from the last 5 yrs. In the next few threads, we'll cover multi-head attention, GPT and BERT, Vision Transf...
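At the core of all of these models is multi-head attention: project the input into queries, keys, and values, split the model dimension across heads, compute scaled dot-product attention per head, then concatenate and project back. A minimal numpy sketch of that computation (single sequence, no mask; weight names are illustrative, not from the thread):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    # x: (seq_len, d_model); each W: (d_model, d_model)
    T, d = x.shape
    dh = d // num_heads  # per-head dimension
    # project, then split the feature axis into heads: (heads, seq_len, dh)
    q = (x @ Wq).reshape(T, num_heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(T, num_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, num_heads, dh).transpose(1, 0, 2)
    # scaled dot-product attention per head: (heads, T, T)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    out = attn @ v                              # (heads, T, dh)
    out = out.transpose(1, 0, 2).reshape(T, d)  # concatenate heads
    return out @ Wo                             # final output projection

rng = np.random.default_rng(0)
d_model, seq_len, heads = 8, 5, 2
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
y = multi_head_attention(x, Wq, Wk, Wv, Wo, heads)
print(y.shape)  # (5, 8)
```

The reshape/transpose trick lets all heads run as one batched matrix multiply instead of a Python loop over heads.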
Patch extraction is a fundamental operation in deep learning, especially for computer vision. By the end of this thread, you'll know how to implement an efficient vectorized patch...
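For non-overlapping patches (as in a Vision Transformer front end), the vectorized version is a reshape/transpose with no Python loops. A sketch of that approach in numpy, assuming the image height and width are divisible by the patch size; this is one common recipe, not necessarily the thread's exact implementation:

```python
import numpy as np

def extract_patches(img, p):
    # img: (H, W, C) with H and W divisible by patch size p
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)     # (H//p, W//p, p, p, C): group by patch
    return x.reshape(-1, p * p * C)    # (num_patches, patch_dim), row-major

img = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
patches = extract_patches(img, 2)
print(patches.shape)  # (4, 12): four 2x2 patches, each flattened to 12 values
```

Each output row is one patch flattened in row-major pixel order, which is exactly the layout a linear patch-embedding matrix expects.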