Jim Fan
@NVIDIA AI Scientist. @Stanford PhD. Building multimodal generalist agents. Sharing hot ideas & deep dives! NeurIPS Best Paper: MineDojo. Ex-@OpenAI, @GoogleAI
It only costs $4.3 (!!) to process the ENTIRE Harry Potter series, combined, with ChatGPT's new pricing. That's less than a cup of ☕️ in SF! Economy of scale really casts spells a...
ControlNet is breathing new life into Stable Diffusion. It shows us a powerful idea: Text interface is NOT always universal. Strings only capture "broad strokes", but the future...
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years. How about we ask a machine to do a better job? @GoogleAI uses evolution t...
Here's the recipe to make never-ending, atmospheric music for games or videos. 1. Image captioner: convert a screenshot of your artwork to text description. 2. ChatGPT: generate i...
DALL-E generates pixels from text. Now meet its cousin, VALL-E, that generates audio from text @MSFTResearch! VALL-E's resemblance to DALL-E v1 and Parti @GoogleAI is striking. Im...
The AI explosion is warping our sense of time. Can you believe Stable Diffusion is only 4 months old, and ChatGPT