Feb 11, 2023
Here’s the recipe to make never-ending, atmospheric music for games or videos.
1. Image captioner: convert a screenshot of your artwork to a text description.
2. ChatGPT: generate infinite variations with an imaginative touch.
3. Text-to-music model: synthesize audio from each prompt.
Details:🧵
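To make the three steps concrete, here is a minimal sketch of how they could be wired together. It is not code from this thread: `run_recipe`, `music_prompt_stream`, and the three callables (`caption_image`, `vary`, `synthesize`) are hypothetical names, and each callable stands in for whichever captioner, chat model, and text-to-music model you pick.

```python
from itertools import islice
from typing import Any, Callable, Iterator, List


def music_prompt_stream(
    caption: str,
    vary: Callable[[str, int], List[str]],
    batch_size: int = 5,
) -> Iterator[str]:
    """Yield music prompts indefinitely by asking `vary` for batch after batch."""
    while True:
        batch = vary(caption, batch_size)
        if not batch:
            return  # stop instead of looping forever on an empty reply
        yield from batch


def run_recipe(
    image_path: str,
    caption_image: Callable[[str], str],      # step 1: image -> text
    vary: Callable[[str, int], List[str]],    # step 2: text -> prompt variations
    synthesize: Callable[[str], Any],         # step 3: prompt -> audio clip
    max_clips: int,
    batch_size: int = 5,
) -> List[Any]:
    """Run the recipe once and return up to `max_clips` synthesized clips."""
    caption = caption_image(image_path)
    prompts = music_prompt_stream(caption, vary, batch_size)
    # islice caps the otherwise endless stream at max_clips prompts.
    return [synthesize(prompt) for prompt in islice(prompts, max_clips)]
```

Because the prompt stream is a generator, new variations are only requested when the previous batch runs out, which is what lets the music keep going for as long as you keep pulling clips.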
Step 1: image captioner. You can use open-source models like GIT (github.com) or BLIP (arxiv.org).
Here’s a free app hosted on HuggingFace Spaces: huggingface.co
2/
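If you would rather run the captioner locally than use the hosted app, a rough sketch with BLIP through the Hugging Face `transformers` library could look like this. The checkpoint name `Salesforce/blip-image-captioning-base` is my assumption (the thread does not name one), and `caption_image` and `clean_caption` are hypothetical helper names. It needs `transformers`, `torch`, and `Pillow` installed.

```python
def clean_caption(text: str) -> str:
    """Collapse whitespace and drop a trailing period so the caption slots into a prompt."""
    return " ".join(text.split()).rstrip(".")


def caption_image(
    image_path: str,
    model_id: str = "Salesforce/blip-image-captioning-base",  # assumed checkpoint
) -> str:
    """Return a one-line text description of the image at `image_path`."""
    # Heavy imports are deferred so this module loads even without them installed.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    return clean_caption(processor.decode(output_ids[0], skip_special_tokens=True))
```

Calling `caption_image("screenshot.png")` downloads the weights on first use; the returned string is what gets handed to the next step.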
Step 2: ChatGPT is able to generate lots of variations of the same scene description, while also filling in details that it imagines on the fly. These will become the prompts to synthesize an infinite stream of atmospheric music.
Below is the template you can use 👇
3/
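The template itself was shared as an image and is not reproduced in this text, so the sketch below is a hypothetical stand-in rather than the original wording. It shows one way to build the request for the chat model and to pull the variations back out of a numbered-list reply; `TEMPLATE`, `build_variation_prompt`, and `parse_variations` are all names I made up for illustration.

```python
import re
from typing import List

# Hypothetical wording, not the template from the thread.
TEMPLATE = (
    'Here is a description of a scene: "{caption}".\n'
    "Write {n} different short descriptions of atmospheric background music "
    "that would fit this scene. Imagine extra details such as mood, "
    "instruments, and tempo. Return a numbered list, one description per line."
)

# Matches lines such as "1. text" or "2) text".
_ITEM = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")


def build_variation_prompt(caption: str, n: int = 5) -> str:
    """Fill the template with the image caption and the number of variations wanted."""
    return TEMPLATE.format(caption=caption, n=n)


def parse_variations(reply: str) -> List[str]:
    """Extract the numbered items from the chat model's reply, ignoring other lines."""
    items = []
    for line in reply.splitlines():
        match = _ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items
```

Paste the output of `build_variation_prompt(...)` into ChatGPT (or send it through whatever chat API you use), then feed the reply to `parse_variations` to get one music prompt per list item.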
Step 3: use any text-to-music model to synthesize the audio. Here’s another open-source model (“AudioLDM”) hosted on HuggingFace for you to try out today, for free!
huggingface.co
Thanks @_akhaliq for sharing this.
4/
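For running AudioLDM yourself instead of the hosted demo, here is a rough sketch using the `diffusers` library's `AudioLDMPipeline`. The checkpoint `cvssp/audioldm-s-full-v2`, the step count, and the clip length are my assumptions, not settings from the thread; `synthesize` and `write_wav` are hypothetical helper names. It needs `diffusers` and `torch` installed, and a GPU helps a lot.

```python
import struct
import wave

SAMPLE_RATE = 16000  # AudioLDM generates 16 kHz mono audio


def write_wav(path_or_file, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1, 1] as 16-bit mono PCM, clipping anything outside."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, float(s))) * 32767))
        for s in samples
    )
    with wave.open(path_or_file, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def synthesize(
    prompt: str,
    seconds: float = 10.0,
    steps: int = 50,
    model_id: str = "cvssp/audioldm-s-full-v2",  # assumed checkpoint
):
    """Generate one audio clip for `prompt` and return it as an array of float samples."""
    # Deferred import so write_wav stays usable without diffusers installed.
    from diffusers import AudioLDMPipeline

    pipe = AudioLDMPipeline.from_pretrained(model_id)
    result = pipe(prompt, num_inference_steps=steps, audio_length_in_s=seconds)
    return result.audios[0]
```

A typical call would be `write_wav("clip.wav", synthesize("slow ambient pads with distant wind chimes"))`. For a long stream you would load the pipeline once and reuse it across prompts rather than reloading it per clip as this short sketch does.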
If you are interested in how the latest and greatest music generative models work, below is my deep dive thread.
TL;DR: an AI storm is coming for music & SFX industries.
5/
I open-source many AI recipes and ideas. Feel free to check out my past writing and follow @DrJimFan 🙌
