Matt Shumer

@mattshumer_

8 Tweets 4 reads Jul 27, 2024
Introducing `llama-405b-to-8b`
Get the quality of Llama 3.1 405B, at a fraction of the cost and latency.
Give one example of your task, and 405B will teach 8B (~30x cheaper!!) how to do the task perfectly.
And it's open-source: github.com
This was made in partnership with @OctoAICloud, particularly Ben Hamm, who adapted my existing prompt optimization tools to take advantage of the new Llama 3.1 models.
This approach was inspired by this tweet that went viral months ago.
I discovered that if you prompt Haiku w/ Opus-generated examples, it can match Opus' quality.
Now, we have even better 'teacher' models than Opus, and cheaper 'student' models than Haiku.
In production, Llama 3.1 405B-level AI quality at a low cost, with near-instant results, is a game changer.
This notebook makes it possible for anyone to implement this quickly.
So how does it work?
You give the AI a description of your task, along with one input/output example. That's it.
From there, it will generate seven other great, diverse examples that are similar in structure to your example.
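The example-generation step can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function name, prompt wording, and the seed example are all hypothetical, and the resulting prompt would be sent to a "teacher" model such as Llama 3.1 405B via whatever API you use.

```python
def build_example_gen_prompt(task_description, seed_input, seed_output, n_examples=7):
    """Construct a meta-prompt asking the teacher model to produce
    n_examples more diverse examples matching the seed's structure."""
    return (
        f"You are generating examples for this task:\n{task_description}\n\n"
        f"Here is one example:\nINPUT: {seed_input}\nOUTPUT: {seed_output}\n\n"
        f"Generate {n_examples} more diverse examples with the same structure, "
        "formatted as INPUT/OUTPUT pairs."
    )

# Hypothetical seed example for illustration only
prompt = build_example_gen_prompt(
    "Summarize a customer email in one sentence.",
    "Hi, my order arrived broken and I would like a replacement...",
    "Customer reports a damaged order and requests a replacement.",
)
```

The teacher's completion would then be parsed back into structured input/output pairs for the later steps.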
It'll then use those + the task description to generate a system prompt.
It'll then put the examples into the right prompt format and test the generated prompt against your initial input example, using Llama 3.1 8B.
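Formatting the examples for the student model might look like the sketch below. Again, this is an assumed shape (OpenAI-style chat messages with few-shot pairs), not necessarily the repo's exact format; the function name is hypothetical.

```python
def build_student_messages(system_prompt, examples, new_input):
    """Arrange the generated system prompt and few-shot examples into a
    chat-message list for the 'student' model (e.g. Llama 3.1 8B)."""
    messages = [{"role": "system", "content": system_prompt}]
    for ex in examples:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    # The real input to answer goes last
    messages.append({"role": "user", "content": new_input})
    return messages
```

Testing the prompt then just means sending these messages to the 8B model and checking its output against the expected output from your initial example.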
Lastly, it'll save both the system prompt it generated and the AI-created examples in a Python file, pre-formatted for generation.
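The save step amounts to writing the generated artifacts into an importable Python module, along these lines. The function name and variable names here are hypothetical, chosen only to show the idea:

```python
def save_prompt_module(path, system_prompt, examples):
    """Write the generated system prompt and examples to a Python file,
    so they can be imported directly at generation time."""
    with open(path, "w") as f:
        f.write(f"SYSTEM_PROMPT = {system_prompt!r}\n\n")
        f.write(f"EXAMPLES = {examples!r}\n")
```

From then on, your production code can simply `import` that file and pass `SYSTEM_PROMPT` plus the formatted `EXAMPLES` to the cheap 8B model on every request.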
If you're building w/ LLMs, you NEED to try this.
If you'd like to try it or contribute, check out the Github repo, and check out @OctoAICloud if you're looking for scalable/fast/reliable inference for your models!
github.com
