LLM sizes, and when to use them: 100M-500M param, encoder-only: you have a straightforward classification/regression task, or you need local embedd