May 05, 2023
Introducing: 💫StarCoder
StarCoder is a 15B LLM for code with an 8k context, trained only on permissively licensed data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and to act as a Tech Assistant.
Try it here: shorturl.at
Release thread🧵
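For readers new to the metric: pass@1 is the fraction of HumanEval problems solved by a single sampled completion. Below is a minimal sketch of the widely used unbiased pass@k estimator, computed from n samples per problem of which c pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are illustrative, and this is not necessarily the exact evaluation code behind the number above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains
    # at least one correct sample: 1 - C(n - c, k) / C(n, k).
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n per problem, so a problem with 8 passing samples out of 20 contributes roughly 0.4.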
In addition to chatting with you, StarCoder can help you code in the new VSCode plugin. Press CTRL+ESC to check whether the current code was in the pretraining dataset!
marketplace.visualstudio.com
We present the most extensive evaluation of code LLMs to date in the full tech report with 68 (!) authors.
You can also read up on all the details from data preprocessing and governance to training at scale!
drive.google.com
StarCoder was also trained on Jupyter notebooks, and with the Jupyter plugin from @JiaLi52524397 it can use previous code and markdown cells, as well as their outputs, to predict the next cell.
You can install it here or search for it on the Chrome Web Store: github.com
We release StarCoder under an OpenRAIL license agreement. This OpenRAIL: (i) makes it more viable for companies to use and share the model; and (ii) promotes the sharing of AI documentation along the value chain.
huggingface.co
For example, the folks at @refact_ai are working on a shiny VSCode extension that can now use StarCoder to autocomplete or refactor code, as well as write code from an instruction!
refact.ai
With @TolokaAI we recruited 1,399 crowd-workers across 35 countries to annotate a diverse dataset for PII in code. Our PII detection model surpasses regex-based tools, especially for secret keys. The PII dataset and model are available via gated access.
hf.co
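To make the regex-based baseline concrete, here is a minimal sketch of what such a tool might look like. The patterns and the `find_pii` helper are hypothetical illustrations; they are neither the tools compared against nor the released PII model. The crude "long opaque token" rule hints at why secret keys are hard to catch with patterns alone.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Any run of 32+ key-like characters; prone to false positives
    # (hashes, long identifiers) and misses shorter secrets.
    "key_like": re.compile(r"\b[A-Za-z0-9_\-]{32,}\b"),
}


def find_pii(source: str) -> list[tuple[str, str, int]]:
    """Return (label, matched_text, start_offset) hits, ordered by offset."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(source):
            hits.append((label, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    snippet = 'admin = "jane@example.com"\nhost = "192.168.0.1"\n'
    for label, text, offset in find_pii(snippet):
        print(f"{offset:4d}  {label:8s}  {text}")
```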
