Image by author
StarCoder is a modern multi-language model designed specifically for coders. With an impressive 15.5B parameters and 8K extended context length, it excels in fill capabilities and facilitates fast large batch inference with multiple query focus.
StarCoderBase was trained on a massive database of 1 trillion tokens from The Stack. This collection consists of permissively licensed GitHub repositories, complete with verification tools and an opt-out process for privacy-conscious developers. To further increase its performance, the BigCode team fine-tuned StarCoderBase using 35B Python symbols.
As a result, StarCoder emerges as a powerful and sophisticated language model equipped with remarkable capabilities for a wide range of coding tasks.
Image from StarCoder Paper
StarCoderBase outperforms all existing open source code language models, providing support for multiple programming languages and showing exceptional performance, even surpassing the popular OpenAI code-cushman-001 model in terms of quality and results. Furthermore, StarCoder can be prompted to achieve 40% pass@1 in HumanEval. It outperforms the LaMDA, LLaMA and PaLM models.
Read the research paper to learn more about model evaluation.
BigCode – The StarCoder code completion playground is a great way to test model capabilities. You can play with different model formats, prefixes and add-ons to get the full experience.
In my opinion, it’s a great tool for code completion, especially for Python code. However, it has some drawbacks such as outdated APIs, hallucinations, displaying Jupyter Notebook metadata, and incomplete code.
The best way to code with StarCoder is to use well-explained comments. This will help the model better understand what you are trying to do and generate more accurate results.
Image from StartCoder Code Completion
If you are used to the ChatGPT style of code generation, you should try StarChat for code generation and optimization.
StarChat is a specialized version of StarCoderBase that has been fine-tuned on the Dolly and OpenAssistant datasets, resulting in a truly invaluable coding assistant. It’s a 16 billion parameter model pre-trained on one trillion tokens from 80+ programming languages, GitHub issues, Git commits, and Jupyter notebooks.
You can provide the command to StarChat and it will output the code with an explanation. You can also use the following instructions to modify the code.
Image from StarChat Playground
HF Code Autocomplete is a free and open source alternative to GitHub Copilot powered by StarCoder. I have been using it since its launch and I am quite impressed with its speed and accuracy.
HF code autocompletion VSCode extension
It works with all file types in Jupyter Notebook and VSCode. You just need to install the extension from the market and add the Hugging Face API.
Image by |: VSCode:
We have a constant need for advanced code assistants in our workplace who can efficiently manage repetitive scenarios while helping to build more complex systems.
In this blog, we have thoroughly explored StarCoder and its various applications. It’s worth noting that the open source community is tirelessly dedicated to pushing the boundaries of code assistance, constantly striving to deliver breakthrough solutions that enhance our coding experience and productivity.
I hope you enjoyed reading this blog and found it informative and insightful. Follow me on LinkedIn if you want to learn more about the latest AI technology.
Abid Ali Awan (@1abidaliawan:) is a certified data scientist who loves building machine learning models. He currently focuses on content creation and writes technical blogs on machine learning and data science technologies. Abid holds an MSc in Technology Management and a BS in Telecommunications Engineering. His vision is to create an AI product using a graph neural network for students struggling with mental illness.