---
title: BeeCoder Demo
emoji: 🐝
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 3.28.3
app_file: app.py
pinned: true
license: apache-2.0
---

# 🐝BeeCoder Demo🐝

## Code-Completion Playground 💻 with 🐝[BeeCoder](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python) Models

This is a demo playground for generating Python code with 🐝[BeeCoder](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python), a version of the tiny [101M base model](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA) that was **fine-tuned** on a dataset of PyPI packages.

ℹ️ This is a code-completion tool, not an instruction-following model: it continues the code you give it rather than answering requests.
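
Since the model only continues text, a prompt tends to work better when it is written as the beginning of a Python file (a signature plus a docstring) rather than as a request. The following is a minimal sketch of that idea, not the code this app runs. The helper names `build_completion_prompt`, `complete`, and `load_generator` are hypothetical, and `load_generator` assumes that `transformers` and a backend such as PyTorch are installed and that the model can be downloaded from the Hub.

```python
from typing import Callable

# Model ID taken from the BeeCoder links in this README.
MODEL_ID = "BEE-spoke-data/smol_llama-101M-GQA-python"


def build_completion_prompt(signature: str, docstring: str) -> str:
    """Phrase a request as code to be continued (hypothetical helper)."""
    return f'{signature}\n    """{docstring}"""\n'


def complete(prompt: str, generate: Callable[[str], str]) -> str:
    """Return only the text generated after the prompt (hypothetical helper)."""
    full_text = generate(prompt)
    # Text-generation backends often echo the prompt; strip it when present.
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text


def load_generator(max_new_tokens: int = 64) -> Callable[[str], str]:
    """Build a generate function on top of the transformers pipeline.

    Assumption: transformers is installed and the model is downloadable.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("text-generation", model=MODEL_ID)

    def generate(prompt: str) -> str:
        # Greedy decoding keeps the sketch deterministic.
        outputs = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False)
        return outputs[0]["generated_text"]

    return generate
```

With the dependencies available, `complete(build_completion_prompt("def add(a, b):", "Return the sum of a and b."), load_generator())` would return the model's continuation of the function body, which still needs to be reviewed before use.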

---

**Intended Use**: This app and its [supporting model](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python) are provided for demonstration purposes only and are not a replacement for human expertise. For more details on the model, please refer to the [model card](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python).

In our country, we say _"To let 100M parameters model generate python script and not validate is like to let monkey fly a plane"_. So please be careful with the generated code.

---

## Base Model Information

The base model, smol_llama-101M-GQA, was pretrained on relatively few (< ~20B) high-quality tokens. It is tiny (101M parameters) but performs relatively well for its size. The training data for the base model included datasets such as:

- [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)
- [pszemraj/simple_wikipedia_LM](https://huggingface.co/datasets/pszemraj/simple_wikipedia_LM)
- [BEE-spoke-data/wikipedia-20230901.en-deduped](https://huggingface.co/datasets/BEE-spoke-data/wikipedia-20230901.en-deduped)
- [mattymchen/refinedweb-3m](https://huggingface.co/datasets/mattymchen/refinedweb-3m)

You can find more information about the base model [here](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).

---

### Credits

This app is modified from a demo playground originally built for [StarCoder](https://huggingface.co/bigcode/starcoder) by [BigCode](https://huggingface.co/bigcode). You can find the original demo [here](https://huggingface.co/spaces/bigcode/bigcode-playground).

---