---
base_model: BEE-spoke-data/smol_llama-101M-GQA-python
datasets:
- BEE-spoke-data/pypi_clean-deduped
inference: false
language:
- en
license: apache-2.0
metrics:
- accuracy
model_creator: BEE-spoke-data
model_name: smol_llama-101M-GQA-python
pipeline_tag: text-generation
quantized_by: afrideva
source_model: BEE-spoke-data/smol_llama-101M-GQA
tags:
- python
- codegen
- markdown
- smol_llama
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
widget:
- example_title: Add Numbers Function
  text: "def add_numbers(a, b):\n    return\n"
- example_title: Car Class
  text: "class Car:\n    def __init__(self, make, model):\n        self.make = make\n        self.model = model\n\n    def display_car(self):\n"
- example_title: Pandas DataFrame
  text: "import pandas as pd\n\ndata = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}\ndf = pd.DataFrame(data).convert_dtypes()\n# eda\n"
- example_title: Factorial Function
  text: "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n"
- example_title: Fibonacci Function
  text: "def fibonacci(n):\n    if n <= 0:\n        raise ValueError(\"Incorrect input\")\n    elif n == 1:\n        return 0\n    elif n == 2:\n        return 1\n    else:\n"
- example_title: Matplotlib Plot
  text: "import matplotlib.pyplot as plt\nimport numpy as np\n\nx = np.linspace(0, 10, 100)\n# simple plot\n"
- example_title: Reverse String Function
  text: "def reverse_string(s:str) -> str:\n    return\n"
- example_title: Palindrome Function
  text: "def is_palindrome(word:str) -> bool:\n    return\n"
- example_title: Bubble Sort Function
  text: "def bubble_sort(lst: list):\n    n = len(lst)\n    for i in range(n):\n        for j in range(0, n-i-1):\n"
- example_title: Binary Search Function
  text: "def binary_search(arr, low, high, x):\n    if high >= low:\n        mid = (high + low) // 2\n        if arr[mid] == x:\n            return mid\n        elif arr[mid] > x:\n"
---

# BEE-spoke-data/smol_llama-101M-GQA-python-GGUF

Quantized GGUF model files for [smol_llama-101M-GQA-python](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data)

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [smol_llama-101m-gqa-python.fp16.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.fp16.gguf) | fp16 | None |
| [smol_llama-101m-gqa-python.q2_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q2_k.gguf) | q2_k | None |
| [smol_llama-101m-gqa-python.q3_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q3_k_m.gguf) | q3_k_m | None |
| [smol_llama-101m-gqa-python.q4_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q4_k_m.gguf) | q4_k_m | None |
| [smol_llama-101m-gqa-python.q5_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q5_k_m.gguf) | q5_k_m | None |
| [smol_llama-101m-gqa-python.q6_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q6_k.gguf) | q6_k | None |
| [smol_llama-101m-gqa-python.q8_0.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q8_0.gguf) | q8_0 | None |
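These are standard GGUF files, so they should load in any llama.cpp-based runtime. Below is a minimal sketch using the third-party `llama-cpp-python` bindings; the choice of the q4_k_m file, the prompt, and the stop sequences are illustrative assumptions, not part of this repo.

```python
# pip install llama-cpp-python huggingface-hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quantized files from this repo (q4_k_m picked arbitrarily)
model_path = hf_hub_download(
    repo_id="afrideva/smol_llama-101M-GQA-python-GGUF",
    filename="smol_llama-101m-gqa-python.q4_k_m.gguf",
)

llm = Llama(model_path=model_path)

# Complete a small Python stub; stop at the next top-level definition
result = llm(
    "def add_numbers(a, b):\n    return",
    max_tokens=48,
    stop=["\ndef ", "\nclass "],
)
print(result["choices"][0]["text"])
```

The same files should also work with the `llama.cpp` CLI or any other GGUF-compatible runtime.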
## Original Model Card:

# smol_llama-101M-GQA: python

> 400MB of buzz: pure Python programming nectar! 🍯

This model is the general pre-trained checkpoint `BEE-spoke-data/smol_llama-101M-GQA` trained on a deduped version of `pypi` for one additional epoch. Play with the model in [this demo space](https://huggingface.co/spaces/BEE-spoke-data/beecoder-playground).

- Its architecture is the same as the base model's, with some new Python-related tokens added to the vocab prior to training.
- It can generate basic Python code and README-style markdown, but will struggle with harder planning/reasoning tasks.
- This is an experiment to test the abilities of smol-sized models in code generation, meaning **both** their capabilities and limitations.

Use with care & understand that there may be some bugs 🐛 still to be worked out.

## Usage

📌 Be sure to note:

1. The model uses the "slow" llama2 tokenizer. Set `use_fast=False` when loading the tokenizer.
2. Use `transformers` library version 4.33.3 due to a known issue in version 4.34.1 (_at time of writing_).

> Which llama2 tokenizer the API widget uses is an age-old mystery, and may cause minor whitespace issues (widget only).

To install the necessary packages and load the model:

```python
# Install necessary packages
# pip install transformers==4.33.3 accelerate sentencepiece

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    "BEE-spoke-data/smol_llama-101M-GQA-python",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "BEE-spoke-data/smol_llama-101M-GQA-python",
    device_map="auto",
)

# The model can now be used as any other decoder
```
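As a quick smoke test once the model is loaded, a plain `generate` call works; the prompt and decoding settings below are only illustrative:

```python
# Minimal sanity check: complete a tiny function stub
prompt = "def add_numbers(a, b):\n    return"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=32,       # keep the smoke test short
    no_repeat_ngram_size=6,  # discourage degenerate repetition
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```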
### longer code-gen example

Below is a quick script that can be used as a reference/starting point for writing your own, better one :)

<details>
<summary>🔥 Unleash the Power of Code Generation! Click to Reveal the Magic! 🔮</summary>

Are you ready to witness the incredible possibilities of code generation? 🚀 Brace yourself for an exceptional journey into the world of artificial intelligence and programming. Behold a script that will change the way you create and complete code, opening the door to a world where machines write code with remarkable precision and imagination.

```python
"""
simple script for testing model(s) designed to generate/complete code

See details/args with the below:
    python textgen_inference_code.py --help
"""
import logging
import random
import time

import fire
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

logging.basicConfig(format="%(levelname)s - %(message)s", level=logging.INFO)


class Timer:
    """
    Basic timer utility: logs elapsed wall-clock time on exit.
    """

    def __enter__(self):
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_time = time.perf_counter()
        self.elapsed_time = self.end_time - self.start_time
        logging.info(f"Elapsed time: {self.elapsed_time:.4f} seconds")


def load_model(model_name, use_fast=False):
    """util for loading model and tokenizer"""
    logging.info(f"Loading model: {model_name}")
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=use_fast)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    model = torch.compile(model)

    return tokenizer, model


def run_inference(prompt, model, tokenizer, max_new_tokens: int = 256):
    """
    run_inference - generate a completion for the given prompt

    Args:
        prompt (str): the text to complete
        model: the loaded causal LM
        tokenizer: the model's tokenizer
        max_new_tokens (int, optional): max new tokens to generate

    Returns:
        str: the decoded model output
    """
    logging.info(f"Running inference with max_new_tokens={max_new_tokens} ...")
    with Timer():
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            min_new_tokens=8,
            renormalize_logits=True,
            no_repeat_ngram_size=8,
            repetition_penalty=1.04,
            num_beams=4,
            early_stopping=True,
        )
    text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    logging.info(f"Output text:\n\n{text}")
    return text


def main(
    model_name="BEE-spoke-data/smol_llama-101M-GQA-python",
    prompt: str = None,
    use_fast=False,
    n_tokens: int = 256,
):
    """Generate a completion with the given model, from a given (or random) prompt.

    Args:
        model_name (str, optional): the model repo to load
        prompt (str, optional): specify the prompt directly (default: random choice from list)
        use_fast (bool, optional): whether to use a "fast" tokenizer
        n_tokens (int, optional): max new tokens to generate
    """
    logging.info(f"Inference with:\t{model_name}, max_new_tokens:{n_tokens}")

    if prompt is None:
        prompt_list = [
            '''
def print_primes(n: int):
    """
    Print all primes between 1 and n
    """''',
            "def quantum_analysis(",
            "def sanitize_filenames(target_dir:str, recursive:False, extension",
        ]
        prompt = random.SystemRandom().choice(prompt_list)

    logging.info(f"Using prompt:\t{prompt}")

    tokenizer, model = load_model(model_name, use_fast=use_fast)

    run_inference(prompt, model, tokenizer, n_tokens)


if __name__ == "__main__":
    fire.Fire(main)
```

Wowoweewa!! It can create some file cleaning utilities.

</details>
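Since the script hands `main` to `fire.Fire`, its keyword arguments double as command-line flags; for example, assuming it is saved as `textgen_inference_code.py` (the name its docstring suggests), something like `python textgen_inference_code.py --prompt "def is_prime(n: int):" --n_tokens 128` should work.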
---