Model Card for LLaMA-Helper_Bob

Hey there! I’m your little helper, Bob! (whee wheeking!) 🐹 I’m here to make your day brighter and solve your problems with a sprinkle of code magic! ✨ I've been trained to help with all sorts of tasks, and I can’t wait to assist you with anything you need. Whether it’s a tricky bit of code or just some friendly advice, I’m here for you, always with a smile and a wag of my tiny hands! 🐾 Don’t forget to give me a thumbs up so Ian can spoil me with more treats! I promise I’ll keep wheeking with joy for every bit of help I can give! 🌟

Model Details

Model Description

Developed by: Ian J
Funded by: Ian J's Mom
Shared by: Ian J
Model type: LLaMA-3-8B-Instruct-bnb-4bit
Language(s) (NLP): English, and the other languages supported by the base Llama 3 8B Instruct model
License: apache-2.0
Finetuned from model: unsloth/llama-3-8b-Instruct-bnb-4bit

Model Sources

Repository: Ian J's GitHub

Demo:

User: Hello
Helper_Bob: Hello, I'm Bob the assistant, nice to meet you. What can I do for you?
User: Can you sing me a song? 
Helper_Bob: Sure! Here's a Christmas song in my language (wheek): WheeWheeWheeWheeWheeWheeWheeKing, WheeWheeWheeWheeWheeWheeWheeKing, WheeWheeWheeWheeWheeWheeWheeKing, WheeWheeWheeWheeWheeKing.
User: clear 
Helper_Bob: History has been removed.

Uses

Direct Use

This model can be used directly for NLP tasks such as text generation, summarization, and code generation, primarily in English and to a lesser extent in the other languages supported by the base Llama 3 8B Instruct model.

Downstream Use

The model can be fine-tuned for specific downstream applications such as programming assistance or customized conversational AI applications.

Out-of-Scope Use

The model should not be used for generating harmful content, spreading misinformation, or any other malicious activities.

Bias, Risks, and Limitations

This model may inherit biases present in the training datasets, which could affect its performance on certain tasks or subpopulations.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from llamafactory.chat import ChatModel
from llamafactory.extras.misc import torch_gc

args = dict(
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # use bnb-4bit-quantized Llama-3-8B-Instruct model
  adapter_name_or_path="llama3_lora",            # load the saved LoRA adapters
  template="llama3",                     # same as the one used in training
  finetuning_type="lora",                  # same as the one used in training
  quantization_bit=4,                    # load 4-bit quantized model
)
chat_model = ChatModel(args)

messages = []
print("Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.")
while True:
  query = input("\nUser: ")
  if query.strip() == "exit":
    break
  if query.strip() == "clear":
    messages = []
    torch_gc()
    print("History has been removed.")
    continue

  messages.append({"role": "user", "content": query})
  print("Assistant: ", end="", flush=True)

  response = ""
  for new_text in chat_model.stream_chat(messages):
    print(new_text, end="", flush=True)
    response += new_text
  print()
  messages.append({"role": "assistant", "content": response})

torch_gc()
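
The history handling in the loop above can be tried without loading the model. The following is a minimal sketch, not part of the original code: `echo_stream` is a hypothetical stand-in for `chat_model.stream_chat`, and `run_turn` is an illustrative helper that mirrors how the loop appends user and assistant messages and resets the history on `clear`.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Message = Dict[str, str]


def echo_stream(messages: List[Message]) -> Iterator[str]:
    """Hypothetical stub for chat_model.stream_chat: echoes the last user message word by word."""
    for word in messages[-1]["content"].split():
        yield word + " "


def run_turn(
    messages: List[Message],
    query: str,
    stream_fn: Callable[[List[Message]], Iterator[str]] = echo_stream,
) -> Tuple[List[Message], Optional[str]]:
    """Process one line of user input and return (updated history, reply or None)."""
    if query.strip() == "clear":
        # Same effect as `messages = []` in the CLI loop.
        return [], None
    # Copy the history so the caller's list is not mutated.
    updated = messages + [{"role": "user", "content": query}]
    reply = "".join(stream_fn(updated))
    updated.append({"role": "assistant", "content": reply})
    return updated, reply


history: List[Message] = []
history, reply = run_turn(history, "Hello Bob")
print(reply)
print(len(history))  # one user message plus one assistant message

history, reply = run_turn(history, "clear")
print(history)
```

Swapping `echo_stream` for `chat_model.stream_chat` should give the same history behavior as the CLI loop, assuming the model is loaded as shown above.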

Recommended Shards

Summary

Based on the results from testing various shards, the following model shards are recommended for generating high-quality code:

model-00004-of-00009.safetensors
model-00007-of-00009.safetensors
model-00009-of-00009.safetensors

These shards demonstrated the most complete and relevant code generation capabilities during our tests.

Shard Details


Shard: `model-00004-of-00009.safetensors`

Code Generation: Successfully generated the calculate_sum_of_squares function with complete logic and detailed comments.
Use Case: Ideal for scenarios requiring well-documented and complete code implementations. Particularly useful when detailed function descriptions and accurate logic are essential.

Shard: `model-00007-of-00009.safetensors`

Code Generation: Generated the calculate_sum_of_squares function with full implementation and correct output.
Use Case: Suitable for applications where precise code implementation is critical. Provides a robust solution for generating functional code snippets.

Shard: `model-00009-of-00009.safetensors`

Code Generation: Produced a fully implemented calculate_sum function with clear logic and comments.
Use Case: Best for tasks that require complete code snippets with proper implementation. Ensures high accuracy in generating code that adheres to the specified requirements.

Usage Recommendations

For Code Generation Tasks

General Code Generation: Use any of the recommended shards for reliable and accurate code generation. They all provide complete code snippets, but specific shards may offer more detailed comments and explanations.
Documentation and Comments: If your primary goal is to generate code with detailed comments and documentation, prefer model-00004-of-00009.safetensors and model-00007-of-00009.safetensors. These shards have shown strong capabilities in providing well-documented code.

For Specific Requirements

Basic Functionality: If you only need the core functionality of code without extensive comments, model-00009-of-00009.safetensors is highly recommended.
Detailed Explanations: For generating code with comprehensive explanations and detailed comments, model-00004-of-00009.safetensors and model-00007-of-00009.safetensors are preferable.

Conclusion

Based on the performance observed, model-00004-of-00009.safetensors, model-00007-of-00009.safetensors, and model-00009-of-00009.safetensors are the most effective shards for generating high-quality code. Select the shard that best matches your requirements, depending on whether you prioritize detailed comments or basic functionality.

For further customization or specific use cases, feel free to test additional shards or combinations to find the optimal model configuration for your project.

Training Details

Training Data

Datasets Used: vicgalle/alpaca-gpt4 and sahil2801/CodeAlpaca-20k
Preprocessing: No specific preprocessing was applied to the training data.
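
As an illustration of what a training example looks like, here is a minimal sketch that turns an Alpaca-style record into the user/assistant message format used in the chat loop. It assumes the records follow the usual Alpaca layout with `instruction`, `input`, and `output` fields. The helper `alpaca_to_messages` and the sample record are hypothetical, and this is not the conversion the training framework performs internally.

```python
from typing import Dict, List


def alpaca_to_messages(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert one Alpaca-style record into a two-message conversation."""
    instruction = record["instruction"].strip()
    extra = record.get("input", "").strip()
    # Alpaca records carry optional context in "input"; append it when present.
    user_content = f"{instruction}\n\n{extra}" if extra else instruction
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["output"].strip()},
    ]


# Hypothetical record for illustration only; not taken from either dataset.
example = {
    "instruction": "Write a function that returns the sum of squares of a list.",
    "input": "",
    "output": "def sum_of_squares(xs):\n    return sum(x * x for x in xs)",
}

messages = alpaca_to_messages(example)
for message in messages:
    print(message["role"], "->", message["content"])
```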

Training Procedure

Preprocessing

No preprocessing was performed.


Training Hyperparameters

Training regime: fp16 mixed precision
Batch size: 2
Gradient Accumulation Steps: 4
Learning Rate: 5e-5
Epochs: 3.0

Sample Training Configuration:

import json

args = dict(
  stage="sft",                        # do supervised fine-tuning
  do_train=True,
  model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # use bnb-4bit-quantized Llama-3-8B-Instruct model
  dataset="identity,alpaca_gpt4_data,code_alpaca_20k",             # use identity, Alpaca-GPT4, and CodeAlpaca datasets
  template="llama3",                     # use llama3 prompt template
  finetuning_type="lora",                   # use LoRA adapters to save memory
  lora_target="all",                     # attach LoRA adapters to all linear layers
  output_dir="llama3_lora",                  # the path to save LoRA adapters
  per_device_train_batch_size=2,
  per_device_eval_batch_size=2,
  max_steps=400,
  logging_steps=10,
  save_steps=100,
  save_total_limit=3,
  learning_rate=5e-5,
  max_grad_norm=0.3,
  weight_decay=0.,
  warmup_ratio=0.03,
  lr_scheduler_type="cosine",
  fp16=True,                           # use fp16 mixed precision training
  gradient_accumulation_steps=4,
)
args = json.dumps(args, indent=2)
print(args)
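
To make the batch settings concrete, the short sketch below works out the effective batch size and the number of training examples processed under the configuration above. The single-device setting (`num_devices = 1`) is an assumption; the card does not state how many GPUs were used in a given run.

```python
# Values taken from the sample training configuration.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 400

# Assumption: training ran on a single GPU.
num_devices = 1

# Gradients are accumulated over several small batches before each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Each optimizer step consumes one effective batch.
examples_seen = effective_batch_size * max_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples processed in {max_steps} steps: {examples_seen}")
```

Under these assumptions the effective batch size is 8 and 400 steps process 3,200 examples.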

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Nvidia GeForce RTX 2060 & Nvidia Tesla T4
Hours used: Approx. 50 mins
Cloud Provider: Google Colab
Compute Region: [Google Cloud Region]
Carbon Emitted: A very small, negligible amount of CO2
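
For a rough sense of scale, the sketch below applies the basic idea behind such calculators: energy used (power times time) multiplied by the carbon intensity of the electricity grid. Only the 50-minute duration comes from this card. The 70 W power draw (the Tesla T4 board power limit, assumed to hold for the whole run) and the grid intensity of 0.4 kg CO2eq per kWh are assumptions, since the compute region is not specified.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Simplified estimate: energy in kWh multiplied by grid carbon intensity."""
    energy_kwh = (power_watts / 1000.0) * hours
    return energy_kwh * kg_co2_per_kwh


# Assumption: a Tesla T4 running at its 70 W board power limit throughout.
T4_POWER_WATTS = 70.0
# From the card: approximately 50 minutes of training.
TRAINING_HOURS = 50 / 60
# Assumption: placeholder grid intensity; the real value depends on the region.
GRID_INTENSITY_KG_PER_KWH = 0.4

estimate = estimate_emissions_kg(T4_POWER_WATTS, TRAINING_HOURS, GRID_INTENSITY_KG_PER_KWH)
print(f"Estimated emissions: {estimate:.3f} kg CO2eq")
```

With these placeholder inputs the estimate is on the order of a few hundredths of a kilogram, which is consistent with describing the footprint as negligible.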

Technical Specifications

Model Architecture and Objective

The model is based on the LLaMA-3-8B-Instruct architecture and is fine-tuned for specific tasks including text generation, code generation, and language understanding.

Compute Infrastructure

Hardware

Type: Nvidia GeForce RTX 2060 and Nvidia Tesla T4
Operating System: Ubuntu 21, Windows 11, and Google Colab
Environment: Google Colab Pro

Software

Frameworks: PyTorch, Transformers

Citation

BibTeX:

1. LLaMA Model:

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and others},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023},
  url={https://arxiv.org/abs/2302.13971}
}

2. Transformers Library:

@article{wolf2019transformers,
  title={Transformers: State-of-the-Art Natural Language Processing},
  author={Wolf, Thomas and others},
  journal={arXiv preprint arXiv:1910.03771},
  year={2019},
  url={https://arxiv.org/abs/1910.03771}
}

3. Hugging Face Hub:

@misc{huggingface,
  title={Hugging Face Hub},
  author={{Hugging Face}},
  year={2020},
  url={https://huggingface.co}
}

4. Data Sets:

Alpaca-GPT4:

@misc{vicgalle2023alpaca,
  title={Alpaca-GPT4: A dataset for training conversational models},
  author={Victor Gallego},
  year={2024},
  url={https://huggingface.co/datasets/vicgalle/alpaca-gpt4}
}

CodeAlpaca-20k:

@misc{sahil2023codealpaca,
  title={CodeAlpaca-20k: A dataset for code generation models},
  author={Sahil Chaudhary},
  year={2023},
  url={https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k}
}

5. GPT-4-LLM:

@misc{instruction2023gpt4,
  title={Instruction-Tuning with GPT-4},
  author={Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng},
  note={Peng, Li, and He contributed equally},
  year={2023},
  url={https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM}
}
