	---

	library_name: transformers
	tags:
	- math
	- lora
	- science
	- chemistry
	- biology
	- code
	- text-generation-inference
	- unsloth
	- llama
	license: apache-2.0
	datasets:
	- HuggingFaceTB/smoltalk
	language:
	- en
	- de
	- es
	- fr
	- it
	- pt
	- hi
	- th
	base_model:
	- meta-llama/Llama-3.2-1B-Instruct

	---

	[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


	# QuantFactory/FastLlama-3.2-1B-Instruct-GGUF
	This is a quantized version of [suayptalha/FastLlama-3.2-1B-Instruct](https://huggingface.co/suayptalha/FastLlama-3.2-1B-Instruct), created using llama.cpp.
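The GGUF files in this repository are meant for llama.cpp-compatible runtimes rather than for `transformers`. The sketch below shows one way to load such a file through the `llama-cpp-python` bindings. The choice of bindings is an assumption, and so is the `filename_pattern` argument: this card does not list the available quantization files, so check the repository's file list and pass a pattern that matches one of them.

```python
def load_quantized(filename_pattern, n_ctx=2048):
    """Download one GGUF file from this repository and load it.

    filename_pattern is a glob supplied by the caller (hypothetical here,
    because the card does not name the individual GGUF files).
    """
    # Imported lazily so the helpers can be defined without llama-cpp-python
    # installed; from_pretrained also needs the huggingface_hub package.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id="QuantFactory/FastLlama-3.2-1B-Instruct-GGUF",
        filename=filename_pattern,
        n_ctx=n_ctx,  # context window size; 2048 is an illustrative default
    )


def ask(llm, question, max_tokens=256):
    """Send one chat turn and return the assistant's reply as a string."""
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a friendly assistant named FastLlama."},
            {"role": "user", "content": question},
        ],
        max_tokens=max_tokens,
    )
    return response["choices"][0]["message"]["content"]
```

A typical call would be `ask(load_quantized(pattern), "What is 12 * 7?")`, where `pattern` matches a file that actually exists in the repository.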

	# Original Model Card


	![FastLlama-Logo](FastLlama.png)

	The model accepts prompts in either ChatML or Alpaca format.
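For readers unfamiliar with the two prompt formats, here is a minimal sketch of how each is usually laid out. The card does not state the exact templates used during fine-tuning, so the strings below are the commonly seen forms of ChatML and Alpaca and should be treated as assumptions; the helper names are illustrative.

```python
# Widely used Alpaca preamble; assumed, not confirmed by this card.
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def to_chatml(messages):
    """Render role/content messages as a ChatML prompt ending with an open assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


def to_alpaca(instruction):
    """Render a single instruction as an Alpaca prompt (no separate input field)."""
    return f"{ALPACA_PREAMBLE}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


print(to_chatml([{"role": "user", "content": "What is 2 + 2?"}]))
print(to_alpaca("What is 2 + 2?"))
```

When a runtime applies a chat template bundled with the model, manual formatting like this is usually unnecessary.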

	You can chat with the model via this [space](https://huggingface.co/spaces/suayptalha/Chat-with-FastLlama).

	Overview:

	FastLlama is an optimized version of the Llama-3.2-1B-Instruct model, designed for constrained environments where speed, compactness, and accuracy all matter. This version has been fine-tuned on the MetaMathQA-50k subset of the HuggingFaceTB/smoltalk dataset to improve its mathematical reasoning and problem-solving abilities.

	Features:

	- Lightweight and Fast: Optimized to deliver Llama-class capabilities with reduced computational overhead.
	- Fine-Tuned for Math Reasoning: Uses MetaMathQA-50k to better handle complex mathematical problems and logical reasoning tasks.
	- Instruction-Tuned: Built on the instruction-tuned Llama-3.2-1B-Instruct base model, which helps it understand and execute detailed queries.
	- Versatile Use Cases: Suitable for educational tools, tutoring systems, or any application requiring mathematical reasoning.

	Performance Highlights:

	- Smaller Footprint: The model delivers comparable results to larger counterparts while operating efficiently on smaller hardware.
	- Enhanced Accuracy: Demonstrates improved performance on mathematical QA benchmarks.
	- Instruction Adherence: Retains high fidelity in understanding and following user instructions, even for complex queries.

	Loading the Model:
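	```py
	from transformers import pipeline

	# Loads the original (unquantized) model; device_map="auto" requires the accelerate package.
	model_id = "suayptalha/FastLlama-3.2-1B-Instruct"
	pipe = pipeline(
	    "text-generation",
	    model=model_id,
	    device_map="auto",
	)

	# Chat-style input: a list of role/content messages.
	messages = [
	    {"role": "system", "content": "You are a friendly assistant named FastLlama."},
	    {"role": "user", "content": "Who are you?"},
	]
	outputs = pipe(
	    messages,
	    max_new_tokens=256,
	)
	# generated_text holds the whole conversation; the last entry is the assistant's reply.
	print(outputs[0]["generated_text"][-1])
	```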

	Dataset:

	Dataset: MetaMathQA-50k

	The MetaMathQA-50k subset of HuggingFaceTB/smoltalk was selected for fine-tuning due to its focus on mathematical reasoning, multi-step problem-solving, and logical inference. The dataset includes:

	- Algebraic problems
	- Geometric reasoning tasks
	- Statistical and probabilistic questions
	- Logical deduction problems
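To inspect the training data directly, the subset can be pulled with the Hugging Face `datasets` library. This is a sketch under two assumptions that the card does not confirm: that the subset is exposed under the configuration name `metamathqa-50k`, and that each row stores its conversation in a `messages` column of role/content dictionaries. The helper names are illustrative.

```python
def load_metamathqa_subset(split="train"):
    """Load the math subset of smoltalk (needs the datasets package and network access)."""
    # Imported lazily so first_exchange can be used without datasets installed.
    from datasets import load_dataset

    # "metamathqa-50k" is the assumed configuration name for this subset.
    return load_dataset("HuggingFaceTB/smoltalk", "metamathqa-50k", split=split)


def first_exchange(messages):
    """Return (question, answer) from a list of {"role", "content"} messages."""
    question = next(m["content"] for m in messages if m["role"] == "user")
    answer = next(m["content"] for m in messages if m["role"] == "assistant")
    return question, answer


# Hand-written example row, not taken from the dataset.
example = [
    {"role": "user", "content": "Solve x + 3 = 5."},
    {"role": "assistant", "content": "x = 2"},
]
print(first_exchange(example))
```

With the dataset loaded, `first_exchange(row["messages"])` would give the first question and answer of a row, assuming the column layout described above.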

	Model Fine-Tuning:

	Fine-tuning was conducted using the following configuration:

	- Learning Rate: 2e-4
	- Epochs: 1
	- Optimizer: AdamW
	- Framework: Unsloth
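The listed settings can be collected in a small configuration object to estimate how long a run is. In this sketch only the learning rate, epoch count, and optimizer come from the card. The batch size, the gradient accumulation steps, the single-device setup, and the figure of 50,000 examples (inferred from the subset's name) are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 2e-4            # from the card
    epochs: int = 1                        # from the card
    optimizer: str = "AdamW"               # from the card
    per_device_batch_size: int = 8         # assumption, not stated in the card
    gradient_accumulation_steps: int = 4   # assumption, not stated in the card

    @property
    def effective_batch_size(self):
        # Examples consumed per optimizer update on a single device.
        return self.per_device_batch_size * self.gradient_accumulation_steps

    def optimizer_steps(self, num_examples):
        # One update per effective batch; the last partial batch still counts.
        return math.ceil(num_examples / self.effective_batch_size) * self.epochs


config = FineTuneConfig()
print(config.optimizer_steps(50_000))  # 1563 with the assumed batch settings
```

Under these assumptions a single epoch amounts to roughly 1,560 optimizer updates, which is why a relatively high learning rate such as 2e-4 is typically paired with a short run.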

	License:

	This model is licensed under the Apache 2.0 License. See the LICENSE file for details.

	[☕ Buy Me a Coffee](https://www.buymeacoffee.com/suayptalha)