|
--- |
|
base_model: EpistemeAI/ReasoningCore-Llama-3.2-3B-r1-V1.1 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
license: llama3.2 |
|
language: |
|
- en |
|
--- |
|
## Model Introduction |
|
This early experimental model uses a unique, advanced form of supervised fine-tuning. The training program loads the model and the dataset, then provides the data at inference time and trains the LLM on it.

During inference, it checks whether the model reaches the expected answer or goal. If not, it keeps training until the answer or solution is reached.
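
The loop described above can be sketched in a few lines. This is a toy illustration only, not the published training code: `DummyModel`, `generate`, and `train_step` are hypothetical stand-ins for the real LLM, inference pass, and supervised update.

```python
# Hypothetical sketch of the train-until-correct loop described above.
# DummyModel stands in for the real LLM and trainer, which are not published.

class DummyModel:
    def __init__(self):
        self.memory = {}                 # "weights": remembered answers

    def generate(self, problem):
        return self.memory.get(problem)  # inference pass

    def train_step(self, problem, goal):
        self.memory[problem] = goal      # one supervised update on this example


def train_until_correct(model, examples, max_rounds=5):
    """Keep training on each example until inference reaches the goal."""
    for problem, goal in examples:
        for _ in range(max_rounds):
            if model.generate(problem) == goal:  # goal reached: stop training
                break
            model.train_step(problem, goal)      # otherwise, train again
    return model


model = train_until_correct(DummyModel(), [("2+2", "4")])
```

The key design point is the inner loop: training on an example stops as soon as inference reproduces the goal, rather than after a fixed number of epochs.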
|
|
|
Context Window: 128k |
|
|
|
## Installation |
|
Update to the latest version of transformers:

```bash
pip install -U transformers
```
|
|
|
Suggested system prompt for math:
|
```python |
|
system_prompt="<problem>...</problem><solution>...</solution>" |
|
``` |
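
This system prompt slots into a standard chat message list. A minimal sketch (the user question here is just an illustration, and the `...` placeholders in the prompt are left as-is for you to fill in):

```python
# Build a chat-format request using the suggested math system prompt.
system_prompt = "<problem>...</problem><solution>...</solution>"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is 15% of 80?"},
]
# `messages` can then be passed to a transformers text-generation pipeline
# or formatted with tokenizer.apply_chat_template(...).
```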
|
|
|
|
|
## Inference
|
```python
import torch
from transformers import pipeline

model_id = "EpistemeAI/OpenReasoner-Llama-3.2-3B-rs1.0"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe("What is larger, 9.9 or 9.11?"))
```
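
If the model follows the suggested `<problem>...</problem><solution>...</solution>` format, the final answer can be pulled out of the generated text. A small helper, assuming (not guaranteed) that the output contains a `<solution>` tag; the sample string below is hand-written, not real model output:

```python
import re

def extract_solution(text):
    """Return the contents of the <solution>...</solution> tags, if present;
    otherwise fall back to the whole text."""
    match = re.search(r"<solution>(.*?)</solution>", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

# Hand-written example output in the suggested tag format:
sample = "<problem>Which is larger, 9.9 or 9.11?</problem><solution>9.9</solution>"
answer = extract_solution(sample)  # -> "9.9"
```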
|
|
|
## Reference |
|
Thanks to Hugging Face H4 for the dataset: [Math-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
|
|
|
We used this dataset as an evaluator; the model was not trained on it directly, it was used only as a test set.
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** EpistemeAI |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** EpistemeAI/ReasoningCore-Llama-3.2-3B-r1-V1.1
|
|
|
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library. |
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |