Choqok 1B (Version 0.0-alpha-1)
Choqok (Persian: چغوک) is a small language model (SLM) based on Meta's LLaMA 3.2 1B and fine-tuned on Persian data. This is the very first alpha release of Choqok, and you can use it freely and commercially in any way you like!
Choqok is part of the Jabir Project, an effort by Muhammadreza Haghiri to build democratized AI products, and it is our on-device model.
The name
In the Khorasani (or Mashhadi) accent of Persian, choqok is the name of a small bird similar to a swallow. Since this is a small model, we decided to call it Choqok as well.
How to use
This model requires about 3 GB of VRAM to run (it has not been tested on CPU, but it should also be feasible to run it on a machine with 4 GB+ of RAM). You also need basic knowledge of Python and of the transformers library.
The first step is to install the transformers library, together with torch (which the code below imports) and accelerate (which is needed for device_map="auto"):
pip install transformers torch accelerate
Then, import the required dependencies:
import transformers
import torch
Next, initialize a text-generation pipeline:
model_id = "JabirTech/choqok-1B-0.0-alpha-1"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
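As mentioned above, running on CPU is untested. If you want to try it anyway, here is a minimal, hypothetical sketch that loads the model on the CPU in float32 (bfloat16 can be slow or unsupported on some CPUs):
# Untested CPU variant: load in float32 and run on the CPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float32},  # float32 is safer on CPU
    device="cpu",
)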
And now, you can run inference:
messages = [
{"role": "user", "content": "Write a python code for calculation of the factorial of X using recursive functions"},
]
outputs = pipeline(
messages,
max_new_tokens=4096,
)
print(outputs[0]["generated_text"][-1]["content"])
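Since the model is fine-tuned on Persian data, you can of course prompt it in Persian as well; the prompt below is just an illustrative example (it asks for a recursive factorial function in Python):
messages = [
    {"role": "user", "content": "یک تابع بازگشتی برای محاسبه فاکتوریل در پایتون بنویس"},
]
outputs = pipeline(messages, max_new_tokens=4096)
print(outputs[0]["generated_text"][-1]["content"])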
API
You can also use our OpenAI-compatible API and set the model to choqok to test this. API documentation is available at this link.
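As a rough illustration, an OpenAI-compatible endpoint can usually be called with the official openai Python client; the base URL, API key, and model name below are placeholders, so check the API documentation for the real values:
from openai import OpenAI

# Placeholder values: replace base_url and api_key with the ones from the API docs.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="choqok",  # assumed model name; see the API documentation
    messages=[{"role": "user", "content": "سلام! خودت را معرفی کن."}],
)
print(response.choices[0].message.content)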
Chat UI
You can also use the Jabir Project's Chat UI to use the model. Just select Choqok and start chatting.
Known issues
- It says its name is LLaMA and that it was made by Meta AI: yes, because it is a fine-tune and LoRA merge on top of the original model.
- It generates weak Persian content: we're working hard to improve this.
- It leaves Chinese, Russian, Thai, or Hebrew characters in Persian or Arabic content: we're aware of that. This is caused by the tokenization, and we're working hard on a much better tokenizer for our model. A possible post-processing workaround is sketched after this list.
- Persian responses aren't coherent: we know. We're trying our best to make this one of the best on-device models with Persian/Arabic support.
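Until the tokenizer improves, one hypothetical workaround for the stray-character issue is a blunt post-processing filter that removes characters from scripts that should not appear in Persian or Arabic output; this is only a sketch and not part of the model:
import re

# Characters from scripts that should not appear in Persian/Arabic output:
# CJK ideographs, Japanese kana, Cyrillic, Thai, and Hebrew.
STRAY_SCRIPTS = re.compile(
    r"[\u4e00-\u9fff\u3040-\u30ff\u0400-\u04ff\u0e00-\u0e7f\u0590-\u05ff]"
)

def strip_stray_characters(text: str) -> str:
    return STRAY_SCRIPTS.sub("", text)

print(strip_stray_characters("سلام 你好"))  # stray CJK characters are removed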