MangARTI
This is a LoRA fine-tuned model based on unsloth/llama-3-8b-Instruct-bnb-4bit. It is specifically trained to translate text between Formal Bahasa Indonesia and the Manado dialect (Dialek Manado, North Sulawesi).
Model Details
Model Description
MangARTI is built on the Llama-3-8B architecture and was fine-tuned using Unsloth and PEFT (LoRA). It follows instructions formatted in the Alpaca template to translate sentences from Formal Indonesian to Manadonese, and vice versa.
- Developed by: Jonathan Immanuel Montolalu
- Model type: Causal Language Model (LoRA Fine-tune)
- Language(s) (NLP): Indonesian (ind), Manado Malay (xmm)
- License: llama3
- Finetuned from model: unsloth/llama-3-8b-Instruct-bnb-4bit
Uses
Direct Use
The primary use case for this model is text translation. It can be used by developers, researchers, or locals looking to build applications that bridge the communication gap between standard formal Indonesian and the regional dialect of Manado.
Out-of-Scope Use
This model is not intended for high-stakes translation, such as professional medical or legal work. Because it is an 8B-parameter model trained on a specific regional dialect, it may hallucinate or struggle with technical jargon outside everyday conversational contexts.
Training Data
The model was trained on a custom dataset containing paired sentences in Formal Bahasa Indonesia and Dialek Manado. The dataset was formatted using an instruction-based Alpaca template.
You can find the open-source dataset used to train this model here: Jmnlalu/Bahasa-Manado-Alpaca-Translations
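As a rough illustration of that formatting, the sketch below renders one sentence pair into the Alpaca layout used in this card's getting-started code. The helper name `format_pair`, the optional `eos_token` argument, and the placeholder target text are assumptions made for illustration; the dataset's actual column names and preprocessing script are not shown in this card, and the whitespace of the template should match whatever was used during training.

```python
# Template text copied from this card's getting-started code.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n"
    "### Instruction:\n{}\n"
    "### Input:\n{}\n"
    "### Response:\n{}"
)


def format_pair(instruction: str, source: str, target: str, eos_token: str = "") -> str:
    """Render one (instruction, source, target) triple as an Alpaca-style string.

    Hypothetical helper: `eos_token` would normally be the tokenizer's
    end-of-sequence token so the model learns where a translation stops.
    """
    return ALPACA_TEMPLATE.format(instruction, source, target) + eos_token


example = format_pair(
    "Translate the following sentence from Formal Indonesian to Manado dialect.",
    "Saya tidak tahu mau pergi ke mana hari ini.",
    "<Manado translation goes here>",  # placeholder, not real dataset content
)
print(example)
```

At inference time the same template is used with an empty response slot, so the model continues the text after "### Response:".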
Bias, Risks, and Limitations
Large language models can occasionally generate inaccurate translations or adopt biases present in their training data. Dialects often rely heavily on cultural context, slang, and tone, which may not always map 1:1 with formal Bahasa Indonesia. Users should verify critical translations.
Dynamic Language & Organic Typing
Local dialects like Bahasa Manado are dynamic and constantly evolving. Everyday text messaging and casual chat among locals often involve non-standard spelling and rapidly changing slang. Because this organic typing is so varied and unpredictable, the model can easily hallucinate, misinterpret context, or fail to recognize highly informal abbreviations.
Input Length Limitations
MangARTI is not designed or trained to translate long paragraphs, articles, or extensive documents. The underlying dataset focused exclusively on single-sentence translations. Feeding the model multi-sentence paragraphs or large blocks of text will likely degrade the output quality, cause the model to lose context, or trigger hallucinations.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. When using the model in user-facing applications, it is recommended to include a disclaimer that the translations are AI-generated. To get the best results, users should input text one sentence at a time and attempt to use relatively standardized spelling even when typing in the local dialect.
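One way to follow the one-sentence-at-a-time recommendation is to split longer input before it reaches the model. The sketch below is a minimal, assumption-laden example: `split_sentences` is a hypothetical helper, and the punctuation-based rule is deliberately naive (it will mis-split abbreviations such as "dr." and does nothing for text without end punctuation, which is common in casual chat).

```python
import re
from typing import List

# Split after '.', '!' or '?' when followed by whitespace (naive rule, an assumption).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the sentences in `text`, keeping their end punctuation."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part for part in parts if part]


paragraph = "Saya lapar. Kamu mau makan apa? Ayo pergi!"
for sentence in split_sentences(paragraph):
    # Each sentence would be translated separately by the model.
    print(sentence)
```

Each returned sentence can then be placed into the Alpaca prompt on its own, and the translated sentences joined afterwards.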
How to Get Started with the Model
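Use the code below to get started with the model. The model requires the Alpaca prompt format to work correctly. Because this repository is a LoRA adapter on a 4-bit base model, the code assumes that the peft and bitsandbytes packages are installed and that a CUDA GPU is available.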
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Jmnlalu/MangARTI"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
# Choose your direction
direction = "Bahasa Indonesia Formal -> Dialek Manado"
text_to_translate = "Saya tidak tahu mau pergi ke mana hari ini."
# Pick the instruction that matches the chosen direction
if direction == "Bahasa Indonesia Formal -> Dialek Manado":
    instruction = "Translate the following sentence from Formal Indonesian to Manado dialect."
else:
    instruction = "Translate the following sentence from Manado dialect to Formal Indonesian."
# Leave the response slot empty so the model generates the translation
inputs = tokenizer(
    [
        alpaca_prompt.format(instruction, text_to_translate, "")
    ], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True)[0])
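For a causal language model, `generate` returns the prompt tokens followed by the new tokens, so the decoded string printed above contains the full Alpaca prompt as well as the translation. The sketch below shows one possible way to keep only the translation. `extract_response` is a hypothetical helper, and it assumes the source sentence itself does not contain the "### Response:" marker.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole string if absent."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


# Stand-in for the decoded model output; the translation text is a placeholder.
decoded_example = (
    "### Instruction:\nTranslate the following sentence from Formal Indonesian to Manado dialect.\n"
    "### Input:\nSaya tidak tahu mau pergi ke mana hari ini.\n"
    "### Response:\n<generated translation>"
)
print(extract_response(decoded_example))
```

In the getting-started code this would be applied to the string returned by `tokenizer.batch_decode(...)[0]`.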