GNTweetsLM
GNTweetsLM is intended for validating the quality of Guarani text. It was trained on a publicly available corpus of tweets written in Guarani and Jopara (Góngora et al. 2021).
⚠️ Although the model is built on a generative transformer (Gemma2-9b-it), it was not developed as a generative tool. Its primary use is to compute the perplexity of Guarani documents. Lower perplexity means the model finds the text more predictable, which may indicate that it is more similar to the reference high-quality corpus.
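For reference, the usual definition of perplexity for a token sequence under a causal language model is the exponentiated mean negative log-likelihood. The symbols below ($x_i$ for tokens, $N$ for the number of scored tokens, $p_\theta$ for the model) are notation introduced here for illustration and do not come from the model card:

```latex
\mathrm{PPL}(x_{1:N}) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right) \right)
```

In practice the first token has no preceding context, so implementations typically average over the $N-1$ predicted tokens.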
📌 Summary
- Model type: Gemma2 For Causal LM
- Base model: `princeton-nlp/gemma-2-9b-it-SimPO`
- Fine-tuning method: Full fine-tuning (all model weights updated)
- Primary task: Perplexity computation
- Dataset (HF): `guaran-ia/gntweets`
🏗️ Model Details
- Architecture: `Gemma2ForCausalLM`
- Number of layers: `42`
- Hidden size: `3584`
- Attention heads: `16`
- Feedforward intermediate size: `14336`
- Vocabulary size: `256000`
- Maximum context length: `8192` tokens
- Precision: `float16`
- Tokenizer: saved in this folder via `tokenizer.json` and `tokenizer_config.json`
- Generation config: saved in `generation_config.json`
- Prompt template: `chat_template.jinja`
⚙️ Training Details
- Batch size: `1`
- Gradient accumulation: `1`
- Learning rate: `2e-5`
- Weight decay: `0.01`
- Warmup steps: `100`
- Optimizer: `paged_adamw_8bit`
- Scheduler: `linear`
- Epochs: `6`
- Precision mode: `bf16`
- Gradient checkpointing: enabled
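As a rough sketch, the hyperparameters above can be written as keyword arguments using the parameter names of `transformers.TrainingArguments`. The model card does not say which training script was used, so this mapping is an assumption, and `output_dir` is a hypothetical path. The schedule arithmetic additionally assumes a single device and uses the 936 training records reported in the dataset section.

```python
# Hyperparameters copied from the list above, keyed by the argument names
# used by transformers.TrainingArguments (the mapping is an assumption).
training_kwargs = {
    "output_dir": "gntweets-lm-checkpoints",  # hypothetical path, not from the source
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_steps": 100,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "linear",
    "num_train_epochs": 6,
    "bf16": True,
    "gradient_checkpointing": True,
}

# Rough schedule arithmetic, assuming a single device and 936 training records.
train_records = 936
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
steps_per_epoch = train_records // effective_batch
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_fraction = training_kwargs["warmup_steps"] / total_steps

print(total_steps)                # 5616
print(round(warmup_fraction, 4))  # 0.0178
```

Under these assumptions the 100 warmup steps cover a little under 2% of the optimizer steps. The dictionary could be passed as `TrainingArguments(**training_kwargs)` on hardware that supports bf16.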
🗃️ Dataset and Preprocessing
- Split strategy: train / validation / test
- Sequence length used for tokenization: `2048`
- Train dataset size: `936` records (`1916928` tokens)
- Validation dataset size: `117` records (`239616` tokens)
- Test dataset size: `117` records (`239616` tokens)
- Tokenizer: `princeton-nlp/gemma-2-9b-it-SimPO`
- HF ID: `guaran-ia/gntweets`
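The reported token counts equal the record counts multiplied by the sequence length, which suggests that each record is a fixed-length 2048-token sequence. That reading is an inference from the numbers, not something the model card states. A quick check, which also shows the split proportions:

```python
SEQ_LEN = 2048  # sequence length used for tokenization

# (records, tokens) per split, as reported above
splits = {
    "train": (936, 1_916_928),
    "validation": (117, 239_616),
    "test": (117, 239_616),
}

# Each token count should equal records * SEQ_LEN.
for name, (records, tokens) in splits.items():
    assert records * SEQ_LEN == tokens, name

total_records = sum(records for records, _ in splits.values())
fractions = {name: records / total_records for name, (records, _) in splits.items()}
print(fractions)  # {'train': 0.8, 'validation': 0.1, 'test': 0.1}
```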
🚀 Usage
Compute perplexity for a given Guarani text using the fine-tuned model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import math

model_id = 'guaran-ia/gntweets-lm'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()  # disable dropout for deterministic scoring

def perplexity(text: str) -> float:
    # Tokenize and place the tensors on the same device as the model.
    inputs = tokenizer(text, return_tensors='pt').to(model.device)
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token NLL.
        outputs = model(**inputs, labels=inputs['input_ids'])
    return math.exp(outputs.loss.item())

text = "Your Guarani text here."
print(f"Perplexity: {perplexity(text):.4f}")
```
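Since the model is meant for validating text quality, one possible way to use the score is to partition documents with a perplexity ceiling. The sketch below is only an illustration: the helper `split_by_perplexity`, its parameters, and the threshold value are hypothetical and not part of the model card. A suitable threshold would need to be chosen empirically, for example from scores on held-out text.

```python
from typing import Callable, Iterable

def split_by_perplexity(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    max_ppl: float,
) -> tuple[list[str], list[str]]:
    """Partition texts into (kept, rejected) using a perplexity ceiling.

    `score_fn` is any function mapping a text to its perplexity, such as
    the `perplexity` function defined above.
    """
    kept: list[str] = []
    rejected: list[str] = []
    for text in texts:
        ppl = score_fn(text)
        # Comparisons with NaN are False, so unscorable texts are rejected.
        (kept if ppl <= max_ppl else rejected).append(text)
    return kept, rejected
```

With the model loaded, a call could look like `kept, rejected = split_by_perplexity(docs, perplexity, max_ppl=50.0)`, where `docs` is a list of strings and `50.0` is a placeholder value.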
Perplexity for long texts
If the input exceeds the model's maximum context length (8192 tokens), the following recipe computes perplexity over overlapping sliding windows and averages the per-token loss across them.
```python
import math
import torch

def perplexity_sliding(text: str, model, tokenizer, max_len: int = 8192, stride: int = 4096) -> float:
    """Compute perplexity over long text by scoring overlapping windows.

    - `max_len` should be <= model.config.max_position_embeddings (8192).
    - `stride` is the number of new tokens advanced per window; the overlapping
      tokens serve as context only, so no token is scored twice.
    """
    enc = tokenizer(text, return_tensors='pt')['input_ids'][0]
    n = enc.size(0)
    if n < 2:
        return float('nan')  # at least one predicted token is required

    total_nll = 0.0
    total_tokens = 0
    prev_end = 0
    for start in range(0, n, stride):
        end = min(start + max_len, n)
        input_ids = enc[start:end].unsqueeze(0).to(model.device)
        labels = input_ids.clone()
        # Mask tokens already scored by an earlier window (context only).
        labels[:, : input_ids.size(1) - (end - prev_end)] = -100
        # Number of targets that contribute to the loss after the label shift.
        scored = int((labels[:, 1:] != -100).sum())
        if scored > 0:
            with torch.no_grad():
                outputs = model(input_ids, labels=labels)
            # outputs.loss is the mean NLL over the scored targets
            total_nll += outputs.loss.item() * scored
            total_tokens += scored
        prev_end = end
        if end == n:
            break

    return math.exp(total_nll / total_tokens)

# Example usage (reuses `model` and `tokenizer` from the snippet above):
text = open('some_guarani.txt', encoding='utf-8').read()
tokenizer.model_max_length = 8192
print(f"Perplexity (sliding): {perplexity_sliding(text, model, tokenizer):.4f}")
```
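To make the window layout concrete, the sketch below lists the `(start, end)` token offsets that the sliding loop visits for a given input length. The helper `window_spans` is hypothetical and exists only for illustration. It needs no model and follows the same start, end, and stride logic as the recipe above.

```python
def window_spans(n_tokens: int, max_len: int = 8192, stride: int = 4096) -> list[tuple[int, int]]:
    """Return the (start, end) token offsets visited by the sliding loop."""
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last window reached the end of the text
        start += stride
    return spans

print(window_spans(10_000))  # [(0, 8192), (4096, 10000)]
print(window_spans(20_000))  # [(0, 8192), (4096, 12288), (8192, 16384), (12288, 20000)]
```

With the default settings, consecutive windows share up to 4096 tokens, so every token after the first window is seen with several thousand tokens of preceding context.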
❗ Limitations and Notes
- The model may reflect biases present in the source corpus.
- License metadata is provided in this folder.
📜 License
This model checkpoint and accompanying files are released under the GNU General Public License v3 (GPLv3).
See the LICENSE file in this directory for the full license text.