Instructions for using guaran-ia/gntweets-lm with Transformers and local serving tools (vLLM, SGLang, Docker Model Runner).

Libraries

How to use guaran-ia/gntweets-lm with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="guaran-ia/gntweets-lm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (the checkpoint is a Gemma2ForCausalLM, so use the causal-LM auto class)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("guaran-ia/gntweets-lm")
model = AutoModelForCausalLM.from_pretrained("guaran-ia/gntweets-lm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use guaran-ia/gntweets-lm with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "guaran-ia/gntweets-lm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "guaran-ia/gntweets-lm",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```bash
docker model run hf.co/guaran-ia/gntweets-lm
```

SGLang

How to use guaran-ia/gntweets-lm with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "guaran-ia/gntweets-lm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "guaran-ia/gntweets-lm",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "guaran-ia/gntweets-lm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "guaran-ia/gntweets-lm",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use guaran-ia/gntweets-lm with Docker Model Runner:
```
docker model run hf.co/guaran-ia/gntweets-lm
```

GNTweetsLM

GNTweetsLM is intended for validating the quality of Guarani text. It was fine-tuned on a publicly available corpus of tweets written in Guarani and Jopara (Góngora et al. 2021).

⚠️ Although the model uses a transformer architecture (Gemma2-9b-it), it was not developed as a generative tool: its primary use is to compute the perplexity of Guarani documents. Lower perplexity means the text is more predictable to the model, which may indicate that it is closer to the high-quality reference corpus.
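Perplexity is the exponential of the mean negative log-likelihood (NLL) that the model assigns to each token given the tokens before it. A minimal pure-Python sketch of that definition, assuming you already have the natural-log probability of each observed token; the helper name and the probabilities below are illustrative, not model outputs:

```python
import math


def perplexity_from_logprobs(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens.

    `token_logprobs` holds ln p(token_i | tokens_<i), one value per scored token.
    """
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Illustrative values only: if every token gets probability 0.25, perplexity is 4,
# i.e. the model is as uncertain as a uniform choice among 4 tokens.
uniform = [math.log(0.25)] * 5
print(perplexity_from_logprobs(uniform))
```

This is the same quantity as `math.exp(outputs.loss.item())` in the Usage section, because the Transformers causal-LM loss is the mean token-level cross-entropy.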

📌 Summary

Model type: Gemma2 causal language model (Gemma2ForCausalLM)
Base model: princeton-nlp/gemma-2-9b-it-SimPO
Fine-tuning method: Full fine-tuning (all model weights updated)
Primary task: Perplexity computation
Dataset (HF): guaran-ia/gntweets

🏗️ Model Details

Architecture: Gemma2ForCausalLM
Number of layers: 42
Hidden size: 3584
Attention heads: 16
Feedforward intermediate size: 14336
Vocabulary size: 256000
Maximum context length: 8192 tokens
Precision (stored weights): float16
Tokenizer: tokenizer.json and tokenizer_config.json in this repository
Generation config: saved in generation_config.json
Prompt template: chat_template.jinja
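The "9B params" reported for this checkpoint can be roughly reproduced from the numbers above. A back-of-the-envelope sketch; `head_dim` and the number of key/value heads are not listed on this card and are assumptions taken from the upstream Gemma 2 9B configuration, as are the four RMSNorm weights per layer, the absence of linear biases, and an output head tied to the input embedding:

```python
hidden = 3584
layers = 42
heads = 16
kv_heads = 8        # assumption: upstream Gemma 2 9B uses grouped-query attention with 8 KV heads
head_dim = 256      # assumption: upstream Gemma 2 9B head dimension
intermediate = 14336
vocab = 256000

embed = vocab * hidden  # token embedding, assumed tied with the output head
attn = (
    2 * hidden * heads * head_dim       # q_proj and o_proj
    + 2 * hidden * kv_heads * head_dim  # k_proj and v_proj
)
mlp = 3 * hidden * intermediate  # gate_proj, up_proj, down_proj
norms = 4 * hidden               # RMSNorm weights per decoder layer (assumed four)
per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden  # plus the final RMSNorm

print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the total is about 9.24 billion. On a loaded model, `sum(p.numel() for p in model.parameters())` gives the exact count.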

⚙️ Training Details

Batch size: 1
Gradient accumulation: 1
Learning rate: 2e-5
Weight decay: 0.01
Warmup steps: 100
Optimizer: paged_adamw_8bit
Scheduler: linear
Epochs: 6
Precision mode: bf16
Gradient checkpointing: enabled
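With batch size 1, no gradient accumulation, and 936 training records (see Dataset and Preprocessing), one epoch is 936 optimizer steps and 6 epochs are 5,616 steps, assuming training ran on a single device. A minimal sketch of the linear schedule with 100 warmup steps under that assumption; it follows the same formula as Transformers' `get_linear_schedule_with_warmup` (linear ramp from 0, then linear decay to 0):

```python
BASE_LR = 2e-5
WARMUP_STEPS = 100
# Assumption: single device, so steps = records * epochs / (batch size * gradient accumulation).
TOTAL_STEPS = 936 * 6 // (1 * 1)


def linear_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = TOTAL_STEPS - step
    return BASE_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


for s in (0, 50, 100, 2858, 5616):
    print(s, f"{linear_lr(s):.2e}")
```

The peak rate of 2e-5 is reached at step 100, halves around the midpoint of the decay (step 2,858), and reaches 0 at the final step.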

🗃️ Dataset and Preprocessing

Split strategy: train / validation / test
Sequence length used for tokenization: 2048
Train dataset size: 936 records (1,916,928 tokens)
Validation dataset size: 117 records (239,616 tokens)
Test dataset size: 117 records (239,616 tokens)
Tokenizer: princeton-nlp/gemma-2-9b-it-SimPO
HF ID: guaran-ia/gntweets
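The counts are consistent with each record being one fixed-length block of 2,048 tokens (936 × 2,048 = 1,916,928; 117 × 2,048 = 239,616), and with an 80 / 10 / 10 split of 1,170 blocks. A common way to build such blocks is to concatenate the tokenized texts and cut the stream into equal chunks. The sketch below shows that technique on plain token-id lists; it is an assumption about the preprocessing, not the documented pipeline, and `pack_into_blocks` is a hypothetical helper:

```python
from itertools import chain


def pack_into_blocks(tokenized_docs: list[list[int]], block_size: int = 2048) -> list[list[int]]:
    """Concatenate token-id lists and split the stream into blocks of exactly `block_size`.

    A trailing remainder shorter than `block_size` is dropped, as in the
    "group_texts" pattern used by many causal-LM fine-tuning scripts.
    """
    stream = list(chain.from_iterable(tokenized_docs))
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy example with block_size=4 so the result is easy to inspect.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
print(pack_into_blocks(docs, block_size=4))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```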

🚀 Usage

Compute perplexity for a given Guarani text using the fine-tuned model:

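```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "guaran-ia/gntweets-lm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 halves memory versus the float32 default (about 18 GB of weights for ~9B params)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()


def perplexity(text: str) -> float:
    """Return exp(mean token-level negative log-likelihood) of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels=input_ids the model shifts the labels internally and
        # returns the mean cross-entropy over the predicted tokens.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return math.exp(outputs.loss.item())


text = "Your Guarani text here."
print(f"Perplexity: {perplexity(text):.4f}")
```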
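Since the intended use is validating text quality, one natural workflow is to score a batch of documents and keep those below a perplexity cutoff. A minimal sketch that accepts any scoring function, such as the `perplexity` function defined above; the `select_by_perplexity` helper and the cutoff value are hypothetical, and a real threshold would need to be calibrated, for example on the validation split of guaran-ia/gntweets:

```python
from typing import Callable, Iterable


def select_by_perplexity(
    docs: Iterable[str],
    score: Callable[[str], float],
    max_ppl: float,
) -> list[tuple[str, float]]:
    """Score each document and return (doc, ppl) pairs with ppl <= max_ppl, lowest first.

    NaN scores (e.g. texts too short to score) compare False and are dropped.
    """
    scored = [(doc, score(doc)) for doc in docs]
    kept = [(doc, ppl) for doc, ppl in scored if ppl <= max_ppl]
    return sorted(kept, key=lambda pair: pair[1])


# Stand-in scorer so the example runs without loading the 9B model;
# in practice pass the `perplexity` function instead.
fake_scores = {"doc a": 12.0, "doc b": 85.0, "doc c": 30.5}
print(select_by_perplexity(fake_scores, fake_scores.__getitem__, max_ppl=50.0))
```

Taking the scorer as a parameter keeps the selection logic independent of how perplexity is computed, so the same helper works with the sliding-window variant for long documents.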

Perplexity for long texts

If the input is longer than the model's maximum context length (8192 tokens), the following recipe computes perplexity over overlapping sliding windows and averages the per-token negative log-likelihood.

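```python
import math

import torch


def perplexity_sliding(text: str, model, tokenizer, max_len: int = 8192, stride: int = 4096) -> float:
    """Compute perplexity of a long text with overlapping sliding windows.

    - `max_len` should be <= model.config.max_position_embeddings (8192).
    - `stride` (<= max_len) sets how far each window advances. Tokens already
      scored by an earlier window are kept as context only (label -100), so each
      token is counted once; a smaller stride gives scored tokens more context
      at the cost of more forward passes.
    """
    if not 0 < stride <= max_len:
        raise ValueError("require 0 < stride <= max_len")
    enc = tokenizer(text, return_tensors="pt")["input_ids"].to(model.device)
    seq_len = enc.size(1)
    if seq_len < 2:
        return float("nan")  # at least two tokens are needed for one prediction

    total_nll = 0.0
    total_tokens = 0
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_len, seq_len)
        n_new = end - prev_end  # positions not scored by any earlier window
        input_ids = enc[:, begin:end]
        labels = input_ids.clone()
        labels[:, :-n_new] = -100  # mask context-only positions
        with torch.no_grad():
            # loss is the mean NLL over the unmasked labels (after the internal shift)
            loss = model(input_ids, labels=labels).loss.item()
        n_scored = (labels[:, 1:] != -100).sum().item()
        total_nll += loss * n_scored
        total_tokens += n_scored
        prev_end = end
        if end == seq_len:
            break

    return math.exp(total_nll / total_tokens)


# Example usage (reuses `model` and `tokenizer` loaded above):
with open("some_guarani.txt", encoding="utf-8") as f:
    text = f.read()
print(f"Perplexity (sliding): {perplexity_sliding(text, model, tokenizer):.4f}")
```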
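The bookkeeping behind strided evaluation can be checked without loading the model. The hypothetical helper below lists, for each window, the token span fed to the model and how many trailing positions are new targets, so that every position is a target in exactly one window while earlier positions serve only as context. It assumes `0 < stride <= max_len`; note that the very first token has no preceding context, so the model's loss actually covers `seq_len - 1` predictions.

```python
def window_plan(seq_len: int, max_len: int = 8192, stride: int = 4096) -> list[tuple[int, int, int]]:
    """Return (begin, end, n_targets) for each window of a strided perplexity pass.

    Tokens [begin, end) are fed to the model; only the last `n_targets`
    of them are scored, so no position is counted twice.
    """
    if not 0 < stride <= max_len:
        raise ValueError("require 0 < stride <= max_len")
    plan = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_len, seq_len)
        plan.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return plan


# A 20,000-token document with the default settings needs four windows.
for begin, end, n_targets in window_plan(20_000):
    print(f"tokens [{begin}, {end}) -> score last {n_targets}")
```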

❗ Limitations and Notes

The model may reflect biases present in the source corpus.
License information is provided in this repository.

📜 License

This model checkpoint and accompanying files are released under the GNU General Public License v3 (GPLv3). See the LICENSE file in this repository for the full license text.


Model size: 9B params
Tensor type: F16
Format: Safetensors