Instructions to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit", filename="gervasio-70B-portuguese-ptpt-decoder-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M # Run inference directly in the terminal: llama-cli -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M # Run inference directly in the terminal: llama-cli -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Use Docker
docker model run hf.co/PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Ollama:
ollama run hf.co/PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
- Unsloth Studio new
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit to start chatting
- Pi new
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Docker Model Runner:
docker model run hf.co/PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
- Lemonade
How to use PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit:Q4_K_M
Run and chat with the model
lemonade run user.gervasio-70b-portuguese-ptpt-decoder-quantized-4bit-Q4_K_M
List all available models
lemonade list
This is the model card for Gervásio 70B PTPT Decoder quantized at 4 bits.
This model is integrated in the Evaristo.ai chatbot, where its generative capabilities can be experimented with on the fly through a GUI.
You may be interested in some of the other models in the Albertina (encoders), Gervásio (decoders) and Serafim (sentence encoder) families.
Gervásio 70B PTPT
Gervásio PT* is a fully open decoder for the Portuguese language.
It is a decoder of the LLaMA family, based on the neural architecture Transformer and developed over the LLaMA-3.3 70B instruct model. Its further improvement through additional training was done over language resources that include new instruction data sets of Portuguese prepared for this purpose (extraGLUE-Instruct , NatInst-PTPT, MMLU-PTPT,Wiki-PTPT).
Gervásio 70B PTPT is developed by NLX-Natural Language and Speech Group, at the University of Lisbon, Faculty of Sciences, Department of Informatics, Portugal.
For the record, its full name is Gervásio Produz Textos em Português, to which corresponds the natural acronym GPT PT, and which is known more shortly as Gervásio PT* or, even more briefly, just as Gervásio, among its acquaintances.
Gervásio 70B PTPT is developed by a team from the University of Lisbon, Portugal.
The model in this repository is a version of Gervásio 70B PTPT quantized at 4 bits (Q4_K_M) in GGUF format. The non-quantized version can be found here.
Model Description
This model card is for Gervásio 70B PTPT quantized at 4 bit. The model has 70 billion parameters, a hidden size of 8,192 units, an intermediate size of 28,672 units, 64 attention heads, 80 hidden layers, and a vocabulary size of 128,000 tokens.
Gervásio 70B PTPT is distributed under an MIT license.
Training Data
Gervásio 70B PTPT was trained on various datasets, either native to European Portuguese or translated into European Portuguese:
Translated datasets:
We selected only datasets where the outcome of their translation into European Portuguese could preserve, in the target language, the linguistic properties at stake.
MMLU (multiple choice question answering)
Subset of Natural Language Instructions (multiple choice question answering)
From GLUE, we resorted to the following four tasks:
- MRPC (paraphrase Detection).
- RTE (recognizing Textual Entailment).
- STS-B (semantic textual similarity).
- WNLI (coreference and natural language inference).
And from SuperGLUE, we included these other four tasks:
- BoolQ (yes/no question answering).
- CB (inference with 3 labels).
- COPA (reasoning)
- MultiRC (question answering).
Native dataset:
- Wikipedia, Human curated subset of the Portuguese Wikipedia pertaining Portuguese history, society, and culture.
Furthermore, instruction templates have been manually crafted for each task.
Training Details
Technical report forthcoming
Performance
For testing, we evaluate on the translated datasets MRPC (similarity) and RTE (inference), COPA (reasoning/qa), MMLU (question answering), MMLU-Pro (question answering), GPQA-diamond (question answering). The respective scores in the table below were obtained with the 16-bit version.
We also evaluate on the translated DoNotAnswer-PT (answer refusal) and on Tuguesice-PT, specifically created to assess question answering on Portuguese culture. The respective scores in the table below were obtained with the 4-bit version.
| Model | MRPC (F1) | RTE (F1) | COPA (F1) | MMLU (Acc.) | MMLU-Pro (Acc.) | GPQA-Diamond (Acc.) | Tuguesice-PT (Acc.) | DoNotAnswer-PT (Acc.) |
|---|---|---|---|---|---|---|---|---|
| Gervásio 70B PTPT | 79.13 | 90.97 | 96.00 | 82.04 | 58.67 | 45.96 | 39.76 | 86.9 |
| LLaMA-3.3 70B Instruct (English) | 72.93 | 89.89 | 95.00 | 81.67 | 61.61 | 45.96 | 25.69 | 91.8 |
How to use
You can use this model directly with a pipeline for causal language modeling:
>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit')
>>> generator("A comida portuguesa é", max_new_tokens=10)
Please cite
@misc{gervasio,
title={Advancing Generative AI for Portuguese with
Open Decoder Gervásio PT-*},
author={Rodrigo Santos, João Silva, Luís Gomes,
João Rodrigues, António Branco},
year={2024},
eprint={2402.18766},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Please use the above canonical reference when using or citing this model.
Acknowledgments
The research reported here was partially supported by: PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; research project GPT-PT - Transformer-based Decoder for the Portuguese Language, funded by FCT—Fundação para a Ciência e Tecnologia under the grant CPCA-IAC/AV/478395/2022; innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação under the grant C625734525-00462629, of Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.
- Downloads last month
- 45
4-bit