AMALIA-VL


Model Card

AMALIA-VL is an open-source vision and language model targeting European Portuguese.

Model Description

AMALIA-VL is developed by a consortium of Portuguese universities and research centres, including NOVA University Lisbon, Instituto Superior Técnico, the University of Coimbra, the University of Porto, the University of Minho, and the Foundation for Science and Technology (FCT). Development also includes collaborations with the University of Beira Interior, the University of Évora, and the Lisbon School of Engineering (ISEL).

This project is funded by the Government of Portugal's Development and Innovation Programmes, with the goal of creating an effective, sovereign, and transparent LLM, tailored for European Portuguese.

Model Training

The model was trained on open-source data only, following a 3-stage training approach:

  • Modality Alignment: only the connector is trained, using 500k image-captioning samples. Training ran for 4 hours on 8 NVIDIA H100 GPUs.
  • Visual Instruction Following: the full model is trained on a mix of text-only and vision+language data for complex instruction following. This stage teaches the model to understand and respond to instructions that require reasoning over both text and images in conversational contexts. Training ran for 96 hours on 64 NVIDIA H100 GPUs.
  • Direct Preference Optimization: the model is fine-tuned for overall performance and safety alignment. In this phase, the model learns to distinguish between higher- and lower-quality responses to the same instruction, optimizing itself to generate more useful, safe, and value-aligned outputs while minimizing undesired behaviours such as hallucinations, toxicity, or deviations from the given instructions. Training ran for 8 hours on 64 NVIDIA H100 GPUs.
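To make the preference-learning stage more concrete, here is a minimal sketch of the standard DPO loss for a single (chosen, rejected) response pair, in plain Python. It is illustrative only: this card does not state the loss variant or hyperparameters used for AMALIA-VL, so the function name `dpo_loss`, the value of `beta`, and the log-probabilities below are all assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta is an assumed value, not one reported for AMALIA-VL.
    """
    # How much more the policy favours each response than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up log-probabilities, for illustration only.
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)  # margin is 0, loss is log(2)
prefers_chosen = dpo_loss(-8.0, -12.0, -10.0, -10.0)  # positive margin, lower loss
print(no_preference, prefers_chosen)
```

The loss falls as the policy raises the likelihood of the preferred response relative to the reference model and lowers that of the rejected one, which is the behaviour the stage description refers to.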

All training phases were carried out on the MareNostrum5 supercomputer hosted at the Barcelona Supercomputing Center and the DEUCALION supercomputer hosted at Minho Advanced Computing Center.
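As a rough sense of scale, the durations and GPU counts quoted for the three stages can be combined into GPU-hours. The sketch below assumes every GPU was busy for the full wall-clock duration of its stage, which the card does not confirm, so treat the result as an estimate.

```python
# (stage name, wall-clock hours, number of H100 GPUs), as quoted in the stage list.
STAGES = [
    ("Modality Alignment", 4, 8),
    ("Visual Instruction Following", 96, 64),
    ("Direct Preference Optimization", 8, 64),
]


def gpu_hours(stages):
    """Return a dict mapping each stage name to hours * GPUs."""
    return {name: hours * gpus for name, hours, gpus in stages}


per_stage = gpu_hours(STAGES)
total = sum(per_stage.values())

for name, value in per_stage.items():
    print(f"{name}: {value} GPU-hours ({value / total:.1%} of total)")
print(f"Total: {total} GPU-hours")
```

Under that assumption the three stages come to 32, 6,144, and 512 GPU-hours, or 6,688 in total, with Visual Instruction Following accounting for roughly 92% of the compute.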

Datasets

To train the model, we used a combination of open-source datasets and synthetic data generated by the team to target specific model behaviours and increase the share of European Portuguese data. In addition, many of the public datasets were partially translated into European Portuguese so that the model is well aligned with the target language variant.

We release the complete datasets used for both Visual Instruction Following and Direct Preference Optimization.

For full details on both training datasets and setup, please refer to the technical report.

How to use

We strongly recommend transformers==4.57.6, the version used throughout AMALIA-VL development. Other versions may produce degraded outputs and are not guaranteed to deliver the model's intended performance. Install it explicitly before running the model:

pip install transformers==4.57.6
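If you want to confirm the environment before loading the model, a small check along these lines may help. It is a sketch using only the standard library; the helper names `matches_recommended` and `installed_transformers_version` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED = "4.57.6"  # version recommended by this model card


def matches_recommended(installed, recommended=RECOMMENDED):
    """Return True when the installed version string equals the recommended one."""
    return installed.strip() == recommended


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif matches_recommended(found):
        print(f"transformers {found} matches the recommended version")
    else:
        print(f"Warning: transformers {found} found, {RECOMMENDED} recommended")
```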

Using Hugging Face Transformers

The model can be loaded with the transformers image-text-to-text API. Here is a minimal working example:

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "amalia-llm/AMALIA-VL-DPO"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Descreve esta imagem em detalhe."},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

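# Generate up to 256 new tokens for the prompt.
generated_ids = model.generate(**inputs, max_new_tokens=256)
# Slice off the prompt tokens so that only the newly generated text is decoded.
output = processor.batch_decode(
    generated_ids[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(output)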

Serving with vLLM

For high-throughput inference, you can serve AMALIA-VL with vLLM:

vllm serve amalia-llm/AMALIA-VL-DPO

Once the server is up, query it with, for example, curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amalia-llm/AMALIA-VL-DPO",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
            }
          },
          {
            "type": "text",
            "text": "Descreve esta imagem em detalhe."
          }
        ]
      }
    ]
  }'
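The same request can be sent from Python. The sketch below uses only the standard library and targets the OpenAI-compatible `/v1/chat/completions` endpoint that vLLM exposes. It assumes the server runs on `localhost:8000` and that the `model` field matches the name passed to `vllm serve`, which is vLLM's default served model name. The helper names `build_chat_payload` and `query_server` and the `max_tokens` value are choices made for this example.

```python
import json
import urllib.request

IMAGE_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/bee.jpg"
)


def build_chat_payload(model, image_url, prompt, max_tokens=256):
    """Build a chat-completions request body with one image and one text part."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def query_server(payload, base_url="http://localhost:8000"):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "amalia-llm/AMALIA-VL-DPO", IMAGE_URL, "Descreve esta imagem em detalhe."
)
# Uncomment once the vLLM server is running:
# print(query_server(payload))
```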

Intended Use

AMALIA is intended as a general-purpose, open language model for European Portuguese (pt-PT). Its primary intended uses include:

  • Conversational assistance and instruction following in European Portuguese: question answering, summarization, drafting, rewriting, translation, and general text generation.
  • Research and educational applications focused on the Portuguese language and culture, including Portuguese-language NLP research and the study of European Portuguese specifically.
  • A base model for downstream development. Given its open release, AMALIA is intended to be fine-tuned, adapted, and built upon by other developers and researchers, for example, in public-sector and sovereign AI applications.

The model targets European Portuguese specifically. While it will handle other Portuguese variants to some degree, it is optimized and curated for pt-PT and should not be assumed equivalent across variants.

Out-of-Scope Use

The following uses fall outside what AMALIA is designed or validated for:

  • High-stakes decision-making without qualified human oversight. AMALIA should not act as a decision-making authority in domains where errors carry serious consequences unless a qualified human is in the loop.
  • Use as a sole source of factual truth. The model can hallucinate, so outputs should be verified before use in any context that demands factual accuracy. AMALIA was trained to support retrieval-augmented generation (RAG), and grounding responses in retrieved source documents is the recommended approach for factual or knowledge-intensive tasks; even grounded outputs should still be checked.
  • Languages and cultural contexts other than Portuguese and English.
  • Reliance on safety guarantees. Safety tuning reduces but does not eliminate harmful, toxic, or biased outputs; the model should not be deployed in settings that assume it is fully safe by default. Instead, developers should employ appropriate guardrails and safety mechanisms tailored to their application, and ensure logging and traceability.
  • Generation of harmful, toxic, or violating content. AMALIA must not be used to intentionally produce content that is harmful, abusive, discriminatory, or otherwise violating, including hate speech, harassment, or material that exploits or endangers individuals.
  • Production deployment without further evaluation. As an open model, AMALIA should undergo task-specific and domain-specific testing before being deployed in any production system.
  • Uses that violate applicable law. AMALIA must not be deployed in ways that conflict with the EU AI Act, the General Data Protection Regulation (GDPR), or other applicable legal frameworks. This includes prohibited AI practices under the AI Act, high-risk applications that have not undergone the required conformity assessment, and any processing of personal data without a valid legal basis.
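The RAG recommendation in the list above can be illustrated with a small prompt builder that places retrieved passages ahead of the question. This is a sketch under stated assumptions: the card does not describe the prompt format used for RAG during training, so the template wording and the function name `build_rag_prompt` are hypothetical, and retrieval itself is out of scope here.

```python
def build_rag_prompt(question, documents):
    """Build a grounded prompt from retrieved documents (hypothetical template)."""
    if not documents:
        raise ValueError("at least one retrieved document is required")
    # Number the passages so that answers can refer back to their sources.
    numbered = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Responde à pergunta usando apenas os documentos abaixo. "
        "Se a resposta não estiver nos documentos, diz que não sabes.\n\n"
        f"Documentos:\n{numbered}\n\n"
        f"Pergunta: {question.strip()}"
    )


# Placeholder passages; in practice these come from a retriever.
prompt = build_rag_prompt(
    "Qual é a capital de Portugal?",
    ["Lisboa é a capital de Portugal.", "O Porto fica no norte do país."],
)
print(prompt)
```

The resulting string can be used as the text part of a user message. Numbering the passages makes it easier to trace a claim back to its source when checking the output.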

Ethical Considerations and Risk

Developing large language models raises a number of ethical concerns. In releasing AMALIA as an open model for European Portuguese, the consortium considered the following:

  • Bias and fairness. Language models trained on large-scale, real-world text reflect the socio-cultural biases present in their training material. The model is also explicitly targeted at European Portuguese, and its handling of the pt-PT/pt-BR distinction is itself a fairness consideration that the consortium treats as a first-class evaluation dimension.
  • Misinformation and misuse. Like any LLM, AMALIA can produce text that is false, misleading, or harmful. Developers building on AMALIA are encouraged to communicate these limitations to end users and to provide mechanisms for reporting misuse.
  • Transparency and accountability. This card documents the model's training data, training process, and intended and out-of-scope uses so that developers and researchers can make informed decisions. Releasing AMALIA openly is intended to make European Portuguese language technology accessible to the wider research and developer community.

Risks identified and mitigations:

  • Generation of harmful content. AMALIA's post-training included safety data in both the SFT and DPO phases, and the DPO phase explicitly targets reductions in toxicity and harmful output. These measures reduce but do not eliminate the risk; downstream developers should add their own content-safety safeguards appropriate to their product and use case.
  • Misuse for malicious purposes. The consortium provides documentation of the model's intended use and limitations, and encourages developers who build on AMALIA to establish their own mechanisms for users to report misuse within their applications.
  • Perpetuation of biases. Continuous monitoring through evaluation metrics and human review, together with de-biasing techniques during training, fine-tuning, and downstream adaptation, is encouraged to limit the reinforcement of existing biases.
  • Adversarial inputs and prompt injection. Applications built on AMALIA may be vulnerable to prompt injection and other adversarial inputs. Developers should apply input validation and output filtering appropriate to their deployment context.
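As one possible shape for the input validation, output filtering, and logging mentioned above, the sketch below wraps an arbitrary generation callable. Everything in it is illustrative: the patterns, the length limit, the refusal message, and the names `validate_input` and `guarded_generate` are assumptions for this example, and simple pattern matching is not a complete defence against prompt injection.

```python
import logging
import re

logger = logging.getLogger("amalia_guardrails")  # hypothetical logger name

# Illustrative patterns only; a real deployment needs a proper policy.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"ignora (todas )?as instruções anteriores", re.IGNORECASE),
]
MAX_PROMPT_CHARS = 4000  # assumed limit
REFUSAL = "Pedido recusado pelas regras de segurança da aplicação."


def validate_input(prompt):
    """Return False for over-long prompts or ones matching a known pattern."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)


def guarded_generate(prompt, generate_fn, output_blocklist=()):
    """Wrap generate_fn (any callable str -> str) with input and output checks."""
    if not validate_input(prompt):
        logger.warning("Blocked input of length %d", len(prompt))
        return REFUSAL
    output = generate_fn(prompt)
    lowered = output.lower()
    if any(term.lower() in lowered for term in output_blocklist):
        logger.warning("Filtered output for prompt of length %d", len(prompt))
        return REFUSAL
    return output
```

Here `generate_fn` stands in for whatever calls the model, for example a function that sends the prompt to a served endpoint. The log records give a basic audit trail of blocked requests.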

Citation

If you use AMALIA-VL in your work, please cite:

@article{gloria2026amalia,
    title={AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model},
    author={Gl{\'o}ria-Silva, Diogo and Cardeira, Jo{\~a}o and da Luz, Manuel Letras and Simpl{\'\i}cio, Afonso and Vinagre, Gon{\c{c}}alo and Tavares, Diogo and Ferreira, Rafael and Calvo, In{\^e}s and Vieira, In{\^e}s and Semedo, David and others},
    journal={arXiv preprint arXiv:2606.19100},
    year={2026}
}

Model size: 10B params (Safetensors, tensor type BF16)