banner

AfroBR-LangBench — Afro-Brazilian Portuguese Sociolinguistics Adapter

LoRA adapter fine-tuned on Llama-4-Scout-17B-16E-Instruct (109B) for respectful sociolinguistic reasoning about Afro-Brazilian Portuguese varieties, via Adaption's AutoScientist platform.


The problem this adapter addresses

Language models trained on standard text systematically treat Afro-Brazilian Portuguese features as errors rather than documented linguistic phenomena. When given "eles foi lá", a base model typically responds:

"Correction: the correct form is 'eles foram lá'"

The sociolinguistically adequate response is:

"This exemplifies Concordância Verbal Reduzida (CVR), documented in quilombola communities and studied by Lucchesi et al. (2009). Its origin lies in contact between colonial Portuguese and Bantu languages..."

This adapter teaches the model to explain, normalize respectfully, identify, and cite academic sources for 10 documented phenomena.


Adaptive Data results

Metric Before After
Quality score 6.0 9.1
Quality grade C A
Relative improvement +51.7%
Percentile (Language domain) 8.2 33.0

Training metrics

Metric Value
Base model meta-llama/Llama-4-Scout-17B-16E-Instruct (109B)
Trained model name adaption_pt_afro_brasileiro_qa
Training method SFT + LoRA
LoRA rank (r) 16
LoRA alpha 32
LoRA dropout 0.1
Trainable modules all-linear
Epochs 4
Training steps 88
Learning rate 7e-5 (cosine scheduler)
Warmup ratio 0.05
Weight decay 0.03
Dataset size 400 examples (Grade A)

Dataset

Platform Link
HuggingFace Dataset Fernandosr85/adaption-pt-afro-brasileiro-qa
Kaggle Dataset afrobr-langbench-sociolinguistics-dataset
Kaggle Notebook AfroBR-LangBench

400 instruction-tuning examples across 4 task categories:

Category Task Examples
A Sociolinguistic explanation without prejudice 150
B Respectful normalization to standard register 100
C Identification of linguistic phenomena 100
D RAG-style questions with academic citations 50

10 documented phenomena

Code Phenomenon
CVR Concordância Verbal Reduzida
CNR Concordância Nominal Reduzida
APR Apagamento do /r/ em coda silábica
TOP Topicalização com Deslocamento à Esquerda
AGT Uso de 'a gente' como pronome de 1ª pessoa do plural
NPV Negação Pós-verbal
MON Monotongação de Ditongos
PREP Variação no Uso de Preposições
CLC Ausência de Clítico Acusativo de 3ª Pessoa
MAA Marcadores Aspectuais de Origem Africana

Academic sources

  • Lucchesi, D., Baxter, A., Ribeiro, I. (2009). O Português Afro-Brasileiro. EDUFBA.
  • Projeto Vertentes (UFBA, 2001–) — speech corpus from quilombola communities in Bahia
  • Cyrino, S. (1997). O objeto nulo no Português do Brasil. UNICAMP.
  • Galves, C. (2001). Ensaios sobre as gramáticas do português. UNICAMP.
  • Schwenter, S. A. (2005). The pragmatics of negation in Brazilian Portuguese. Lingua.
  • Holm, J. (2004). Languages in Contact: The Partial Restructuring of Vernaculars. Cambridge University Press.
  • Castro, Y. P. (2001). Línguas africanas no Brasil. CEAO/UFBA.

Credits


Disclaimer

Experimental research artifact submitted to AutoScientist Challenge 2026 (Language category). This adapter is intended for linguistic research and education. It does not represent or speak for Afro-Brazilian communities.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Fernandosr85/afrobr-langbench-adapter

Spaces using Fernandosr85/afrobr-langbench-adapter 2