---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---
# PIBot Joint BERT

Multi-head Joint BERT model for intent classification and slot filling, specialized in queries about macroeconomic indicators from the Banco Central de Chile.
## Architecture

| Component | Detail |
|---|---|
| Base | microsoft/mdeberta-v3-base |
| Task | pibimacecv3 |
| Intent heads | 5 (activity, calc_mode, investment, region, req_form) |
| Slot labels | 15 (BIO) |
| Custom code | modeling_jointbert.py, module.py |
### Intent Heads

| Head | Classes | Values |
|---|---|---|
| activity | 3 | none, specific, general |
| calc_mode | 4 | original, prev_period, yoy, contribution |
| investment | 3 | none, specific, general |
| region | 3 | none, specific, general |
| req_form | 3 | latest, point, range |
### Slot Entities (BIO)

Extracted entities: activity, frequency, indicator, investment, period, region, seasonality

Full BIO scheme: 15 labels (O, B-*, I-*).
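The 15-label count follows directly from the 7 entity types (O plus a B-/I- pair per entity). A minimal sketch of building that label set and merging per-token BIO tags back into entity spans; the helper and the hand-tagged example below are illustrative, not part of the repo:

```python
# Build the 15-label BIO set: O plus B-/I- for each of the 7 entity types
ENTITIES = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]
BIO_LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITIES for prefix in ("B", "I")]
assert len(BIO_LABELS) == 15

def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [tok])          # start a new span
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(tok)              # continue the open span
        else:
            if current:
                spans.append(current)
            current = None                      # O tag or broken I- tag
    if current:
        spans.append(current)
    return [(ent, " ".join(toks)) for ent, toks in spans]

# Example query, tagged by hand for illustration
tokens = ["cuál", "fue", "el", "imacec", "de", "agosto", "2024"]
tags = ["O", "O", "O", "B-indicator", "O", "B-period", "I-period"]
print(bio_to_spans(tokens, tags))
# [('indicator', 'imacec'), ('period', 'agosto 2024')]
```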
## Usage

### Installation

```bash
pip install torch transformers
```
### Loading the Model

```python
import os

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Download the label files from the repo
label_dir = os.path.dirname(hf_hub_download("BCCh/pibert", "labels/slot_label.txt"))

# Read intent and slot labels
def read_labels(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels(os.path.join(label_dir, "slot_label.txt"))

# Build intent_label_lst: one label list per head
intent_label_lst = []
for head in ["activity", "calc_mode", "investment", "region", "req_form"]:
    intent_label_lst.append(read_labels(os.path.join(label_dir, f"{head}_label.txt")))

# Load the model through its custom class (shipped in the repo,
# fetched via trust_remote_code)
from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
    trust_remote_code=True,
)
model.eval()
```
### Prediction

```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**tokens)
# outputs contains intent_logits (a list, one tensor per head) and slot_logits
```
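Each intent head is decoded independently with an argmax over its logits, mapped through that head's label list. A pure-Python sketch with dummy logit values standing in for real model outputs (the values, and showing only two of the five heads, are invented for illustration):

```python
# Decode intent heads: argmax per head, then map the index to its label list.
# Logit values below are dummies, not real model outputs.

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

intent_label_lst = {
    "calc_mode": ["original", "prev_period", "yoy", "contribution"],
    "req_form": ["latest", "point", "range"],
}
dummy_intent_logits = {
    "calc_mode": [2.1, -0.3, 0.4, -1.0],  # highest score at index 0
    "req_form": [-0.5, 3.2, 0.1],         # highest score at index 1
}
intents = {head: intent_label_lst[head][argmax(logits)]
           for head, logits in dummy_intent_logits.items()}
print(intents)  # {'calc_mode': 'original', 'req_form': 'point'}

# Slot decoding works the same way per token: argmax over each row of
# slot_logits mapped through slot_labels, with BIO spans merged downstream.
```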
## Package Structure

```
model_package/
├── config.json              # BERT + task configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py    # JointBERT architecture (custom)
├── module.py                # CRF and auxiliary modules
├── __init__.py
├── README.md                # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```
## Training Data

Trained on queries about Chilean macroeconomic indicators:

- IMACEC (Monthly Index of Economic Activity)
- PIB (Gross Domestic Product)
- Economic sectors, frequencies, periods, regions
## Limitations

- Specialized in macroeconomic queries about the Banco Central de Chile's indicators
- Best performance on short queries (< 50 tokens)
- Requires `trust_remote_code=True` due to the custom architecture
## Citation

```bibtex
@misc{pibot-jointbert,
  author = {Banco Central de Chile},
  title = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```
## License

MIT License