---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---
# PIBot Joint BERT

Multi-head Joint BERT model for intent classification and slot filling, specialized in queries about macroeconomic indicators from the Banco Central de Chile.
## Architecture

| Component | Detail |
|---|---|
| Base | microsoft/mdeberta-v3-base |
| Task | pibimacecv3 |
| Intent heads | 5 (activity, calc_mode, investment, region, req_form) |
| Slot labels | 15 (BIO) |
| Custom code | modeling_jointbert.py, module.py |
### Intent Heads

| Head | Classes | Values |
|---|---|---|
| activity | 3 | none, specific, general |
| calc_mode | 4 | original, prev_period, yoy, contribution |
| investment | 3 | none, specific, general |
| region | 3 | none, specific, general |
| req_form | 3 | latest, point, range |
### Slot Entities (BIO)

Extracted entities: activity, frequency, indicator, investment, period, region, seasonality

Full BIO scheme: 15 labels (O, B-*, I-*).
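The 15-label count follows directly from the 7 entity types (O plus a B-/I- pair per entity). A minimal sketch of building that label set and merging per-token BIO tags back into entity spans; the helper and the hand-tagged example below are illustrative, not part of the repo:

```python
# Build the 15-label BIO set: O plus B-/I- for each of the 7 entity types
ENTITIES = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]
BIO_LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITIES for prefix in ("B", "I")]
assert len(BIO_LABELS) == 15

def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [tok])          # start a new span
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(tok)              # continue the open span
        else:
            if current:
                spans.append(current)
            current = None                      # O tag or broken I- tag
    if current:
        spans.append(current)
    return [(ent, " ".join(toks)) for ent, toks in spans]

# Example query, tagged by hand for illustration
tokens = ["cuál", "fue", "el", "imacec", "de", "agosto", "2024"]
tags = ["O", "O", "O", "B-indicator", "O", "B-period", "I-period"]
print(bio_to_spans(tokens, tags))
# [('indicator', 'imacec'), ('period', 'agosto 2024')]
```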
## Usage

### Installation

```bash
pip install torch transformers
```
### Loading the Model

```python
import os

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Download the label files from the repo
label_dir = os.path.dirname(hf_hub_download("BCCh/pibert", "labels/slot_label.txt"))

# Read intent and slot labels
def read_labels(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels(os.path.join(label_dir, "slot_label.txt"))

# Build intent_label_lst: one label list per head
intent_label_lst = []
for head in ["activity", "calc_mode", "investment", "region", "req_form"]:
    intent_label_lst.append(read_labels(os.path.join(label_dir, f"{head}_label.txt")))

# Load the model through its custom class (shipped in the repo,
# fetched via trust_remote_code)
from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
    trust_remote_code=True,
)
model.eval()
```
### Prediction

```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**tokens)
# outputs contains intent_logits (a list, one tensor per head) and slot_logits
```
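Each intent head is decoded independently with an argmax over its logits, mapped through that head's label list. A pure-Python sketch with dummy logit values standing in for real model outputs (the values, and showing only two of the five heads, are invented for illustration):

```python
# Decode intent heads: argmax per head, then map the index to its label list.
# Logit values below are dummies, not real model outputs.

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

intent_label_lst = {
    "calc_mode": ["original", "prev_period", "yoy", "contribution"],
    "req_form": ["latest", "point", "range"],
}
dummy_intent_logits = {
    "calc_mode": [2.1, -0.3, 0.4, -1.0],  # highest score at index 0
    "req_form": [-0.5, 3.2, 0.1],         # highest score at index 1
}
intents = {head: intent_label_lst[head][argmax(logits)]
           for head, logits in dummy_intent_logits.items()}
print(intents)  # {'calc_mode': 'original', 'req_form': 'point'}

# Slot decoding works the same way per token: argmax over each row of
# slot_logits mapped through slot_labels, with BIO spans merged downstream.
```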
## Package Structure

```
model_package/
├── config.json              # BERT + task configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py    # JointBERT architecture (custom)
├── module.py                # CRF and auxiliary modules
├── __init__.py
├── README.md                # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```
## Training Data

Trained on queries about Chilean macroeconomic indicators:

- IMACEC (Monthly Index of Economic Activity)
- PIB (Gross Domestic Product)
- Economic sectors, frequencies, periods, regions
## Limitations

- Specialized in macroeconomic queries about the Banco Central de Chile's indicators
- Best performance on short queries (< 50 tokens)
- Requires `trust_remote_code=True` due to the custom architecture
## Citation

```bibtex
@misc{pibot-jointbert,
  author = {Banco Central de Chile},
  title = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```
## License

MIT License