
# Multi-Head Sequence Classification Model

## Model description

The model is a sequence classification model built on the hidden output states of a pre-trained transformer backbone. Multiple classification heads are attached to the backbone output, each classifying the input sequence for its own task.

## Model architecture

The backbone of the model is BAAI/bge-m3 with 1024 output dimensions.

Two classification heads are added on top of the backbone output to classify the input sequence: `GGU` with 3 labels and `sentiment` with 3 labels.

You can find the label mappings here:

**GGU**

- 0: Greeting
- 1: Gratitude
- 2: Other

**sentiment**

- 0: Positive
- 1: Negative
- 2: Neutral

The joint architecture was trained using the `MultiHeadClassificationTrainer` implementation provided in this repository.
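
The exact head construction lives in this repository's `model.py`. The following is only a minimal sketch of the pattern, assuming linear heads over a pooled backbone representation; the class name, pooling strategy, and forward signature here are illustrative, not the repository's actual code:

```python
import torch.nn as nn


class MultiHeadClassifierSketch(nn.Module):
    """Illustrative sketch: a shared transformer backbone feeding one
    linear classification head per task."""

    def __init__(self, backbone, head_config, dropout=0.25):
        super().__init__()
        self.backbone = backbone  # e.g. AutoModel.from_pretrained('BAAI/bge-m3')
        self.dropout = nn.Dropout(dropout)
        hidden = backbone.config.hidden_size  # 1024 for bge-m3
        # One independent linear head per task, e.g. {'GGU': 3, 'sentiment': 3}.
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden, n_labels)
            for name, n_labels in head_config.items()
        })

    def forward(self, inputs, head_names=None):
        hidden_states = self.backbone(**inputs).last_hidden_state
        pooled = self.dropout(hidden_states[:, 0])  # CLS-token pooling (assumed)
        head_names = head_names or list(self.heads)
        # Return one logits tensor per requested head.
        return {name: self.heads[name](pooled) for name in head_names}
```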

## Use cases

Typical use cases: intent classification of short messages (greeting/gratitude detection with the `GGU` head) and sentiment analysis (with the `sentiment` head).

## Model Inference

Inference code:

```python
import torch
from transformers import AutoTokenizer

# MultiHeadSequenceClassificationModel is defined in model.py of this
# repository; clone the repository (or copy model.py) so the import resolves.
from model import MultiHeadSequenceClassificationModel

model = MultiHeadSequenceClassificationModel.from_pretrained(
    'philipp-zettl/multi-head-sequence-classification-model'
)
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')


def predict(text):
    inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs
```
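
To turn the raw outputs into label strings, a minimal sketch follows; it assumes the model returns a dict of logits keyed by head name, matching the head configuration from the training code further down:

```python
label_maps = {
    'GGU': {0: 'Greeting', 1: 'Gratitude', 2: 'Other'},
    'sentiment': {0: 'Positive', 1: 'Negative', 2: 'Neutral'},
}

outputs = predict("Thank you so much for your help!")
# Assumed output shape: {'GGU': logits, 'sentiment': logits}, batch size 1.
for head, logits in outputs.items():
    pred_id = logits.argmax(dim=-1).item()
    print(f"{head}: {label_maps[head][pred_id]}")
```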

## Model Training

### Confusion Matrix

*(Figure: GGU confusion matrix)*

*(Figure: sentiment confusion matrix)*

### Training Loss

*(Figure: GGU training loss)*

*(Figure: sentiment training loss)*

## Training data

The model has been trained on the following datasets:

- `philipp-zettl/GGU-xx` (for the `GGU` head)
- `philipp-zettl/sentiment` (for the `sentiment` head)

Training used the `MultiHeadClassificationTrainer` implementation provided in this repository, as shown in the training procedure below.
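
For reference, both datasets can be pulled from the Hub with the `datasets` library (assuming public access; the `text` column name follows the `sample_key` argument used in the training code below):

```python
from datasets import load_dataset

ggu_ds = load_dataset('philipp-zettl/GGU-xx')
sentiment_ds = load_dataset('philipp-zettl/sentiment')

print(ggu_ds)        # inspect splits and column names
print(sentiment_ds)  # the sentiment dataset stores samples in a 'text' column
```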

## Training procedure

The following code was executed to train the model:

```python
from copy import deepcopy

import torch
from transformers import AutoModel, AutoTokenizer

# MultiHeadClassificationTrainer is the implementation provided in this repository.


def train_classifier():
    backbone = AutoModel.from_pretrained('BAAI/bge-m3').to(torch.float16)
    tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    ggu_label_map = {
        0: 'Greeting',
        1: 'Gratitude',
        2: 'Other'
    }
    sentiment_label_map = {
        0: 'Positive',
        1: 'Negative',
        2: 'Neutral'
    }

    num_labels = len(ggu_label_map)

    # HParams
    dropout = 0.25
    learning_rate = 3e-5
    momentum = 0.9
    l2_reg = 0.25
    l2_loss_weight = 0.25

    model_conf = {
        'backbone': backbone,
        'head_config': {
            'GGU': num_labels,  # start with the GGU head only
        },
        'dropout': dropout,
        'l2_reg': l2_reg,
    }

    optimizer_conf = {
        'lr': learning_rate,
        'momentum': momentum
    }

    scheduler_conf = {
        'factor': 0.2,
        'patience': 3,
        'min_lr': 1e-8
    }

    train_run = 1000
    trainer = MultiHeadClassificationTrainer(
        model_conf=model_conf,
        optimizer_conf={**optimizer_conf, 'lr': 1e-4},  # override the base LR for this run
        scheduler_conf=scheduler_conf,
        num_epochs=35,
        l2_loss_weight=l2_loss_weight,
        use_lr_scheduler=True,
        train_run=train_run,
        auto_find_batch_size=False
    )

    # Stage 1: train the GGU head on the GGU-xx dataset.
    new_model, history = trainer.train(dataset_name='philipp-zettl/GGU-xx', target_heads=['GGU'])
    metrics = history['metrics']
    history['loss_plot'] = trainer._plot_history(**metrics)
    res = trainer.eval({'GGU': ggu_label_map})
    history['evaluation'] = res['GGU']

    total_history = {
        'GGU': deepcopy(history),
    }

    # Stage 2: add the sentiment head and train it on the sentiment dataset.
    trainer.classifier.add_head('sentiment', 3)
    trainer.auto_find_batch_size = False
    new_model, history = trainer.train(
        dataset_name='philipp-zettl/sentiment',
        target_heads=['sentiment'],
        sample_key='text',
        num_epochs=10,
        lr=1e-4
    )
    metrics = history['metrics']
    history['loss_plot'] = trainer._plot_history(**metrics)
    res = trainer.eval({'sentiment': sentiment_label_map}, sample_key='text')
    history['evaluation'] = res['sentiment']

    total_history['sentiment'] = deepcopy(history)

    label_maps = {
        'GGU': ggu_label_map,
        'sentiment': sentiment_label_map,
    }

    return new_model, total_history, trainer, label_maps
```
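
A hedged usage sketch of the function above; the `save_pretrained` call assumes the model follows the standard Hugging Face `from_pretrained`/`save_pretrained` API used in the inference snippet:

```python
model, total_history, trainer, label_maps = train_classifier()

# Inspect the evaluation metrics collected per head during training.
for head, history in total_history.items():
    print(head, history['evaluation']['metrics'])

# Persist the trained multi-head model (assumed standard save API).
model.save_pretrained('multi-head-sequence-classification-model')
```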

## Evaluation

### Evaluation data

For model evaluation, a 20% validation split of the training data was used.
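
For illustration, such a split can be produced with the `datasets` library; the exact split logic inside `MultiHeadClassificationTrainer` may differ, and the seed here is an arbitrary assumption:

```python
from datasets import load_dataset

dataset = load_dataset('philipp-zettl/GGU-xx', split='train')
# Hold out 20% of the training data for validation (seed chosen arbitrarily).
split = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = split['train'], split['test']
```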

### Evaluation procedure

The model was evaluated using the `eval` method provided by the `MultiHeadClassificationTrainer` class:

```python
# Imports the snippet relies on (module level in the trainer implementation):
import pandas as pd
import torch
from sklearn.metrics import (
    ConfusionMatrixDisplay, accuracy_score, classification_report,
    confusion_matrix, f1_score, recall_score,
)
from tqdm import tqdm
from transformers import BatchEncoding


def _eval_model(self, dataloader, label_map, sample_key, label_key):
    self.classifier.train(False)
    eval_heads = list(label_map.keys())
    y_pred = {h: [] for h in eval_heads}
    y_test = {h: [] for h in eval_heads}
    for sample in tqdm(dataloader, total=len(dataloader), desc='Evaluating model...'):
        labels = {name: sample[label_key] for name in eval_heads}
        # Re-assemble the batch tensors and move them to the trainer's device.
        embeddings = BatchEncoding({
            k: torch.stack(v, dim=1).to(self.device)
            for k, v in sample.items() if k not in [label_key, sample_key]
        })
        output = self.classifier(embeddings, head_names=eval_heads)
        for head in eval_heads:
            y_pred[head].extend(output[head].argmax(dim=1).cpu())
            y_test[head].extend(labels[head])
        torch.cuda.empty_cache()

    accuracies = {h: accuracy_score(y_test[h], y_pred[h]) for h in eval_heads}
    f1_scores = {h: f1_score(y_test[h], y_pred[h], average="macro") for h in eval_heads}
    recalls = {h: recall_score(y_test[h], y_pred[h], average='macro') for h in eval_heads}

    report = {}
    for head in eval_heads:
        cm = confusion_matrix(y_test[head], y_pred[head], labels=list(label_map[head].keys()))
        disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(label_map[head].values()))
        clf_report = classification_report(
            y_test[head], y_pred[head], output_dict=True, target_names=list(label_map[head].values())
        )
        del clf_report["accuracy"]
        clf_report = pd.DataFrame(clf_report).T.reset_index()
        report[head] = dict(
            clf_report=clf_report,
            confusion_matrix=disp,
            metrics={'accuracy': accuracies[head], 'f1': f1_scores[head], 'recall': recalls[head]}
        )
    return report
```

### Metrics

For evaluation, we used the following metrics: accuracy, precision, recall, and F1-score. You can find the detailed classification reports below:

**GGU:**

| index | precision | recall | f1-score | support |
|---|---|---|---|---|
| Greeting | 0.904762 | 0.974359 | 0.938272 | 39 |
| Gratitude | 0.958333 | 0.851852 | 0.901961 | 27 |
| Other | 1 | 1 | 1 | 39 |
| macro avg | 0.954365 | 0.94207 | 0.946744 | 105 |
| weighted avg | 0.953912 | 0.952381 | 0.951862 | 105 |

**sentiment:**

| index | precision | recall | f1-score | support |
|---|---|---|---|---|
| Positive | 0.783088 | 0.861878 | 0.820596 | 12851 |
| Negative | 0.802105 | 0.819524 | 0.810721 | 14229 |
| Neutral | 0.7874 | 0.6913 | 0.736227 | 13126 |
| macro avg | 0.790864 | 0.790901 | 0.789181 | 40206 |
| weighted avg | 0.791226 | 0.7912 | 0.789557 | 40206 |