Albert-xxl-v1 with SQuAD 2.0 sentences

Objective

The model was trained as a cross-encoder classification model, with the objective of re-ranking the results in a QA pipeline.

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer


model = AutoModelForSequenceClassification.from_pretrained("apohllo/albert-xxl-squad-sentences", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("apohllo/albert-xxl-squad-sentences")

from transformers import pipeline

# Add device=0 if you want to use GPU!
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, batch_size=16) #, device=0)

sentences = [...]    # sentences to be re-ranked with respect to the question
question = "..."     # the question to rank the sentences against

samples = [{"text": s, "text_pair": question} for s in sentences]
results = classifier(samples)
    
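# Convert each prediction into the probability of the positive class (LABEL_1)
results = [(idx, r["score"]) if r["label"] == 'LABEL_1' else (idx, 1 - r["score"])
            for idx, r in enumerate(results)]

# Keep the top_k (sentence index, score) pairs, highest score first
top_k = 5
keys_values = sorted(results, key=lambda e: -e[1])[:top_k]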
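The snippet above yields (index, score) pairs. As a minimal sketch of how those scores might be mapped back to the sentence texts, the helper below takes the raw pipeline output and returns the best-scoring sentences. The function name `rank_sentences` and the mock predictions are hypothetical and used only for illustration; they are not part of the model or of the transformers library.

```python
def rank_sentences(sentences, results, top_k=5):
    """Pair each sentence with its positive-class probability and sort.

    `results` is expected to look like the text-classification pipeline
    output used above: one {"label": ..., "score": ...} dict per sentence.
    """
    scored = []
    for sentence, r in zip(sentences, results):
        # LABEL_1 is treated as the positive class, as in the snippet above
        prob = r["score"] if r["label"] == "LABEL_1" else 1 - r["score"]
        scored.append((sentence, prob))
    return sorted(scored, key=lambda e: e[1], reverse=True)[:top_k]


# Hypothetical predictions, hand-written to avoid downloading the model
mock_sentences = ["a", "b", "c"]
mock_results = [
    {"label": "LABEL_0", "score": 0.9},
    {"label": "LABEL_1", "score": 0.8},
    {"label": "LABEL_1", "score": 0.6},
]
ranked = rank_sentences(mock_sentences, mock_results, top_k=2)
```

With real predictions, `results` would be the list returned by `classifier(samples)` before it is converted to (index, score) pairs.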

Data

The data from SQuAD 2.0 was sentence-split. Each question paired with the sentence containing the answer was treated as a positive example. The question paired with each of the remaining sentences from the same Wikipedia passage was treated as a hard negative example.
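The construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's preprocessing code: the regex-based splitter, the function names `split_sentences` and `build_pairs`, and the example passage are all hypothetical, and the sketch covers only answerable questions, since the card does not describe how unanswerable SQuAD 2.0 questions were handled. The `text`/`text_pair` ordering mirrors the usage snippet.

```python
import re


def split_sentences(passage):
    """Naive splitter (an assumption): returns (start, end, text) spans."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]*", passage):
        raw = m.group()
        text = raw.strip()
        if text:
            # Skip leading whitespace so offsets point at the sentence itself
            start = m.start() + (len(raw) - len(raw.lstrip()))
            spans.append((start, start + len(text), text))
    return spans


def build_pairs(question, passage, answer_start):
    """Label the sentence holding the answer 1, all other sentences 0."""
    pairs = []
    for start, end, sentence in split_sentences(passage):
        label = 1 if start <= answer_start < end else 0
        pairs.append({"text": sentence, "text_pair": question, "label": label})
    return pairs


# Hypothetical example passage and question
passage = ("Paris is the capital of France. It lies on the Seine. "
           "The city hosts the Louvre.")
question = "Which river does Paris lie on?"
answer_start = passage.index("Seine")  # SQuAD provides this character offset
pairs = build_pairs(question, passage, answer_start)
```

Here the second sentence becomes the positive example and the other two become hard negatives, because they come from the same passage as the answer.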

The table below reports the classification results on the validation set.

Results

Model            Accuracy   F1
ALBERT-xxlarge   97.05      84.14
