TransformationTransformer

TransformationTransformer is a fine-tuned distilroberta model. It is trained and evaluated on 10,000 manually annotated sentences gleaned from the Q&A-section of quarterly earnings conference calls. In particular, it was trained on sentences issued by firm executives to discriminate between setnences that allude to business transformation vis-à-vis those that discuss topics other than business transformations. More details about the training procedure can be found below.

Background

Context on the project.

Usage

The model is intented to be used for sentence classification: It creates a contextual text representation from the input sentence and outputs a probability value. LABEL_1 refers to a sentence that is predicted to contains transformation-related content (vice versa for LABEL_0). The query should consist of a single sentence.

Usage (API)

import json
import requests

API_TOKEN = <TOKEN>

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/simonschoe/call2vec"

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

query({"inputs": "<insert-sentence-here>"})

Usage (transformers)

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("simonschoe/TransformationTransformer")
model = AutoModelForSequenceClassification.from_pretrained("simonschoe/TransformationTransformer")

classifier = pipeline('text-classification',  model=model, tokenizer=tokenizer)
classifier('<insert-sentence-here>')

Model Training

The model has been trained on text data stemming from earnings call transcripts. The data is restricted to a call's question-and-answer (Q&A) section and the remarks by firm executives. The data has been segmented into individual sentences using spacy.

Statistics of Training Data:

Labeled sentences: 10,000
Data distribution: xxx
Inter-coder agreement: xxx

The following code snippets presents the training pipeline: