T-FREX BERT base model (uncased)


Please cite this research as:

Q. Motger, A. Miaschi, F. Dell’Orletta, X. Franch, and J. Marco, ‘T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews’, in Proceedings of The IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024. Pre-print available at: https://arxiv.org/abs/2401.03833


T-FREX is a transformer-based feature extraction method for mobile app reviews based on fine-tuning Large Language Models (LLMs) for a named entity recognition task. We collect a dataset of ground truth features from users in a real crowdsourced software recommendation platform, and we use this dataset to fine-tune multiple LLMs under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that, on average, T-FREX outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.

Source code for data generation, fine-tuning and model inference is available in the original GitHub repository.

Model description

This version of T-FREX has been fine-tuned for token classification from the BERT base model (uncased).

Model variations

T-FREX includes a set of released, fine-tuned models which are compared in the original study (pre-print available at http://arxiv.org/abs/2401.03833).

How to use

Below are code snippets to demonstrate how to use the T-FREX BERT base model (uncased) for named entity recognition on app reviews:


from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the pre-trained model and tokenizer
model_name = "quim-motger/t-frex-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Create a pipeline for named entity recognition
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "The share note file feature is completely useless."

# Perform named entity recognition
entities = ner_pipeline(text)

# Print the recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")

# Example with multiple texts
texts = [
    "Great app I've tested a lot of free habit tracking apps and this is by far my favorite.",
    "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it."
]

# Perform named entity recognition on multiple texts
for text in texts:
    entities = ner_pipeline(text)
    print(f"Text: {text}")
    for entity in entities:
        print(f"  Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")
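
The pipeline above reports one prediction per token, so a multi-word feature such as "share note file" comes back as several separate entries. The following is a minimal sketch of how those entries could be merged into whole feature mentions using the character offsets the pipeline attaches to each prediction. It rests on assumptions that are not confirmed by this model card: the helper `group_feature_spans` is hypothetical, the labels are assumed to follow a B-/I- prefix scheme (written here as `B-feature` and `I-feature`), and `mock_predictions` is hand-written illustrative input, not real model output.

```python
from typing import Dict, List


def group_feature_spans(predictions: List[Dict], text: str) -> List[str]:
    """Merge token-level predictions into contiguous feature mentions.

    Hypothetical helper. Assumes each prediction has 'entity', 'start' and
    'end' keys, and that labels use a B-/I- prefix scheme (an assumption).
    """
    spans: List[List[int]] = []  # [start, end] character offsets
    for pred in predictions:
        label = pred["entity"]
        if label == "O":
            continue  # skip tokens that are not part of a feature
        gap = pred["start"] - spans[-1][1] if spans else None
        # Merge word pieces that touch the previous span (gap == 0) and
        # I- tokens separated from it by at most one character.
        if spans and (gap == 0 or (label.startswith("I-") and gap <= 1)):
            spans[-1][1] = pred["end"]
        else:
            spans.append([pred["start"], pred["end"]])
    return [text[start:end] for start, end in spans]


text = "The share note file feature is completely useless."

# Hand-written stand-in for ner_pipeline(text); not real model output.
mock_predictions = [
    {"entity": "B-feature", "start": 4, "end": 9},    # "share"
    {"entity": "I-feature", "start": 10, "end": 14},  # "note"
    {"entity": "I-feature", "start": 15, "end": 19},  # "file"
]

print(group_feature_spans(mock_predictions, text))  # ['share note file']
```

With the real model, the list returned by `ner_pipeline(text)` could be passed in place of `mock_predictions`, provided its label names match the assumed scheme.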