Model Card for JARVIS Email Sorter Training Module

The JARVIS Email Sorter Training Module is a text classification model that sorts emails into predefined categories to make email management more efficient. Developed as part of the JARVIS ecosystem, it is based on the "distilbert-base-uncased" architecture, fine-tuned on the "Enron Labeled Emails with Subjects" dataset available on Hugging Face.

Model Details

Model Description

  • Developed by: JARVIS Team
  • Model type: Text Classification
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: distilbert-base-uncased

Uses

Direct Use

The JARVIS Email Sorter is designed for direct integration into email applications, where it assigns each email to one of eight categories, such as company business, personal communications, and logistic arrangements.

The category-to-label assignments are:

  • 'Company Business/Strategy': 'Label_0'
  • 'Purely Personal': 'Label_1'
  • 'Personal but in a professional context': 'Label_2'
  • 'Logistic Arrangements': 'Label_3'
  • 'Employment arrangements': 'Label_4'
  • 'Document editing/checking/collaboration': 'Label_5'
  • 'Empty message (due to missing attachment)': 'Label_6'
  • 'Empty message': 'Label_7'
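
For code that consumes the model's predictions, the assignments above can be inverted into a lookup table. This is a minimal sketch: the helper name `category_for` is hypothetical, and the case-insensitive match is an assumption, since this card does not state whether the deployed configuration emits `Label_0` or `LABEL_0`.

```python
# Category names keyed by label, copied from the assignments listed above.
LABEL_TO_CATEGORY = {
    "Label_0": "Company Business/Strategy",
    "Label_1": "Purely Personal",
    "Label_2": "Personal but in a professional context",
    "Label_3": "Logistic Arrangements",
    "Label_4": "Employment arrangements",
    "Label_5": "Document editing/checking/collaboration",
    "Label_6": "Empty message (due to missing attachment)",
    "Label_7": "Empty message",
}

# Lower-cased keys so that "Label_3" and "LABEL_3" resolve to the same entry.
# Assumption: the exact casing of the predicted label is not documented here.
_NORMALIZED = {label.lower(): name for label, name in LABEL_TO_CATEGORY.items()}


def category_for(label: str) -> str:
    """Return the category name for a predicted label (hypothetical helper).

    Raises KeyError if the label is not one of the eight known labels.
    """
    return _NORMALIZED[label.lower()]
```

An unknown label raises `KeyError` rather than returning a default, so a mismatch between this table and the model's configuration surfaces immediately.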

Downstream Use

This model can be further adapted or integrated into larger systems requiring email classification or management functionalities.

Out-of-Scope Use

The model is not intended for applications beyond email classification, such as text generation or sentiment analysis.

Bias, Risks, and Limitations

The model's performance and fairness may vary across different types of email content, potentially reflecting biases present in the training data. Users are advised to be cautious of these limitations and to consider additional validation when deploying the model in diverse or sensitive contexts.

Recommendations

It's recommended to continuously monitor and evaluate the model's performance across varied datasets to ensure its accuracy and fairness. Further fine-tuning may be required to address any specific biases or limitations identified.

How to Get Started with the Model

To get started with the JARVIS Email Sorter, developers can integrate the model into their applications using the Hugging Face Transformers library. The model has been trained and evaluated using a Python script that includes data preprocessing, model training, evaluation, and saving the trained model for future use.
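
A minimal inference sketch using the Transformers `pipeline` API might look like the following. The repository id `emarron/JARVIS-email-sorter` is the one shown on this page; the helper names, the single-space join of subject and body, and the sample email are assumptions for illustration, and the exact label strings returned depend on the model's configuration.

```python
MODEL_ID = "emarron/JARVIS-email-sorter"  # repository id as shown on this page


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers below can be used without the dependency.
    # Requires: pip install transformers torch
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_email(classifier, subject: str, body: str) -> dict:
    """Classify one email and return the top prediction as a dict
    with 'label' and 'score' keys (hypothetical helper)."""
    # Assumption: subject and body are joined with a single space, mirroring
    # the "combine subjects and bodies" step under Training Procedure.
    text = f"{subject} {body}".strip()
    # truncation=True keeps long emails within DistilBERT's 512-token limit.
    return classifier(text, truncation=True)[0]


def demo():
    """Example call; not run on import because it needs network access."""
    clf = load_classifier()
    print(classify_email(clf, "Meeting moved", "Can we push the call to 3pm on Friday?"))
```

Calling `demo()` downloads the model on first use and prints a dictionary with a `label` and a `score`.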

Training Details

Training Data

The model was trained on the "Enron Labeled Emails with Subjects" dataset, encompassing a wide range of real-world corporate email communications categorized for training purposes.

Training Procedure

The training process combined each email's subject and body into a single text, tokenized the texts, and fine-tuned the "distilbert-base-uncased" model on the processed dataset. The model was then evaluated using precision, recall, and F1 score.
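
The preprocessing step could look roughly like the sketch below. The field names `subject` and `body`, the single-space separator, and the 512-token cap are assumptions; this card does not specify how the original script joined or truncated the text.

```python
def combine_subject_and_body(email: dict) -> str:
    """Join an email's subject and body into one string.

    Assumption: emails are dicts with 'subject' and 'body' keys; missing or
    None fields are treated as empty.
    """
    subject = (email.get("subject") or "").strip()
    body = (email.get("body") or "").strip()
    # Skip empty parts so an empty subject does not leave a leading space.
    return " ".join(part for part in (subject, body) if part)


def tokenize_emails(emails, tokenizer, max_length: int = 512):
    """Tokenize a batch of emails with any Hugging Face style tokenizer."""
    texts = [combine_subject_and_body(email) for email in emails]
    # Truncate to the model's maximum input length and pad to a common length.
    return tokenizer(texts, truncation=True, padding=True, max_length=max_length)


def load_tokenizer():
    """Load the tokenizer of the base checkpoint named in this card."""
    # Imported lazily; requires: pip install transformers
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained("distilbert-base-uncased")
```

With the library installed, `tokenize_emails(emails, load_tokenizer())` returns the `input_ids` and `attention_mask` that a DistilBERT classifier expects.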

Evaluation

The model's performance was assessed on a separate test dataset, where it reached an F1 score above 0.6. Precision, recall, and F1 were computed to gauge the model's effectiveness across the different categories.
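
As a reference for how such scores are computed, the sketch below derives per-class precision, recall, and F1 from true and predicted labels, plus a macro average. It is a from-scratch illustration, not the evaluation script used for this model; the card does not state which averaging the reported F1 uses, so the macro average here is an assumption.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: {'precision', 'recall', 'f1'}} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Undefined ratios (no predictions or no true examples) are set to 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see note above)."""
    scores = per_class_scores(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(s["f1"] for s in scores.values()) / len(scores)
```

Macro averaging weights every category equally, which makes weak performance on rare categories visible; a weighted or micro average would instead be dominated by the most frequent categories.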

Technical Specifications

The JARVIS Email Sorter is based on the DistilBERT architecture, chosen for its balance between performance and efficiency. The model was trained and evaluated using the Hugging Face Transformers library, with additional custom Python scripting for data preprocessing and evaluation.

More Information

For more information on integrating and utilizing the JARVIS Email Sorter Training Module, developers are encouraged to review the detailed Python code provided, which outlines the model's training and evaluation process.

Model Card Contact

For further inquiries or support regarding the JARVIS Email Sorter Training Module, please contact the JARVIS Team.

Safetensors · Model size: 67M params · Tensor type: F32

Dataset used to train emarron/JARVIS-email-sorter