---
license: apache-2.0
---
In this project, we fine-tuned a pre-existing model to assess **the Big Five personality traits** for a given text or sentence. Fine-tuning on a dataset curated specifically for personality traits teaches the model to associate textual inputs with distinct personality characteristics. This targeted approach makes the model more accurate at identifying the Big Five traits from text than models developed or fine-tuned on more general datasets.
The **accuracy** reaches 80%, and the **F1 score** is 79%. Both are considerably higher than those of similar personality-detection models hosted on Hugging Face.
Because the output values are continuous, mean squared error (MSE) and mean absolute error (MAE) are better suited to evaluating the model's performance.
Lower values on both metrics indicate better performance. Our model's performance: **MSE: 0.07**, **MAE: 0.14**.
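As a minimal sketch of how these two metrics are computed, the snippet below averages the squared and absolute differences between predicted and reference trait scores. The reference scores and the function names are hypothetical and serve only as an illustration; they are not taken from the evaluation data behind the numbers above.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between reference and predicted scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between reference and predicted scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference scores for the five traits (illustration only)
reference_scores = [0.50, 0.20, 0.30, 0.10, 0.80]
# Hypothetical predicted scores for the same five traits
predicted_scores = [0.46, 0.27, 0.31, 0.10, 0.84]

print(f"MSE: {mean_squared_error(reference_scores, predicted_scores):.4f}")
print(f"MAE: {mean_absolute_error(reference_scores, predicted_scores):.4f}")
```

MSE penalizes large errors more heavily than MAE because each difference is squared before averaging, so reporting both gives a fuller picture of prediction quality.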
Please **cite**:
```
@article{wang2024personality,
title={Continuous Output Personality Detection Models via Mixed Strategy Training},
author={Rong Wang and Kun Sun},
year={2024},
journal={ArXiv},
url={https://arxiv.org/abs/2406.16223}
}
```
The broader project on predicting human cognition and emotion, along with training details, is available at: https://github.com/fivehills/detecting_personality
The following code detects personality traits from an input text.
```python
# Import packages
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevSun/Personality_LM")
tokenizer = AutoTokenizer.from_pretrained("KevSun/Personality_LM")

# Example text input
# new_text = "I really enjoy working on complex problems and collaborating with others."
file_path = 'path/to/your/textfile.txt'
with open(file_path, 'r', encoding='utf-8') as file:
    new_text = file.read()

# Encode the text using the same tokenizer used during training.
# Inputs longer than 64 tokens are truncated.
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

# The model runs on the CPU here; make sure it is in evaluation mode
model.eval()

# Perform the prediction without tracking gradients
with torch.no_grad():
    outputs = model(**encoded_input)

# Get the predictions (the output here depends on whether you are doing regression or classification)
predictions = outputs.logits.squeeze()

# Assuming the model is a regression model and outputs raw scores
predicted_scores = predictions.numpy()  # Convert to a NumPy array
trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]

# Print the predicted personality trait scores
for trait, score in zip(trait_names, predicted_scores):
    print(f"{trait}: {score:.4f}")

## "output": "agreeableness: 0.46; openness: 0.27; conscientiousness: 0.31; extraversion: 0.1; neuroticism: 0.84"
```