---
license: apache-2.0
---
In this project, we fine-tuned a pre-existing model to assess **the Big Five personality traits** for a given text or sentence. The model was fine-tuned on a dataset curated specifically for personality traits, from which it learned to associate textual inputs with distinct personality characteristics. This targeted approach improves the model's precision in identifying the Big Five personality traits from text, compared with models developed or fine-tuned on more general datasets.
The **accuracy** reaches 80%, and the **F1 score** is 79%. Both are much higher than those of similar personality-detection models hosted on Hugging Face.
Because the output values are continuous, mean squared error (MSE) and mean absolute error (MAE) are better suited to evaluating the model's performance.
Lower values of both metrics indicate better performance. Our model's performance: **MSE: 0.07**, **MAE: 0.14**.
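As a rough illustration of how these two metrics are computed, the sketch below uses plain Python on made-up numbers. The `y_true` and `y_pred` lists are hypothetical and are not taken from the model's evaluation data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical trait scores for one text (illustrative values only)
y_true = [0.40, 0.70, 0.30, 0.10, 0.50]
y_pred = [0.35, 0.65, 0.40, 0.05, 0.45]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}")
```

MSE penalizes large errors more heavily than MAE because each difference is squared before averaging.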
Please **cite**:
```
@article{wang2024personality,
title={Continuous Output Personality Detection Models via Mixed Strategy Training},
author={Rong Wang and Kun Sun},
year={2024},
journal={ArXiv},
url={https://arxiv.org/abs/2406.16223}
}
```
The broader project on predicting human cognition and emotion, along with the training details, is available at: https://github.com/fivehills/detecting_personality
You can obtain the personality scores for an input text in the app **[KevSun/Personality_Test](https://huggingface.co/spaces/KevSun/Personality_Test)**.
The following code detects personality traits from an input text. There are two cases:
The first case does not apply softmax and instead outputs the raw logits from the model.
It uses a **shorter, simpler sentence**. The `predicted_scores` here are raw logits, which can be any real number and do not sum to 1.
```python
# Install these packages before importing them (transformers, PyTorch)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevSun/Personality_LM")
tokenizer = AutoTokenizer.from_pretrained("KevSun/Personality_LM")

# Choose between direct text input or file input
use_file = False  # Set to True if you want to read from a file

if use_file:
    file_path = 'path/to/your/textfile.txt'  # Replace with your file path
    with open(file_path, 'r', encoding='utf-8') as file:
        new_text = file.read()
else:
    new_text = "I really enjoy working on complex problems and collaborating with others."

# Encode the text using the same tokenizer used during training
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

model.eval()  # Set the model to evaluation mode

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)

# Raw logits (no softmax applied), converted to a NumPy array
predictions = outputs.logits.squeeze()
predicted_scores = predictions.numpy()

trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]
for trait, score in zip(trait_names, predicted_scores):
    print(f"{trait}: {score:.4f}")

# Output:
# Agreeableness: 0.3965
# Openness: 0.6714
# Conscientiousness: 0.3283
# Extraversion: 0.0026
# Neuroticism: 0.4645
```
The second case applies softmax to the model outputs, which normalizes the scores into probabilities that sum to 1. It uses a longer, more complex text and is likely to show more variation in the outputs.
The `predicted_scores` here are probabilities between 0 and 1.
```python
# Install these packages before importing them (transformers, PyTorch)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevSun/Personality_LM")
tokenizer = AutoTokenizer.from_pretrained("KevSun/Personality_LM")

# Choose between direct text input or file input
use_file = False  # Set to True if you want to read from a file

if use_file:
    file_path = 'path/to/your/textfile.txt'  # Replace with your file path
    with open(file_path, 'r', encoding='utf-8') as file:
        new_text = file.read()
else:
    new_text = "President Joe Biden said on Wednesday he pulled out of the race against Republican Donald Trump over concerns about the future of U.S. democracy, explaining he was stepping aside to allow a new generation to take over in his first public remarks since ending his re-election bid. In an Oval Office address, Biden invoked previous presidents Thomas Jefferson, George Washington, and Abraham Lincoln as he described his love for the office that he will leave in six months, capping a half century in public office."

# Encode the text using the same tokenizer used during training.
# Note: with truncation=True and max_length=64, tokens beyond the first 64 are dropped.
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

model.eval()  # Set the model to evaluation mode

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)

# Normalize the logits into probabilities that sum to 1
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_scores = predictions[0].tolist()

trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]
for trait, score in zip(trait_names, predicted_scores):
    print(f"{trait}: {score:.4f}")

# Output:
# Agreeableness: 0.1982
# Openness: 0.2678
# Conscientiousness: 0.1857
# Extraversion: 0.1346
# Neuroticism: 0.2137
```
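To make the difference between the two cases concrete, here is a minimal sketch of what softmax does to a vector of raw scores. It uses only the Python standard library rather than PyTorch, and the `raw_logits` values are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Map real-valued scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]
raw_logits = [0.40, 0.67, 0.33, 0.00, 0.46]  # hypothetical values, for illustration only

probabilities = softmax(raw_logits)
for trait, logit, prob in zip(trait_names, raw_logits, probabilities):
    print(f"{trait}: logit={logit:.2f}, softmax={prob:.4f}")
print(f"Sum of softmax scores: {sum(probabilities):.4f}")
```

Softmax preserves the ranking of the traits but rescales each value relative to the others, so scores from the two cases are not directly comparable.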
**Alternatively**, you can save the following script (for example as `script_name.py`) and run inference from the **bash** terminal.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import argparse


def load_model_and_tokenizer(model_name):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer


def process_input(input_text, tokenizer, max_length=64):
    return tokenizer(input_text, return_tensors='pt', padding=True, truncation=True, max_length=max_length)


def predict_personality(model, encoded_input):
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        outputs = model(**encoded_input)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return predictions[0].tolist()


def print_predictions(predictions, trait_names):
    for trait, score in zip(trait_names, predictions):
        print(f"{trait}: {score:.4f}")


def main():
    parser = argparse.ArgumentParser(description="Predict personality traits from text.")
    parser.add_argument("--input", type=str, required=True, help="Input text or path to text file")
    parser.add_argument("--model", type=str, default="KevSun/Personality_LM", help="Model name or path")
    args = parser.parse_args()

    model, tokenizer = load_model_and_tokenizer(args.model)

    # Check if input is a file path or direct text
    if args.input.endswith('.txt'):
        with open(args.input, 'r', encoding='utf-8') as file:
            input_text = file.read()
    else:
        input_text = args.input

    encoded_input = process_input(input_text, tokenizer)
    predictions = predict_personality(model, encoded_input)

    trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]
    print_predictions(predictions, trait_names)


if __name__ == "__main__":
    main()
```
```bash
python script_name.py --input "Your text here"
```
or
```bash
python script_name.py --input path/to/your/textfile.txt
```
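The script above treats any `--input` value ending in `.txt` as a file path. A slightly more defensive variant, sketched here under the assumption that you would rather check whether the path actually exists, might look like the following. `resolve_input` is a hypothetical helper and is not part of the original script.

```python
import os


def resolve_input(value):
    """Return the file contents if `value` is an existing file, else `value` itself."""
    if os.path.isfile(value):
        with open(value, 'r', encoding='utf-8') as file:
            return file.read()
    return value  # treat the argument as direct text


# Direct text is returned unchanged because no such file exists
print(resolve_input("Your text here"))
```

With this approach, text files with other extensions are also read, and a literal string that happens to end in `.txt` is not mistaken for a path unless the file exists.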