Update README.md
README.md
@@ -24,7 +24,64 @@ The project of predicting human cognition and emotion, and training details are
You can obtain the personality scores for an input text in the App **[KevSun/Personality_Test](https://huggingface.co/spaces/KevSun/Personality_Test)**.

The following provides the code to implement the task of detecting personality from an input text. However, there are two cases:
The first case does not apply softmax and instead outputs the raw logits from the model. It uses **shorter, simpler sentences**. The `predicted_scores` here are raw logits, which can be any real number and do not sum to 1.
```python
# install these packages before importing them (transformers, PyTorch)

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevSun/Personality_LM")
tokenizer = AutoTokenizer.from_pretrained("KevSun/Personality_LM")

# Choose between direct text input or file input
use_file = False  # Set to True if you want to read from a file

if use_file:
    file_path = 'path/to/your/textfile.txt'  # Replace with your file path
    with open(file_path, 'r', encoding='utf-8') as file:
        new_text = file.read()
else:
    new_text = "I really enjoy working on complex problems and collaborating with others."

# Encode the text using the same tokenizer used during training
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

model.eval()  # Set the model to evaluation mode

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)

predictions = outputs.logits.squeeze()

# Convert to numpy array if necessary
predicted_scores = predictions.numpy()

trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]

for trait, score in zip(trait_names, predicted_scores):
    print(f"{trait}: {score:.4f}")

# Output:
# Agreeableness: 0.3965
# Openness: 0.6714
# Conscientiousness: 0.3283
# Extraversion: 0.0026
# Neuroticism: 0.4645
```
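For comparison with the second case below, here is a minimal sketch of how these raw logits could be normalized into probabilities; it assumes the `predictions` tensor and `trait_names` list from the block above are still in scope:

```python
# Softmax rescales the raw logits into probabilities that sum to 1,
# which is what the second case below does inside its pipeline.
probabilities = torch.softmax(predictions, dim=-1)

for trait, prob in zip(trait_names, probabilities.numpy()):
    print(f"{trait}: {prob:.4f}")

print(f"Sum: {probabilities.sum().item():.4f}")  # ~1.0
```

Note that the second case below also uses a longer input text, so its probabilities will not simply be the softmax of the numbers printed above.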
The second case applies softmax to the model outputs, which normalizes the scores into probabilities. It uses longer, more complex sentences and is likely to show more variation in the outputs. The `predicted_scores` here are probabilities between 0 and 1, and their sum will be 1.
```python
# install these packages before importing them (transformers, PyTorch)