---
metrics:
- mse
- r_squared
- mae
---
## DistilRoBERTa-query-wellformedness

This model uses the [DistilRoBERTa base](https://huggingface.co/distilroberta-base) architecture, fine-tuned for a regression task on [Google's query wellformedness dataset](https://huggingface.co/datasets/google_wellformed_query), which comprises 25,100 queries from the Paralex corpus. Each query was annotated by five raters, who provided a continuous rating indicating the degree to which it is well-formed.
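
To get a feel for the training data, the linked dataset can be loaded directly with the `datasets` library. This is a minimal sketch; the split and field names are whatever the Hub version of the dataset exposes.

```
from datasets import load_dataset

# Load the wellformedness dataset referenced above and inspect one example:
# each entry pairs a query with its averaged human rating.
dataset = load_dataset("google_wellformed_query")
print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # a single annotated query
```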

## Model description

The model evaluates a query for completeness and grammatical correctness, producing a score between 0 and 1, where 1 indicates that the query is well-formed.

## Usage

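The inference example below assumes that `tokenizer` and `model` are already in scope. One way to obtain them, assuming the checkpoint is compatible with the standard Transformers Auto classes and exposes a single regression output (the repository id below is an assumption based on this card's title), is:

```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed repository id for this model; adjust if it differs.
checkpoint = "AdamCodd/distilroberta-query-wellformedness"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
```

The example then scores a small batch of sentences:
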
```
# Sentences
sentences = [
    "The cat and dog in the yard.",                    # Incorrect - It should be "The cat and dog are in the yard."
    "she don't like apples.",                          # Incorrect - It should be "She doesn't like apples."
    "Is rain sunny days sometimes?",                   # Incorrect - It should be "Do sunny days sometimes have rain?"
    "She enjoys reading books and playing chess.",     # Correct
    "How many planets are there in our solar system?"  # Correct
]

# Tokenizing the sentences
inputs = tokenizer(sentences, truncation=True, padding=True, return_tensors='pt')

# Getting the model's predictions
model.eval()  # Setting the model to evaluation mode
with torch.no_grad():  # Disabling gradient calculation as we are only doing inference
    outputs = model(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask']
    )

# Depending on the model class, the forward pass returns either the ratings tensor
# directly or an output object whose `.logits` field holds it.
predicted_ratings = outputs.logits if hasattr(outputs, "logits") else outputs

# predicted_ratings is a tensor, so convert it to a list of standard Python numbers
predicted_ratings = predicted_ratings.squeeze().tolist()

# Printing the predicted ratings
for i, rating in enumerate(predicted_ratings):
    print(f'Sentence: {sentences[i]}')
    print(f'Predicted Rating: {rating}\n')
```
Output:
```
Sentence: The cat and dog in the yard.
Predicted Rating: 0.3482873737812042

Sentence: she don't like apples.
Predicted Rating: 0.07787154614925385

Sentence: Is rain sunny days sometimes?
Predicted Rating: 0.19854165613651276

Sentence: She enjoys reading books and playing chess.
Predicted Rating: 0.9327691793441772

Sentence: How many planets are there in our solar system?
Predicted Rating: 0.9746372103691101
```
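
If a binary well-formed / not-well-formed decision is needed, the continuous score can be thresholded. The cutoff below is an arbitrary illustration, not a value this card prescribes; it reuses `sentences` and `predicted_ratings` from the example above.

```
# Hypothetical cutoff for turning the continuous score into a binary label.
threshold = 0.5
for sentence, rating in zip(sentences, predicted_ratings):
    label = "well-formed" if rating >= threshold else "not well-formed"
    print(f"{sentence} -> {label} ({rating:.2f})")
```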

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of this setup follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 450
- num_epochs: 5

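These settings correspond to a standard AdamW optimizer with a linear warmup schedule. The sketch below shows one way to wire them up, reusing the `model` loaded in the Usage section; `num_training_steps` is a placeholder, since the total step count depends on the size of the training split and is not stated in this card.

```
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),   # `model` as loaded in the Usage section
    lr=2e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
)

# Placeholder: the real value is (steps per epoch) * num_epochs, which depends
# on the training split size and the batch size of 16.
num_training_steps = 5 * 1000

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=450,
    num_training_steps=num_training_steps,
)
```
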
### Training results

Metrics: Mean Squared Error, R-Squared, Mean Absolute Error

```
'test_loss': 0.06214376166462898,
'test_mse': 0.06214376166462898,
'test_r2': 0.5705611109733582,
'test_mae': 0.1838676631450653
```
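
For reference, these metrics can be recomputed from a set of predicted and gold ratings with scikit-learn. The arrays below are placeholders, not the actual evaluation data.

```
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Placeholder predictions and gold ratings; substitute the real test-set values.
y_true = [0.8, 0.2, 1.0, 0.4]
y_pred = [0.75, 0.3, 0.9, 0.5]

print("MSE:", mean_squared_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
```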

### Framework versions

- Transformers 4.34.1
- PyTorch Lightning 2.1.0
- Tokenizers 0.14.1

If you want to support me, you can do so [here](https://ko-fi.com/adamcodd).