Update README.md
print(f"{response = }")
```

The `gemma2_inference_hf.py` module is provided for download with the model files.
112 |
|
113 |
### Evaluation
|
114 |
|
115 |
+
Model evaluation metrics and results on test dataset containing 3k samples. Note: The test dataset is purposely withheld due to the
|
116 |
+
nature and sensitivity of the messages.
|
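The reported metrics are Accuracy and AUC. The evaluation script itself is not included here, so as an illustration only, this is a minimal pure-Python sketch of how those two metrics are typically computed from binary labels and model scores (the function names and toy data are hypothetical, not from the withheld test set):

```python
def accuracy(labels, preds):
    """Fraction of predictions matching the true binary labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the probability
    that a randomly chosen positive is scored above a randomly chosen
    negative, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with 6 samples
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, preds))  # 4/6 correct -> ~0.667
print(roc_auc(labels, scores))  # 8 of 9 pos/neg pairs ranked correctly -> ~0.889
```

Because accuracy depends on a decision threshold while AUC is threshold-free, the two numbers can diverge, as they do in the benchmark table below.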
### Benchmark Results

The finetuned Gemma-2 model was evaluated against the GPT-3.5-Turbo, GPT-4o-mini, and GPT-4o models:
| Metric   | gemma-2-2b-it-ud | GPT-3.5-Turbo-1106 | GPT-4o-mini-2024-07-18 | GPT-4o-2024-08-06 |
| -------- | ---------------- | ------------------ | ---------------------- | ----------------- |
| Accuracy | 0.87             | 0.83               | 0.90                   | 0.92              |
| AUC      | 0.84             | 0.83               | 0.91                   | 0.92              |
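For quick comparisons, the values in the table above can be encoded as plain data; a small sketch (numbers copied verbatim from the table):

```python
# Benchmark values from the table above
results = {
    "gemma-2-2b-it-ud": {"Accuracy": 0.87, "AUC": 0.84},
    "GPT-3.5-Turbo-1106": {"Accuracy": 0.83, "AUC": 0.83},
    "GPT-4o-mini-2024-07-18": {"Accuracy": 0.90, "AUC": 0.91},
    "GPT-4o-2024-08-06": {"Accuracy": 0.92, "AUC": 0.92},
}

# Report the top-scoring model for each metric
for metric in ("Accuracy", "AUC"):
    best = max(results, key=lambda m: results[m][metric])
    print(f"{metric}: best = {best} ({results[best][metric]})")
```

As the table shows, the finetuned 2B model trails the larger GPT-4o models but outperforms GPT-3.5-Turbo on both metrics.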
## Usage and Limitations