Add benchmarking results
README.md CHANGED
@@ -187,6 +187,15 @@ results = process(text, prompt)
 print(results)
 ```
 
+### Benchmarking
+Below is a table that highlights the performance of UTC models on the [CrossNER](https://huggingface.co/datasets/DFKI-SLT/cross_ner) dataset. The values are micro F1 scores, computed at the word level.
+
+| Model             | AI     | Literature | Music  | Politics | Science |
+|-------------------|--------|------------|--------|----------|---------|
+| UTC-DeBERTa-small | 0.8492 | 0.8792     | 0.864  | 0.9008   | 0.85    |
+| UTC-DeBERTa-base  | 0.8452 | 0.8587     | 0.8711 | 0.9147   | 0.8631  |
+| UTC-DeBERTa-large | 0.8971 | 0.8978     | 0.9204 | 0.9247   | 0.8779  |
+
 ### Future reading
 Check out our blogpost ["As GPT4 but for token classification"](https://medium.com/p/9b5a081fbf27), where we highlighted possible use cases of the model and why next-token prediction is not the only way to achieve amazing zero-shot capabilities.
 While most of the AI industry is focused on generative AI and decoder-based models, we are committed to developing encoder-based models.
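
The word-level micro F1 used in the benchmark table above can be sketched as follows. This is an illustrative reimplementation, not the project's actual evaluation script; the `micro_f1` helper, its signature, and the sample labels are assumptions made for the example.

```python
def micro_f1(gold, pred, outside="O"):
    """Word-level micro F1 over all entity classes, ignoring the "O" tag."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and g != outside:
            tp += 1  # correctly labeled entity word
        else:
            if p != outside:
                fp += 1  # predicted an entity label that is wrong
            if g != outside:
                fn += 1  # missed or mislabeled a gold entity word
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: two entity words correct, one spurious, one missed.
gold = ["O", "field", "field", "O", "researcher"]
pred = ["O", "field", "O", "researcher", "researcher"]
print(round(micro_f1(gold, pred), 4))  # precision = recall = 2/3, so F1 = 0.6667
```

Because the counts are pooled over all classes before computing precision and recall, frequent entity types dominate the score, which is the usual behavior of a micro-averaged metric.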