Pclanglais committed
Commit 4662ef1
Parent: 90cb4a1

Update README.md

Files changed (1):
  1. README.md (+2, -34)
README.md CHANGED
@@ -15,28 +15,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Pleias-Topic-Detection
 
- This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.6792
- - Rouge1: 23.9657
- - Rouge2: 7.6026
- - Rougel: 22.7062
- - Rougelsum: 22.7061
- - Gen Len: 6.0459
+ **Pleias-Topic-Detection** is an encoder-decoder model specialized in topic detection. Given a document, Pleias-Topic-Detection returns a main topic that can be used for downstream tasks (annotation, embedding indexing).
 
- ## Model description
+ Pleias-Topic-Detection is a fine-tuned version of [t5-small](https://huggingface.co/t5-small), trained on a set of 70,000 documents and associated topics from Common Corpus. Although t5-small has reportedly been trained only on English data, the model shows unexpected capacity for multilingual annotation. The fine-tuning corpus includes a significant amount of text in French, Spanish, Italian, Dutch, and German, and the model has been shown to work to some extent in all of these languages.
 
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
 
 ### Training hyperparameters
 
@@ -49,17 +31,3 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 1
 - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len |
- |:-------------:|:-----:|:-----:|:---------------:|:-------:|:------:|:-------:|:---------:|:-------:|
- | 2.9647        | 1.0   | 24707 | 2.6792          | 23.9657 | 7.6026 | 22.7062 | 22.7061   | 6.0459  |
-
-
- ### Framework versions
-
- - Transformers 4.41.1
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.2
- - Tokenizers 0.19.1
 
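Below, a minimal usage sketch for the model described in the updated card. It assumes the checkpoint loads as a standard T5 seq2seq model through `transformers`; the repository ID `PleIAs/Pleias-Topic-Detection` and the generation settings are assumptions, not stated in this commit.

```python
# Sketch: load the topic-detection checkpoint as a plain T5 seq2seq
# model and generate a short topic for a single document.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "PleIAs/Pleias-Topic-Detection"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# French example, since the card claims multilingual coverage.
document = "La Révolution française a profondément transformé la société et les institutions."
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)

# Topics are short (Gen Len was about 6 in the removed evaluation table),
# so a small generation budget suffices.
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The `text2text-generation` pipeline would work equally well for annotating documents in bulk.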
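For reference, a sketch of how the three hyperparameters visible in the hunk above map onto `Seq2SeqTrainingArguments`. Values cut off by the hunk boundary (learning rate, batch sizes, optimizer, seed) are omitted rather than guessed, and `output_dir` is hypothetical.

```python
# Sketch only: the hyperparameters listed above expressed as
# transformers Seq2SeqTrainingArguments; fp16=True is the usual
# equivalent of "mixed_precision_training: Native AMP".
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="pleias-topic-detection",  # hypothetical
    lr_scheduler_type="linear",           # lr_scheduler_type: linear
    num_train_epochs=1,                   # num_epochs: 1
    fp16=True,                            # mixed_precision_training: Native AMP
)
```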