Pclanglais committed
Commit
74a9351
1 Parent(s): 478d995

Update README.md

Files changed (1)
  1. README.md +9 -26
README.md CHANGED
@@ -1,34 +1,17 @@
 ---
 license: apache-2.0
 base_model: t5-small
-tags:
-- generated_from_trainer
-metrics:
-- rouge
-model-index:
-- name: t5-small-common-corpus-topic-simple-batch
-  results: []
+language:
+- en
+- fr
+- de
+- es
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+**Topical** is a small language model specialized for topic extraction. Given a document, Topical returns a main topic that can be used for further downstream tasks (annotation, embedding indexing).

-# Pleias-Topic-Detection
+Like other models from the PleIAs Bad Data Toolbox, Topical has been deliberately trained on 70,000 documents extracted from Common Corpus containing a wide range of digitization artifacts.

-**Pleias-Topic-Detection** is an encoder-decoder specialized for topic detection. Given a document Pleias-Topic-Deduction will return a main topic that can be used for further downstream tasks (annotation, embedding indexation)
+Topical is a lightweight model (70 million parameters) that is particularly suited to classification at scale on a large corpus.

-Pleias-Topic-Detection is a finetuned version of t5-small on a set of 70,000 documents and associated topics from Common Corpus. While t5-small has been reportedly only trained in English, the model actually shows unexpected capacities for multilingual annotation. The final corpus include a significant amount of texts in French, Spanish, Italian, Dutch and German and has been proven to work somewhat in all of theses languages.
-
-Given that Pleias-Topic-Detection is a relatively lightweight model (70 million parameters) it can be used for classification at scale on a large corpus.
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 1
-- mixed_precision_training: Native AMP
+## Example
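
As a rough illustration of how a T5-based topic extractor like this is typically called, here is a minimal sketch using the Hugging Face `transformers` library. It is not part of the commit: the repository id `PleIAs/Topical`, the absence of a task prefix, and the generation settings are assumptions to be checked against the published model card.

```python
# Minimal usage sketch for a T5-style topic-extraction checkpoint.
# The repository id "PleIAs/Topical" is a hypothetical placeholder.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "PleIAs/Topical"  # assumption: replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "La Commune de Paris de 1871 fut un gouvernement révolutionnaire..."

# Encoder-decoder usage: encode the document, then generate the topic string.
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_new_tokens=32)
topic = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(topic)
```

Because the model has only 70 million parameters, the same call can be batched (passing a list of documents to the tokenizer with `padding=True`) to annotate a large corpus at low cost.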