tlemberger committed on
Commit 6c42a83
1 Parent(s): 6bbac4f

so many typos

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -18,7 +18,7 @@ metrics:
 
 This model is a [RoBERTa base model](https://huggingface.co/roberta-base) that was further trained using a masked language modeling task on a compendium of english scientific textual examples from the life sciences using the [BioLang dataset](https://huggingface.co/datasets/EMBO/biolang). It was then fine-tuned for token classification on the SourceData [sd-figures](https://huggingface.co/datasets/EMBO/sd-figures) dataset with the `PANELIZATION` task to perform 'parsing' or 'segmentation' of figure legends into fragments corresponding to sub-panels.
 
-Figures are usually composite representations of results obtained with heterogenous experimental approaches and systems. Breaking figures into panels allows to identify more coherent descriptions of individual scientific experiments.
+Figures are usually composite representations of results obtained with heterogeneous experimental approaches and systems. Breaking figures into panels allows identifying more coherent descriptions of individual scientific experiments.
 
 ## Intended uses & limitations
 
@@ -44,15 +44,15 @@ The model must be used with the `roberta-base` tokenizer.
 
 ## Training data
 
-The model was trained for token classification using the [EMBO/sd-figures `PANELIZATION`](https://huggingface.co/datasets/EMBO/sd-panels) dataset wich includes manually annotated examples.
+The model was trained for token classification using the [`EMBO/sd-figures PANELIZATION`](https://huggingface.co/datasets/EMBO/sd-figures) dataset which includes manually annotated examples.
 
 ## Training procedure
 
-The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.
+The training was run on an NVIDIA DGX Station with 4XTesla V100 GPUs.
 
 Training code is available at https://github.com/source-data/soda-roberta
 
-- Model fine-tuned: EMMBO/bio-lm
+- Model fine-tuned: EMBO/bio-lm
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-figures
 - Dataset configuration: PANELIZATION