---
language: es
tags:
- Spanish
- BART
- biology
- medical
- seq2seq
license: mit
---

## 🦠 NarbioBART 🏥

**NarbioBART** (base) is a BART-like model trained on the [Spanish Biomedical Crawled Corpus](https://zenodo.org/record/5510033#.Yhdk1ZHMLJx).

BART is a transformer *encoder-decoder* (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) learning a model to reconstruct the original text.
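
A minimal sketch of what this denoising objective looks like in practice (the sentence and the masked span below are made-up illustrations, not taken from the training data):

```py
# Hypothetical illustration of BART's text-infilling objective:
# a contiguous span of the input is replaced by a single <mask> token,
# and the model is trained to reproduce the original, uncorrupted text.
original  = "El paciente presenta fiebre alta y dolor abdominal agudo."
corrupted = "El paciente presenta <mask> y dolor abdominal agudo."

# During pre-training, `corrupted` is fed to the encoder and the decoder
# learns to generate `original` token by token.
```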

This model is particularly effective when fine-tuned for text generation tasks (e.g., summarization, translation) but also works well for comprehension tasks (e.g., text classification, question answering).
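
For example, a summarization fine-tune could look roughly like the sketch below; the dataset path, column names, and hyperparameters are placeholders for illustration, not settings from this model card:

```py
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "Narrativa/NarbioBART"
tokenizer = BartTokenizer.from_pretrained(model_id)
model = BartForConditionalGeneration.from_pretrained(model_id)

# Hypothetical dataset with "text" (document) and "summary" columns.
raw = load_dataset("path/to/your_summarization_dataset")

def preprocess(batch):
    # Tokenize documents as encoder inputs and summaries as decoder targets.
    model_inputs = tokenizer(batch["text"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="narbiobart-summarization",  # placeholder output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],   # assumes the dataset has a validation split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```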

## Training details

- Dataset: `Spanish Biomedical Crawled Corpus` - 90% for training / 10% for validation (see the split sketch below).
- Training script: see [here](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_bart_dlm_flax.py).
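
A rough sketch of how such a 90/10 split could be reproduced with the `datasets` library; the corpus file name and seed are assumptions, not the exact values used for training:

```py
from datasets import load_dataset

# Hypothetical local copy of the Spanish Biomedical Crawled Corpus, one document per line.
corpus = load_dataset("text", data_files={"data": "biomedical_crawled_corpus.txt"})["data"]

# 90% train / 10% validation, matching the split described above.
split = corpus.train_test_split(test_size=0.1, seed=42)
train_ds, valid_ds = split["train"], split["test"]
```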

## [Evaluation metrics](https://huggingface.co/mrm8488/bart-bio-base-es/tensorboard?params=scalars#frame) 🧾

| Metric   | Value |
|----------|-------|
| Accuracy | 0.802 |
| Loss     | 1.04  |

## Benchmarks 🔨

WIP 🚧

## How to use with `transformers`

```py
from transformers import BartForConditionalGeneration, BartTokenizer

model_id = "Narrativa/NarbioBART"

# forced_bos_token_id=0 makes the decoder start generation with token id 0
# (usually <s> in BART vocabularies).
model = BartForConditionalGeneration.from_pretrained(model_id, forced_bos_token_id=0)
tokenizer = BartTokenizer.from_pretrained(model_id)

def fill_mask_span(text):
    # Tokenize the input and let the model regenerate the full sequence,
    # filling in the <mask> span.
    batch = tokenizer(text, return_tensors="pt")
    generated_ids = model.generate(batch["input_ids"])
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))

text = "your text with a <mask> token."
fill_mask_span(text)
```
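
If you need more control over decoding, standard generation arguments can be passed through `generate`; the values below are arbitrary examples rather than recommended settings:

```py
# Same helper as above, but with explicit decoding parameters (example values).
def fill_mask_span_beam(text):
    batch = tokenizer(text, return_tensors="pt")
    generated_ids = model.generate(
        batch["input_ids"],
        num_beams=4,        # beam search instead of greedy decoding
        max_length=64,      # cap on the generated sequence length
        early_stopping=True,
    )
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(fill_mask_span_beam("your text with a <mask> token."))
```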

## Citation