julien-c HF staff committed on
Commit
23a065c
1 Parent(s): c2ca5dd

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/neurocode/IsRoBERTa/README.md

Files changed (1)
  1. README.md +74 -0
README.md ADDED
@@ -0,0 +1,74 @@
---
language: is
datasets:
- Icelandic portion of the OSCAR corpus from INRIA
- oscar
---

# IsRoBERTa: a RoBERTa-like masked language model

Probably the first Icelandic transformer language model!

## Overview
**Language:** Icelandic
**Downstream task:** masked language modeling (`masked-lm`)
**Training data:** Icelandic portion of the OSCAR corpus
**Code:** [neurocode-io/icelandic-language-model](https://github.com/neurocode-io/icelandic-language-model)
**Infrastructure:** 1x Nvidia K80

## Hyperparameters

```
per_device_train_batch_size = 48
n_epochs = 1
vocab_size = 52000
max_position_embeddings = 514
num_attention_heads = 12
num_hidden_layers = 6
type_vocab_size = 1
learning_rate = 0.00005
```
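As a rough illustration, the architecture-related values above map onto `RobertaConfig` from `transformers`. This is a sketch, not the authors' training script (the linked repository is the reference for that): the name `training_kwargs`, the mapping of `n_epochs` to `num_train_epochs`, and leaving every unlisted setting at the library default are assumptions.

```python
from transformers import RobertaConfig

# Architecture values copied from the hyperparameter list above.
# Anything not listed there (hidden size, dropout, ...) is left at the
# library default, which is an assumption rather than a documented choice.
config = RobertaConfig(
    vocab_size=52000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)

# Optimisation values, keyed by the matching `TrainingArguments` parameter
# names. `training_kwargs` is a hypothetical name used for illustration.
training_kwargs = {
    "per_device_train_batch_size": 48,
    "num_train_epochs": 1,  # assumed equivalent of `n_epochs`
    "learning_rate": 5e-5,
}

print(config.num_hidden_layers, config.vocab_size)
```

The dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)` when setting up a `Trainer`; the output directory is left open here because the card does not state one.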

## Usage

### In Transformers
```python
from transformers import (
    pipeline,
    AutoTokenizer,
    AutoModelForMaskedLM,  # replaces the deprecated AutoModelWithLMHead
)

model_name = "neurocode/IsRoBERTa"

# Download the tokenizer and the masked-LM weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Ask the model to fill in the <mask> token; the top 5 candidates are returned.
result = fill_mask("Hann fór út að <mask>.")
print(result)
# Output reported in the original card:
# [
#   {'sequence': '<s>Hann fór út að nýju.</s>', 'score': 0.03395755589008331, 'token': 2219, 'token_str': 'Ġnýju'},
#   {'sequence': '<s>Hann fór út að undanförnu.</s>', 'score': 0.029087543487548828, 'token': 7590, 'token_str': 'Ġundanförnu'},
#   {'sequence': '<s>Hann fór út að lokum.</s>', 'score': 0.024420788511633873, 'token': 4384, 'token_str': 'Ġlokum'},
#   {'sequence': '<s>Hann fór út að þessu.</s>', 'score': 0.021231256425380707, 'token': 921, 'token_str': 'Ġþessu'},
#   {'sequence': '<s>Hann fór út að honum.</s>', 'score': 0.0205782949924469, 'token': 1136, 'token_str': 'Ġhonum'}
# ]
```
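The raw pipeline output carries tokenizer markers: `Ġ` marks a leading space in the byte-level BPE vocabulary, and `<s>`/`</s>` delimit the sequence. A small post-processing helper can make the predictions easier to read. The sketch below is an illustration only; `clean_predictions` is a hypothetical helper, not part of `transformers`, and the sample data is two entries copied from the output shown in the card.

```python
from typing import Dict, List, Tuple


def clean_predictions(results: List[Dict]) -> List[Tuple[str, str, float]]:
    """Return (word, sentence, score) rows, best score first."""
    cleaned = []
    for item in results:
        # "Ġ" encodes the space that precedes the token; drop it.
        word = item["token_str"].replace("Ġ", " ").strip()
        # Remove the sequence start/end markers around the sentence.
        sentence = item["sequence"].replace("<s>", "").replace("</s>", "").strip()
        cleaned.append((word, sentence, item["score"]))
    return sorted(cleaned, key=lambda row: row[2], reverse=True)


# Two entries from the card's example output, deliberately out of order.
sample = [
    {"sequence": "<s>Hann fór út að undanförnu.</s>", "score": 0.029087543487548828,
     "token": 7590, "token_str": "Ġundanförnu"},
    {"sequence": "<s>Hann fór út að nýju.</s>", "score": 0.03395755589008331,
     "token": 2219, "token_str": "Ġnýju"},
]

for word, sentence, score in clean_predictions(sample):
    print(f"{score:.4f}  {word:<12} {sentence}")
```

In practice the helper would be called on the `result` list returned by the fill-mask pipeline instead of the hard-coded `sample`.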

## Authors
Bobby Donchev: `contact [at] donchev.is`
Elena Cramer: `elena.cramer [at] neurocode.io`

## About us

We take AI software live for our customers.
Our focus: AI software development

Get in touch:
[LinkedIn](https://de.linkedin.com/company/neurocodeio) | [Website](https://neurocode.io)