PyTorch
Serbian
Croatian
xlm-roberta
procesaur committed on
Commit 359d1f5 · verified · 1 Parent(s): 9096ae2

Update README.md

Files changed (1)
  1. README.md +195 -1
README.md CHANGED
@@ -12,4 +12,198 @@ language:
  - hr
  base_model:
  - FacebookAI/xlm-roberta-base
- ---
+ ---
+ <table style="width:100%;height:100%">
+ <tr>
+ <td colspan=2>
+ <h4><i class="highlight-container"><b class="highlight">XLMali</b></i></h4>
+ </td>
+ </tr>
+ <tr style="width:100%;height:100%">
+ <td width=50%>
+ <p>Вишејезични модел, 279 милиона параметара</p>
+ <p>Обучаван над корпусима српског и српскохрватског језика - 20 милијарди речи</p>
+ <p>Једнака подршка уноса на ћирилици и латиници!</p>
+ </td>
+ <td>
+ <p>Multilingual model, 279 million parameters</p>
+ <p>Trained on Serbian and Serbo-Croatian corpora - 20 billion words</p>
+ <p>Equal support for Cyrillic and Latin input!</p>
+ </td>
+ </tr>
+ </table>
+
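+ ```python
+ >>> from transformers import pipeline
+ >>> # Build a fill-mask pipeline backed by the te-sla/teslaXLM checkpoint
+ >>> unmasker = pipeline('fill-mask', model='te-sla/teslaXLM')
+ >>> # Returns candidate fillers for the <mask> position, ranked by score
+ >>> unmasker("Kada bi čovek znao gde će pasti on bi<mask>.")
+ ```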
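The call above returns its candidates as a list of dictionaries; in the `transformers` fill-mask pipeline each one carries `score`, `token`, `token_str`, and `sequence` keys. A minimal sketch of ranking that output follows. The `top_predictions` helper is hypothetical (it is not part of this README or of `transformers`), and `sample` is a hand-written placeholder in the same shape, not real model output.

```python
def top_predictions(results, k=3):
    """Return (token_str, score) pairs for the k highest-scoring candidates."""
    # Sort defensively by score, highest first, then keep the first k entries.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 4)) for r in ranked[:k]]


# Hand-written placeholder shaped like fill-mask output; NOT real model results.
sample = [
    {"score": 0.10, "token": 2, "token_str": "b", "sequence": "x b."},
    {"score": 0.70, "token": 1, "token_str": "a", "sequence": "x a."},
    {"score": 0.20, "token": 3, "token_str": "c", "sequence": "x c."},
]

best_two = top_predictions(sample, k=2)
print(best_two)
```

With a real pipeline, the list returned by `unmasker(...)` would take the place of `sample`.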
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModelForMaskedLM
+ >>> from torch import LongTensor, no_grad
+ >>> from scipy import spatial
+ >>> tokenizer = AutoTokenizer.from_pretrained('te-sla/teslaXLM')
+ >>> model = AutoModelForMaskedLM.from_pretrained('te-sla/teslaXLM', output_hidden_states=True)
+ >>> x = " pas"  # "dog"
+ >>> y = " mačka"  # "cat"
+ >>> z = " svemir"  # "universe"
+ >>> # Encode each word without special tokens and add a batch dimension
+ >>> tensor_x = LongTensor(tokenizer.encode(x, add_special_tokens=False)).unsqueeze(0)
+ >>> tensor_y = LongTensor(tokenizer.encode(y, add_special_tokens=False)).unsqueeze(0)
+ >>> tensor_z = LongTensor(tokenizer.encode(z, add_special_tokens=False)).unsqueeze(0)
+ >>> model.eval()
+ >>> with no_grad():
+ ...     # Average the last-layer states over tokens so each word yields one 1-D vector;
+ ...     # scipy's cosine() needs 1-D input even if a word splits into several subwords
+ ...     vektor_x = model(input_ids=tensor_x).hidden_states[-1].squeeze(0).mean(dim=0).numpy()
+ ...     vektor_y = model(input_ids=tensor_y).hidden_states[-1].squeeze(0).mean(dim=0).numpy()
+ ...     vektor_z = model(input_ids=tensor_z).hidden_states[-1].squeeze(0).mean(dim=0).numpy()
+ ...
+ >>> # Cosine distance: a smaller value means the two vectors are more similar
+ >>> print(spatial.distance.cosine(vektor_x, vektor_y))
+ >>> print(spatial.distance.cosine(vektor_x, vektor_z))
+ ```
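The two numbers printed above are cosine distances, that is, one minus the cosine of the angle between the vectors. The sketch below re-implements that formula in plain Python to show what `spatial.distance.cosine` computes for 1-D inputs. The `cosine_distance` function and the toy vectors are illustrative assumptions, not part of the README or of SciPy.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Toy vectors (illustrative only): same direction, orthogonal, opposite.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

The distance ranges from about 0 for vectors pointing the same way to 2 for opposite vectors, so in the README example a lower value for the `pas`/`mačka` pair than for `pas`/`svemir` would mean the model places the first pair closer together.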
+
+ <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:50px">
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Author</div>
+ <a href="https://huggingface.co/procesaur">
+ <div class="flex">
+ <div
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
+ background-size: cover; background-image: url(&#39;https://cdn-uploads.huggingface.co/production/uploads/1673534533167-63bc254fb8c61b8aa496a39b.jpeg?w=200&h=200&f=face&#39;)">
+ </div>
+ </div>
+ </a>
+ <div style="text-align: center; font-size: 16px; font-weight: 800">Mihailo Škorić</div>
+ <div>
+ <a href="https://huggingface.co/procesaur">
+ <div style="text-align: center; font-size: 14px;">@procesaur</div>
+ </a>
+ </div>
+ </div>
+ </div>
+
+ <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:50px">
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Author</div>
+ <a href="https://huggingface.co/tanor">
+ <div class="flex">
+ <div
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
+ background-size: cover; background-image: url('https://cdn-avatars.huggingface.co/v1/production/uploads/6409d3d71ee054d66a673701/KTOOnCRS9NhpAMZIvLlU7.png?w=200&h=200&f=face')">
+ </div>
+ </div>
+ </a>
+ <div style="text-align: center; font-size: 16px; font-weight: 800">Saša Petalinkar</div>
+ <div>
+ <a href="https://huggingface.co/tanor">
+ <div style="text-align: center; font-size: 14px;">@tanor</div>
+ </a>
+ </div>
+ </div>
+ </div>
+
+ <div class="inline-flex flex-col" style="line-height: 1.5;">
+ <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Computation</div>
+ <a href="https://tesla.rgf.bg.ac.rs">
+ <div class="flex">
+ <div
+ style="display:DISPLAY_1; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%;
+ background-size: cover; background-image: url(https://cdn-avatars.huggingface.co/v1/production/uploads/63bc254fb8c61b8aa496a39b/TfM_-sc8-b34ddfhHBGTA.png?w=200&h=200&f=face)">
+ </div>
+ </div>
+ </a>
+ <div style="text-align: center; font-size: 16px; font-weight: 800">TESLA project</div>
+ <div>
+ <a href="https://huggingface.co/te-sla">
+ <div style="text-align: center; font-size: 14px;">@te-sla</div>
+ </a>
+ </div>
+ </div>
+ </div>
+ <br/><br/>
+ <div id="zastava">
+ <div class="grb">
+ <img src="https://www.ai.gov.rs/img/logo_60x120-2.png" style="position:relative; left:30px; z-index:10; height:85px">
+ </div>
+ <table width=100% style="border:0px">
+ <tr style="background-color:#C6363C;width:100%;border:0px;height:30px"><td style="width:100vw"></td></tr>
+ <tr style="background-color:#0C4076;width:100%;border:0px;height:30px"><td></td></tr>
+ <tr style="background-color:#ffffff;width:100%;border:0px;height:30px"><td></td></tr>
+ </table>
+ </div>
+
+ <table style="width:100%;height:100%">
+ <tr style="width:100%;height:100%">
+ <td width=50%>
+ <p>Истраживање је спроведено уз подршку Фонда за науку Републике Србије, #7276, Text Embeddings – Serbian Language Applications – TESLA</p>
+ </td>
+ <td>
+ <p>This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA</p>
+ </td>
+ </tr>
+ </table>
+
+
+
+ <style>
+ .ffeat {
+ color:red
+ }
+
+ .cover {
+ width: 100%;
+ margin-bottom: 5pt
+ }
+
+ .highlight-container, .highlight {
+ position: relative;
+ text-decoration:none
+ }
+
+ .highlight-container {
+ display: inline-block;
+
+ }
+
+ .highlight{
+ color:white;
+ text-transform:uppercase;
+ font-size: 16pt;
+ }
+
+ .highlight-container{
+ padding:5px 10px
+ }
+
+ .highlight-container:before {
+ content: " ";
+ display: block;
+ height: 100%;
+ width: 100%;
+ margin-left: 0px;
+ margin-right: 0px;
+ position: absolute;
+ background: #e80909;
+ transform: rotate(2deg);
+ top: -1px;
+ left: -1px;
+ border-radius: 20% 25% 20% 24%;
+ padding: 10px 18px 18px 10px;
+ }
+
+ div.grb, #zastava>table {
+ position:absolute;
+ top:0px;
+ left: 0px;
+ margin:0px
+ }
+
+ div.grb>img, #zastava>table{
+ margin:0px
+ }
+
+ #zastava {
+ position: relative;
+ margin-bottom:120px
+ }
+
+ p {
+ font-size:14pt
+ }
+ </style>