normundsg committed on
Commit
4a8c2a8
1 Parent(s): 4dfa048

Updated README

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -6,16 +6,16 @@ language:
 
 # Latvian BERT base model (cased)
 
-A BERT model pretrained on the Latvian language using the masked language modeling and next sentence prediction objectives.
-It was introduced in [this paper](http://ebooks.iospress.nl/volumearticle/55531) and first released via [this repository](https://github.com/LUMII-AILab/LVBERT).
+A BERT model pretrained on Latvian language data using the masked language modeling and next sentence prediction objectives.
+It was introduced in [this paper](http://ebooks.iospress.nl/volumearticle/55531) and first released via a [GitHub repository](https://github.com/LUMII-AILab/LVBERT).
+The current HF repository contains an improved version of LVBERT.
 
-This model is case-sensitive. It is primarily intended to be fine-tuned on downstream natural language understanding (NLU) tasks.
-
-Developed at [AiLab.lv](https://ailab.lv)
+This model is case-sensitive. It is primarily intended to be fine-tuned on downstream natural language understanding tasks like text classification, named entity recognition, question answering.
+However, the model can be used as is to compute contextual embeddings for tasks like text similarity and clustering, semantic search.
 
 ## Training data
 
-LVBERT was pretrained on texts from the [Balanced Corpus of Modern Latvian](https://korpuss.lv/en/id/LVK2018), [Latvian Wikipedia](https://korpuss.lv/en/id/Vikipēdija), [Corpus of News Portal Articles](https://korpuss.lv/en/id/Ziņas), as well as [Corpus of News Portal Comments](https://korpuss.lv/en/id/Barometrs); 500M tokens in total.
+LVBERT was pretrained on texts from the [Balanced Corpus of Modern Latvian](https://korpuss.lv/en/id/LVK2018), [Latvian Wikipedia](https://korpuss.lv/en/id/Vikipēdija), [Corpus of News Portal Articles](https://korpuss.lv/en/id/Ziņas), as well as [Corpus of News Portal Comments](https://korpuss.lv/en/id/Barometrs); around 500M tokens in total.
 
 ## Tokenization
 