w11wo committed on
Commit f18e48e
1 Parent(s): a0f1a7d

Update README.md

Files changed (1): README.md (+23 -5)
README.md CHANGED
@@ -13,9 +13,27 @@ pinned: false
  <img src="https://raw.githubusercontent.com/LazarusNLP/lazarusnlp.github.io/main/docs/assets/images/logo_web.png" alt="logo" width="400"/>
  </p>

- Particularly, we plan to apply the following technologies on languages of Indonesia:
-
- - Neural Machine Translation
- - Automatic Speech Recognition
- - Text-to-Speech
- - Speech-to-Speech Translation
+ ## Projects
+
+ <table>
+ <tr>
+ <td valign="top">
+ <h3>IndoT5: T5 Language Models for the Indonesian Language</h3>
+ <p>IndoT5 is a T5-based language model trained specifically for the Indonesian language. With just 8 hours of training on a limited budget, we developed a competitive sequence-to-sequence, encoder-decoder model that can be fine-tuned for tasks such as summarization, chit-chat, and question-answering. Despite these training constraints, our model is competitive when evaluated on the <a href="https://github.com/IndoNLP/indonlg">IndoNLG</a> (text generation) benchmark.</p>
+ </td>
+ <td valign="top">
+ <h3>Indonesian Sentence Embedding Models</h3>
+ <p>We trained open-source sentence embedding models for Indonesian, enabling applications such as information retrieval (useful for retrieval-augmented generation!), semantic textual similarity, and zero-shot text classification. We leverage existing pre-trained Indonesian language models like <a href="https://github.com/IndoNLP/indonlu">IndoBERT</a>, state-of-the-art unsupervised training techniques, and established sentence embedding benchmarks. A semantic search sketch appears after this table.</p>
+ </td>
+ </tr>
+ <tr>
+ <td valign="top">
+ <h3>Indonesian Natural Language Inference Models</h3>
+ <p>Open-source lightweight NLI models that are competitive with larger models on the IndoNLI benchmark, with significantly fewer parameters. We applied knowledge distillation to small existing pre-trained language models like IndoBERT Lite. These models offer efficient solutions for tasks that require natural language inference, such as cross-encoder-based semantic search, while minimizing computational resources. A scoring sketch appears after this table.</p>
+ </td>
+ <td valign="top">
+ <h3>Many-to-Many Multilingual Translation Models</h3>
+ <p>By adapting mT5 to 45 languages of Indonesia, we developed a robust baseline model for multilingual translation among these languages. This facilitates further fine-tuning for niche domains and low-resource languages, contributing to greater linguistic inclusivity. Our models are competitive with existing multilingual translation models on the <a href="https://github.com/IndoNLP/nusax">NusaX</a> benchmark. A translation sketch appears after this table.</p>
+ </td>
+ </tr>
+ </table>
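+
+ ## Example Usage
+
+ A minimal sketch of using one of the sentence embedding models above for semantic search, assuming the `sentence-transformers` library; the model ID is hypothetical and stands in for an actual LazarusNLP release.
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ # Hypothetical model ID, for illustration only.
+ model = SentenceTransformer("LazarusNLP/indo-sentence-bert-base")
+
+ corpus = [
+     "Ibu kota Indonesia adalah Jakarta.",          # The capital of Indonesia is Jakarta.
+     "Nasi goreng adalah makanan khas Indonesia.",  # Fried rice is a typical Indonesian dish.
+ ]
+ query = "Apa ibu kota Indonesia?"  # What is the capital of Indonesia?
+
+ # Embed the corpus and the query, then rank corpus entries by cosine similarity.
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ query_embedding = model.encode(query, convert_to_tensor=True)
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
+ print(corpus[hits[0][0]["corpus_id"]])  # best-matching sentence
+ ```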
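+
+ The distilled NLI models could similarly serve as cross-encoders that score sentence pairs directly, e.g. for re-ranking in semantic search; again, the checkpoint name is a placeholder.
+
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Hypothetical distilled NLI checkpoint, for illustration only.
+ model = CrossEncoder("LazarusNLP/indobert-lite-nli")
+
+ # Score a premise-hypothesis pair; an NLI cross-encoder typically
+ # returns logits over entailment/neutral/contradiction labels.
+ pairs = [
+     ("Dia sedang makan di restoran.",   # She is eating at a restaurant.
+      "Dia berada di dalam restoran."),  # She is inside a restaurant.
+ ]
+ print(model.predict(pairs))
+ ```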
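+
+ For the many-to-many translation models, a sketch with Hugging Face `transformers`; both the checkpoint ID and the T5-style task prefix are assumptions for illustration.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Hypothetical checkpoint ID, for illustration only.
+ model_name = "LazarusNLP/indo-mt5-base-translation"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+ # Assumed T5-style task prefix naming the language pair.
+ text = "translate Indonesian to Javanese: Selamat pagi, apa kabar?"
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```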