taidopurason committed: Update README.md
Commit f76b124 (1 parent: 9eacc3b)

README.md CHANGED
`````diff
@@ -5,13 +5,13 @@ tags:
 language:
 - et
 base_model:
--
+- tartuNLP/Llammas-base
 pipeline_tag: text-generation
 ---
 
 # Llammas-base-p1-llama-errors-p2-GEC
 
-GEC model for Estonian based on
+GEC model for Estonian based on [tartuNLP/Llammas-base](https://huggingface.co/tartuNLP/Llammas-base) and fine-tuned on 1) correcting 1M synthetic errors produced by our Llama-based error generation model 2) human GEC data.
 
 
 For training and inference code used in our paper see our repository [https://github.com/TartuNLP/gec-llm](https://github.com/TartuNLP/gec-llm).
@@ -71,7 +71,7 @@ input_sentence = "Ma läheb koju"
 # 1)
 PROMPT = '### Instruction:\nReply with a corrected version of the input sentence in Estonian with all grammatical and spelling errors fixed. If there are no errors, reply with a copy of the original sentence.\n\n### Input:\n{input}\n\n### Response:\n'
 example = PROMPT.format(input=input_sentence)
-# 2) or use the chat template
+# 2) or use the chat template provided by us that does the same thing
 example = tokenizer.apply_chat_template([{"role": "user", "content": input_sentence}], tokenize=False)
 
 gec_pipe(example, max_new_tokens=300)[0]["generated_text"][len(example):]
@@ -109,3 +109,4 @@ that also did whitespace and quote normalization, so you might also want to appl
 }
 ````
 
+Arxiv link: [https://arxiv.org/abs/2403.05493](https://arxiv.org/abs/2403.05493)
`````