taidopurason committed: Update README.md
Commit f76b124 (1 parent: 9eacc3b)

README.md CHANGED
`````diff
@@ -5,13 +5,13 @@ tags:
 language:
 - et
 base_model:
--
+- tartuNLP/Llammas-base
 pipeline_tag: text-generation
 ---
 
 # Llammas-base-p1-llama-errors-p2-GEC
 
-GEC model for Estonian based on
+GEC model for Estonian based on [tartuNLP/Llammas-base](https://huggingface.co/tartuNLP/Llammas-base) and fine-tuned on 1) correcting 1M synthetic errors produced by our Llama-based error generation model 2) human GEC data.
 
 
 For training and inference code used in our paper see our repository [https://github.com/TartuNLP/gec-llm](https://github.com/TartuNLP/gec-llm).
@@ -71,7 +71,7 @@ input_sentence = "Ma läheb koju"
 # 1)
 PROMPT = '### Instruction:\nReply with a corrected version of the input sentence in Estonian with all grammatical and spelling errors fixed. If there are no errors, reply with a copy of the original sentence.\n\n### Input:\n{input}\n\n### Response:\n'
 example = PROMPT.format(input=input_sentence)
-# 2) or use the chat template
+# 2) or use the chat template provided by us that does the same thing
 example = tokenizer.apply_chat_template([{"role": "user", "content": input_sentence}], tokenize=False)
 
 gec_pipe(example, max_new_tokens=300)[0]["generated_text"][len(example):]
@@ -109,3 +109,4 @@ that also did whitespace and quote normalization, so you might also want to appl
 }
 ````
 
+Arxiv link: [https://arxiv.org/abs/2403.05493](https://arxiv.org/abs/2403.05493)
`````