taidopurason committed
Commit f76b124
Parent: 9eacc3b

Update README.md

Files changed (1): README.md (+4 -3)
README.md CHANGED
@@ -5,13 +5,13 @@ tags:
 language:
 - et
 base_model:
-- meta-llama/Llama-2-7b-hf
+- tartuNLP/Llammas-base
 pipeline_tag: text-generation
 ---
 
 # Llammas-base-p1-llama-errors-p2-GEC
 
-GEC model for Estonian based on Llama-2 7B and fine-tuned on 1) correcting 1M synthetic errors produced by our Llama-based error generation model 2) human GEC data.
+GEC model for Estonian based on [tartuNLP/Llammas-base](https://huggingface.co/tartuNLP/Llammas-base) and fine-tuned on 1) correcting 1M synthetic errors produced by our Llama-based error generation model 2) human GEC data.
 
 
 For training and inference code used in our paper see our repository [https://github.com/TartuNLP/gec-llm](https://github.com/TartuNLP/gec-llm).
@@ -71,7 +71,7 @@ input_sentence = "Ma läheb koju"
 # 1)
 PROMPT = '### Instruction:\nReply with a corrected version of the input sentence in Estonian with all grammatical and spelling errors fixed. If there are no errors, reply with a copy of the original sentence.\n\n### Input:\n{input}\n\n### Response:\n'
 example = PROMPT.format(input=input_sentence)
-# 2) or use the chat template provied by use that does the same thing
+# 2) or use the chat template provided by us that does the same thing
 example = tokenizer.apply_chat_template([{"role": "user", "content": input_sentence}], tokenize=False)
 
 gec_pipe(example, max_new_tokens=300)[0]["generated_text"][len(example):]
@@ -109,3 +109,4 @@ that also did whitespace and quote normalization, so you might also want to appl
 }
 ````
 
+Arxiv link: [https://arxiv.org/abs/2403.05493](https://arxiv.org/abs/2403.05493)
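
For context, the prompt-construction and generation lines touched in the second hunk can be assembled into a runnable script roughly as follows. This is a minimal sketch, not code from the commit: the diff does not show how the model and `gec_pipe` are loaded, so the hub id `tartuNLP/Llammas-base-p1-llama-errors-p2-GEC` and the plain transformers text-generation pipeline setup below are assumptions; check the full README for the exact loading code.

```python
# Minimal usage sketch assembled from the snippets in the diff above.
# NOTE: the model id and the pipeline setup are assumptions; the diff only
# shows the prompt construction and the gec_pipe(...) call, not the loading code.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "tartuNLP/Llammas-base-p1-llama-errors-p2-GEC"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
gec_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

input_sentence = "Ma läheb koju"

# Option 2) from the README: the tokenizer's chat template builds the same
# instruction prompt as the hand-written PROMPT string in option 1).
example = tokenizer.apply_chat_template(
    [{"role": "user", "content": input_sentence}],
    tokenize=False,
)

# Generate, then drop the prompt prefix so only the corrected sentence remains.
output = gec_pipe(example, max_new_tokens=300)[0]["generated_text"]
print(output[len(example):])
```

The `[len(example):]` slice mirrors the README line in the diff: the text-generation pipeline returns the prompt followed by the continuation, so stripping the prompt leaves only the corrected sentence.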