thorunna committed on
Commit 62753ff • 1 Parent(s): 6ce586a
Updated readme
README.md CHANGED
@@ -10,34 +10,21 @@ ICELANDIC GPT-SW3 FOR SPELL AND GRAMMAR CHECKING
 
 This is a model for correcting spelling and grammar errors in Icelandic text. It is a GPT-SW3 model (https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b) finetuned on Icelandic and particularly on the spell and grammar checking task.
 
-Provided here is the model along with a script for running it
+Provided here is the model along with a script for running it through a Hugging Face endpoint. An authorized Hugging Face API key is required to do so. Once you have retrieved an API key and it has been authorized, add it to your environment as "HF_API_KEY".
 
-To run the
+To run the model you will need a python3 environment. Install the required dependencies by running
 
 > pip install -r requirements.txt
 
 The current version of transformers includes a bug which has to be fixed in the user's environment before the model can be run. To fix it, change "gpt-sw3-7b" in line no. 138 in transformers/models/gpt_sw3/tokenization_gpt_sw3.py to "gpt-sw3-6.7b".
 
-
-
-The model is fine-tuned on the following three tasks:
+The model is fine-tuned on the following three tasks. Output examples for each task are shown in ./example_outputs.
 - Task 1: The model evaluates one text with regards to e.g. grammar and spelling, and returns all errors in the input text as a list, with their position in the text and their corrections.
 - Task 2: The model evaluates two texts and chooses which one is better with regards to e.g. grammar and spelling.
 - Task 3: The model evaluates one text with regards to e.g. grammar and spelling, and returns a corrected version of the text.
 
-
-- --task: A number (1-3) representing the intended task. The script includes prompts for each task.
-- --input-file: A file containing text to be evaluated. The format of the input file differs between tasks, and is described further below.
-- --output-file: A path to a desired output file to be created by the script. The format of the file differs between tasks, and is described further below.
-
-An input file for tasks 1 and 3 should be a .txt file consisting of one text per line. An example of both files can be found under ./example_inputs.
-An input file for task 2 should be a .jsonl file, where each line is a dictionary object showing two texts. Keys in the dictionary are "a" and "b" and texts to be evaluated are their values. An example of this file can be found under ./example_inputs.
-
-All output files are .txt files and output examples for each task are shown in ./example_outputs. An output file for task 1 shows each text which was evaluated, followed by a list of corrections. Text outputs are separated by an empty line. An output file for task 2 shows 'A' or 'B' for which text is preferred, one choice per line. An output file for task 3 shows the corrected text, one text per line.
-
-Run the script with
-
-> python run_model.py --task 3 --input-file example_inputs/task3_example.txt --output-file example_outputs/task3_example.txt
+Run the model with
 
-
+> python run_model.py
 
+Input text(s) and the task type need to be specified in the script.
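The two setup steps the updated README describes (exporting the API key as "HF_API_KEY" and changing "gpt-sw3-7b" to "gpt-sw3-6.7b" on line 138 of the tokenizer file) could be checked and applied with something like the sketch below. It is not part of the commit: the helper names `read_api_key` and `patch_tokenizer_line` are hypothetical, and the line number is taken from the README, so it may not match other transformers versions.

```python
import os
from pathlib import Path

OLD_NAME = "gpt-sw3-7b"    # string the README says to replace
NEW_NAME = "gpt-sw3-6.7b"  # string the README says to use instead


def read_api_key(env=os.environ):
    """Return the key stored under HF_API_KEY, or raise if it is unset or empty."""
    key = env.get("HF_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set HF_API_KEY to an authorized Hugging Face API key.")
    return key


def patch_tokenizer_line(path, line_no=138):
    """Replace OLD_NAME with NEW_NAME on a single line; return True if the file changed.

    line_no defaults to the line named in the README (an assumption that only
    holds for the transformers version the README was written against).
    """
    path = Path(path)
    with path.open(encoding="utf-8") as handle:
        lines = handle.readlines()
    index = line_no - 1  # the README counts lines from 1
    if index < 0 or index >= len(lines) or OLD_NAME not in lines[index]:
        return False  # already fixed, or the file is laid out differently
    lines[index] = lines[index].replace(OLD_NAME, NEW_NAME)
    with path.open("w", encoding="utf-8") as handle:
        handle.writelines(lines)
    return True
```

The path to pass to `patch_tokenizer_line` is the transformers/models/gpt_sw3/tokenization_gpt_sw3.py file named in the README, inside the installed transformers package. The helper only touches the one line and leaves the file alone when the old string is not there, so running it twice is harmless.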