Initial version

Files changed (4)

README.md +67 -0
model.bin +3 -0
shared_vocabulary.txt +0 -0
sp_m.model +3 -0

README.md ADDED

	@@ -0,0 +1,67 @@

+---
+language:
+- en
+tags:
+- gec
+library_name: opennmt
+license: mit
+metrics:
+- bleu
+inference: false
+---
+### Introduction
+This repository describes how to use OpenNMT for the Grammatical Error Correction (GEC) task. The idea is to approach GEC as a translation task, from text that contains errors to corrected text.
+### Usage
+Install the necessary dependencies:
+```bash
+pip3 install ctranslate2 pyonmttok huggingface_hub
+```
+Simple tokenization & translation using Python:
+```python
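+import os
+
+import ctranslate2
+import pyonmttok
+from huggingface_hub import snapshot_download
+
+# Download the model files from the Hugging Face Hub
+model_dir = snapshot_download(repo_id="jordimas/gec-opennmt-english", revision="main")
+# SentencePiece tokenization through the OpenNMT tokenizer
+tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=os.path.join(model_dir, "sp_m.model"))
+# tokenize() returns a (tokens, features) pair; only the tokens are needed
+tokens, _ = tokenizer.tokenize("The water are hot. My friends are going to be late. Today mine mother is in Barcelona.")
+translator = ctranslate2.Translator(model_dir)
+translated = translator.translate_batch([tokens])
+# Best hypothesis of the first result (result objects in recent CTranslate2 releases)
+print(tokenizer.detokenize(translated[0].hypotheses[0]))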
+```
+# Model
+The model was trained on the [clang8](https://github.com/google-research-datasets/clang8) corpus for English.
+Details:
+  * Model: TransformerBase
+  * Tokenizer: SentencePiece
+  * BLEU = 85.50
+# Papers
+Relevant papers:
+* [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task](https://aclanthology.org/N18-1055.pdf)
+* [A Simple Recipe for Multilingual Grammatical Error Correction](https://arxiv.org/pdf/2106.03830.pdf)
+# Contact
+Jordi Mas: jmas@softcatala.org
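
The usage example in the README passes three sentences to the model as a single input. The README does not say whether the model expects one sentence per input, so the following is only a sketch under that assumption: it splits the text with a naive regular expression and sends each sentence as its own batch entry. The helper names `split_sentences` and `correct_text` are hypothetical and not part of the repository; `tokenizer` and `translator` are assumed to be the `pyonmttok.Tokenizer` and `ctranslate2.Translator` objects created in the README example.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption, not the author's method):
    break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def correct_text(text, tokenizer, translator):
    """Correct each sentence separately and join the results with spaces."""
    sentences = split_sentences(text)
    # tokenize() returns a (tokens, features) pair; keep only the tokens.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    # One batch entry per sentence; results come back in the same order.
    results = translator.translate_batch(batch)
    # hypotheses[0] is the best hypothesis for each input
    # (assumes the result objects of recent CTranslate2 releases).
    return " ".join(tokenizer.detokenize(r.hypotheses[0]) for r in results)
```

With the objects from the README example in scope, `print(correct_text("The water are hot. My friends are going to be late.", tokenizer, translator))` would print the corrected sentences on one line. The splitter will mis-split abbreviations such as "e.g.", so a proper sentence segmenter may be preferable for real text.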

model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bb56a2291e653a7ddc3d445af09735488dba4b196baf228cd454eedbadd21f2
+size 122328622

shared_vocabulary.txt ADDED

The diff for this file is too large to render. See raw diff

sp_m.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:661943a6befb807ca696fc5d0656a1afae2a18e21dd2c823cb0c3be25d8dd441
+size 1131052