birgermoell committed
Commit: 4c8c2f1 · Parent(s): e8362e2
Update README.md

README.md CHANGED
@@ -10,6 +10,46 @@ part of the wiki40b dataset.
 
 https://huggingface.co/datasets/wiki40b
 
+## Model series
+This model is part of a series of models trained on TPU with Flax/JAX during the Hugging Face Flax/JAX challenge.
+
+## GPT models
+
+## Swedish GPT
+https://huggingface.co/birgermoell/swedish-gpt/
+
+## Swedish GPT wiki
+https://huggingface.co/flax-community/swe-gpt-wiki
+
+## Nordic GPT wiki
+https://huggingface.co/flax-community/nordic-gpt-wiki
+
+## Dansk GPT wiki
+https://huggingface.co/flax-community/dansk-gpt-wiki
+
+## Norsk GPT wiki
+https://huggingface.co/flax-community/norsk-gpt-wiki
+
+## Roberta models
+
+## Nordic Roberta Wiki
+https://huggingface.co/flax-community/nordic-roberta-wiki
+
+## Swe Roberta Wiki Oscar
+https://huggingface.co/flax-community/swe-roberta-wiki-oscar
+
+## Roberta Swedish Scandi
+https://huggingface.co/birgermoell/roberta-swedish-scandi
+
+## Roberta Swedish
+https://huggingface.co/birgermoell/roberta-swedish
+
+## Swedish T5 model
+https://huggingface.co/birgermoell/t5-base-swedish
+
+
+
+
 ## Data cleaning and preprocessing
 The data was cleaned and preprocessed using the following script. Make sure to install the dependencies for beam_runner to make the dataset work.
 
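The "following script" referenced above is only partially visible in this diff; the second hunk below patches the middle of it. As context, here is a minimal sketch of how the loading side might fit together. The "sv" config of wiki40b, Apache Beam's local DirectRunner, the `pip install datasets apache-beam` dependency set, and the body of load_and_clean_wiki are assumptions, not taken from the commit; filter_wikipedia is the function changed in the hunk that follows.

```python
# Minimal sketch, not the repository's actual script. Assumptions (not taken
# from the commit): the "sv" config, the DirectRunner, and this function body.
from datasets import load_dataset

def load_and_clean_wiki():
    # wiki40b is a Beam-processed dataset, so load_dataset needs a beam_runner;
    # this is why the README asks you to install the beam_runner dependencies.
    dataset = load_dataset("wiki40b", "sv", beam_runner="DirectRunner", split="train")
    # Strip the wiki40b structure markers from every example (see filter_wikipedia below).
    filtered_dataset = dataset.map(filter_wikipedia)
    return filtered_dataset
```

The DirectRunner simply runs the Beam pipeline locally in a single process, which is the least setup needed to satisfy the Beam requirement; a distributed runner would only change how fast the dataset is prepared.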
@@ -25,10 +65,18 @@ def load_and_clean_wiki():
     return filtered_dataset
 
 def filter_wikipedia(batch):
-    batch["text"] = " ".join(batch["text"].split("\
-
-
-    batch["text"] = " ".join(batch["text"].split("\
+    batch["text"] = " ".join(batch["text"].split("\n_START_SECTION_\n"))
+    batch["text"] = " ".join(batch["text"].split("\n_START_ARTICLE_\n"))
+    batch["text"] = " ".join(batch["text"].split("\n_START_ARTICLE_\n"))
+    batch["text"] = " ".join(batch["text"].split("\n_START_PARAGRAPH_\n"))
     batch["text"] = " ".join(batch["text"].split("_NEWLINE_"))
     batch["text"] = " ".join(batch["text"].split("\xa0"))
     return batch
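To make the effect of the rewritten splits concrete, a small usage example with an invented record in wiki40b's marker format (the text is made up, not real dataset content), assuming filter_wikipedia is defined as in the added lines above, written as ordinary one-line string literals:

```python
# Invented sample in wiki40b's marker format; not real dataset content.
sample = {
    "text": "Title\n_START_SECTION_\nHistory\n_START_PARAGRAPH_\n"
            "First sentence._NEWLINE_Second\xa0sentence."
}
print(filter_wikipedia(sample)["text"])
# Title History First sentence. Second sentence.
```

Each split/join replaces one marker occurrence with a single space, so every article ends up as one continuous line of plain text.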