nguyenvulebinh/spoken-norm

nguyenvulebinh committed on Dec 28, 2021

Commit d64d0a2 • 1 Parent(s): 210d258

Update README.md

Files changed (1)

README.md +33 -7

@@ -1,7 +1,34 @@
 # Transformation spoken text to written text
 ![Model](https://raw.githubusercontent.com/nguyenvulebinh/spoken-norm/main/spoken_norm_model.svg)
 ```python
 import torch
 import model_handling
@@ -11,7 +38,7 @@ import os
 os.environ["CUDA_VISIBLE_DEVICES"] = ""
 ```
-# Init tokenizer and model
 ```python
@@ -20,7 +47,7 @@ model = EncoderDecoderSpokenNorm.from_pretrained('nguyenvulebinh/spoken-norm', c
 data_collator = DataCollatorForNormSeq2Seq(tokenizer)
 ```
-# Infer sample
 ```python
@@ -82,9 +109,8 @@ for output in outputs.cpu().detach().numpy().tolist():
     28/4 cô vít bùng phát ở sờ cốt lờn chiếm 80 % là biến chủng đen ta và bê ta
-## About
-*Built by Binh Nguyen*
-[![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)
-For more details, visit the project repository.
-[![GitHub stars](https://img.shields.io/github/stars/nguyenvulebinh/spoken-norm?style=social)](https://github.com/nguyenvulebinh/spoken-norm)

 # Transformation spoken text to written text
+This model converts raw ASR output from spoken form to written form (e.g. dates, numbers, IDs). It also handles out-of-vocabulary words by drawing on an external vocabulary.
+Some examples:
+```text
+input  : tám giờ chín phút ngày mười tám tháng năm năm hai nghìn không trăm hai mươi hai
+output : 8h9 18/5/2022
+input  : mã số quy đê tê tê đê hai tám chéo hai không không ba
+output : mã số qdttd28/2003
+input  : thể tích tám mét khối trọng lượng năm mươi ki lô gam
+output : thể tích 8 m3 trọng lượng 50 kg
+input    : ngày hai tám tháng tư cô vít bùng phát ở sờ cốt lờn chiếm tám mươi phần trăm là biến chủng đen ta và bê ta
+ex_vocab : ['scotland', 'covid', 'delta', 'beta']
+output   : 28/4 covid bùng phát ở scotland chiếm 80 % là biến chủng delta và beta
+```
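The input/output pairs above can double as a small regression set. The following is a minimal sketch, not part of the repository: `EXAMPLES`, `exact_match_rate`, and `identity_baseline` are hypothetical names introduced here, and `normalize` stands for any callable that takes the spoken text plus an optional external vocabulary and returns the written text. How the actual model is wrapped into such a callable is left open.

```python
from typing import Callable, List, Optional, Tuple

# (spoken input, optional external vocabulary, expected written output),
# copied verbatim from the README examples above.
Example = Tuple[str, Optional[List[str]], str]

EXAMPLES: List[Example] = [
    (
        "tám giờ chín phút ngày mười tám tháng năm năm hai nghìn không trăm hai mươi hai",
        None,
        "8h9 18/5/2022",
    ),
    (
        "mã số quy đê tê tê đê hai tám chéo hai không không ba",
        None,
        "mã số qdttd28/2003",
    ),
    (
        "thể tích tám mét khối trọng lượng năm mươi ki lô gam",
        None,
        "thể tích 8 m3 trọng lượng 50 kg",
    ),
    (
        "ngày hai tám tháng tư cô vít bùng phát ở sờ cốt lờn chiếm tám mươi phần trăm là biến chủng đen ta và bê ta",
        ["scotland", "covid", "delta", "beta"],
        "28/4 covid bùng phát ở scotland chiếm 80 % là biến chủng delta và beta",
    ),
]

# Hypothetical signature: normalize(text, ex_vocab) -> written text.
Normalizer = Callable[[str, Optional[List[str]]], str]


def exact_match_rate(normalize: Normalizer, examples: List[Example] = EXAMPLES) -> float:
    """Return the fraction of examples whose output matches the expected text exactly."""
    hits = sum(
        normalize(text, ex_vocab) == expected
        for text, ex_vocab, expected in examples
    )
    return hits / len(examples)


def identity_baseline(text: str, ex_vocab: Optional[List[str]] = None) -> str:
    """Trivial baseline that returns the spoken text unchanged."""
    return text
```

Scoring `identity_baseline` first gives a floor to compare against; any wrapper around the real model can then be passed to `exact_match_rate` in the same way.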
+## Model architecture
 ![Model](https://raw.githubusercontent.com/nguyenvulebinh/spoken-norm/main/spoken_norm_model.svg)
+# Infer model
+- Try the model in the [Hugging Face Space](https://huggingface.co/spaces/nguyenvulebinh/spoken-norm)
 ```python
 import torch
 import model_handling
 os.environ["CUDA_VISIBLE_DEVICES"] = ""
 ```
+## Init tokenizer and model
 ```python
 data_collator = DataCollatorForNormSeq2Seq(tokenizer)
 ```
+## Infer sample
 ```python
     28/4 cô vít bùng phát ở sờ cốt lờn chiếm 80 % là biến chủng đen ta và bê ta
+## Contact
+nguyenvulebinh@gmail.com
+[![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)