lst-nectec/HoogBERTa


new5558 committed on Mar 31, 2023

Commit 812e130 • 1 Parent(s): 1abd6ae

docs: add kaggle conversion code

Files changed (1)

README.md +7 -0


@@ -16,6 +16,7 @@ This repository includes the Thai pretrained language representation (HoogBERTa_
 # Documentation
 ## Prerequisite
 Since we use subword-nmt BPE encoding, input needs to be pre-tokenized according to the [BEST](https://huggingface.co/datasets/best2009) standard before being passed to HoogBERTa
 ```
@@ -81,6 +82,12 @@ with torch.no_grad():
   features = model(token_ids) # where token_ids is a tensor with type "long".
 ```
+## Conversion Code
+If you are interested in how to convert a Fairseq and subword-nmt RoBERTa model for the Hugging Face Hub, here is the code I used to do the conversion and test for parity:
+https://www.kaggle.com/norapatbuppodom/hoogberta-conversion
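The notebook itself is not reproduced here. As a rough sketch of what a parity test might look like, the snippet below compares two sets of output features element by element and accepts them if the largest absolute difference stays within a tolerance. It assumes the features from the original and the converted model have already been exported to nested Python lists; the helper names `max_abs_diff` and `outputs_match`, the tolerance, and the sample values are all hypothetical and not taken from the linked code.

```python
def max_abs_diff(a, b):
    """Return the largest absolute element-wise difference between two
    equally shaped nested lists of numbers (hypothetical helper)."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq != b_is_seq:
        raise ValueError("shape mismatch between the two outputs")
    if a_is_seq:
        if len(a) != len(b):
            raise ValueError("shape mismatch between the two outputs")
        # Recurse into each sub-list; an empty list contributes no difference.
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def outputs_match(reference, converted, atol=1e-5):
    """True if every element of `converted` is within `atol` of `reference`."""
    return max_abs_diff(reference, converted) <= atol


# Made-up feature values standing in for the two models' outputs.
reference = [[0.12, -0.53], [0.98, 0.07]]
converted = [[0.120001, -0.530002], [0.979999, 0.07]]

print(outputs_match(reference, converted))
```

With real tensors, `torch.allclose(reference, converted, atol=...)` would express the same check more directly; the plain-Python version is used here only so the sketch runs without loading either model.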
 # Citation
 Please cite as: