new5558 committed on
Commit
812e130
1 Parent(s): 1abd6ae

docs: add kaggle conversion code

  1. README.md +7 -0
README.md CHANGED
@@ -16,6 +16,7 @@ This repository includes the Thai pretrained language representation (HoogBERTa_
 
 # Documentation
 
+
 ## Prerequisite
 Since we use subword-nmt BPE encoding, input needs to be pre-tokenized using the [BEST](https://huggingface.co/datasets/best2009) standard before being passed to HoogBERTa
 ```
@@ -81,6 +82,12 @@ with torch.no_grad():
 features = model(token_ids) # where token_ids is a tensor with type "long".
 ```
 
+
+## Conversion Code
+If you are interested in how to convert a Fairseq and subword-nmt RoBERTa model to the Hugging Face Hub format, here is the code I used to do the conversion and test for output parity:
+https://www.kaggle.com/norapatbuppodom/hoogberta-conversion
+
+
 # Citation
 
 Please cite as: