JminJ/koElectra_base_Bad_Sentence_Classifier


JminJ committed on Apr 11, 2022

Commit 51a4437 • 1 Parent(s): fb1e399

Update README.md

Files changed (1)

README.md +3 -45


@@ -22,7 +22,7 @@ NOTE)
 **Some examples labeled clean in the Korean Unsmile Dataset were relabeled to 0 (bad sentence).**
 * Among sentences containing "~노", those that also contain "이기" or "노무" were relabeled to 0 (bad sentence).
 * Examples containing sexually suggestive terms such as "좆" or "봊" were relabeled to 0 (bad sentence).
-</br></br>
 ## Model Training
 * Fine-tuning was performed with `ElectraForSequenceClassification` from Hugging Face transformers.
@@ -32,54 +32,13 @@ NOTE)
 * [monologg/koELECTRA](https://github.com/monologg/KoELECTRA)
 * [tunib/electra-ko-base](https://huggingface.co/tunib/electra-ko-base)
-### how to train?
-```BASH
-python codes/model_source/train_torch_sch.py \
-    --learning_rate=3e-06 \
-    --use_float_16=True \
-    --weight-decay=0.001 \
-    --base_save_ckpt_path=BASE_SAVE_CHPT_PATH \
-    --epochs=10 \
-    --batch_size=128 \
-    --model_type=MODEL_TYPE
-```
-### parameters
-| parameter | type | description | default |
-| ---------- | ---------- | ---------- | --------- |
-| learning_rate | float | learning rate used for training | 5e-05 |
-| use_float_16 | bool | whether to train with float16 | False |
-| weight_decay | float | weight decay lambda | None |
-| base_ckpt_save_path | str | base path where trained checkpoints are saved | None |
-| epochs | int | total number of training epochs | 5 |
-| batch_size | int | batch size used during training | 64 |
-| model_type | int | selects which ELECTRA model to train | 0 |
-```
-NOTE) The train and valid datasets can be set in the config section of train_torch_sch.py.
-```
-</br>
 ## How to use model?
 ```PYTHON
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
-model = AutoModelForSequenceClassification.from_pretrained('JminJ/kcElectra_base_Bad_Sentence_Classifier')
-tokenizer = AutoTokenizer.from_pretrained('JminJ/kcElectra_base_Bad_Sentence_Classifier')
-```
-</br>
-## Predict model
-You can run predict on any sentence you want to test.
-```BASH
-python codes/model_source/utils/predict.py \
-    --input_text=INPUT_TEXT \
-    --base_ckpt=BASE_CKPT
 ```
-### parameters
-| parameter | type | description | default |
-| ---------- | ---------- | ---------- | --------- |
-| input_text | str | user input text | "반갑습니다. JminJ입니다!" |
-| base_ckpt | str | base path where trained checkpoints are saved | False |
-</br>
 ## Model Valid Accuracy
 | model | accuracy |
@@ -91,7 +50,6 @@ python codes/model_source/utils/predict.py \
 Note)
 All models were trained with the same seed, learning_rate (3e-06), weight_decay lambda (0.001), and batch_size (128).
 ```
-</br>
 ## Contact
 * jminju254@gmail.com

 **Some examples labeled clean in the Korean Unsmile Dataset were relabeled to 0 (bad sentence).**
 * Among sentences containing "~노", those that also contain "이기" or "노무" were relabeled to 0 (bad sentence).
 * Examples containing sexually suggestive terms such as "좆" or "봊" were relabeled to 0 (bad sentence).
+</br>
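
The relabeling rules listed above can be read as plain substring checks. The following is a minimal sketch under that assumption. The README does not show the author's actual matching code, so the helper name `relabel`, the marker tuples, and the use of 1 for the clean label are all hypothetical.

```python
# Hypothetical sketch of the relabeling rules; not the author's actual code.
BAD = 0    # 0 = bad sentence, as stated in the README
CLEAN = 1  # assumption: the clean label is 1

# Markers taken from the README's examples ("좆", "봊" 등 = "such as"),
# so this tuple is illustrative rather than exhaustive.
SEXUAL_MARKERS = ("좆", "봊")
# Words that, together with a "~노" form, trigger relabeling.
NO_CONTEXT_MARKERS = ("이기", "노무")


def relabel(text: str, label: int) -> int:
    """Return the (possibly corrected) label for one example."""
    if label == BAD:
        # Already marked as a bad sentence; nothing to change.
        return BAD
    if any(marker in text for marker in SEXUAL_MARKERS):
        return BAD
    # Naive check: treats any occurrence of "노" as a "~노" form.
    if "노" in text and any(marker in text for marker in NO_CONTEXT_MARKERS):
        return BAD
    return label


if __name__ == "__main__":
    print(relabel("안녕하세요", CLEAN))
```

A plain substring match like this will over-trigger on ordinary words that happen to contain "노", so a real pipeline would likely need manual review or a stricter pattern.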
 ## Model Training
 * Fine-tuning was performed with `ElectraForSequenceClassification` from Hugging Face transformers.
 * [monologg/koELECTRA](https://github.com/monologg/KoELECTRA)
 * [tunib/electra-ko-base](https://huggingface.co/tunib/electra-ko-base)
 ## How to use model?
 ```PYTHON
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
+model = AutoModelForSequenceClassification.from_pretrained('JminJ/koElectra_base_Bad_Sentence_Classifier')
+tokenizer = AutoTokenizer.from_pretrained('JminJ/koElectra_base_Bad_Sentence_Classifier')
 ```
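
Once the model and tokenizer are loaded, a prediction comes from the classifier's logits. The sketch below shows one way to do that. The function names, the `LABELS` mapping (index 1 as clean is an assumption; only index 0 as bad sentence is stated in the README), and the truncation setting are illustrative choices, not part of the original README.

```python
import math

MODEL_ID = "JminJ/koElectra_base_Bad_Sentence_Classifier"
# 0 = bad sentence is stated in the README; 1 = clean is an assumption.
LABELS = {0: "bad sentence", 1: "clean"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits):
    """Map raw logits to (label name, probability of that label)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def classify(text):
    """Download the model from the Hub and classify one sentence."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return to_label(logits)
```

Calling `classify("반갑습니다. JminJ입니다!")` requires network access and returns a label name together with its probability. For repeated calls, load the tokenizer and model once and reuse them.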
 ## Model Valid Accuracy
 | model | accuracy |
 Note)
 All models were trained with the same seed, learning_rate (3e-06), weight_decay lambda (0.001), and batch_size (128).
 ```
 ## Contact
 * jminju254@gmail.com