--- license: apache-2.0 language: - tr tags: - deprem-clf-v1 metrics: - accuracy - recall - f1 library_name: transformers pipeline_tag: text-classification model-index: - name: deprem_v12 results: - task: type: text-classification dataset: type: deprem_private_dataset_v1_2 name: deprem_private_dataset_v1_2 metrics: - type: recall value: 0.8 verified: false - type: f1 value: 0.75 verified: false --- ### Deprem NER Training Results ``` precision recall f1-score support 0 0.85 0.91 0.88 734 1 0.77 0.84 0.80 207 2 0.71 0.88 0.79 130 3 0.68 0.76 0.72 94 4 0.80 0.85 0.82 362 5 0.63 0.59 0.61 112 6 0.73 0.82 0.77 108 7 0.55 0.77 0.64 78 8 0.65 0.71 0.68 31 9 0.70 0.85 0.76 117 micro avg 0.77 0.85 0.81 1973 macro avg 0.71 0.80 0.75 1973 weighted avg 0.77 0.85 0.81 1973 samples avg 0.82 0.87 0.83 1973 ``` ### Preprocessing Funcs ``` tr_stopwords = stopwords.words('turkish') tr_stopwords.append("hic") tr_stopwords.append("dm") tr_stopwords.append("vs") tr_stopwords.append("ya") def remove_punct(tok): tok = re.sub(r'[^\w\s]', '', tok) return tok def normalize(tok): if tok.isdigit(): tok = "digit" return tok def clean(tok): tok = remove_punct(tok) tok = normalize(tok) return tok def exceptions(tok): if not tok.isdigit() and len(tok)==1: return False if not tok: return False if tok in tr_stopwords: return False if tok.startswith('#') or tok.startswith("@"): return False return True sm_tok = lambda text: [clean(tok) for tok in text.split(" ") if exceptions(tok)] ``` ### Other HyperParams ``` training_args = TrainingArguments( output_dir="./output", evaluation_strategy="epoch", per_device_train_batch_size=32, per_device_eval_batch_size=32, weight_decay=0.01, report_to=None, num_train_epochs=4 ) ``` ``` class_weights[0] = 1.0 class_weights[1] = 1.5167249178108022 class_weights[2] = 1.7547338578655642 class_weights[3] = 1.9610520059358458 class_weights[4] = 1.269341370129623 class_weights[5] = 1.8684086209021484 class_weights[6] = 1.8019018017117145 class_weights[7] = 2.110648663094536 class_weights[8] = 3.081208739200435 class_weights[9] = 1.7994815143101963 ``` Threshold: 0.25 ```