---
language: tr
library_name: peft
pipeline_tag: token-classification
base_model: dbmdz/bert-base-turkish-cased
---

## Training procedure

This model was fine-tuned from the base model "dbmdz/bert-base-turkish-cased" with Parameter-Efficient Fine-Tuning (PEFT), using the Low-Rank Adaptation (LoRA) technique, on a reviewed version of the well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

trainable params: 702,734 || all params: 110,627,342 || trainable%: 0.6352263258752072

# Fine-tuning parameters:

```
task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 16
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512
learning_rate = 1e-3
num_train_epochs = 7
weight_decay = 0.01
```

# PEFT Parameters

```
inference_mode=False
r=16
lora_alpha=16
lora_dropout=0.1
bias="all"
```

(See the fine-tuning sketch at the end of this card for how these values map to a `LoraConfig`.)

# How to use:

```
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from peft import PeftConfig, PeftModel

# Label mappings matching the fine-tuning label list
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

peft_model_id = "akdeniz27/bert-base-turkish-cased-ner-lora"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model and attach the LoRA adapter
inference_model = AutoModelForTokenClassification.from_pretrained(
    config.base_model_name_or_path, num_labels=7, id2label=id2label, label2id=label2id
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(inference_model, peft_model_id)

text = "Mustafa Kemal Atatürk 1919 yılında Samsun'a çıktı."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

tokens = inputs.tokens()
predictions = torch.argmax(logits, dim=2)

for token, prediction in zip(tokens, predictions[0].numpy()):
    print((token, model.config.id2label[prediction]))
```

# Reference test results:

* accuracy: 0.993297
* f1: 0.949696
* precision: 0.942554
* recall: 0.956946
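# Fine-tuning sketch

The training script itself is not part of this card. The snippet below is a minimal sketch of how the adapter could be set up with the parameters listed above; the `output_dir` name is an illustrative assumption, and the dataset preparation and `Trainer` call are omitted.

```
from transformers import AutoModelForTokenClassification, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

model_checkpoint = "dbmdz/bert-base-turkish-cased"
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Base model with one output head per NER label
base_model = AutoModelForTokenClassification.from_pretrained(
    model_checkpoint, num_labels=len(label_list)
)

# LoRA configuration using the values from the "PEFT Parameters" section
peft_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    inference_mode=False,
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="all",
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # should reproduce the trainable-parameter figures quoted above

# Training arguments using the values from the "Fine-tuning parameters" section
# (output_dir is an assumed, illustrative name)
training_args = TrainingArguments(
    output_dir="bert-base-turkish-cased-ner-lora",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
)
# A Trainer would then be built with these arguments, a tokenized version of the
# dataset linked above (labels aligned per token, max_length=512), and a
# token-classification data collator; that part of the pipeline is omitted here.
```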