Model Card for Model ID

Model Description

Given two Turkish words, the model predicts whether or not they share an affix. It is a fine-tuned version of dbmdz/bert-base-turkish-cased, trained on a task similar to NLI but at the word level and with two labels. It was created as a final project for one of my classes.

Developed by: Scoup123
Model type: BERT
Language(s) (NLP): Turkish
Finetuned from model [optional]: dbmdz/bert-base-turkish-cased

Model Sources [optional]

Repository: [More Information Needed]
Paper [optional]: in progress

Uses

It can be used in morphological analysis tasks.

Direct Use

It can probably be used on Turkish word pairs without additional fine-tuning.
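A minimal sketch of direct use with the transformers library is given below. It rests on several assumptions that this card does not confirm: `MODEL_ID` is a hypothetical placeholder (the published model ID is not stated here), the two words are assumed to be encoded as a sentence pair as in NLI, and the label order in `ASSUMED_LABELS` is a guess that should be checked against the model's `config.id2label`.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical placeholder: replace with the actual model ID on the Hub.
MODEL_ID = "your-username/your-affix-model"

# Assumed label order (index 0, index 1); verify against config.id2label.
ASSUMED_LABELS = ("no_shared_affix", "shared_affix")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(
    logits: Sequence[float], labels: Sequence[str] = ASSUMED_LABELS
) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def predict_shared_affix(
    word_a: str, word_b: str, model_id: str = MODEL_ID
) -> Tuple[str, float]:
    """Classify one word pair. Requires transformers and torch."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)

    # Assumption: the pair is passed as (text, text_pair), as in NLI.
    inputs = tokenizer(word_a, word_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_logits(logits)
```

Once `MODEL_ID` points at the real model, a call such as `predict_shared_affix("evler", "kitaplar")` returns a label and its probability; the word pair here is only an illustration.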

Training Details

Training Data

scoup123/affixfinder

The dataset was derived from a generated dataset described in the paper "Turkish language resources: Morphological parser, morphological disambiguator and web corpus" (Sak et al., 2008).

Evaluation

Test Accuracy: 0.9874
Precision: 0.9874
Recall: 0.9874
F1 Score: 0.9874

The model should be used with caution, as these scores are suspiciously high.
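The four scores above are identical. The card does not say how they were averaged, so the following is only one possible explanation: with micro averaging on a single-label task, precision, recall and F1 are always equal to accuracy. The sketch below computes the metrics that way in plain Python; the function name and the toy labels are illustrative, not taken from the author's evaluation code.

```python
from typing import Dict, Sequence


def micro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus micro-averaged precision, recall and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    # Pool true positives, false positives and false negatives over all labels.
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Each wrong prediction counts once as a false positive (for the predicted label) and once as a false negative (for the true label), which is why all four values coincide under this scheme. Reporting per-class or macro-averaged scores alongside these would show whether one class is handled worse than the other.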

Testing Data, Factors & Metrics

Testing Data

A test split was created from the training data.
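The card does not describe how the split was made. As a rough sketch of one way to hold out a test set from a single pool of examples, the function below removes exact duplicates before a seeded shuffle, so the same example cannot land in both splits. The `test_fraction` and `seed` values and the example tuple layout `(word_a, word_b, label)` are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, str, int]  # assumed layout: (word_a, word_b, label)


def split_examples(
    examples: Sequence[Example], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Return (train, test) with exact duplicates removed beforehand."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(examples))

    # A dedicated Random instance keeps the shuffle reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_test = max(1, int(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]
```

Exact-duplicate removal is a weak safeguard: the same word can still appear in different pairs on both sides of the split, which may inflate test scores. Splitting by word rather than by pair would be a stricter check.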

Summary

This model is intended to serve as an affix identifier for Turkish.

Model Examination [optional]

I have only just created the model, so further testing is needed to check whether it actually works. You should also verify that it works on your own data before using it.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Free Colab T4 GPU
Hours used: ~2.5 hours
Cloud Provider: Google
Compute Region: Europe
Carbon Emitted: [More Information Needed]
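The kind of estimate such a calculator produces can be sketched as energy used multiplied by the carbon intensity of the local grid. The function below is a back-of-the-envelope version, not the calculator itself. The 70 W figure is the T4's rated board power, used here as an upper bound on actual draw; the grid intensity is deliberately left as a required argument, because the card only gives "Europe" as the region and no value can be inferred from that.

```python
# Rated board power of an NVIDIA T4; real draw during training is likely lower.
T4_TDP_WATTS = 70.0


def estimate_emissions_kg(
    hours: float,
    grid_intensity_kg_per_kwh: float,
    power_watts: float = T4_TDP_WATTS,
    pue: float = 1.0,
) -> float:
    """Estimate CO2-equivalent emissions in kilograms.

    hours: wall-clock training time.
    grid_intensity_kg_per_kwh: kg CO2eq emitted per kWh on the local grid
        (must be looked up for the actual compute region).
    power_watts: average power draw of the hardware.
    pue: data-centre power usage effectiveness; 1.0 ignores overhead.
    """
    if hours < 0 or grid_intensity_kg_per_kwh < 0 or power_watts < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and pue must be >= 1.0")

    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_intensity_kg_per_kwh
```

With the roughly 2.5 hours reported above, the energy term is at most about 0.175 kWh before any data-centre overhead; the emissions figure then depends entirely on the grid intensity supplied.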

Citation [optional]

APA:

Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In Advances in natural language processing (pp. 417-427). Springer Berlin Heidelberg.

Model Card Authors [optional]

Kaan Bayar

Model Card Contact

kaan.bayar13@gmail.com