cryptoNER

This model is a fine-tuned version of xlm-roberta-base; the dataset field of the card is unset (None). It achieves the following results on the evaluation set:

  • Loss: 0.0058
  • F1: 0.9970

Model description

cryptoNER specializes in Named Entity Recognition (NER) for the cryptocurrency domain. It is fine-tuned to recognize and classify entities such as cryptocurrency TICKER SYMBOL, NAME, and blockscanner ADDRESS within text.

Intended uses

Designed primarily for NER tasks in the cryptocurrency sector, this model identifies and categorizes ticker symbols, token names, and blockscanner addresses in textual content.
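The intended use can be sketched with the Transformers token-classification pipeline. This is a minimal sketch, not the author's own inference code: `model_id` is a placeholder because the card does not state the checkpoint's repository ID, `load_ner` and `keep_confident` are hypothetical helper names, and the 0.9 score threshold is an arbitrary illustration.

```python
from typing import Any, Dict, List


def load_ner(model_id: str):
    """Build a token-classification pipeline for the given checkpoint.

    model_id is a placeholder: pass the Hub repository ID or a local path
    of the cryptoNER checkpoint (the card does not state it).
    """
    # Imported lazily so keep_confident() can be used without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into whole entities.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def keep_confident(
    entities: List[Dict[str, Any]], min_score: float = 0.9
) -> List[Dict[str, Any]]:
    """Keep aggregated entities whose score is at least min_score."""
    return [
        {"label": e["entity_group"], "text": e["word"], "span": (e["start"], e["end"])}
        for e in entities
        if e["score"] >= min_score
    ]


# Example call (requires the checkpoint to be available):
#   ner = load_ner("<hub-id-or-local-path>")
#   keep_confident(ner("Swapped 2 ETH for USDC"))
```

The label strings returned in `entity_group` depend on the checkpoint's configuration; the card names the entity types but not the exact tag strings.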

Limitations

Performance may degrade on entities that are absent from the training data or that occur infrequently within the cryptocurrency domain. The model may also be sensitive to variations in how an entity is written and in the context around it.
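One possible guard against this sensitivity is a format check on predicted ADDRESS spans. The sketch below is not part of the model. It assumes that ADDRESS entities are EVM-style addresses ("0x" followed by 40 hexadecimal digits), which is plausible given the ERC20 training metadata but is not stated in the card. `looks_like_evm_address` is a hypothetical helper name.

```python
import re

# Assumption: ADDRESS entities are EVM-style addresses ("0x" + 40 hex digits),
# since the training metadata is described as ERC20 tokens.
EVM_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def looks_like_evm_address(text: str) -> bool:
    """Return True if text, ignoring surrounding whitespace, is one EVM-style address."""
    return EVM_ADDRESS.fullmatch(text.strip()) is not None
```

The check covers only the textual format; it does not verify a mixed-case checksum or that the address exists on any chain.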

Training and evaluation data

The model was trained on a dataset that includes artificially generated tweets and ERC20 token metadata fetched through the Covalent API (https://www.covalenthq.com/docs/unified-api/). GPT was used to generate 500 synthetic tweets tailored to the cryptocurrency domain, and the Covalent API supplied 20K+ unique ERC20 token metadata entries.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 6
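To illustrate how these settings interact, the sketch below computes the learning rate at a given optimizer step under a linear schedule. It takes 750 steps per epoch from the training results table and assumes zero warmup steps, since the card lists none. `linear_lr` is a hypothetical helper, not a library function.

```python
BASE_LR = 5e-05
STEPS_PER_EPOCH = 750  # from the training results table
NUM_EPOCHS = 6
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 4500
WARMUP_STEPS = 0  # assumption: the card lists no warmup


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp-up during warmup (unused when WARMUP_STEPS is 0).
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Upper bound on training examples seen per epoch, assuming a single device
# and no gradient accumulation (neither is stated in the card).
MAX_EXAMPLES_PER_EPOCH = STEPS_PER_EPOCH * 32  # 24000
```

Under these assumptions the rate starts at 5e-05, is halved by the end of epoch 3 (step 2250), and reaches zero at step 4500.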

Training results

Training Loss   Epoch   Step   Validation Loss   F1
0.0269          1.0     750    0.0080            0.9957
0.0049          2.0     1500   0.0074            0.9960
0.0042          3.0     2250   0.0074            0.9965
0.0034          4.0     3000   0.0058            0.9971
0.0028          5.0     3750   0.0059            0.9971
0.0024          6.0     4500   0.0058            0.9970
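The card does not say how F1 was computed. For readers who want to reproduce a comparable number, the following sketch shows one common convention for NER: entity-level, micro-averaged F1 over BIO-tagged sequences, where a prediction counts only if both its type and its exact span match. The function names and the tag strings in the usage note are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO-tagged sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray "I-" tag is treated leniently as a span start.
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (type, span) matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

For example, with gold tags `["B-TICKER", "O", "B-NAME", "I-NAME"]` and predicted tags `["B-TICKER", "O", "B-NAME", "O"]`, only the TICKER span matches exactly, so precision and recall are both 1/2 and the F1 is 0.5.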

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1