RussScholar-Seeker Model Card

RussScholar-Seeker is an NLP model that classifies author names in academic publications as Russian or non-Russian, helping to identify Russian scholars.

Model Details

Developed by: Gao Tianci
Model type: BertForSequenceClassification
Languages: Primarily English (for processing)

Overview

RussScholar-Seeker uses a fine-tuned BERT classifier to estimate how likely each author name in an academic paper is to be Russian. Its predictions support the broader study of geographical diversity in academic contributions.

Intended Use

  • Primary Use: Identifying Russian names in scholarly articles.
  • Integration: A simple API for connecting to academic databases or research platforms.

Model Architecture

The model is built on the BERT architecture and fine-tuned for sequence classification: given a name, it predicts whether the name is Russian.

How It Works

  1. Input: List of author names from academic papers.
  2. Processing: Names are tokenized and passed through the BERT model.
  3. Output: Each name is classified as Russian or not, based on the model's confidence.
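
The three steps above can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the author's published API: the function names, the `model_id` argument, the 0.5 threshold, and the choice of class index 1 as the "Russian" label are all illustrative and should be checked against the model's actual configuration.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, russian_index=1, threshold=0.5):
    """Return (is_russian, confidence) for one name's logits.

    Assumption: class `russian_index` is the "Russian" label; verify this
    against the model's config (id2label) before relying on it.
    """
    prob = softmax(logits)[russian_index]
    return prob >= threshold, prob


def classify_names(names, model_id, threshold=0.5):
    """Tokenize names, run the BERT classifier, and label each name.

    `model_id` is a placeholder for the model's Hub ID or a local path.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(names, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (len(names), num_labels)

    return [
        (name, *label_from_logits(row, threshold=threshold))
        for name, row in zip(names, logits.tolist())
    ]
```

Keeping the softmax and thresholding in separate helpers makes the decision rule easy to inspect and adjust, for example by raising the threshold when false positives are more costly than missed names.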

Model Performance

The model shows high accuracy and precision on a diverse dataset of names, which supports its use across academic disciplines.

Key Metrics

  • Accuracy: 92%
  • Precision: 90%
  • Recall: 91%

These metrics were obtained using a standardized validation set that reflects a wide range of name origins.
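
For reference, the three metrics relate to confusion-matrix counts as sketched below, taking "Russian" as the positive class. The function name and the counts in the usage line are hypothetical and purely illustrative; they are not the model's evaluation data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted Russian, how many are
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual Russian, how many are found
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative counts only (hypothetical, not the model's real results).
print(binary_metrics(tp=90, fp=10, fn=10, tn=90))
```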

Ethical Considerations

This model is intended purely for academic and research purposes. It is crucial to use this model responsibly and consider the broader social implications, such as privacy concerns and the potential for reinforcing stereotypes.

Limitations

The model's performance may degrade on names that differ from those in the training dataset, particularly non-Cyrillic names.
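
Since performance can depend on the script a name is written in, a caller may want to record a name's script before trusting a prediction. The helper below is a hypothetical pre-check, not part of the model; it relies only on Unicode character names from the Python standard library.

```python
import unicodedata


def contains_cyrillic(name):
    """Return True if any letter in `name` belongs to the Cyrillic script."""
    for ch in name:
        # unicodedata.name returns e.g. "CYRILLIC CAPITAL LETTER I" for "И";
        # the "" default covers characters that have no Unicode name.
        if ch.isalpha() and "CYRILLIC" in unicodedata.name(ch, ""):
            return True
    return False
```

A pipeline could log this flag alongside each prediction so that results for different scripts can be reviewed separately.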

Getting Started

To use RussScholar-Seeker, install the necessary dependencies and download the model from the provided links.

pip install transformers torch requests beautifulsoup4

Training and Evaluation Data

The model was trained on a curated dataset of over 18,000 names labeled as Russian or Non-Russian, sourced from public academic records and publications.

Training Procedure

Training started from a pre-trained BERT model (transfer learning) and involved several rounds of tuning to balance speed and accuracy.

Usage and Deployment

The model is suited to integration into academic platforms, for example to analyze author names in real time during paper submission, which can improve metadata quality and research analytics.

Additional Resources

For further details on implementation and integration, refer to the full documentation in the GitHub repository: https://github.com/TianciGao/RussScholar-Seeker

Citing RussScholar-Seeker

If you find this model useful in your research, please consider citing it:

@misc{russcholarseeker2024,
  title={RussScholar-Seeker: A Tool for Identifying Russian Scholars in Academic Publications},
  author={Gao, Tianci},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/TianciGao/RussScholar-Seeker}}
}

Model Files

Format: Safetensors
Model size: 109M params
Tensor type: F32