---
license: apache-2.0
pipeline_tag: text-classification
---

# RussScholar-Seeker Model Card

**RussScholar-Seeker** is a text-classification model designed to identify Russian scholars in academic publications from their author names.

## Model Details

- **Developed by:** Gao Tianci
- **Model type:** `BertForSequenceClassification`
- **Languages:** Primarily English (for processing)

## Overview

**RussScholar-Seeker** uses a fine-tuned BERT classifier to analyze author names in academic papers and predict how likely each name is to be Russian. This supports broader studies of geographical diversity in academic contributions.

### Intended Use

- **Primary Use:** Identifying Russian names in scholarly articles.
- **User Guide:** A simple API for integration with academic databases or research platforms.

### Model Architecture

The model is built on the BERT architecture and fine-tuned for sequence classification, predicting nationality (Russian or non-Russian) from names.

### How It Works

1. **Input:** A list of author names from academic papers.
2. **Processing:** Names are tokenized and passed through the BERT model.
3. **Output:** Each name is classified as Russian or non-Russian based on the model's confidence.

## Model Performance

The model showed high accuracy and precision on a diverse dataset of names, suggesting it should perform reliably across various academic disciplines.

### Key Metrics

- **Accuracy:** 92%
- **Precision:** 90%
- **Recall:** 91%

These metrics were obtained on a standardized validation set that covers a wide range of name origins.

## Ethical Considerations

This model is intended purely for academic and research purposes. Use it responsibly and consider its broader social implications, such as privacy concerns and the potential to reinforce stereotypes.

### Limitations

The model's performance may degrade on names that do not match the characteristics of the training dataset, particularly non-Cyrillic names.
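Because performance may degrade on names unlike the training data, it can help to re-check the Key Metrics on a labeled sample from your own corpus. The stdlib-only sketch below shows one way to do that: it turns two-class logits into Russian/non-Russian labels (step 3 of How It Works) and then computes accuracy, precision, and recall. It assumes that label index 1 means "Russian" and that a 0.5 threshold is used; neither is stated in this card, so check the model's `config.id2label` before relying on it. The helper names (`softmax`, `logits_to_labels`, `binary_metrics`) are hypothetical, and the numbers in the example are illustrative, not model outputs.

```python
import math
from typing import Dict, List, Sequence

# Assumption: index 1 of the two-class output is "Russian".
# Verify against the model's config.id2label before use.
RUSSIAN_INDEX = 1


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_labels(rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Return 1 (Russian) when P(Russian) >= threshold, else 0 (non-Russian)."""
    return [1 if softmax(row)[RUSSIAN_INDEX] >= threshold else 0 for row in rows]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall with 1 (Russian) as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Illustrative two-class logits for four names (not real model outputs).
    logits = [[-1.0, 2.0], [1.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]
    gold = [1, 0, 1, 0]  # hypothetical gold labels
    predicted = logits_to_labels(logits)
    print(predicted)  # [1, 0, 0, 1]
    print(binary_metrics(gold, predicted))  # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5}
```

In practice, each row would come from the `logits` field of the model's output for one name. Raising the threshold generally trades recall for precision, and lowering it does the reverse.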
## Getting Started

To use **RussScholar-Seeker**, install the necessary dependencies and download the model from the GitHub repository listed under Additional Resources.

```bash
pip install transformers torch requests beautifulsoup4
```

## Training and Evaluation Data

**The model was trained on a curated dataset** of over 18,000 names labeled as Russian or non-Russian, sourced from public academic records and publications.

## Training Procedure

Training involved **several rounds of tuning** to optimize both speed and accuracy, combining conventional tuning techniques with transfer learning from pre-trained BERT models.

## Usage and Deployment

The model is well suited for **integration into academic platforms**, for example to run real-time analysis during paper submission and improve metadata quality and research analytics.

## Additional Resources

For further implementation and integration details, refer to the **full documentation** in the [GitHub-TianciGao](https://github.com/TianciGao/RussScholar-Seeker) repository.

## Citing RussScholar-Seeker

If you find this model useful in your research, please consider citing it:

```bibtex
@misc{russcholarseeker2024,
  title={RussScholar-Seeker: A Tool for Identifying Russian Scholars in Academic Publications},
  author={Gao, Tianci},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/TianciGao/RussScholar-Seeker}}
}
```