---
license: apache-2.0
pipeline_tag: text-classification
---
# RussScholar-Seeker Model Card

**RussScholar-Seeker** is an NLP tool for identifying Russian scholars in academic publications.

## Model Details

**Developed by:** Gao Tianci  
**Model type:** `BertForSequenceClassification`  
**Languages:** Primarily English (for processing)  


## Overview

**RussScholar-Seeker** analyzes author names in academic papers and predicts how likely each name is to be Russian. It is meant to support the broader study of geographical diversity in academic contributions.

### Intended Use

- **Primary Use:** Identifying Russian names in scholarly articles.
- **User Guide:** Simple API for integrating with academic databases or research platforms.

### Model Architecture

The model is built on the BERT architecture and fine-tuned for sequence classification: given a name, it predicts whether the name is Russian.

### How It Works

1. **Input:** List of author names from academic papers.
2. **Processing:** Names are tokenized and passed through the BERT model.
3. **Output:** Each name is classified as Russian or not, based on the model's confidence.
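
The three steps above can be sketched with the `transformers` library as follows. This is a minimal sketch, not the author's implementation: `MODEL_ID` is a hypothetical placeholder (this card does not state the repository id), and both the 0.5 threshold and the assumption that class index 1 means "Russian" should be checked against the model's `config.id2label`.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical placeholder: replace with the actual local path or Hub id.
MODEL_ID = "path/or/hub-id-of-RussScholar-Seeker"
# Assumption: class index 1 corresponds to "Russian" (verify via config.id2label).
RUSSIAN_INDEX = 1


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Step 3: map one row of logits to a label and the 'Russian' probability."""
    p_russian = softmax(logits)[RUSSIAN_INDEX]
    label = "Russian" if p_russian >= threshold else "Non-Russian"
    return label, p_russian


def classify_names(names: Sequence[str], model_id: str = MODEL_ID) -> List[Tuple[str, float]]:
    """Steps 1-3: tokenize a list of names, run the model, and label each name."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Step 2: tokenize the whole batch, padding names to a common length.
    batch = tokenizer(list(names), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [label_from_logits(row) for row in logits.tolist()]
```

Calling `classify_names(["Ivan Petrov", "John Smith"])` would return one `(label, probability)` pair per name once `MODEL_ID` points at real weights.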

## Model Performance

The model reaches high accuracy and precision on a diverse validation set of names drawn from various academic disciplines.

### Key Metrics

- **Accuracy:** 92%
- **Precision:** 90%
- **Recall:** 91%

These metrics were obtained using a standardized validation set that reflects a wide range of name origins.
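
For readers reproducing these figures on their own data, the three metrics can be computed from confusion-matrix counts as sketched below. The function name and the example counts are illustrative assumptions; they are not the counts behind the numbers reported above.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for the 'Russian' class.

    tp: Russian names predicted Russian      fp: non-Russian names predicted Russian
    fn: Russian names predicted non-Russian  tn: non-Russian names predicted non-Russian
    """
    total = tp + fp + fn + tn
    return {
        # Share of all names that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of "Russian" predictions that were actually Russian.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual Russian names that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative counts only, not results from this model.
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

Precision and recall are reported separately because a classifier that rarely predicts "Russian" can be precise while missing many Russian names.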

## Ethical Considerations

This model is intended purely for academic and research purposes. It is crucial to use this model responsibly and consider the broader social implications, such as privacy concerns and the potential for reinforcing stereotypes.

### Limitations

The model's performance may degrade on names that differ from those in the training data, particularly non-Cyrillic names.
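
Because script is named here as one factor, a caller may want to record which script an input name uses before interpreting a prediction. The following is a minimal sketch using only the Python standard library; `cyrillic_fraction` is a hypothetical helper and not part of the model's API.

```python
import unicodedata


def cyrillic_fraction(name: str) -> float:
    """Return the share of alphabetic characters in `name` that are Cyrillic.

    Returns 0.0 when the string contains no letters at all.
    """
    letters = [ch for ch in name if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode character names for Cyrillic letters begin with "CYRILLIC".
    cyrillic = [ch for ch in letters if unicodedata.name(ch, "").startswith("CYRILLIC")]
    return len(cyrillic) / len(letters)
```

A value of 1.0 means the name is written entirely in Cyrillic and 0.0 means it contains no Cyrillic letters; how to act on values in between is left to the caller.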

## Getting Started

To use **RussScholar-Seeker**, install the necessary dependencies and download the model from the provided links.

```bash
pip install transformers torch requests beautifulsoup4
```

## Training and Evaluation Data

**The model was trained on a curated dataset** of over 18,000 names labeled as Russian or Non-Russian, sourced from public academic records and publications.

## Training Procedure

Training involved **several rounds of tuning** to optimize both speed and accuracy, and relied on transfer learning from pre-trained BERT models.

## Usage and Deployment

The model is suited to **integration into academic platforms**, for example real-time analysis during paper submission to improve metadata quality and research analytics.

## Additional Resources

For further details on implementation and integration, refer to the **full documentation** in the [GitHub repository](https://github.com/TianciGao/RussScholar-Seeker).

## Citing RussScholar-Seeker

If you find this model useful in your research, please consider citing it:

```bibtex
@misc{russcholarseeker2024,
  title={RussScholar-Seeker: A Tool for Identifying Russian Scholars in Academic Publications},
  author={Gao, Tianci},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/TianciGao/RussScholar-Seeker}}
}
```