Upload folder using huggingface_hub
README.md
CHANGED
@@ -14,6 +14,32 @@ library_name: transformers
**Repository on Hugging Face**: [IsmatS/xlm_roberta_large_az_ner](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
**Repository on GitHub**: [Named Entity Recognition](https://github.com/Ismat-Samadov/Named_Entity_Recognition)

## File Structure
```plaintext
.
├── README.md                    # Documentation for the project
├── config.json                  # Model configuration file
├── model-001.safetensors        # Model weights in Safetensors format
├── sentencepiece.bpe.model      # SentencePiece model for tokenization
├── special_tokens_map.json      # Map for special tokens (e.g., <pad>, <s>)
├── tokenizer.json               # Serialized tokenizer definition
├── tokenizer_config.json        # Additional tokenizer configurations
├── xlm_roberta_large.ipynb      # Jupyter Notebook for training and experimentation
└── xlm_roberta_large.py         # Python script for training and evaluation
```

**Explanation**:
- **README.md**: Documents the project, including setup, usage, and evaluation.
- **config.json**: Stores the model configuration that `transformers` reads when loading the model, such as architecture hyperparameters.
- **model-001.safetensors**: Contains the model weights in the Safetensors format, which loads without executing pickled code.
- **sentencepiece.bpe.model**: SentencePiece model that segments text into subwords for `xlm-roberta-large`.
- **special_tokens_map.json**: Maps the special tokens required by the tokenizer (e.g., `<pad>` for padding).
- **tokenizer.json**: Contains the serialized tokenizer definition (vocabulary and processing steps).
- **tokenizer_config.json**: Additional configuration settings for the tokenizer.
- **xlm_roberta_large.ipynb**: A Jupyter notebook for experimenting with and training the model.
- **xlm_roberta_large.py**: Python script for training and running evaluations outside of Jupyter.
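
As a rough illustration of how the files listed above fit together, the sketch below checks a local copy of the repository for the files a loader would typically need. Treating these six files as the required set is an assumption of this sketch, and `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names copied from the File Structure tree. Treating all of them as
# required for inference is an assumption of this sketch.
EXPECTED_FILES = [
    "config.json",
    "model-001.safetensors",
    "sentencepiece.bpe.model",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(repo_dir):
    """Return the expected file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location: point this at wherever the repository was cloned.
    print(missing_files("."))
```

An empty list means every expected file is present; any names returned point at an incomplete download or upload.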
## Project Overview

This project fine-tunes `xlm-roberta-large`, a multilingual transformer model, for Azerbaijani Named Entity Recognition (NER). The model identifies named entities such as persons, organizations, and dates, and is trained on a dataset built specifically for the Azerbaijani language.
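
To make the entity-identification step concrete, here is a minimal sketch of how per-token predictions can be merged into entity spans. It assumes the model emits BIO-style tags (`B-`, `I-`, `O`); the tagging scheme, the label names, the `group_entities` helper, and the sample sentence are illustrative assumptions, not taken from the project.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) pairs.

    Assumes tags look like "B-PERSON", "I-PERSON", or "O"; this scheme is
    an assumption of the sketch, not something the project states.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts. A stray "I-" tag is treated leniently
            # as the beginning of an entity.
            flush()
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the entity currently being collected.
            current_tokens.append(token)
        else:
            # "O": the token lies outside any entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


if __name__ == "__main__":
    # Made-up example sentence and labels, for illustration only.
    tokens = ["Aysel", "Mammadova", "joined", "Example", "Bank", "in", "2020"]
    tags = ["B-PERSON", "I-PERSON", "O",
            "B-ORGANISATION", "I-ORGANISATION", "O", "B-DATE"]
    print(group_entities(tokens, tags))
```

Grouping at the word level keeps the sketch independent of any particular tokenizer; with subword tokens, the pieces would first need to be mapped back to words.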