IsmatS committed 58bdc7c (verified) · 1 parent: 7a06de8

Upload folder using huggingface_hub

Files changed (1):
1. README.md +26 -0
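**README.md** changed (hunk `@@ -14,6 +14,32 @@ library_name: transformers`). The commit inserts a **File Structure** section between the repository links and **Project Overview**: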
**Repository on Hugging Face**: [IsmatS/xlm_roberta_large_az_ner](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)

**Repository on GitHub**: [Named Entity Recognition](https://github.com/Ismat-Samadov/Named_Entity_Recognition)

## File Structure

```plaintext
.
├── README.md                # Documentation for the project
├── config.json              # Model configuration (architecture and labels)
├── model-001.safetensors    # Model weights in Safetensors format for safe deployment
├── sentencepiece.bpe.model  # SentencePiece model for tokenization
├── special_tokens_map.json  # Map of special tokens (e.g., <pad>, <s>, </s>)
├── tokenizer.json           # Serialized fast tokenizer
├── tokenizer_config.json    # Additional tokenizer configuration
├── xlm_roberta_large.ipynb  # Jupyter notebook for training and experimentation
└── xlm_roberta_large.py     # Python script for training and evaluation
```

**Explanation**:
- **README.md**: Documents the project, including setup, usage, and evaluation.
- **config.json**: Stores the model configuration that `transformers` reads at load time, such as architecture hyperparameters and the label mapping.
- **model-001.safetensors**: Contains the model weights in the Safetensors format. Unlike pickle-based checkpoints, this format loads without executing arbitrary code.
- **sentencepiece.bpe.model**: SentencePiece model that segments text into the subword units used by `xlm-roberta-large`.
- **special_tokens_map.json**: Maps the tokenizer's special-token roles to their strings. For XLM-RoBERTa, this includes `<pad>` for padding and `<s>`/`</s>` for sequence start and end.
- **tokenizer.json**: Serialized fast tokenizer (vocabulary plus normalization and pre-tokenization rules), loaded through the Hugging Face `tokenizers` library.
- **tokenizer_config.json**: Additional tokenizer settings, such as the tokenizer class and maximum sequence length.
- **xlm_roberta_large.ipynb**: Jupyter notebook for experimenting with and training the model.
- **xlm_roberta_large.py**: Python script for training and running evaluations outside Jupyter.

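
The following minimal sketch checks a local copy of the repository against the tree above. It assumes the configuration, weight and tokenizer files are needed at inference time and treats the README, notebook and training script as optional. That split is an assumption, not something the project states. The helper names `missing_files`, `missing_in_folder` and `INFERENCE_FILES` are illustrative.

```python
from pathlib import Path
from typing import Iterable, List

# File names copied from the File Structure tree. Treating exactly these as the
# files needed to run the model is an ASSUMPTION of this sketch; README.md, the
# notebook, and the training script are considered optional here.
INFERENCE_FILES = (
    "config.json",              # model configuration
    "model-001.safetensors",    # model weights
    "sentencepiece.bpe.model",  # SentencePiece subword model
    "special_tokens_map.json",  # special-token roles and strings
    "tokenizer.json",           # serialized fast tokenizer
    "tokenizer_config.json",    # tokenizer settings
)


def missing_files(present: Iterable[str]) -> List[str]:
    """Return the expected inference files absent from `present`, sorted by name."""
    present_set = set(present)
    return sorted(name for name in INFERENCE_FILES if name not in present_set)


def missing_in_folder(folder: str) -> List[str]:
    """List expected files missing from a local snapshot folder (hypothetical path)."""
    root = Path(folder)
    return missing_files(p.name for p in root.iterdir() if p.is_file())
```

For example, `missing_in_folder("./xlm_roberta_large_az_ner")` returns an empty list when every listed file is present. The folder name is illustrative. The comparison logic is kept in a pure function so it can be checked without touching the file system.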
## Project Overview

This project fine-tunes `xlm-roberta-large`, a multilingual transformer model, for Azerbaijani Named Entity Recognition (NER). The model is trained on a dataset built specifically for Azerbaijani and tags named entities such as persons, organizations, and dates.
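
The following minimal inference sketch assumes the Hub repository loads with the standard `transformers` token-classification pipeline, which the model card metadata (`library_name: transformers`) suggests. The entity label names come from the model's `config.json` and are not listed in this README. The names `build_ner_pipeline`, `summarize_entities` and `tag`, and the 0.5 score threshold, are illustrative choices, not part of the project.

```python
from functools import lru_cache
from typing import Dict, List

MODEL_ID = "IsmatS/xlm_roberta_large_az_ner"  # Hugging Face repository linked above


@lru_cache(maxsize=1)
def build_ner_pipeline(model_id: str = MODEL_ID):
    """Load the fine-tuned model as a token-classification pipeline (cached)."""
    # Imported lazily so summarize_entities() can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges consecutive tokens belonging to the same
    # entity into one span, reported under an "entity_group" key.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def summarize_entities(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Hypothetical helper: format aggregated pipeline output as 'LABEL: text' lines,
    dropping spans whose confidence score is below min_score."""
    return [
        f"{ent['entity_group']}: {ent['word'].strip()}"
        for ent in entities
        if ent["score"] >= min_score
    ]


def tag(text: str, min_score: float = 0.5) -> List[str]:
    """Run the model on `text`; downloads the weights on the first call."""
    ner = build_ner_pipeline()
    return summarize_entities(ner(text), min_score)
```

For example, `tag("Nizami Gəncəvi Gəncədə anadan olub.")` ("Nizami Ganjavi was born in Ganja.") downloads the model on first use. It returns one `LABEL: text` line per entity above the threshold. The output formatting lives in a separate pure function so it can be tested without downloading the model.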