IsmatS/xlm_roberta_large_az_ner

IsmatS committed on Nov 5, 2024

Commit 58bdc7c · verified · 1 Parent(s): 7a06de8

Upload folder using huggingface_hub

Files changed (1)

README.md +26 -0

README.md CHANGED

@@ -14,6 +14,32 @@ library_name: transformers
 **Repository on Hugging Face**: [IsmatS/xlm_roberta_large_az_ner](https://huggingface.co/IsmatS/xlm_roberta_large_az_ner)
 **Repository on GitHub**: [Named Entity Recognition](https://github.com/Ismat-Samadov/Named_Entity_Recognition)
+## File Structure
+```plaintext
+.
+├── README.md                   # Documentation for the project
+├── config.json                 # Model configuration (architecture hyperparameters, label mappings)
+├── model-001.safetensors       # Model weights in Safetensors format for safe deployment
+├── sentencepiece.bpe.model     # SentencePiece model for tokenization
+├── special_tokens_map.json     # Map for special tokens (e.g., <pad>, <s>)
+├── tokenizer.json              # Serialized fast tokenizer (vocabulary and processing rules)
+├── tokenizer_config.json       # Additional tokenizer configurations
+├── xlm_roberta_large.ipynb     # Jupyter Notebook for training and experimentation
+└── xlm_roberta_large.py        # Python script for training and evaluation
+```
+**Explanation**:
+- **README.md**: Provides detailed information on the project, including setup, usage, and evaluation.
+- **config.json**: Stores the model configuration read by `transformers`, such as architecture hyperparameters and, typically, the entity label mappings.
+- **model-001.safetensors**: Contains model weights in a secure, efficient format.
+- **sentencepiece.bpe.model**: Tokenization model used to segment sentences into subwords for `xlm-roberta-large`.
+- **special_tokens_map.json**: Maps special tokens required by the tokenizer (e.g., `<pad>` for padding, `<s>` for the start of a sequence).
+- **tokenizer.json**: Contains the serialized fast tokenizer, including its vocabulary and processing pipeline.
+- **tokenizer_config.json**: Additional configuration settings for the tokenizer.
+- **xlm_roberta_large.ipynb**: A Jupyter notebook for experimenting with and training the model.
+- **xlm_roberta_large.py**: Python script for training and running evaluations outside of Jupyter.
 ## Project Overview
 This project leverages `xlm-roberta-large`, a multilingual transformer model, fine-tuned for Azerbaijani Named Entity Recognition (NER). The model identifies various named entities, including persons, organizations, dates, etc., using a dataset specially designed for the Azerbaijani language.
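
A quick way to check a local copy of the repository against the File Structure listing added in this commit is sketched below. This is only an illustration: the file names are copied from the listing, while `EXPECTED_FILES` and `missing_files` are hypothetical names introduced here and are not part of the repository or of any library.

```python
from pathlib import Path

# File names copied from the "File Structure" listing in the README diff.
EXPECTED_FILES = [
    "README.md",
    "config.json",
    "model-001.safetensors",
    "sentencepiece.bpe.model",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "xlm_roberta_large.ipynb",
    "xlm_roberta_large.py",
]


def missing_files(repo_dir):
    """Return the listed file names that are absent from repo_dir.

    Hypothetical helper for illustration; it only checks that each file
    exists and does not validate the contents of any file.
    """
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files(".")` from the root of a cloned copy would return an empty list if every listed file is present, and otherwise the names that are missing, in the order of the listing.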