---
library_name: transformers
license: mit
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

A LoRA adapter for `mlsquare/pico_seshu_test`, applied to the `model.layers.3.dt_proj` module. It follows the standard PEFT workflow on a Mamba-hf model.
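
To illustrate what a LoRA adapter on a single projection does, here is a minimal pure-Python sketch of the low-rank update `y = W x + (alpha / r) * B (A x)`. The dimensions, `alpha`, and the helper names `matvec` and `lora_forward` are illustrative assumptions, not values taken from this adapter; in practice the PEFT library builds and trains these matrices for the targeted module.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank LoRA update."""
    base = matvec(W, x)               # frozen weights, length d_out
    update = matvec(B, matvec(A, x))  # B (A x): rank-r correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Hypothetical sizes; the real dt_proj dimensions are not given in this card.
d_in, d_out, r, alpha = 4, 6, 2, 8
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

x = [1.0, -2.0, 0.5, 3.0]
y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)
```

Because `B` starts at zero, the adapted output equals the base output before any training; only `A` and `B` would be updated, which is what keeps the adapter small.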

## Model Details

### Model Description

- **Developed by:** MLsquare
- **Model type:** Next-character generation
- **Language(s) (NLP):** All languages in the `ai4bharat/samanantar` dataset
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

## Uses

Refer to the GitHub repository for more information.

### Direct Use

Refer to the GitHub repository for more information.

## How to Get Started with the Model

Refer to the GitHub repository: https://github.com/mlsquare/fedem

## Training Details

### Training Data

Individual source and target sentences from the AI4Bharat Samanantar dataset. Sentences in all 11 languages, together with their translations, were stacked into a single corpus and used for the next-character generation task.
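
The stacking step can be pictured as flattening every (source, target) pair into one list of individual sentences. The sketch below works on a few in-memory pairs; the `pairs` data and the `stack_sentences` helper are hypothetical and stand in for the real dataset loading, which this card does not describe.

```python
def stack_sentences(pairs):
    """Flatten (source, target) pairs into one list of single sentences."""
    stacked = []
    for source, target in pairs:
        stacked.append(source)
        stacked.append(target)
    return stacked

# Hypothetical examples; real rows come from the Samanantar parallel corpus.
pairs = [
    ("How are you?", "आप कैसे हैं?"),
    ("Thank you.", "धन्यवाद।"),
]
corpus = stack_sentences(pairs)
```

Each entry of `corpus` is then treated as an independent training sequence, with no link kept between a sentence and its translation.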

### Training Procedure

The model was trained on the next-character generation task using a cross-entropy loss.
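
As a rough illustration of that objective, the sketch below computes a mean cross-entropy in plain Python, assuming the usual shift-by-one setup in which the logits at position `t` are scored against the token at position `t + 1`. The function name and the toy inputs are assumptions for illustration; the card does not show the actual training loop.

```python
import math

def next_token_cross_entropy(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t."""
    losses = []
    # Position t predicts token t+1, so the last position has no target.
    for step_logits, target in zip(logits[:-1], token_ids[1:]):
        max_logit = max(step_logits)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in step_logits)
        )
        losses.append(log_norm - step_logits[target])
    return sum(losses) / len(losses)

# Toy example: uniform logits over a 4-symbol vocabulary.
vocab_size = 4
token_ids = [2, 0, 3]
uniform_logits = [[0.0] * vocab_size for _ in token_ids]
loss = next_token_cross_entropy(uniform_logits, token_ids)
```

With uniform logits the loss equals the log of the vocabulary size, which is a convenient sanity check for an untrained model.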

#### Preprocessing

Text was converted to raw UTF-8 characters (byte-level tokens) before training, using the ByT5-large tokenizer.
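
The byte-level encoding can be sketched without any library. This version assumes the ByT5 convention of reserving ids 0 to 2 for the padding, end-of-sequence, and unknown tokens, so each UTF-8 byte is shifted by 3 and an end-of-sequence id of 1 is appended. The helper names are hypothetical, and a real pipeline would call the tokenizer from `transformers` instead.

```python
def byte_encode(text, offset=3, eos_id=1):
    """Map text to byte-level ids: each UTF-8 byte plus an offset, then EOS."""
    return [byte + offset for byte in text.encode("utf-8")] + [eos_id]

def byte_decode(ids, offset=3):
    """Invert byte_encode, skipping ids outside the byte range (special tokens)."""
    raw = bytes(i - offset for i in ids if offset <= i < 256 + offset)
    return raw.decode("utf-8", errors="ignore")

ids = byte_encode("hi")
round_trip = byte_decode(byte_encode("नमस्ते"))
```

Non-ASCII scripts take several ids per character, since a single Devanagari letter occupies three UTF-8 bytes; sequence lengths for Indic text are therefore noticeably longer than the character count.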

#### Training Hyperparameters

- **Training regime:**
  - `output_dir="mamba"`
  - `per_device_train_batch_size=1`
  - `per_device_eval_batch_size=1`
  - `num_train_epochs=4`
  - `weight_decay=0.1`
  - `lr_scheduler_type="cosine"`
  - `learning_rate=5e-4`
  - `fp16=False`
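
To give a feel for what `lr_scheduler_type="cosine"` with `learning_rate=5e-4` implies, the sketch below computes a cosine decay from the peak rate down to zero. The `cosine_lr` helper, the zero warmup, and the `total_steps` value are assumptions for illustration; the actual step count depends on the dataset size and the 4 epochs.

```python
import math

def cosine_lr(step, total_steps, base_lr=5e-4, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # hypothetical; not stated in this card
schedule = [cosine_lr(s, total_steps) for s in (0, 500, 1000)]
```

Under these assumptions the rate starts at the full 5e-4, passes through half of that at the midpoint, and reaches zero on the final step.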

## Evaluation

Only a simple cross-entropy loss was used, to verify that the pipeline and the model work.

## Model Card Contact

MLsquare