---
library_name: transformers
license: mit
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
language:
- en
pipeline_tag: text-generation
---
# Model Card for Model ID
A LoRA adapter for mlsquare/pico_seshu_test targeting the `model.layers.3.dt_proj` module, trained with standard PEFT on a Mamba-hf model.
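A minimal sketch of how such an adapter can be set up with PEFT is shown below. The base model id and target module come from this card; the LoRA rank, alpha, and dropout are illustrative assumptions, not the values used for this adapter.

```python
# Sketch only (not the exact training script): attach a LoRA adapter to the
# "model.layers.3.dt_proj" module of the base model using PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "mlsquare/pico_seshu_test", trust_remote_code=True
)

lora_config = LoraConfig(
    r=8,               # assumed rank
    lora_alpha=16,     # assumed scaling factor
    lora_dropout=0.0,  # assumed
    target_modules=["model.layers.3.dt_proj"],  # module named in this card
    bias="none",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```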
## Model Details
### Model Description
- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All languages in the ai4bharat/samanantar dataset
- **License:** MIT
### Model Sources
- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752
## Uses
Refer to the GitHub repository for more information.
### Direct Use
Refer to the GitHub repository for more information.
## How to Get Started with the Model
Refer to the GitHub repository: https://github.com/mlsquare/fedem
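As a minimal, hedged example of loading this adapter with PEFT (the adapter repo id below is a placeholder for this repository's id, and generation assumes the custom Mamba-hf model implements `generate`):

```python
# Sketch: load the base Mamba-hf model, attach this LoRA adapter, and
# generate a few characters. Replace the placeholder with this repo's id.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mlsquare/pico_seshu_test", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "<this-adapter-repo-id>")

# The card states the ByT5-large tokenizer was used for byte-level text.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```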
## Training Details
### Training Data
Individual source and target sentences from the AI4Bharat Samanantar dataset. Sentences from all 11 languages and their translations were stacked and used for the next-character-generation task.
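A sketch of the stacking step, assuming the dataset exposes source/target text columns (the split and column names here are assumptions, not the dataset's actual schema):

```python
# Illustrative only: concatenate source and target sentences into a single
# text stream for character-level language modelling.
from datasets import load_dataset

ds = load_dataset("mlsquare/CLIENT_samantar_mixed_train_val", split="train")

def stack(example):
    # "src" and "tgt" are assumed column names
    return {"text": example["src"] + "\n" + example["tgt"]}

ds = ds.map(stack)
```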
### Training Procedure
Trained on the next character generation task using cross-entropy loss.
#### Preprocessing
Text was converted to raw UTF-8 bytes before training using the ByT5-large tokenizer.
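For example, a byte-level preprocessing step along these lines (the `text` column and sequence length are assumptions, and `ds` is carried over from the stacking sketch above):

```python
# Sketch: the ByT5 tokenizer operates directly on UTF-8 bytes, so each
# character maps to one or more byte-level tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
```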
#### Training Hyperparameters
- **Training regime:** fp32 (`fp16=False`)
- **Output directory:** `mamba`
- **Per-device train batch size:** 1
- **Per-device eval batch size:** 1
- **Epochs:** 4
- **Weight decay:** 0.1
- **Learning-rate scheduler:** cosine
- **Learning rate:** 5e-4
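These values map directly onto `transformers.TrainingArguments`. The sketch below shows the corresponding `Trainer` setup; the model, tokenizer, and tokenized datasets are carried over from the sketches above and are assumptions, not the exact training script.

```python
# Sketch of the training setup under the hyperparameters listed above.
from transformers import (
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)

training_args = TrainingArguments(
    output_dir="mamba",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=False,
)

trainer = Trainer(
    model=model,                    # PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=tokenized_train,  # assumed tokenized train/val splits
    eval_dataset=tokenized_val,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```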
## Evaluation
A simple cross-entropy loss was used to verify the training pipeline and the basic functioning of the model.
## Model Card Contact
MLsquare