How to use Ranjit/moshiko-kame-hinglish-ft-exp with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Ranjit/moshiko-kame-hinglish-ft-exp", dtype="auto")
```
Model Card
A multimodal text + audio model currently in training. This card documents the in-progress run; metrics and details will be updated when training completes.
Status: 🟢 Training in progress, ~38% complete (Epoch 2 of 3, step ~12,510 of ~32,500). No instability observed.
Model Details
- Modalities: Joint text and audio
- Objective: Combined text + audio loss
- Status: Mid-training checkpoint (not final)
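
As a rough illustration of the combined objective, the sketch below assumes the total loss is the unweighted sum of the text and audio terms. This card does not state that; it is inferred from the Loss Curve table, where the Total column matches Text + Audio to within rounding. The function name `combined_loss` is hypothetical and is not taken from the training code.

```python
def combined_loss(text_loss: float, audio_loss: float) -> float:
    """Assumed unweighted sum of the two loss terms (not confirmed by the training code)."""
    return text_loss + audio_loss


# (text, audio, reported total) rows copied from the Loss Curve table in this card.
rows = [
    (2.12, 2.53, 4.65),
    (1.16, 1.61, 2.77),
    (1.13, 1.51, 2.65),
    (1.02, 1.50, 2.52),
    (0.93, 1.41, 2.34),
    (1.04, 1.42, 2.47),
    (0.84, 1.30, 2.14),
]

# Largest gap between the reported total and text + audio across all rows.
max_gap = max(abs(combined_loss(t, a) - total) for t, a, total in rows)
print(f"max |total - (text + audio)| = {max_gap:.3f}")
```

The gap stays at the level of two-decimal rounding, which is consistent with an unweighted sum but does not rule out other weightings applied before logging.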
Training Procedure
Configuration
Experimental setup (self-exploratory): https://github.com/Ranjit246/duplex-model-exp/tree/hinglish-indic-adaptation-kame-moshi
| Setting | Value |
|---|---|
| Epochs | 3 |
| Examples per epoch | 86,671 |
| Micro-batch size | 1 |
| Gradient accumulation | 8 |
| Effective batch size | 8 |
| Steps per epoch | ~10,834 |
| Total planned steps | ~32,500 |
| Learning rate | 3e-5 |
| LR schedule | WarmupLR (linear warmup to 3e-5 by ~step 110, held flat, no decay) |
| Checkpoint interval | Every 500 steps |
| Throughput | ~10 sec/step (+ ~5–6 min checkpoint stall per 500 steps) |
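
The derived rows in the table follow from the raw settings. The sketch below reproduces that arithmetic and models the learning-rate schedule as described (linear warmup, then flat). It makes two assumptions that the card does not confirm: the final partial batch of each epoch still counts as a step, and warmup starts from a learning rate of 0. The function name `warmup_then_constant_lr` is hypothetical and is not the scheduler implementation used in the run.

```python
import math

# Raw settings from the configuration table.
examples_per_epoch = 86_671
micro_batch_size = 1
grad_accum_steps = 8
epochs = 3

# One optimizer step consumes micro_batch_size * grad_accum_steps examples.
effective_batch_size = micro_batch_size * grad_accum_steps

# Assumption: the last partial batch of an epoch still counts as a step.
steps_per_epoch = math.ceil(examples_per_epoch / effective_batch_size)
total_steps = steps_per_epoch * epochs


def warmup_then_constant_lr(step: int, peak_lr: float = 3e-5, warmup_steps: int = 110) -> float:
    """Linear warmup to peak_lr, then held flat with no decay.

    Assumption: warmup starts at 0; the run's actual starting value is not given.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


print(effective_batch_size, steps_per_epoch, total_steps)
print(warmup_then_constant_lr(55), warmup_then_constant_lr(12_510))
```

Under these assumptions the computed step counts land on the table's approximate values of ~10,834 steps per epoch and ~32,500 total steps.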
Run Timeline
- Started: 2026-05-23 20:37
- Last logged: 2026-06-01 04:03 (~8.3 days elapsed, still running)
- Estimated remaining: ~20,000 steps, on the order of a couple more days
- Checkpoints retained: `step_12000`, `step_12500` (older checkpoints rotated out)
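
The remaining-time estimate can be sanity-checked from the throughput figures in the configuration table. The sketch below is a back-of-the-envelope calculation, not the method used to produce the estimate above. It assumes the ~10 sec/step rate holds for the rest of the run and uses 5.5 minutes as the midpoint of the reported 5–6 minute checkpoint stall.

```python
# Figures taken from this card; all are approximate.
total_steps = 32_500
current_step = 12_510
sec_per_step = 10.0
checkpoint_interval = 500
stall_min_per_checkpoint = 5.5  # assumption: midpoint of the reported 5-6 min stall

remaining_steps = total_steps - current_step

# Checkpoints still to be written between the current step and the end of training.
remaining_checkpoints = total_steps // checkpoint_interval - current_step // checkpoint_interval

compute_hours = remaining_steps * sec_per_step / 3600
stall_hours = remaining_checkpoints * stall_min_per_checkpoint / 60
eta_days = (compute_hours + stall_hours) / 24

print(f"{remaining_steps} steps, {remaining_checkpoints} checkpoints, ~{eta_days:.1f} days remaining")
```

With these inputs the estimate comes out at roughly two and a half days, in line with "a couple more days". Any pauses or restarts would push the real figure higher.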
Loss Curve
Loss is averaged per 1,000 steps. The large initial drop occurs during warmup; thereafter both losses decline gradually, with text falling faster than audio. Audio loss is the harder signal and is plateauing around 1.4–1.5.
| Step range | Total | Text | Audio |
|---|---|---|---|
| 0–999 | 4.65 | 2.12 | 2.53 |
| 2k–3k | 2.77 | 1.16 | 1.61 |
| 5k–6k | 2.65 | 1.13 | 1.51 |
| 8k–9k | 2.52 | 1.02 | 1.50 |
| 9k–10k | 2.34 | 0.93 | 1.41 |
| 11k–12k | 2.47 | 1.04 | 1.42 |
| 12k+ | 2.14 | 0.84 | 1.30 |
Notes on loss: Per-step loss is noisy (individual steps swing from ~0.17 to ~4.4), which is expected with a micro-batch size of 1 and gradient accumulation to an effective batch size of only 8. The binned per-1,000-step averages are the meaningful view of the trend.
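
The binning described above can be sketched as follows. This is a minimal illustration, not the script used to build the table: the helper `binned_mean_loss` and the `(step, loss)` input format are hypothetical, and the demo values are synthetic rather than taken from the run's logs.

```python
from collections import defaultdict


def binned_mean_loss(step_losses, bin_size=1000):
    """Average per-step losses into fixed-width step bins.

    step_losses: iterable of (step, loss) pairs.
    Returns a dict mapping each bin's starting step to the mean loss in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for step, loss in step_losses:
        bin_index = step // bin_size
        sums[bin_index] += loss
        counts[bin_index] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}


# Synthetic demo values (not from the actual run).
demo = [(0, 4.0), (1, 2.0), (999, 3.0), (1000, 1.0), (1500, 2.0)]
print(binned_mean_loss(demo))
```

Averaging over 1,000 steps smooths out the step-to-step swings, so a change between bins reflects the trend rather than single-batch noise.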
Stability
No NaN, no OOM, no exceptions, and no tracebacks across the full run. Training is progressing normally with loss still trending down.
Intended Use
This is an intermediate training artifact. The final model and evaluation results are not yet available. Use mid-training checkpoints only for monitoring or experimentation, not for production.
Limitations
- Training is not complete; performance will continue to change.
- No formal evaluation has been run yet.
- Audio loss is plateauing higher than text loss, reflecting the greater difficulty of the audio signal.