Ankahi (अनकही): AAC Voice for Indian Children with Cerebral Palsy

"The unspoken": giving a voice to 2.5 million Indian children with CP

Hackathon: Gemma 4 Good (Kaggle × Google DeepMind)
Tracks: Digital Equity (primary) · Health & Sciences · Google AI Edge · Unsloth Special Mention
License: Apache 2.0


What is Ankahi?

Ankahi is an offline, per-child-personalised AAC (Augmentative & Alternative Communication) app for Indian children with cerebral palsy. It runs Gemma 4 E4B fully on-device on a ~₹10,000 Android tablet.

A child taps pictograms, makes sounds, or points the camera at objects. Ankahi predicts the full sentence they intended, then speaks it in the parent's own voice in Hindi, Punjabi, Bengali, Tamil, Telugu, Marathi, or English, with natural code-switching.

The critical differentiator is per-child LoRA adapters (30 MB each) that learn each child's vocabulary, syntax, and language preferences over time. The adapters are trained on your H100 and then deployed to the tablet.
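A rough size estimate shows why a LoRA adapter can stay in the tens of megabytes. For each adapted weight matrix, LoRA stores two low-rank factors of shapes (d_in × r) and (r × d_out), so it adds r·(d_in + d_out) parameters. The sketch below is illustrative only: the hidden size, feed-forward size, layer count, choice of adapted matrices, and 16-bit storage are all assumptions, not the actual Gemma 4 E4B dimensions or Ankahi's adapter layout.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)


def adapter_size_mb(shapes, rank: int, n_layers: int, bytes_per_param: int = 2) -> float:
    """Approximate adapter size in MB (10**6 bytes) for n_layers identical layers."""
    per_layer = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in shapes)
    return per_layer * n_layers * bytes_per_param / 1e6


# HYPOTHETICAL dimensions, chosen only to illustrate the arithmetic.
HIDDEN, FFN, LAYERS = 2048, 8192, 32

# Assumed adapted matrices per layer: four attention projections
# plus three feed-forward projections.
shapes = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, FFN), (HIDDEN, FFN), (FFN, HIDDEN)]

size_r8 = adapter_size_mb(shapes, rank=8, n_layers=LAYERS)
size_r16 = adapter_size_mb(shapes, rank=16, n_layers=LAYERS)
print(f"rank 8: {size_r8:.1f} MB, rank 16: {size_r16:.1f} MB")
```

Adapter size grows linearly with rank, so the rank is the main knob for trading adapter capacity against storage on the tablet.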


Why this matters

  • ~2.5 million Indian children have cerebral palsy
  • Commercial AAC devices cost ₹20,000 to ₹2,00,000 and speak no Indian language naturally
  • Most of these children go through life without ever communicating a full sentence
  • Ankahi targets a ~₹10,000 tablet, about half the price of the cheapest commercial device and one-twentieth of the most expensive, and runs 100% offline

Quick start (H100 training)

git clone https://github.com/your-org/ankahi
cd ankahi
bash scripts/00_env_setup.sh

# Sanity check first; never skip this
python scripts/07_overfit_sanity_check.py

# Then data
python scripts/01_download_arasaac.py
python scripts/02_download_mulberry.py
python scripts/05_generate_personas.py
python scripts/04_generate_synth_dialogues.py
python scripts/06_build_eval_sets.py

# Training
python src/ankahi/training/stage1_base.py --config configs/train_stage1_base.yaml
python src/ankahi/training/stage2_persona.py --persona ananya --config configs/train_stage2_persona_ananya.yaml
# ... repeat for all 5 personas
python src/ankahi/training/stage3_audio.py --config configs/train_stage3_audio.yaml
python src/ankahi/training/stage4_safety.py --config configs/train_stage4_safety.yaml

# Deploy
python src/ankahi/deploy/merge_and_quantize.py
python src/ankahi/deploy/convert_to_litertlm.py
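The "repeat for all 5 personas" step can be scripted. The sketch below builds the Stage 2 command for each persona and prints it in dry-run mode. It assumes the four remaining config files follow the same naming pattern as the `ananya` example (`configs/train_stage2_persona_<name>.yaml`) and that persona names are the lowercased names from the persona table; neither is confirmed by the commands above.

```python
import shlex
import subprocess

# Persona names from the "5 Personas trained" table, lowercased to match the
# `ananya` example. ASSUMPTION: the other four configs use the same naming.
PERSONAS = ["ananya", "arjun", "priya", "rohan", "zara"]


def stage2_command(persona: str) -> list[str]:
    """Build the Stage 2 command line for one persona."""
    return [
        "python", "src/ankahi/training/stage2_persona.py",
        "--persona", persona,
        "--config", f"configs/train_stage2_persona_{persona}.yaml",
    ]


def run_all(dry_run: bool = True) -> list[list[str]]:
    """Print, and optionally execute, the Stage 2 command for every persona."""
    commands = [stage2_command(p) for p in PERSONAS]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


commands = run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root to actually launch the five training runs one after another.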

System architecture

Child input (pictogram tap / camera / mic)
        ↓
Gemma 4 E4B INT8 + per-child LoRA adapter (30MB)
        ↓
Full sentence prediction (multilingual, code-switching aware)
        ↓
Parent-voice TTS (AI4Bharat / svara-TTS, zero-shot cloned)
        ↓
Speaker output + large-text screen display

Everything runs on the tablet. Zero bytes to the cloud.
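The flow above can be sketched as a small pipeline that selects the child's adapter and the parent's voice for each turn. This is a Python illustration only, not the Flutter app's code: `ChildProfile`, `AnkahiPipeline`, the file paths, and the two stub functions are hypothetical names introduced here, and the stubs stand in for the on-device model and TTS engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChildProfile:
    child_id: str
    adapter_path: str      # per-child LoRA adapter file (hypothetical path)
    voice_model_path: str  # cloned parent voice (hypothetical path)


@dataclass
class AnkahiPipeline:
    # Both callables are placeholders for the real on-device components.
    predict_sentence: Callable[[list[str], str], str]  # (inputs, adapter) -> sentence
    synthesize: Callable[[str, str], bytes]            # (sentence, voice) -> audio

    def respond(self, profile: ChildProfile, taps: list[str]) -> tuple[str, bytes]:
        """Run one turn: child input -> predicted sentence -> parent-voice audio."""
        sentence = self.predict_sentence(taps, profile.adapter_path)
        audio = self.synthesize(sentence, profile.voice_model_path)
        return sentence, audio  # the caller shows the text and plays the audio


# Stub components so the sketch runs without a model or a TTS engine.
def fake_predict(taps: list[str], adapter_path: str) -> str:
    return " ".join(taps)


def fake_tts(sentence: str, voice_model_path: str) -> bytes:
    return sentence.encode("utf-8")


pipeline = AnkahiPipeline(predict_sentence=fake_predict, synthesize=fake_tts)
profile = ChildProfile("ananya", "adapters/ananya.lora", "voices/ananya_parent.bin")
sentence, audio = pipeline.respond(profile, ["I", "want", "water"])
```

Keeping the adapter and voice paths on the profile rather than in the pipeline means one shared base model can serve several children on the same tablet by swapping profiles.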


5 Personas trained

| Name | Age | Languages | Disability profile | City |
|---|---|---|---|---|
| Ananya | 6 | Tamil + English | Spastic quadriplegia | Chennai |
| Arjun | 9 | Punjabi + Hindi + English | Dyskinetic CP + mild ID | Ludhiana |
| Priya | 4 | Bengali + English | CP + CVI | Kolkata |
| Rohan | 11 | Hindi + English | Athetoid CP | Delhi |
| Zara | 7 | Marathi + English | Spastic CP | Pune |

Repository layout

ankahi/
├── scripts/          Data download, env setup, sanity checks
├── configs/          YAML configs for each training stage
├── src/ankahi/       Core Python package
│   ├── data/         Collators, prompts, persona, augmentation
│   ├── training/     Stage 1–4 training scripts
│   ├── eval/         Metrics, specificity, latency evaluation
│   ├── deploy/       Merge, quantise, convert to .litertlm
│   └── tts/          Parent-voice cloning + synthesis
├── mobile/           Flutter Android app
├── notebooks/        Colab/Jupyter notebooks for each stage
├── writeup/          Technical report (markdown → PDF)
├── demo/             Video script and shot list
└── tests/            Unit tests

H100 compute budget

| Stage | Hours | Output |
|---|---|---|
| Stage 0: Sanity check | 0.5h | Go/no-go |
| Stage 1: Base multimodal SFT | 12h | base adapter (rank 16) |
| Stage 2: 5× persona LoRA (rank 8) | 10h | persona adapters |
| Stage 3: Audio/dysarthric adapter | 10h | audio adapter |
| Stage 4: Safety tuning | 4h | safety-merged base |
| Stage 5: Merge + quantise + convert | 4h | ankahi.litertlm (2.5GB) |
| Stage 6: TTS voice cloning × 3 | 3h | voice models |
| Buffer | 10h | - |
| Total | ~53h | |
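As a quick check on the budget, the stage hours can be summed directly. The figures below are copied from the table; the rows add up to 53.5h, which the table rounds to ~53h.

```python
# Hours per stage, copied from the compute budget table.
BUDGET_HOURS = {
    "Stage 0: Sanity check": 0.5,
    "Stage 1: Base multimodal SFT": 12.0,
    "Stage 2: 5x persona LoRA (rank 8)": 10.0,
    "Stage 3: Audio/dysarthric adapter": 10.0,
    "Stage 4: Safety tuning": 4.0,
    "Stage 5: Merge + quantise + convert": 4.0,
    "Stage 6: TTS voice cloning x 3": 3.0,
    "Buffer": 10.0,
}

total = sum(BUDGET_HOURS.values())
planned = total - BUDGET_HOURS["Buffer"]       # hours committed to actual stages
buffer_share = BUDGET_HOURS["Buffer"] / total  # fraction held back for reruns

print(f"planned: {planned}h, total: {total}h, buffer: {buffer_share:.0%}")
# prints: planned: 43.5h, total: 53.5h, buffer: 19%
```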

Evaluation highlights

  • BLEU-4 + chrF++ on 500-sample held-out set per persona
  • 5×5 adapter-specificity heatmap, which tests whether personalisation works (each adapter scored on every persona's eval set)
  • Audio disambiguation accuracy (with vs. without audio adapter)
  • On-device benchmarks across 3 phone tiers (flagship / mid / budget)
  • Ablations: rank, vision FT on/off, data scaling curve
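One way to build the adapter-specificity matrix is to score every adapter's outputs against every persona's references and check that each persona's own adapter scores highest. The sketch below is an assumption about how such a check could look, not the repository's evaluation code. It uses a simplified character n-gram F1 as a stand-in for chrF++ (it is not the real metric), and the two-persona strings are toy placeholders, not evaluation data.

```python
from collections import Counter


def char_ngram_f1(hyp: str, ref: str, n: int = 3) -> float:
    """Simplified character n-gram F1 (a stand-in for chrF++, not the real metric)."""
    def grams(s: str) -> Counter:
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))

    h, r = grams(hyp), grams(ref)
    overlap = sum((h & r).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def specificity_matrix(outputs, references):
    """matrix[a][p]: mean score of adapter a's outputs on persona p's eval set.

    outputs[a][p] is the list of sentences adapter a produced for persona p's
    inputs; references[p] is the aligned list of gold sentences.
    """
    names = list(references)
    matrix = {}
    for a in names:
        matrix[a] = {}
        for p in names:
            pairs = zip(outputs[a][p], references[p])
            scores = [char_ngram_f1(hyp, ref) for hyp, ref in pairs]
            matrix[a][p] = sum(scores) / len(scores)
    return matrix


def diagonal_wins(matrix):
    """For each persona, True if its own adapter scores highest on its eval set."""
    return {p: max(matrix, key=lambda a: matrix[a][p]) == p for p in matrix}


# TOY placeholder data (two personas, one sentence each), for illustration only.
references = {"ananya": ["amma water please"], "rohan": ["mujhe paani chahiye"]}
outputs = {
    "ananya": {"ananya": ["amma water please"], "rohan": ["amma water please"]},
    "rohan": {"ananya": ["mujhe paani chahiye"], "rohan": ["mujhe paani chahiye"]},
}

matrix = specificity_matrix(outputs, references)
wins = diagonal_wins(matrix)
```

With five personas the same functions yield the 5×5 grid; a strong diagonal relative to the off-diagonal cells is the pattern that would indicate adapter-specific behaviour.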

Partners / data credits

  • ARASAAC (pictograms, CC-BY-NC-SA)
  • Mulberry Symbols (CC-BY-SA)
  • AI4Bharat (Indic-TTS, 13 languages)
  • TORGO + UA-Speech (dysarthric speech corpora)
  • Unsloth (Gemma 4 E4B fine-tuning kernels)

Potential clinical partners: Ummeed Child Development Center, CP Guild India, SPASTN


Ethical statement

Ankahi is a communication aid and does not make medical diagnoses. No data leaves the device, and the app stores only what the family explicitly provides. All demo footage of children was recorded with full written parental consent, or uses adult collaborators in their place with clear disclosure.
