Ankahi (अनकही): AAC Voice for Indian Children with Cerebral Palsy
"The unspoken": giving a voice to 2.5 million Indian children with CP
Hackathon: Gemma 4 Good (Kaggle × Google DeepMind)
Tracks: Digital Equity (primary) · Health & Sciences · Google AI Edge · Unsloth Special Mention
License: Apache 2.0
What is Ankahi?
Ankahi is an offline, per-child-personalised AAC (Augmentative & Alternative Communication) app for Indian children with cerebral palsy. It runs Gemma 4 E4B fully on-device on a ~₹10,000 Android tablet.
A child taps pictograms, makes sounds, or points the camera at objects. Ankahi predicts the full sentence the child intended, then speaks it in the parent's own voice in Hindi, Punjabi, Bengali, Tamil, Telugu, Marathi, or English, with natural code-switching.
The critical differentiator: per-child LoRA adapters (30MB each) that learn each child's vocabulary, syntax, and language preferences over time, trained on your H100 and deployed to the tablet.
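As a rough sanity check on adapter size: LoRA adds two low-rank matrices per adapted weight, so a rank-r adapter on a d_out × d_in weight adds r · (d_in + d_out) parameters. The sketch below counts parameters and bytes for a list of layer shapes. The shapes, the layer count, and the 16-bit storage are illustrative assumptions, not the actual Gemma 4 E4B configuration, so the result is not expected to match the 30MB figure.

```python
from typing import Iterable, Tuple


def lora_param_count(rank: int, shapes: Iterable[Tuple[int, int]]) -> int:
    """Parameters added by LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def adapter_size_mb(rank: int, shapes: Iterable[Tuple[int, int]],
                    bytes_per_param: int = 2) -> float:
    """Approximate size in MB (10**6 bytes), ignoring file metadata."""
    return lora_param_count(rank, shapes) * bytes_per_param / 1e6


# Hypothetical model: 32 blocks, four square 2048x2048 projections per block.
# These numbers are placeholders, NOT the real Gemma 4 E4B shapes.
example_shapes = [(2048, 2048)] * (32 * 4)

if __name__ == "__main__":
    print(lora_param_count(8, example_shapes), "parameters")
    print(round(adapter_size_mb(8, example_shapes), 1), "MB at 16-bit")
```

Plugging in the real layer shapes and the set of modules the persona adapters actually target would show how the rank-8 choice maps to on-disk size.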
Why this matters
- ~2.5 million Indian children have cerebral palsy
- Commercial AAC devices cost ₹20,000–₹2,00,000 and speak no Indian language naturally
- Most of these children go through life without ever communicating a full sentence
- Running on a ~₹10,000 tablet, Ankahi costs about half as much as the cheapest of these devices and one-twentieth as much as the most expensive, and it runs 100% offline
Quick start (H100 training)
git clone https://github.com/your-org/ankahi
cd ankahi
bash scripts/00_env_setup.sh
# Sanity check first; never skip this
python scripts/07_overfit_sanity_check.py
# Then data
python scripts/01_download_arasaac.py
python scripts/02_download_mulberry.py
python scripts/05_generate_personas.py
python scripts/04_generate_synth_dialogues.py
python scripts/06_build_eval_sets.py
# Training
python src/ankahi/training/stage1_base.py --config configs/train_stage1_base.yaml
python src/ankahi/training/stage2_persona.py --persona ananya --config configs/train_stage2_persona_ananya.yaml
# ... repeat for all 5 personas
python src/ankahi/training/stage3_audio.py --config configs/train_stage3_audio.yaml
python src/ankahi/training/stage4_safety.py --config configs/train_stage4_safety.yaml
# Deploy
python src/ankahi/deploy/merge_and_quantize.py
python src/ankahi/deploy/convert_to_litertlm.py
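The "repeat for all 5 personas" step can be scripted. The sketch below only builds and prints the Stage 2 commands. It assumes that `--persona` takes the lowercase persona name and that every config file follows the `train_stage2_persona_<name>.yaml` pattern shown for `ananya`; neither is confirmed for the other four personas.

```python
import shlex
from typing import List, Sequence

# Lowercased persona names from the personas table (assumed CLI spelling).
PERSONAS = ("ananya", "arjun", "priya", "rohan", "zara")


def stage2_commands(personas: Sequence[str] = PERSONAS) -> List[List[str]]:
    """Build one Stage 2 training command per persona."""
    return [
        [
            "python", "src/ankahi/training/stage2_persona.py",
            "--persona", name,
            # Assumed naming pattern, extrapolated from the ananya example.
            "--config", f"configs/train_stage2_persona_{name}.yaml",
        ]
        for name in personas
    ]


if __name__ == "__main__":
    for cmd in stage2_commands():
        print(shlex.join(cmd))
        # To actually train, replace the print with:
        # subprocess.run(cmd, check=True)
```

Printing first makes it easy to confirm the paths exist before committing GPU hours.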
System architecture
Child input (pictogram tap / camera / mic)
↓
Gemma 4 E4B INT8 + per-child LoRA adapter (30MB)
↓
Full sentence prediction (multilingual, code-switching aware)
↓
Parent-voice TTS (AI4Bharat / svara-TTS, zero-shot cloned)
↓
Speaker output + large-text screen display
Everything runs on the tablet. Zero bytes to the cloud.
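The flow above can be read as two pluggable stages, sentence prediction and speech synthesis, chained on the device. The sketch below is a minimal illustration of that shape, not the app's actual code: `ChildInput`, `AnkahiPipeline`, the stub functions, and the `adapters/ananya` path are all hypothetical names, and the stubs stand in for the Gemma model and the TTS engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChildInput:
    pictograms: List[str]      # labels of the tapped pictograms, in order
    language: str = "hi"       # preferred output language code


@dataclass
class AnkahiPipeline:
    # (input, adapter_path) -> predicted sentence; stands in for Gemma + LoRA
    predict_sentence: Callable[[ChildInput, str], str]
    # sentence -> audio bytes; stands in for the parent-voice TTS
    synthesize: Callable[[str], bytes]
    adapter_path: str

    def run(self, child_input: ChildInput) -> Tuple[str, bytes]:
        sentence = self.predict_sentence(child_input, self.adapter_path)
        audio = self.synthesize(sentence)
        return sentence, audio


# Placeholder stages so the sketch runs without any model installed.
def fake_predict(child_input: ChildInput, adapter_path: str) -> str:
    return " ".join(child_input.pictograms)


def fake_tts(sentence: str) -> bytes:
    return sentence.encode("utf-8")


pipeline = AnkahiPipeline(fake_predict, fake_tts, adapter_path="adapters/ananya")
sentence, audio = pipeline.run(ChildInput(["want", "water"], language="ta"))
```

Keeping the two stages as injected callables means the same wiring can be unit-tested with stubs and run with the real on-device model.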
5 Personas trained
| Name | Age | Languages | Disability profile | City |
|---|---|---|---|---|
| Ananya | 6 | Tamil + English | Spastic quadriplegia | Chennai |
| Arjun | 9 | Punjabi + Hindi + English | Dyskinetic CP + mild ID | Ludhiana |
| Priya | 4 | Bengali + English | CP + CVI | Kolkata |
| Rohan | 11 | Hindi + English | Athetoid CP | Delhi |
| Zara | 7 | Marathi + English | Spastic CP | Pune |
Repository layout
ankahi/
├── scripts/       Data download, env setup, sanity checks
├── configs/       YAML configs for each training stage
├── src/ankahi/    Core Python package
│   ├── data/      Collators, prompts, persona, augmentation
│   ├── training/  Stage 1–4 training scripts
│   ├── eval/      Metrics, specificity, latency evaluation
│   ├── deploy/    Merge, quantise, convert to .litertlm
│   └── tts/       Parent-voice cloning + synthesis
├── mobile/        Flutter Android app
├── notebooks/     Colab/Jupyter notebooks for each stage
├── writeup/       Technical report (markdown → PDF)
├── demo/          Video script and shot list
└── tests/         Unit tests
H100 compute budget
| Stage | Hours | Output |
|---|---|---|
| Stage 0: Sanity check | 0.5h | Go/no-go |
| Stage 1: Base multimodal SFT | 12h | base adapter (rank 16) |
| Stage 2: 5× persona LoRA (rank 8) | 10h | persona adapters |
| Stage 3: Audio/dysarthric adapter | 10h | audio adapter |
| Stage 4: Safety tuning | 4h | safety-merged base |
| Stage 5: Merge + quantise + convert | 4h | ankahi.litertlm (2.5GB) |
| Stage 6: TTS voice cloning × 3 | 3h | voice models |
| Buffer | 10h | - |
| Total | ~53h | |
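A quick check of the arithmetic: the rows sum to 53.5 hours, which the table rounds to ~53h, and the buffer is a little under a fifth of the total. The dictionary below just transcribes the table.

```python
# Hours per stage, copied from the compute budget table.
BUDGET_HOURS = {
    "Stage 0: Sanity check": 0.5,
    "Stage 1: Base multimodal SFT": 12,
    "Stage 2: 5x persona LoRA (rank 8)": 10,
    "Stage 3: Audio/dysarthric adapter": 10,
    "Stage 4: Safety tuning": 4,
    "Stage 5: Merge + quantise + convert": 4,
    "Stage 6: TTS voice cloning x 3": 3,
    "Buffer": 10,
}

total_hours = sum(BUDGET_HOURS.values())
buffer_share = BUDGET_HOURS["Buffer"] / total_hours

if __name__ == "__main__":
    print(f"total: {total_hours}h, buffer: {buffer_share:.0%}")
```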
Evaluation highlights
- BLEU-4 + chrF++ on 500-sample held-out set per persona
- 5×5 adapter-specificity heatmap: every persona adapter scored on every persona's held-out set, to check that each adapter does best on its own child's data
- Audio disambiguation accuracy (with vs. without audio adapter)
- On-device benchmarks across 3 phone tiers (flagship / mid / budget)
- Ablations: rank, vision FT on/off, data scaling curve
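One way to read the 5×5 specificity heatmap: row i is a persona adapter, column j is a persona's held-out set, and personalisation shows up as each column's best score sitting on the diagonal. The sketch below builds such a matrix from a pluggable scoring function and checks the diagonal. The function names are hypothetical, and `dummy_score` returns placeholder numbers for illustration only, not measured results.

```python
from typing import Callable, List, Sequence

PERSONAS = ("ananya", "arjun", "priya", "rohan", "zara")


def specificity_matrix(
    adapters: Sequence[str],
    eval_sets: Sequence[str],
    score: Callable[[str, str], float],
) -> List[List[float]]:
    """matrix[i][j] = score of adapter i on eval set j (e.g. chrF++)."""
    return [[score(a, e) for e in eval_sets] for a in adapters]


def diagonal_wins(matrix: List[List[float]]) -> List[bool]:
    """For each eval set j, is adapter j the top-scoring adapter?"""
    n = len(matrix)
    return [
        max(range(n), key=lambda i: matrix[i][j]) == j
        for j in range(n)
    ]


def dummy_score(adapter: str, eval_set: str) -> float:
    # Placeholder values, NOT real evaluation results.
    return 0.6 if adapter == eval_set else 0.3


matrix = specificity_matrix(PERSONAS, PERSONAS, dummy_score)
wins = diagonal_wins(matrix)
```

In a real run, `score` would load the named adapter, generate on the named held-out set, and return the chosen metric; `wins` then says which personas actually benefit from their own adapter.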
Partners / data credits
- ARASAAC (pictograms, CC-BY-NC-SA)
- Mulberry Symbols (CC-BY-SA)
- AI4Bharat (Indic-TTS, 13 languages)
- TORGO + UA-Speech (dysarthric speech corpora)
- Unsloth (Gemma 4 E4B fine-tuning kernels)
Potential clinical partners: Ummeed Child Development Center, CP Guild India, SPASTN
Ethical statement
Ankahi does not make medical diagnoses. It is a communication aid. No data leaves the device. The app stores only what the family explicitly provides. All demo footage of children was recorded with full written parental consent (or substituted with adult collaborators with clear disclosure).