Malayalam Whisper to FUTO Keyboard: Full Pipeline
This repository contains an end-to-end Jupyter Notebook pipeline designed to prepare Hugging Face Whisper models for use with the FUTO Keyboard on Android.
The pipeline specifically focuses on adapting Malayalam Whisper models (such as whisper-small-malayalam) by applying Audio Context Fine-Tuning (ACFT), converting the standard Hugging Face weights into the GGML format, and quantizing the model for efficient mobile inference.
Features
- Audio Context Fine-Tuning (ACFT): Trains the target model to handle dynamic audio contexts (under 30 seconds) without endless looping or repetition, using a frozen reference model.
- GGML Conversion: Automatically clones the required OpenAI Whisper and whisper.cpp repositories to convert the fine-tuned .safetensors model into a standard .bin file.
- Mobile Quantization: Compiles the whisper-quantize tool and generates optimized, quantized versions (e.g., q5_0) of the model tailored for smartphone hardware constraints.
- Fast Dependency Management: Uses uv for rapid package installation and environment setup within the Colab runtime.
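To make the ACFT idea concrete: Whisper's encoder always represents a 30-second window as 1500 positions (100 log-mel frames per second, halved by a stride-2 convolution), so a shorter clip only needs a proportional slice of that context. The sketch below is a minimal, dependency-free illustration, not the notebook's code. It assumes the target model's audio context is rounded up from the clip length, and that the loss compares hidden-state vectors from the frozen reference model (full 30 s context) against those from the target model (truncated context); `audio_ctx_for` and `acft_mse` are hypothetical helper names.

```python
import math
from typing import Sequence

# Whisper encodes a fixed 30 s window as 1500 encoder positions:
# 100 log-mel frames per second, halved by the encoder's stride-2 convolution.
SAMPLE_RATE = 16_000
FULL_CONTEXT_SECONDS = 30.0
FULL_AUDIO_CTX = 1500


def audio_ctx_for(num_samples: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Encoder positions needed to cover a clip (assumption: round up, clamp to 1..1500)."""
    seconds = num_samples / sample_rate
    positions = math.ceil(seconds * FULL_AUDIO_CTX / FULL_CONTEXT_SECONDS)
    return max(1, min(FULL_AUDIO_CTX, positions))


def acft_mse(reference: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error between paired hidden-state vectors.

    `reference`: states from the frozen model on the full 30 s context (hypothetical input).
    `target`: states from the model being tuned on the truncated context.
    """
    if len(reference) != len(target):
        raise ValueError("reference and target must contain the same number of vectors")
    total, count = 0.0, 0
    for ref_vec, tgt_vec in zip(reference, target):
        if len(ref_vec) != len(tgt_vec):
            raise ValueError("paired vectors must have the same width")
        for r, t in zip(ref_vec, tgt_vec):
            total += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("nothing to compare")
    return total / count
```

For a 5-second clip at 16 kHz, `audio_ctx_for(80_000)` gives 250 positions instead of 1500. In a real training loop the same comparison would run on PyTorch tensors (for example with `torch.nn.functional.mse_loss`) rather than Python lists.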
Requirements
This notebook is designed to be executed in Google Colab to leverage cloud GPU acceleration and avoid local hardware memory constraints.
- Environment: Google Colab
- Hardware: T4 GPU (Required for the ACFT training loop to complete in a reasonable timeframe)
- Storage: Access to Google Drive (if loading local datasets such as Mozilla Common Voice .tar.gz archives, or saving the final output directly to Drive).
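If the Common Voice archive lives on Google Drive, it has to be mounted and extracted before the dataset can be loaded. The following is a hedged sketch: `google.colab.drive.mount` is Colab's standard mounting call, but the archive path, extraction directory, and helper names are hypothetical placeholders, not values taken from the notebook.

```python
import tarfile
from pathlib import Path

# Hypothetical locations: replace with your own archive path and target directory.
DRIVE_MOUNT_POINT = "/content/drive"
ARCHIVE_PATH = Path("/content/drive/MyDrive/common_voice_ml.tar.gz")
EXTRACT_DIR = Path("/content/common_voice")


def mount_drive_if_colab(mount_point: str = DRIVE_MOUNT_POINT) -> bool:
    """Mount Google Drive inside Colab; return False when not running in Colab."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz archive into `dest` and return the archive's member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects absolute paths and "..".
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

In Colab you would call `mount_drive_if_colab()` and then `extract_archive(ARCHIVE_PATH, EXTRACT_DIR)` before pointing the dataset loader at `EXTRACT_DIR`.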
Usage Instructions
1. Environment Setup
- Upload the malayalam_whisper_full_pipeline.ipynb notebook to your Google Colab environment.
- Navigate to Runtime > Change runtime type and select T4 GPU.
- Run the first cell to install the required dependencies (torch, transformers, datasets, librosa, etc.) via uv.
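The dependency cell can be approximated in plain Python as follows. This is a sketch under assumptions: it bootstraps uv with pip, then runs `uv pip install --system` so packages land in the Colab runtime's interpreter rather than in a virtual environment. The package list mirrors the one named above; the notebook's actual cell may pin versions or install more. `uv_install_command` and `install` are hypothetical names.

```python
import subprocess
import sys
from typing import Sequence

# Packages named in this README; the real notebook cell may differ.
PACKAGES = ("torch", "transformers", "datasets", "librosa")


def uv_install_command(packages: Sequence[str]) -> list[str]:
    """Build a `uv pip install` call that targets the runtime's system Python."""
    # Running uv as `python -m uv` avoids depending on uv being on PATH;
    # --system installs into the interpreter instead of a virtualenv.
    return [sys.executable, "-m", "uv", "pip", "install", "--system", *packages]


def install(packages: Sequence[str] = PACKAGES) -> None:
    """Bootstrap uv with pip, then install the pipeline's dependencies with it."""
    subprocess.run([sys.executable, "-m", "pip", "install", "--quiet", "uv"], check=True)
    subprocess.run(uv_install_command(packages), check=True)
```

In a notebook cell the same step is usually written as a shell escape, e.g. `!uv pip install --system torch transformers datasets librosa`.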
2. Execution
Execute the notebook cells sequentially. The pipeline handles:
- Downloading the target Whisper model and the Common Voice dataset.
- Running the 500-step ACFT training loop to minimize the MSE loss between the outputs of the target model and the frozen reference model.
- Merging the updated weights and saving the resulting PyTorch model.
- Running convert-h5-to-ggml.py to generate the base ggml-model.bin file.
- Executing the whisper.cpp Makefile and generating quantized .bin files.
Example output from the ACFT training cell:
Training ACFT (500 steps)...
Step 0 | Loss: 1.133623
Step 50 | Loss: 0.153679
Step 100 | Loss: 0.253860
Step 150 | Loss: 0.100465
Step 200 | Loss: 0.100469
Step 250 | Loss: 0.130127
Step 300 | Loss: 0.059387
Step 350 | Loss: 0.124414
Step 400 | Loss: 0.077903
Step 450 | Loss: 0.073374
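The conversion and quantization steps map onto two command-line calls. In whisper.cpp, `models/convert-h5-to-ggml.py` takes the Hugging Face model directory, a clone of the OpenAI whisper repository (for its mel filter assets), and an output directory, and writes `ggml-model.bin` there; the quantize tool takes an input model, an output path, and a type such as `q5_0`. The sketch below wraps both calls; the directory layout, the helper names, and the quantize binary's location are assumptions, since they depend on where the notebook clones and builds whisper.cpp.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical layout: adjust to match where the notebook clones, builds, and saves things.
MODEL_DIR = Path("/content/acft-model")          # fine-tuned Hugging Face model directory
OPENAI_WHISPER_DIR = Path("/content/whisper")    # clone of the OpenAI whisper repository
WHISPER_CPP_DIR = Path("/content/whisper.cpp")   # clone of whisper.cpp
OUTPUT_DIR = Path("/content/output")


def convert_command(model_dir: Path, whisper_dir: Path, out_dir: Path, cpp_dir: Path) -> list[str]:
    """Converter arguments: HF model dir, OpenAI whisper repo, output dir."""
    script = cpp_dir / "models" / "convert-h5-to-ggml.py"
    return [sys.executable, str(script), str(model_dir), str(whisper_dir), str(out_dir)]


def quantize_command(quantize_bin: Path, src: Path, dst: Path, qtype: str = "q5_0") -> list[str]:
    """Quantizer arguments: input model, output model, quantization type."""
    return [str(quantize_bin), str(src), str(dst), qtype]


def run_pipeline(quantize_bin: Path, qtype: str = "q5_0") -> Path:
    """Convert the fine-tuned model to GGML, then quantize it; returns the quantized path."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        convert_command(MODEL_DIR, OPENAI_WHISPER_DIR, OUTPUT_DIR, WHISPER_CPP_DIR), check=True
    )
    base_model = OUTPUT_DIR / "ggml-model.bin"  # the converter's default output name
    quantized = OUTPUT_DIR / f"malayalam-futo-{qtype}.bin"
    subprocess.run(quantize_command(quantize_bin, base_model, quantized, qtype), check=True)
    return quantized
```

Pass `run_pipeline` whatever path the notebook's build step produced for `whisper-quantize`; keeping command construction separate from execution makes it easy to print and check the commands before running them.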
3. Deployment
Once the notebook finishes executing all quantization steps, the final models will be available in the designated /content/output/ directory (or your mounted Google Drive).
- Download the recommended malayalam-futo-q5_0.bin file.
- Transfer the .bin file to your Android device's internal storage.
- Open the FUTO Keyboard settings, navigate to the Voice Input section, and import the downloaded model.
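Before transferring the model, it can help to pick out the right file and copy it somewhere easy to reach from a phone, such as Google Drive. The sketch below does that with the standard library only; the helper names and the Drive folder are hypothetical, and only `/content/output/` and the `q5_0` naming come from this README.

```python
import shutil
from pathlib import Path

# Hypothetical paths: /content/output/ is the notebook's output directory;
# the Drive folder is an arbitrary example.
OUTPUT_DIR = Path("/content/output")
DRIVE_DIR = Path("/content/drive/MyDrive/futo-models")


def find_model(output_dir: Path, qtype: str = "q5_0") -> Path:
    """Return the first .bin file whose name contains the requested quantization type."""
    matches = sorted(output_dir.glob(f"*{qtype}*.bin"))
    if not matches:
        raise FileNotFoundError(f"no *{qtype}*.bin file in {output_dir}")
    return matches[0]


def copy_for_download(model: Path, dest_dir: Path) -> Path:
    """Copy the model into `dest_dir` (creating it if needed) and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(model, dest_dir / model.name))
```

Inside Colab, `google.colab.files.download(str(path))` can also send the file straight to your browser instead of copying it to Drive.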