Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?
They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!
Yes, they are streaming 8.6 GB model files!
Currently they have Gemma 2B/7B running, but imagine dynamic LoRA, multimodal support, and quantization, all without ever leaving Chrome!
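For a sense of how little glue code this needs on the page, here's a minimal sketch using the MediaPipe LLM Inference task from the @mediapipe/tasks-genai package. The model path and generation parameters are placeholder assumptions, not values from the post:

```ts
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

async function runGemmaInBrowser() {
  // Fetch the WASM assets that back the GenAI tasks.
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );

  // Create the LLM task from a self-hosted Gemma model file (placeholder path).
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: '/models/gemma-2b-it-gpu-int4.bin' },
    maxTokens: 512, // placeholder generation settings
    topK: 40,
    temperature: 0.8,
  });

  // Run a prompt entirely on-device; nothing leaves the browser.
  console.log(await llm.generateResponse('Summarize WebGPU in one sentence.'));
}

runGemmaInBrowser();
```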
This is a significant technical advance, especially in Memory Optimization (a rough sketch of the loading idea follows the list):
- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.
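The post doesn't include MediaPipe's actual loader code, so treat this as a hypothetical TypeScript sketch of the layer-by-layer idea: stream the weight file with fetch, copy one layer at a time into a WebGPU buffer, and drop the CPU copy before reading the next layer, so the JS/WASM heap never holds more than roughly one layer at once (layer sizes are assumed to be multiples of 4 bytes, as writeBuffer requires):

```ts
async function streamWeightsToGpu(
  device: GPUDevice,
  url: string,
  layerSizes: number[], // byte size of each transformer layer's weights
): Promise<GPUBuffer[]> {
  const reader = (await fetch(url)).body!.getReader();
  let pending = new Uint8Array(0);
  const layers: GPUBuffer[] = [];

  for (const size of layerSizes) {
    // Pull network chunks until one full layer has arrived.
    while (pending.length < size) {
      const { done, value } = await reader.read();
      if (done) throw new Error('model file ended before all layers arrived');
      const merged = new Uint8Array(pending.length + value.length);
      merged.set(pending);
      merged.set(value, pending.length);
      pending = merged;
    }
    // Copy exactly one layer into GPU memory...
    const buf = device.createBuffer({
      size,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    });
    device.queue.writeBuffer(buf, 0, pending.subarray(0, size));
    layers.push(buf);
    // ...then drop the CPU copy so peak JS/WASM memory stays near one layer.
    pending = pending.slice(size);
  }
  return layers;
}
```

That pattern is what lets an 8.6 GB file transit a heap capped at 4 GB.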
Cross-Platform Compatibility:
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers (a quick feature check is sketched below).
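Since WebGPU isn't enabled in every browser yet, a real page would feature-detect before opting into GPU acceleration. A minimal check using only the standard WebGPU API, nothing MediaPipe-specific:

```ts
// Returns true if this browser can hand us a usable WebGPU adapter.
async function hasWebGpu(): Promise<boolean> {
  if (!('gpu' in navigator)) return false; // API not exposed at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU backend
}
```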
Here's why this matters:
1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection (see the caching sketch below).
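On point 3, the browser's standard Cache API is enough to keep the downloaded weights around between sessions, so the 8.6 GB only crosses the network once. A hypothetical sketch, with the model URL as a placeholder:

```ts
const MODEL_URL = '/models/gemma-2b-it-gpu-int4.bin'; // placeholder path

// Serve the model from the on-disk cache when possible; otherwise
// download it once and store a copy for future offline sessions.
async function fetchModelCached(): Promise<Response> {
  const cache = await caches.open('llm-model-v1');
  const hit = await cache.match(MODEL_URL);
  if (hit) return hit; // served from disk, no network needed
  const fresh = await fetch(MODEL_URL);
  await cache.put(MODEL_URL, fresh.clone()); // store a copy for next time
  return fresh;
}
```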
Blog: https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/