We just released Transformers.js v3.1, and you're not going to believe what's now possible in the browser with WebGPU! 🤯 Let's take a look:
- Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
- Qwen2-VL from Qwen for dynamic-resolution image understanding
- JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
- LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
- ViTPose for pose estimation
- MGP-STR for optical character recognition (OCR)
- PatchTST & PatchTSMixer for time series forecasting
That's right: everything runs 100% locally in your browser, with no data sent to a server. Huge for privacy!