Vaibhav Srivastav

reach-vb

AI & ML interests

TTS + LM performance prediction

Recent Activity

liked a Space 31 minutes ago
ai4bharat/indic-parler-tts
liked a model 33 minutes ago
ai4bharat/indic-parler-tts
liked a model about 8 hours ago
arcee-ai/Virtuoso-Small
View all activity

Articles

Organizations

Hugging Face's profile picture Notebooks-explorers's profile picture Whisper fine-tuning sprint's profile picture Hugging Face Course's profile picture Whisper Fine-Tuning Event's profile picture Kensho's profile picture Mozilla Foundation's profile picture PolinaOrg's profile picture Coqui.ai's profile picture Internal Data & Models for Speech Recognition Event's profile picture Speech Recognition Community Event Version 2's profile picture onnx's profile picture Hugging Test Lab's profile picture Internal Data's profile picture The Team Ten's profile picture Huggingface Projects's profile picture EuroPython 2022's profile picture Whisper Distillation's profile picture BigCode's profile picture Hugging Face OSS Metrics's profile picture Harmonai's Dance Diffusion Community's profile picture EuroSciPy 2022's profile picture LaLoka Labs's profile picture Core ML Projects's profile picture meta-private's profile picture Blog-explorers's profile picture Music Gen Sprint's profile picture Hugging Face for Audio's profile picture Hugging Face TB Research's profile picture Open ASR Leaderboard's profile picture test's profile picture MusicGen Internal's profile picture TTS Eval (OLD)'s profile picture ZeroGPU Explorers's profile picture Editing Audio's profile picture ggml.ai's profile picture LocalLLaMA's profile picture gg-hf's profile picture Unofficial Mistral Community's profile picture Journalists on Hugging Face's profile picture Llzama's profile picture finding-nemo's profile picture diarizers-community's profile picture MLX Community's profile picture Cartesia's profile picture IBM Granite's profile picture On-device Squad's profile picture TTS AGI's profile picture Social Post Explorers's profile picture Apple CoreNet Models 's profile picture LM Studio Community's profile picture gg-gguf's profile picture hsramall's profile picture Parler TTS's profile picture ibm-ai-platform's profile picture Lina Speech's profile picture Dev Mode Explorers's profile picture Sweet Dream(Booth)s's profile picture private beta for deeplinks's profile picture Paris AI Running Club's profile picture gg-tt's profile picture Kyutai's profile picture OuteAI's profile picture Hugging Face Discord Community's profile picture LLHF's profile picture SLLHF's profile picture Ratchet Community's profile picture Hugging Quants's profile picture CoreML Scratchpad's profile picture blhf's profile picture kmhf's profile picture nltpt's profile picture nltpt-q's profile picture ai4b-hf's profile picture Ollama Tools's profile picture Spirit LM's profile picture qrias's profile picture Audio Collabs's profile picture Consumer AI Edge Hackathon (Meta, Hugging Face, Pytorch, Scaleway & Unaite)'s profile picture open/ acc's profile picture ExecuTorch Community's profile picture wut?'s profile picture DDUF's profile picture

Posts 11

view post
Post
2420
Massive week for Open AI/ ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, json + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 70B & 8B - competive with claude 3.5 haiku, beats all major open models like llama 3.1 70B, qwen 2.5 and nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

Llava o1 - vlm capable of spontaneous, systematic reasoning, similar to GPT-o1, 11B model outperforms gemini-1.5-pro, gpt-4o-mini, and llama-3.2-90B-vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs Flux.1 tools - four new state of the art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, matroyoshka representations (1024 to 64)
jinaai/jina-clip-v2

Apple AIM v2 & CoreML MobileCLIP - large scale vision encoders outperform CLIP and SigLIP. CoreML optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released like, OpenScholar ( OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), smoltalk ( HuggingFaceTB/smoltalk), Hymba ( nvidia/hymba-673c35516c12c4b98b5e845f), Open ASR Leaderboard ( hf-audio/open_asr_leaderboard) and much more..

Can't wait for the next week! 🤗
view post
Post
4225
What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B/ 32B (Base + Instruct) Code generation LLMs, with 32B tackling giants like Gemnini 1.5 Pro, Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B excels at Chat + Function Calling/ JSON/ Agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B/ 8B model approaching GPT4o level, pick any LLM, train an adapter with Whisper as Audio Encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3 by DeepSeek - Next iteration of their Unified MultiModal LLM Janus with RectifiedFlow
deepseek-ai/JanusFlow-1.3B

Common Corpus by Pleais - 2,003,039,184,047 multilingual, commercially permissive and high quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot, can't wait for the next week!

Put down in comments what I missed! 🤗