Kaviyarasan V
kaveeshwaran
0 followers · 10 following
v-kaviyarasan-v-2a111525a
AI & ML interests
I want to be an AI Developer
Recent Activity
New activity 6 days ago in huggingface/HuggingDiscussions: [FEEDBACK] Notifications
Replied to philschmid's post 6 days ago:
Gemini 2.5 Flash is here! We're excited to launch our first hybrid reasoning Gemini model. In Flash 2.5, developers can turn thinking off.

**TL;DR:**
- Controllable "thinking" with a thinking budget of up to 24k tokens
- 1 million-token multimodal input context for text, image, video, audio, and PDF
- Function calling, structured output, Google Search & code execution
- $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- Knowledge cutoff of January 2025
- Rate limits: free tier is 10 RPM, 500 requests/day
- Outperforms 2.0 Flash on every benchmark

Try it ⬇️ https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
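The thinking budget described in the post is a per-request knob. A minimal sketch, assuming the `google-genai` Python SDK and its `ThinkingConfig` field (exact parameter names may vary by SDK version; the API key and prompt are placeholders):

```python
from google import genai
from google.genai import types

# Placeholder key; requires a Google AI Studio API key.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # model ID from the post's link
    contents="Summarize hybrid reasoning in two sentences.",  # example prompt
    config=types.GenerateContentConfig(
        # Up to 24k thinking tokens per the post; a budget of 0 turns
        # thinking off, which is the cheaper output-pricing mode.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```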
Reacted to m-ric's post with 🔥 6 days ago:
New king of open VLMs: InternVL3 takes Qwen 2.5's crown!

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a specific vision transformer (ViT) on top of it. Training is then sequential:
1. Freeze the LLM weights and train only the ViT so it learns to work with the LLM, then
2. Unfreeze all weights and train everything to work together.

The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase. They claim this results in more seamless interactions between modalities.

And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents.
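To make the contrast concrete, here is a schematic PyTorch sketch of the two recipes. This is not InternVL's actual code: `VisionLanguageModel`, `train_step`, and the toy `nn.Linear` submodules are hypothetical stand-ins used only to show where weights are frozen.

```python
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    """Hypothetical VLM: a vision encoder (ViT) stitched onto a text backbone (LLM)."""
    def __init__(self, vit: nn.Module, llm: nn.Module):
        super().__init__()
        self.vit, self.llm = vit, llm

def train_step(model: nn.Module) -> None:
    # Stand-in for a real training loop: gradients only flow to
    # parameters whose requires_grad flag is True.
    opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad])
    loss = sum(p.sum() for p in model.parameters() if p.requires_grad)
    opt.zero_grad()
    loss.backward()
    opt.step()

model = VisionLanguageModel(vit=nn.Linear(16, 16), llm=nn.Linear(16, 16))

# --- Sequential ("Frankenstein") recipe: two phases ---
# Phase 1: freeze the LLM; only the ViT adapts to the LLM's embedding space.
for p in model.llm.parameters():
    p.requires_grad = False
train_step(model)

# Phase 2: unfreeze everything and train all weights together.
for p in model.llm.parameters():
    p.requires_grad = True
train_step(model)

# --- Native recipe (as the post describes InternVL3) ---
# Nothing is frozen: ViT and LLM train jointly on interleaved
# text-and-image data in a single pre-training phase.
native = VisionLanguageModel(vit=nn.Linear(16, 16), llm=nn.Linear(16, 16))
train_step(native)
```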
Organizations
None yet
Models (1)
kaveeshwaran/distilbert-base-uncased-finetuned-sst-2-english
Updated Feb 25
Datasets (1)
kaveeshwaran/face_recog-doc
Updated 6 days ago • 57