
Alireza Hajebrahimi

iarata

AI & ML interests

Meta-Learning, Few-shot Learning, Character Recognition, Diffusion Models

Recent Activity

liked a Space 3 days ago
akhaliq/anychat
reacted to m-ric's post with 🚀 19 days ago

Organizations

Hajebrahimi.com

iarata's activity

reacted to m-ric's post with 🚀 19 days ago
💥 Google releases Gemini 2.0, starting with a Flash model that steamrolls GPT-4o and Claude-3.6 Sonnet! And they're launching a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

āž”ļø If the price is on par with previous Flash iteration ($0.30 / M tokens, to compare with GPT-4o's $1.25) the competition will have a big problem with this 4x cheaper model that gets better benchmarks šŸ¤Æ

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on the WebVoyager benchmark, which is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blog post here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
reacted to akhaliq's post with ❤️ 11 months ago
Here is my selection of papers for today (12 Jan)

https://huggingface.co/papers

PALP: Prompt Aligned Personalization of Text-to-Image Models

Object-Centric Diffusion for Efficient Video Editing

TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

Diffusion Priors for Dynamic View Synthesis from Monocular Videos

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

TOFU: A Task of Fictitious Unlearning for LLMs

Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models

Secrets of RLHF in Large Language Models Part II: Reward Modeling

LEGO: Language Enhanced Multi-modal Grounding Model

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

Towards Conversational Diagnostic AI

Transformers are Multi-State RNNs

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Distilling Vision-Language Models on Millions of Videos

Efficient LLM inference solution on Intel GPU

TrustLLM: Trustworthiness in Large Language Models