taesiri (taesiri)

upvoted 4 papers about 7 hours ago

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published 2 days ago • 7

Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published 2 days ago • 6

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published 3 days ago • 11

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Paper • 2403.06098 • Published Mar 10 • 15

upvoted a paper 1 day ago

What matters when building vision-language models?

Paper • 2405.02246 • Published 13 days ago • 66

upvoted 2 papers 2 days ago

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 3 days ago • 14

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Paper • 2305.11175 • Published May 18, 2023 • 2

upvoted an article 2 days ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 124

upvoted a collection 4 days ago

Yi-1.5 (2024/05)

Collection

6 items • Updated 4 days ago • 58

upvoted 4 papers 4 days ago

upvoted 2 collections 9 days ago

Granite Code Models

Collection

A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 10 items • Updated 4 days ago • 116

MGM

Collection

Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated 13 days ago • 43

upvoted 7 papers 13 days ago

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published 14 days ago • 20

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Paper • 2405.01481 • Published 14 days ago • 20

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published 17 days ago • 104

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published 14 days ago • 52

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published 14 days ago • 88

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Paper • 2309.10400 • Published Sep 19, 2023 • 22

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published 16 days ago • 86

upvoted 3 papers 14 days ago

Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published 15 days ago • 8

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Paper • 2405.00332 • Published 15 days ago • 23

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3

Paper • 2405.00664 • Published 15 days ago • 16

upvoted 3 papers 15 days ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published 16 days ago • 61

Octopus v4: Graph of language models

Paper • 2404.19296 • Published 16 days ago • 89

Extending Llama-3's Context Ten-Fold Overnight

Paper • 2404.19553 • Published 16 days ago • 26

upvoted 4 papers 16 days ago

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Paper • 2404.17672 • Published 19 days ago • 17

Capabilities of Gemini Models in Medicine

Paper • 2404.18416 • Published 17 days ago • 21

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Paper • 2404.17521 • Published 20 days ago • 12

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published 17 days ago • 62

upvoted 2 papers 17 days ago

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Paper • 2404.16873 • Published 24 days ago • 25

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published 20 days ago • 30

upvoted an article 17 days ago

Article

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets


19 days ago

• 54

upvoted a paper 18 days ago

ChatAnything: Facetime Chat with LLM-Enhanced Personas

Paper • 2311.06772 • Published Nov 12, 2023 • 33

upvoted 5 papers 19 days ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published 21 days ago • 14

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published 21 days ago • 17

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published 21 days ago • 11

Tele-FLM Technical Report

Paper • 2404.16645 • Published 21 days ago • 17

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published 21 days ago • 54

upvoted 3 papers 20 days ago

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published 21 days ago • 50

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Paper • 2404.16790 • Published 21 days ago • 7

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published 21 days ago • 48

upvoted a paper 21 days ago

Editable Image Elements for Controllable Synthesis

Paper • 2404.16029 • Published 22 days ago • 9

upvoted 3 papers 22 days ago

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published 23 days ago • 17

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published 23 days ago • 120

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published 23 days ago • 53

upvoted 6 papers 23 days ago

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published 24 days ago • 17

A Multimodal Automated Interpretability Agent

Paper • 2404.14394 • Published 24 days ago • 19

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published 24 days ago • 37

FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17 • 32

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published 26 days ago • 37

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published 24 days ago • 229

upvoted 2 articles 24 days ago

Article

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

27 days ago

• 64

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

24 days ago

• 71

upvoted 4 papers 24 days ago

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Paper • 2404.13026 • Published 27 days ago • 21

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

Paper • 2404.12753 • Published 27 days ago • 38

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published 27 days ago • 26

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published 27 days ago • 27
