ojasvisingh786 (Ojasvi Singh Yadav)

upvoted 2 papers 4 days ago

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Paper • 2405.18424 • Published 4 days ago • 6

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 8 days ago • 41

upvoted 7 papers 9 days ago

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Paper • 2405.14477 • Published 10 days ago • 14

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published 10 days ago • 10

upvoted a paper 12 days ago

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published 12 days ago • 22

upvoted 2 papers 17 days ago

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 18 days ago • 22

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 18 days ago • 14

upvoted an article 18 days ago

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • 19 days ago • 131

upvoted 2 papers 19 days ago

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 19 days ago • 15

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Paper • 2405.07518 • Published 20 days ago • 21

upvoted a paper 20 days ago

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 57

upvoted a paper 26 days ago

What matters when building vision-language models?

Paper • 2405.02246 • Published 29 days ago • 87

upvoted 2 papers 30 days ago

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1 • 18

Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published about 1 month ago • 17

upvoted 18 papers about 1 month ago

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

Paper • 2405.00263 • Published May 1 • 13

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published Apr 30 • 12

Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

Paper • 2404.19758 • Published Apr 30 • 9

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30 • 19

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 65

Stylus: Automatic Adapter Selection for Diffusion Models

Paper • 2404.18928 • Published Apr 29 • 14

Implicit Style-Content Separation using B-LoRA

Paper • 2403.14572 • Published Mar 21 • 3

DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 9

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Paper • 2404.18911 • Published Apr 29 • 26

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Paper • 2404.17672 • Published Apr 26 • 17

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29 • 63

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Paper • 2404.17569 • Published Apr 26 • 10

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 31

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published Apr 24 • 11

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published Apr 23 • 18

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21 • 26

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 238

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 37

upvoted an article about 1 month ago

Welcome Llama 3 - Meta's new open LLM

Article • Apr 18 • 245

upvoted a paper about 1 month ago

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18 • 24

upvoted 16 papers about 2 months ago

CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting

Paper • 2404.09458 • Published Apr 15 • 6

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 27

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 42

WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents

Paper • 2404.05902 • Published Apr 8 • 20

HGRN2: Gated Linear RNNs with State Expansion

Paper • 2404.07904 • Published Apr 11 • 16

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11 • 15

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11 • 10

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11 • 14

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9 • 32

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

Paper • 2404.06780 • Published Apr 10 • 9

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 93

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 22

Revising Densification in Gaussian Splatting

Paper • 2404.06109 • Published Apr 9 • 8

Hash3D: Training-free Acceleration for 3D Generation

Paper • 2404.06091 • Published Apr 9 • 12

upvoted an article about 2 months ago

Hugging Face and AWS partner to make AI more accessible

Article • Feb 21, 2023 • 1

upvoted a paper about 2 months ago

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Paper • 2404.05717 • Published Apr 8 • 23

upvoted an article about 2 months ago

CodeGemma - an official Google release for code LLMs

Article • Apr 9 • 97

upvoted 2 papers about 2 months ago

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Paper • 2404.04421 • Published Apr 5 • 14

YaART: Yet Another ART Rendering Technology

Paper • 2404.05666 • Published Apr 8 • 14
