diff --git "a/daily_papers_enriched (3).csv" "b/daily_papers_enriched (3).csv"
new file mode 100644
--- /dev/null
+++ "b/daily_papers_enriched (3).csv"
@@ -0,0 +1,3188 @@
+date,arxiv_id,github,title,paper_page,upvotes,num_comments,hf_mention,num_models,num_datasets,num_spaces
+2024-07-19,2406.13897,,CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets,https://huggingface.co/papers/2406.13897,4,2,0,0,0,0
+2024-07-19,2407.12982,,Retrieval-Enhanced Machine Learning: Synthesis and Opportunities,https://huggingface.co/papers/2407.12982,4,2,0,0,0,0
+2024-07-19,2407.13481,,Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation,https://huggingface.co/papers/2407.13481,5,2,0,0,0,0
+2024-07-19,2407.12883,,BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval,https://huggingface.co/papers/2407.12883,4,2,0,0,1,0
+2024-07-19,2406.07057,,Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study,https://huggingface.co/papers/2406.07057,8,2,0,0,0,0
+2024-07-19,2407.13764,,Shape of Motion: 4D Reconstruction from a Single Video,https://huggingface.co/papers/2407.13764,14,2,0,0,0,0
+2024-07-19,2407.13739,,Scaling Granite Code Models to 128K Context,https://huggingface.co/papers/2407.13739,10,2,0,0,0,0
+2024-07-19,2407.13709,,Understanding Reference Policies in Direct Preference Optimization,https://huggingface.co/papers/2407.13709,11,3,0,0,0,0
+2024-07-19,2407.13696,https://github.com/ibm/benchbench,Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation,https://huggingface.co/papers/2407.13696,2,3,0,0,0,2
+2024-07-19,2407.13638,,A Comparative Study on Automatic Coding of Medical Letters with Explainability,https://huggingface.co/papers/2407.13638,4,2,0,0,0,0
+2024-07-19,2407.13244,,PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks,https://huggingface.co/papers/2407.13244,2,2,0,0,0,0
+2024-07-19,2407.12854,https://github.com/rulinshao/retrieval-scaling,Scaling Retrieval-Based Language Models with a Trillion-Token Datastore,https://huggingface.co/papers/2407.12854,25,2,1,0,0,0
+2024-07-19,2407.13759,,Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion,https://huggingface.co/papers/2407.13759,12,2,0,0,0,0
+2024-07-19,2407.10424,,CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization,https://huggingface.co/papers/2407.10424,3,2,0,0,0,0
+2024-07-19,2407.13623,,Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies,https://huggingface.co/papers/2407.13623,33,2,0,2,0,0
+2024-07-18,2407.12581,https://github.com/py85252876/uvd,Towards Understanding Unsafe Video Generation,https://huggingface.co/papers/2407.12581,0,2,1,0,1,0
+2024-07-18,2407.12366,https://github.com/gengzezhou/navgpt-2,NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models,https://huggingface.co/papers/2407.12366,3,2,0,0,1,0
+2024-07-18,2407.11298,,ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter,https://huggingface.co/papers/2407.11298,3,2,0,0,0,0
+2024-07-18,2407.10223,,Practical Unlearning for Large Language Models,https://huggingface.co/papers/2407.10223,2,2,0,0,0,0
+2024-07-18,2407.11854,,Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection,https://huggingface.co/papers/2407.11854,2,2,0,0,0,0
+2024-07-18,2407.09018,,AUITestAgent: Automatic Requirements Oriented GUI Function Testing,https://huggingface.co/papers/2407.09018,4,2,0,0,0,0
+2024-07-18,2407.12784,https://github.com/BillChan226/AgentPoison,AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases,https://huggingface.co/papers/2407.12784,42,2,1,1,0,0
+2024-07-18,2407.12077,https://github.com/recursal/GoldFinch-paper,GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression,https://huggingface.co/papers/2407.12077,42,5,1,1,0,0
+2024-07-18,2407.12504,https://github.com/choosewhatulike/case2code,Case2Code: Learning Inductive Reasoning with Synthetic Data,https://huggingface.co/papers/2407.12504,5,4,0,0,0,0
+2024-07-18,2407.12043,,The Art of Saying No: Contextual Noncompliance in Language Models,https://huggingface.co/papers/2407.12043,4,2,0,0,1,0
+2024-07-18,2407.12327,https://github.com/nolanoorg/spectrasuite,"Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models",https://huggingface.co/papers/2407.12327,59,2,1,0,0,0
+2024-07-18,2407.12781,,VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control,https://huggingface.co/papers/2407.12781,10,3,0,0,0,0
+2024-07-18,2407.12306,,Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections,https://huggingface.co/papers/2407.12306,5,2,0,0,0,0
+2024-07-18,2407.12563,,Audio Conditioning for Music Generation via Discrete Bottleneck Features,https://huggingface.co/papers/2407.12563,5,2,0,0,0,0
+2024-07-18,2407.12580,https://github.com/kongds/e5-v,E5-V: Universal Embeddings with Multimodal Large Language Models,https://huggingface.co/papers/2407.12580,31,3,1,1,0,0
+2024-07-18,2407.12665,https://github.com/shaochenze/patchtrain,Patch-Level Training for Large Language Models,https://huggingface.co/papers/2407.12665,14,3,1,0,0,0
+2024-07-18,2407.12679,,Goldfish: Vision-Language Understanding of Arbitrarily Long Videos,https://huggingface.co/papers/2407.12679,5,2,0,0,1,0
+2024-07-18,2407.12705,https://github.com/muzishen/imagdressing,IMAGDressing-v1: Customizable Virtual Dressing,https://huggingface.co/papers/2407.12705,6,2,1,1,0,0
+2024-07-18,2407.12772,https://github.com/evolvinglmms-lab/lmms-eval,LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models,https://huggingface.co/papers/2407.12772,24,2,1,0,1,0
+2024-07-17,2407.11062,,EfficientQAT: Efficient Quantization-Aware Training for Large Language Models,https://huggingface.co/papers/2407.11062,3,2,0,28,0,0
+2024-07-17,2407.11282,https://github.com/qcznlp/uncertainty_attack,Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models,https://huggingface.co/papers/2407.11282,1,2,0,0,0,0
+2024-07-17,2407.11828,https://github.com/jhauret/vibravox,Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors,https://huggingface.co/papers/2407.11828,4,2,1,13,1,0
+2024-07-17,2407.11522,,FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models,https://huggingface.co/papers/2407.11522,8,2,0,0,1,0
+2024-07-17,2407.10957,,Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes,https://huggingface.co/papers/2407.10957,23,3,0,0,0,0
+2024-07-17,2407.11691,https://github.com/open-compass/vlmevalkit,VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models,https://huggingface.co/papers/2407.11691,11,2,1,0,0,0
+2024-07-17,2407.11895,,OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces,https://huggingface.co/papers/2407.11895,7,2,0,0,0,0
+2024-07-17,2407.10718,https://github.com/ag2s1/sibyl-system,Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning,https://huggingface.co/papers/2407.10718,11,2,1,0,0,0
+2024-07-17,2407.11239,https://github.com/vita-group/welore,From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients,https://huggingface.co/papers/2407.11239,5,2,1,0,0,0
+2024-07-17,2407.11385,,Grasping Diverse Objects with Simulated Humanoids,https://huggingface.co/papers/2407.11385,4,2,0,0,0,0
+2024-07-17,2407.11394,https://github.com/kaist-cvml-lab/DreamCatalyst,DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation,https://huggingface.co/papers/2407.11394,10,2,0,0,0,0
+2024-07-17,2407.11398,,Animate3D: Animating Any 3D Model with Multi-view Video Diffusion,https://huggingface.co/papers/2407.11398,7,2,0,0,0,0
+2024-07-17,2407.11633,https://github.com/feizc/dit-moe,Scaling Diffusion Transformers to 16 Billion Parameters,https://huggingface.co/papers/2407.11633,21,2,1,0,0,0
+2024-07-17,2407.11793,,Click-Gaussian: Interactive Segmentation to Any 3D Gaussians,https://huggingface.co/papers/2407.11793,3,2,0,0,0,0
+2024-07-17,2407.11963,https://github.com/open-compass/opencompass,NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?,https://huggingface.co/papers/2407.11963,37,3,1,0,0,0
+2024-07-17,2407.11966,,Efficient Training with Denoised Neural Weights,https://huggingface.co/papers/2407.11966,7,3,0,0,0,0
+2024-07-17,2407.11784,https://github.com/modelscope/data-juicer,Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development,https://huggingface.co/papers/2407.11784,4,2,1,0,0,0
+2024-07-17,2407.10759,https://github.com/qwenlm/qwen2-audio,Qwen2-Audio Technical Report,https://huggingface.co/papers/2407.10759,29,2,1,0,0,0
+2024-07-17,2407.11144,https://github.com/google-research/google-research,"YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus",https://huggingface.co/papers/2407.11144,7,4,0,0,1,0
+2024-07-16,2407.10817,,Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation,https://huggingface.co/papers/2407.10817,10,2,0,0,0,0
+2024-07-16,2407.10956,https://github.com/xlang-ai/spider2-v,Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?,https://huggingface.co/papers/2407.10956,5,2,1,0,0,0
+2024-07-16,2407.10827,,LLM Circuit Analyses Are Consistent Across Training and Scale,https://huggingface.co/papers/2407.10827,4,2,0,0,0,0
+2024-07-16,2407.10953,,MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models,https://huggingface.co/papers/2407.10953,4,2,0,2,2,0
+2024-07-16,2407.10910,https://github.com/explainableml/datadream,DataDream: Few-shot Guided Dataset Generation,https://huggingface.co/papers/2407.10910,6,2,0,0,0,0
+2024-07-16,2407.10969,,Q-Sparse: All Large Language Models can be Fully Sparsely-Activated,https://huggingface.co/papers/2407.10969,16,3,0,0,0,0
+2024-07-16,2407.10058,https://github.com/zhliu0106/learning-to-refuse,Learning to Refuse: Towards Mitigating Privacy Risks in LLMs,https://huggingface.co/papers/2407.10058,28,3,1,0,1,0
+2024-07-16,2407.10943,https://github.com/openrobotlab/grutopia,GRUtopia: Dream General Robots in a City at Scale,https://huggingface.co/papers/2407.10943,20,2,0,0,0,0
+2024-07-16,2407.10362,,LAB-Bench: Measuring Capabilities of Language Models for Biology Research,https://huggingface.co/papers/2407.10362,4,2,0,0,1,0
+2024-07-16,2407.10387,,Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity,https://huggingface.co/papers/2407.10387,5,2,0,0,0,0
+2024-07-16,2407.07523,https://github.com/paranioar/sherl,SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning,https://huggingface.co/papers/2407.07523,4,2,0,0,0,0
+2024-07-16,2407.10285,https://github.com/yangqy1110/nc-sdedit,Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models,https://huggingface.co/papers/2407.10285,4,2,0,0,0,0
+2024-07-16,2407.10973,,Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion,https://huggingface.co/papers/2407.10973,8,2,0,0,0,0
+2024-07-16,2407.09533,https://github.com/manantomar/video-occupancy-models,Video Occupancy Models,https://huggingface.co/papers/2407.09533,5,2,1,1,1,0
+2024-07-16,2407.10671,https://github.com/qwenlm/qwen2,Qwen2 Technical Report,https://huggingface.co/papers/2407.10671,141,3,1,0,0,4
+2024-07-16,2407.10457,https://github.com/yifan-song793/goodbadgreedy,"The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism",https://huggingface.co/papers/2407.10457,19,3,1,0,0,0
+2024-07-15,2407.09732,https://github.com/xi-j/mamba-asr,"Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis",https://huggingface.co/papers/2407.09732,7,2,0,0,0,0
+2024-07-15,2407.07874,,Toto: Time Series Optimized Transformer for Observability,https://huggingface.co/papers/2407.07874,27,3,0,0,0,0
+2024-07-15,2407.06397,,RRM: Relightable assets using Radiance guided Material extraction,https://huggingface.co/papers/2407.06397,3,2,0,0,0,0
+2024-07-15,2407.09413,https://github.com/google/spiqa,SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers,https://huggingface.co/papers/2407.09413,9,3,1,0,0,0
+2024-07-15,2407.08770,https://github.com/lucywang720/model-surgery,Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing,https://huggingface.co/papers/2407.08770,16,2,1,0,0,0
+2024-07-15,2407.09121,https://github.com/robustnlp/derta,Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training,https://huggingface.co/papers/2407.09121,4,2,1,4,0,0
+2024-07-15,2407.09072,,New Desiderata for Direct Preference Optimization,https://huggingface.co/papers/2407.09072,6,2,0,0,0,0
+2024-07-15,2407.08892,,Characterizing Prompt Compression Methods for Long Context Inference,https://huggingface.co/papers/2407.08892,5,2,0,0,0,0
+2024-07-15,2407.09012,,TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models,https://huggingface.co/papers/2407.09012,8,2,0,0,0,0
+2024-07-15,2407.09473,,StyleSplat: 3D Object Style Transfer with Gaussian Splatting,https://huggingface.co/papers/2407.09473,10,3,0,0,0,0
+2024-07-15,2406.02265,https://github.com/lyan62/RobustCap,Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning,https://huggingface.co/papers/2406.02265,5,2,0,0,0,0
+2024-07-15,2407.09276,,H2O-Danube3 Technical Report,https://huggingface.co/papers/2407.09276,18,2,0,4,0,1
+2024-07-15,2407.09298,,Transformer Layers as Painters,https://huggingface.co/papers/2407.09298,11,2,0,0,0,0
+2024-07-15,2407.09435,,MUSCLE: A Model Update Strategy for Compatible LLM Evolution,https://huggingface.co/papers/2407.09435,18,2,0,0,0,0
+2024-07-15,2407.09450,,Human-like Episodic Memory for Infinite Context LLMs,https://huggingface.co/papers/2407.09450,50,4,0,0,0,0
+2024-07-15,2407.09025,,SpreadsheetLLM: Encoding Spreadsheets for Large Language Models,https://huggingface.co/papers/2407.09025,102,6,0,0,0,0
+2024-07-15,2407.09388,,GAVEL: Generating Games Via Evolution and Language Models,https://huggingface.co/papers/2407.09388,12,2,0,0,0,0
+2024-07-12,2407.06946,https://github.com/trdavidson/self-recognition,Self-Recognition in Language Models,https://huggingface.co/papers/2407.06946,20,2,0,0,0,0
+2024-07-12,2407.07176,,Scaling Up Personalized Aesthetic Assessment via Task Vector Customization,https://huggingface.co/papers/2407.07176,3,2,0,0,0,0
+2024-07-12,2407.07053,https://github.com/zwq2018/multi-modal-self-instruct,Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model,https://huggingface.co/papers/2407.07053,37,3,1,0,1,0
+2024-07-12,2407.08733,,Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist,https://huggingface.co/papers/2407.08733,18,4,0,0,1,0
+2024-07-12,2407.08642,,Towards Building Specialized Generalist AI with System 1 and System 2 Fusion,https://huggingface.co/papers/2407.08642,9,2,0,0,0,0
+2024-07-12,2407.08726,,Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data,https://huggingface.co/papers/2407.08726,8,2,0,0,0,0
+2024-07-12,2407.08296,https://github.com/VITA-Group/Q-GaLore,Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients,https://huggingface.co/papers/2407.08296,28,3,0,0,0,0
+2024-07-12,2407.08713,https://github.com/open-compass/GTA,GTA: A Benchmark for General Tool Agents,https://huggingface.co/papers/2407.08713,9,2,1,0,1,0
+2024-07-12,2407.08447,,WildGaussians: 3D Gaussian Splatting in the Wild,https://huggingface.co/papers/2407.08447,7,2,0,0,0,0
+2024-07-12,2407.08551,,Autoregressive Speech Synthesis without Vector Quantization,https://huggingface.co/papers/2407.08551,12,2,0,0,0,0
+2024-07-12,2407.08680,,Generalizable Implicit Motion Modeling for Video Frame Interpolation,https://huggingface.co/papers/2407.08680,7,2,0,0,0,0
+2024-07-12,2407.08683,https://github.com/tencentarc/seed-story,SEED-Story: Multimodal Long Story Generation with Large Language Model,https://huggingface.co/papers/2407.08683,18,5,1,1,1,0
+2024-07-12,2407.08701,,Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models,https://huggingface.co/papers/2407.08701,8,2,0,1,0,0
+2024-07-12,2407.08711,,OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects,https://huggingface.co/papers/2407.08711,5,2,0,0,0,0
+2024-07-12,2407.08737,https://github.com/mihirp1998/vader,Video Diffusion Alignment via Reward Gradients,https://huggingface.co/papers/2407.08737,41,2,0,2,0,1
+2024-07-12,2407.08083,https://github.com/nvlabs/mambavision,MambaVision: A Hybrid Mamba-Transformer Vision Backbone,https://huggingface.co/papers/2407.08083,19,3,1,6,0,0
+2024-07-12,2407.08250,https://github.com/nvlabs/gbrl_sb3,Gradient Boosting Reinforcement Learning,https://huggingface.co/papers/2407.08250,10,2,0,0,0,0
+2024-07-12,2407.08303,https://github.com/baaivision/densefusion,DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception,https://huggingface.co/papers/2407.08303,17,2,1,0,1,0
+2024-07-12,2407.08583,https://github.com/modelscope/data-juicer,The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective,https://huggingface.co/papers/2407.08583,10,3,1,0,0,0
+2024-07-12,2407.08348,,Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On,https://huggingface.co/papers/2407.08348,46,4,0,0,0,0
+2024-07-12,2407.08739,https://github.com/zrrskywalker/mavis,MAVIS: Mathematical Visual Instruction Tuning,https://huggingface.co/papers/2407.08739,26,3,1,0,2,0
+2024-07-11,2407.06188,,CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation,https://huggingface.co/papers/2407.06188,1,1,0,0,0,0
+2024-07-11,2407.08674,,Still-Moving: Customized Video Generation without Customized Video Data,https://huggingface.co/papers/2407.08674,10,2,0,0,0,0
+2024-07-11,2302.06555,https://github.com/jiaangli/vlca,Do Vision and Language Models Share Concepts? A Vector Space Alignment Study,https://huggingface.co/papers/2302.06555,7,2,1,0,2,0
+2024-07-11,2407.05530,,This&That: Language-Gesture Controlled Video Generation for Robot Planning,https://huggingface.co/papers/2407.05530,3,1,0,0,0,0
+2024-07-11,2407.07315,,CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging,https://huggingface.co/papers/2407.07315,4,1,0,0,0,0
+2024-07-11,2407.07565,,On Leakage of Code Generation Evaluation Datasets,https://huggingface.co/papers/2407.07565,4,3,0,0,1,0
+2024-07-11,2407.05528,,An accurate detection is not all you need to combat label noise in web-noisy datasets,https://huggingface.co/papers/2407.05528,2,1,0,0,0,0
+2024-07-11,2407.07788,,BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark,https://huggingface.co/papers/2407.07788,1,1,0,0,0,0
+2024-07-11,2407.07860,,Controlling Space and Time with Diffusion Models,https://huggingface.co/papers/2407.07860,15,1,0,0,0,0
+2024-07-11,2407.07304,https://github.com/intel/xfastertransformer,Inference Performance Optimization for Large Language Models on CPUs,https://huggingface.co/papers/2407.07304,47,4,1,0,0,0
+2024-07-11,2407.07726,https://github.com/google-research/big_vision,PaliGemma: A versatile 3B VLM for transfer,https://huggingface.co/papers/2407.07726,58,4,0,100,0,21
+2024-07-11,2407.07895,https://github.com/LLaVA-VL/LLaVA-NeXT,"LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models",https://huggingface.co/papers/2407.07895,34,2,1,6,0,12
+2024-07-11,2407.07667,,VEnhancer: Generative Space-Time Enhancement for Video Generation,https://huggingface.co/papers/2407.07667,8,1,0,0,0,0
+2024-07-11,2407.07464,,Video-to-Audio Generation with Hidden Alignment,https://huggingface.co/papers/2407.07464,11,2,0,0,0,0
+2024-07-10,2407.06533,,LETS-C: Leveraging Language Embedding for Time Series Classification,https://huggingface.co/papers/2407.06533,2,3,0,0,0,0
+2024-07-10,2407.03203,https://github.com/RickySkywalker/TheoremLlama,TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts,https://huggingface.co/papers/2407.03203,9,1,1,1,1,0
+2024-07-10,2407.03618,https://github.com/xhluca/bm25s,BM25S: Orders of magnitude faster lexical search via eager sparse scoring,https://huggingface.co/papers/2407.03618,10,1,1,3,0,0
+2024-07-10,2407.06723,,Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions,https://huggingface.co/papers/2407.06723,9,1,0,0,2,0
+2024-07-10,2407.07080,,Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities,https://huggingface.co/papers/2407.07080,20,1,0,8,1,8
+2024-07-10,2407.05015,https://github.com/nikolamilosevic86/verif.ai,How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions,https://huggingface.co/papers/2407.05015,4,1,1,2,1,0
+2024-07-10,2407.02880,https://github.com/fredzzhang/atlas,Knowledge Composition using Task Vectors with Learned Anisotropic Scaling,https://huggingface.co/papers/2407.02880,9,2,0,0,0,0
+2024-07-10,2407.06071,https://github.com/mivg/fallbacks,From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty,https://huggingface.co/papers/2407.06071,7,1,1,0,0,0
+2024-07-10,2407.03502,,AgentInstruct: Toward Generative Teaching with Agentic Flows,https://huggingface.co/papers/2407.03502,34,6,0,0,0,0
+2024-07-10,2407.07061,https://github.com/openbmb/ioa,Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence,https://huggingface.co/papers/2407.07061,23,4,0,0,0,0
+2024-07-10,2407.06189,https://github.com/orrzohar/Video-STaR,Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision,https://huggingface.co/papers/2407.06189,24,2,1,0,1,1
+2024-07-10,2407.06938,,RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models,https://huggingface.co/papers/2407.06938,20,1,0,0,0,0
+2024-07-10,2407.06581,,Vision language models are blind,https://huggingface.co/papers/2407.06581,73,9,0,0,1,0
+2024-07-10,2407.06304,,VIMI: Grounding Video Generation through Multi-modal Instruction,https://huggingface.co/papers/2407.06304,8,1,0,0,0,0
+2024-07-10,2407.06358,,MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions,https://huggingface.co/papers/2407.06358,15,1,0,0,1,0
+2024-07-10,2407.07071,https://github.com/voidism/lookback-lens,Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps,https://huggingface.co/papers/2407.07071,10,2,1,0,0,0
+2024-07-09,2407.06076,,Understanding Visual Feature Reliance through the Lens of Complexity,https://huggingface.co/papers/2407.06076,4,1,0,0,0,0
+2024-07-09,2407.03651,https://github.com/snorkel-ai/long-context-eval,"Evaluating Language Model Context Windows: A ""Working Memory"" Test and Inference-time Correction",https://huggingface.co/papers/2407.03651,14,1,0,0,0,0
+2024-07-09,2407.03471,https://github.com/McGill-NLP/AURORA,Learning Action and Reasoning-Centric Image Editing from Videos and Simulations,https://huggingface.co/papers/2407.03471,26,2,1,1,2,1
+2024-07-09,2407.05282,,UltraEdit: Instruction-based Fine-Grained Image Editing at Scale,https://huggingface.co/papers/2407.05282,8,1,0,0,0,0
+2024-07-09,2407.05463,,Training Task Experts through Retrieval Based Distillation,https://huggingface.co/papers/2407.05463,6,1,0,0,0,0
+2024-07-09,2407.06192,,Multi-Object Hallucination in Vision-Language Models,https://huggingface.co/papers/2407.06192,7,1,0,0,0,0
+2024-07-09,2407.04841,https://github.com/RodkinIvan/associative-recurrent-memory-transformer,Associative Recurrent Memory Transformer,https://huggingface.co/papers/2407.04841,29,2,0,0,0,0
+2024-07-09,2407.06182,,Compositional Video Generation as Flow Equalization,https://huggingface.co/papers/2407.06182,12,1,0,0,0,0
+2024-07-09,2407.06027,,PAS: Data-Efficient Plug-and-Play Prompt Augmentation System,https://huggingface.co/papers/2407.06027,8,2,0,0,0,0
+2024-07-09,2407.05700,https://github.com/wyt2000/InverseCoder,InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct,https://huggingface.co/papers/2407.05700,8,2,1,3,3,0
+2024-07-09,2407.04020,https://github.com/THU-KEG/LLMAEL,LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking,https://huggingface.co/papers/2407.04020,2,1,1,1,0,0
+2024-07-09,2407.04604,https://github.com/kamwoh/partcraft,PartCraft: Crafting Creative Objects by Parts,https://huggingface.co/papers/2407.04604,3,1,1,0,0,0
+2024-07-09,2407.06135,https://github.com/gair-nlp/anole,"ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation",https://huggingface.co/papers/2407.06135,19,2,1,1,0,1
+2024-07-09,2407.04842,https://github.com/MJ-Bench/MJ-Bench,MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?,https://huggingface.co/papers/2407.04842,49,3,1,23,2,1
+2024-07-09,2407.04693,https://github.com/open-compass/anah,ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models,https://huggingface.co/papers/2407.04693,1,1,1,1,0,0
+2024-07-09,2407.06191,,Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images,https://huggingface.co/papers/2407.06191,9,1,0,3,1,2
+2024-07-09,2407.05975,https://github.com/cone-mt/llamax,LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages,https://huggingface.co/papers/2407.05975,32,2,1,11,0,1
+2024-07-08,2407.04952,https://github.com/ethanm88/GPTGeoChat,Granular Privacy Control for Geolocation with Vision Language Models,https://huggingface.co/papers/2407.04952,3,1,0,0,0,0
+2024-07-08,2407.05131,https://github.com/richard-peng-xia/rule,RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models,https://huggingface.co/papers/2407.05131,19,2,1,0,0,0
+2024-07-08,2406.08085,https://github.com/IVGSZ/Flash-VStream,Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams,https://huggingface.co/papers/2406.08085,11,1,1,1,1,4
+2024-07-08,2407.02855,https://github.com/thu-coai/safeunlearning,Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks,https://huggingface.co/papers/2407.02855,9,1,1,2,0,0
+2024-07-08,2407.03958,,Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge,https://huggingface.co/papers/2407.03958,15,1,0,0,0,0
+2024-07-08,2407.03923,,CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images,https://huggingface.co/papers/2407.03923,7,1,0,0,0,0
+2024-07-08,2407.04172,https://github.com/vis-nlp/chartgemma,ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild,https://huggingface.co/papers/2407.04172,19,4,1,1,1,1
+2024-07-08,2407.04078,https://github.com/chengpengli1003/dotamath,DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning,https://huggingface.co/papers/2407.04078,14,3,0,0,0,0
+2024-07-08,2407.04363,https://github.com/airi-institute/arigraph,AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents,https://huggingface.co/papers/2407.04363,25,2,0,0,0,0
+2024-07-08,2407.03418,https://github.com/pliang279/hemm,HEMM: Holistic Evaluation of Multimodal Foundation Models,https://huggingface.co/papers/2407.03418,8,1,1,0,0,0
+2024-07-08,2406.11832,https://github.com/baaivision/eve,Unveiling Encoder-Free Vision-Language Models,https://huggingface.co/papers/2406.11832,45,3,1,3,0,0
+2024-07-08,2407.04620,https://github.com/test-time-training/ttt-lm-jax,Learning to (Learn at Test Time): RNNs with Expressive Hidden States,https://huggingface.co/papers/2407.04620,22,2,0,0,0,0
+2024-07-08,2407.04622,,On scalable oversight with weak LLMs judging strong LLMs,https://huggingface.co/papers/2407.04622,11,1,0,0,0,0
+2024-07-08,2407.03963,,LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs,https://huggingface.co/papers/2407.03963,13,1,0,0,0,0
+2024-07-08,2407.04051,https://github.com/funaudiollm/cosyvoice,FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs,https://huggingface.co/papers/2407.04051,33,1,0,0,0,0
+2024-07-05,2407.03321,https://github.com/batsresearch/planetarium,Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages,https://huggingface.co/papers/2407.03321,14,1,1,0,1,0
+2024-07-05,2407.01392,https://github.com/buoyancy99/diffusion-forcing,Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion,https://huggingface.co/papers/2407.01392,39,1,0,0,0,0
+2024-07-05,2407.01906,https://github.com/deepseek-ai/esft,Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models,https://huggingface.co/papers/2407.01906,33,1,0,1,0,0
+2024-07-04,2407.01100,,Eliminating Position Bias of Language Models: A Mechanistic Approach,https://huggingface.co/papers/2407.01100,6,1,0,0,0,0
+2024-07-04,2407.02551,,A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses,https://huggingface.co/papers/2407.02551,7,1,0,0,0,0
+2024-07-04,2406.19380,https://github.com/yandex-research/tabred,TabReD: A Benchmark of Tabular Machine Learning in-the-Wild,https://huggingface.co/papers/2406.19380,46,3,0,0,0,0
+2024-07-04,2407.03169,,Investigating Decoder-only Large Language Models for Speech-to-text Translation,https://huggingface.co/papers/2407.03169,9,1,0,0,0,0
+2024-07-04,2407.03300,https://github.com/gcorso/disco-diffdock,DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents,https://huggingface.co/papers/2407.03300,10,1,0,0,0,0
+2024-07-04,2407.02869,https://github.com/picoaudio/picoaudio,PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation,https://huggingface.co/papers/2407.02869,15,2,1,1,1,1
+2024-07-04,2407.02687,,"No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models",https://huggingface.co/papers/2407.02687,20,1,0,0,0,0
+2024-07-04,2407.03320,https://github.com/internlm/internlm-xcomposer,InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output,https://huggingface.co/papers/2407.03320,87,5,1,1,0,1
+2024-07-04,2407.02392,https://github.com/circleradon/tokenpacker,TokenPacker: Efficient Visual Projector for Multimodal LLM,https://huggingface.co/papers/2407.02392,20,3,1,0,0,0
+2024-07-03,2407.01791,https://github.com/Ale9806/eVLLM,μ-Bench: A Vision-Language Benchmark for Microscopy Understanding,https://huggingface.co/papers/2407.01791,5,1,0,0,0,0
+2024-07-03,2407.02489,,Magic Insert: Style-Aware Drag-and-Drop,https://huggingface.co/papers/2407.02489,17,1,0,0,0,0
+2024-07-03,2407.02477,,Understanding Alignment in Multimodal LLMs: A Comprehensive Study,https://huggingface.co/papers/2407.02477,19,2,0,0,0,0
+2024-07-03,2406.19238,https://github.com/copenlu/llm-pct-tropes,Revealing Fine-Grained Values and Opinions in Large Language Models,https://huggingface.co/papers/2406.19238,13,1,1,0,1,0
+2024-07-03,2407.01920,https://github.com/zjunlp/knowundo,To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models,https://huggingface.co/papers/2407.01920,13,3,0,0,1,0
+2024-07-03,2407.02371,,OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation,https://huggingface.co/papers/2407.02371,47,4,0,1,1,0
+2024-07-03,2406.19568,,What Matters in Detecting AI-Generated Videos like Sora?,https://huggingface.co/papers/2406.19568,13,2,0,0,0,0
+2024-07-03,2407.01370,https://github.com/salesforce/summary-of-a-haystack,Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems,https://huggingface.co/papers/2407.01370,81,6,1,0,1,0
+2024-07-03,2407.01489,https://github.com/OpenAutoCoder/Agentless,Agentless: Demystifying LLM-based Software Engineering Agents,https://huggingface.co/papers/2407.01489,41,7,0,0,0,0
+2024-07-03,2407.02490,https://github.com/microsoft/MInference,MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention,https://huggingface.co/papers/2407.02490,23,3,1,0,0,1
+2024-07-03,2407.02398,https://github.com/yangling0818/consistency_flow_matching,Consistency Flow Matching: Defining Straight Flows with Velocity Consistency,https://huggingface.co/papers/2407.02398,14,2,0,0,0,0
+2024-07-03,2407.01494,https://github.com/open-mmlab/foleycrafter,FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds,https://huggingface.co/papers/2407.01494,10,2,1,0,0,2
+2024-07-02,2406.19999,https://github.com/shin-ee-chen/SIFo,The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models,https://huggingface.co/papers/2406.19999,3,1,0,0,0,0
+2024-07-02,2406.20087,,ProgressGym: Alignment with a Millennium of Moral Progress,https://huggingface.co/papers/2406.20087,3,2,0,36,3,0
+2024-07-02,2406.18284,,RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network,https://huggingface.co/papers/2406.18284,16,2,0,0,0,0
+2024-07-02,2406.20086,,Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs,https://huggingface.co/papers/2406.20086,3,2,0,1,0,0
+2024-07-02,2407.00402,,Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP,https://huggingface.co/papers/2407.00402,22,1,0,0,0,0
+2024-07-02,2407.00088,https://github.com/microsoft/t-mac,T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge,https://huggingface.co/papers/2407.00088,7,1,1,0,0,0
+2024-07-02,2407.00111,,Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models,https://huggingface.co/papers/2407.00111,5,2,0,0,0,0
+2024-07-02,2406.19741,https://github.com/huawei-noah/hebo,ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning,https://huggingface.co/papers/2406.19741,56,3,0,0,0,0
+2024-07-02,2406.20085,,Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language,https://huggingface.co/papers/2406.20085,9,1,0,0,0,0
+2024-07-02,2407.01272,,"Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER",https://huggingface.co/papers/2407.01272,8,1,0,1,0,0
+2024-07-02,2407.00788,,InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation,https://huggingface.co/papers/2407.00788,20,3,0,0,0,0
+2024-07-02,2407.00106,,UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI,https://huggingface.co/papers/2407.00106,5,1,0,0,0,0
+2024-07-02,2407.00653,,Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs,https://huggingface.co/papers/2407.00653,11,2,0,0,0,0
+2024-07-02,2406.19997,,Wavelets Are All You Need for Autoregressive Image Generation,https://huggingface.co/papers/2406.19997,27,2,0,0,0,0
+2024-07-02,2407.00114,,OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents,https://huggingface.co/papers/2407.00114,12,4,0,0,0,0
+2024-07-02,2407.01470,,DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging,https://huggingface.co/papers/2407.01470,5,1,0,0,0,0
+2024-07-02,2407.01449,,ColPali: Efficient Document Retrieval with Vision Language Models,https://huggingface.co/papers/2407.01449,29,1,0,6,5,2
+2024-07-02,2407.00837,,Towards Robust Speech Representation Learning for Thousands of Languages,https://huggingface.co/papers/2407.00837,9,1,0,1,3,0
+2024-07-02,2407.00367,,SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix,https://huggingface.co/papers/2407.00367,9,1,0,0,0,0
+2024-07-02,2407.01492,https://github.com/sail-sg/regmix,RegMix: Data Mixture as Regression for Language Model Pre-training,https://huggingface.co/papers/2407.01492,30,4,1,5,2,1
+2024-07-02,2407.00782,https://github.com/mathllm/Step-Controlled_DPO,Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning,https://huggingface.co/papers/2407.00782,21,3,1,0,0,0
+2024-07-02,2407.01519,,DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models,https://huggingface.co/papers/2407.01519,22,2,0,0,0,1
+2024-07-02,2407.01231,,MIRAI: Evaluating LLM Agents for Event Forecasting,https://huggingface.co/papers/2407.01231,15,2,0,0,0,0
+2024-07-02,2407.00320,,LiteSearch: Efficacious Tree Search for LLM,https://huggingface.co/papers/2407.00320,37,5,0,0,0,0
+2024-07-02,2406.18009,,E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS,https://huggingface.co/papers/2406.18009,18,3,0,0,0,0
+2024-07-02,2407.00468,https://github.com/chenllliang/mmevalpro,MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation,https://huggingface.co/papers/2407.00468,35,2,1,0,1,0
+2024-07-02,2407.01284,https://github.com/we-math/we-math,We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical
Reasoning?,https://huggingface.co/papers/2407.01284,72,4,1,0,0,0 +2024-07-01,2407.00617,,Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning,https://huggingface.co/papers/2407.00617,7,1,0,0,0,0 +2024-07-01,2406.16845,,RaTEScore: A Metric for Radiology Report Generation,https://huggingface.co/papers/2406.16845,4,1,0,0,2,0 +2024-07-01,2406.17720,https://github.com/baskargroup/Arboretum,Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity,https://huggingface.co/papers/2406.17720,7,1,1,0,1,0 +2024-07-01,2406.20095,https://github.com/lostxine/llara,LLaRA: Supercharging Robot Learning Data for Vision-Language Policy,https://huggingface.co/papers/2406.20095,17,1,1,4,0,0 +2024-07-01,2406.19320,https://github.com/vmicheli/delta-iris,Efficient World Models with Context-Aware Tokenization,https://huggingface.co/papers/2406.19320,7,1,1,1,0,0 +2024-07-01,2406.18462,,GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality,https://huggingface.co/papers/2406.18462,11,3,0,0,0,0 +2024-07-01,2406.20076,https://github.com/hustvl/evf-sam,EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model,https://huggingface.co/papers/2406.20076,7,3,1,0,0,0 +2024-07-01,2406.19774,,Direct Preference Knowledge Distillation for Large Language Models,https://huggingface.co/papers/2406.19774,21,1,0,0,0,0 +2024-07-01,2406.19251,,AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation,https://huggingface.co/papers/2406.19251,8,1,0,0,0,0 +2024-07-01,2406.19280,https://github.com/freedomintelligence/huatuogpt-vision,"HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale",https://huggingface.co/papers/2406.19280,55,6,1,2,2,0 +2024-07-01,2406.20094,https://github.com/tencent-ailab/persona-hub,"Scaling Synthetic Data Creation with 1,000,000,000 Personas",https://huggingface.co/papers/2406.20094,85,5,1,0,1,0 
+2024-06-28,2406.10900,,AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models,https://huggingface.co/papers/2406.10900,11,2,0,0,0,0 +2024-06-28,2406.19223,,T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings,https://huggingface.co/papers/2406.19223,8,3,0,0,0,0 +2024-06-28,2406.18676,https://github.com/dongguanting/dpa-rag,Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation,https://huggingface.co/papers/2406.18676,5,4,1,0,0,0 +2024-06-28,2406.18790,,MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data,https://huggingface.co/papers/2406.18790,32,3,0,0,0,0 +2024-06-28,2406.17513,,Benchmarking Mental State Representations in Language Models,https://huggingface.co/papers/2406.17513,3,1,0,0,0,0 +2024-06-28,2406.19395,https://github.com/MoSalama98/DSiRe,Dataset Size Recovery from LoRA Weights,https://huggingface.co/papers/2406.19395,17,4,1,0,1,0 +2024-06-28,2406.18125,https://github.com/noran-mohamed/Resume-Classification-Dataset,ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models,https://huggingface.co/papers/2406.18125,3,2,0,2,1,0 +2024-06-28,2406.18120,https://github.com/ahmedheakl/arazn-llm,ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs,https://huggingface.co/papers/2406.18120,5,3,1,13,2,0 +2024-06-28,2406.19314,https://github.com/livebench/livebench,"LiveBench: A Challenging, Contamination-Free LLM Benchmark",https://huggingface.co/papers/2406.19314,14,2,1,0,8,0 +2024-06-28,2406.14629,https://github.com/imagination-research/lbt,Can LLMs Learn by Teaching? 
A Preliminary Study,https://huggingface.co/papers/2406.14629,17,2,1,0,0,0 +2024-06-28,2406.08316,,Is Programming by Example solved by LLMs?,https://huggingface.co/papers/2406.08316,11,1,0,0,0,1 +2024-06-28,2406.14909,https://github.com/thu-nics/moa,MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression,https://huggingface.co/papers/2406.14909,12,2,1,0,0,0 +2024-06-28,2406.19389,,"OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding",https://huggingface.co/papers/2406.19389,51,4,0,0,0,0 +2024-06-28,2406.19226,,Simulating Classroom Education with LLM-Empowered Agents,https://huggingface.co/papers/2406.19226,28,3,0,0,0,0 +2024-06-28,2406.19263,https://github.com/eric-ai-lab/Screen-Point-and-Read,Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding,https://huggingface.co/papers/2406.19263,9,2,1,0,4,0 +2024-06-28,2406.18629,https://github.com/dvlab-research/step-dpo,Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs,https://huggingface.co/papers/2406.18629,37,2,1,7,1,0 +2024-06-28,2406.19227,,Aligning Teacher with Student Preferences for Tailored Training Data Generation,https://huggingface.co/papers/2406.19227,23,2,0,0,0,0 +2024-06-28,2406.19215,https://github.com/thu-keg/seakr,SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation,https://huggingface.co/papers/2406.19215,28,1,0,0,0,0 +2024-06-27,2406.15334,,Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning,https://huggingface.co/papers/2406.15334,8,1,0,0,0,0 +2024-06-27,2406.18510,,WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models,https://huggingface.co/papers/2406.18510,8,1,0,4,1,0 +2024-06-27,2406.18522,https://github.com/pku-yuangroup/chronomagic-bench,ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation,https://huggingface.co/papers/2406.18522,40,2,1,1,3,1 
+2024-06-27,2406.18532,https://github.com/aiwaves-cn/agents,Symbolic Learning Enables Self-Evolving Agents,https://huggingface.co/papers/2406.18532,10,1,0,0,0,0 +2024-06-27,2406.17294,https://github.com/hzq950419/math-llava,Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models,https://huggingface.co/papers/2406.17294,9,1,1,1,1,0 +2024-06-27,2406.16979,,Understanding and Diagnosing Deep Reinforcement Learning,https://huggingface.co/papers/2406.16979,8,1,0,0,0,0 +2024-06-27,2406.18530,,MatchTime: Towards Automatic Soccer Game Commentary Generation,https://huggingface.co/papers/2406.18530,11,2,0,0,1,0 +2024-06-27,2406.17565,,MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool,https://huggingface.co/papers/2406.17565,5,1,0,0,0,0 +2024-06-27,2406.18219,https://github.com/kamanphoebe/look-into-moes,A Closer Look into Mixture-of-Experts in Large Language Models,https://huggingface.co/papers/2406.18219,14,2,1,0,0,0 +2024-06-27,2406.16793,https://github.com/zyushun/adam-mini,Adam-mini: Use Fewer Learning Rates To Gain More,https://huggingface.co/papers/2406.16793,65,3,1,1,0,0 +2024-06-27,2406.18495,https://github.com/allenai/wildguard,"WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs",https://huggingface.co/papers/2406.18495,12,1,1,1,3,0 +2024-06-27,2406.16341,https://github.com/dustn1259/ehrcon,EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records,https://huggingface.co/papers/2406.16341,11,2,0,0,0,0 +2024-06-27,2406.18082,,Octo-planner: On-device Language Model for Planner-Action Agents,https://huggingface.co/papers/2406.18082,47,3,0,2,0,0 +2024-06-27,2406.18521,https://github.com/princeton-nlp/CharXiv,CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs,https://huggingface.co/papers/2406.18521,25,2,1,0,1,0 +2024-06-26,2406.18518,,APIGen: Automated Pipeline for Generating Verifiable 
and Diverse Function-Calling Datasets,https://huggingface.co/papers/2406.18518,22,1,0,6,1,1 +2024-06-26,2406.15279,,Cross-Modality Safety Alignment,https://huggingface.co/papers/2406.15279,3,1,0,0,1,0 +2024-06-26,2406.17055,https://github.com/theryanl/llm-rationality,Large Language Models Assume People are More Rational than We Really are,https://huggingface.co/papers/2406.17055,4,1,0,0,0,0 +2024-06-26,2406.17660,https://github.com/aashiqmuhamed/grass,Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients,https://huggingface.co/papers/2406.17660,5,1,0,0,0,0 +2024-06-26,2406.17419,https://github.com/mozerwang/loong,Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA,https://huggingface.co/papers/2406.17419,14,1,1,0,0,0 +2024-06-26,2406.17563,https://github.com/danielsc4/dynamic-activation-composition,Multi-property Steering of Large Language Models with Dynamic Activation Composition,https://huggingface.co/papers/2406.17563,4,1,1,0,0,0 +2024-06-26,2406.17774,,Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis,https://huggingface.co/papers/2406.17774,3,1,0,0,0,0 +2024-06-26,2406.13144,https://github.com/jiho283/simulator,DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents,https://huggingface.co/papers/2406.13144,11,1,0,0,1,0 +2024-06-26,2406.16678,https://github.com/segment-any-text/wtpsplit,"Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation",https://huggingface.co/papers/2406.16678,13,2,1,10,0,0 +2024-06-26,2406.16377,,"On the Transformations across Reward Model, Parameter Update, and In-Context Prompt",https://huggingface.co/papers/2406.16377,11,1,0,0,0,0 +2024-06-26,2406.15339,,Image Conductor: Precision Control for Interactive Video Synthesis,https://huggingface.co/papers/2406.15339,8,3,0,1,0,1 +2024-06-26,2406.17636,,Aligning Diffusion Models with 
Noise-Conditioned Perception,https://huggingface.co/papers/2406.17636,26,1,0,1,0,0 +2024-06-26,2406.17758,,MotionBooth: Motion-Aware Customized Text-to-Video Generation,https://huggingface.co/papers/2406.17758,18,1,0,1,1,0 +2024-06-26,2406.17557,,The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale,https://huggingface.co/papers/2406.17557,75,3,0,0,1,3 +2024-06-26,2406.16863,https://github.com/arthur-qiu/freetraj,FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models,https://huggingface.co/papers/2406.16863,10,3,1,0,0,1 +2024-06-26,2406.17763,https://github.com/jhhuangchloe/DiffusionPDE,DiffusionPDE: Generative PDE-Solving Under Partial Observation,https://huggingface.co/papers/2406.17763,23,1,0,0,0,0 +2024-06-26,2406.17588,,LongIns: A Challenging Long-context Instruction-based Exam for LLMs,https://huggingface.co/papers/2406.17588,19,1,0,0,0,0 +2024-06-26,2406.17770,https://github.com/phoenixz810/mg-llava,MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning,https://huggingface.co/papers/2406.17770,18,1,1,1,0,0 +2024-06-26,2406.17245,https://github.com/wenyudu/migu,Unlocking Continual Learning Abilities in Language Models,https://huggingface.co/papers/2406.17245,28,1,1,0,0,0 +2024-06-26,2406.16273,,YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals,https://huggingface.co/papers/2406.16273,40,1,0,0,0,0 +2024-06-25,2406.16683,https://github.com/nzilberstein/Repulsive-score-distillation-RSD-,Repulsive Score Distillation for Diverse Sampling of Diffusion Models,https://huggingface.co/papers/2406.16683,4,2,1,0,0,0 +2024-06-25,2406.16008,,Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization,https://huggingface.co/papers/2406.16008,6,1,0,0,0,0 +2024-06-25,2406.13632,,Can Few-shot Work in Long-Context? 
Recycling the Context to Generate Demonstrations,https://huggingface.co/papers/2406.13632,5,1,0,0,0,0 +2024-06-25,2406.16254,,Confidence Regulation Neurons in Language Models,https://huggingface.co/papers/2406.16254,10,1,0,0,0,0 +2024-06-25,2406.15718,https://github.com/thunlp/duplex-model,Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models,https://huggingface.co/papers/2406.15718,14,2,1,1,1,0 +2024-06-25,2406.16747,,Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers,https://huggingface.co/papers/2406.16747,16,1,0,0,0,0 +2024-06-25,2406.16714,https://github.com/thu-coai/autodetect,AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models,https://huggingface.co/papers/2406.16714,10,2,1,0,0,0 +2024-06-25,2406.14051,,How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics,https://huggingface.co/papers/2406.14051,9,1,0,0,0,0 +2024-06-25,2406.16048,,Evaluating D-MERIT of Partial-annotation on Information Retrieval,https://huggingface.co/papers/2406.16048,34,2,0,0,0,0 +2024-06-25,2406.16772,https://github.com/gair-nlp/olympicarena,OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?,https://huggingface.co/papers/2406.16772,2,2,1,0,0,0 +2024-06-25,2406.16860,https://github.com/cambrian-mllm/cambrian,"Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs",https://huggingface.co/papers/2406.16860,52,4,1,4,3,0 +2024-06-25,2406.15927,,Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs,https://huggingface.co/papers/2406.15927,13,1,0,0,0,0 +2024-06-25,2406.15704,https://github.com/bytedance/salmonn,video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models,https://huggingface.co/papers/2406.15704,5,1,1,1,0,1 +2024-06-25,2406.16815,,ClotheDreamer: Text-Guided Garment Generation with 3D 
Gaussians,https://huggingface.co/papers/2406.16815,7,1,0,0,0,0 +2024-06-25,2406.14540,,IRASim: Learning Interactive Real-Robot Action Simulators,https://huggingface.co/papers/2406.14540,6,1,0,0,0,0 +2024-06-25,2406.16768,,WARP: On the Benefits of Weight Averaged Rewarded Policies,https://huggingface.co/papers/2406.16768,21,1,0,0,0,0 +2024-06-25,2406.16690,https://github.com/opennlplab/scalinglaws,Scaling Laws for Linear Complexity Language Models,https://huggingface.co/papers/2406.16690,21,2,0,0,0,0 +2024-06-25,2406.16235,https://github.com/batsresearch/cross-lingual-detox,Preference Tuning For Toxicity Mitigation Generalizes Across Languages,https://huggingface.co/papers/2406.16235,12,1,1,6,0,0 +2024-06-25,2406.16852,https://github.com/evolvinglmms-lab/longva,Long Context Transfer from Language to Vision,https://huggingface.co/papers/2406.16852,32,2,1,2,0,0 +2024-06-25,2406.14833,,Efficient Continual Pre-training by Mitigating the Stability Gap,https://huggingface.co/papers/2406.14833,19,1,0,0,0,0 +2024-06-25,2406.16855,https://github.com/yuangpeng/dreambench_plus,DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation,https://huggingface.co/papers/2406.16855,53,3,1,0,0,0 +2024-06-25,2406.16758,https://github.com/Kthyeon/Multilingual-SpecBench,Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters,https://huggingface.co/papers/2406.16758,18,2,0,0,0,0 +2024-06-25,2406.16338,,VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models,https://huggingface.co/papers/2406.16338,23,2,0,0,1,0 +2024-06-25,2406.16260,,Video-Infinity: Distributed Long Video Generation,https://huggingface.co/papers/2406.16260,28,2,0,0,0,0 +2024-06-24,2406.15877,https://github.com/bigcode-project/bigcodebench-annotation,BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions,https://huggingface.co/papers/2406.15877,43,5,0,0,4,1 +2024-06-24,2406.14596,,ICAL: Continual 
Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights,https://huggingface.co/papers/2406.14596,4,2,0,0,0,0 +2024-06-24,2406.13236,https://github.com/shangdatalab/deep-contam,Data Contamination Can Cross Language Barriers,https://huggingface.co/papers/2406.13236,8,2,0,0,0,0 +2024-06-24,2406.11654,,Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming,https://huggingface.co/papers/2406.11654,6,1,0,0,0,0 +2024-06-24,2406.11617,https://github.com/declare-lab/della,DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling,https://huggingface.co/papers/2406.11617,7,1,0,0,0,0 +2024-06-24,2406.13527,,4K4DGen: Panoramic 4D Generation at 4K Resolution,https://huggingface.co/papers/2406.13527,7,1,0,0,0,0 +2024-06-24,2406.14783,,Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework,https://huggingface.co/papers/2406.14783,15,2,0,0,0,0 +2024-06-24,2406.14764,,RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation,https://huggingface.co/papers/2406.14764,4,1,0,0,0,0 +2024-06-24,2406.14213,,Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task,https://huggingface.co/papers/2406.14213,20,3,0,0,0,0 +2024-06-24,2406.12564,https://github.com/vityavitalich/meritfed,Low-Resource Machine Translation through the Lens of Personalized Federated Learning,https://huggingface.co/papers/2406.12564,3,1,0,0,0,0 +2024-06-24,2406.13393,,Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images,https://huggingface.co/papers/2406.13393,5,1,0,0,0,0 +2024-06-24,2406.14938,,Towards Retrieval Augmented Generation over Large Video Libraries,https://huggingface.co/papers/2406.14938,18,1,0,0,0,0 +2024-06-24,2406.14972,https://github.com/florin-git/Base-vs-Instruct-LLMs-in-RAG-Systems,A Tale of Trust and Accuracy: Base vs. 
Instruct LLMs in RAG Systems,https://huggingface.co/papers/2406.14972,6,1,1,0,1,0 +2024-06-24,2406.13457,https://github.com/dachunkai/evtexture,EvTexture: Event-driven Texture Enhancement for Video Super-Resolution,https://huggingface.co/papers/2406.13457,15,1,0,0,0,0 +2024-06-24,2406.15349,https://github.com/autonomousvision/navsim,NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking,https://huggingface.co/papers/2406.15349,5,1,0,0,0,0 +2024-06-24,2406.11403,https://github.com/leloykun/mmfm-challenge,Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report,https://huggingface.co/papers/2406.11403,4,1,1,0,0,0 +2024-06-24,2406.12056,https://github.com/liugangcode/InfoAlign,Learning Molecular Representation in a Cell,https://huggingface.co/papers/2406.12056,6,1,1,1,1,0 +2024-06-24,2406.15275,,Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model,https://huggingface.co/papers/2406.15275,10,0,0,0,0,0 +2024-06-24,2406.14035,,Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models,https://huggingface.co/papers/2406.14035,10,1,0,0,0,0 +2024-06-24,2406.12624,https://github.com/UMass-Meta-LLM-Eval/llm_eval,Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges,https://huggingface.co/papers/2406.12624,35,2,1,0,0,0 +2024-06-24,2406.14599,,Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models,https://huggingface.co/papers/2406.14599,16,2,0,0,1,0 +2024-06-24,2406.15193,https://github.com/declare-lab/darwin,Reward Steering with Evolutionary Heuristics for Decoding-time Alignment,https://huggingface.co/papers/2406.15193,12,3,0,0,0,0 +2024-06-24,2406.15252,,MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation,https://huggingface.co/papers/2406.15252,14,1,0,2,2,1 +2024-06-24,2406.15319,,LongRAG: Enhancing Retrieval-Augmented Generation with 
Long-context LLMs,https://huggingface.co/papers/2406.15319,57,5,0,0,1,0 +2024-06-24,2406.14805,,How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions,https://huggingface.co/papers/2406.14805,3,1,0,0,0,0 +2024-06-24,2406.14835,,ToVo: Toxicity Taxonomy via Voting,https://huggingface.co/papers/2406.14835,3,1,0,1,0,0 +2024-06-24,2406.14393,https://github.com/zhxieml/remiss-jailbreak,Jailbreaking as a Reward Misspecification Problem,https://huggingface.co/papers/2406.14393,12,2,1,0,0,0 +2024-06-21,2406.13099,,Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models,https://huggingface.co/papers/2406.13099,4,1,0,0,0,0 +2024-06-21,2406.11289,,A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models,https://huggingface.co/papers/2406.11289,5,2,0,0,0,0 +2024-06-21,2406.13735,,StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images,https://huggingface.co/papers/2406.13735,5,1,0,0,0,0 +2024-06-21,2406.14563,,Model Merging and Safety Alignment: One Bad Model Spoils the Bunch,https://huggingface.co/papers/2406.14563,30,1,0,0,0,0 +2024-06-21,2406.12618,,From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP,https://huggingface.co/papers/2406.12618,5,1,0,0,0,0 +2024-06-21,2406.13663,https://github.com/betswish/mirage,Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation,https://huggingface.co/papers/2406.13663,7,1,0,0,0,0 +2024-06-21,2406.10601,https://github.com/airi-institute/stylefeatureeditor,The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing,https://huggingface.co/papers/2406.10601,65,2,1,1,0,1 +2024-06-21,2406.13621,https://github.com/guyyariv/vlmig,Improving Visual Commonsense in Language Models via Multiple Image Generation,https://huggingface.co/papers/2406.13621,13,2,1,0,0,0 
+2024-06-21,2406.13542,https://github.com/QwenLM/AutoIF,Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models,https://huggingface.co/papers/2406.13542,16,2,1,0,0,0 +2024-06-21,2406.14319,https://github.com/chuangtaochen-tum/livemind,LiveMind: Low-latency Large Language Models with Simultaneous Inference,https://huggingface.co/papers/2406.14319,14,2,1,0,0,0 +2024-06-21,2406.14539,,Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps,https://huggingface.co/papers/2406.14539,26,1,0,0,0,2 +2024-06-21,2406.11410,https://github.com/liteai-team/hare,"HARE: HumAn pRiors, a key to small language model Efficiency",https://huggingface.co/papers/2406.11410,38,1,1,2,0,0 +2024-06-21,2406.14347,,nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials,https://huggingface.co/papers/2406.14347,99,1,0,0,0,0 +2024-06-21,2406.12925,,GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks,https://huggingface.co/papers/2406.12925,20,2,0,1,0,7 +2024-06-21,2406.11817,,Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level,https://huggingface.co/papers/2406.11817,13,1,0,1,0,0 +2024-06-21,2406.14562,,Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities,https://huggingface.co/papers/2406.14562,27,1,0,0,0,0 +2024-06-21,2406.14515,https://github.com/open-compass/vlmevalkit,MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding,https://huggingface.co/papers/2406.14515,29,1,1,0,0,0 +2024-06-21,2406.14544,https://github.com/sparksjoe/prism,Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs,https://huggingface.co/papers/2406.14544,34,2,1,2,0,0 +2024-06-21,2406.13923,,PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents,https://huggingface.co/papers/2406.13923,21,1,0,0,1,0 
+2024-06-21,2406.12045,https://github.com/sierra-research/tau-bench,τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains,https://huggingface.co/papers/2406.12045,4,1,0,0,0,0 +2024-06-21,2406.14130,https://github.com/modelscope/DiffSynth-Studio,ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning,https://huggingface.co/papers/2406.14130,10,2,1,1,0,4 +2024-06-21,2406.11896,,DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning,https://huggingface.co/papers/2406.11896,18,1,0,0,0,0 +2024-06-21,2406.11927,https://github.com/FSoft-AI4Code/RepoExec,REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark,https://huggingface.co/papers/2406.11927,9,1,1,0,0,0 +2024-06-21,2406.14491,https://github.com/microsoft/lmops,Instruction Pre-Training: Language Models are Supervised Multitask Learners,https://huggingface.co/papers/2406.14491,76,8,0,15,3,1 +2024-06-20,2406.11715,,Measuring memorization in RLHF for code completion,https://huggingface.co/papers/2406.11715,6,1,0,0,0,0 +2024-06-20,2406.11614,https://github.com/yihuaihong/conceptvectors,Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces,https://huggingface.co/papers/2406.11614,3,2,1,0,1,0 +2024-06-20,2406.12209,https://github.com/atosystem/ssl_interface,Interface Design for Self-Supervised Speech Models,https://huggingface.co/papers/2406.12209,6,1,0,0,0,0 +2024-06-20,2406.11139,,Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance,https://huggingface.co/papers/2406.11139,12,1,0,0,0,0 +2024-06-20,2406.12034,,Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts,https://huggingface.co/papers/2406.12034,12,1,0,0,0,0 +2024-06-20,2406.12649,,Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models,https://huggingface.co/papers/2406.12649,15,1,0,0,0,0 
+2024-06-20,2406.11230,https://github.com/wang-ml-lab/multimodal-needle-in-a-haystack,Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models,https://huggingface.co/papers/2406.11230,34,1,0,0,0,0 +2024-06-20,2406.11612,https://github.com/jetbrains-research/lca-baselines,Long Code Arena: a Set of Benchmarks for Long-Context Code Models,https://huggingface.co/papers/2406.11612,20,1,1,0,6,1 +2024-06-19,2406.11431,https://github.com/keven980716/weak-to-strong-deception,Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization,https://huggingface.co/papers/2406.11431,4,2,0,0,0,0 +2024-06-19,2406.11909,https://github.com/wutaiqiang/moslora,Mixture-of-Subspaces in Low-Rank Adaptation,https://huggingface.co/papers/2406.11909,3,1,0,0,0,0 +2024-06-19,2406.12303,,Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment,https://huggingface.co/papers/2406.12303,4,1,0,0,0,0 +2024-06-19,2406.12042,https://github.com/rezashkv/diffusion_pruning,Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models,https://huggingface.co/papers/2406.12042,8,1,1,1,0,0 +2024-06-19,2406.12673,,Estimating Knowledge in Large Language Models Without Generating a Single Token,https://huggingface.co/papers/2406.12673,7,1,0,0,0,0 +2024-06-19,2406.11801,https://github.com/declare-lab/safety-arithmetic,Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations,https://huggingface.co/papers/2406.11801,15,1,0,0,1,0 +2024-06-19,2406.12274,,SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models,https://huggingface.co/papers/2406.12274,13,2,0,0,1,0 +2024-06-19,2406.12050,https://github.com/ytyz1307zzh/RefAug,Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning,https://huggingface.co/papers/2406.12050,10,1,0,0,0,0 +2024-06-19,2406.12831,,VIA: A 
Spatiotemporal Video Adaptation Framework for Global and Local Video Editing,https://huggingface.co/papers/2406.12831,5,1,0,0,0,0 +2024-06-19,2406.12814,https://github.com/chenwu98/agent-attack,Adversarial Attacks on Multimodal Agents,https://huggingface.co/papers/2406.12814,4,1,0,0,0,0 +2024-06-19,2406.11939,https://github.com/lm-sys/arena-hard-auto,From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline,https://huggingface.co/papers/2406.11939,5,1,1,0,0,0 +2024-06-19,2406.12849,,Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation,https://huggingface.co/papers/2406.12849,48,2,0,0,0,2 +2024-06-19,2406.11912,https://github.com/fsoft-ai4code/agilecoder,AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology,https://huggingface.co/papers/2406.11912,25,2,0,0,0,0 +2024-06-19,2406.12031,https://github.com/mlfoundations/tabliblib,Large Scale Transfer Learning for Tabular Data via Language Modeling,https://huggingface.co/papers/2406.12031,8,1,1,1,2,0 +2024-06-19,2406.12459,https://github.com/humansplat/humansplat.github.io,HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors,https://huggingface.co/papers/2406.12459,11,1,0,0,0,0 +2024-06-19,2406.12311,,Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models,https://huggingface.co/papers/2406.12311,7,1,0,0,0,0 +2024-06-19,2406.11687,,Tokenization Falling Short: The Curse of Tokenization,https://huggingface.co/papers/2406.11687,13,1,0,0,0,0 +2024-06-19,2406.12742,https://github.com/dtennant/mirb_eval,"Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning",https://huggingface.co/papers/2406.12742,14,3,1,0,1,0 +2024-06-19,2406.09760,https://github.com/sail-sg/dice,Bootstrapping Language Models with DPO Implicit 
Rewards,https://huggingface.co/papers/2406.09760,37,1,1,4,0,0 +2024-06-19,2406.12168,,BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM,https://huggingface.co/papers/2406.12168,7,1,0,0,0,0 +2024-06-19,2406.12292,,JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning,https://huggingface.co/papers/2406.12292,4,2,0,0,0,0 +2024-06-19,2406.12644,https://github.com/devichand579/HPT,Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models,https://huggingface.co/papers/2406.12644,4,1,1,0,0,0 +2024-06-19,2406.12824,,From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries,https://huggingface.co/papers/2406.12824,20,2,0,0,0,0 +2024-06-19,2406.12793,https://github.com/thudm/chatglm-6b,ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools,https://huggingface.co/papers/2406.12793,30,2,1,4,0,20 +2024-06-19,2406.11931,https://github.com/deepseek-ai/deepseek-coder-v2,DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence,https://huggingface.co/papers/2406.11931,54,2,1,0,0,0 +2024-06-19,2406.12246,https://github.com/byungkwanlee/trol,TroL: Traversal of Layers for Large Language and Vision Models,https://huggingface.co/papers/2406.12246,34,2,1,3,0,1 +2024-06-19,2406.12753,https://github.com/gair-nlp/olympicarena,OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI,https://huggingface.co/papers/2406.12753,14,2,1,0,1,1 +2024-06-19,2406.12275,https://github.com/Yxxxb/VoCo-LLaMA,VoCo-LLaMA: Towards Vision Compression with Large Language Models,https://huggingface.co/papers/2406.12275,29,2,1,0,0,0 +2024-06-19,2406.11811,,RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content,https://huggingface.co/papers/2406.11811,15,1,0,0,1,0 
+2024-06-19,2406.12066,https://github.com/bittermanlab/rabbits,Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks,https://huggingface.co/papers/2406.12066,8,1,1,0,1,1 +2024-06-18,2406.10023,https://github.com/luckeciano/bal-pm,Deep Bayesian Active Learning for Preference Modeling in Large Language Models,https://huggingface.co/papers/2406.10023,2,1,0,0,0,0 +2024-06-18,2406.10522,https://github.com/yguooo/cartoon-caption-generation,Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning,https://huggingface.co/papers/2406.10522,7,2,1,0,0,0 +2024-06-18,2406.11251,,Unifying Multimodal Retrieval via Document Screenshot Embedding,https://huggingface.co/papers/2406.11251,6,1,0,0,0,0 +2024-06-18,2406.11430,,A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression,https://huggingface.co/papers/2406.11430,22,2,0,0,0,0 +2024-06-18,2406.09455,,Pandora: Towards General World Model with Natural Language Actions and Video States,https://huggingface.co/papers/2406.09455,13,1,0,1,0,0 +2024-06-18,2406.11194,https://github.com/bigai-ai/ICE,In-Context Editing: Learning Knowledge from Self-Induced Distributions,https://huggingface.co/papers/2406.11194,15,1,1,0,1,0 +2024-06-18,2406.10803,,HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies,https://huggingface.co/papers/2406.10803,4,1,0,0,0,0 +2024-06-18,2406.11463,,Just How Flexible are Neural Networks in Practice?,https://huggingface.co/papers/2406.11463,6,1,0,0,0,0 +2024-06-18,2406.11202,https://github.com/kongdai123/consistency2,Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models,https://huggingface.co/papers/2406.11202,3,1,0,0,0,0 +2024-06-18,2406.11831,,Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models,https://huggingface.co/papers/2406.11831,19,1,0,0,0,0 +2024-06-18,2406.11833,https://github.com/liuziyu77/mmdu,MMDU: A 
Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs,https://huggingface.co/papers/2406.11833,61,4,1,0,1,0 +2024-06-18,2406.10670,https://github.com/davidbrandfonbrener/color-filter-olmo,CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training,https://huggingface.co/papers/2406.10670,4,1,1,0,1,0 +2024-06-18,2406.11813,,How Do Large Language Models Acquire Factual Knowledge During Pretraining?,https://huggingface.co/papers/2406.11813,29,1,0,0,0,0 +2024-06-18,2406.11840,,LLaNA: Large Language and NeRF Assistant,https://huggingface.co/papers/2406.11840,17,2,0,0,0,0 +2024-06-18,2406.10906,https://gitlab.com/bachstelze/causal_generation,Breaking the Attention Bottleneck,https://huggingface.co/papers/2406.10906,4,2,0,0,0,0 +2024-06-18,2406.11271,https://github.com/mlfoundations/mint-1t,MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens,https://huggingface.co/papers/2406.11271,10,1,0,0,0,0 +2024-06-18,2406.11402,https://github.com/neelabhsinha/lm-application-eval-kit,"Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis",https://huggingface.co/papers/2406.11402,6,1,1,0,0,0 +2024-06-18,2406.10328,,From Pixels to Prose: A Large Dataset of Dense Image Captions,https://huggingface.co/papers/2406.10328,16,2,0,0,1,0 +2024-06-18,2406.11775,https://github.com/jieyuz2/taskmeanything,Task Me Anything,https://huggingface.co/papers/2406.11775,7,1,1,0,0,0 +2024-06-18,2406.11827,https://github.com/wzhouad/wpo,WPO: Enhancing RLHF with Weighted Preference Optimization,https://huggingface.co/papers/2406.11827,13,1,0,0,0,0 +2024-06-18,2406.11816,,VideoLLM-online: Online Video Large Language Model for Streaming Video,https://huggingface.co/papers/2406.11816,20,1,0,1,1,0 +2024-06-18,2406.11069,,WildVision: Evaluating Vision-Language Models in the Wild with Human 
Preferences,https://huggingface.co/papers/2406.11069,12,2,0,0,1,1 +2024-06-18,2406.10996,,THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation,https://huggingface.co/papers/2406.10996,31,1,0,0,0,0 +2024-06-18,2406.11196,https://github.com/rishab-partha/Vid3D,Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion,https://huggingface.co/papers/2406.11196,8,1,1,0,0,0 +2024-06-18,2406.10163,https://github.com/buaacyw/meshanything,MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers,https://huggingface.co/papers/2406.10163,30,2,1,1,0,4 +2024-06-18,2406.10324,,L4GM: Large 4D Gaussian Reconstruction Model,https://huggingface.co/papers/2406.10324,12,1,0,0,0,0 +2024-06-18,2406.11839,,mDPO: Conditional Preference Optimization for Multimodal Large Language Models,https://huggingface.co/papers/2406.11839,36,1,0,0,0,0 +2024-06-18,2406.11794,,DataComp-LM: In search of the next generation of training sets for language models,https://huggingface.co/papers/2406.11794,44,3,0,5,2,0 +2024-06-18,2406.11768,,GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities,https://huggingface.co/papers/2406.11768,20,1,0,0,0,2 +2024-06-17,2406.10209,https://github.com/ahans30/goldfish-loss,"Be like a Goldfish, Don't Memorize! 
Mitigating Memorization in Generative LLMs",https://huggingface.co/papers/2406.10209,8,1,0,0,0,0 +2024-06-17,2406.08545,https://github.com/NVlabs/RVT,RVT-2: Learning Precise Manipulation from Few Demonstrations,https://huggingface.co/papers/2406.08545,7,1,0,0,0,0 +2024-06-17,2406.08920,,AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis,https://huggingface.co/papers/2406.08920,7,1,0,0,0,0 +2024-06-17,2406.08659,,Vivid-ZOO: Multi-View Video Generation with Diffusion Model,https://huggingface.co/papers/2406.08659,8,3,0,0,0,0 +2024-06-17,2406.10227,,VideoGUI: A Benchmark for GUI Automation from Instructional Videos,https://huggingface.co/papers/2406.10227,8,1,0,0,0,0 +2024-06-17,2406.10126,,Training-free Camera Control for Video Generation,https://huggingface.co/papers/2406.10126,12,2,0,0,0,0 +2024-06-17,2406.10118,https://github.com/SEACrowd/seacrowd-datahub,SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages,https://huggingface.co/papers/2406.10118,25,1,0,1,100,0 +2024-06-17,2406.10210,,Make It Count: Text-to-Image Generation with an Accurate Number of Objects,https://huggingface.co/papers/2406.10210,75,2,0,0,0,0 +2024-06-17,2406.10149,https://github.com/booydar/babilong,BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack,https://huggingface.co/papers/2406.10149,47,4,1,0,2,1 +2024-06-17,2406.09900,,GEB-1.3B: Open Lightweight Large Language Model,https://huggingface.co/papers/2406.09900,18,2,0,0,0,0 +2024-06-17,2406.10111,,GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors,https://huggingface.co/papers/2406.10111,6,1,0,0,0,0 +2024-06-17,2406.06263,https://github.com/cisnlp/masklid,MaskLID: Code-Switching Language Identification through Iterative Masking,https://huggingface.co/papers/2406.06263,5,1,1,0,1,1 +2024-06-17,2406.08973,,XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement 
Learning,https://huggingface.co/papers/2406.08973,85,1,0,0,0,0 +2024-06-17,2406.07882,https://github.com/yc015/talktuner-chatbot-llm-dashboard,Designing a Dashboard for Transparency and Control of Conversational AI,https://huggingface.co/papers/2406.07882,9,2,0,0,0,0 +2024-06-17,2406.08845,https://github.com/ztlmememe/T2VHE,"Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality",https://huggingface.co/papers/2406.08845,8,1,0,0,0,0 +2024-06-17,2406.09961,https://github.com/chartmimic/chartmimic,ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation,https://huggingface.co/papers/2406.09961,54,2,1,0,1,0 +2024-06-17,2406.08451,https://github.com/opengvlab/gui-odyssey,GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices,https://huggingface.co/papers/2406.08451,23,1,1,0,1,0 +2024-06-17,2406.07230,https://github.com/opengvlab/mm-niah,Needle In A Multimodal Haystack,https://huggingface.co/papers/2406.07230,52,1,1,0,1,0 +2024-06-17,2406.08418,https://github.com/opengvlab/omnicorpus,OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text,https://huggingface.co/papers/2406.08418,28,2,0,0,0,0 +2024-06-17,2406.09559,,Decoding the Diversity: A Review of the Indic AI Research Landscape,https://huggingface.co/papers/2406.09559,5,1,0,0,0,0 +2024-06-17,2406.10208,,Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering,https://huggingface.co/papers/2406.10208,21,2,0,0,0,2 +2024-06-14,2406.09406,,4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities,https://huggingface.co/papers/2406.09406,12,2,0,25,0,6 +2024-06-14,2406.09358,https://github.com/locuslab/diffusion-model-hallucination,Understanding Hallucinations in Diffusion Models through Mode Interpolation,https://huggingface.co/papers/2406.09358,4,1,0,0,0,0 
+2024-06-14,2406.09356,https://github.com/q-future/cmc-bench,CMC-Bench: Towards a New Paradigm of Visual Signal Compression,https://huggingface.co/papers/2406.09356,4,2,1,0,0,0 +2024-06-14,2406.09297,https://github.com/zaydzuhri/pythia-mlkv,MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding,https://huggingface.co/papers/2406.09297,4,2,1,0,0,0 +2024-06-14,2406.07457,,Estimating the Hallucination Rate of Generative AI,https://huggingface.co/papers/2406.07457,6,1,0,0,0,0 +2024-06-14,2406.05967,,CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark,https://huggingface.co/papers/2406.05967,5,1,0,0,0,0 +2024-06-14,2406.08707,,mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus,https://huggingface.co/papers/2406.08707,14,2,0,0,2,0 +2024-06-14,2406.09162,,EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts,https://huggingface.co/papers/2406.09162,13,3,0,0,0,0 +2024-06-14,2406.09305,,Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation,https://huggingface.co/papers/2406.09305,4,1,0,0,0,0 +2024-06-14,2406.09411,,MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding,https://huggingface.co/papers/2406.09411,18,2,0,0,1,0 +2024-06-14,2406.07546,,Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?,https://huggingface.co/papers/2406.07546,8,1,0,0,1,0 +2024-06-14,2406.09371,https://github.com/desaixie/zeroverse,LRM-Zero: Training Large Reconstruction Models with Synthesized Data,https://huggingface.co/papers/2406.09371,3,1,0,0,0,0 +2024-06-14,2406.08598,,Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus,https://huggingface.co/papers/2406.08598,5,1,0,0,1,1 +2024-06-14,2406.07522,https://github.com/sustcsonglin/flash-linear-attention,Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language 
Modeling,https://huggingface.co/papers/2406.07522,35,2,1,0,0,0 +2024-06-14,2406.08479,,Real3D: Scaling Up Large Reconstruction Models with Real-World Images,https://huggingface.co/papers/2406.08479,6,1,0,1,0,2 +2024-06-14,2406.09170,,Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning,https://huggingface.co/papers/2406.09170,24,1,0,0,1,0 +2024-06-14,2406.08587,https://github.com/csbench/csbench,CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery,https://huggingface.co/papers/2406.08587,15,3,1,0,0,0 +2024-06-14,2406.09403,,Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models,https://huggingface.co/papers/2406.09403,18,1,0,0,0,0 +2024-06-14,2406.08656,https://github.com/weixi-feng/tc-bench,TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation,https://huggingface.co/papers/2406.08656,7,1,0,0,0,0 +2024-06-14,2406.09416,,Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models,https://huggingface.co/papers/2406.09416,28,1,0,0,0,0 +2024-06-14,2406.09413,https://github.com/snap-research/weights2weights,Interpreting the Weight Space of Customized Diffusion Models,https://huggingface.co/papers/2406.09413,18,1,1,0,0,0 +2024-06-14,2406.08657,,Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs,https://huggingface.co/papers/2406.08657,9,2,0,1,0,0 +2024-06-14,2406.09246,,OpenVLA: An Open-Source Vision-Language-Action Model,https://huggingface.co/papers/2406.09246,30,1,0,2,0,0 +2024-06-14,2406.09308,,Transformers meet Neural Algorithmic Reasoners,https://huggingface.co/papers/2406.09308,43,1,0,0,0,0 +2024-06-14,2406.09412,https://github.com/invictus717/MiCo,Explore the Limits of Omni-modal Pretraining at Scale,https://huggingface.co/papers/2406.09412,10,3,1,0,0,0 +2024-06-14,2406.08673,https://github.com/nvidia/nemo-aligner,HelpSteer2: Open-source dataset for 
training top-performing reward models,https://huggingface.co/papers/2406.08673,14,3,1,10,5,4 +2024-06-14,2406.08862,,Cognitively Inspired Energy-Based World Models,https://huggingface.co/papers/2406.08862,9,4,0,0,0,0 +2024-06-14,2406.09414,https://github.com/DepthAnything/Depth-Anything-V2,Depth Anything V2,https://huggingface.co/papers/2406.09414,88,5,1,5,1,5 +2024-06-14,2406.09415,,An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels,https://huggingface.co/papers/2406.09415,48,2,0,0,0,0 +2024-06-14,2406.08552,,DiTFastAttn: Attention Compression for Diffusion Transformer Models,https://huggingface.co/papers/2406.08552,21,1,0,0,0,0 +2024-06-13,2406.06462,https://github.com/tianyu-z/vcr,VCR: Visual Caption Restoration,https://huggingface.co/papers/2406.06462,10,1,1,0,16,0 +2024-06-13,2406.04329,,Simplified and Generalized Masked Diffusion for Discrete Data,https://huggingface.co/papers/2406.04329,4,0,0,0,0,0 +2024-06-13,2406.07933,,Large Language Model Unlearning via Embedding-Corrupted Prompts,https://huggingface.co/papers/2406.07933,6,0,0,0,0,0 +2024-06-13,2406.04127,https://github.com/aryopg/mmlu-redux,Are We Done with MMLU?,https://huggingface.co/papers/2406.04127,36,1,1,0,2,1 +2024-06-13,2406.05074,https://github.com/histai/hibou,Hibou: A Family of Foundational Vision Transformers for Pathology,https://huggingface.co/papers/2406.05074,6,1,1,1,0,0 +2024-06-13,2406.08487,https://github.com/yfzhang114/slime,Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models,https://huggingface.co/papers/2406.08487,10,2,1,2,1,0 +2024-06-13,2406.04338,,Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion,https://huggingface.co/papers/2406.04338,32,3,0,0,0,0 +2024-06-13,2406.04320,,Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models,https://huggingface.co/papers/2406.04320,7,1,0,0,0,0 +2024-06-13,2406.08414,https://github.com/luchris429/DiscoPOP,Discovering Preference 
Optimization Algorithms with and for Large Language Models,https://huggingface.co/papers/2406.08414,12,0,1,1,0,1 +2024-06-13,2406.06282,,PowerInfer-2: Fast Large Language Model Inference on a Smartphone,https://huggingface.co/papers/2406.06282,35,2,0,0,0,0 +2024-06-13,2406.05955,,Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters,https://huggingface.co/papers/2406.05955,21,1,0,4,0,0 +2024-06-13,2406.05338,https://github.com/bujiazi/motionclone,MotionClone: Training-Free Motion Cloning for Controllable Video Generation,https://huggingface.co/papers/2406.05338,39,3,1,0,0,0 +2024-06-13,2406.08392,,FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation,https://huggingface.co/papers/2406.08392,18,0,0,0,0,0 +2024-06-13,2406.05132,https://github.com/sled-group/3D-GRAND,3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination,https://huggingface.co/papers/2406.05132,27,1,1,0,0,2 +2024-06-13,2406.07476,https://github.com/damo-nlp-sg/videollama2,VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs,https://huggingface.co/papers/2406.07476,30,1,1,4,1,2 +2024-06-13,2406.06523,,NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing,https://huggingface.co/papers/2406.06523,48,2,0,0,0,3 +2024-06-13,2406.07686,,AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation,https://huggingface.co/papers/2406.07686,13,0,0,0,0,0 +2024-06-13,2406.07792,,Hierarchical Patch Diffusion Models for High-Resolution Video Generation,https://huggingface.co/papers/2406.07792,13,0,0,0,0,0 +2024-06-13,2406.08407,https://github.com/eric-ai-lab/mmworld,MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos,https://huggingface.co/papers/2406.08407,24,0,1,0,0,0 +2024-06-13,2406.08478,,What If We Recaption Billions of Web Images with 
LLaMA-3?,https://huggingface.co/papers/2406.08478,38,1,0,1,2,0 +2024-06-13,2406.08464,https://github.com/magpie-align/magpie,Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing,https://huggingface.co/papers/2406.08464,48,2,1,17,29,4 +2024-06-12,2406.07520,,Neural Gaffer: Relighting Any Object via Diffusion,https://huggingface.co/papers/2406.07520,4,2,0,0,0,0 +2024-06-12,2406.07188,https://github.com/vicgalle/merging-self-critique-jailbreaks,Merging Improves Self-Critique Against Jailbreak Attacks,https://huggingface.co/papers/2406.07188,3,0,1,0,0,0 +2024-06-12,2406.07436,,McEval: Massively Multilingual Code Evaluation,https://huggingface.co/papers/2406.07436,39,1,0,0,2,0 +2024-06-12,2406.05629,https://github.com/mhamilton723/DenseAV,"Separating the ""Chirp"" from the ""Chat"": Self-supervised Visual Grounding of Sound and Language",https://huggingface.co/papers/2406.05629,7,1,1,0,0,1 +2024-06-12,2406.06612,https://github.com/see2sound/see2sound,SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound,https://huggingface.co/papers/2406.06612,13,0,1,1,0,1 +2024-06-12,2406.06608,https://github.com/trigaten/Prompt_Systematic_Review,The Prompt Report: A Systematic Survey of Prompting Techniques,https://huggingface.co/papers/2406.06608,48,2,1,0,1,0 +2024-06-12,2406.06563,,Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models,https://huggingface.co/papers/2406.06563,17,3,0,2,0,0 +2024-06-12,2406.06573,,MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering,https://huggingface.co/papers/2406.06573,8,0,0,0,0,0 +2024-06-12,2406.06911,https://github.com/czg1225/asyncdiff,AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising,https://huggingface.co/papers/2406.06911,10,1,1,0,0,0 +2024-06-12,2406.07524,https://github.com/kuleshov-group/mdlm,Simple and Effective Masked Diffusion Language Models,https://huggingface.co/papers/2406.07524,7,2,1,2,0,0 
+2024-06-12,2406.07394,https://github.com/trotsky1997/mathblackbox,Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B,https://huggingface.co/papers/2406.07394,17,1,0,0,0,0 +2024-06-12,2406.07550,,An Image is Worth 32 Tokens for Reconstruction and Generation,https://huggingface.co/papers/2406.07550,54,11,0,1,0,1 +2024-06-12,2406.07496,https://github.com/zou-group/textgrad,"TextGrad: Automatic ""Differentiation"" via Text",https://huggingface.co/papers/2406.07496,25,0,0,0,0,0 +2024-06-12,2406.07547,,Zero-shot Image Editing with Reference Imitation,https://huggingface.co/papers/2406.07547,30,1,0,1,0,2 +2024-06-12,2406.07472,,4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models,https://huggingface.co/papers/2406.07472,10,2,0,0,0,0 +2024-06-12,2406.06592,,Improve Mathematical Reasoning in Language Models by Automated Process Supervision,https://huggingface.co/papers/2406.06592,17,0,0,0,0,0 +2024-06-11,2406.06424,,Margin-aware Preference Optimization for Aligning Diffusion Models without Reference,https://huggingface.co/papers/2406.06424,9,0,0,4,3,0 +2024-06-11,2406.05649,,GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement,https://huggingface.co/papers/2406.05649,7,0,0,0,0,0 +2024-06-11,2406.06133,,ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models,https://huggingface.co/papers/2406.06133,5,0,0,0,0,0 +2024-06-11,2406.06040,https://github.com/mutonix/vript,Vript: A Video Is Worth Thousands of Words,https://huggingface.co/papers/2406.06040,19,0,1,0,5,1 +2024-06-11,2406.06316,,Tx-LLM: A Large Language Model for Therapeutics,https://huggingface.co/papers/2406.06316,13,0,0,0,0,0 +2024-06-11,2406.05981,https://github.com/gatech-eic/shiftaddllm,ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization,https://huggingface.co/papers/2406.05981,10,0,1,0,0,0 
+2024-06-11,2406.06469,https://github.com/agent-husky/husky-v1,"Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning",https://huggingface.co/papers/2406.06469,22,2,1,0,0,0 +2024-06-11,2406.06216,https://github.com/srameo/le3d,Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis,https://huggingface.co/papers/2406.06216,16,4,0,0,0,0 +2024-06-11,2406.06527,,IllumiNeRF: 3D Relighting without Inverse Rendering,https://huggingface.co/papers/2406.06527,7,0,0,0,0,0 +2024-06-11,2406.05768,,MLCM: Multistep Consistency Distillation of Latent Diffusion Model,https://huggingface.co/papers/2406.05768,8,0,0,0,0,0 +2024-06-11,2406.05370,,VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers,https://huggingface.co/papers/2406.05370,12,0,0,0,1,0 +2024-06-11,2406.05814,,Unified Text-to-Image Generation and Retrieval,https://huggingface.co/papers/2406.05814,8,0,0,0,0,0 +2024-06-11,2406.06474,,Towards a Personal Health Large Language Model,https://huggingface.co/papers/2406.06474,15,0,0,0,0,0 +2024-06-11,2406.06525,https://github.com/foundationvision/llamagen,Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation,https://huggingface.co/papers/2406.06525,62,2,1,1,0,2 +2024-06-10,2406.04523,,Proofread: Fixes All Errors with One Tap,https://huggingface.co/papers/2406.04523,12,0,0,0,0,0 +2024-06-10,2406.04370,,Large Language Model Confidence Estimation via Black-Box Access,https://huggingface.co/papers/2406.04370,19,0,0,0,0,0 +2024-06-10,2406.04770,https://github.com/allenai/wildbench,WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild,https://huggingface.co/papers/2406.04770,24,1,1,0,1,3 +2024-06-10,2406.04391,,Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?,https://huggingface.co/papers/2406.04391,6,0,0,0,0,0 +2024-06-10,2406.04594,,Boosting Large-scale Parallel Training Efficiency with C4: A 
Communication-Driven Approach,https://huggingface.co/papers/2406.04594,4,0,0,0,0,0 +2024-06-10,2406.04520,,NATURAL PLAN: Benchmarking LLMs on Natural Language Planning,https://huggingface.co/papers/2406.04520,9,0,0,0,0,0 +2024-06-10,2406.04744,,CRAG -- Comprehensive RAG Benchmark,https://huggingface.co/papers/2406.04744,39,2,0,0,0,0 +2024-06-10,2406.04692,,Mixture-of-Agents Enhances Large Language Model Capabilities,https://huggingface.co/papers/2406.04692,50,3,0,0,0,2 +2024-06-10,2406.04485,,GenAI Arena: An Open Evaluation Platform for Generative Models,https://huggingface.co/papers/2406.04485,19,0,0,0,0,1 +2024-06-07,2406.01300,,pOps: Photo-Inspired Diffusion Operators,https://huggingface.co/papers/2406.01300,15,0,0,0,0,0 +2024-06-07,2406.04325,,ShareGPT4Video: Improving Video Understanding and Generation with Better Captions,https://huggingface.co/papers/2406.04325,69,4,0,2,3,4 +2024-06-07,2406.04314,,Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step,https://huggingface.co/papers/2406.04314,26,2,0,5,0,3 +2024-06-07,2406.04268,,Open-Endedness is Essential for Artificial Superhuman Intelligence,https://huggingface.co/papers/2406.04268,11,1,0,0,0,0 +2024-06-07,2406.04151,https://github.com/woooodyy/agentgym,AgentGym: Evolving Large Language Model-based Agents across Diverse Environments,https://huggingface.co/papers/2406.04151,14,1,1,1,0,0 +2024-06-07,2406.04271,https://github.com/yangling0818/buffer-of-thought-llm,Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models,https://huggingface.co/papers/2406.04271,27,1,0,0,0,0 +2024-06-07,2406.04333,https://github.com/huggingface/diffusers,BitsFusion: 1.99 bits Weight Quantization of Diffusion Model,https://huggingface.co/papers/2406.04333,36,2,1,0,0,0 +2024-06-07,2406.04277,https://github.com/yangling0818/videotetris,VideoTetris: Towards Compositional Text-to-Video Generation,https://huggingface.co/papers/2406.04277,21,1,1,0,0,0 
+2024-06-07,2406.04324,,SF-V: Single Forward Video Generation Model,https://huggingface.co/papers/2406.04324,22,2,0,0,0,0 +2024-06-06,2406.01014,https://github.com/x-plug/mobileagent,Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration,https://huggingface.co/papers/2406.01014,29,2,1,0,0,1 +2024-06-06,2406.03215,,Searching Priors Makes Text-to-Video Synthesis Better,https://huggingface.co/papers/2406.03215,11,2,0,0,0,0 +2024-06-06,2406.02884,https://github.com/posterllava/posterllava,PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM,https://huggingface.co/papers/2406.02884,13,2,1,0,0,0 +2024-06-06,2406.02886,,PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs,https://huggingface.co/papers/2406.02886,7,1,0,0,0,0 +2024-06-06,2406.02900,,Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms,https://huggingface.co/papers/2406.02900,10,0,0,0,0,0 +2024-06-06,2406.03344,https://github.com/mhamzaerol/audio-mamba-aum,Audio Mamba: Bidirectional State Space Model for Audio Representation Learning,https://huggingface.co/papers/2406.03344,16,1,1,0,0,0 +2024-06-06,2406.02897,,LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes,https://huggingface.co/papers/2406.02897,13,1,0,0,0,0 +2024-06-06,2406.02657,https://github.com/itsnamgyu/block-transformer,Block Transformer: Global-to-Local Language Modeling for Fast Inference,https://huggingface.co/papers/2406.02657,36,1,1,0,0,0 +2024-06-06,2406.02844,,Item-Language Model for Conversational Recommendation,https://huggingface.co/papers/2406.02844,8,1,0,0,0,0 +2024-06-06,2406.02856,https://github.com/xiaoduoailab/xmodellm,Xmodel-LM Technical Report,https://huggingface.co/papers/2406.02856,7,1,1,1,0,0 +2024-06-06,2406.02539,,Parrot: Multilingual Visual Instruction Tuning,https://huggingface.co/papers/2406.02539,35,2,0,0,0,0 
+2024-06-06,2406.03184,https://github.com/Costwen/Ouroboros3D,Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion,https://huggingface.co/papers/2406.03184,18,2,1,0,0,0 +2024-06-05,2406.02509,,CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation,https://huggingface.co/papers/2406.02509,8,4,0,0,0,0 +2024-06-05,2406.02511,,V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation,https://huggingface.co/papers/2406.02511,8,2,0,2,0,14 +2024-06-05,2406.02523,,RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots,https://huggingface.co/papers/2406.02523,8,1,0,0,0,0 +2024-06-05,2406.02507,,Guiding a Diffusion Model with a Bad Version of Itself,https://huggingface.co/papers/2406.02507,15,1,0,0,0,0 +2024-06-05,2406.02543,,To Believe or Not to Believe Your LLM,https://huggingface.co/papers/2406.02543,31,1,0,0,0,0 +2024-06-05,2406.01660,,Self-Improving Robust Preference Optimization,https://huggingface.co/papers/2406.01660,18,1,0,0,0,0 +2024-06-05,2406.02230,,I4VGen: Image as Stepping Stone for Text-to-Video Generation,https://huggingface.co/papers/2406.02230,15,3,0,0,0,0 +2024-06-05,2406.02430,https://github.com/BytedanceSpeech/seed-tts-eval,Seed-TTS: A Family of High-Quality Versatile Speech Generation Models,https://huggingface.co/papers/2406.02430,27,2,1,0,0,0 +2024-06-04,2406.00908,https://github.com/ssyang2020/ZeroSmooth,ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation,https://huggingface.co/papers/2406.00908,11,1,1,0,0,0 +2024-06-04,2406.00153,,μLO: Compute-Efficient Meta-Generalization of Learned Optimizers,https://huggingface.co/papers/2406.00153,9,0,0,0,0,0 +2024-06-04,2406.00392,,Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning,https://huggingface.co/papers/2406.00392,12,1,0,0,0,0 +2024-06-04,2406.00888,https://github.com/SALT-NLP/demonstrated-feedback,"Show, Don't Tell: Aligning Language Models with Demonstrated 
Feedback",https://huggingface.co/papers/2406.00888,29,1,1,0,0,0 +2024-06-04,2406.01493,,Learning Temporally Consistent Video Depth from Video Diffusion Priors,https://huggingface.co/papers/2406.01493,17,2,0,1,0,1 +2024-06-04,2406.01574,,MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark,https://huggingface.co/papers/2406.01574,42,3,0,0,4,1 +2024-06-03,2405.20541,,Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models,https://huggingface.co/papers/2405.20541,19,1,0,0,0,0 +2024-06-03,2405.20674,,4Diffusion: Multi-view Video Diffusion Model for 4D Generation,https://huggingface.co/papers/2405.20674,10,1,0,0,0,0 +2024-06-03,2405.21048,,Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling,https://huggingface.co/papers/2405.21048,11,0,0,0,0,0 +2024-06-03,2405.21075,,Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis,https://huggingface.co/papers/2405.21075,16,2,0,5,0,0 +2024-06-03,2405.18144,,4-bit Shampoo for Memory-Efficient Network Training,https://huggingface.co/papers/2405.18144,6,1,0,0,0,0 +2024-06-03,2405.21060,https://github.com/state-spaces/mamba,Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality,https://huggingface.co/papers/2405.21060,61,2,1,10,0,1 +2024-05-31,2405.19707,https://github.com/chenhaoxing/DeMamba,DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark,https://huggingface.co/papers/2405.19707,4,0,0,0,0,0 +2024-05-31,2405.19856,https://github.com/seketeam/deveval,DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories,https://huggingface.co/papers/2405.19856,7,1,1,0,0,0 +2024-05-31,2405.19888,,Parrot: Efficient Serving of LLM-based Applications with Semantic Variable,https://huggingface.co/papers/2405.19888,4,0,0,0,0,0 +2024-05-31,2405.20340,,MotionLLM: Understanding Human Behaviors from 
Human Motions and Videos,https://huggingface.co/papers/2405.20340,19,7,0,0,0,1 +2024-05-31,2405.19957,,PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting,https://huggingface.co/papers/2405.19957,6,0,0,0,0,0 +2024-05-31,2405.20335,https://github.com/xwin-lm/xwin-lm,Xwin-LM: Strong and Scalable Alignment Practice for LLMs,https://huggingface.co/papers/2405.20335,17,1,1,1,0,1 +2024-05-31,2405.19893,,Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts,https://huggingface.co/papers/2405.19893,26,2,0,0,0,0 +2024-05-31,2405.20204,,Jina CLIP: Your CLIP Model Is Also Your Text Retriever,https://huggingface.co/papers/2405.20204,28,1,0,3,0,3 +2024-05-31,2405.20222,https://github.com/myniuuu/mofa-video,MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model,https://huggingface.co/papers/2405.20222,10,1,1,2,0,1 +2024-05-31,2405.20327,,GECO: Generative Image-to-3D within a SECOnd,https://huggingface.co/papers/2405.20327,9,0,0,0,0,0 +2024-05-31,2405.20289,,DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation,https://huggingface.co/papers/2405.20289,9,0,0,0,0,0 +2024-05-30,2405.19332,https://github.com/shenao-zhang/selm,Self-Exploring Language Models: Active Preference Elicitation for Online Alignment,https://huggingface.co/papers/2405.19332,14,1,1,10,0,2 +2024-05-30,2405.19320,,Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF,https://huggingface.co/papers/2405.19320,9,0,0,0,0,0 +2024-05-30,2405.19325,,Nearest Neighbor Speculative Decoding for LLM Generation and Attribution,https://huggingface.co/papers/2405.19325,13,0,0,0,0,0 +2024-05-30,2405.18669,,Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities,https://huggingface.co/papers/2405.18669,11,0,0,0,0,0 +2024-05-30,2405.19107,,Offline Regularised Reinforcement Learning for Large Language Models 
Alignment,https://huggingface.co/papers/2405.19107,12,0,0,0,0,0 +2024-05-30,2405.18991,https://github.com/aigc-apps/easyanimate,EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture,https://huggingface.co/papers/2405.18991,12,1,1,5,0,1 +2024-05-30,2405.18515,,Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication,https://huggingface.co/papers/2405.18515,7,0,0,0,0,0 +2024-05-30,2405.18503,https://github.com/sony/soundctm,SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation,https://huggingface.co/papers/2405.18503,9,0,1,1,0,0 +2024-05-30,2405.19331,,NPGA: Neural Parametric Gaussian Avatars,https://huggingface.co/papers/2405.19331,10,0,0,0,0,0 +2024-05-30,2405.18750,https://github.com/Ji4chenLi/t2v-turbo,T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback,https://huggingface.co/papers/2405.18750,20,1,1,0,0,0 +2024-05-30,2405.18870,,LLMs achieve adult human performance on higher-order theory of mind tasks,https://huggingface.co/papers/2405.18870,16,3,0,0,0,0 +2024-05-30,2405.19327,https://github.com/multimodal-art-projection/map-neo,MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series,https://huggingface.co/papers/2405.19327,43,3,1,0,0,0 +2024-05-29,2405.18424,,3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting,https://huggingface.co/papers/2405.18424,7,0,0,0,0,0 +2024-05-29,2405.18386,https://github.com/ldzhangyx/instruct-MusicGen,Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning,https://huggingface.co/papers/2405.18386,18,3,0,0,0,0 +2024-05-29,2405.18426,,GFlow: Recovering 4D World from Monocular Video,https://huggingface.co/papers/2405.18426,15,3,0,0,0,0 +2024-05-29,2405.17976,https://github.com/ieit-yuan/yuan2.0-m32,Yuan 2.0-M32: Mixture of Experts with Attention 
Router,https://huggingface.co/papers/2405.17976,18,2,1,6,0,1 +2024-05-29,2405.17991,,VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections,https://huggingface.co/papers/2405.17991,9,1,0,0,0,0 +2024-05-29,2405.18047,,2BP: 2-Stage Backpropagation,https://huggingface.co/papers/2405.18047,21,3,0,0,0,0 +2024-05-29,2405.18377,,LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models,https://huggingface.co/papers/2405.18377,16,1,0,0,0,0 +2024-05-29,2405.18407,,Phased Consistency Model,https://huggingface.co/papers/2405.18407,44,7,0,1,0,5 +2024-05-28,2405.16537,,I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models,https://huggingface.co/papers/2405.16537,15,0,0,0,0,0 +2024-05-28,2405.16287,https://github.com/blackzxy/logah,LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters,https://huggingface.co/papers/2405.16287,10,1,0,0,0,0 +2024-05-28,2405.17399,https://github.com/mcleish7/arithmetic,Transformers Can Do Arithmetic with the Right Embeddings,https://huggingface.co/papers/2405.17399,50,2,0,0,0,0 +2024-05-28,2405.15757,https://github.com/Jeff-LiangF/streamv2v,Looking Backward: Streaming Video-to-Video Translation with Feature Banks,https://huggingface.co/papers/2405.15757,14,2,1,0,0,0 +2024-05-28,2405.16712,,Zamba: A Compact 7B SSM Hybrid Model,https://huggingface.co/papers/2405.16712,19,3,0,2,1,1 +2024-05-28,2405.17414,,Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control,https://huggingface.co/papers/2405.17414,10,0,0,0,0,0 +2024-05-28,2405.17405,,Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer,https://huggingface.co/papers/2405.17405,14,0,0,0,0,0 +2024-05-28,2405.17258,,Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning,https://huggingface.co/papers/2405.17258,12,0,0,0,0,0 +2024-05-28,2405.16759,,Greedy Growing Enables High-Resolution Pixel-Based Diffusion 
Models,https://huggingface.co/papers/2405.16759,7,0,0,0,0,0 +2024-05-28,2405.16888,,Part123: Part-aware 3D Reconstruction from a Single-view Image,https://huggingface.co/papers/2405.16888,10,1,0,0,0,0 +2024-05-28,2405.16852,,EM Distillation for One-step Diffusion Models,https://huggingface.co/papers/2405.16852,10,1,0,0,0,0 +2024-05-28,2405.17428,,NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models,https://huggingface.co/papers/2405.17428,15,0,0,2,0,6 +2024-05-28,2405.16822,,Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels,https://huggingface.co/papers/2405.16822,11,3,0,0,0,0 +2024-05-28,2405.17247,,An Introduction to Vision-Language Modeling,https://huggingface.co/papers/2405.17247,78,2,0,0,0,0 +2024-05-28,2405.17430,,Matryoshka Multimodal Models,https://huggingface.co/papers/2405.17430,29,3,0,2,0,0 +2024-05-27,2405.15682,https://github.com/facebookresearch/schedule_free,The Road Less Scheduled,https://huggingface.co/papers/2405.15682,17,3,0,0,0,0 +2024-05-27,2405.15738,https://github.com/alibaba/conv-llava,ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models,https://huggingface.co/papers/2405.15738,43,4,1,12,0,0 +2024-05-27,2405.15319,,Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training,https://huggingface.co/papers/2405.15319,23,1,0,1,0,0 +2024-05-27,2405.15125,https://github.com/caiyuanhao1998/hdr-gs,HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting,https://huggingface.co/papers/2405.15125,5,0,0,0,0,0 +2024-05-27,2405.15223,https://github.com/thuml/iVideoGPT,iVideoGPT: Interactive VideoGPTs are Scalable World Models,https://huggingface.co/papers/2405.15223,11,1,1,1,0,0 +2024-05-27,2405.15613,https://github.com/facebookresearch/ssl-data-curation,Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach,https://huggingface.co/papers/2405.15613,12,0,0,0,0,0 
+2024-05-27,2405.14979,https://github.com/wyysf-98/craftsman,CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner,https://huggingface.co/papers/2405.14979,14,1,1,1,0,2 +2024-05-27,2405.14906,,AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct,https://huggingface.co/papers/2405.14906,21,5,0,3,0,5 +2024-05-27,2405.14908,,Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining,https://huggingface.co/papers/2405.14908,11,0,0,0,0,0 +2024-05-27,2405.15032,,Aya 23: Open Weight Releases to Further Multilingual Progress,https://huggingface.co/papers/2405.15032,21,1,0,5,0,65 +2024-05-27,2405.15216,,Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition,https://huggingface.co/papers/2405.15216,11,0,0,0,0,0 +2024-05-27,2405.15574,https://github.com/byungkwanlee/meteor,Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models,https://huggingface.co/papers/2405.15574,52,3,1,2,1,1 +2024-05-27,2405.15071,https://github.com/osu-nlp-group/grokkedtransformer,Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization,https://huggingface.co/papers/2405.15071,33,1,0,0,0,0 +2024-05-24,2405.14847,,Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling,https://huggingface.co/papers/2405.14847,6,0,0,0,0,0 +2024-05-24,2405.14871,,NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections,https://huggingface.co/papers/2405.14871,7,0,0,0,0,0 +2024-05-24,2405.14866,,Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras,https://huggingface.co/papers/2405.14866,5,0,0,0,0,0 +2024-05-24,2405.14105,,Distributed Speculative Inference of Large Language Models,https://huggingface.co/papers/2405.14105,15,0,0,0,0,0 +2024-05-24,2405.14129,,AlignGPT: Multi-modal Large Language Models with Adaptive Alignment 
Capability,https://huggingface.co/papers/2405.14129,9,0,0,4,0,0 +2024-05-24,2405.13817,,Thermodynamic Natural Gradient Descent,https://huggingface.co/papers/2405.13817,13,1,0,0,0,0 +2024-05-24,2405.14857,,Semantica: An Adaptable Image-Conditioned Diffusion Model,https://huggingface.co/papers/2405.14857,8,0,0,0,0,0 +2024-05-24,2405.14333,,DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data,https://huggingface.co/papers/2405.14333,29,3,0,0,0,0 +2024-05-24,2405.14860,https://github.com/joshengels/multidimensionalfeatures,Not All Language Model Features Are Linear,https://huggingface.co/papers/2405.14860,39,3,0,0,0,0 +2024-05-24,2405.14224,https://github.com/tyshiwo1/dim-diffusionmamba,DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis,https://huggingface.co/papers/2405.14224,8,0,0,0,0,0 +2024-05-24,2405.14477,,LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models,https://huggingface.co/papers/2405.14477,16,3,0,0,0,0 +2024-05-24,2405.14867,https://github.com/tianweiy/DMD2,Improved Distribution Matching Distillation for Fast Image Synthesis,https://huggingface.co/papers/2405.14867,11,0,1,2,0,3 +2024-05-24,2405.13800,https://github.com/HJYao00/DenseConnector,Dense Connector for MLLMs,https://huggingface.co/papers/2405.13800,20,4,1,5,0,1 +2024-05-24,2405.13865,,ReVideo: Remake a Video with Motion and Content Control,https://huggingface.co/papers/2405.13865,22,3,0,1,0,0 +2024-05-24,2405.14598,,Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation,https://huggingface.co/papers/2405.14598,11,1,0,0,0,0 +2024-05-24,2405.14677,https://github.com/feifeiobama/RectifID,RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance,https://huggingface.co/papers/2405.14677,8,0,1,0,0,0 +2024-05-24,2405.13195,,CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers,https://huggingface.co/papers/2405.13195,8,1,0,0,0,0 +2024-05-22,2405.12970,,Face Adapter 
for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control,https://huggingface.co/papers/2405.12970,22,3,0,2,0,1 +2024-05-22,2405.12979,https://github.com/google-research/omniglue,OmniGlue: Generalizable Feature Matching with Foundation Model Guidance,https://huggingface.co/papers/2405.12979,9,1,1,0,0,2 +2024-05-22,2405.12981,,Reducing Transformer Key-Value Cache Size with Cross-Layer Attention,https://huggingface.co/papers/2405.12981,26,2,0,0,0,0 +2024-05-22,2405.12250,https://github.com/AIRI-Institute/LLM-Microscope,Your Transformer is Secretly Linear,https://huggingface.co/papers/2405.12250,145,12,0,0,0,0 +2024-05-22,2405.12399,https://github.com/eloialonso/diamond,Diffusion for World Modeling: Visual Details Matter in Atari,https://huggingface.co/papers/2405.12399,25,3,1,1,0,0 +2024-05-22,2405.12978,,Personalized Residuals for Concept-Driven Text-to-Image Generation,https://huggingface.co/papers/2405.12978,9,1,0,0,0,0 +2024-05-21,2405.11582,https://github.com/mindspore-lab/models,SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization,https://huggingface.co/papers/2405.11582,12,0,0,0,0,0 +2024-05-21,2405.11157,,Towards Modular LLMs by Building and Reusing a Library of LoRAs,https://huggingface.co/papers/2405.11157,23,2,0,0,0,0 +2024-05-21,2405.12107,https://github.com/milvlg/imp,Imp: Highly Capable Large Multimodal Models for Mobile Devices,https://huggingface.co/papers/2405.12107,23,1,1,3,0,0 +2024-05-21,2405.12130,https://github.com/kongds/mora,MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning,https://huggingface.co/papers/2405.12130,44,6,0,0,0,0 +2024-05-21,2405.12213,,Octo: An Open-Source Generalist Robot Policy,https://huggingface.co/papers/2405.12213,22,1,0,0,0,0 +2024-05-21,2405.11473,https://github.com/jjihwan/FIFO-Diffusion_public,FIFO-Diffusion: Generating Infinite Videos from Text without Training,https://huggingface.co/papers/2405.11473,53,5,1,0,0,0 
+2024-05-21,2405.11252,https://github.com/xingy038/dreamer-xl,Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching,https://huggingface.co/papers/2405.11252,11,0,0,0,0,0 +2024-05-21,2405.11143,https://github.com/OpenLLMAI/OpenRLHF,"OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework",https://huggingface.co/papers/2405.11143,33,3,1,0,0,0 +2024-05-20,2405.10626,https://github.com/cvi-szu/linly,Dynamic data sampler for cross-language transfer learning in large language models,https://huggingface.co/papers/2405.10626,4,0,1,0,0,0 +2024-05-20,2405.10637,https://github.com/whyNLP/LCKV,Layer-Condensed KV Cache for Efficient Inference of Large Language Models,https://huggingface.co/papers/2405.10637,17,1,1,0,0,0 +2024-05-20,2405.10370,https://github.com/OpenRobotLab/Grounded_3D-LLM,Grounded 3D-LLM with Referent Tokens,https://huggingface.co/papers/2405.10370,9,1,1,0,1,0 +2024-05-20,2405.10725,,INDUS: Effective and Efficient Language Models for Scientific Applications,https://huggingface.co/papers/2405.10725,30,1,0,6,3,3 +2024-05-20,2405.10938,https://github.com/ryoungj/obsscaling,Observational Scaling Laws and the Predictability of Language Model Performance,https://huggingface.co/papers/2405.10938,10,1,1,0,0,0 +2024-05-17,2405.10300,https://github.com/idea-research/grounding-dino-1.5-api,"Grounding DINO 1.5: Advance the ""Edge"" of Open-Set Object Detection",https://huggingface.co/papers/2405.10300,25,2,1,0,0,0 +2024-05-17,2405.10315,,TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction,https://huggingface.co/papers/2405.10315,9,0,0,1,1,0 +2024-05-17,2405.10320,,Toon3D: Seeing Cartoons from a New Perspective,https://huggingface.co/papers/2405.10320,19,2,0,0,0,0 +2024-05-17,2405.09874,,Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion,https://huggingface.co/papers/2405.09874,15,0,0,0,0,0 
+2024-05-17,2405.09798,https://github.com/stanfordmlgroup/ManyICL,Many-Shot In-Context Learning in Multimodal Foundation Models,https://huggingface.co/papers/2405.09798,26,2,0,0,0,0 +2024-05-17,2405.10314,,CAT3D: Create Anything in 3D with Multi-View Diffusion Models,https://huggingface.co/papers/2405.10314,40,2,0,0,0,0 +2024-05-17,2405.09673,,LoRA Learns Less and Forgets Less,https://huggingface.co/papers/2405.09673,81,4,0,0,0,0 +2024-05-17,2405.09818,,Chameleon: Mixed-Modal Early-Fusion Foundation Models,https://huggingface.co/papers/2405.09818,117,8,0,3,0,3 +2024-05-16,2405.09062,,Naturalistic Music Decoding from EEG Data via Latent Diffusion Models,https://huggingface.co/papers/2405.09062,7,0,0,0,0,0 +2024-05-16,2405.09546,,BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation,https://huggingface.co/papers/2405.09546,9,0,0,0,0,0 +2024-05-16,2405.09220,,ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models,https://huggingface.co/papers/2405.09220,23,1,0,0,0,0 +2024-05-16,2405.09215,https://github.com/xiaoduoailab/xmodelvlm,Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model,https://huggingface.co/papers/2405.09215,15,1,1,1,0,1 +2024-05-15,2405.08295,,SpeechVerse: A Large-scale Generalizable Audio Language Model,https://huggingface.co/papers/2405.08295,12,0,0,0,0,0 +2024-05-15,2405.08317,,SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models,https://huggingface.co/papers/2405.08317,9,0,0,0,0,0 +2024-05-15,2405.08448,,Understanding the performance gap between online and offline alignment algorithms,https://huggingface.co/papers/2405.08448,14,0,0,0,0,0 +2024-05-15,2405.08707,,Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory,https://huggingface.co/papers/2405.08707,27,0,0,0,0,0 +2024-05-15,2405.08344,https://github.com/mindspore-lab/models,No Time to Waste: Squeeze Time into Channel for Mobile Video 
Understanding,https://huggingface.co/papers/2405.08344,11,0,0,0,0,0 +2024-05-15,2405.08054,,Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning,https://huggingface.co/papers/2405.08054,21,0,0,0,0,0 +2024-05-15,2405.08748,https://github.com/tencent/hunyuandit,Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding,https://huggingface.co/papers/2405.08748,18,2,1,9,0,4 +2024-05-15,2405.08246,,Compositional Text-to-Image Generation with Dense Blob Representations,https://huggingface.co/papers/2405.08246,11,1,0,0,0,0 +2024-05-14,2405.02246,,What matters when building vision-language models?,https://huggingface.co/papers/2405.02246,93,2,0,6,1,99 +2024-05-14,2405.06694,,SUTRA: Scalable Multilingual Language Model Architecture,https://huggingface.co/papers/2405.06694,36,2,0,1,0,2 +2024-05-14,2405.07863,https://github.com/rlhflow/online-rlhf,RLHF Workflow: From Reward Modeling to Online RLHF,https://huggingface.co/papers/2405.07863,62,3,1,17,0,2 +2024-05-14,2405.06650,https://github.com/IBM/NL2PDDL,Large Language Models as Planning Domain Generators,https://huggingface.co/papers/2405.06650,8,1,0,0,0,0 +2024-05-14,2405.07065,,LogoMotion: Visually Grounded Code Generation for Content-Aware Animation,https://huggingface.co/papers/2405.07065,16,2,0,0,0,0 +2024-05-14,2405.07526,https://github.com/microsoft/ms-marco-web-search,MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels,https://huggingface.co/papers/2405.07526,16,1,0,0,0,0 +2024-05-14,2405.06932,https://github.com/hjq133/piccolo-embedding,Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training,https://huggingface.co/papers/2405.06932,15,1,1,1,0,1 +2024-05-14,2405.07518,,SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts,https://huggingface.co/papers/2405.07518,22,0,0,0,0,0 +2024-05-14,2405.07990,,Plot2Code: A Comprehensive Benchmark for 
Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots,https://huggingface.co/papers/2405.07990,15,2,0,0,1,0 +2024-05-03,2405.01536,,Customizing Text-to-Image Models with a Single Image Pair,https://huggingface.co/papers/2405.01536,17,1,0,0,0,1 +2024-05-03,2405.01525,,FLAME: Factuality-Aware Alignment for Large Language Models,https://huggingface.co/papers/2405.01525,23,1,0,0,0,0 +2024-05-03,2405.00983,,LLM-AD: Large Language Model based Audio Description System,https://huggingface.co/papers/2405.00983,15,1,0,0,0,0 +2024-05-03,2405.01481,https://github.com/nvidia/nemo-aligner,NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment,https://huggingface.co/papers/2405.01481,22,1,1,0,0,0 +2024-05-03,2405.00732,https://github.com/predibase/lora_bakeoff,"LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report",https://huggingface.co/papers/2405.00732,116,5,0,0,0,0 +2024-05-03,2405.01470,,WildChat: 1M ChatGPT Interaction Logs in the Wild,https://huggingface.co/papers/2405.01470,57,1,0,2,4,0 +2024-05-03,2405.01535,https://github.com/prometheus-eval/prometheus-eval,Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models,https://huggingface.co/papers/2405.01535,109,4,1,10,1,4 +2024-05-03,2405.01434,https://github.com/hvision-nku/storydiffusion,StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation,https://huggingface.co/papers/2405.01434,50,3,1,0,0,6 +2024-05-02,2405.00029,,Automatic Creative Selection with Cross-Modal Matching,https://huggingface.co/papers/2405.00029,7,1,0,0,0,0 +2024-05-02,2405.00236,,STT: Stateful Tracking with Transformers for Autonomous Driving,https://huggingface.co/papers/2405.00236,7,2,0,0,0,0 +2024-05-02,2405.00675,https://github.com/uclaml/sppo,Self-Play Preference Optimization for Language Model Alignment,https://huggingface.co/papers/2405.00675,21,3,1,29,0,1 +2024-05-02,2405.00263,,Clover: Regressive Lightweight Speculative Decoding 
with Sequential Knowledge,https://huggingface.co/papers/2405.00263,14,1,0,0,0,0 +2024-05-02,2404.18212,https://github.com/RotsteinNoam/Paint-by-Inpaint,Paint by Inpaint: Learning to Add Image Objects by Removing Them First,https://huggingface.co/papers/2404.18212,26,4,1,4,2,1 +2024-05-02,2405.00233,,SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound,https://huggingface.co/papers/2405.00233,13,1,0,0,1,0 +2024-05-02,2405.00664,https://github.com/scalable-model-editing/unified-model-editing,Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3,https://huggingface.co/papers/2405.00664,18,1,0,0,0,0 +2024-05-02,2405.00332,,A Careful Examination of Large Language Model Performance on Grade School Arithmetic,https://huggingface.co/papers/2405.00332,30,2,0,0,2,0 +2024-05-02,2405.00676,https://github.com/runyiyang/sundae,Spectrally Pruned Gaussian Fields with Neural Compensation,https://huggingface.co/papers/2405.00676,8,1,0,0,0,0 +2024-05-01,2404.19760,https://github.com/facebookresearch/lightplane,Lightplane: Highly-Scalable Components for Neural 3D Fields,https://huggingface.co/papers/2404.19760,5,1,0,0,0,0 +2024-05-01,2404.19525,,MicroDreamer: Zero-shot 3D Generation in sim20 Seconds by Score-based Iterative Reconstruction,https://huggingface.co/papers/2404.19525,9,1,0,0,0,0 +2024-05-01,2404.19758,,Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting,https://huggingface.co/papers/2404.19758,10,1,0,0,0,1 +2024-05-01,2404.19756,https://github.com/kindxiaoming/pykan,KAN: Kolmogorov-Arnold Networks,https://huggingface.co/papers/2404.19756,102,3,0,3,0,1 +2024-05-01,2404.19759,https://github.com/Dai-Wenxun/MotionLCM,MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model,https://huggingface.co/papers/2404.19759,24,2,1,0,0,0 +2024-05-01,2404.19296,,Octopus v4: Graph of language models,https://huggingface.co/papers/2404.19296,116,12,0,3,0,6 
+2024-05-01,2404.19553,https://github.com/flagopen/flagembedding,Extending Llama-3's Context Ten-Fold Overnight,https://huggingface.co/papers/2404.19553,30,3,1,0,0,0 +2024-05-01,2404.19752,,Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation,https://huggingface.co/papers/2404.19752,21,4,0,0,0,0 +2024-05-01,2404.19149,,SAGS: Structure-Aware 3D Gaussian Splatting,https://huggingface.co/papers/2404.19149,13,1,0,0,0,0 +2024-05-01,2404.19733,,Iterative Reasoning Preference Optimization,https://huggingface.co/papers/2404.19733,45,3,0,0,0,0 +2024-05-01,2404.19737,,Better & Faster Large Language Models via Multi-token Prediction,https://huggingface.co/papers/2404.19737,72,3,0,2,0,0 +2024-05-01,2404.19702,,GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting,https://huggingface.co/papers/2404.19702,17,1,0,0,0,0 +2024-05-01,2404.19753,,DOCCI: Descriptions of Connected and Contrasting Images,https://huggingface.co/papers/2404.19753,10,1,0,0,2,0 +2024-05-01,2404.19427,,InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation,https://huggingface.co/papers/2404.19427,71,6,0,0,0,0 +2024-04-30,2404.18928,,Stylus: Automatic Adapter Selection for Diffusion Models,https://huggingface.co/papers/2404.18928,14,1,0,0,0,0 +2024-04-30,2401.16465,,DressCode: Autoregressively Sewing and Generating Garments from Text Guidance,https://huggingface.co/papers/2401.16465,10,1,0,0,0,0 +2024-04-30,2404.18911,https://github.com/Equationliu/Kangaroo,Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting,https://huggingface.co/papers/2404.18911,29,2,1,0,0,0 +2024-04-30,2404.17672,,BlenderAlchemy: Editing 3D Graphics with Vision-Language Models,https://huggingface.co/papers/2404.17672,18,2,0,0,0,0 +2024-04-30,2404.18796,,Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models,https://huggingface.co/papers/2404.18796,67,3,0,5,0,0 +2024-04-30,2404.18243,,LEGENT: Open Platform for Embodied 
Agents,https://huggingface.co/papers/2404.18243,20,1,0,0,0,0 +2024-04-30,2404.18416,,Capabilities of Gemini Models in Medicine,https://huggingface.co/papers/2404.18416,22,3,0,0,1,0 +2024-04-30,2404.17521,https://github.com/Xiaoyao-Li/Ag2Manip,Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations,https://huggingface.co/papers/2404.17521,12,1,0,0,0,0 +2024-04-29,2404.17569,,MaPa: Text-driven Photorealistic Material Painting for 3D Shapes,https://huggingface.co/papers/2404.17569,11,1,0,0,0,0 +2024-04-29,2404.16994,https://github.com/magic-research/PLLaVA,PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning,https://huggingface.co/papers/2404.16994,34,3,1,3,0,4 +2024-04-29,2404.16845,,HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections,https://huggingface.co/papers/2404.16845,6,1,0,0,0,0 +2024-04-29,2404.16873,https://github.com/facebookresearch/advprompter,AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs,https://huggingface.co/papers/2404.16873,27,1,0,0,0,0 +2024-04-26,2404.16375,https://github.com/zzxslp/som-llava,List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs,https://huggingface.co/papers/2404.16375,16,2,1,2,0,0 +2024-04-26,2404.16820,https://github.com/google-deepmind/gecko_benchmark_t2i,"Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings",https://huggingface.co/papers/2404.16820,15,2,0,0,0,0 +2024-04-26,2404.16221,,NeRF-XL: Scaling NeRFs with Multiple GPUs,https://huggingface.co/papers/2404.16221,12,1,0,0,0,0 +2024-04-26,2404.16811,https://github.com/microsoft/FILM,Make Your LLM Fully Utilize the Context,https://huggingface.co/papers/2404.16811,52,2,1,1,0,0 +2024-04-26,2404.16771,https://github.com/JackAILab/ConsistentID,ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving,https://huggingface.co/papers/2404.16771,16,1,1,1,0,1 
+2024-04-26,2404.16710,,Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding,https://huggingface.co/papers/2404.16710,56,6,0,0,0,0 +2024-04-26,2404.16645,,Tele-FLM Technical Report,https://huggingface.co/papers/2404.16645,17,1,0,3,0,0 +2024-04-26,2404.16821,https://github.com/opengvlab/internvl,How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites,https://huggingface.co/papers/2404.16821,51,5,1,34,1,9 +2024-04-26,2404.16790,https://github.com/ailab-cvc/seed-bench,SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension,https://huggingface.co/papers/2404.16790,7,1,1,0,0,0 +2024-04-26,2404.16510,,Interactive3D: Create What You Want by Interactive 3D Generation,https://huggingface.co/papers/2404.16510,18,1,0,0,0,0 +2024-04-25,2404.16035,,MaGGIe: Masked Guided Gradual Human Instance Matting,https://huggingface.co/papers/2404.16035,8,1,0,2,1,0 +2024-04-25,2404.15653,https://github.com/apple/corenet,CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data,https://huggingface.co/papers/2404.15653,25,3,0,0,0,0 +2024-04-25,2404.16030,https://github.com/facebookresearch/metaclip,MoDE: CLIP Data Experts via Clustering,https://huggingface.co/papers/2404.16030,11,1,1,0,0,0 +2024-04-25,2404.15778,,BASS: Batched Attention-optimized Speculative Sampling,https://huggingface.co/papers/2404.15778,8,1,0,0,0,0 +2024-04-25,2404.15420,,XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference,https://huggingface.co/papers/2404.15420,7,1,0,0,0,0 +2024-04-25,2404.16029,,Editable Image Elements for Controllable Synthesis,https://huggingface.co/papers/2404.16029,10,1,0,0,0,0 +2024-04-25,2404.15789,,MotionMaster: Training-free Camera Motion Transfer For Video Generation,https://huggingface.co/papers/2404.15789,10,1,0,0,0,0 +2024-04-25,2404.15449,,ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward 
Feedback Learning,https://huggingface.co/papers/2404.15449,11,1,0,0,0,0 +2024-04-25,2404.16022,https://github.com/tothebeginning/pulid,PuLID: Pure and Lightning ID Customization via Contrastive Alignment,https://huggingface.co/papers/2404.16022,16,1,1,0,0,4 +2024-04-24,2404.14994,,Transformers Can Represent n-gram Language Models,https://huggingface.co/papers/2404.14994,18,1,0,0,0,0 +2024-04-24,2404.15045,,Multi-Head Mixture-of-Experts,https://huggingface.co/papers/2404.15045,56,2,0,0,0,0 +2024-04-24,2404.14687,,Pegasus-v1 Technical Report,https://huggingface.co/papers/2404.14687,31,2,0,0,0,0 +2024-04-24,2404.14700,,FlashSpeech: Efficient Zero-Shot Speech Synthesis,https://huggingface.co/papers/2404.14700,29,4,0,0,0,0 +2024-04-24,2404.14469,https://github.com/fasterdecoding/snapkv,SnapKV: LLM Knows What You are Looking for Before Generation,https://huggingface.co/papers/2404.14469,23,2,0,0,0,0 +2024-04-24,2404.14507,,Align Your Steps: Optimizing Sampling Schedules in Diffusion Models,https://huggingface.co/papers/2404.14507,21,1,0,0,0,0 +2024-04-24,2404.14619,,OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework,https://huggingface.co/papers/2404.14619,124,7,0,22,0,10 +2024-04-23,2404.14351,,Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer,https://huggingface.co/papers/2404.14351,5,1,0,0,0,0 +2024-04-23,2404.14405,,Learning H-Infinity Locomotion Control,https://huggingface.co/papers/2404.14405,6,1,0,0,0,0 +2024-04-23,2404.14047,https://github.com/macaronlin/llama3-quantization,How Good Are Low-bit Quantized LLaMA3 Models? 
An Empirical Study,https://huggingface.co/papers/2404.14047,39,9,1,1,0,0 +2024-04-23,2404.13208,,The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions,https://huggingface.co/papers/2404.13208,38,6,0,0,0,0 +2024-04-23,2404.13050,,FlowMind: Automatic Workflow Generation with LLMs,https://huggingface.co/papers/2404.13050,32,1,0,0,0,0 +2024-04-23,2404.14396,https://github.com/ailab-cvc/seed-x,SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation,https://huggingface.co/papers/2404.14396,17,2,1,1,4,1 +2024-04-23,2404.13686,,Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis,https://huggingface.co/papers/2404.13686,26,2,0,3,0,21 +2024-04-23,2404.14394,,A Multimodal Automated Interpretability Agent,https://huggingface.co/papers/2404.14394,19,1,0,0,0,0 +2024-04-23,2404.14239,https://github.com/chenyangzhu1/multibooth,MultiBooth: Towards Generating All Your Concepts in an Image from Text,https://huggingface.co/papers/2404.14239,8,1,1,0,0,0 +2024-04-23,2404.14219,,Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone,https://huggingface.co/papers/2404.14219,243,29,0,5,2,4 +2024-04-23,2404.13358,,Music Consistency Models,https://huggingface.co/papers/2404.13358,12,3,0,0,0,0 +2024-04-22,2404.13013,,Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models,https://huggingface.co/papers/2404.13013,28,2,0,0,1,0 +2024-04-22,2404.12803,,TextSquare: Scaling up Text-Centric Visual Instruction Tuning,https://huggingface.co/papers/2404.12803,28,4,0,0,0,0 +2024-04-22,2404.12872,https://github.com/damo-nlp-sg/llm-r2,LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency,https://huggingface.co/papers/2404.12872,9,1,0,0,0,0 +2024-04-22,2404.12753,https://github.com/ez-hwh/autocrawler,AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation,https://huggingface.co/papers/2404.12753,40,1,0,0,0,0 
+2024-04-22,2404.12833,https://github.com/ghabix/srepair,How Far Can We Go with Practical Function-Level Program Repair?,https://huggingface.co/papers/2404.12833,6,1,1,0,0,0 +2024-04-22,2404.12547,,Does Gaussian Splatting need SFM Initialization?,https://huggingface.co/papers/2404.12547,8,1,0,0,0,0 +2024-04-22,2404.13026,,PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation,https://huggingface.co/papers/2404.13026,21,1,0,0,0,0 +2024-04-19,2404.11925,,EdgeFusion: On-Device Text-to-Image Generation,https://huggingface.co/papers/2404.11925,20,1,0,0,0,0 +2024-04-19,2404.12385,,MeshLRM: Large Reconstruction Model for High-Quality Mesh,https://huggingface.co/papers/2404.12385,25,2,0,0,0,0 +2024-04-19,2404.11614,,Dynamic Typography: Bringing Words to Life,https://huggingface.co/papers/2404.11614,40,4,0,0,0,0 +2024-04-19,2404.12241,https://github.com/mlcommons/modelbench,Introducing v0.5 of the AI Safety Benchmark from MLCommons,https://huggingface.co/papers/2404.12241,10,1,0,0,0,0 +2024-04-19,2404.12318,,Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment,https://huggingface.co/papers/2404.12318,14,1,0,0,0,0 +2024-04-19,2404.11565,,MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation,https://huggingface.co/papers/2404.11565,12,1,0,0,0,0 +2024-04-19,2404.12253,,"Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing",https://huggingface.co/papers/2404.12253,52,3,0,0,0,0 +2024-04-19,2404.12195,https://bitbucket.org/paladinanalytics/qlora-finetuning,"OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data",https://huggingface.co/papers/2404.12195,11,1,0,3,3,0 +2024-04-19,2404.11912,https://github.com/Infini-AI-Lab/TriForce,TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding,https://huggingface.co/papers/2404.11912,16,1,1,0,0,0 +2024-04-19,2404.12347,,AniClipart: Clipart Animation with 
Text-to-Video Priors,https://huggingface.co/papers/2404.12347,11,1,0,0,0,0 +2024-04-19,2404.12387,,"Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models",https://huggingface.co/papers/2404.12387,38,1,0,0,0,0 +2024-04-19,2404.12390,,BLINK: Multimodal Large Language Models Can See but Not Perceive,https://huggingface.co/papers/2404.12390,24,2,0,0,1,0 +2024-04-17,2404.10301,,Long-form music generation with latent diffusion,https://huggingface.co/papers/2404.10301,23,1,0,0,0,0 +2024-04-17,2404.10179,,Scaling Instructable Agents Across Many Simulated Worlds,https://huggingface.co/papers/2404.10179,24,1,0,0,0,0 +2024-04-16,2404.09204,https://github.com/yuyq96/texthawk,TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models,https://huggingface.co/papers/2404.09204,10,0,0,0,0,0 +2024-04-16,2404.08856,,On Speculative Decoding for Multimodal Large Language Models,https://huggingface.co/papers/2404.08856,13,1,0,0,0,0 +2024-04-16,2404.09937,https://github.com/hkust-nlp/llm-compression-intelligence,Compression Represents Intelligence Linearly,https://huggingface.co/papers/2404.09937,27,1,1,0,1,0 +2024-04-16,2404.09458,,CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting,https://huggingface.co/papers/2404.09458,6,0,0,0,0,0 +2024-04-16,2404.08801,https://github.com/xuezhemax/megalodon,Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length,https://huggingface.co/papers/2404.08801,62,1,0,0,0,0 +2024-04-16,2404.09833,,"Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video",https://huggingface.co/papers/2404.09833,29,2,0,0,0,0 +2024-04-16,2404.09173,,TransformerFAM: Feedback attention is working memory,https://huggingface.co/papers/2404.09173,42,0,0,0,0,0 +2024-04-16,2404.09995,,Taming Latent Diffusion Model for Neural Radiance Field Inpainting,https://huggingface.co/papers/2404.09995,6,0,0,0,0,0 +2024-04-16,2404.09656,,Learn Your 
Reference Model for Real Good Alignment,https://huggingface.co/papers/2404.09656,81,0,0,0,0,0 +2024-04-16,2404.09990,,HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing,https://huggingface.co/papers/2404.09990,12,0,0,0,0,0 +2024-04-16,2404.09967,,Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model,https://huggingface.co/papers/2404.09967,20,0,0,0,0,0 +2024-04-16,2404.09956,https://github.com/declare-lab/tango,Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization,https://huggingface.co/papers/2404.09956,11,0,1,2,0,8 +2024-04-15,2404.08495,https://github.com/cornell-rl/drpo,Dataset Reset Policy Optimization for RLHF,https://huggingface.co/papers/2404.08495,8,0,0,0,0,0 +2024-04-15,2404.08197,,"Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies",https://huggingface.co/papers/2404.08197,27,1,0,0,0,0 +2024-04-15,2404.08639,,COCONut: Modernizing COCO Segmentation,https://huggingface.co/papers/2404.08639,26,4,0,0,0,0 +2024-04-15,2404.08636,https://github.com/mbanani/probe3d,Probing the 3D Awareness of Visual Foundation Models,https://huggingface.co/papers/2404.08636,11,0,0,0,0,0 +2024-04-15,2404.08540,https://github.com/agneet42/robustness_depth_lang,On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation,https://huggingface.co/papers/2404.08540,10,0,0,0,0,0 +2024-04-15,2404.08252,,MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance,https://huggingface.co/papers/2404.08252,5,0,0,0,0,0 +2024-04-15,2404.08634,https://github.com/Lightning-AI/lit-gpt,Pre-training Small Base LMs with Fewer Tokens,https://huggingface.co/papers/2404.08634,33,2,1,13,0,0 +2024-04-12,2404.07904,https://github.com/opennlplab/hgrn2,HGRN2: Gated Linear RNNs with State Expansion,https://huggingface.co/papers/2404.07904,16,1,1,0,0,0 +2024-04-12,2404.05902,,WILBUR: Adaptive 
In-Context Learning for Robust and Accurate Web Agents,https://huggingface.co/papers/2404.05902,20,2,0,0,0,0 +2024-04-12,2404.07724,,Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models,https://huggingface.co/papers/2404.07724,10,1,0,0,0,0 +2024-04-12,2404.07821,,Sparse Laneformer,https://huggingface.co/papers/2404.07821,9,1,0,0,0,0 +2024-04-12,2404.07503,,Best Practices and Lessons Learned on Synthetic Data for Language Models,https://huggingface.co/papers/2404.07503,25,1,0,0,0,0 +2024-04-12,2404.07979,https://github.com/jeffreysijuntan/lloco,LLoCO: Learning Long Contexts Offline,https://huggingface.co/papers/2404.07979,15,2,1,5,0,0 +2024-04-12,2404.07972,,OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments,https://huggingface.co/papers/2404.07972,41,1,0,0,0,0 +2024-04-12,2404.07987,https://github.com/liming-ai/ControlNet_Plus_Plus,ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback,https://huggingface.co/papers/2404.07987,46,2,1,0,0,1 +2024-04-12,2404.07839,https://github.com/google-deepmind/recurrentgemma,RecurrentGemma: Moving Past Transformers for Efficient Open Language Models,https://huggingface.co/papers/2404.07839,40,2,0,0,0,0 +2024-04-12,2404.07965,https://github.com/microsoft/rho,Rho-1: Not All Tokens Are What You Need,https://huggingface.co/papers/2404.07965,80,9,1,17,0,1 +2024-04-12,2404.07973,,Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models,https://huggingface.co/papers/2404.07973,29,3,0,0,0,0 +2024-04-12,2404.07544,https://github.com/robertvacareanu/llm4regression,From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples,https://huggingface.co/papers/2404.07544,15,1,1,0,0,0 +2024-04-12,2404.07616,,Audio Dialogues: Dialogues dataset for audio and music understanding,https://huggingface.co/papers/2404.07616,15,1,0,0,0,0 
+2024-04-12,2404.07413,https://github.com/myshell-ai/jetmoe,JetMoE: Reaching Llama2 Performance with 0.1M Dollars,https://huggingface.co/papers/2404.07413,32,4,1,3,0,0 +2024-04-12,2404.07448,https://github.com/xujxyang/opentrans,Transferable and Principled Efficiency for Open-Vocabulary Segmentation,https://huggingface.co/papers/2404.07448,10,1,0,0,0,0 +2024-04-11,2404.06903,,DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting,https://huggingface.co/papers/2404.06903,14,3,0,0,0,0 +2024-04-11,2404.06773,https://github.com/techmonsterwang/illama,Adapting LLaMA Decoder to Vision Transformer,https://huggingface.co/papers/2404.06773,14,1,1,0,0,0 +2024-04-11,2404.07204,,BRAVE: Broadening the visual encoding of vision-language models,https://huggingface.co/papers/2404.07204,15,1,0,0,0,0 +2024-04-11,2404.06780,,Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior,https://huggingface.co/papers/2404.06780,9,1,0,0,0,0 +2024-04-11,2404.07199,,RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion,https://huggingface.co/papers/2404.07199,22,2,0,0,0,0 +2024-04-11,2404.06654,https://github.com/hsiehjackson/ruler,RULER: What's the Real Context Size of Your Long-Context Language Models?,https://huggingface.co/papers/2404.06654,32,3,1,0,0,0 +2024-04-11,2404.07143,,Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention,https://huggingface.co/papers/2404.07143,98,11,0,1,0,1 +2024-04-10,2404.06109,,Revising Densification in Gaussian Splatting,https://huggingface.co/papers/2404.06109,8,0,0,0,0,0 +2024-04-10,2404.06091,https://github.com/Adamdad/hash3D,Hash3D: Training-free Acceleration for 3D Generation,https://huggingface.co/papers/2404.06091,12,0,0,0,0,0 +2024-04-10,2404.06507,,Reconstructing Hand-Held Objects in 3D,https://huggingface.co/papers/2404.06507,5,0,0,0,0,0 +2024-04-10,2404.05829,,SambaLingo: Teaching Large Language Models New 
Languages,https://huggingface.co/papers/2404.05829,12,0,0,26,0,2 +2024-04-10,2404.05875,,CodecLM: Aligning Language Models with Tailored Synthetic Data,https://huggingface.co/papers/2404.05875,16,0,0,0,0,0 +2024-04-10,2404.06209,https://github.com/interpretml/llm-tabular-memorization-checker,Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models,https://huggingface.co/papers/2404.06209,4,0,0,0,0,0 +2024-04-10,2404.06395,https://github.com/openbmb/minicpm,MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies,https://huggingface.co/papers/2404.06395,18,1,1,4,0,1 +2024-04-10,2404.06212,https://github.com/airi-institute/omnifusion,OmniFusion Technical Report,https://huggingface.co/papers/2404.06212,72,5,1,1,0,0 +2024-04-10,2404.05892,https://github.com/rwkv/rwkv-infctx-trainer,Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence,https://huggingface.co/papers/2404.05892,28,0,0,4,0,1 +2024-04-10,2404.05961,https://github.com/mcgill-nlp/llm2vec,LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders,https://huggingface.co/papers/2404.05961,63,4,1,13,0,2 +2024-04-10,2404.06393,,MuPT: A Generative Symbolic Music Pretrained Transformer,https://huggingface.co/papers/2404.06393,14,0,0,0,0,0 +2024-04-10,2404.06512,https://github.com/internlm/internlm-xcomposer,InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD,https://huggingface.co/papers/2404.06512,29,1,1,0,0,0 +2024-04-10,2404.06429,,Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion,https://huggingface.co/papers/2404.06429,6,0,0,1,0,0 +2024-04-09,2404.05674,https://github.com/bytedance/MoMA,MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation,https://huggingface.co/papers/2404.05674,12,1,1,1,0,2 +2024-04-09,2404.04421,,PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual 
Observations,https://huggingface.co/papers/2404.04421,14,0,0,0,0,0 +2024-04-09,2404.04319,,SpatialTracker: Tracking Any 2D Pixels in 3D Space,https://huggingface.co/papers/2404.04319,22,1,0,0,0,0 +2024-04-09,2404.05666,,YaART: Yet Another ART Rendering Technology,https://huggingface.co/papers/2404.05666,14,0,0,0,0,0 +2024-04-09,2404.05717,,SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing,https://huggingface.co/papers/2404.05717,24,0,0,0,0,0 +2024-04-09,2404.04346,,Koala: Key frame-conditioned long video-LLM,https://huggingface.co/papers/2404.04346,5,2,0,0,0,1 +2024-04-09,2404.04526,,DATENeRF: Depth-Aware Text-based Editing of NeRFs,https://huggingface.co/papers/2404.04526,7,0,0,0,0,0 +2024-04-09,2404.04465,,Aligning Diffusion Models by Optimizing Human Utility,https://huggingface.co/papers/2404.04465,12,1,0,0,0,0 +2024-04-09,2404.04544,,BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion,https://huggingface.co/papers/2404.04544,20,0,0,0,0,0 +2024-04-09,2404.04860,,"ByteEdit: Boost, Comply and Accelerate Generative Image Editing",https://huggingface.co/papers/2404.04860,24,1,0,0,0,0 +2024-04-09,2404.05719,,Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs,https://huggingface.co/papers/2404.05719,58,3,0,0,0,0 +2024-04-09,2404.05014,https://github.com/pku-yuangroup/magictime,MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators,https://huggingface.co/papers/2404.05014,53,2,1,1,2,3 +2024-04-09,2404.04478,https://github.com/feizc/diffusion-rwkv,Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models,https://huggingface.co/papers/2404.04478,11,0,1,0,0,0 +2024-04-09,2404.05726,https://github.com/boheumd/MA-LMM,MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding,https://huggingface.co/papers/2404.05726,19,0,0,0,0,0 +2024-04-09,2404.05595,,UniFL: Improve Stable Diffusion via Unified Feedback 
Learning,https://huggingface.co/papers/2404.05595,22,1,0,0,0,0 +2024-04-08,2404.04256,https://github.com/zifuwan/sigma,Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation,https://huggingface.co/papers/2404.04256,5,1,0,0,0,0 +2024-04-08,2404.04204,,Social Skill Training with Large Language Models,https://huggingface.co/papers/2404.04204,15,0,0,0,0,0 +2024-04-08,2404.04211,,Robust Gaussian Splatting,https://huggingface.co/papers/2404.04211,7,0,0,0,0,0 +2024-04-08,2404.03820,,CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues,https://huggingface.co/papers/2404.03820,22,1,0,0,0,0 +2024-04-08,2404.04167,,Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model,https://huggingface.co/papers/2404.04167,8,1,0,6,2,1 +2024-04-08,2404.04125,https://github.com/bethgelab/frequency_determines_performance,"No ""Zero-Shot"" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance",https://huggingface.co/papers/2404.04125,27,1,1,0,0,0 +2024-04-08,2404.03673,https://github.com/Owen-Oertell/rlcm,RL for Consistency Models: Faster Reward Guided Text-to-Image Generation,https://huggingface.co/papers/2404.03673,14,2,0,0,0,0 +2024-04-08,2404.03715,,Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences,https://huggingface.co/papers/2404.03715,58,1,0,0,0,0 +2024-04-08,2404.03683,https://github.com/kanishkg/stream-of-search,Stream of Search (SoS): Learning to Search in Language,https://huggingface.co/papers/2404.03683,21,0,0,0,0,0 +2024-04-05,2404.03413,https://github.com/Vision-CAIR/MiniGPT4-video,MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens,https://huggingface.co/papers/2404.03413,22,1,1,0,0,0 +2024-04-05,2404.03204,,RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis,https://huggingface.co/papers/2404.03204,7,0,0,0,0,0 +2024-04-05,2404.03118,,LVLM-Intrepret: 
An Interpretability Tool for Large Vision-Language Models,https://huggingface.co/papers/2404.03118,20,1,0,0,0,0 +2024-04-05,2404.03566,,PointInfinity: Resolution-Invariant Point Diffusion Models,https://huggingface.co/papers/2404.03566,13,1,0,0,0,0 +2024-04-05,2404.03653,https://github.com/Karine-Huang/T2I-CompBench,CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching,https://huggingface.co/papers/2404.03653,29,3,0,0,0,0 +2024-04-05,2404.03411,,Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?,https://huggingface.co/papers/2404.03411,8,0,0,0,0,0 +2024-04-05,2404.03648,https://github.com/thudm/autowebglm,AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent,https://huggingface.co/papers/2404.03648,23,3,1,0,0,0 +2024-04-05,2404.03626,,Training LLMs over Neurally Compressed Text,https://huggingface.co/papers/2404.03626,21,3,0,0,0,0 +2024-04-05,2404.03592,https://github.com/stanfordnlp/pyreft,ReFT: Representation Finetuning for Language Models,https://huggingface.co/papers/2404.03592,77,4,1,0,0,4 +2024-04-05,2404.03543,,CodeEditorBench: Evaluating Code Editing Capability of Large Language Models,https://huggingface.co/papers/2404.03543,15,1,0,0,1,0 +2024-04-04,2404.02514,,Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition,https://huggingface.co/papers/2404.02514,9,0,0,0,0,0 +2024-04-04,2404.02258,,Mixture-of-Depths: Dynamically allocating compute in transformer-based language models,https://huggingface.co/papers/2404.02258,102,6,0,1,0,1 +2024-04-04,2404.02575,,Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models,https://huggingface.co/papers/2404.02575,46,4,0,0,0,0 +2024-04-04,2404.02733,https://github.com/instantstyle/instantstyle,InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation,https://huggingface.co/papers/2404.02733,20,4,1,0,0,0 
+2024-04-04,2404.02747,,Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models,https://huggingface.co/papers/2404.02747,11,1,0,0,0,0 +2024-04-04,2404.02893,https://github.com/thudm/chatglm-math,ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline,https://huggingface.co/papers/2404.02893,19,1,0,0,0,0 +2024-04-04,2404.02883,,On the Scalability of Diffusion-based Text-to-Image Generation,https://huggingface.co/papers/2404.02883,17,0,0,0,0,0 +2024-04-04,2404.02905,https://github.com/FoundationVision/VAR,Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction,https://huggingface.co/papers/2404.02905,63,3,1,1,0,0 +2024-04-03,2404.01954,,HyperCLOVA X Technical Report,https://huggingface.co/papers/2404.01954,19,1,0,0,0,0 +2024-04-03,2404.02125,,3D Congealing: 3D-Aware Image Alignment in the Wild,https://huggingface.co/papers/2404.02125,6,1,0,0,0,0 +2024-04-03,2404.01744,,Octopus v2: On-device language model for super agent,https://huggingface.co/papers/2404.01744,55,6,0,3,0,6 +2024-04-03,2404.01856,,Poro 34B and the Blessing of Multilinguality,https://huggingface.co/papers/2404.01856,12,1,0,2,0,2 +2024-04-03,2404.01367,,Bigger is not Always Better: Scaling Properties of Latent Diffusion Models,https://huggingface.co/papers/2404.01367,19,1,0,0,0,0 +2024-04-03,2404.01617,,LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models,https://huggingface.co/papers/2404.01617,6,1,0,0,0,0 +2024-04-03,2404.01331,https://github.com/intellabs/multimodal_cognitive_ai,LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model,https://huggingface.co/papers/2404.01331,23,1,0,2,0,0 +2024-04-03,2404.01475,https://github.com/lamalab-org/chem-bench,Are large language models superhuman chemists?,https://huggingface.co/papers/2404.01475,15,1,1,0,0,0 +2024-04-03,2404.02101,https://github.com/hehao13/cameractrl,CameraCtrl: Enabling Camera Control for 
Text-to-Video Generation,https://huggingface.co/papers/2404.02101,20,1,1,0,0,0 +2024-04-03,2404.02078,https://github.com/openbmb/eurus,Advancing LLM Reasoning Generalists with Preference Trees,https://huggingface.co/papers/2404.02078,41,2,1,9,3,0 +2024-04-03,2404.02060,https://github.com/tiger-ai-lab/longiclbench,Long-context LLMs Struggle with Long In-context Learning,https://huggingface.co/papers/2404.02060,33,4,1,0,1,1 +2024-04-02,2404.01297,https://github.com/google-research/scenic,Streaming Dense Video Captioning,https://huggingface.co/papers/2404.01297,11,1,0,0,0,0 +2024-04-02,2404.00308,https://github.com/TencentARC/ST-LLM,ST-LLM: Large Language Models Are Effective Temporal Learners,https://huggingface.co/papers/2404.00308,4,1,1,0,0,0 +2024-04-02,2404.00488,,Noise-Aware Training of Layout-Aware Language Models,https://huggingface.co/papers/2404.00488,6,1,0,0,0,0 +2024-04-02,2404.00656,,WavLLM: Towards Robust and Adaptive Speech Large Language Model,https://huggingface.co/papers/2404.00656,8,1,0,0,0,0 +2024-04-02,2404.01294,,CosmicMan: A Text-to-Image Foundation Model for Humans,https://huggingface.co/papers/2404.01294,15,1,0,2,0,1 +2024-04-02,2404.01292,https://github.com/learn2phoenix/csd,Measuring Style Similarity in Diffusion Models,https://huggingface.co/papers/2404.01292,14,1,1,0,0,0 +2024-04-02,2404.00399,,Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. 
Executive Order,https://huggingface.co/papers/2404.00399,40,1,0,1,1,0 +2024-04-02,2404.00345,,"MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text",https://huggingface.co/papers/2404.00345,16,10,0,0,0,0 +2024-04-02,2404.01143,,Condition-Aware Neural Network for Controlled Image Generation,https://huggingface.co/papers/2404.01143,11,1,0,0,0,0 +2024-04-02,2404.01258,https://github.com/riflezhang/llava-hound-dpo,Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward,https://huggingface.co/papers/2404.01258,10,1,1,0,2,0 +2024-04-02,2404.00987,https://github.com/zhaorw02/FlexiDreamer,FlexiDreamer: Single Image-to-3D Generation with FlexiCubes,https://huggingface.co/papers/2404.00987,21,2,0,0,0,0 +2024-04-02,2404.01197,https://github.com/SPRIGHT-T2I/SPRIGHT,Getting it Right: Improving Spatial Consistency in Text-to-Image Models,https://huggingface.co/papers/2404.01197,29,3,1,1,5,2 +2024-04-01,2403.19851,https://github.com/googleinterns/localizing-paragraph-memorization,Localizing Paragraph Memorization in Language Models,https://huggingface.co/papers/2403.19851,13,1,0,0,0,0 +2024-04-01,2403.19928,https://github.com/yuchuantian/dijiang,DiJiang: Efficient Large Language Models through Compact Kernelization,https://huggingface.co/papers/2403.19928,9,1,1,0,0,0 +2024-04-01,2403.19888,,MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection,https://huggingface.co/papers/2403.19888,9,1,0,0,0,0 +2024-04-01,2403.20041,,Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs,https://huggingface.co/papers/2403.20041,34,3,0,0,0,0 +2024-04-01,2403.20327,,Gecko: Versatile Text Embeddings Distilled from Large Language Models,https://huggingface.co/papers/2403.20327,47,4,0,0,0,0 +2024-04-01,2403.19887,,Jamba: A Hybrid Transformer-Mamba Language Model,https://huggingface.co/papers/2403.19887,100,5,0,3,0,6 +2024-04-01,2403.20329,,ReALM: Reference Resolution 
As Language Modeling,https://huggingface.co/papers/2403.20329,20,2,0,0,0,0 +2024-04-01,2403.20275,,"Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces",https://huggingface.co/papers/2403.20275,8,1,0,0,0,0 +2024-04-01,2403.20309,,InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds,https://huggingface.co/papers/2403.20309,16,2,0,0,0,0 +2024-04-01,2403.20331,https://github.com/atsumiyai/upd,Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models,https://huggingface.co/papers/2403.20331,14,2,1,0,1,0 +2024-03-29,2403.19270,,sDPO: Don't Use Your Data All at Once,https://huggingface.co/papers/2403.19270,32,3,0,5,0,37 +2024-03-29,2403.19319,,Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation,https://huggingface.co/papers/2403.19319,6,1,0,0,0,0 +2024-03-29,2403.19655,,GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling,https://huggingface.co/papers/2403.19655,15,1,0,0,0,0 +2024-03-29,2403.19046,https://github.com/nvlabs/lita,LITA: Language Instructed Temporal-Localization Assistant,https://huggingface.co/papers/2403.19046,16,1,1,0,0,0 +2024-03-29,2403.18978,,TextCraftor: Your Text Encoder Can be Image Quality Controller,https://huggingface.co/papers/2403.18978,12,1,0,0,0,0 +2024-03-28,2403.18818,,ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion,https://huggingface.co/papers/2403.18818,23,3,0,0,0,0 +2024-03-28,2403.18118,,EgoLifter: Open-world 3D Segmentation for Egocentric Perception,https://huggingface.co/papers/2403.18118,8,1,0,0,0,0 +2024-03-28,2403.18783,,Towards a World-English Language Model for On-Device Virtual Assistants,https://huggingface.co/papers/2403.18783,4,1,0,0,0,0 +2024-03-28,2403.18605,,FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing,https://huggingface.co/papers/2403.18605,5,1,0,0,0,0 
+2024-03-28,2403.18816,,Garment3DGen: 3D Garment Stylization and Texture Generation,https://huggingface.co/papers/2403.18816,19,2,0,0,0,0 +2024-03-28,2403.18802,https://github.com/google-deepmind/long-form-factuality,Long-form factuality in large language models,https://huggingface.co/papers/2403.18802,23,2,0,0,0,0 +2024-03-28,2403.18361,,ViTAR: Vision Transformer with Any Resolution,https://huggingface.co/papers/2403.18361,49,2,0,0,0,0 +2024-03-28,2403.18814,https://github.com/dvlab-research/minigemini,Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models,https://huggingface.co/papers/2403.18814,42,4,1,11,0,4 +2024-03-28,2403.18421,https://github.com/stanford-crfm/biomedlm,BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text,https://huggingface.co/papers/2403.18421,21,2,1,1,0,28 +2024-03-28,2403.18795,https://github.com/SkyworkAI/Gamba,Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction,https://huggingface.co/papers/2403.18795,17,2,0,0,0,0 +2024-03-27,2403.17694,https://github.com/scutzzj/aniportrait,AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation,https://huggingface.co/papers/2403.17694,10,2,1,1,0,1 +2024-03-27,2403.17888,https://github.com/hbb1/2d-gaussian-splatting,2D Gaussian Splatting for Geometrically Accurate Radiance Fields,https://huggingface.co/papers/2403.17888,25,3,1,0,0,0 +2024-03-27,2403.17237,,DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion,https://huggingface.co/papers/2403.17237,8,1,0,0,0,0 +2024-03-27,2403.17898,https://github.com/city-super/Octree-GS,Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians,https://huggingface.co/papers/2403.17898,12,1,1,0,0,0 +2024-03-27,2403.17297,,InternLM2 Technical Report,https://huggingface.co/papers/2403.17297,27,1,0,24,0,21 +2024-03-27,2403.17607,https://github.com/intel/tiny-dpcpp-nn,Fully-fused Multi-Layer Perceptrons on Intel Data Center 
GPUs,https://huggingface.co/papers/2403.17607,7,1,0,0,0,0 +2024-03-27,2403.17920,,TC4D: Trajectory-Conditioned Text-to-4D Generation,https://huggingface.co/papers/2403.17920,15,1,0,0,0,0 +2024-03-27,2403.17804,,Improving Text-to-Image Consistency via Automatic Prompt Optimization,https://huggingface.co/papers/2403.17804,14,1,0,0,0,0 +2024-03-27,2403.17887,,The Unreasonable Ineffectiveness of the Deeper Layers,https://huggingface.co/papers/2403.17887,75,8,0,22,0,0 +2024-03-26,2403.15447,,Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression,https://huggingface.co/papers/2403.15447,15,1,0,0,0,0 +2024-03-26,2403.15484,,RakutenAI-7B: Extending Large Language Models for Japanese,https://huggingface.co/papers/2403.15484,12,2,0,6,0,1 +2024-03-26,2403.17008,,FlashFace: Human Image Personalization with High-fidelity Identity Preservation,https://huggingface.co/papers/2403.17008,18,1,0,0,0,0 +2024-03-26,2403.16990,,Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation,https://huggingface.co/papers/2403.16990,24,2,0,0,0,1 +2024-03-26,2403.17001,,VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation,https://huggingface.co/papers/2403.17001,6,1,0,0,0,0 +2024-03-26,2403.16971,https://github.com/agiresearch/aios,LLM Agent Operating System,https://huggingface.co/papers/2403.16971,64,4,1,0,0,0 +2024-03-26,2403.17005,,TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models,https://huggingface.co/papers/2403.17005,13,1,0,0,0,0 +2024-03-26,2403.16627,https://github.com/IDKiro/sdxs,SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions,https://huggingface.co/papers/2403.16627,20,2,1,4,0,16 +2024-03-25,2403.15157,,AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models,https://huggingface.co/papers/2403.15157,6,2,0,0,0,0 +2024-03-25,2403.15382,,DragAPart: Learning a Part-Level Motion Prior for Articulated 
Objects,https://huggingface.co/papers/2403.15382,9,1,0,0,0,0 +2024-03-25,2403.14781,https://github.com/fudan-generative-vision/champ,Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance,https://huggingface.co/papers/2403.14781,14,2,1,2,0,0 +2024-03-25,2403.14714,,Compiler generated feedback for Large Language Models,https://huggingface.co/papers/2403.14714,4,1,0,0,0,0 +2024-03-25,2403.14870,,VidLA: Video-Language Alignment at Scale,https://huggingface.co/papers/2403.14870,11,1,0,0,0,0 +2024-03-25,2403.15383,https://github.com/3DTopia/ThemeStation,ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars,https://huggingface.co/papers/2403.15383,12,1,1,0,0,0 +2024-03-25,2403.15385,,LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis,https://huggingface.co/papers/2403.15385,5,1,0,0,0,0 +2024-03-25,2403.15246,https://github.com/orionw/followir,FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions,https://huggingface.co/papers/2403.15246,8,1,1,2,1,2 +2024-03-25,2403.15042,https://github.com/squeezeailab/llm2llm,LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement,https://huggingface.co/papers/2403.15042,24,2,0,0,0,0 +2024-03-25,2403.14773,https://github.com/picsart-ai-research/streamingt2v,"StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text",https://huggingface.co/papers/2403.14773,8,1,1,1,0,3 +2024-03-25,2403.15371,,Can large language models explore in-context?,https://huggingface.co/papers/2403.15371,30,2,0,0,0,0 +2024-03-25,2403.15360,https://github.com/badripatro/simba,SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series,https://huggingface.co/papers/2403.15360,11,1,0,0,0,0 +2024-03-25,2403.15377,https://github.com/opengvlab/internvideo2,InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding,https://huggingface.co/papers/2403.15377,18,1,0,10,1,2 +2024-03-22,2403.14602,,ReNoise: Real Image 
Inversion Through Iterative Noising,https://huggingface.co/papers/2403.14602,19,1,0,0,0,2 +2024-03-22,2403.14621,https://github.com/justimyhxu/grm,GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation,https://huggingface.co/papers/2403.14621,14,2,0,0,0,1 +2024-03-22,2403.14520,https://github.com/h-zhao1997/cobra,Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference,https://huggingface.co/papers/2403.14520,31,2,1,1,0,1 +2024-03-22,2403.14599,,MyVLM: Personalizing VLMs for User-Specific Queries,https://huggingface.co/papers/2403.14599,15,2,0,1,1,0 +2024-03-22,2403.14467,,Recourse for reclamation: Chatting with generative language models,https://huggingface.co/papers/2403.14467,6,1,0,0,0,0 +2024-03-22,2403.14624,,MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?,https://huggingface.co/papers/2403.14624,50,3,0,0,3,0 +2024-03-22,2403.14554,,Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering,https://huggingface.co/papers/2403.14554,12,1,0,0,0,0 +2024-03-22,2403.14186,https://github.com/jeolpyeoni/StyleCineGAN,StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN,https://huggingface.co/papers/2403.14186,8,1,0,0,0,0 +2024-03-22,2403.14611,,Explorative Inbetweening of Time and Space,https://huggingface.co/papers/2403.14611,11,1,0,0,0,0 +2024-03-22,2403.14613,,DreamReward: Text-to-3D Generation with Human Preference,https://huggingface.co/papers/2403.14613,33,2,0,0,1,0 +2024-03-22,2403.14468,,AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks,https://huggingface.co/papers/2403.14468,19,1,0,0,0,1 +2024-03-22,2403.14148,,Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition,https://huggingface.co/papers/2403.14148,17,1,0,0,0,0 +2024-03-21,2403.13187,https://github.com/sakanaai/evolutionary-model-merge,Evolutionary Optimization of Model Merging 
Recipes,https://huggingface.co/papers/2403.13187,48,4,1,5,3,4 +2024-03-21,2403.13447,https://github.com/dcdmllm/hyperllava,HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models,https://huggingface.co/papers/2403.13447,17,1,0,0,0,0 +2024-03-21,2403.13802,,ZigMa: Zigzag Mamba Diffusion Model,https://huggingface.co/papers/2403.13802,17,2,0,0,0,0 +2024-03-21,2403.13788,,DepthFM: Fast Monocular Depth Estimation with Flow Matching,https://huggingface.co/papers/2403.13788,16,1,0,0,0,0 +2024-03-21,2403.13501,https://github.com/boschresearch/VSTAR,VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis,https://huggingface.co/papers/2403.13501,9,2,0,0,0,0 +2024-03-21,2403.13806,,RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS,https://huggingface.co/papers/2403.13806,18,1,0,0,0,0 +2024-03-21,2403.13745,https://github.com/g-u-n/be-your-outpainter,Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation,https://huggingface.co/papers/2403.13745,11,1,1,0,0,0 +2024-03-21,2403.13372,https://github.com/hiyouga/llama-factory,LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models,https://huggingface.co/papers/2403.13372,58,2,1,5,0,4 +2024-03-21,2403.13524,,Compress3D: a Compressed Latent Space for 3D Generation from a Single Image,https://huggingface.co/papers/2403.13524,8,2,0,0,0,0 +2024-03-21,2403.13044,,Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos,https://huggingface.co/papers/2403.13044,14,1,0,0,0,0 +2024-03-21,2401.13923,https://github.com/lsh0520/3d-molm,Towards 3D Molecule-Text Interpretation in Language Models,https://huggingface.co/papers/2401.13923,9,1,1,0,1,0 +2024-03-21,2403.13043,https://github.com/bfshi/scaling_on_scales,When Do We Not Need Larger Vision Models?,https://huggingface.co/papers/2403.13043,25,2,1,1,0,0 +2024-03-21,2403.13535,,IDAdapter: Learning Mixed Features for Tuning-Free Personalization of 
Text-to-Image Models,https://huggingface.co/papers/2403.13535,20,1,0,0,0,0 +2024-03-21,2403.13799,,Reverse Training to Nurse the Reversal Curse,https://huggingface.co/papers/2403.13799,12,1,0,0,0,0 +2024-03-21,2403.13787,https://github.com/allenai/reward-bench,RewardBench: Evaluating Reward Models for Language Modeling,https://huggingface.co/papers/2403.13787,19,2,1,1,1,1 +2024-03-21,2403.13793,,Evaluating Frontier Models for Dangerous Capabilities,https://huggingface.co/papers/2403.13793,7,1,0,0,0,0 +2024-03-21,2403.13064,,SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model,https://huggingface.co/papers/2403.13064,31,2,0,0,0,0 +2024-03-21,2403.13248,https://github.com/lichao-sun/mora,Mora: Enabling Generalist Video Generation via A Multi-Agent Framework,https://huggingface.co/papers/2403.13248,74,5,0,0,0,0 +2024-03-20,2403.12962,https://github.com/williamyang1991/fresco,FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation,https://huggingface.co/papers/2403.12962,7,1,1,0,1,1 +2024-03-20,2403.12906,,TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation,https://huggingface.co/papers/2403.12906,4,1,0,0,0,0 +2024-03-20,2403.12881,https://github.com/internlm/agent-flan,Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models,https://huggingface.co/papers/2403.12881,15,1,1,1,2,0 +2024-03-20,2403.12895,https://github.com/x-plug/mplug-docowl,mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding,https://huggingface.co/papers/2403.12895,29,5,1,0,0,2 +2024-03-20,2403.12968,https://github.com/microsoft/LLMLingua,LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression,https://huggingface.co/papers/2403.12968,24,4,1,2,2,11 +2024-03-20,2403.12365,,GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation,https://huggingface.co/papers/2403.12365,10,2,0,0,0,0 +2024-03-20,2403.12943,,Vid2Robot: End-to-end 
Video-conditioned Policy Learning with Cross-Attention Transformers,https://huggingface.co/papers/2403.12943,13,1,0,0,0,0 +2024-03-20,2403.12173,,TnT-LLM: Text Mining at Scale with Large Language Models,https://huggingface.co/papers/2403.12173,18,2,0,0,0,0 +2024-03-20,2403.12957,,GVGEN: Text-to-3D Generation with Volumetric Representation,https://huggingface.co/papers/2403.12957,5,1,0,1,0,0 +2024-03-20,2403.12596,,Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs,https://huggingface.co/papers/2403.12596,9,1,0,0,0,0 +2024-03-20,2403.12963,https://github.com/leonhlj/fouriscale,FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis,https://huggingface.co/papers/2403.12963,6,1,1,0,0,0 +2024-03-20,2403.12409,,ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance,https://huggingface.co/papers/2403.12409,9,2,0,0,0,0 +2024-03-20,2403.12706,,AnimateDiff-Lightning: Cross-Model Diffusion Distillation,https://huggingface.co/papers/2403.12706,17,1,0,2,0,24 +2024-03-19,2403.12034,,VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models,https://huggingface.co/papers/2403.12034,5,2,0,0,0,0 +2024-03-19,2403.10616,,DiPaCo: Distributed Path Composition,https://huggingface.co/papers/2403.10616,12,1,0,0,0,0 +2024-03-19,2403.10704,,PERL: Parameter Efficient Reinforcement Learning from Human Feedback,https://huggingface.co/papers/2403.10704,56,4,0,0,0,0 +2024-03-19,2403.11703,https://github.com/thunlp/llava-uhd,LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images,https://huggingface.co/papers/2403.11703,15,1,1,1,0,4 +2024-03-19,2403.11901,,Larimar: Large Language Models with Episodic Memory Control,https://huggingface.co/papers/2403.11901,31,5,0,0,0,0 +2024-03-19,2403.12019,,LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation,https://huggingface.co/papers/2403.12019,8,2,0,1,0,0 +2024-03-19,2403.12032,https://github.com/Lakonik/MVEdit,Generic 3D 
Diffusion Adapter Using Controlled Multi-View Editing,https://huggingface.co/papers/2403.12032,14,2,1,1,0,1 +2024-03-19,2403.11207,https://github.com/medarc-ai/mindeyev2,MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data,https://huggingface.co/papers/2403.11207,14,2,1,0,0,0 +2024-03-19,2403.11481,,VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding,https://huggingface.co/papers/2403.11481,11,1,0,0,0,0 +2024-03-19,2403.10615,,LightIt: Illumination Modeling and Control for Diffusion Models,https://huggingface.co/papers/2403.10615,15,1,0,0,0,0 +2024-03-19,2403.11781,,Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm,https://huggingface.co/papers/2403.11781,17,2,0,0,0,0 +2024-03-19,2403.12008,,SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion,https://huggingface.co/papers/2403.12008,19,1,0,1,0,0 +2024-03-19,2403.12015,,Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation,https://huggingface.co/papers/2403.12015,60,2,0,0,0,0 +2024-03-18,2403.10242,,FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model,https://huggingface.co/papers/2403.10242,10,2,0,0,0,0 +2024-03-18,2403.10425,https://github.com/neufieldrobotics/neuflow,"NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices",https://huggingface.co/papers/2403.10425,2,1,0,0,0,0 +2024-03-18,2403.10131,https://github.com/ShishirPatil/gorilla,RAFT: Adapting Language Model to Domain Specific RAG,https://huggingface.co/papers/2403.10131,65,3,1,0,1,0 +2024-03-18,2403.10301,,Uni-SMART: Universal Science Multimodal Analysis and Research Transformer,https://huggingface.co/papers/2403.10301,50,4,0,0,0,0 +2024-03-18,2403.09977,https://github.com/terrypei/efficientvmamba,EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba,https://huggingface.co/papers/2403.09977,8,1,1,0,0,0 
+2024-03-18,2403.10517,,VideoAgent: Long-form Video Understanding with Large Language Model as Agent,https://huggingface.co/papers/2403.10517,29,2,0,0,0,0 +2024-03-18,2403.10395,https://github.com/pkunliu/isotropic3d,Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding,https://huggingface.co/papers/2403.10395,6,1,1,1,0,0 +2024-03-18,2403.09704,,Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations,https://huggingface.co/papers/2403.09704,30,2,0,0,0,0 +2024-03-18,2403.09919,https://github.com/apple/ml-recurrent-drafter,Recurrent Drafter for Fast Speculative Decoding in Large Language Models,https://huggingface.co/papers/2403.09919,20,1,1,0,0,0 +2024-03-18,2403.09981,https://github.com/WU-CVGL/MVControl-threestudio,Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting,https://huggingface.co/papers/2403.09981,6,1,1,0,0,0 +2024-03-18,2403.10493,,MusicHiFi: Fast High-Fidelity Stereo Vocoding,https://huggingface.co/papers/2403.10493,16,1,0,0,0,0 +2024-03-15,2403.09631,,3D-VLA: A 3D Vision-Language-Action Generative World Model,https://huggingface.co/papers/2403.09631,6,1,0,0,0,0 +2024-03-15,2403.09055,https://github.com/ironjr/streammultidiffusion,StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control,https://huggingface.co/papers/2403.09055,24,2,1,0,0,5 +2024-03-15,2403.09629,https://github.com/ezelikman/quiet-star,Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking,https://huggingface.co/papers/2403.09629,54,3,1,15,0,0 +2024-03-15,2403.09394,https://github.com/haiyang-w/git,GiT: Towards Generalist Vision Transformer through Universal Language Interface,https://huggingface.co/papers/2403.09394,25,6,1,1,0,0 +2024-03-15,2403.09333,https://github.com/jefferyzhan/griffon,Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring,https://huggingface.co/papers/2403.09333,14,3,1,0,0,0 
+2024-03-15,2403.09347,https://github.com/MayDomine/Burst-Attention,BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences,https://huggingface.co/papers/2403.09347,20,2,0,0,0,0 +2024-03-15,2403.09029,,Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset,https://huggingface.co/papers/2403.09029,54,4,0,2,3,15 +2024-03-15,2403.09338,https://github.com/hunto/localmamba,LocalMamba: Visual State Space Model with Windowed Selective Scan,https://huggingface.co/papers/2403.09338,7,1,0,0,0,0 +2024-03-15,2403.09626,https://github.com/opengvlab/video-mamba-suite,Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding,https://huggingface.co/papers/2403.09626,12,1,1,0,0,0 +2024-03-15,2403.08773,https://github.com/superagi/veagle,Veagle: Advancements in Multimodal Representation Learning,https://huggingface.co/papers/2403.08773,7,1,1,1,0,0 +2024-03-15,2403.09622,,Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering,https://huggingface.co/papers/2403.09622,15,1,0,0,1,2 +2024-03-15,2403.09530,,VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding,https://huggingface.co/papers/2403.09530,8,1,0,0,0,0 +2024-03-15,2403.09334,,Video Editing via Factorized Diffusion Distillation,https://huggingface.co/papers/2403.09334,21,2,0,0,1,0 +2024-03-15,2403.09611,,"MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training",https://huggingface.co/papers/2403.09611,123,9,0,0,0,0 +2024-03-14,2403.08629,,Scaling Up Dynamic Human-Scene Interaction Modeling,https://huggingface.co/papers/2403.08629,14,1,0,0,0,0 +2024-03-14,2403.08540,https://github.com/mlfoundations/scaling,Language models scale reliably with over-training and on downstream tasks,https://huggingface.co/papers/2403.08540,14,1,1,0,0,0 +2024-03-14,2403.08551,https://github.com/xinjie-q/gaussianimage,GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian 
Splatting,https://huggingface.co/papers/2403.08551,8,2,0,0,0,0 +2024-03-14,2403.08715,https://github.com/sotopia-lab/sotopia-pi,SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents,https://huggingface.co/papers/2403.08715,20,1,1,1,1,2 +2024-03-14,2403.07918,,On the Societal Impact of Open Foundation Models,https://huggingface.co/papers/2403.07918,16,2,0,0,0,0 +2024-03-14,2403.08763,https://github.com/eleutherai/gpt-neox,Simple and Scalable Strategies to Continually Pre-train Large Language Models,https://huggingface.co/papers/2403.08763,48,1,1,9,14,0 +2024-03-14,2403.08268,https://github.com/mayuelala/followyourclick,Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts,https://huggingface.co/papers/2403.08268,15,5,0,0,0,0 +2024-03-14,2403.08764,,VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis,https://huggingface.co/papers/2403.08764,34,3,0,0,0,0 +2024-03-14,2403.08295,,Gemma: Open Models Based on Gemini Research and Technology,https://huggingface.co/papers/2403.08295,45,3,0,100,0,24 +2024-03-13,2403.07487,https://github.com/steve-zeyu-zhang/MotionMamba,Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM,https://huggingface.co/papers/2403.07487,12,3,1,0,0,0 +2024-03-13,2403.07563,,Learning Generalizable Feature Fields for Mobile Manipulation,https://huggingface.co/papers/2403.07563,6,1,0,0,0,0 +2024-03-13,2403.07128,,FAX: Scalable and Differentiable Federated Primitives in JAX,https://huggingface.co/papers/2403.07128,11,2,0,0,0,0 +2024-03-13,2403.07508,https://github.com/ByungKwanLee/MoAI,MoAI: Mixture of All Intelligence for Large Language and Vision Models,https://huggingface.co/papers/2403.07508,73,4,1,0,0,0 +2024-03-13,2403.07815,https://github.com/amazon-science/chronos-forecasting,Chronos: Learning the Language of Time Series,https://huggingface.co/papers/2403.07815,43,5,1,11,1,7 +2024-03-13,2403.07816,,Branch-Train-MiX: Mixing Expert LLMs into a 
Mixture-of-Experts LLM,https://huggingface.co/papers/2403.07816,37,2,0,1,0,0 +2024-03-13,2403.07750,,Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings,https://huggingface.co/papers/2403.07750,19,1,0,0,0,0 +2024-03-13,2403.07420,https://github.com/showlab/draganything,DragAnything: Motion Control for Anything using Entity Representation,https://huggingface.co/papers/2403.07420,12,1,1,0,0,0 +2024-03-12,2403.06764,https://github.com/pkunlp-icler/fastv,An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models,https://huggingface.co/papers/2403.06764,24,2,1,0,0,0 +2024-03-12,2403.06807,,Multistep Consistency Models,https://huggingface.co/papers/2403.06807,13,1,0,0,0,0 +2024-03-12,2403.06098,https://github.com/wangwenhao0716/vidprom,VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models,https://huggingface.co/papers/2403.06098,15,3,1,0,1,0 +2024-03-12,2403.06775,https://github.com/modelscope/facechain,FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation,https://huggingface.co/papers/2403.06775,3,1,1,0,0,0 +2024-03-12,2403.06738,https://github.com/heheyas/v3d,V3D: Video Diffusion Models are Effective 3D Generators,https://huggingface.co/papers/2403.06738,28,4,0,0,0,0 +2024-03-12,2403.05812,https://github.com/epoch-research/lm-algorithmic-progress,Algorithmic progress in language models,https://huggingface.co/papers/2403.05812,17,1,0,0,0,0 +2024-03-12,2403.06977,https://github.com/opengvlab/videomamba,VideoMamba: State Space Model for Efficient Video Understanding,https://huggingface.co/papers/2403.06977,26,2,1,2,0,3 +2024-03-12,2403.06504,,Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU,https://huggingface.co/papers/2403.06504,52,4,0,0,0,0 +2024-03-12,2403.06634,,Stealing Part of a Production Language Model,https://huggingface.co/papers/2403.06634,87,2,0,0,0,0 
+2024-03-11,2403.05185,,Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks,https://huggingface.co/papers/2403.05185,19,1,0,0,0,0 +2024-03-11,2403.05530,,Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context,https://huggingface.co/papers/2403.05530,52,3,0,0,1,0 +2024-03-11,2403.05121,,CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion,https://huggingface.co/papers/2403.05121,18,1,0,0,0,0 +2024-03-11,2403.05034,,CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model,https://huggingface.co/papers/2403.05034,17,2,0,1,0,5 +2024-03-11,2403.05525,https://github.com/deepseek-ai/deepseek-vl,DeepSeek-VL: Towards Real-World Vision-Language Understanding,https://huggingface.co/papers/2403.05525,39,4,1,4,0,25 +2024-03-11,2403.05135,,ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment,https://huggingface.co/papers/2403.05135,40,2,0,1,0,0 +2024-03-11,2403.05438,https://github.com/ybybzhang/videoelevator,VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models,https://huggingface.co/papers/2403.05438,16,1,1,0,0,0 +2024-03-08,2403.04634,,Pix2Gif: Motion-Guided Diffusion for GIF Generation,https://huggingface.co/papers/2403.04634,14,1,0,0,0,0 +2024-03-08,2403.04116,https://github.com/caiyuanhao1998/x-gaussian,Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis,https://huggingface.co/papers/2403.04116,3,1,0,0,0,0 +2024-03-08,2403.04692,,PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation,https://huggingface.co/papers/2403.04692,40,1,0,4,0,7 +2024-03-08,2403.04732,https://github.com/apple/ml-rpm-bench,How Far Are We from Intelligent Visual Deductive Reasoning?,https://huggingface.co/papers/2403.04732,18,1,0,0,0,0 +2024-03-08,2403.04746,https://github.com/microsoft/simulated-trial-and-error,LLMs in the Imaginarium: Tool Learning through Simulated Trial and 
Error,https://huggingface.co/papers/2403.04746,21,1,0,0,0,0 +2024-03-08,2403.04706,https://github.com/xwin-lm/xwin-lm,Common 7B Language Models Already Possess Strong Math Capabilities,https://huggingface.co/papers/2403.04706,16,1,1,10,6,0 +2024-03-08,2403.04642,,Teaching Large Language Models to Reason with Reinforcement Learning,https://huggingface.co/papers/2403.04642,46,2,0,0,0,0 +2024-03-08,2403.04132,,Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference,https://huggingface.co/papers/2403.04132,38,1,0,0,1,4 +2024-03-08,2403.04437,,StableDrag: Stable Dragging for Point-based Image Editing,https://huggingface.co/papers/2403.04437,24,4,0,0,0,0 +2024-03-08,2403.04652,https://github.com/01-ai/yi,Yi: Open Foundation Models by 01.AI,https://huggingface.co/papers/2403.04652,59,3,1,100,0,100 +2024-03-07,2403.03234,https://github.com/kuleshov-group/caduceus,Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling,https://huggingface.co/papers/2403.03234,11,1,1,2,0,0 +2024-03-07,2403.03950,,Stop Regressing: Training Value Functions via Classification for Scalable Deep RL,https://huggingface.co/papers/2403.03950,11,1,0,0,0,0 +2024-03-07,2403.03956,https://github.com/rosewang2008/backtracing,Backtracing: Retrieving the Cause of the Query,https://huggingface.co/papers/2403.03956,10,1,1,0,1,0 +2024-03-07,2403.03346,,Enhancing Vision-Language Pre-training with Rich Supervisions,https://huggingface.co/papers/2403.03346,13,1,0,0,0,0 +2024-03-07,2403.03853,,ShortGPT: Layers in Large Language Models are More Redundant Than You Expect,https://huggingface.co/papers/2403.03853,61,8,0,0,0,0 +2024-03-07,2403.03870,https://github.com/clinicalml/co-llm,Learning to Decode Collaboratively with Multiple Language Models,https://huggingface.co/papers/2403.03870,17,4,0,0,0,0 +2024-03-07,2403.03883,,SaulLM-7B: A pioneering Large Language Model for Law,https://huggingface.co/papers/2403.03883,72,4,0,9,0,1 
+2024-03-07,2403.03954,https://github.com/YanjieZe/3D-Diffusion-Policy,3D Diffusion Policy,https://huggingface.co/papers/2403.03954,11,1,0,0,0,0 +2024-03-07,2403.03507,https://github.com/jiaweizzhao/galore,GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection,https://huggingface.co/papers/2403.03507,180,11,1,6,0,1 +2024-03-06,2403.02626,,Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use,https://huggingface.co/papers/2403.02626,9,1,0,0,0,0 +2024-03-06,2403.03194,,MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets,https://huggingface.co/papers/2403.03194,12,1,0,0,0,0 +2024-03-06,2403.02709,,RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches,https://huggingface.co/papers/2403.02709,7,1,0,0,0,0 +2024-03-06,2403.02460,,MagicClay: Sculpting Meshes With Generative Neural Fields,https://huggingface.co/papers/2403.02460,6,1,0,0,0,0 +2024-03-06,2403.03100,,NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models,https://huggingface.co/papers/2403.03100,33,2,0,3,0,3 +2024-03-06,2403.02827,,Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation,https://huggingface.co/papers/2403.02827,6,1,0,0,0,0 +2024-03-06,2403.02545,,Wukong: Towards a Scaling Law for Large-Scale Recommendation,https://huggingface.co/papers/2403.02545,15,1,0,0,0,0 +2024-03-06,2403.02775,,EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs,https://huggingface.co/papers/2403.02775,11,3,0,0,0,0 +2024-03-06,2403.03003,https://github.com/luogen1996/llava-hr,Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models,https://huggingface.co/papers/2403.03003,9,1,1,0,0,0 +2024-03-06,2403.02677,,Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters,https://huggingface.co/papers/2403.02677,16,1,0,0,0,0 +2024-03-06,2403.03163,,Design2Code: How Far Are We From Automating Front-End 
Engineering?,https://huggingface.co/papers/2403.03163,92,2,0,1,4,0 +2024-03-06,2403.03206,,Scaling Rectified Flow Transformers for High-Resolution Image Synthesis,https://huggingface.co/papers/2403.03206,47,3,0,11,1,100 +2024-03-06,2403.02884,,MathScale: Scaling Instruction Tuning for Mathematical Reasoning,https://huggingface.co/papers/2403.02884,15,2,0,9,7,0 +2024-03-05,2403.00818,https://github.com/wailordhe/densessm,DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models,https://huggingface.co/papers/2403.00818,13,2,1,2,0,0 +2024-03-05,2403.01823,,RT-H: Action Hierarchies Using Language,https://huggingface.co/papers/2403.01823,7,1,0,0,0,0 +2024-03-05,2403.02338,,Twisting Lids Off with Two Hands,https://huggingface.co/papers/2403.02338,5,1,0,0,0,0 +2024-03-05,2403.01444,https://github.com/SJoJoK/3DGStream,3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos,https://huggingface.co/papers/2403.01444,4,0,0,0,0,0 +2024-03-05,2403.02151,https://github.com/vast-ai-research/triposr,TripoSR: Fast 3D Object Reconstruction from a Single Image,https://huggingface.co/papers/2403.02151,9,3,1,1,0,27 +2024-03-05,2403.01487,,InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding,https://huggingface.co/papers/2403.01487,14,1,0,1,0,0 +2024-03-05,2403.01779,https://github.com/levihsu/ootdiffusion,OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on,https://huggingface.co/papers/2403.01779,26,2,1,4,0,35 +2024-03-05,2403.01422,,MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies,https://huggingface.co/papers/2403.01422,25,6,0,0,0,0 +2024-03-05,2403.01800,,AtomoVideo: High Fidelity Image-to-Video Generation,https://huggingface.co/papers/2403.01800,18,4,0,0,0,0 +2024-03-05,2403.02084,https://github.com/bytedance/res-adapter,ResAdapter: Domain Consistent Resolution Adapter for Diffusion 
Models,https://huggingface.co/papers/2403.02084,12,1,1,1,0,2 +2024-03-05,2403.01807,https://github.com/facebookresearch/viewdiff,ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models,https://huggingface.co/papers/2403.01807,7,1,0,0,0,0 +2024-03-04,2403.00483,,RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization,https://huggingface.co/papers/2403.00483,9,1,0,0,0,0 +2024-03-04,2403.00071,https://github.com/sheryc/resonance_rope,Resonance RoPE: Improving Context Length Generalization of Large Language Models,https://huggingface.co/papers/2403.00071,19,1,1,0,0,0 +2024-03-04,2403.00745,,AtP*: An efficient and scalable method for localizing LLM behaviour to components,https://huggingface.co/papers/2403.00745,8,2,0,0,0,0 +2024-03-04,2403.00504,,Learning and Leveraging World Models in Visual Representation Learning,https://huggingface.co/papers/2403.00504,28,1,0,0,0,0 +2024-03-04,2403.00522,,VisionLLaMA: A Unified LLaMA Interface for Vision Tasks,https://huggingface.co/papers/2403.00522,41,4,0,0,0,0 +2024-03-01,2402.19481,https://github.com/mit-han-lab/distrifuser,DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models,https://huggingface.co/papers/2402.19481,17,1,1,0,0,0 +2024-03-01,2402.18796,,MOSAIC: A Modular System for Assistive and Interactive Cooking,https://huggingface.co/papers/2402.18796,22,1,0,0,0,0 +2024-03-01,2402.18734,,Priority Sampling of Large Language Models for Compilers,https://huggingface.co/papers/2402.18734,15,1,0,0,0,0 +2024-03-01,2402.18668,https://github.com/hazyresearch/based,Simple linear attention language models balance the recall-throughput tradeoff,https://huggingface.co/papers/2402.18668,18,5,1,2,0,1 +2024-03-01,2402.19159,https://github.com/jabir-zheng/TCD,Trajectory Consistency Distillation,https://huggingface.co/papers/2402.19159,13,2,1,2,0,3 +2024-03-01,2402.19173,,StarCoder 2 and The Stack v2: The Next 
Generation,https://huggingface.co/papers/2402.19173,126,4,0,28,6,80 +2024-03-01,2402.19469,,Humanoid Locomotion as Next Token Prediction,https://huggingface.co/papers/2402.19469,25,2,0,0,0,0 +2024-03-01,2402.19155,,Beyond Language Models: Byte Models are Digital World Simulators,https://huggingface.co/papers/2402.19155,47,4,0,1,0,0 +2024-03-01,2402.19427,,Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models,https://huggingface.co/papers/2402.19427,50,4,0,11,0,3 +2024-03-01,2402.18842,https://github.com/Wi-sc/ViewFusion,ViewFusion: Towards Multi-View Consistency via Interpolated Denoising,https://huggingface.co/papers/2402.18842,13,1,1,0,0,0 +2024-03-01,2402.19479,,Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers,https://huggingface.co/papers/2402.19479,31,3,0,0,0,0 +2024-02-28,2402.17427,,VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction,https://huggingface.co/papers/2402.17427,9,45,0,0,0,0 +2024-02-28,2402.17463,https://github.com/hkunlp/chunkllama,Training-Free Long-Context Scaling of Large Language Models,https://huggingface.co/papers/2402.17463,19,2,1,0,0,0 +2024-02-28,2402.17753,,Evaluating Very Long-Term Conversational Memory of LLM Agents,https://huggingface.co/papers/2402.17753,17,2,0,0,0,0 +2024-02-28,2402.17412,,DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model,https://huggingface.co/papers/2402.17412,21,1,0,0,0,0 +2024-02-28,2402.17245,,Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation,https://huggingface.co/papers/2402.17245,10,1,0,2,1,58 +2024-02-28,2402.17193,,"When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method",https://huggingface.co/papers/2402.17193,23,3,0,0,0,0 +2024-02-28,2402.16936,,Disentangled 3D Scene Generation with Layout Learning,https://huggingface.co/papers/2402.16936,10,1,0,0,0,0 +2024-02-28,2402.17759,,Towards Optimal Learning of Language 
Models,https://huggingface.co/papers/2402.17759,16,1,0,0,0,0 +2024-02-28,2402.17723,,Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners,https://huggingface.co/papers/2402.17723,16,1,0,0,0,0 +2024-02-28,2402.17764,,The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits,https://huggingface.co/papers/2402.17764,581,45,0,15,0,11 +2024-02-28,2402.17553,,OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web,https://huggingface.co/papers/2402.17553,21,3,0,0,1,0 +2024-02-28,2402.17403,,Sora Generates Videos with Stunning Geometrical Consistency,https://huggingface.co/papers/2402.17403,16,1,0,0,0,0 +2024-02-28,2402.17485,,EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions,https://huggingface.co/papers/2402.17485,184,19,0,0,0,0 +2024-02-28,2402.17139,,Video as the New Language for Real-World Decision Making,https://huggingface.co/papers/2402.17139,18,1,0,0,0,0 +2024-02-28,2402.17177,https://github.com/lichao-sun/sorareview,"Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models",https://huggingface.co/papers/2402.17177,88,2,1,0,0,0 +2024-02-27,2402.16819,,Nemotron-4 15B Technical Report,https://huggingface.co/papers/2402.16819,42,4,0,0,0,0 +2024-02-27,2402.16822,,Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts,https://huggingface.co/papers/2402.16822,15,0,0,0,0,0 +2024-02-27,2402.16641,,Towards Open-ended Visual Quality Comparison,https://huggingface.co/papers/2402.16641,16,1,0,2,1,1 +2024-02-27,2402.16837,,Do Large Language Models Latently Perform Multi-Hop Reasoning?,https://huggingface.co/papers/2402.16837,24,1,0,0,0,0 +2024-02-27,2402.16671,,StructLM: Towards Building Generalist Models for Structured Knowledge Grounding,https://huggingface.co/papers/2402.16671,26,1,0,4,2,1 
+2024-02-27,2402.16840,https://github.com/mbzuai-oryx/mobillama,MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT,https://huggingface.co/papers/2402.16840,23,1,1,6,0,0 +2024-02-27,2402.15627,https://github.com/volcengine/vescale,"MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs",https://huggingface.co/papers/2402.15627,32,2,0,0,0,0 +2024-02-27,2402.16843,,Multi-LoRA Composition for Image Generation,https://huggingface.co/papers/2402.16843,28,0,0,0,0,0 +2024-02-27,2402.16153,https://github.com/hf-lin/ChatMusician,ChatMusician: Understanding and Generating Music Intrinsically with LLM,https://huggingface.co/papers/2402.16153,55,1,1,2,3,2 +2024-02-27,2402.16107,,FuseChat: Knowledge Fusion of Chat Models,https://huggingface.co/papers/2402.16107,36,4,0,25,1,4 +2024-02-26,2402.15509,https://github.com/BarqueroGerman/FlowMDM,Seamless Human Motion Composition with Blended Positional Encodings,https://huggingface.co/papers/2402.15509,14,1,0,0,0,0 +2024-02-26,2402.14905,https://github.com/facebookresearch/mobilellm,MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases,https://huggingface.co/papers/2402.14905,103,10,1,2,0,0 +2024-02-26,2402.15000,,Divide-or-Conquer? 
Which Part Should You Distill Your LLM?,https://huggingface.co/papers/2402.15000,22,1,0,0,0,0 +2024-02-26,2402.15220,https://github.com/microsoft/chunk-attention,ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition,https://huggingface.co/papers/2402.15220,18,3,1,0,0,0 +2024-02-26,2402.15319,,GPTVQ: The Blessing of Dimensionality for LLM Quantization,https://huggingface.co/papers/2402.15319,19,3,0,0,0,0 +2024-02-26,2402.15491,https://github.com/ibm/api-blend,API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs,https://huggingface.co/papers/2402.15491,13,3,0,0,0,0 +2024-02-26,2402.15506,https://github.com/SalesforceAIResearch/xLAM/tree/main/xLAM,AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning,https://huggingface.co/papers/2402.15506,11,3,0,1,0,0 +2024-02-26,2402.15391,,Genie: Generative Interactive Environments,https://huggingface.co/papers/2402.15391,68,7,0,0,0,0 +2024-02-26,2402.14830,,Orca-Math: Unlocking the potential of SLMs in Grade School Math,https://huggingface.co/papers/2402.14830,24,2,0,3,13,0 +2024-02-26,2402.14848,https://github.com/alonj/Same-Task-More-Tokens,"Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models",https://huggingface.co/papers/2402.14848,18,3,1,0,1,0 +2024-02-26,2402.14904,,Watermarking Makes Language Models Radioactive,https://huggingface.co/papers/2402.14904,22,2,0,0,0,0 +2024-02-26,2402.15021,https://github.com/netflix/clove,CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models,https://huggingface.co/papers/2402.15021,12,3,1,0,0,0 +2024-02-26,2402.15504,https://github.com/louisYen/Gen4Gen,Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition,https://huggingface.co/papers/2402.15504,20,2,1,0,0,0 +2024-02-23,2402.14792,,Consolidating Attention Features for Multi-view Image Editing,https://huggingface.co/papers/2402.14792,7,1,0,0,0,0 
+2024-02-23,2402.14180,,Linear Transformers are Versatile In-Context Learners,https://huggingface.co/papers/2402.14180,6,2,0,0,0,0 +2024-02-23,2402.14194,,BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay,https://huggingface.co/papers/2402.14194,5,1,0,0,0,0 +2024-02-23,2402.14795,,CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation,https://huggingface.co/papers/2402.14795,5,1,0,0,0,0 +2024-02-23,2402.14810,https://github.com/meowuu7/geneoh-diffusion,GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion,https://huggingface.co/papers/2402.14810,8,1,1,0,0,0 +2024-02-23,2402.14650,,GaussianPro: 3D Gaussian Splatting with Progressive Propagation,https://huggingface.co/papers/2402.14650,6,1,0,0,0,0 +2024-02-23,2402.14083,https://github.com/facebookresearch/searchformer,Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping,https://huggingface.co/papers/2402.14083,43,7,0,0,0,0 +2024-02-23,2402.14253,,MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion,https://huggingface.co/papers/2402.14253,5,1,0,0,0,0 +2024-02-23,2402.14547,https://github.com/google-research/optformer,OmniPred: Language Models as Universal Regressors,https://huggingface.co/papers/2402.14547,11,1,0,0,0,0 +2024-02-23,2402.14590,,Scaling Up LLM Reviews for Google Ads Content Moderation,https://huggingface.co/papers/2402.14590,7,1,0,0,0,0 +2024-02-23,2402.14818,https://github.com/mbzuai-oryx/palo,PALO: A Polyglot Large Multimodal Model for 5B People,https://huggingface.co/papers/2402.14818,23,2,1,0,3,0 +2024-02-23,2402.14086,https://github.com/BatsResearch/LexC-Gen,LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons,https://huggingface.co/papers/2402.14086,9,1,1,0,2,0 +2024-02-23,2402.14034,https://github.com/modelscope/agentscope,AgentScope: A Flexible yet Robust Multi-Agent 
Platform,https://huggingface.co/papers/2402.14034,12,1,0,0,0,0 +2024-02-23,2402.14261,,Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming,https://huggingface.co/papers/2402.14261,10,1,0,0,0,0 +2024-02-23,2402.14289,https://github.com/dlcv-buaa/tinyllavabench,TinyLLaVA: A Framework of Small-scale Large Multimodal Models,https://huggingface.co/papers/2402.14289,18,2,1,10,0,5 +2024-02-23,2402.14327,https://github.com/chendelong1999/subobjects,Subobject-level Image Tokenization,https://huggingface.co/papers/2402.14327,16,2,1,1,0,0 +2024-02-23,2402.14167,https://github.com/nvlabs/t-stitch,T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching,https://huggingface.co/papers/2402.14167,9,1,1,0,0,0 +2024-02-23,2402.14797,,Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis,https://huggingface.co/papers/2402.14797,19,1,0,0,0,0 +2024-02-23,2402.14658,,OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement,https://huggingface.co/papers/2402.14658,79,5,0,55,2,20 +2024-02-22,2402.13573,,ToDo: Token Downsampling for Efficient Generation of High-Resolution Images,https://huggingface.co/papers/2402.13573,8,1,0,0,0,2 +2024-02-22,2402.13763,,Music Style Transfer with Time-Varying Inversion of Diffusion Models,https://huggingface.co/papers/2402.13763,9,1,0,0,0,0 +2024-02-22,2402.14017,,D-Flow: Differentiating through Flows for Controlled Generation,https://huggingface.co/papers/2402.14017,5,1,0,0,0,0 +2024-02-22,2402.13616,https://github.com/WongKinYiu/YOLO,YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information,https://huggingface.co/papers/2402.13616,44,3,1,4,0,4 +2024-02-22,2402.14020,https://github.com/jonasgeiping/carving,Coercing LLMs to do and reveal (almost) anything,https://huggingface.co/papers/2402.14020,12,2,1,0,0,0 +2024-02-22,2402.13720,,Ouroboros: Speculative Decoding with Large Model Enhanced 
Drafting,https://huggingface.co/papers/2402.13720,5,1,0,0,0,0 +2024-02-22,2402.13598,,User-LLM: Efficient LLM Contextualization with User Embeddings,https://huggingface.co/papers/2402.13598,18,1,0,0,0,0 +2024-02-22,2402.13577,,BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models,https://huggingface.co/papers/2402.13577,5,1,0,0,0,0 +2024-02-22,2402.13929,,SDXL-Lightning: Progressive Adversarial Diffusion Distillation,https://huggingface.co/papers/2402.13929,26,1,0,1,1,83 +2024-02-22,2402.12479,,"In deep reinforcement learning, a pruned network is a good network",https://huggingface.co/papers/2402.12479,16,1,0,0,0,0 +2024-02-22,2402.13349,https://github.com/facebookresearch/projectaria_tools,Aria Everyday Activities Dataset,https://huggingface.co/papers/2402.13349,28,1,0,0,0,0 +2024-02-22,2402.13753,https://github.com/microsoft/longrope,LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens,https://huggingface.co/papers/2402.13753,108,10,1,1,0,0 +2024-02-21,2402.13251,,FlashTex: Fast Relightable Mesh Texturing with LightControlNet,https://huggingface.co/papers/2402.13251,13,1,0,0,0,0 +2024-02-21,2402.12847,https://github.com/edward-sun/pit,Instruction-tuned Language Models are Better Knowledge Learners,https://huggingface.co/papers/2402.12847,24,1,0,0,0,0 +2024-02-21,2402.13252,https://github.com/nemo1999/joint-tensorf,Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields,https://huggingface.co/papers/2402.13252,17,1,0,0,0,0 +2024-02-21,2402.12908,,RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models,https://huggingface.co/papers/2402.12908,7,1,0,0,0,0 +2024-02-21,2402.12659,,The FinBen: An Holistic Financial Benchmark for Large Language Models,https://huggingface.co/papers/2402.12659,15,2,0,0,0,0 +2024-02-21,2402.13220,,How Easy is It to Fool Your Multimodal LLMs? 
An Empirical Analysis on Deceptive Prompts,https://huggingface.co/papers/2402.13220,12,2,0,0,0,0 +2024-02-21,2402.13232,https://github.com/Max-Fu/tvl,"A Touch, Vision, and Language Dataset for Multimodal Alignment",https://huggingface.co/papers/2402.13232,12,1,1,1,1,0 +2024-02-21,2402.13249,https://github.com/amazon-science/tofueval,TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization,https://huggingface.co/papers/2402.13249,10,4,0,0,2,0 +2024-02-21,2402.13064,,Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models,https://huggingface.co/papers/2402.13064,46,2,0,1,2,1 +2024-02-21,2402.13217,,VideoPrism: A Foundational Visual Encoder for Video Understanding,https://huggingface.co/papers/2402.13217,20,2,0,0,0,0 +2024-02-21,2402.12712,,MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction,https://huggingface.co/papers/2402.12712,14,3,0,0,0,0 +2024-02-21,2402.13250,https://github.com/md-mohaiminul/VideoRecap,Video ReCap: Recursive Captioning of Hour-Long Videos,https://huggingface.co/papers/2402.13250,21,1,1,0,0,0 +2024-02-21,2402.13144,,Neural Network Diffusion,https://huggingface.co/papers/2402.13144,94,8,0,0,0,0 +2024-02-20,2402.12225,,Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability,https://huggingface.co/papers/2402.12225,5,1,0,0,0,0 +2024-02-20,2402.11690,,Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning,https://huggingface.co/papers/2402.11690,7,1,0,0,0,0 +2024-02-20,2402.11450,,Learning to Learn Faster from Human Feedback with Language Model Predictive Control,https://huggingface.co/papers/2402.11450,20,2,0,0,0,0 +2024-02-20,2402.12377,,Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis,https://huggingface.co/papers/2402.12377,8,1,0,0,0,0 +2024-02-20,2402.11550,https://github.com/zuucan/needleinahaystack-plus,LongAgent: Scaling Language Models to 
128k Context through Multi-Agent Collaboration,https://huggingface.co/papers/2402.11550,14,1,0,0,0,0 +2024-02-20,2402.11248,https://github.com/ByungKwanLee/CoLLaVO,CoLLaVO: Crayon Large Language and Vision mOdel,https://huggingface.co/papers/2402.11248,18,5,1,0,0,0 +2024-02-20,2402.12219,https://github.com/gair-nlp/realign,Reformatted Alignment,https://huggingface.co/papers/2402.12219,15,2,1,0,0,0 +2024-02-20,2402.10963,,"GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements",https://huggingface.co/papers/2402.10963,9,1,0,0,0,0 +2024-02-20,2402.11295,https://github.com/xuyuzhuang11/onebit,OneBit: Towards Extremely Low-bit Large Language Models,https://huggingface.co/papers/2402.11295,21,7,1,0,0,0 +2024-02-20,2402.11131,,Speculative Streaming: Fast LLM Inference without Auxiliary Models,https://huggingface.co/papers/2402.11131,41,2,0,0,0,0 +2024-02-20,2402.12376,https://github.com/whlzy/fit,FiT: Flexible Vision Transformer for Diffusion Model,https://huggingface.co/papers/2402.12376,48,3,1,0,0,0 +2024-02-20,2402.12226,,AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling,https://huggingface.co/papers/2402.12226,37,7,0,0,0,0 +2024-02-20,2402.10986,,FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models,https://huggingface.co/papers/2402.10986,75,3,0,0,0,0 +2024-02-20,2402.11929,,DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation,https://huggingface.co/papers/2402.11929,9,1,0,1,0,1 +2024-02-19,2402.10644,https://github.com/corl-team/rebased,Linear Transformers with Learnable Kernel Functions are Better In-Context Models,https://huggingface.co/papers/2402.10644,75,3,0,0,0,0 +2024-02-19,2402.10329,,Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots,https://huggingface.co/papers/2402.10329,13,2,0,0,0,0 +2024-02-19,2402.10893,https://github.com/austrian-code-wizard/c3po,RLVF: Learning from Verbal Feedback without 
Overgeneralization,https://huggingface.co/papers/2402.10893,10,2,0,0,0,1 +2024-02-19,2402.10555,https://github.com/jyonn/legommenders,SPAR: Personalized Content-Based Recommendation via Long Engagement Attention,https://huggingface.co/papers/2402.10555,32,2,0,0,0,0 +2024-02-19,2402.10896,,PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter,https://huggingface.co/papers/2402.10896,14,2,0,0,0,0 +2024-02-19,2402.10294,,LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing,https://huggingface.co/papers/2402.10294,22,2,0,0,0,0 +2024-02-19,2402.10379,https://github.com/datadreamer-dev/datadreamer,DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows,https://huggingface.co/papers/2402.10379,28,2,1,0,0,0 +2024-02-19,2402.10524,,LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models,https://huggingface.co/papers/2402.10524,20,2,0,0,0,0 +2024-02-19,2402.10790,,In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss,https://huggingface.co/papers/2402.10790,40,4,0,0,2,1 +2024-02-19,2402.10259,https://github.com/GaussianObject/GaussianObject,GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting,https://huggingface.co/papers/2402.10259,13,2,0,0,0,0 +2024-02-19,2402.10466,https://github.com/facebookresearch/fnctod,Large Language Models as Zero-shot Dialogue State Tracker through Function Calling,https://huggingface.co/papers/2402.10466,16,3,1,0,0,0 +2024-02-19,2402.10491,https://github.com/guolanqing/self-cascade,Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation,https://huggingface.co/papers/2402.10491,16,1,0,0,0,0 +2024-02-16,2402.10211,https://github.com/raunaqbhirangi/hiss,Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling,https://huggingface.co/papers/2402.10211,8,1,0,0,0,0 +2024-02-16,2402.10009,,Zero-Shot Unsupervised and 
Text-Based Audio Editing Using DDPM Inversion,https://huggingface.co/papers/2402.10009,18,2,0,0,0,14 +2024-02-16,2402.09812,https://github.com/KU-CVLAB/DreamMatcher,DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization,https://huggingface.co/papers/2402.09812,11,1,0,0,0,0 +2024-02-16,2402.10128,https://github.com/ajhamdi/ges-splatting,GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering,https://huggingface.co/papers/2402.10128,14,1,0,0,0,0 +2024-02-16,2402.09906,https://github.com/contextualai/gritlm,Generative Representational Instruction Tuning,https://huggingface.co/papers/2402.09906,50,5,1,8,2,3 +2024-02-16,2402.10200,,Chain-of-Thought Reasoning Without Prompting,https://huggingface.co/papers/2402.10200,92,2,0,0,0,0 +2024-02-16,2402.10176,https://github.com/kipok/nemo-skills,OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset,https://huggingface.co/papers/2402.10176,33,2,1,16,4,1 +2024-02-16,2402.09470,,Rolling Diffusion Models,https://huggingface.co/papers/2402.09470,8,1,0,0,0,0 +2024-02-16,2402.10171,https://github.com/franxyao/long-context-data-engineering,Data Engineering for Scaling Language Models to 128K Context,https://huggingface.co/papers/2402.10171,19,2,1,0,0,0 +2024-02-16,2402.09727,,A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts,https://huggingface.co/papers/2402.09727,35,3,0,0,0,0 +2024-02-16,2402.09668,,How to Train Data-Efficient LLMs,https://huggingface.co/papers/2402.09668,37,3,0,0,1,0 +2024-02-16,2402.10193,https://github.com/FasterDecoding/BitDelta,BitDelta: Your Fine-Tune May Only Be Worth One Bit,https://huggingface.co/papers/2402.10193,17,5,1,0,0,0 +2024-02-16,2402.10210,,Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation,https://huggingface.co/papers/2402.10210,28,4,0,3,0,1 +2024-02-15,2402.08958,,Towards Next-Level Post-Training Quantization of Hyper-Scale 
Transformers,https://huggingface.co/papers/2402.08958,3,1,0,0,0,0 +2024-02-15,2402.09126,https://github.com/scientific-computing-lab-nrcn/mpi-rigen,MPIrigen: MPI Code Generation through Domain-Specific Language Models,https://huggingface.co/papers/2402.09126,11,1,1,0,0,0 +2024-02-15,2402.08855,,GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency,https://huggingface.co/papers/2402.08855,9,2,0,0,0,0 +2024-02-15,2402.08714,,PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models,https://huggingface.co/papers/2402.08714,10,1,0,0,0,0 +2024-02-15,2402.08797,,Computing Power and the Governance of Artificial Intelligence,https://huggingface.co/papers/2402.08797,11,2,0,0,0,0 +2024-02-15,2402.09371,,Transformers Can Achieve Length Generalization But Not Robustly,https://huggingface.co/papers/2402.09371,12,1,0,0,0,0 +2024-02-15,2402.08939,,Premise Order Matters in Reasoning with Large Language Models,https://huggingface.co/papers/2402.08939,24,3,0,0,0,0 +2024-02-15,2402.09052,,L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects,https://huggingface.co/papers/2402.09052,16,1,0,0,0,0 +2024-02-15,2402.09368,https://github.com/zhen-dong/magic-me,Magic-Me: Identity-Specific Video Customized Diffusion,https://huggingface.co/papers/2402.09368,24,2,1,0,0,1 +2024-02-14,2402.08420,,Vision-Based Hand Gesture Customization from a Single Demonstration,https://huggingface.co/papers/2402.08420,7,1,0,0,0,0 +2024-02-14,2402.08303,https://github.com/zjunlp/chatcell,ChatCell: Facilitating Single-Cell Analysis with Natural Language,https://huggingface.co/papers/2402.08303,9,4,1,0,0,0 +2024-02-14,2402.07939,https://github.com/microsoft/UFO,UFO: A UI-Focused Agent for Windows OS Interaction,https://huggingface.co/papers/2402.07939,13,3,0,0,0,0 +2024-02-14,2402.08654,,Learning Continuous 3D Words for Text-to-Image 
Generation,https://huggingface.co/papers/2402.08654,9,4,0,0,0,0 +2024-02-14,2402.08017,,Lumos : Empowering Multimodal LLMs with Scene Text Recognition,https://huggingface.co/papers/2402.08017,24,2,0,0,0,0 +2024-02-14,2402.08268,,World Model on Million-Length Video And Language With RingAttention,https://huggingface.co/papers/2402.08268,36,4,0,0,0,0 +2024-02-14,2402.08682,,IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation,https://huggingface.co/papers/2402.08682,12,1,0,0,0,0 +2024-02-14,2402.08678,https://github.com/graphmamba/gmn,Graph Mamba: Towards Learning on Graphs with State Space Models,https://huggingface.co/papers/2402.08678,13,1,0,0,0,0 +2024-02-14,2402.08622,,NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs,https://huggingface.co/papers/2402.08622,3,1,0,0,0,0 +2024-02-14,2402.08093,,BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data,https://huggingface.co/papers/2402.08093,53,9,0,0,0,0 +2024-02-14,2402.08644,,Tandem Transformers for Inference Efficient LLMs,https://huggingface.co/papers/2402.08644,7,1,0,0,0,0 +2024-02-14,2402.08609,https://github.com/google/dopamine,Mixtures of Experts Unlock Parameter Scaling for Deep RL,https://huggingface.co/papers/2402.08609,34,2,0,0,0,0 +2024-02-13,2402.07319,,ODIN: Disentangled Reward Mitigates Hacking in RLHF,https://huggingface.co/papers/2402.07319,13,1,0,1,0,0 +2024-02-13,2402.07383,,Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like,https://huggingface.co/papers/2402.07383,13,1,0,0,0,0 +2024-02-13,2402.07827,,Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model,https://huggingface.co/papers/2402.07827,43,2,0,2,0,13 +2024-02-13,2402.07865,https://github.com/tri-ml/prismatic-vlms,Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models,https://huggingface.co/papers/2402.07865,12,2,1,1,0,1 
+2024-02-13,2402.07871,https://github.com/llm-random/llm-random,Scaling Laws for Fine-Grained Mixture of Experts,https://huggingface.co/papers/2402.07871,11,1,0,0,0,0 +2024-02-13,2402.07872,,PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs,https://huggingface.co/papers/2402.07872,15,2,0,0,0,2 +2024-02-13,2402.07876,,Policy Improvement using Language Feedback Models,https://huggingface.co/papers/2402.07876,5,1,0,0,0,0 +2024-02-13,2402.06859,,LiRank: Industrial Large Scale Ranking Models at LinkedIn,https://huggingface.co/papers/2402.06859,8,1,0,0,0,0 +2024-02-13,2402.07207,,GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting,https://huggingface.co/papers/2402.07207,7,1,0,0,0,0 +2024-02-13,2402.07610,,Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping,https://huggingface.co/papers/2402.07610,7,1,0,0,0,0 +2024-02-13,2402.07625,,AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts,https://huggingface.co/papers/2402.07625,11,1,0,1,1,1 +2024-02-13,2402.07896,,Suppressing Pink Elephants with Direct Principle Feedback,https://huggingface.co/papers/2402.07896,8,1,0,0,0,0 +2024-02-13,2402.06852,,ChemLLM: A Chemical Large Language Model,https://huggingface.co/papers/2402.06852,25,4,0,6,5,1 +2024-02-13,2402.07033,https://github.com/efeslab/fiddler,Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models,https://huggingface.co/papers/2402.07033,16,1,1,0,0,0 +2024-02-13,2402.07043,,A Tale of Tails: Model Collapse as a Change of Scaling Laws,https://huggingface.co/papers/2402.07043,13,1,0,0,0,0 +2024-02-13,2402.07456,https://github.com/OS-Copilot/FRIDAY,OS-Copilot: Towards Generalist Computer Agents with Self-Improvement,https://huggingface.co/papers/2402.07456,40,4,0,0,0,0 +2024-02-12,2402.06187,,Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive 
Loss,https://huggingface.co/papers/2402.06187,9,2,0,0,0,0 +2024-02-12,2402.06102,,Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning,https://huggingface.co/papers/2402.06102,4,1,0,0,0,0 +2024-02-12,2402.06088,,Animated Stickers: Bringing Stickers to Life with Video Diffusion,https://huggingface.co/papers/2402.06088,9,2,0,0,0,0 +2024-02-12,2402.06155,https://github.com/john-hewitt/model-editing-canonical-examples,Model Editing with Canonical Examples,https://huggingface.co/papers/2402.06155,10,1,0,0,0,0 +2024-02-12,2402.06147,,DeAL: Decoding-time Alignment for Large Language Models,https://huggingface.co/papers/2402.06147,7,1,0,0,0,0 +2024-02-12,2402.06082,,SubGen: Token Generation in Sublinear Time and Memory,https://huggingface.co/papers/2402.06082,10,2,0,0,0,0 +2024-02-12,2402.06071,,Keyframer: Empowering Animation Design using Large Language Models,https://huggingface.co/papers/2402.06071,13,1,0,0,0,0 +2024-02-12,2402.06332,https://github.com/internlm/internlm-math,InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning,https://huggingface.co/papers/2402.06332,18,1,1,6,0,2 +2024-02-12,2402.06178,https://github.com/ldzhangyx/MusicMagus,MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models,https://huggingface.co/papers/2402.06178,12,4,0,0,0,0 +2024-02-12,2402.06619,,Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning,https://huggingface.co/papers/2402.06619,51,1,0,1,19,0 +2024-02-12,2402.06149,,HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting,https://huggingface.co/papers/2402.06149,15,2,0,0,0,0 +2024-02-12,2402.06118,,ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling,https://huggingface.co/papers/2402.06118,13,1,0,0,0,0 +2024-02-09,2402.05546,,Offline Actor-Critic Reinforcement Learning Scales to Large Models,https://huggingface.co/papers/2402.05546,4,1,0,0,0,0 
+2024-02-09,2402.05672,https://github.com/microsoft/unilm,Multilingual E5 Text Embeddings: A Technical Report,https://huggingface.co/papers/2402.05672,16,2,1,7,0,100 +2024-02-09,2402.05937,,InstaGen: Enhancing Object Detection by Training on Synthetic Dataset,https://huggingface.co/papers/2402.05937,8,1,0,0,0,0 +2024-02-09,2402.05861,,Memory Consolidation Enables Long-Context Video Understanding,https://huggingface.co/papers/2402.05861,7,1,0,0,0,0 +2024-02-09,2402.05403,,In-Context Principle Learning from Mistakes,https://huggingface.co/papers/2402.05403,12,1,0,0,0,0 +2024-02-09,2402.05930,https://github.com/McGill-NLP/weblinx,WebLINX: Real-World Website Navigation with Multi-Turn Dialogue,https://huggingface.co/papers/2402.05930,35,4,1,8,1,6 +2024-02-09,2402.05140,https://github.com/sjunhongshen/tag-llm,Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains,https://huggingface.co/papers/2402.05140,19,1,1,0,0,0 +2024-02-09,2402.05472,,Question Aware Vision Transformer for Multimodal Reasoning,https://huggingface.co/papers/2402.05472,7,2,0,0,0,0 +2024-02-09,2402.05755,,SpiRit-LM: Interleaved Spoken and Written Language Model,https://huggingface.co/papers/2402.05755,7,1,0,0,0,0 +2024-02-09,2402.05929,,An Interactive Agent Foundation Model,https://huggingface.co/papers/2402.05929,25,4,0,0,0,0 +2024-02-09,2402.05932,,Driving Everywhere with Large Language Model Policy Adaptation,https://huggingface.co/papers/2402.05932,3,1,0,0,0,0 +2024-02-09,2402.05935,https://github.com/alpha-vllm/llama2-accessory,SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models,https://huggingface.co/papers/2402.05935,13,1,1,0,0,0 +2024-02-09,2402.05195,,λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space,https://huggingface.co/papers/2402.05195,16,3,0,1,0,1 +2024-02-09,2402.05468,,Implicit Diffusion: Efficient Optimization through Stochastic 
Sampling,https://huggingface.co/papers/2402.05468,5,1,0,0,0,0 +2024-02-09,2402.05120,,More Agents Is All You Need,https://huggingface.co/papers/2402.05120,47,5,0,0,0,0 +2024-02-08,2402.05099,https://github.com/jordan-benjamin/hydragen,Hydragen: High-Throughput LLM Inference with Shared Prefixes,https://huggingface.co/papers/2402.05099,17,3,1,0,0,0 +2024-02-08,2402.04347,,The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry,https://huggingface.co/papers/2402.04347,13,2,0,0,0,0 +2024-02-08,2402.05008,,EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss,https://huggingface.co/papers/2402.05008,19,1,0,1,0,3 +2024-02-08,2402.04925,,TP-Aware Dequantization,https://huggingface.co/papers/2402.04925,3,2,0,0,0,0 +2024-02-08,2402.04825,https://github.com/stability-ai/stable-audio-tools,Fast Timing-Conditioned Latent Audio Diffusion,https://huggingface.co/papers/2402.04825,7,1,1,0,0,0 +2024-02-08,2402.05054,,LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation,https://huggingface.co/papers/2402.05054,25,1,0,8,0,16 +2024-02-08,2402.04494,,Grandmaster-Level Chess Without Search,https://huggingface.co/papers/2402.04494,65,8,0,0,0,0 +2024-02-08,2402.04379,https://github.com/facebookresearch/crystal-llm,Fine-Tuned Language Models Generate Stable Inorganic Materials as Text,https://huggingface.co/papers/2402.04379,7,1,1,1,1,0 +2024-02-08,2402.04744,https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity,Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers,https://huggingface.co/papers/2402.04744,1,1,0,0,0,0 +2024-02-08,2402.04858,https://github.com/Qualcomm-AI-research/codeit,CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay,https://huggingface.co/papers/2402.04858,14,1,0,0,0,0 +2024-02-08,2402.04615,https://github.com/google-research-datasets/screen_qa,ScreenAI: A Vision-Language Model for UI and Infographics 
Understanding,https://huggingface.co/papers/2402.04615,33,4,0,0,4,1 +2024-02-08,2402.04291,https://github.com/aaronhuang-778/billm,BiLLM: Pushing the Limit of Post-Training Quantization for LLMs,https://huggingface.co/papers/2402.04291,48,3,1,0,0,0 +2024-02-08,2402.04792,,Direct Language Model Alignment from Online AI Feedback,https://huggingface.co/papers/2402.04792,25,3,0,0,0,0 +2024-02-08,2402.04324,https://github.com/TIGER-AI-Lab/ConsistI2V,ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation,https://huggingface.co/papers/2402.04324,23,2,1,1,0,1 +2024-02-07,2402.03570,,Diffusion World Model,https://huggingface.co/papers/2402.03570,7,1,0,0,0,0 +2024-02-07,2402.03908,https://github.com/kxhit/EscherNet,EscherNet: A Generative Model for Scalable View Synthesis,https://huggingface.co/papers/2402.03908,5,1,1,0,0,1 +2024-02-07,2402.03944,,IMUSIC: IMU-based Facial Expression Capture,https://huggingface.co/papers/2402.03944,5,1,0,0,0,0 +2024-02-07,2402.04229,,MusicRL: Aligning Music Generation to Human Preferences,https://huggingface.co/papers/2402.04229,16,1,0,0,0,0 +2024-02-07,2402.04252,https://github.com/baaivision/eva,EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters,https://huggingface.co/papers/2402.04252,22,2,1,3,0,1 +2024-02-07,2402.04141,,Multi-line AI-assisted Code Authoring,https://huggingface.co/papers/2402.04141,8,2,0,0,0,0 +2024-02-07,2402.04236,https://github.com/thudm/cogcom,CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations,https://huggingface.co/papers/2402.04236,7,1,1,0,1,0 +2024-02-07,2402.03749,https://github.com/ggjy/vision_weak_to_strong,Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models,https://huggingface.co/papers/2402.03749,10,1,0,0,0,0 +2024-02-07,2402.03766,https://github.com/meituan-automl/mobilevlm,MobileVLM V2: Faster and Stronger Baseline for Vision Language Model,https://huggingface.co/papers/2402.03766,10,4,1,4,0,1 
+2024-02-07,2402.04248,https://github.com/krafton-ai/mambaformer-icl,Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks,https://huggingface.co/papers/2402.04248,25,1,0,0,0,0 +2024-02-07,2402.04177,,Scaling Laws for Downstream Task Performance of Large Language Models,https://huggingface.co/papers/2402.04177,17,4,0,0,0,0 +2024-02-07,2402.03620,,Self-Discover: Large Language Models Self-Compose Reasoning Structures,https://huggingface.co/papers/2402.03620,107,10,0,0,0,1 +2024-02-06,2402.01935,,Code Representation Learning At Scale,https://huggingface.co/papers/2402.01935,12,1,0,3,0,0 +2024-02-06,2402.03310,https://github.com/VIRL-Platform/VIRL,V-IRL: Grounding Virtual Intelligence in Real Life,https://huggingface.co/papers/2402.03310,14,2,0,0,0,0 +2024-02-06,2402.01761,https://github.com/csinva/imodelsX,Rethinking Interpretability in the Era of Large Language Models,https://huggingface.co/papers/2402.01761,20,1,0,0,0,0 +2024-02-06,2402.01831,https://github.com/NVIDIA/audio-flamingo,Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities,https://huggingface.co/papers/2402.01831,13,4,1,1,0,0 +2024-02-06,2402.02791,https://github.com/yuchuantian/rethinktinylm,Rethinking Optimization and Architecture for Tiny Language Models,https://huggingface.co/papers/2402.02791,12,1,1,0,0,0 +2024-02-06,2402.02834,,Shortened LLaMA: A Simple Depth Pruning for Large Language Models,https://huggingface.co/papers/2402.02834,12,1,0,15,0,0 +2024-02-06,2402.03161,,Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization,https://huggingface.co/papers/2402.03161,14,2,0,0,0,0 +2024-02-06,2402.01771,https://github.com/zyphra/blackmamba,BlackMamba: Mixture of Experts for State-Space Models,https://huggingface.co/papers/2402.01771,22,5,1,2,0,0 +2024-02-06,2402.01878,,LiPO: Listwise Preference Optimization through Learning-to-Rank,https://huggingface.co/papers/2402.01878,19,5,0,0,0,0 
+2024-02-06,2402.01739,https://github.com/xuefuzhao/openmoe,OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models,https://huggingface.co/papers/2402.01739,26,4,1,0,0,0 +2024-02-06,2402.03162,,Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion,https://huggingface.co/papers/2402.03162,17,1,0,0,0,0 +2024-02-06,2402.03300,https://github.com/deepseek-ai/deepseek-math,DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models,https://huggingface.co/papers/2402.03300,67,6,1,24,0,7 +2024-02-06,2402.02583,https://github.com/mc-e/dragondiffusion,DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing,https://huggingface.co/papers/2402.02583,7,1,1,1,0,2 +2024-02-06,2402.03286,,Training-Free Consistent Text-to-Image Generation,https://huggingface.co/papers/2402.03286,62,11,0,0,0,0 +2024-02-06,2402.03040,https://github.com/invictus717/interactivevideo,InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions,https://huggingface.co/papers/2402.03040,16,1,1,0,0,0 +2024-02-05,2402.01613,https://github.com/nomic-ai/contrastors,Nomic Embed: Training a Reproducible Long Context Text Embedder,https://huggingface.co/papers/2402.01613,14,1,1,9,0,39 +2024-02-05,2402.01566,,Boximator: Generating Rich and Controllable Motions for Video Synthesis,https://huggingface.co/papers/2402.01566,26,4,0,0,0,0 +2024-02-05,2402.01622,https://github.com/OSU-NLP-Group/TravelPlanner,TravelPlanner: A Benchmark for Real-World Planning with Language Agents,https://huggingface.co/papers/2402.01622,31,2,1,0,1,0 +2024-02-05,2402.00892,,EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks,https://huggingface.co/papers/2402.00892,9,2,0,0,0,0 +2024-02-05,2402.01391,https://github.com/ablustrund/apps_plus,StepCoder: Improve Code Generation with Reinforcement Learning from Compiler 
Feedback,https://huggingface.co/papers/2402.01391,41,3,0,0,0,0 +2024-02-05,2402.01521,,K-Level Reasoning with Large Language Models,https://huggingface.co/papers/2402.01521,16,1,0,0,0,0 +2024-02-05,2402.01032,https://github.com/sjelassi/transformers_ssm_copy,Repeat After Me: Transformers are Better than State Space Models at Copying,https://huggingface.co/papers/2402.01032,22,4,0,0,0,0 +2024-02-05,2402.01118,,PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models,https://huggingface.co/papers/2402.01118,28,3,0,0,0,0 +2024-02-05,2402.01093,,Specialized Language Models with Cheap Inference from Limited Domain Data,https://huggingface.co/papers/2402.01093,45,2,0,0,0,0 +2024-02-02,2402.00351,https://github.com/jpmorganchase/i2i-generator-unlearning,Machine Unlearning for Image-to-Image Generative Models,https://huggingface.co/papers/2402.00351,11,2,0,0,0,0 +2024-02-02,2402.00867,,AToM: Amortized Text-to-Mesh using 2D Diffusion,https://huggingface.co/papers/2402.00867,10,2,0,0,0,0 +2024-02-02,2402.00854,https://github.com/extensityai/benchmark,SymbolicAI: A framework for logic-based approaches combining generative models and solvers,https://huggingface.co/papers/2402.00854,19,5,1,0,0,0 +2024-02-02,2402.00858,,Can Large Language Models Understand Context?,https://huggingface.co/papers/2402.00858,20,1,0,0,0,0 +2024-02-02,2402.00838,https://github.com/allenai/olmo,OLMo: Accelerating the Science of Language Models,https://huggingface.co/papers/2402.00838,75,4,1,17,1,20 +2024-02-02,2402.00518,https://github.com/pan-x-c/ee-llm,EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models,https://huggingface.co/papers/2402.00518,3,1,1,0,0,0 +2024-02-02,2402.00742,,Transforming and Combining Rewards for Aligning Large Language Models,https://huggingface.co/papers/2402.00742,11,1,0,0,0,0 +2024-02-02,2402.00786,https://github.com/manuelfay/llm-data-hub,CroissantLLM: A Truly Bilingual French-English Language 
Model,https://huggingface.co/papers/2402.00786,23,3,1,12,5,5 +2024-02-02,2402.00159,https://github.com/allenai/dolma,Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research,https://huggingface.co/papers/2402.00159,55,1,1,0,4,0 +2024-02-02,2402.00769,https://github.com/g-u-n/animatelcm,AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning,https://huggingface.co/papers/2402.00769,20,2,1,3,0,13 +2024-02-02,2402.00396,,Efficient Exploration for LLMs,https://huggingface.co/papers/2402.00396,19,1,0,0,0,0 +2024-02-01,2401.17895,,ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields,https://huggingface.co/papers/2401.17895,15,3,0,0,0,0 +2024-02-01,2401.17807,,Advances in 3D Generation: A Survey,https://huggingface.co/papers/2401.17807,16,2,0,0,0,0 +2024-02-01,2401.17509,,Anything in Any Scene: Photorealistic Video Object Insertion,https://huggingface.co/papers/2401.17509,16,1,0,0,0,0 +2024-02-01,2401.18075,,CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting,https://huggingface.co/papers/2401.18075,7,1,0,0,0,0 +2024-02-01,2401.18059,https://github.com/parthsarthi03/RAPTOR,RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval,https://huggingface.co/papers/2401.18059,28,2,1,0,0,0 +2024-02-01,2401.17464,,Efficient Tool Use with Chain-of-Abstraction Reasoning,https://huggingface.co/papers/2401.17464,16,1,0,0,0,0 +2024-02-01,2401.17583,https://github.com/lecar-lab/abs,Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion,https://huggingface.co/papers/2401.17583,23,3,0,0,0,0 +2024-02-01,2401.18058,https://github.com/thudm/longalign,LongAlign: A Recipe for Long Context Alignment of Large Language Models,https://huggingface.co/papers/2401.18058,21,1,1,6,1,1 +2024-02-01,2401.17574,,Scavenging Hyena: Distilling Transformers into Long Convolution 
Models,https://huggingface.co/papers/2401.17574,14,1,0,0,0,0 +2024-02-01,2401.17377,,Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens,https://huggingface.co/papers/2401.17377,32,2,0,0,0,1 +2024-01-31,2401.16658,,OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer,https://huggingface.co/papers/2401.16658,12,1,0,3,0,3 +2024-01-31,2401.17264,https://github.com/facebookresearch/audioseal,Proactive Detection of Voice Cloning with Localized Watermarking,https://huggingface.co/papers/2401.17264,15,4,1,0,0,0 +2024-01-31,2401.16468,https://github.com/mv-lab/InstructIR,High-Quality Image Restoration Following Human Instructions,https://huggingface.co/papers/2401.16468,11,2,1,1,0,7 +2024-01-31,2401.17053,https://github.com/Tencent/BlockFusion,BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation,https://huggingface.co/papers/2401.17053,29,1,1,0,0,0 +2024-01-31,2401.17093,,StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis,https://huggingface.co/papers/2401.17093,18,1,0,0,0,0 +2024-01-31,2401.16677,,T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives,https://huggingface.co/papers/2401.16677,3,1,0,0,0,0 +2024-01-31,2401.16861,https://github.com/yikai-wang/res,Repositioning the Subject within Image,https://huggingface.co/papers/2401.16861,13,1,0,0,0,0 +2024-01-31,2401.16467,https://github.com/esteng/regal_program_learning,ReGAL: Refactoring Programs to Discover Generalizable Abstractions,https://huggingface.co/papers/2401.16467,7,2,1,0,0,0 +2024-01-31,2401.17181,,Transfer Learning for Text Diffusion Models,https://huggingface.co/papers/2401.17181,14,3,0,0,0,0 +2024-01-31,2401.17221,https://github.com/fudannlplab/mousi,MouSi: Poly-Visual-Expert Vision-Language Models,https://huggingface.co/papers/2401.17221,7,1,0,0,0,0 +2024-01-31,2401.17270,https://github.com/ailab-cvc/yolo-world,YOLO-World: Real-Time Open-Vocabulary Object 
Detection,https://huggingface.co/papers/2401.17270,30,2,1,0,0,1 +2024-01-31,2401.16818,,H2O-Danube-1.8B Technical Report,https://huggingface.co/papers/2401.16818,16,1,0,29,0,8 +2024-01-31,2401.17256,https://github.com/xuandongzhao/weak-to-strong,Weak-to-Strong Jailbreaking on Large Language Models,https://huggingface.co/papers/2401.17256,14,1,1,0,0,0 +2024-01-31,2401.17268,,Weaver: Foundation Models for Creative Writing,https://huggingface.co/papers/2401.17268,39,5,0,0,0,0 +2024-01-30,2401.15688,,Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation,https://huggingface.co/papers/2401.15688,11,0,0,0,0,0 +2024-01-30,2401.15708,,Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding,https://huggingface.co/papers/2401.15708,10,3,0,0,0,0 +2024-01-30,2401.15687,,Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance,https://huggingface.co/papers/2401.15687,20,4,0,0,0,0 +2024-01-30,2401.16013,,SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning,https://huggingface.co/papers/2401.16013,19,1,0,0,0,0 +2024-01-30,2401.15975,https://github.com/qinghew/StableIdentity,StableIdentity: Inserting Anybody into Anywhere at First Sight,https://huggingface.co/papers/2401.15975,16,2,1,0,0,0 +2024-01-30,2401.15914,https://github.com/apple/ml-ogen,Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization,https://huggingface.co/papers/2401.15914,7,1,0,0,0,0 +2024-01-30,2401.16380,,Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling,https://huggingface.co/papers/2401.16380,46,7,0,0,0,0 +2024-01-30,2401.16420,https://github.com/internlm/internlm-xcomposer,InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model,https://huggingface.co/papers/2401.16420,54,1,1,10,0,2 +2024-01-30,2401.16158,https://github.com/x-plug/mobileagent,Mobile-Agent: Autonomous 
Multi-Modal Mobile Device Agent with Visual Perception,https://huggingface.co/papers/2401.16158,16,4,1,0,0,1 +2024-01-30,2401.15977,,Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling,https://huggingface.co/papers/2401.15977,35,8,0,0,0,0 +2024-01-30,2401.15947,https://github.com/PKU-YuanGroup/MoE-LLaVA,MoE-LLaVA: Mixture of Experts for Large Vision-Language Models,https://huggingface.co/papers/2401.15947,48,4,1,10,1,2 +2024-01-29,2401.15077,https://github.com/safeailab/eagle,EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty,https://huggingface.co/papers/2401.15077,17,6,1,0,0,0 +2024-01-29,2401.14828,,TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts,https://huggingface.co/papers/2401.14828,6,1,0,0,0,0 +2024-01-29,2401.14673,,Generative Expressive Robot Behaviors using Large Language Models,https://huggingface.co/papers/2401.14673,4,1,0,0,0,0 +2024-01-29,2401.14688,https://github.com/IDEA-CCNL/Taiyi-Diffusion-XL,Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support,https://huggingface.co/papers/2401.14688,13,2,1,1,0,2 +2024-01-29,2401.14953,https://github.com/google-deepmind/neural_networks_solomonoff_induction,Learning Universal Predictors,https://huggingface.co/papers/2401.14953,18,1,0,0,0,0 +2024-01-29,2401.15024,https://github.com/microsoft/transformercompression,SliceGPT: Compress Large Language Models by Deleting Rows and Columns,https://huggingface.co/papers/2401.15024,64,6,1,0,0,0 +2024-01-29,2401.15071,,"From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities",https://huggingface.co/papers/2401.15071,33,1,0,0,0,0 +2024-01-26,2401.14367,,Genie: Achieving Human Parity in Content-Grounded Datasets Generation,https://huggingface.co/papers/2401.14367,6,1,0,0,0,0 +2024-01-26,2401.14403,,Adaptive Mobile Manipulation for Articulated 
Objects In the Open World,https://huggingface.co/papers/2401.14403,9,2,0,0,0,0 +2024-01-26,2401.14405,https://github.com/ailab-cvc/m2pt,Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities,https://huggingface.co/papers/2401.14405,11,2,0,0,0,0 +2024-01-26,2401.14398,https://github.com/cvlab-columbia/pix2gestalt,pix2gestalt: Amodal Segmentation by Synthesizing Wholes,https://huggingface.co/papers/2401.14398,8,1,1,1,0,0 +2024-01-26,2401.13795,,Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All,https://huggingface.co/papers/2401.13795,64,2,0,0,0,0 +2024-01-26,2401.14391,https://github.com/TonyLianLong/CrossMAE,Rethinking Patch Dependence for Masked Autoencoders,https://huggingface.co/papers/2401.14391,22,2,1,1,0,0 +2024-01-26,2401.14257,,Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation,https://huggingface.co/papers/2401.14257,9,1,0,0,0,0 +2024-01-26,2401.14112,https://github.com/microsoft/DeepSpeed,FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design,https://huggingface.co/papers/2401.14112,17,7,1,0,0,0 +2024-01-26,2401.14196,https://github.com/deepseek-ai/DeepSeek-Coder,DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence,https://huggingface.co/papers/2401.14196,45,2,1,1,2,0 +2024-01-26,2401.13974,https://github.com/SalesforceAIResearch/bootpig,BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models,https://huggingface.co/papers/2401.13974,12,1,1,0,0,0 +2024-01-26,2401.14066,https://github.com/haha-lisa/CreativeSynth,CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion,https://huggingface.co/papers/2401.14066,7,1,0,0,0,0 +2024-01-26,2401.13919,https://github.com/minorjerry/webvoyager,WebVoyager: Building an End-to-End Web Agent with Large Multimodal 
Models,https://huggingface.co/papers/2401.13919,22,3,1,0,0,0 +2024-01-26,2401.14404,,Deconstructing Denoising Diffusion Models for Self-Supervised Learning,https://huggingface.co/papers/2401.14404,16,1,0,0,0,0 +2024-01-26,2401.14019,https://github.com/ibm/unitxt,"Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI",https://huggingface.co/papers/2401.14019,19,1,1,0,1,1 +2024-01-25,2401.13160,,SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection,https://huggingface.co/papers/2401.13160,9,2,0,0,0,0 +2024-01-25,2401.13303,,MaLA-500: Massive Language Adaptation of Large Language Models,https://huggingface.co/papers/2401.13303,11,1,0,2,0,1 +2024-01-25,2401.13311,https://github.com/rohan598/contextual,ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models,https://huggingface.co/papers/2401.13311,9,1,1,0,2,0 +2024-01-25,2401.13601,,MM-LLMs: Recent Advances in MultiModal Large Language Models,https://huggingface.co/papers/2401.13601,42,5,0,0,0,0 +2024-01-25,2401.13627,,Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild,https://huggingface.co/papers/2401.13627,70,14,0,0,0,1 +2024-01-25,2401.13660,,MambaByte: Token-free Selective State Space Model,https://huggingface.co/papers/2401.13660,47,4,0,6,0,0 +2024-01-25,2401.13388,,UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion,https://huggingface.co/papers/2401.13388,10,3,0,0,0,0 +2024-01-24,2401.12979,,GALA: Generating Animatable Layered Assets from a Single Scan,https://huggingface.co/papers/2401.12979,6,1,0,0,0,0 +2024-01-24,2401.12246,https://github.com/orionstarai/orion,Orion-14B: Open-source Multilingual Large Language Models,https://huggingface.co/papers/2401.12246,10,2,1,1,0,2 +2024-01-24,2401.12522,https://github.com/linfeng93/bita,BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language 
Models,https://huggingface.co/papers/2401.12522,11,1,1,0,0,0 +2024-01-24,2401.12789,,Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study,https://huggingface.co/papers/2401.12789,6,1,0,0,0,0 +2024-01-24,2401.12244,,Large-scale Reinforcement Learning for Diffusion Models,https://huggingface.co/papers/2401.12244,28,1,0,0,0,0 +2024-01-24,2401.12963,,AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents,https://huggingface.co/papers/2401.12963,12,2,0,0,0,0 +2024-01-24,2401.12954,https://github.com/suzgunmirac/meta-prompting,Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding,https://huggingface.co/papers/2401.12954,28,5,1,0,0,1 +2024-01-24,2401.12945,,Lumiere: A Space-Time Diffusion Model for Video Generation,https://huggingface.co/papers/2401.12945,85,10,0,0,0,1 +2024-01-24,2401.12474,https://github.com/ofa-sys/ditto,Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment,https://huggingface.co/papers/2401.12474,33,1,0,0,0,0 +2024-01-24,2401.12503,,Small Language Model Meets with Reinforced Vision Vocabulary,https://huggingface.co/papers/2401.12503,31,2,0,0,0,0 +2024-01-23,2401.11985,,Scaling Face Interaction Graph Networks to Real World Scenes,https://huggingface.co/papers/2401.11985,2,1,0,0,0,0 +2024-01-23,2401.11078,,UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures,https://huggingface.co/papers/2401.11078,6,1,0,0,0,0 +2024-01-23,2401.11002,,Fast Registration of Photorealistic Avatars for VR Facial Animation,https://huggingface.co/papers/2401.11002,1,1,0,0,0,0 +2024-01-23,2401.12179,,DITTO: Diffusion Inference-Time T-Optimization for Music Generation,https://huggingface.co/papers/2401.12179,18,2,0,0,0,0 +2024-01-23,2401.12175,,Single-View 3D Human Digitalization with Large Reconstruction Models,https://huggingface.co/papers/2401.12175,5,1,0,0,0,0 
+2024-01-23,2401.11605,https://github.com/crowsonkb/k-diffusion,Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers,https://huggingface.co/papers/2401.11605,19,2,1,0,0,0 +2024-01-23,2401.11053,,StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion,https://huggingface.co/papers/2401.11053,8,1,0,0,0,0 +2024-01-23,2401.11067,,Make-A-Shape: a Ten-Million-scale 3D Shape Model,https://huggingface.co/papers/2401.11067,15,1,0,0,0,0 +2024-01-23,2401.11739,,EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models,https://huggingface.co/papers/2401.11739,16,2,0,0,0,0 +2024-01-23,2401.12187,,WARM: On the Benefits of Weight Averaged Reward Models,https://huggingface.co/papers/2401.12187,17,7,0,0,0,0 +2024-01-23,2401.12202,https://github.com/ok-robot/ok-robot,OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics,https://huggingface.co/papers/2401.12202,9,2,0,0,0,0 +2024-01-23,2401.12168,,SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities,https://huggingface.co/papers/2401.12168,23,2,0,2,0,1 +2024-01-23,2401.12208,,CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation,https://huggingface.co/papers/2401.12208,20,2,0,1,0,3 +2024-01-23,2401.11708,https://github.com/yangling0818/rpg-diffusionmaster,"Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs",https://huggingface.co/papers/2401.11708,28,2,1,1,0,0 +2024-01-23,2401.11944,,CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark,https://huggingface.co/papers/2401.11944,24,2,0,40,1,100 +2024-01-23,2401.12070,https://github.com/ahans30/binoculars,Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text,https://huggingface.co/papers/2401.12070,42,3,1,0,0,3 +2024-01-22,2401.10831,,Understanding Video Transformers via Universal Concept 
Discovery,https://huggingface.co/papers/2401.10831,7,1,0,0,0,0 +2024-01-22,2401.10838,https://github.com/berkeleyhci/rambler,Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation,https://huggingface.co/papers/2401.10838,8,2,0,0,0,0 +2024-01-22,2401.10241,https://github.com/sail-sg/zero-bubble-pipeline-parallelism,Zero Bubble Pipeline Parallelism,https://huggingface.co/papers/2401.10241,22,3,1,0,0,1 +2024-01-22,2401.10889,,Synthesizing Moving People with 3D Control,https://huggingface.co/papers/2401.10889,11,1,0,0,0,0 +2024-01-22,2401.10891,https://github.com/LiheYoung/Depth-Anything,Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data,https://huggingface.co/papers/2401.10891,54,2,1,14,1,51 +2024-01-22,2401.10404,,Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution,https://huggingface.co/papers/2401.10404,9,1,0,0,0,0 +2024-01-22,2401.10822,,ActAnywhere: Subject-Aware Video Background Generation,https://huggingface.co/papers/2401.10822,13,1,0,0,0,0 +2024-01-22,2401.10774,https://github.com/fasterdecoding/medusa,Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads,https://huggingface.co/papers/2401.10774,51,2,1,0,0,0 +2024-01-19,2401.10171,,SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild,https://huggingface.co/papers/2401.10171,11,1,0,0,0,0 +2024-01-19,2401.09603,https://github.com/google-research/google-research/tree/master/cmmd,Rethinking FID: Towards a Better Evaluation Metric for Image Generation,https://huggingface.co/papers/2401.09603,14,2,0,0,0,0 +2024-01-19,2401.10032,https://github.com/kaistmm/fregrad,FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder,https://huggingface.co/papers/2401.10032,11,1,0,0,0,0 +2024-01-19,2401.10166,https://github.com/mzeromiko/vmamba,VMamba: Visual State Space Model,https://huggingface.co/papers/2401.10166,36,2,0,0,0,0 +2024-01-19,2401.10225,,ChatQA: Building GPT-4 
Level Conversational QA Models,https://huggingface.co/papers/2401.10225,32,6,0,43,2,8 +2024-01-19,2401.09865,,Improving fine-grained understanding in image-text pre-training,https://huggingface.co/papers/2401.09865,14,1,0,0,0,0 +2024-01-19,2401.10061,,DiffusionGPT: LLM-Driven Text-to-Image Generation System,https://huggingface.co/papers/2401.10061,26,4,0,0,0,0 +2024-01-19,2401.09962,,CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects,https://huggingface.co/papers/2401.09962,6,1,0,0,0,0 +2024-01-19,2401.09985,,WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens,https://huggingface.co/papers/2401.09985,14,1,0,0,0,0 +2024-01-19,2401.10020,,Self-Rewarding Language Models,https://huggingface.co/papers/2401.10020,138,16,0,20,4,6 +2024-01-18,2401.09416,,TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion,https://huggingface.co/papers/2401.09416,8,1,0,0,0,0 +2024-01-18,2401.08937,,ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization,https://huggingface.co/papers/2401.08937,5,1,0,0,0,0 +2024-01-18,2401.09048,https://github.com/tomtom1103/compose-and-conquer,Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis,https://huggingface.co/papers/2401.09048,7,2,1,0,0,0 +2024-01-18,2401.09419,https://github.com/chungmin99/garfield,GARField: Group Anything with Radiance Fields,https://huggingface.co/papers/2401.09419,16,2,1,0,0,0 +2024-01-18,2401.09135,https://github.com/google-deepmind/asyncdiloco,Asynchronous Local-SGD Training for Language Modeling,https://huggingface.co/papers/2401.09135,9,2,0,0,0,0 +2024-01-18,2401.08740,https://github.com/willisma/sit,SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers,https://huggingface.co/papers/2401.08740,10,1,0,0,1,0 +2024-01-18,2401.09417,https://github.com/hustvl/vim,Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space 
Model,https://huggingface.co/papers/2401.09417,53,3,1,3,0,0 +2024-01-18,2401.09340,,SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding,https://huggingface.co/papers/2401.09340,17,1,0,0,0,0 +2024-01-18,2401.09047,https://github.com/ailab-cvc/videocrafter,VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models,https://huggingface.co/papers/2401.09047,13,2,1,0,0,1 +2024-01-18,2401.09084,,UniVG: Towards UNIfied-modal Video Generation,https://huggingface.co/papers/2401.09084,15,13,0,0,0,0 +2024-01-18,2401.08671,https://github.com/microsoft/DeepSpeed,DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference,https://huggingface.co/papers/2401.08671,12,2,1,0,0,0 +2024-01-18,2401.08967,https://github.com/lqtrung1998/mwp_reft,ReFT: Reasoning with Reinforced Fine-Tuning,https://huggingface.co/papers/2401.08967,27,2,1,10,0,0 +2024-01-17,2401.07727,,HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation,https://huggingface.co/papers/2401.07727,8,1,0,0,0,0 +2024-01-17,2401.06951,,E^2-LLM: Efficient and Extreme Length Extension of Large Language Models,https://huggingface.co/papers/2401.06951,24,3,0,0,0,0 +2024-01-17,2401.07004,https://github.com/gair-nlp/entropy-abf,Extending LLMs' Context Window with 100 Samples,https://huggingface.co/papers/2401.07004,14,1,1,0,0,0 +2024-01-17,2401.07049,,Quantum Denoising Diffusion Models,https://huggingface.co/papers/2401.07049,12,1,0,0,0,0 +2024-01-17,2401.08565,https://github.com/alisawuffles/proxy-tuning,Tuning Language Models by Proxy,https://huggingface.co/papers/2401.08565,19,2,0,0,0,0 +2024-01-17,2401.07781,,Towards A Better Metric for Text-to-Video Generation,https://huggingface.co/papers/2401.07781,13,5,0,0,0,0 +2024-01-17,2401.08417,https://github.com/fe1ixxu/alma,Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine 
Translation,https://huggingface.co/papers/2401.08417,28,3,1,12,2,5 +2024-01-17,2401.07519,https://github.com/instantid/instantid,InstantID: Zero-shot Identity-Preserving Generation in Seconds,https://huggingface.co/papers/2401.07519,51,7,1,4,0,100 +2024-01-17,2401.08541,https://github.com/apple/ml-aim,Scalable Pre-training of Large Autoregressive Image Models,https://huggingface.co/papers/2401.08541,35,6,1,4,0,0 +2024-01-12,2401.05735,,Object-Centric Diffusion for Efficient Video Editing,https://huggingface.co/papers/2401.05735,6,0,0,0,0,0 +2024-01-12,2401.06105,,PALP: Prompt Aligned Personalization of Text-to-Image Models,https://huggingface.co/papers/2401.06105,46,2,0,0,0,0 +2024-01-12,2401.06129,,Distilling Vision-Language Models on Millions of Videos,https://huggingface.co/papers/2401.06129,14,0,0,0,0,0 +2024-01-12,2401.05749,https://github.com/amazon-science/multi-way-parallel-ccmatrix,A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism,https://huggingface.co/papers/2401.05749,6,0,1,0,0,0 +2024-01-12,2401.05811,,"Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages",https://huggingface.co/papers/2401.05811,5,0,0,0,1,0 +2024-01-12,2401.06080,https://github.com/openlmlab/moss-rlhf,Secrets of RLHF in Large Language Models Part II: Reward Modeling,https://huggingface.co/papers/2401.06080,24,4,1,1,3,1 +2024-01-12,2401.06003,https://github.com/lfranke/trips,TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering,https://huggingface.co/papers/2401.06003,20,0,0,0,0,0 +2024-01-12,2401.05391,,Efficient LLM inference solution on Intel GPU,https://huggingface.co/papers/2401.05391,8,1,0,0,0,0 +2024-01-12,2401.05566,https://github.com/anthropics/sleeper-agents-paper,Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training,https://huggingface.co/papers/2401.05566,24,0,0,1,2,0 +2024-01-12,2401.06071,,LEGO:Language Enhanced Multi-modal Grounding 
Model,https://huggingface.co/papers/2401.06071,10,0,0,1,0,0 +2024-01-12,2401.06121,,TOFU: A Task of Fictitious Unlearning for LLMs,https://huggingface.co/papers/2401.06121,14,0,0,2,2,1 +2024-01-12,2401.05583,,Diffusion Priors for Dynamic View Synthesis from Monocular Videos,https://huggingface.co/papers/2401.05583,7,0,0,0,0,0 +2024-01-12,2401.05561,https://github.com/HowieHwong/TrustLLM,TrustLLM: Trustworthiness in Large Language Models,https://huggingface.co/papers/2401.05561,62,3,1,0,0,0 +2024-01-12,2401.06066,https://github.com/deepseek-ai/deepseek-moe,DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models,https://huggingface.co/papers/2401.06066,38,2,1,22,0,11 +2024-01-12,2401.05654,,Towards Conversational Diagnostic AI,https://huggingface.co/papers/2401.05654,13,0,0,0,0,0 +2024-01-12,2401.06102,,Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models,https://huggingface.co/papers/2401.06102,18,0,0,0,0,1 +2024-01-12,2401.05675,,Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation,https://huggingface.co/papers/2401.05675,20,1,0,0,0,0 +2024-01-12,2401.06104,https://github.com/schwartz-lab-nlp/tova,Transformers are Multi-State RNNs,https://huggingface.co/papers/2401.06104,34,4,0,0,0,0 +2024-01-11,2401.05293,,Score Distillation Sampling with Learned Manifold Corrective,https://huggingface.co/papers/2401.05293,6,1,0,0,0,0 +2024-01-11,2401.05314,https://github.com/davidmchan/anim400k,ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video,https://huggingface.co/papers/2401.05314,8,0,1,0,1,0 +2024-01-11,2401.05033,,Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk,https://huggingface.co/papers/2401.05033,14,0,0,0,0,0 +2024-01-11,2401.04925,https://github.com/jmyissb/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models,The Impact of Reasoning Step Length on Large Language 
Models,https://huggingface.co/papers/2401.04925,15,2,0,0,0,0 +2024-01-11,2401.05335,,InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes,https://huggingface.co/papers/2401.05335,26,0,0,0,0,0 +2024-01-11,2401.05334,,URHand: Universal Relightable Hands,https://huggingface.co/papers/2401.05334,20,0,0,0,0,0 +2024-01-11,2401.05252,https://github.com/PixArt-alpha/PixArt-alpha,PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models,https://huggingface.co/papers/2401.05252,44,4,1,0,0,0 +2024-01-10,2401.04283,,FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation,https://huggingface.co/papers/2401.04283,3,0,0,0,0,0 +2024-01-10,2401.04718,,Jump Cut Smoothing for Talking Heads,https://huggingface.co/papers/2401.04718,17,0,0,0,0,0 +2024-01-10,2401.04398,,Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding,https://huggingface.co/papers/2401.04398,19,0,0,0,0,0 +2024-01-10,2401.04658,https://github.com/opennlplab/lightning-attention,Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models,https://huggingface.co/papers/2401.04658,24,3,0,1,0,0 +2024-01-10,2401.04695,,Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers,https://huggingface.co/papers/2401.04695,8,0,0,0,0,0 +2024-01-10,2401.04577,,Masked Audio Generation using a Single Non-Autoregressive Transformer,https://huggingface.co/papers/2401.04577,39,6,0,8,0,44 +2024-01-10,2401.04575,,Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding,https://huggingface.co/papers/2401.04575,14,3,0,0,0,0 +2024-01-10,2401.04468,,MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation,https://huggingface.co/papers/2401.04468,46,6,0,0,0,0 +2024-01-09,2401.03506,https://github.com/google/speaker-id/tree/master/DiarizationLM,DiarizationLM: Speaker Diarization Post-Processing with Large Language 
Models,https://huggingface.co/papers/2401.03506,13,1,0,1,0,1 +2024-01-09,2401.04099,,AGG: Amortized Generative 3D Gaussians for Single Image to 3D,https://huggingface.co/papers/2401.04099,7,1,0,0,0,0 +2024-01-09,2401.03804,,TeleChat Technical Report,https://huggingface.co/papers/2401.03804,7,0,0,9,1,1 +2024-01-09,2401.04081,https://github.com/llm-random/llm-random,MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts,https://huggingface.co/papers/2401.04081,69,6,0,0,0,0 +2024-01-09,2401.02994,,"Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM",https://huggingface.co/papers/2401.02994,46,0,0,0,0,0 +2024-01-09,2401.02987,,Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach,https://huggingface.co/papers/2401.02987,8,0,0,0,0,0 +2024-01-09,2401.03003,https://github.com/gonglinyuan/ast_t5,AST-T5: Structure-Aware Pretraining for Code Generation and Understanding,https://huggingface.co/papers/2401.03003,12,2,1,1,0,0 +2024-01-09,2401.03462,https://github.com/flagopen/flagembedding,Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon,https://huggingface.co/papers/2401.03462,26,1,1,23,0,100 +2024-01-09,2401.03065,,"CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution",https://huggingface.co/papers/2401.03065,10,0,0,0,1,0 +2024-01-09,2401.04092,https://github.com/3DTopia/GPTEval3D,GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation,https://huggingface.co/papers/2401.04092,20,1,0,0,0,1 +2024-01-09,2401.04088,,Mixtral of Experts,https://huggingface.co/papers/2401.04088,156,5,0,18,0,1 +2024-01-08,2401.02955,https://github.com/harboryuan/ovsam,Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively,https://huggingface.co/papers/2401.02955,18,1,1,0,0,0 +2024-01-08,2401.02957,https://github.com/Jiawei-Yang/Denoising-ViT,Denoising Vision Transformers,https://huggingface.co/papers/2401.02957,27,2,0,0,0,0 
+2024-01-08,2401.02823,,DocGraphLM: Documental Graph Language Model for Information Extraction,https://huggingface.co/papers/2401.02823,32,4,0,0,0,0 +2024-01-08,2401.02669,,Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache,https://huggingface.co/papers/2401.02669,14,2,0,0,0,0 +2024-01-08,2401.02839,https://github.com/PolyAI-LDN/pheme,Pheme: Efficient and Conversational Speech Generation,https://huggingface.co/papers/2401.02839,14,2,1,0,0,2 +2024-01-08,2401.02677,https://github.com/segmind/ssd-1b,Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss,https://huggingface.co/papers/2401.02677,21,2,1,2,0,100 +2024-01-08,2401.02954,https://github.com/deepseek-ai/deepseek-llm,DeepSeek LLM: Scaling Open-Source Language Models with Longtermism,https://huggingface.co/papers/2401.02954,39,4,1,0,0,0 +2024-01-05,2401.01970,,FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding,https://huggingface.co/papers/2401.01970,6,1,0,0,0,0 +2024-01-05,2401.01974,,Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers,https://huggingface.co/papers/2401.01974,5,1,0,0,0,0 +2024-01-05,2401.02330,,LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model,https://huggingface.co/papers/2401.02330,14,1,0,0,0,0 +2024-01-05,2401.02015,,Improving Diffusion-Based Image Synthesis with Context Prediction,https://huggingface.co/papers/2401.02015,6,1,0,0,0,0 +2024-01-05,2401.02117,,Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation,https://huggingface.co/papers/2401.02117,26,2,0,0,0,0 +2024-01-05,2401.02400,,Learning the 3D Fauna of the Web,https://huggingface.co/papers/2401.02400,9,1,0,0,0,0 +2024-01-05,2401.02415,https://github.com/tencentarc/llama-pro,LLaMA Pro: Progressive LLaMA with Block Expansion,https://huggingface.co/papers/2401.02415,50,3,1,0,0,0 
+2024-01-05,2401.02385,https://github.com/jzhang38/tinyllama,TinyLlama: An Open-Source Small Language Model,https://huggingface.co/papers/2401.02385,84,11,1,0,0,0 +2024-01-05,2401.02411,,What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs,https://huggingface.co/papers/2401.02411,12,1,0,0,0,0 +2024-01-05,2401.02072,,ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers,https://huggingface.co/papers/2401.02072,9,1,0,0,0,0 +2024-01-05,2401.02416,,ODIN: A Single Model for 2D and 3D Perception,https://huggingface.co/papers/2401.02416,10,1,0,0,0,0 +2024-01-05,2401.02038,,Understanding LLMs: A Comprehensive Overview from Training to Inference,https://huggingface.co/papers/2401.02038,60,2,0,0,0,0 +2024-01-05,2401.02412,,LLM Augmented LLMs: Expanding Capabilities through Composition,https://huggingface.co/papers/2401.02412,36,1,0,0,0,0 +2024-01-05,2401.01952,,Instruct-Imagen: Image Generation with Multi-modal Instruction,https://huggingface.co/papers/2401.01952,30,3,0,0,0,0 +2024-01-04,2401.01461,,Efficient Hybrid Zoom using Camera Fusion on Mobile Phones,https://huggingface.co/papers/2401.01461,7,2,0,0,0,0 +2024-01-04,2401.01827,https://github.com/salesforce/lavis,Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions,https://huggingface.co/papers/2401.01827,14,1,0,0,0,0 +2024-01-04,2401.01699,,WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope,https://huggingface.co/papers/2401.01699,6,0,0,0,0,0 +2024-01-04,2401.01854,,Multilingual Instruction Tuning With Just a Pinch of Multilinguality,https://huggingface.co/papers/2401.01854,10,0,0,0,0,0 +2024-01-04,2401.01862,,A Vision Check-up for Language Models,https://huggingface.co/papers/2401.01862,9,0,0,0,0,0 +2024-01-04,2401.01808,https://github.com/huggingface/amused,aMUSEd: An Open MUSE Reproduction,https://huggingface.co/papers/2401.01808,27,3,1,0,0,0 
+2024-01-04,2401.01647,,SIGNeRF: Scene Integrated Generation for Neural Radiance Fields,https://huggingface.co/papers/2401.01647,12,1,0,0,0,0 +2024-01-04,2401.01792,,CoMoSVC: Consistency Model-based Singing Voice Conversion,https://huggingface.co/papers/2401.01792,8,0,0,0,0,0 +2024-01-04,2401.01755,,Incremental FastPitch: Chunk-based High Quality Text to Speech,https://huggingface.co/papers/2401.01755,7,3,0,0,0,0 +2024-01-04,2401.01885,https://github.com/facebookresearch/audio2photoreal,From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations,https://huggingface.co/papers/2401.01885,27,6,0,0,0,2 +2024-01-04,2401.01702,,Image Sculpting: Precise Object Editing with 3D Geometry Control,https://huggingface.co/papers/2401.01702,18,1,0,0,0,0 +2024-01-04,2401.01614,https://github.com/osu-nlp-group/seeact,"GPT-4V(ision) is a Generalist Web Agent, if Grounded",https://huggingface.co/papers/2401.01614,20,1,1,0,2,0 +2024-01-03,2401.01256,,VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM,https://huggingface.co/papers/2401.01256,19,2,0,0,0,0 +2024-01-03,2401.01325,https://github.com/datamllab/LongLM,LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning,https://huggingface.co/papers/2401.01325,26,3,0,0,1,1 +2024-01-03,2401.01117,https://github.com/q-future/q-refine,Q-Refine: A Perceptual Quality Refiner for AI-Generated Image,https://huggingface.co/papers/2401.01117,8,0,0,0,0,0 +2024-01-03,2401.01286,https://github.com/zjunlp/easyedit,A Comprehensive Study of Knowledge Editing for Large Language Models,https://huggingface.co/papers/2401.01286,16,0,1,0,1,0 +2024-01-03,2401.01173,,En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data,https://huggingface.co/papers/2401.01173,11,7,0,1,0,0 +2024-01-03,2401.00896,,TrailBlazer: Trajectory Control for Diffusion-Based Video Generation,https://huggingface.co/papers/2401.00896,13,1,0,0,0,1 +2024-01-03,2401.01055,,LLaMA Beyond English: An Empirical Study on Language 
Capability Transfer,https://huggingface.co/papers/2401.01055,52,4,0,0,1,0 +2024-01-03,2401.00909,,Taming Mode Collapse in Score Distillation for Text-to-3D Generation,https://huggingface.co/papers/2401.00909,9,0,0,0,0,0 +2024-01-03,2401.00935,,Boundary Attention: Learning to Find Faint Boundaries at Any Resolution,https://huggingface.co/papers/2401.00935,16,0,0,0,0,0 +2024-01-03,2401.00908,,DocLLM: A layout-aware generative language model for multimodal document understanding,https://huggingface.co/papers/2401.00908,178,23,0,0,0,0 +2024-01-03,2401.01335,https://github.com/uclaml/SPIN,Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models,https://huggingface.co/papers/2401.01335,62,2,1,21,0,5 +2024-01-02,2401.00134,,Unicron: Economizing Self-Healing LLM Training at Scale,https://huggingface.co/papers/2401.00134,9,1,0,0,0,0 +2024-01-02,2401.00849,,COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training,https://huggingface.co/papers/2401.00849,14,2,0,0,0,0 +2024-01-02,2401.00604,,SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity,https://huggingface.co/papers/2401.00604,4,1,0,0,0,0 +2024-01-02,2401.00434,https://github.com/geobrain-ai/geogalactica,GeoGalactica: A Scientific Large Language Model in Geoscience,https://huggingface.co/papers/2401.00434,7,2,1,1,0,1 +2024-01-02,2401.00368,,Improving Text Embeddings with Large Language Models,https://huggingface.co/papers/2401.00368,78,15,0,12,3,34 +2024-01-02,2401.00246,,Boosting Large Language Model for Speech Synthesis: An Empirical Study,https://huggingface.co/papers/2401.00246,9,1,0,0,0,0 +2024-01-02,2401.00788,https://github.com/agi-edgerunners/llm-adapters,Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models,https://huggingface.co/papers/2401.00788,21,1,1,0,0,0 +2024-01-02,2401.00448,,Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling 
Laws,https://huggingface.co/papers/2401.00448,27,2,0,0,0,0 +2024-01-01,2312.17653,,LARP: Language-Agent Role Play for Open-World Games,https://huggingface.co/papers/2312.17653,29,1,0,0,0,0 +2024-01-01,2312.17681,,FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis,https://huggingface.co/papers/2312.17681,18,1,0,0,0,0 +2024-01-01,2312.17742,https://github.com/google-research/syn-rep-learn,Learning Vision from Models Rivals Learning Vision from Data,https://huggingface.co/papers/2312.17742,14,2,0,0,0,0 +2024-01-01,2312.17661,https://github.com/eternityyw/gemini-commonsense-evaluation,Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models,https://huggingface.co/papers/2312.17661,12,1,0,0,0,0 +2024-01-01,2312.17276,,PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation,https://huggingface.co/papers/2312.17276,14,1,0,0,0,0 +2023-12-29,2312.16457,,City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web,https://huggingface.co/papers/2312.16457,13,1,0,0,0,0 +2023-12-29,2312.16812,https://github.com/oppo-us-research/spacetimegaussians,Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis,https://huggingface.co/papers/2312.16812,9,2,1,0,0,0 +2023-12-29,2312.17135,,InsActor: Instruction-driven Physics-based Characters,https://huggingface.co/papers/2312.17135,9,1,0,0,0,0 +2023-12-29,2312.17161,,Restoration by Generation with Constrained Priors,https://huggingface.co/papers/2312.17161,3,2,0,0,0,0 +2023-12-29,2312.17241,,Compact Neural Graphics Primitives with Learned Hash Probing,https://huggingface.co/papers/2312.17241,6,1,0,0,0,0 +2023-12-29,2312.16272,https://github.com/Xiaojiu-z/SSR_Encoder,SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation,https://huggingface.co/papers/2312.16272,6,1,1,0,0,0 +2023-12-29,2312.16256,,DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D 
Vision,https://huggingface.co/papers/2312.16256,15,4,0,0,0,0 +2023-12-29,2312.17142,https://github.com/jiawei-ren/dreamgaussian4d,DreamGaussian4D: Generative 4D Gaussian Splatting,https://huggingface.co/papers/2312.17142,17,2,0,0,0,2 +2023-12-29,2312.17243,https://github.com/u2seg/u2seg,Unsupervised Universal Image Segmentation,https://huggingface.co/papers/2312.17243,18,2,0,0,0,0 +2023-12-29,2312.16218,,Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks,https://huggingface.co/papers/2312.16218,6,1,0,0,0,0 +2023-12-29,2312.16720,,Prompt Expansion for Adaptive Text-to-Image Generation,https://huggingface.co/papers/2312.16720,5,1,0,0,0,0 +2023-12-29,2312.17172,https://github.com/allenai/unified-io-2,"Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action",https://huggingface.co/papers/2312.17172,26,2,0,0,0,0 +2023-12-29,2312.16837,,DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors,https://huggingface.co/papers/2312.16837,5,1,0,0,0,0 +2023-12-29,2312.17244,https://github.com/qualcomm-ai-research/llm-surgeon,The LLM Surgeon,https://huggingface.co/papers/2312.17244,9,1,1,0,0,0 +2023-12-29,2312.16693,,I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models,https://huggingface.co/papers/2312.16693,13,1,0,0,0,0 +2023-12-29,2312.16886,,"MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices",https://huggingface.co/papers/2312.16886,19,2,0,7,0,1 +2023-12-29,2312.17120,https://github.com/gair-nlp/mathpile,Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math,https://huggingface.co/papers/2312.17120,25,10,1,0,2,0 +2023-12-29,2312.16486,,PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion,https://huggingface.co/papers/2312.16486,6,1,0,0,0,0 
+2023-12-29,2312.16862,https://github.com/dlyuangod/tinygpt-v,TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones,https://huggingface.co/papers/2312.16862,29,5,1,1,0,2 +2023-12-27,2312.15980,https://github.com/byeongjun-park/HarmonyView,HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D,https://huggingface.co/papers/2312.15980,10,2,1,0,0,1 +2023-12-27,2312.15258,https://github.com/longxiang-ai/human101,Human101: Training 100+FPS Human Gaussians in 100s from 1 View,https://huggingface.co/papers/2312.15258,7,1,0,0,0,0 +2023-12-27,2312.15770,,A Recipe for Scaling up Text-to-Video Generation with Text-free Videos,https://huggingface.co/papers/2312.15770,12,1,0,0,0,0 +2023-12-27,2312.16145,,"One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications",https://huggingface.co/papers/2312.16145,8,1,0,1,0,0 +2023-12-27,2312.15821,,Audiobox: Unified Audio Generation with Natural Language Prompts,https://huggingface.co/papers/2312.15821,12,1,0,0,0,0 +2023-12-27,2312.16084,https://github.com/minghanqin/LangSplat,LangSplat: 3D Language Gaussian Splatting,https://huggingface.co/papers/2312.16084,14,2,0,0,0,0 +2023-12-27,2312.15715,https://github.com/foundationvision/uniref,UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces,https://huggingface.co/papers/2312.15715,19,1,0,0,0,0 +2023-12-27,2312.15430,,Make-A-Character: High Quality Text-to-3D Character Generation within Minutes,https://huggingface.co/papers/2312.15430,27,7,0,0,0,0 +2023-12-27,2312.15011,https://github.com/qi-zhangyang/gemini-vs-gpt4v,Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases,https://huggingface.co/papers/2312.15011,15,2,0,0,0,0 +2023-12-27,2312.15918,https://github.com/yanglinyi/supervised-knowledge-makes-large-language-models-better-in-context-learners,Supervised Knowledge Makes Large Language Models Better In-context 
Learners,https://huggingface.co/papers/2312.15918,8,1,1,0,0,0 +2023-12-27,2312.16171,https://github.com/vila-lab/atlas,"Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4",https://huggingface.co/papers/2312.16171,33,4,0,0,0,1 +2023-12-27,2312.15166,,SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling,https://huggingface.co/papers/2312.15166,56,9,0,91,1,55 +2023-12-26,2312.14929,,MACS: Mass Conditioned 3D Hand and Object Motion Synthesis,https://huggingface.co/papers/2312.14929,4,1,0,0,0,0 +2023-12-26,2312.14198,,ZeroShape: Regression-based Zero-shot Shape Reconstruction,https://huggingface.co/papers/2312.14198,7,1,0,0,0,1 +2023-12-26,2312.14232,https://github.com/opendatalab/clip-parrot-bias,Parrot Captions Teach CLIP to Spot Text,https://huggingface.co/papers/2312.14232,9,1,1,0,1,0 +2023-12-26,2312.14239,,PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar,https://huggingface.co/papers/2312.14239,9,1,0,0,0,0 +2023-12-26,2312.14216,,DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models,https://huggingface.co/papers/2312.14216,10,1,0,0,0,0 +2023-12-26,2312.14385,,Generative AI Beyond LLMs: System Implications of Multi-Modal Generation,https://huggingface.co/papers/2312.14385,5,1,0,0,0,0 +2023-12-26,2312.14206,,LLM4VG: Large Language Models Evaluation for Video Grounding,https://huggingface.co/papers/2312.14206,2,1,0,0,0,0 +2023-12-26,2312.14203,,Shai: A large language model for asset management,https://huggingface.co/papers/2312.14203,4,2,0,0,0,0 +2023-12-26,2312.14591,https://github.com/wwxu21/cut,Reasons to Reject? 
Aligning Language Models with Judgments,https://huggingface.co/papers/2312.14591,16,1,1,1,0,0 +2023-12-26,2312.14862,,YAYI 2: Multilingual Open-Source Large Language Models,https://huggingface.co/papers/2312.14862,12,1,0,1,1,0 +2023-12-26,2312.14878,,Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning,https://huggingface.co/papers/2312.14878,13,3,0,0,0,0 +2023-12-26,2312.14233,https://github.com/shi-labs/vcoder,VCoder: Versatile Vision Encoders for Multimodal Large Language Models,https://huggingface.co/papers/2312.14233,14,1,1,0,0,2 +2023-12-26,2312.14187,,WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation,https://huggingface.co/papers/2312.14187,49,5,0,5,0,0 +2023-12-26,2312.14238,https://github.com/opengvlab/internvl,InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks,https://huggingface.co/papers/2312.14238,13,1,1,43,1,4 +2023-12-26,2312.14327,,Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion,https://huggingface.co/papers/2312.14327,6,1,0,0,0,0 +2023-12-26,2312.14302,https://github.com/alignmentresearch/gpt-4-novel-apis-attacks,Exploiting Novel GPT-4 APIs,https://huggingface.co/papers/2312.14302,12,7,0,0,0,0 +2023-12-22,2312.13528,,DyBluRF: Dynamic Deblurring Neural Radiance Fields for Blurry Monocular Video,https://huggingface.co/papers/2312.13528,6,1,0,0,0,0 +2023-12-22,2312.13314,,Unlocking Pre-trained Image Backbones for Semantic Image Synthesis,https://huggingface.co/papers/2312.13314,7,1,0,0,0,0 +2023-12-22,2312.13324,,ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors,https://huggingface.co/papers/2312.13324,9,1,0,0,0,0 +2023-12-22,2312.13401,https://github.com/KaiNylund/lm-weights-encode-time,Time is Encoded in the Weights of Finetuned Language Models,https://huggingface.co/papers/2312.13401,18,1,1,0,0,0 +2023-12-22,2312.13771,,AppAgent: Multimodal Agents as 
Smartphone Users,https://huggingface.co/papers/2312.13771,49,2,0,0,0,0 +2023-12-22,2312.13964,https://github.com/open-mmlab/PIA,PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models,https://huggingface.co/papers/2312.13964,17,1,1,0,0,1 +2023-12-22,2312.13980,,Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning,https://huggingface.co/papers/2312.13980,12,1,0,0,0,0 +2023-12-22,2312.14091,https://github.com/picsart-ai-research/hd-painter,HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models,https://huggingface.co/papers/2312.14091,13,2,1,0,1,5 +2023-12-22,2312.14140,,HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs,https://huggingface.co/papers/2312.14140,5,1,0,0,0,0 +2023-12-22,2312.13763,,Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models,https://huggingface.co/papers/2312.13763,9,1,0,0,0,0 +2023-12-22,2312.14125,,VideoPoet: A Large Language Model for Zero-Shot Video Generation,https://huggingface.co/papers/2312.14125,41,2,0,1,0,0 +2023-12-22,2312.13469,,Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation,https://huggingface.co/papers/2312.13469,10,1,0,0,0,0 +2023-12-22,2312.13691,,DreamTuner: Single Image is Enough for Subject-Driven Generation,https://huggingface.co/papers/2312.13691,23,6,0,0,0,0 +2023-12-22,2312.13789,https://github.com/xinghaochen/tinysam,TinySAM: Pushing the Envelope for Efficient Segment Anything Model,https://huggingface.co/papers/2312.13789,13,1,1,1,0,1 +2023-12-22,2312.13834,,Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis,https://huggingface.co/papers/2312.13834,26,2,0,0,0,0 +2023-12-22,2312.13578,,DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation,https://huggingface.co/papers/2312.13578,24,2,0,0,0,0 
+2023-12-22,2312.13913,https://github.com/opentexture/paint3d,Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models,https://huggingface.co/papers/2312.13913,22,1,1,1,0,0 +2023-12-21,2312.13285,,UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections,https://huggingface.co/papers/2312.13285,5,0,0,0,0,0 +2023-12-21,2312.13150,https://github.com/szymanowiczs/splatter-image,Splatter Image: Ultra-Fast Single-View 3D Reconstruction,https://huggingface.co/papers/2312.13150,14,0,1,0,0,2 +2023-12-21,2312.12468,,MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers,https://huggingface.co/papers/2312.12468,8,0,0,0,0,0 +2023-12-21,2312.12487,,Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models,https://huggingface.co/papers/2312.12487,8,0,0,0,0,0 +2023-12-21,2312.12865,,RadEdit: stress-testing biomedical vision models via diffusion image editing,https://huggingface.co/papers/2312.12865,3,0,0,0,0,0 +2023-12-21,2312.13252,,Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model,https://huggingface.co/papers/2312.13252,26,3,0,0,0,0 +2023-12-21,2312.12791,,Model-Based Control with Sparse Neural Dynamics,https://huggingface.co/papers/2312.12791,5,0,0,0,0,0 +2023-12-21,2312.13102,,SpecNeRF: Gaussian Directional Encoding for Specular Reflections,https://huggingface.co/papers/2312.13102,5,0,0,0,0,0 +2023-12-21,2312.13286,https://github.com/baaivision/emu,Generative Multimodal Models are In-Context Learners,https://huggingface.co/papers/2312.13286,33,0,0,3,0,2 +2023-12-21,2312.12456,https://github.com/sjtu-ipads/powerinfer,PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU,https://huggingface.co/papers/2312.12456,40,4,1,6,0,1 +2023-12-21,2312.12682,,Mini-GPTs: Efficient Large Language Models through Contextual Pruning,https://huggingface.co/papers/2312.12682,7,0,0,0,0,0 +2023-12-21,2312.12742,,Cached Transformers: Improving 
Transformers with Differentiable Memory Cache,https://huggingface.co/papers/2312.12742,11,1,0,0,0,0 +2023-12-21,2312.13271,https://github.com/junwuzhang19/repaint123,Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting,https://huggingface.co/papers/2312.13271,4,0,1,0,0,0 +2023-12-21,2312.12490,,InstructVideo: Instructing Video Diffusion Models with Human Feedback,https://huggingface.co/papers/2312.12490,16,1,0,0,0,0 +2023-12-21,2312.12491,https://github.com/cumulo-autumn/streamdiffusion,StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation,https://huggingface.co/papers/2312.12491,67,4,1,0,0,0 +2023-12-20,2312.11666,https://github.com/Vanessik/HAAR,HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles,https://huggingface.co/papers/2312.11666,12,1,0,0,0,0 +2023-12-20,2312.11894,https://github.com/mosamdabhi/3dlfm,3D-LFM: Lifting Foundation Model,https://huggingface.co/papers/2312.11894,13,2,0,1,0,0 +2023-12-20,2312.12030,https://github.com/hanshuyan/adjointdpm,Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method,https://huggingface.co/papers/2312.12030,4,2,0,0,0,0 +2023-12-20,2312.11532,https://github.com/clovaai/tvq-vae,Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation,https://huggingface.co/papers/2312.11532,5,1,0,0,0,0 +2023-12-20,2312.11897,,Text-Conditioned Resampler For Long Form Video Understanding,https://huggingface.co/papers/2312.11897,5,1,0,0,0,0 +2023-12-20,2312.11535,,Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior,https://huggingface.co/papers/2312.11535,5,3,0,0,0,0 +2023-12-20,2312.11537,,FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline,https://huggingface.co/papers/2312.11537,6,1,0,0,0,0 +2023-12-20,2312.11595,,TIP: Text-Driven Image Processing with Semantic and Restoration 
Instructions,https://huggingface.co/papers/2312.11595,5,1,0,0,0,0 +2023-12-20,2312.11805,,Gemini: A Family of Highly Capable Multimodal Models,https://huggingface.co/papers/2312.11805,45,10,0,100,0,100 +2023-12-20,2312.12423,,"Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model",https://huggingface.co/papers/2312.12423,12,1,0,0,0,0 +2023-12-20,2312.11841,,MixRT: Mixed Neural Representations For Real-Time NeRF Rendering,https://huggingface.co/papers/2312.11841,10,1,0,0,0,0 +2023-12-20,2312.12433,https://github.com/WesleyHsieh0806/TAO-Amodal,Tracking Any Object Amodally,https://huggingface.co/papers/2312.12433,11,1,1,1,2,0 +2023-12-20,2312.12436,https://github.com/bradyfu/awesome-multimodal-large-language-models,A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise,https://huggingface.co/papers/2312.12436,13,3,1,0,0,0 +2023-12-20,2312.11556,https://github.com/visioncortex/vtracer,StarVector: Generating Scalable Vector Graphics Code from Images,https://huggingface.co/papers/2312.11556,26,1,0,0,0,0 +2023-12-20,2312.11514,,LLM in a flash: Efficient Large Language Model Inference with Limited Memory,https://huggingface.co/papers/2312.11514,255,8,0,0,0,1 +2023-12-19,2312.10665,,Silkie: Preference Distillation for Large Visual Language Models,https://huggingface.co/papers/2312.10665,11,1,0,1,2,0 +2023-12-19,2312.10332,,ProTIP: Progressive Tool Retrieval Improves Planning,https://huggingface.co/papers/2312.10332,7,1,0,0,0,0 +2023-12-19,2312.11392,https://github.com/modelscope/scepter,SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing,https://huggingface.co/papers/2312.11392,18,3,0,1,0,0 +2023-12-19,2312.11459,https://github.com/checkcrab/volumediffusion,VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder,https://huggingface.co/papers/2312.11459,5,1,0,1,0,0 +2023-12-19,2312.11458,,GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel 
View Synthesis,https://huggingface.co/papers/2312.11458,4,1,0,0,0,0 +2023-12-19,2312.10540,,VecFusion: Vector Font Generation with Diffusion,https://huggingface.co/papers/2312.10540,20,2,0,0,0,0 +2023-12-19,2312.10835,https://github.com/yandex-research/adaptive-diffusion,Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models,https://huggingface.co/papers/2312.10835,6,1,1,0,0,0 +2023-12-19,2312.10656,,VidToMe: Video Token Merging for Zero-Shot Video Editing,https://huggingface.co/papers/2312.10656,9,2,0,0,0,0 +2023-12-19,2312.10240,https://github.com/google-research/google-research,Rich Human Feedback for Text-to-Image Generation,https://huggingface.co/papers/2312.10240,17,1,0,0,0,0 +2023-12-19,2312.11370,https://github.com/pipilurj/g-llava,G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model,https://huggingface.co/papers/2312.11370,19,2,1,0,0,0 +2023-12-19,2312.10253,https://github.com/allenai/catwalk,Catwalk: A Unified Language Model Evaluation Framework for Many Datasets,https://huggingface.co/papers/2312.10253,7,1,0,0,0,0 +2023-12-19,2312.10523,,Paloma: A Benchmark for Evaluating Language Model Fit,https://huggingface.co/papers/2312.10523,11,2,0,0,1,0 +2023-12-19,2312.11462,https://github.com/lfsszd/cs-drafting,Cascade Speculative Drafting for Even Faster LLM Inference,https://huggingface.co/papers/2312.11462,8,1,0,0,0,0 +2023-12-19,2312.10899,,MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising,https://huggingface.co/papers/2312.10899,14,1,0,0,0,0 +2023-12-19,2312.11461,,GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning,https://huggingface.co/papers/2312.11461,16,1,0,0,0,0 +2023-12-19,2312.11396,,MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance,https://huggingface.co/papers/2312.11396,10,1,0,0,0,0 
+2023-12-19,2312.10763,https://github.com/OpenM3D/M3DBench,M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts,https://huggingface.co/papers/2312.10763,17,1,1,0,0,0 +2023-12-18,2312.09305,,Stable Score Distillation for High-Quality 3D Generation,https://huggingface.co/papers/2312.09305,7,2,0,0,0,0 +2023-12-18,2312.09323,,Perspectives on the State and Future of Deep Learning -- 2023,https://huggingface.co/papers/2312.09323,5,1,0,0,0,0 +2023-12-18,2312.10034,https://github.com/shiran-yuan/slimmerf,SlimmeRF: Slimmable Radiance Fields,https://huggingface.co/papers/2312.10034,6,2,0,0,0,0 +2023-12-18,2312.09571,,Extending Context Window of Large Language Models via Semantic Compression,https://huggingface.co/papers/2312.09571,12,1,0,0,0,0 +2023-12-18,2312.09608,https://github.com/hutaihang/faster-diffusion,Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models,https://huggingface.co/papers/2312.09608,13,1,1,0,0,0 +2023-12-18,2312.10035,https://github.com/pointcept/pointtransformerv3,"Point Transformer V3: Simpler, Faster, Stronger",https://huggingface.co/papers/2312.10035,17,2,1,0,0,0 +2023-12-18,2312.09579,https://github.com/chaoningzhang/mobilesam,MobileSAMv2: Faster Segment Anything to Everything,https://huggingface.co/papers/2312.09579,20,2,1,0,0,0 +2023-12-18,2312.09767,,DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models,https://huggingface.co/papers/2312.09767,25,2,0,0,0,12 +2023-12-18,2312.09911,https://github.com/open-mmlab/amphion,"Amphion: An Open-Source Audio, Music and Speech Generation Toolkit",https://huggingface.co/papers/2312.09911,52,4,1,0,0,0 +2023-12-18,2312.10003,,ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent,https://huggingface.co/papers/2312.10003,33,1,0,0,0,0 +2023-12-18,2312.10007,https://github.com/google-research-datasets/Synthetic-Persona-Chat,Faithful Persona-based Conversational Dataset Generation with Large Language 
Models,https://huggingface.co/papers/2312.10007,6,1,0,0,1,0 +2023-12-18,2312.10029,,Challenges with unsupervised LLM knowledge discovery,https://huggingface.co/papers/2312.10029,7,1,0,0,0,0 +2023-12-18,2312.09300,,Self-Evaluation Improves Selective Generation in Large Language Models,https://huggingface.co/papers/2312.09300,14,1,0,0,0,0 +2023-12-18,2312.09299,,Weight subcloning: direct initialization of transformers using larger pretrained ones,https://huggingface.co/papers/2312.09299,17,1,0,0,0,0 +2023-12-18,2312.09390,,Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision,https://huggingface.co/papers/2312.09390,32,1,0,0,0,0 +2023-12-15,2312.09252,,FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection,https://huggingface.co/papers/2312.09252,9,2,0,0,1,0 +2023-12-15,2312.08754,,UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation,https://huggingface.co/papers/2312.08754,6,1,0,0,0,0 +2023-12-15,2312.09244,,Helping or Herding? 
Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking,https://huggingface.co/papers/2312.09244,4,1,0,0,1,0 +2023-12-15,2312.09251,https://github.com/ailab-cvc/vl-gpt,VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation,https://huggingface.co/papers/2312.09251,6,1,0,0,1,0 +2023-12-15,2312.08889,,SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance,https://huggingface.co/papers/2312.08889,11,1,0,0,0,0 +2023-12-15,2312.09256,,LIME: Localized Image Editing via Attention Regularization in Diffusion Models,https://huggingface.co/papers/2312.09256,8,1,0,0,1,0 +2023-12-15,2312.09222,,Mosaic-SDF for 3D Generative Models,https://huggingface.co/papers/2312.09222,14,4,0,0,1,0 +2023-12-15,2312.09246,,SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds,https://huggingface.co/papers/2312.09246,5,1,0,0,1,1 +2023-12-15,2312.09109,,VideoLCM: Video Latent Consistency Model,https://huggingface.co/papers/2312.09109,22,2,0,2,0,6 +2023-12-15,2312.08723,,StemGen: A music generation model that listens,https://huggingface.co/papers/2312.08723,45,6,0,0,0,0 +2023-12-15,2312.08926,https://github.com/oashua/mathagent,Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent,https://huggingface.co/papers/2312.08926,7,2,0,0,0,0 +2023-12-15,2312.09067,https://github.com/allenai/Holodeck,Holodeck: Language Guided Generation of 3D Embodied AI Environments,https://huggingface.co/papers/2312.09067,12,2,0,0,0,0 +2023-12-15,2312.08578,https://github.com/facebookresearch/dci,A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions,https://huggingface.co/papers/2312.08578,15,1,0,0,0,0 +2023-12-15,2312.08583,https://github.com/microsoft/DeepSpeed,ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks,https://huggingface.co/papers/2312.08583,9,2,1,0,0,0 +2023-12-15,2312.08618,,Zebra: 
Extending Context Window with Layerwise Grouped Local-Global Attention,https://huggingface.co/papers/2312.08618,11,1,0,0,0,0 +2023-12-15,2312.08688,https://github.com/tigerresearch/tigerbot,TigerBot: An Open Multilingual Multitask LLM,https://huggingface.co/papers/2312.08688,3,1,1,0,0,0 +2023-12-15,2312.08914,https://github.com/thudm/cogvlm,CogAgent: A Visual Language Model for GUI Agents,https://huggingface.co/papers/2312.08914,29,2,1,3,0,1 +2023-12-15,2312.09158,https://github.com/FoundationVision/GLEE,General Object Foundation Model for Images and Videos at Scale,https://huggingface.co/papers/2312.09158,8,2,1,0,0,4 +2023-12-15,2312.09187,,Vision-Language Models as a Source of Rewards,https://huggingface.co/papers/2312.09187,11,8,0,0,0,0 +2023-12-15,2312.09237,,Pixel Aligned Language Models,https://huggingface.co/papers/2312.09237,12,1,0,0,1,0 +2023-12-15,2312.09241,,TinyGSM: achieving >80% on GSM8k with small language models,https://huggingface.co/papers/2312.09241,35,4,0,0,1,0 +2023-12-14,2312.07859,,Invariant Graph Transformer,https://huggingface.co/papers/2312.07859,5,0,0,0,0,0 +2023-12-14,2312.07843,https://github.com/robotics-survey/awesome-robotics-foundation-models,"Foundation Models in Robotics: Applications, Challenges, and the Future",https://huggingface.co/papers/2312.07843,14,0,0,0,0,0 +2023-12-14,2312.07910,https://github.com/microsoft/promptbench,PromptBench: A Unified Library for Evaluation of Large Language Models,https://huggingface.co/papers/2312.07910,14,3,0,0,0,0 +2023-12-14,2312.07987,https://github.com/robertcsordas/moe_attention,SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention,https://huggingface.co/papers/2312.07987,40,2,0,0,0,0 +2023-12-14,2312.08344,https://github.com/NVlabs/FoundationPose,FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects,https://huggingface.co/papers/2312.08344,6,1,0,0,0,0 +2023-12-14,2312.08361,,Distributed Inference and Fine-tuning of Large Language Models Over The 
Internet,https://huggingface.co/papers/2312.08361,24,4,0,0,0,0 +2023-12-14,2312.08128,https://github.com/qualcomm-ai-research/clockwork-diffusion,Clockwork Diffusion: Efficient Generation With Model-Step Distillation,https://huggingface.co/papers/2312.08128,12,0,0,0,0,0 +2023-12-14,2312.07661,,CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor,https://huggingface.co/papers/2312.07661,14,0,0,0,0,0 +2023-12-14,2312.08136,,ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields,https://huggingface.co/papers/2312.08136,2,0,0,0,0,0 +2023-12-13,2312.06908,,"""I Want It That Way"": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming",https://huggingface.co/papers/2312.06908,5,1,0,0,0,0 +2023-12-13,2312.07504,,COLMAP-Free 3D Gaussian Splatting,https://huggingface.co/papers/2312.07504,10,0,0,0,0,0 +2023-12-13,2312.06674,,Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations,https://huggingface.co/papers/2312.06674,6,1,0,6,0,24 +2023-12-13,2312.07509,https://github.com/microsoft/peekaboo,PEEKABOO: Interactive Video Generation via Masked-Diffusion,https://huggingface.co/papers/2312.07509,7,1,1,0,0,0 +2023-12-13,2312.06681,https://github.com/wusche1/caa_hallucination,Steering Llama 2 via Contrastive Activation Addition,https://huggingface.co/papers/2312.06681,11,1,0,0,0,0 +2023-12-13,2312.07424,https://github.com/jameszhou-gl/gpt-4v-distribution-shift,How Well Does GPT-4V(ision) Adapt to Distribution Shifts? 
A Preliminary Investigation,https://huggingface.co/papers/2312.07424,7,0,1,0,1,0 +2023-12-13,2312.07231,,Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation,https://huggingface.co/papers/2312.07231,6,0,0,0,0,0 +2023-12-13,2312.07000,https://github.com/gair-nlp/alignment-for-honesty,Alignment for Honesty,https://huggingface.co/papers/2312.07000,11,0,1,0,0,0 +2023-12-13,2312.07533,https://github.com/mit-han-lab/llm-awq,VILA: On Pre-training for Visual Language Models,https://huggingface.co/papers/2312.07533,18,2,1,16,0,4 +2023-12-13,2312.07532,https://github.com/ux-decoder/find,Interfacing Foundation Models' Embeddings,https://huggingface.co/papers/2312.07532,10,0,1,0,0,0 +2023-12-13,2312.06742,https://github.com/kakaobrain/honeybee,Honeybee: Locality-enhanced Projector for Multimodal LLM,https://huggingface.co/papers/2312.06742,9,0,1,0,0,0 +2023-12-13,2312.07046,https://github.com/transmuteai/trailmet,Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models,https://huggingface.co/papers/2312.07046,12,1,0,0,0,0 +2023-12-13,2312.06971,,CCM: Adding Conditional Controls to Text-to-Image Consistency Models,https://huggingface.co/papers/2312.06971,10,0,0,0,0,0 +2023-12-13,2312.07409,,DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing,https://huggingface.co/papers/2312.07409,22,6,0,0,0,0 +2023-12-13,2312.07536,,FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition,https://huggingface.co/papers/2312.07536,16,0,0,0,0,0 +2023-12-13,2312.07537,https://github.com/tianxingwu/freeinit,FreeInit: Bridging Initialization Gap in Video Diffusion Models,https://huggingface.co/papers/2312.07537,24,2,1,0,0,2 +2023-12-12,2312.06149,,Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models,https://huggingface.co/papers/2312.06149,2,0,0,0,0,0 +2023-12-12,2312.05605,,TCNCA: Temporal Convolution 
Network with Chunked Attention for Scalable Sequence Processing,https://huggingface.co/papers/2312.05605,1,0,0,0,0,0 +2023-12-12,2312.06353,https://github.com/alibaba/federatedscope,Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes,https://huggingface.co/papers/2312.06353,5,1,0,0,0,0 +2023-12-12,2312.06134,,Order Matters in the Presence of Dataset Imbalance for Multilingual Learning,https://huggingface.co/papers/2312.06134,2,0,0,0,0,0 +2023-12-12,2312.05491,,Using Captum to Explain Generative Language Models,https://huggingface.co/papers/2312.05491,3,1,0,0,0,0 +2023-12-12,2312.06351,,Evaluation of Large Language Models for Decision Making in Autonomous Driving,https://huggingface.co/papers/2312.06351,5,0,0,0,0,0 +2023-12-12,2312.05431,,Efficient Quantization Strategies for Latent Diffusion Models,https://huggingface.co/papers/2312.05431,11,0,0,0,0,0 +2023-12-12,2312.06571,,"From Text to Motion: Grounding GPT-4 in a Humanoid Robot ""Alter3""",https://huggingface.co/papers/2312.06571,12,0,0,0,0,0 +2023-12-12,2312.05708,,Context Tuning for Retrieval Augmented Generation,https://huggingface.co/papers/2312.05708,16,0,0,0,0,0 +2023-12-12,2312.06662,,Photorealistic Video Generation with Diffusion Models,https://huggingface.co/papers/2312.06662,23,2,0,0,0,0 +2023-12-12,2312.06109,https://github.com/Ucas-HaoranWei/Vary,Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models,https://huggingface.co/papers/2312.06109,20,0,1,0,0,0 +2023-12-12,2312.06585,,Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models,https://huggingface.co/papers/2312.06585,27,2,0,0,0,0 +2023-12-12,2312.06655,https://github.com/liuff19/Sherpa3D,Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior,https://huggingface.co/papers/2312.06655,21,0,1,0,0,0 +2023-12-12,2312.06550,https://github.com/llm360/analysis360,LLM360: Towards Fully Transparent Open-Source 
LLMs,https://huggingface.co/papers/2312.06550,54,4,1,5,1,3 +2023-12-06,2312.02963,,MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures,https://huggingface.co/papers/2312.02963,9,0,0,0,0,0 +2023-12-06,2312.02970,,Alchemist: Parametric Control of Material Properties with Diffusion Models,https://huggingface.co/papers/2312.02970,7,0,0,0,0,0 +2023-12-06,2312.02981,,ReconFusion: 3D Reconstruction with Diffusion Priors,https://huggingface.co/papers/2312.02981,8,0,0,0,0,0 +2023-12-06,2312.02772,,Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions,https://huggingface.co/papers/2312.02772,6,0,0,0,0,0 +2023-12-06,2312.02216,https://github.com/rickyskywalker/dragvideo-official,DragVideo: Interactive Drag-style Video Editing,https://huggingface.co/papers/2312.02216,10,0,0,0,0,0 +2023-12-06,2312.02432,,Orthogonal Adaptation for Modular Customization of Diffusion Models,https://huggingface.co/papers/2312.02432,12,0,0,0,0,0 +2023-12-06,2312.02696,https://github.com/nvlabs/edm2,Analyzing and Improving the Training Dynamics of Diffusion Models,https://huggingface.co/papers/2312.02696,31,2,0,1,1,0 +2023-12-06,2312.02206,,Axiomatic Preference Modeling for Longform Question Answering,https://huggingface.co/papers/2312.02206,7,1,0,0,0,0 +2023-12-06,2312.02969,,Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models,https://huggingface.co/papers/2312.02969,12,0,0,0,0,0 +2023-12-06,2312.02931,https://github.com/lu-wo/whisbert,WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words,https://huggingface.co/papers/2312.02931,5,1,0,0,0,0 +2023-12-06,2312.02949,https://github.com/ux-decoder/llava-grounding,LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models,https://huggingface.co/papers/2312.02949,9,0,0,0,0,0 +2023-12-06,2312.02189,,StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D,https://huggingface.co/papers/2312.02189,7,3,0,0,0,0 
+2023-12-06,2312.02201,,ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation,https://huggingface.co/papers/2312.02201,30,2,0,0,0,1 +2023-12-06,2312.02179,,Training Chain-of-Thought via Latent-Variable Inference,https://huggingface.co/papers/2312.02179,8,0,0,0,0,0 +2023-12-06,2312.02980,,GPT4Point: A Unified Framework for Point-Language Understanding and Generation,https://huggingface.co/papers/2312.02980,6,0,0,0,0,0 +2023-12-06,2312.02919,,Fine-grained Controllable Video Generation via Object Appearance and Context,https://huggingface.co/papers/2312.02919,9,0,0,0,0,0 +2023-12-06,2312.02928,,LivePhoto: Real Image Animation with Text-guided Motion Control,https://huggingface.co/papers/2312.02928,15,2,0,0,0,0 +2023-12-06,2312.02974,https://github.com/understanding-visual-datasets/visdiff,Describing Differences in Image Sets with Natural Language,https://huggingface.co/papers/2312.02974,12,0,1,0,0,0 +2023-12-06,2312.02238,,X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model,https://huggingface.co/papers/2312.02238,24,1,0,0,0,0 +2023-12-06,2312.02663,,FaceStudio: Put Your Face Everywhere in Seconds,https://huggingface.co/papers/2312.02663,28,1,0,0,0,0 +2023-12-05,2312.02087,,VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence,https://huggingface.co/papers/2312.02087,19,5,0,0,0,0 +2023-12-05,2312.00858,https://github.com/horseee/deepcache,DeepCache: Accelerating Diffusion Models for Free,https://huggingface.co/papers/2312.00858,20,1,1,0,0,0 +2023-12-05,2312.00860,,Segment Any 3D Gaussians,https://huggingface.co/papers/2312.00860,8,1,0,0,0,0 +2023-12-05,2312.02135,,Fast View Synthesis of Casual Videos,https://huggingface.co/papers/2312.02135,8,1,0,0,0,0 +2023-12-05,2312.01409,,Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models,https://huggingface.co/papers/2312.01409,8,2,0,0,0,0 +2023-12-05,2312.02116,https://github.com/google-research/big_vision,GIVT: 
Generative Infinite-Vocabulary Transformers,https://huggingface.co/papers/2312.02116,10,1,0,0,0,0 +2023-12-05,2312.02133,https://github.com/google/style-aligned,Style Aligned Image Generation via Shared Attention,https://huggingface.co/papers/2312.02133,8,1,1,0,1,0 +2023-12-05,2312.02139,https://github.com/nvlabs/diffit,DiffiT: Diffusion Vision Transformers for Image Generation,https://huggingface.co/papers/2312.02139,13,2,0,0,0,0 +2023-12-05,2312.00845,,VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models,https://huggingface.co/papers/2312.00845,36,3,0,0,0,0 +2023-12-05,2312.01279,,TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents,https://huggingface.co/papers/2312.01279,3,1,0,0,0,0 +2023-12-05,2312.01532,,Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments,https://huggingface.co/papers/2312.01532,3,1,0,0,0,0 +2023-12-05,2312.02142,https://github.com/kaiyuyue/nxtp,Object Recognition as Next Token Prediction,https://huggingface.co/papers/2312.02142,11,2,1,1,0,0 +2023-12-05,2312.00886,,Nash Learning from Human Feedback,https://huggingface.co/papers/2312.00886,14,2,0,0,0,0 +2023-12-05,2312.02147,https://github.com/oliverrensu/d-igpt,Rejuvenating image-GPT as Strong Visual Representation Learners,https://huggingface.co/papers/2312.02147,4,1,1,0,0,0 +2023-12-05,2312.02155,https://github.com/aipixel/gps-gaussian,GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis,https://huggingface.co/papers/2312.02155,11,1,0,0,0,0 +2023-12-05,2312.01407,,VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams,https://huggingface.co/papers/2312.01407,6,3,0,0,0,0 +2023-12-05,2312.01531,,SANeRF-HQ: Segment Anything for NeRF in High Quality,https://huggingface.co/papers/2312.01531,5,1,0,0,0,0 +2023-12-05,2312.01663,,Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative 
Training,https://huggingface.co/papers/2312.01663,3,1,0,0,0,0 +2023-12-05,2312.00869,https://github.com/xk-huang/segment-caption-anything,Segment and Caption Anything,https://huggingface.co/papers/2312.00869,17,1,1,0,0,0 +2023-12-05,2312.02149,,Generative Powers of Ten,https://huggingface.co/papers/2312.02149,4,1,0,0,0,0 +2023-12-05,2312.01552,,The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning,https://huggingface.co/papers/2312.01552,28,4,0,0,0,1 +2023-12-05,2312.00849,https://github.com/openbmb/omnilmm,RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback,https://huggingface.co/papers/2312.00849,8,1,1,2,1,4 +2023-12-05,2312.02120,,Magicoder: Source Code Is All You Need,https://huggingface.co/papers/2312.02120,78,4,0,17,1,17 +2023-12-04,2312.00451,,FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting,https://huggingface.co/papers/2312.00451,9,1,0,0,0,0 +2023-12-04,2312.00085,https://github.com/xmu-xiaoma666/X-Dreamer,X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation,https://huggingface.co/papers/2312.00085,6,2,1,0,0,0 +2023-12-04,2312.00210,https://github.com/jinxinzhou/dream,DREAM: Diffusion Rectification and Estimation-Adaptive Models,https://huggingface.co/papers/2312.00210,14,1,0,0,0,0 +2023-12-04,2312.00375,https://github.com/JiejiangWu/FaceG2E,Text-Guided 3D Face Synthesis -- From Generation to Editing,https://huggingface.co/papers/2312.00375,8,1,0,0,0,0 +2023-12-04,2312.00063,https://github.com/EricGuo5513/momask-codes,MoMask: Generative Masked Modeling of 3D Human Motions,https://huggingface.co/papers/2312.00063,15,1,1,1,0,1 +2023-12-04,2312.00252,https://github.com/hturki/pynerf,PyNeRF: Pyramidal Neural Radiance Fields,https://huggingface.co/papers/2312.00252,8,1,0,0,0,0 +2023-12-04,2312.00109,https://github.com/city-super/Scaffold-GS,Scaffold-GS: Structured 3D Gaussians for View-Adaptive 
Rendering,https://huggingface.co/papers/2312.00109,9,1,0,0,0,0 +2023-12-04,2312.00164,,Towards Accurate Differential Diagnosis with Large Language Models,https://huggingface.co/papers/2312.00164,8,1,0,0,0,0 +2023-12-04,2312.00575,,Instruction-tuning Aligns LLMs to the Human Brain,https://huggingface.co/papers/2312.00575,10,4,0,0,0,0 +2023-12-04,2312.00589,,Merlin:Empowering Multimodal LLMs with Foresight Minds,https://huggingface.co/papers/2312.00589,24,1,0,0,0,0 +2023-12-04,2312.00763,,Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses,https://huggingface.co/papers/2312.00763,18,1,0,0,0,0 +2023-12-04,2312.00777,,VideoBooth: Diffusion-based Video Generation with Image Prompts,https://huggingface.co/papers/2312.00777,19,2,0,0,0,0 +2023-12-04,2312.00752,https://github.com/state-spaces/mamba,Mamba: Linear-Time Sequence Modeling with Selective State Spaces,https://huggingface.co/papers/2312.00752,134,10,1,30,0,2 +2023-12-04,2312.00093,,GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs,https://huggingface.co/papers/2312.00093,14,1,0,0,0,0 +2023-12-04,2312.00079,,HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models,https://huggingface.co/papers/2312.00079,14,2,0,0,0,0 +2023-12-04,2312.00330,https://github.com/GongyeLiu/StyleCrafter,StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter,https://huggingface.co/papers/2312.00330,10,1,1,1,0,0 +2023-12-04,2312.00438,,Dolphins: Multimodal Language Model for Driving,https://huggingface.co/papers/2312.00438,12,1,0,0,0,0 +2023-12-04,2312.00738,https://github.com/damo-nlp-sg/seallms,SeaLLMs -- Large Language Models for Southeast Asia,https://huggingface.co/papers/2312.00738,23,2,1,34,2,10 +2023-11-23,2311.13601,https://github.com/ux-decoder/dinov,Visual In-Context Prompting,https://huggingface.co/papers/2311.13601,14,2,0,0,0,0 +2023-11-23,2311.13073,https://github.com/ai-forever/kandinskyvideo,FusionFrames: Efficient Architectural Aspects 
for Text-to-Video Generation Pipeline,https://huggingface.co/papers/2311.13073,53,4,1,0,0,0 +2023-11-23,2311.13435,https://github.com/mbzuai-oryx/video-llava,PG-Video-LLaVA: Pixel Grounding Large Video-Language Models,https://huggingface.co/papers/2311.13435,16,3,0,0,0,0 +2023-11-23,2311.13141,https://github.com/archerfmy/sd-t2i-360panoimage,Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models,https://huggingface.co/papers/2311.13141,9,4,1,0,0,0 +2023-11-23,2311.13600,,ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs,https://huggingface.co/papers/2311.13600,41,3,0,0,0,0 +2023-11-23,2311.13384,,LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes,https://huggingface.co/papers/2311.13384,48,4,0,0,0,3 +2023-11-23,2311.13231,https://github.com/yk7333/d3po,Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model,https://huggingface.co/papers/2311.13231,25,5,1,0,1,0 +2023-11-23,2311.12908,,Diffusion Model Alignment Using Direct Preference Optimization,https://huggingface.co/papers/2311.12908,47,3,0,5,0,36 +2023-11-23,2311.12983,,GAIA: a benchmark for General AI Assistants,https://huggingface.co/papers/2311.12983,176,23,0,0,1,0 +2023-11-22,2311.12198,,PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics,https://huggingface.co/papers/2311.12198,19,1,0,0,0,0 +2023-11-22,2311.12775,https://github.com/Anttwo/SuGaR,SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering,https://huggingface.co/papers/2311.12775,28,3,0,0,0,0 +2023-11-22,2311.12024,,PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction,https://huggingface.co/papers/2311.12024,16,3,0,0,0,0 +2023-11-22,2311.12229,https://github.com/intellabs/multimodal_cognitive_ai,NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation,https://huggingface.co/papers/2311.12229,26,1,0,1,0,0 
+2023-11-22,2311.12454,https://github.com/sh-lee-prml/hierspeechpp,HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis,https://huggingface.co/papers/2311.12454,27,1,1,0,0,0 +2023-11-22,2311.12092,https://github.com/rohitgandikota/sliders,Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models,https://huggingface.co/papers/2311.12092,19,4,1,0,0,0 +2023-11-22,2311.12052,,MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer,https://huggingface.co/papers/2311.12052,29,1,0,0,0,0 +2023-11-22,2311.12631,,GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning,https://huggingface.co/papers/2311.12631,12,1,0,0,0,0 +2023-11-21,2311.11315,,TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems,https://huggingface.co/papers/2311.11315,6,2,0,0,0,0 +2023-11-21,2311.11284,https://github.com/envision-research/luciddreamer,LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching,https://huggingface.co/papers/2311.11284,16,1,1,0,0,2 +2023-11-21,2311.10794,,Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression,https://huggingface.co/papers/2311.10794,22,1,0,0,0,0 +2023-11-21,2311.11829,,System 2 Attention (is something you might need too),https://huggingface.co/papers/2311.11829,39,2,0,0,0,0 +2023-11-21,2311.10982,,Make Pixels Dance: High-Dynamic Video Generation,https://huggingface.co/papers/2311.10982,65,5,0,0,0,0 +2023-11-21,2311.10775,,ToolTalk: Evaluating Tool-Usage in a Conversational Setting,https://huggingface.co/papers/2311.10775,7,1,0,0,0,0 +2023-11-21,2311.12022,https://github.com/idavidrein/gpqa,GPQA: A Graduate-Level Google-Proof Q&A Benchmark,https://huggingface.co/papers/2311.12022,24,2,1,0,2,1 +2023-11-21,2311.10768,,Memory Augmented Language Models through Mixture 
of Word Experts,https://huggingface.co/papers/2311.10768,16,1,0,0,0,0 +2023-11-21,2311.11077,https://github.com/adapter-hub/adapters,Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning,https://huggingface.co/papers/2311.11077,24,3,1,0,0,0 +2023-11-21,2311.11243,,AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort,https://huggingface.co/papers/2311.11243,14,2,0,0,0,0 +2023-11-21,2311.11255,https://github.com/shansongliu/M2UGen,M^{2}UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models,https://huggingface.co/papers/2311.11255,3,1,1,3,0,6 +2023-11-21,2311.12015,,GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration,https://huggingface.co/papers/2311.12015,4,1,0,0,0,0 +2023-11-21,2311.11501,,MultiLoRA: Democratizing LoRA for Better Multi-Task Learning,https://huggingface.co/papers/2311.11501,32,1,0,0,0,0 +2023-11-21,2311.10751,https://github.com/openbmb/proagent,ProAgent: From Robotic Process Automation to Agentic Process Automation,https://huggingface.co/papers/2311.10751,8,1,0,0,0,0 +2023-11-21,2311.10770,https://github.com/pbelcak/fastbert,Exponentially Faster Language Modelling,https://huggingface.co/papers/2311.10770,117,26,1,1,0,0 +2023-11-21,2311.11045,,Orca 2: Teaching Small Language Models How to Reason,https://huggingface.co/papers/2311.11045,69,6,0,23,3,74 +2023-11-20,2311.10125,https://github.com/lhbuilder/sa-segment-anything,UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework,https://huggingface.co/papers/2311.10125,4,0,0,0,0,0 +2023-11-20,2311.10122,https://github.com/PKU-YuanGroup/Video-LLaVA,Video-LLaVA: Learning United Visual Representation by Alignment Before Projection,https://huggingface.co/papers/2311.10122,25,1,1,36,0,9 +2023-11-20,2311.10111,https://github.com/hritikbansal/videocon,VideoCon: Robust Video-Language Alignment via Contrast 
Captions,https://huggingface.co/papers/2311.10111,7,0,1,1,1,1 +2023-11-20,2311.10538,,Testing Language Model Agents Safely in the Wild,https://huggingface.co/papers/2311.10538,9,0,0,0,0,0 +2023-11-20,2311.10709,,Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning,https://huggingface.co/papers/2311.10709,24,3,0,0,0,0 +2023-11-20,2311.10123,,MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture,https://huggingface.co/papers/2311.10123,15,1,0,0,0,0 +2023-11-20,2311.10642,,Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers,https://huggingface.co/papers/2311.10642,23,1,0,0,0,0 +2023-11-20,2311.10678,https://github.com/Stanford-ILIAD/droc,Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections,https://huggingface.co/papers/2311.10678,5,0,0,0,0,0 +2023-11-20,2311.10126,https://github.com/zysxmu/ias-vit,I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization,https://huggingface.co/papers/2311.10126,7,0,0,0,0,0 +2023-11-20,2311.10708,,SelfEval: Leveraging the discriminative nature of generative models for evaluation,https://huggingface.co/papers/2311.10708,14,0,0,0,0,0 +2023-11-20,2311.10702,https://github.com/allenai/open-instruct,Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2,https://huggingface.co/papers/2311.10702,18,5,1,68,0,60 +2023-11-17,2311.09227,,"Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives",https://huggingface.co/papers/2311.09227,5,0,0,0,0,0 +2023-11-17,2311.10090,https://github.com/flairox/jaxmarl,JaxMARL: Multi-Agent RL Environments in JAX,https://huggingface.co/papers/2311.10090,6,0,0,0,0,0 +2023-11-17,2311.10091,,Adaptive Shells for Efficient Neural Radiance Field Rendering,https://huggingface.co/papers/2311.10091,16,0,0,0,0,0 
+2023-11-17,2311.10093,,The Chosen One: Consistent Characters in Text-to-Image Diffusion Models,https://huggingface.co/papers/2311.10093,55,7,0,0,0,0 +2023-11-17,2311.09835,,ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks,https://huggingface.co/papers/2311.09835,7,0,0,0,1,0 +2023-11-17,2311.09277,https://github.com/damo-nlp-sg/contrastive-cot,Contrastive Chain-of-Thought Prompting,https://huggingface.co/papers/2311.09277,31,4,0,0,0,0 +2023-11-17,2311.09578,,Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying,https://huggingface.co/papers/2311.09578,12,0,0,0,0,0 +2023-11-17,2311.09257,,UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs,https://huggingface.co/papers/2311.09257,43,6,0,0,1,0 +2023-11-16,2311.08552,,UT5: Pretraining Non autoregressive T5 with unrolled denoising,https://huggingface.co/papers/2311.08552,6,0,0,0,0,0 +2023-11-16,2311.08734,,Thread of Thought Unraveling Chaotic Contexts,https://huggingface.co/papers/2311.08734,4,1,0,0,0,0 +2023-11-16,2311.08877,,Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation,https://huggingface.co/papers/2311.08877,5,0,0,0,0,0 +2023-11-16,2311.09179,,SiRA: Sparse Mixture of Low Rank Adaptation,https://huggingface.co/papers/2311.09179,7,0,0,0,0,0 +2023-11-16,2311.09180,,PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers,https://huggingface.co/papers/2311.09180,7,0,0,0,0,0 +2023-11-16,2311.09204,,Fusion-Eval: Integrating Evaluators with LLMs,https://huggingface.co/papers/2311.09204,5,2,0,0,0,0 +2023-11-16,2311.08469,,UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations,https://huggingface.co/papers/2311.08469,10,0,0,0,1,0 +2023-11-16,2311.08692,,Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models,https://huggingface.co/papers/2311.08692,12,0,0,0,0,0 +2023-11-16,2311.08667,,EDMSound: Spectrogram Based Diffusion Models 
for Efficient and High-Quality Audio Synthesis,https://huggingface.co/papers/2311.08667,18,1,0,0,0,0 +2023-11-16,2311.09217,,DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model,https://huggingface.co/papers/2311.09217,21,1,0,0,0,0 +2023-11-16,2311.09221,,Single-Image 3D Human Digitization with Shape-Guided Diffusion,https://huggingface.co/papers/2311.09221,18,1,0,0,0,0 +2023-11-16,2311.08581,,Drivable 3D Gaussian Avatars,https://huggingface.co/papers/2311.08581,44,3,0,0,0,0 +2023-11-16,2311.09213,,GRIM: GRaph-based Interactive narrative visualization for gaMes,https://huggingface.co/papers/2311.09213,11,1,0,0,0,0 +2023-11-15,2311.07919,https://github.com/qwenlm/qwen-audio,Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models,https://huggingface.co/papers/2311.07919,9,0,1,5,0,4 +2023-11-15,2311.08403,https://github.com/ming1993li/instant3dcodes,Instant3D: Instant Text-to-3D Generation,https://huggingface.co/papers/2311.08403,44,3,1,0,0,0 +2023-11-15,2311.07590,,Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure,https://huggingface.co/papers/2311.07590,15,3,0,0,0,0 +2023-11-15,2311.08263,,Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster,https://huggingface.co/papers/2311.08263,14,0,0,0,0,0 +2023-11-15,2311.07885,,One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion,https://huggingface.co/papers/2311.07885,37,4,0,0,0,0 +2023-11-15,2311.07587,,"Frontier Language Models are not Robust to Adversarial Arithmetic, or ""What do I need to say so you agree 2+2=5?",https://huggingface.co/papers/2311.07587,3,0,0,0,0,0 +2023-11-15,2311.07911,https://github.com/google-research/google-research/tree/master/instruction_following_eval,Instruction-Following Evaluation for Large Language Models,https://huggingface.co/papers/2311.07911,18,0,0,24,2,63 +2023-11-15,2311.07961,,"The ART of 
LLM Refinement: Ask, Refine, and Trust",https://huggingface.co/papers/2311.07961,9,0,0,0,0,0 +2023-11-15,2311.08105,,DiLoCo: Distributed Low-Communication Training of Language Models,https://huggingface.co/papers/2311.08105,14,1,0,0,0,0 +2023-11-15,2311.08401,,Fine-tuning Language Models for Factuality,https://huggingface.co/papers/2311.08401,27,2,0,0,0,0 +2023-11-15,2311.07689,,MART: Improving LLM Safety with Multi-round Automatic Red-Teaming,https://huggingface.co/papers/2311.07689,7,0,0,0,0,0 +2023-11-15,2311.07989,https://github.com/codefuse-ai/awesome-code-llm,A Survey on Language Models for Code,https://huggingface.co/papers/2311.07989,21,0,1,0,0,0 +2023-11-14,2311.06430,,GOAT: GO to Any Thing,https://huggingface.co/papers/2311.06430,14,2,0,0,0,0 +2023-11-14,2311.07574,https://github.com/x2fd/lvis-instruct4v,To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning,https://huggingface.co/papers/2311.07574,14,0,1,0,1,0 +2023-11-14,2311.06772,,ChatAnything: Facetime Chat with LLM-Enhanced Personas,https://huggingface.co/papers/2311.06772,33,3,0,0,0,2 +2023-11-14,2311.07069,,Music ControlNet: Multiple Time-varying Controls for Music Generation,https://huggingface.co/papers/2311.07069,43,4,0,0,0,0 +2023-11-14,2311.07361,,The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4,https://huggingface.co/papers/2311.07361,11,0,0,0,0,0 +2023-11-14,2311.07562,https://github.com/zzxslp/mm-navigator,GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation,https://huggingface.co/papers/2311.07562,12,1,0,0,0,0 +2023-11-14,2311.06495,https://github.com/microsoft/layoutgeneration,LayoutPrompter: Awaken the Design Ability of Large Language Models,https://huggingface.co/papers/2311.06495,10,0,0,0,1,0 +2023-11-14,2311.06697,,Trusted Source Alignment in Large Language Models,https://huggingface.co/papers/2311.06697,9,0,0,0,0,0 +2023-11-14,2311.06720,,Cappy: Outperforming and Boosting Large 
Multi-Task LMs with a Small Scorer,https://huggingface.co/papers/2311.06720,7,0,0,1,0,0 +2023-11-14,2311.06753,,Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data,https://huggingface.co/papers/2311.06753,6,0,0,0,0,0 +2023-11-14,2311.07446,,Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text,https://huggingface.co/papers/2311.07446,27,0,0,0,0,0 +2023-11-14,2311.07463,,"MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks",https://huggingface.co/papers/2311.07463,13,0,0,0,0,0 +2023-11-14,2311.07575,https://github.com/alpha-vllm/llama2-accessory,"SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models",https://huggingface.co/papers/2311.07575,11,0,1,5,0,99 +2023-11-14,2311.06783,https://github.com/Q-Future/Q-Instruct,Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models,https://huggingface.co/papers/2311.06783,26,2,1,5,1,1 +2023-11-13,2311.05707,,FMViT: A multiple-frequency mixing Vision Transformer,https://huggingface.co/papers/2311.05707,5,1,0,0,0,0 +2023-11-13,2311.05884,,Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems,https://huggingface.co/papers/2311.05884,5,1,0,0,0,0 +2023-11-13,2311.06158,,Language Models can be Logical Solvers,https://huggingface.co/papers/2311.06158,17,2,0,6,1,0 +2023-11-13,2311.05640,,FinGPT: Large Generative Models for a Small Language,https://huggingface.co/papers/2311.05640,26,1,0,0,0,0 +2023-11-13,2311.05661,,Prompt Engineering a Prompt Engineer,https://huggingface.co/papers/2311.05661,19,1,0,0,0,0 +2023-11-13,2311.05772,,ADaPT: As-Needed Decomposition and Planning with Language Models,https://huggingface.co/papers/2311.05772,8,1,0,0,0,0 +2023-11-13,2311.05908,,FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores,https://huggingface.co/papers/2311.05908,12,1,0,2,0,9 
+2023-11-13,2311.06214,,Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model,https://huggingface.co/papers/2311.06214,28,3,0,0,0,0 +2023-11-13,2311.05698,,Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities,https://huggingface.co/papers/2311.05698,7,1,0,0,0,0 +2023-11-13,2311.05770,https://github.com/google-research/deeplab2,PolyMaX: General Dense Prediction with Mask Transformer,https://huggingface.co/papers/2311.05770,6,1,0,0,0,0 +2023-11-13,2311.06242,,Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks,https://huggingface.co/papers/2311.06242,70,6,0,21,0,47 +2023-11-13,2311.05657,,"Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs",https://huggingface.co/papers/2311.05657,26,2,0,22,14,0 +2023-11-13,2311.05997,,JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models,https://huggingface.co/papers/2311.05997,34,1,0,0,0,0 +2023-11-13,2311.06243,,Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization,https://huggingface.co/papers/2311.06243,17,1,0,0,0,0 +2023-11-10,2311.05348,https://github.com/OPPOMKLab/u-LLaVA,u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model,https://huggingface.co/papers/2311.05348,10,1,1,0,0,0 +2023-11-10,2311.04931,https://github.com/nomic-ai/gpt4all,GPT4All: An Ecosystem of Open Source Compressed Language Models,https://huggingface.co/papers/2311.04931,20,1,0,0,0,0 +2023-11-10,2311.05332,https://github.com/pjlab-adg/gpt4v-ad-exploration,On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving,https://huggingface.co/papers/2311.05332,7,1,0,0,0,0 +2023-11-10,2311.04934,,Prompt Cache: Modular Attention Reuse for Low-Latency Inference,https://huggingface.co/papers/2311.04934,25,2,0,0,0,0 +2023-11-10,2311.05437,https://github.com/LLaVA-VL/LLaVA-Plus-Codebase,LLaVA-Plus: Learning to Use Tools for Creating Multimodal 
Agents,https://huggingface.co/papers/2311.05437,40,4,1,4,0,0 +2023-11-10,2311.05556,https://github.com/luosiallen/latent-consistency-model,LCM-LoRA: A Universal Stable-Diffusion Acceleration Module,https://huggingface.co/papers/2311.05556,78,5,1,6,0,100 +2023-11-09,2311.04901,,GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs,https://huggingface.co/papers/2311.04901,6,0,0,0,0,0 +2023-11-09,2311.04391,,3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features,https://huggingface.co/papers/2311.04391,10,0,0,0,0,0 +2023-11-09,2311.04400,,LRM: Large Reconstruction Model for Single Image to 3D,https://huggingface.co/papers/2311.04400,45,2,0,14,0,56 +2023-11-09,2311.04235,https://github.com/normster/llm_rules,Can LLMs Follow Simple Rules?,https://huggingface.co/papers/2311.04235,9,0,1,0,1,0 +2023-11-09,2311.04254,https://github.com/microsoft/everything-of-thoughts-xot-,Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation,https://huggingface.co/papers/2311.04254,12,0,0,0,0,0 +2023-11-09,2311.04257,https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2,mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration,https://huggingface.co/papers/2311.04257,20,2,0,1,0,3 +2023-11-09,2311.04287,https://github.com/stanford-crfm/helm,Holistic Evaluation of Text-To-Image Models,https://huggingface.co/papers/2311.04287,11,0,0,0,0,0 +2023-11-09,2311.04498,https://github.com/next-chatv/next-chat,"NExT-Chat: An LMM for Chat, Detection and Segmentation",https://huggingface.co/papers/2311.04498,9,0,1,0,0,0 +2023-11-09,2311.04589,,TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models,https://huggingface.co/papers/2311.04589,17,5,0,0,0,0 +2023-11-08,2311.04212,https://github.com/shi-labs/vim,Video Instance Matting,https://huggingface.co/papers/2311.04212,6,0,0,0,0,0 +2023-11-08,2311.03736,,Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent 
Learning,https://huggingface.co/papers/2311.03736,9,1,0,0,0,0 +2023-11-08,2311.03629,,Random Field Augmentations for Self-Supervised Representation Learning,https://huggingface.co/papers/2311.03629,6,0,0,0,0,0 +2023-11-08,2311.03517,,SoundCam: A Dataset for Finding Humans Using Room Acoustics,https://huggingface.co/papers/2311.03517,9,0,0,0,0,0 +2023-11-08,2311.04219,https://github.com/luodian/otter,OtterHD: A High-Resolution Multi-modality Model,https://huggingface.co/papers/2311.04219,31,2,1,0,0,2 +2023-11-08,2311.03739,,Leveraging Large Language Models for Automated Proof Synthesis in Rust,https://huggingface.co/papers/2311.03739,5,0,0,0,0,0 +2023-11-08,2311.04124,,Unveiling Safety Vulnerabilities of Large Language Models,https://huggingface.co/papers/2311.04124,5,0,0,0,1,0 +2023-11-08,2311.04145,https://github.com/modelscope/modelscope,I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models,https://huggingface.co/papers/2311.04145,31,3,0,2,0,6 +2023-11-07,2311.02772,,Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency,https://huggingface.co/papers/2311.02772,3,1,0,0,0,0 +2023-11-07,2311.02805,https://github.com/ink-usc/rationalemultirewarddistillation,Tailoring Self-Rationalizers with Multi-Reward Distillation,https://huggingface.co/papers/2311.02805,2,1,0,0,0,0 +2023-11-07,2311.02849,,Co-training and Co-distillation for Quality Improvement and Compression of Language Models,https://huggingface.co/papers/2311.02849,2,1,0,0,0,0 +2023-11-07,2311.03079,https://github.com/thudm/cogvlm,CogVLM: Visual Expert for Pretrained Language Models,https://huggingface.co/papers/2311.03079,21,2,1,21,1,10 +2023-11-07,2311.03301,,Ziya2: Data-centric Learning is All LLMs Need,https://huggingface.co/papers/2311.03301,16,0,0,2,0,1 +2023-11-07,2311.03354,,CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding,https://huggingface.co/papers/2311.03354,4,0,0,0,0,1 
+2023-11-07,2311.02848,,Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video,https://huggingface.co/papers/2311.02848,2,1,0,0,0,0 +2023-11-07,2311.03356,https://github.com/mbzuai-oryx/groundingLMM,GLaMM: Pixel Grounding Large Multimodal Model,https://huggingface.co/papers/2311.03356,31,1,1,7,2,0 +2023-11-07,2311.02462,,Levels of AGI: Operationalizing Progress on the Path to AGI,https://huggingface.co/papers/2311.02462,31,1,0,0,0,0 +2023-11-07,2311.02382,,Ultra-Long Sequence Distributed Transformer,https://huggingface.co/papers/2311.02382,2,1,0,0,0,0 +2023-11-07,2311.03226,,LDM3D-VR: Latent Diffusion Model for 3D VR,https://huggingface.co/papers/2311.03226,7,1,0,2,1,14 +2023-11-07,2311.02262,https://github.com/qingruzhang/pasta,Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs,https://huggingface.co/papers/2311.02262,9,2,1,0,0,0 +2023-11-07,2311.03285,https://github.com/s-lora/s-lora,S-LoRA: Serving Thousands of Concurrent LoRA Adapters,https://huggingface.co/papers/2311.03285,27,2,1,0,0,0 +2023-11-07,2311.02542,,VR-NeRF: High-Fidelity Virtualized Walkable Spaces,https://huggingface.co/papers/2311.02542,13,1,0,0,0,0 +2023-11-07,2311.02303,https://github.com/codefuse-ai/mftcoder,MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning,https://huggingface.co/papers/2311.02303,4,1,1,4,0,5 +2023-11-07,2311.02103,,Relax: Composable Abstractions for End-to-End Dynamic Machine Learning,https://huggingface.co/papers/2311.02103,15,1,0,0,0,0 +2023-11-06,2311.01767,https://github.com/gydpku/pptc,PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion,https://huggingface.co/papers/2311.01767,16,2,1,0,0,0 +2023-11-06,2311.01615,,FLAP: Fast Language-Audio Pre-training,https://huggingface.co/papers/2311.01615,16,1,0,0,0,0 +2023-11-06,2311.02077,,EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision,https://huggingface.co/papers/2311.02077,14,1,0,0,0,0 +2023-11-03,2311.00899,,RoboVQA: 
Multimodal Long-Horizon Reasoning for Robotics,https://huggingface.co/papers/2311.00899,7,2,0,0,0,0 +2023-11-03,2311.01282,,FlashDecoding++: Faster Large Language Model Inference on GPUs,https://huggingface.co/papers/2311.01282,32,3,0,0,0,0 +2023-11-03,2311.01462,,Idempotent Generative Network,https://huggingface.co/papers/2311.01462,22,4,0,0,0,0 +2023-11-03,2311.01455,,RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation,https://huggingface.co/papers/2311.01455,25,2,0,0,0,0 +2023-11-03,2311.00895,,In-Context Prompt Editing For Conditional Audio Generation,https://huggingface.co/papers/2311.00895,9,1,0,0,0,0 +2023-11-03,2311.00945,,E3 TTS: Easy End-to-End Diffusion-based Text to Speech,https://huggingface.co/papers/2311.00945,12,1,0,0,0,0 +2023-11-02,2311.00522,,Text Rendering Strategies for Pixel Language Models,https://huggingface.co/papers/2311.00522,10,1,0,0,0,0 +2023-11-02,2311.00272,,ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation,https://huggingface.co/papers/2311.00272,8,1,0,0,0,0 +2023-11-02,2311.00176,,ChipNeMo: Domain-Adapted LLMs for Chip Design,https://huggingface.co/papers/2311.00176,8,2,0,0,0,0 +2023-11-02,2311.00047,https://github.com/vl-illusion/dataset,Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?,https://huggingface.co/papers/2311.00047,7,1,1,0,0,0 +2023-11-02,2311.00257,,AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning,https://huggingface.co/papers/2311.00257,8,1,0,0,0,0 +2023-11-02,2311.00618,,De-Diffusion Makes Text a Strong Cross-Modal Interface,https://huggingface.co/papers/2311.00618,21,12,0,0,0,0 +2023-11-02,2311.00059,,"The Generative AI Paradox: ""What It Can Create, It May Not Understand""",https://huggingface.co/papers/2311.00059,17,5,0,0,0,0 +2023-11-02,2311.00613,,Controllable Music Production with Diffusion Models and Guidance Gradients,https://huggingface.co/papers/2311.00613,24,1,0,0,0,0 
+2023-11-02,2311.00430,https://github.com/huggingface/distil-whisper,Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling,https://huggingface.co/papers/2311.00430,54,2,1,35,0,100 +2023-11-02,2311.00571,,"LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing",https://huggingface.co/papers/2311.00571,40,10,0,4,0,1 +2023-11-01,2310.20499,https://github.com/skytliang/spygame,Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models,https://huggingface.co/papers/2310.20499,7,1,0,0,0,0 +2023-11-01,2310.20550,https://github.com/baaivision/capsfusion,CapsFusion: Rethinking Image-Text Data at Scale,https://huggingface.co/papers/2310.20550,25,2,1,0,0,0 +2023-11-01,2310.20707,https://github.com/allenai/wimbd,What's In My Big Data?,https://huggingface.co/papers/2310.20707,9,1,0,0,0,0 +2023-11-01,2310.20092,,Beyond U: Making Diffusion Models Faster & Lighter,https://huggingface.co/papers/2310.20092,11,1,0,0,0,0 +2023-11-01,2310.20700,,SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction,https://huggingface.co/papers/2310.20700,9,1,0,1,0,2 +2023-11-01,2310.20587,https://github.com/srzer/LaMo-2023,Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning,https://huggingface.co/papers/2310.20587,15,1,0,0,0,0 +2023-11-01,2310.20216,,Does GPT-4 Pass the Turing Test?,https://huggingface.co/papers/2310.20216,17,3,0,0,0,0 +2023-11-01,2310.20689,https://github.com/microsoft/lema,Learning From Mistakes Makes LLM Better Reasoner,https://huggingface.co/papers/2310.20689,27,4,1,0,0,0 +2023-11-01,2310.19909,https://github.com/hsouri/battle-of-the-backbones,Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks,https://huggingface.co/papers/2310.19909,20,1,0,0,0,0 +2023-11-01,2310.19956,,The Impact of Depth and Width on Transformer Language Model 
Generalization,https://huggingface.co/papers/2310.19956,9,1,0,0,0,0 +2023-11-01,2310.20624,,LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B,https://huggingface.co/papers/2310.20624,12,9,0,0,0,0 +2023-10-31,2310.19019,,"TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise",https://huggingface.co/papers/2310.19019,9,3,0,0,0,0 +2023-10-31,2310.19102,https://github.com/efeslab/atom,Atom: Low-bit Quantization for Efficient and Accurate LLM Serving,https://huggingface.co/papers/2310.19102,8,4,1,1,0,0 +2023-10-31,2310.19784,,CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models,https://huggingface.co/papers/2310.19784,9,3,0,0,0,2 +2023-10-31,2310.19415,,Text-to-3D with classifier score distillation,https://huggingface.co/papers/2310.19415,4,1,0,0,0,0 +2023-10-31,2310.19061,https://github.com/zhilingyan/gpt4v-medical-report,Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V,https://huggingface.co/papers/2310.19061,8,1,0,0,0,0 +2023-10-31,2310.18628,https://github.com/SalesforceAIResearch/PersDistill,Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation,https://huggingface.co/papers/2310.18628,6,1,0,0,0,0 +2023-10-31,2310.19341,https://github.com/skyworkai/skywork,Skywork: A More Open Bilingual Foundation Model,https://huggingface.co/papers/2310.19341,5,1,1,29,4,6 +2023-10-31,2310.19512,https://github.com/ailab-cvc/videocrafter,VideoCrafter1: Open Diffusion Models for High-Quality Video Generation,https://huggingface.co/papers/2310.19512,14,2,1,0,0,1 +2023-10-31,2310.19773,,MM-VID: Advancing Video Understanding with GPT-4V(ision),https://huggingface.co/papers/2310.19773,19,1,0,0,0,0 +2023-10-31,2310.18356,,LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery,https://huggingface.co/papers/2310.18356,22,2,0,0,0,0 +2023-10-30,2310.17752,,PockEngine: Sparse and Efficient 
Fine-tuning in a Pocket,https://huggingface.co/papers/2310.17752,11,4,0,0,0,0 +2023-10-30,2310.17784,,Data-Centric Financial Large Language Models,https://huggingface.co/papers/2310.17784,14,3,0,0,0,0 +2023-10-30,2310.17796,https://github.com/opengvlab/controlllm,ControlLLM: Augment Language Models with Tools by Searching on Graphs,https://huggingface.co/papers/2310.17796,16,1,1,0,0,1 +2023-10-30,2310.18168,,Personas as a Way to Model Truthfulness in Language Models,https://huggingface.co/papers/2310.18168,5,1,0,0,0,0 +2023-10-30,2310.17994,,ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image,https://huggingface.co/papers/2310.17994,8,1,0,0,0,0 +2023-10-30,2310.17880,,Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations,https://huggingface.co/papers/2310.17880,7,1,0,0,0,0 +2023-10-30,2310.17750,,A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications,https://huggingface.co/papers/2310.17750,9,1,0,0,0,0 +2023-10-30,2310.17722,,Large Language Models as Generalizable Policies for Embodied Tasks,https://huggingface.co/papers/2310.17722,6,1,0,0,0,0 +2023-10-30,2310.18313,https://github.com/azure/ms-amp,FP8-LM: Training FP8 Large Language Models,https://huggingface.co/papers/2310.18313,31,2,0,0,0,0 +2023-10-30,2310.17680,,CodeFusion: A Pre-trained Diffusion Model for Code Generation,https://huggingface.co/papers/2310.17680,68,10,0,0,0,0 +2023-10-27,2310.17022,,Controlled Decoding from Language Models,https://huggingface.co/papers/2310.17022,12,2,0,0,0,0 +2023-10-27,2310.17157,https://github.com/fminference/dejavu,Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time,https://huggingface.co/papers/2310.17157,9,1,1,0,0,0 +2023-10-27,2310.17075,,HyperFields: Towards Zero-Shot Generation of NeRFs from Text,https://huggingface.co/papers/2310.17075,13,2,0,0,0,0 +2023-10-27,2310.17631,https://github.com/baaivision/judgelm,JudgeLM: Fine-tuned Large Language Models are Scalable 
Judges,https://huggingface.co/papers/2310.17631,32,6,1,3,2,0 +2023-10-26,2310.16828,,"TD-MPC2: Scalable, Robust World Models for Continuous Control",https://huggingface.co/papers/2310.16828,6,0,0,1,1,0 +2023-10-26,2310.16226,https://github.com/apple/ml-tic-clip,TiC-CLIP: Continual Training of CLIP Models,https://huggingface.co/papers/2310.16226,7,1,1,6,1,0 +2023-10-26,2310.16836,https://github.com/nbasyl/llm-fp4,LLM-FP4: 4-Bit Floating-Point Quantized Transformers,https://huggingface.co/papers/2310.16836,11,0,0,0,0,0 +2023-10-26,2310.16789,,Detecting Pretraining Data from Large Language Models,https://huggingface.co/papers/2310.16789,10,0,0,0,2,1 +2023-10-26,2310.16795,https://github.com/ist-daslab/qmoe,QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models,https://huggingface.co/papers/2310.16795,26,3,1,0,0,0 +2023-10-26,2310.16764,,ConvNets Match Vision Transformers at Scale,https://huggingface.co/papers/2310.16764,18,1,0,0,0,0 +2023-10-26,2310.16825,https://github.com/mosaicml/diffusion,CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images,https://huggingface.co/papers/2310.16825,30,1,0,4,7,2 +2023-10-26,2310.16450,https://github.com/damo-nlp-sg/clex,CLEX: Continuous Length Extrapolation for Large Language Models,https://huggingface.co/papers/2310.16450,9,1,1,6,0,1 +2023-10-26,2310.15008,,Wonder3D: Single Image to 3D using Cross-Domain Diffusion,https://huggingface.co/papers/2310.15008,20,4,0,0,0,2 +2023-10-26,2310.16832,,LightSpeed: Light and Fast Neural Light Fields on Mobile Devices,https://huggingface.co/papers/2310.16832,4,0,0,0,0,0 +2023-10-26,2310.16534,https://github.com/albertwy/gpt-4v-evaluation,An Early Evaluation of GPT-4V(ision),https://huggingface.co/papers/2310.16534,21,1,0,0,0,0 +2023-10-26,2310.16818,https://github.com/deepseek-ai/dreamcraft3d,DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior,https://huggingface.co/papers/2310.16818,27,0,1,0,0,0 +2023-10-26,2310.16656,,A Picture is Worth 
a Thousand Words: Principled Recaptioning Improves Image Generation,https://huggingface.co/papers/2310.16656,39,1,0,0,1,0 +2023-10-25,2310.15308,,SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding,https://huggingface.co/papers/2310.15308,22,4,0,0,0,0 +2023-10-25,2310.15200,,Inject Semantic Concepts into Image Tagging for Open-Set Recognition,https://huggingface.co/papers/2310.15200,5,1,0,3,0,1 +2023-10-25,2310.15337,https://github.com/abdulhaim/moral_foundations_llm,Moral Foundations of Large Language Models,https://huggingface.co/papers/2310.15337,1,1,0,0,0,0 +2023-10-25,2310.15494,https://github.com/lwaekfjlk/trams,TRAMS: Training-free Memory Selection for Long-range Language Modeling,https://huggingface.co/papers/2310.15494,1,1,0,0,0,0 +2023-10-25,2310.15511,https://huggingface.co/datasets/microsoft/kitab,KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval,https://huggingface.co/papers/2310.15511,4,1,0,0,1,0 +2023-10-25,2310.15916,https://github.com/roeehendel/icl_task_vectors,In-Context Learning Creates Task Vectors,https://huggingface.co/papers/2310.15916,39,8,1,0,0,0 +2023-10-25,2310.15987,,Dissecting In-Context Learning of Translations in GPTs,https://huggingface.co/papers/2310.15987,5,1,0,0,0,0 +2023-10-25,2310.16045,https://github.com/bradyfu/woodpecker,Woodpecker: Hallucination Correction for Multimodal Large Language Models,https://huggingface.co/papers/2310.16045,14,1,1,0,0,0 +2023-10-24,2310.13772,,TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models,https://huggingface.co/papers/2310.13772,6,2,0,0,0,0 +2023-10-24,2310.13724,,"Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots",https://huggingface.co/papers/2310.13724,8,3,0,0,0,0 +2023-10-24,2310.15111,,Matryoshka Diffusion Models,https://huggingface.co/papers/2310.15111,39,5,0,0,2,0 +2023-10-24,2310.13730,,Localizing and Editing Knowledge in Text-to-Image Generative 
Models,https://huggingface.co/papers/2310.13730,6,2,0,0,0,0 +2023-10-24,2310.13798,,Specific versus General Principles for Constitutional AI,https://huggingface.co/papers/2310.13798,2,2,0,0,0,0 +2023-10-24,2310.13961,https://github.com/ibm/ensemble-instruct,Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs,https://huggingface.co/papers/2310.13961,4,2,1,0,0,0 +2023-10-24,2310.15169,https://github.com/arthur-qiu/freenoise-lavie,FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling,https://huggingface.co/papers/2310.15169,8,0,1,0,0,1 +2023-10-24,2310.14495,,InstructExcel: A Benchmark for Natural Language Instruction in Excel,https://huggingface.co/papers/2310.14495,1,2,0,0,0,0 +2023-10-24,2310.15123,,Branch-Solve-Merge Improves Large Language Model Evaluation and Generation,https://huggingface.co/papers/2310.15123,7,0,0,0,0,0 +2023-10-24,2310.14573,,Exploring the Boundaries of GPT-4 in Radiology,https://huggingface.co/papers/2310.14573,7,2,0,0,0,0 +2023-10-24,2310.15144,https://github.com/design-bench/design-bench.github.io,DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design,https://huggingface.co/papers/2310.15144,12,2,0,0,0,0 +2023-10-24,2310.14566,,"HallusionBench: You See What You Think? Or You Think What You See? 
An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models",https://huggingface.co/papers/2310.14566,24,6,0,0,2,0 +2023-10-23,2310.13355,,SILC: Improving Vision Language Pretraining with Self-Distillation,https://huggingface.co/papers/2310.13355,6,1,0,0,0,0 +2023-10-23,2310.13545,https://github.com/sail-sg/scalelong,ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection,https://huggingface.co/papers/2310.13545,3,1,0,0,0,0 +2023-10-23,2310.13639,,Contrastive Prefence Learning: Learning from Human Feedback without RL,https://huggingface.co/papers/2310.13639,22,2,0,1,0,0 +2023-10-23,2310.13119,,DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation,https://huggingface.co/papers/2310.13119,11,1,0,0,0,0 +2023-10-23,2310.13671,https://github.com/rickyskywalker/synthesis_step-by-step_official,Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models,https://huggingface.co/papers/2310.13671,17,1,0,0,0,0 +2023-10-23,2310.13268,https://github.com/thu-ml/dpm-solver-v3,DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics,https://huggingface.co/papers/2310.13268,17,2,0,0,0,0 +2023-10-23,2310.13012,https://github.com/h2oai/h2ogpt,H2O Open Ecosystem for State-of-the-art Large Language Models,https://huggingface.co/papers/2310.13012,7,2,1,0,0,0 +2023-10-23,2310.13065,,Creative Robot Tool Use with Large Language Models,https://huggingface.co/papers/2310.13065,7,1,0,0,0,0 +2023-10-23,2310.13127,,Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models,https://huggingface.co/papers/2310.13127,10,1,0,2,0,0 +2023-10-23,2310.13289,https://github.com/bytedance/salmonn,SALMONN: Towards Generic Hearing Abilities for Large Language Models,https://huggingface.co/papers/2310.13289,17,1,1,2,0,1 
+2023-10-23,2310.13332,https://github.com/raibows/learn-to-reason,Democratizing Reasoning Ability: Tailored Learning from Large Language Model,https://huggingface.co/papers/2310.13332,14,1,0,0,0,0 +2023-10-23,2310.13385,https://github.com/microsoft/lmops,Tuna: Instruction Tuning using Feedback from Large Language Models,https://huggingface.co/papers/2310.13385,9,1,0,0,0,0 +2023-10-23,2310.13522,https://github.com/jasonyux/tripost,Teaching Language Models to Self-Improve through Interactive Demonstrations,https://huggingface.co/papers/2310.13522,11,1,1,0,0,0 +2023-10-23,2310.13548,https://github.com/meg-tong/sycophancy-eval,Towards Understanding Sycophancy in Language Models,https://huggingface.co/papers/2310.13548,4,2,0,0,1,0 +2023-10-23,2310.13227,,ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search,https://huggingface.co/papers/2310.13227,12,1,0,0,0,0 +2023-10-20,2310.12773,https://github.com/pku-alignment/safe-rlhf,Safe RLHF: Safe Reinforcement Learning from Human Feedback,https://huggingface.co/papers/2310.12773,27,5,1,12,0,2 +2023-10-20,2310.12945,,3D-GPT: Procedural 3D Modeling with Large Language Models,https://huggingface.co/papers/2310.12945,52,2,0,0,0,1 +2023-10-20,2310.12962,,An Emulator for Fine-Tuning Large Language Models using Small Language Models,https://huggingface.co/papers/2310.12962,14,1,0,0,0,0 +2023-10-20,2310.12274,,An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning,https://huggingface.co/papers/2310.12274,10,1,0,0,0,0 +2023-10-20,2310.12474,https://github.com/fudan-zvg/pgc-3d,Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping,https://huggingface.co/papers/2310.12474,4,1,1,0,0,0 +2023-10-20,2310.12963,https://github.com/automix-llm/automix,AutoMix: Automatically Mixing Language Models,https://huggingface.co/papers/2310.12963,14,2,0,0,0,0 +2023-10-20,2310.12921,https://github.com/alignmentresearch/vlmrm,Vision-Language Models are 
Zero-Shot Reward Models for Reinforcement Learning,https://huggingface.co/papers/2310.12921,18,1,0,0,0,0 +2023-10-20,2310.12823,https://github.com/thudm/agenttuning,AgentTuning: Enabling Generalized Agent Abilities for LLMs,https://huggingface.co/papers/2310.12823,34,1,1,13,2,10 +2023-10-20,2310.12931,https://github.com/eureka-research/Eureka,Eureka: Human-Level Reward Design via Coding Large Language Models,https://huggingface.co/papers/2310.12931,26,3,0,0,0,0 +2023-10-20,2310.12404,https://github.com/ldzhangyx/loop-copilot,Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing,https://huggingface.co/papers/2310.12404,13,1,0,0,0,0 +2023-10-19,2310.11954,https://github.com/microsoft/muzic,MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models,https://huggingface.co/papers/2310.11954,24,2,0,0,0,0 +2023-10-19,2310.11784,,Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts,https://huggingface.co/papers/2310.11784,10,2,0,0,0,0 +2023-10-19,2310.11511,https://github.com/AkariAsai/self-rag,"Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection",https://huggingface.co/papers/2310.11511,68,5,1,4,1,3 +2023-10-18,2310.11441,https://github.com/microsoft/SoM,Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V,https://huggingface.co/papers/2310.11441,26,3,1,0,0,3 +2023-10-18,2310.10837,https://github.com/robertcsordas/moe,Approximating Two-Layer Feedforward Networks for Efficient Transformers,https://huggingface.co/papers/2310.10837,10,3,0,0,0,0 +2023-10-18,2310.11248,,CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion,https://huggingface.co/papers/2310.11248,3,1,0,0,0,0 +2023-10-18,2310.11440,https://github.com/EvalCrafter/EvalCrafter,EvalCrafter: Benchmarking and Evaluating Large Video Generation Models,https://huggingface.co/papers/2310.11440,14,1,1,0,1,1 
+2023-10-18,2310.10769,https://github.com/RQ-Wu/LAMP,LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation,https://huggingface.co/papers/2310.10769,8,2,1,0,0,0 +2023-10-18,2310.11448,,4K4D: Real-Time 4D View Synthesis at 4K Resolution,https://huggingface.co/papers/2310.11448,37,2,0,1,0,0 +2023-10-18,2310.11454,,VeRA: Vector-based Random Matrix Adaptation,https://huggingface.co/papers/2310.11454,27,1,0,0,0,0 +2023-10-18,2310.10971,https://github.com/cfifty/CAML,Context-Aware Meta-Learning,https://huggingface.co/papers/2310.10971,14,1,1,0,0,0 +2023-10-18,2310.10944,https://github.com/intel/neural-compressor,TEQ: Trainable Equivalent Transformation for Quantization of LLMs,https://huggingface.co/papers/2310.10944,9,1,1,0,0,0 +2023-10-18,2310.11453,,BitNet: Scaling 1-bit Transformers for Large Language Models,https://huggingface.co/papers/2310.11453,96,12,0,0,0,0 +2023-10-17,2310.10645,,Interactive Task Planning with Language Models,https://huggingface.co/papers/2310.10645,9,1,0,0,0,0 +2023-10-17,2310.10631,https://github.com/EleutherAI/math-lm,Llemma: An Open Language Model For Mathematics,https://huggingface.co/papers/2310.10631,47,6,1,18,3,46 +2023-10-17,2310.10638,https://github.com/swj0419/in-context-pretraining,In-Context Pretraining: Language Modeling Beyond Document Boundaries,https://huggingface.co/papers/2310.10638,27,3,0,1,0,0 +2023-10-17,2310.08678,,Can GPT models be Financial Analysts? 
An Evaluation of ChatGPT and GPT-4 on mock CFA Exams,https://huggingface.co/papers/2310.08678,11,3,0,0,0,0 +2023-10-17,2310.09753,https://github.com/eboix/relational-reasoning,When can transformers reason with abstract symbols?,https://huggingface.co/papers/2310.09753,2,1,0,0,0,0 +2023-10-17,2310.09983,,Farzi Data: Autoregressive Data Distillation,https://huggingface.co/papers/2310.09983,6,1,0,0,0,0 +2023-10-17,2310.10047,,Improving Large Language Model Fine-tuning for Solving Math Problems,https://huggingface.co/papers/2310.10047,5,1,0,0,0,0 +2023-10-17,2310.10537,https://github.com/microsoft/microxcaling,Microscaling Data Formats for Deep Learning,https://huggingface.co/papers/2310.10537,5,1,0,0,0,0 +2023-10-17,2310.10625,,Video Language Planning,https://huggingface.co/papers/2310.10625,8,1,0,0,0,0 +2023-10-17,2310.09342,https://github.com/microsoft/NeuralInvariantRanker,Ranking LLM-Generated Loop Invariants for Program Verification,https://huggingface.co/papers/2310.09342,2,1,1,0,0,0 +2023-10-17,2310.09478,,MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning,https://huggingface.co/papers/2310.09478,17,1,0,0,0,2 +2023-10-17,2310.09520,https://github.com/haikangdeng/RAD,Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model,https://huggingface.co/papers/2310.09520,10,1,0,0,0,0 +2023-10-16,2310.09199,,"PaLI-3 Vision Language Models: Smaller, Faster, Stronger",https://huggingface.co/papers/2310.09199,24,3,0,100,0,24 +2023-10-16,2310.08992,https://github.com/SalesforceAIResearch/CodeChain,CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules,https://huggingface.co/papers/2310.08992,9,1,1,0,0,0 +2023-10-16,2310.09139,,The Consensus Game: Language Model Generation via Equilibrium Search,https://huggingface.co/papers/2310.09139,12,3,0,0,0,0 +2023-10-16,2310.08715,,Toward Joint Language Modeling for Speech Units and 
Text,https://huggingface.co/papers/2310.08715,6,1,0,0,0,0 +2023-10-16,2310.08740,,A Zero-Shot Language Agent for Computer Control with Structured Reflection,https://huggingface.co/papers/2310.08740,14,2,0,0,0,0 +2023-10-16,2310.09263,,Table-GPT: Table-tuned GPT for Diverse Table Tasks,https://huggingface.co/papers/2310.09263,38,12,0,0,1,0 +2023-10-16,2310.08659,https://github.com/yxli2123/loftq,LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models,https://huggingface.co/papers/2310.08659,21,4,1,17,0,0 +2023-10-13,2310.08541,,Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation,https://huggingface.co/papers/2310.08541,17,6,0,0,0,0 +2023-10-13,2310.08579,,HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion,https://huggingface.co/papers/2310.08579,14,1,0,0,0,0 +2023-10-13,2310.08465,https://github.com/showlab/MotionDirector,MotionDirector: Motion Customization of Text-to-Video Diffusion Models,https://huggingface.co/papers/2310.08465,13,5,1,0,0,4 +2023-10-13,2310.08529,,GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors,https://huggingface.co/papers/2310.08529,17,2,0,0,0,1 +2023-10-13,2310.08588,https://github.com/dongyh20/octopus,Octopus: Embodied Vision-Language Programmer from Environmental Feedback,https://huggingface.co/papers/2310.08588,33,4,0,0,0,0 +2023-10-13,2310.06830,https://github.com/openlemur/lemur,Lemur: Harmonizing Natural Language and Code for Language Agents,https://huggingface.co/papers/2310.06830,29,3,1,2,0,31 +2023-10-13,2310.07889,,LangNav: Language as a Perceptual Representation for Navigation,https://huggingface.co/papers/2310.07889,4,1,0,1,0,0 +2023-10-13,2310.08185,,EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation,https://huggingface.co/papers/2310.08185,6,1,0,0,0,0 +2023-10-13,2310.08491,https://github.com/kaistAI/Prometheus,Prometheus: Inducing Fine-grained Evaluation 
Capability in Language Models,https://huggingface.co/papers/2310.08491,52,4,1,25,4,6 +2023-10-06,2310.03046,https://github.com/jieyuz2/ecoassistant,EcoAssistant: Using LLM Assistant More Affordably and Accurately,https://huggingface.co/papers/2310.03046,5,1,0,0,0,0 +2023-10-06,2310.03094,https://github.com/murongyue/llm_mot_cascade,Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning,https://huggingface.co/papers/2310.03094,12,1,0,0,0,0 +2023-10-06,2310.03714,https://github.com/stanfordnlp/dspy,DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines,https://huggingface.co/papers/2310.03714,30,1,1,0,0,0 +2023-10-06,2310.03731,https://github.com/mathllm/mathcoder,MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning,https://huggingface.co/papers/2310.03731,28,4,1,5,2,6 +2023-10-06,2310.03716,https://github.com/prasanns/rlhf-length-biases,A Long Way to Go: Investigating Length Correlations in RLHF,https://huggingface.co/papers/2310.03716,9,1,1,0,0,0 +2023-10-06,2310.03704,,Drag View: Generalizable Novel View Synthesis with Unposed Imagery,https://huggingface.co/papers/2310.03704,7,1,0,0,0,0 +2023-10-06,2310.03214,https://github.com/freshllms/freshqa,FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation,https://huggingface.co/papers/2310.03214,14,1,0,0,0,0 +2023-10-06,2310.03734,,Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency,https://huggingface.co/papers/2310.03734,13,1,0,0,0,0 +2023-10-06,2310.03720,,HeaP: Hierarchical Policies for Web Actions using LLMs,https://huggingface.co/papers/2310.03720,5,1,0,0,0,0 +2023-10-06,2310.00704,https://github.com/yangdongchao/uniaudio,UniAudio: An Audio Foundation Model Toward Universal Audio Generation,https://huggingface.co/papers/2310.00704,18,1,0,0,0,0 +2023-10-06,2310.03744,,Improved Baselines with Visual Instruction 
Tuning,https://huggingface.co/papers/2310.03744,35,5,0,20,2,31 +2023-10-06,2310.03739,https://github.com/mihirp1998/alignprop,Aligning Text-to-Image Diffusion Models with Reward Backpropagation,https://huggingface.co/papers/2310.03739,21,4,0,0,0,0 +2023-10-06,2310.03502,https://github.com/ai-forever/Kandinsky-2,Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion,https://huggingface.co/papers/2310.03502,75,5,1,0,0,2 +2023-10-06,2310.03051,,How FaR Are Large Language Models From Agents with Theory-of-Mind?,https://huggingface.co/papers/2310.03051,33,3,0,0,0,0 +2023-10-04,2310.01714,,Large Language Models as Analogical Reasoners,https://huggingface.co/papers/2310.01714,14,1,0,0,0,0 +2023-10-04,2310.01557,https://github.com/microsoft/smartplay,SmartPlay : A Benchmark for LLMs as Intelligent Agents,https://huggingface.co/papers/2310.01557,12,2,0,0,0,0 +2023-10-04,2310.01596,https://github.com/TIGER-AI-Lab/ImagenHub,ImagenHub: Standardizing the evaluation of conditional image generation models,https://huggingface.co/papers/2310.01596,17,3,1,0,7,0 +2023-10-04,2310.01798,,Large Language Models Cannot Self-Correct Reasoning Yet,https://huggingface.co/papers/2310.01798,32,2,0,0,0,0 +2023-10-03,2310.01407,https://github.com/fast-codi/CoDi,Conditional Diffusion Distillation,https://huggingface.co/papers/2310.01407,19,3,1,0,0,1 +2023-10-03,2310.00898,,Enable Language Models to Implicitly Learn Self-Improvement From Data,https://huggingface.co/papers/2310.00898,22,2,0,0,0,0 +2023-10-03,2310.00426,https://github.com/PixArt-alpha/PixArt-alpha,PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis,https://huggingface.co/papers/2310.00426,60,11,1,8,0,100 +2023-09-29,2309.16643,https://github.com/lisiyao21/animeinbet,Deep Geometrized Cartoon Line Inbetweening,https://huggingface.co/papers/2309.16643,23,0,0,0,0,0 +2023-09-29,2309.16588,https://github.com/facebookresearch/dinov2,Vision Transformers Need 
Registers,https://huggingface.co/papers/2309.16588,73,9,0,23,0,0 +2023-09-29,2309.16671,https://github.com/facebookresearch/metaclip,Demystifying CLIP Data,https://huggingface.co/papers/2309.16671,18,3,1,13,0,9 +2023-09-29,2309.16583,https://github.com/gpt-fathom/gpt-fathom,GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond,https://huggingface.co/papers/2309.16583,12,0,1,0,0,0 +2023-09-29,2309.16650,,ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning,https://huggingface.co/papers/2309.16650,8,0,0,0,0,0 +2023-09-29,2309.16235,,Language models in molecular discovery,https://huggingface.co/papers/2309.16235,10,0,0,0,0,0 +2023-09-29,2309.16496,,CCEdit: Creative and Controllable Video Editing via Diffusion Models,https://huggingface.co/papers/2309.16496,8,1,0,1,1,0 +2023-09-29,2309.16429,https://github.com/guyyariv/TempoTokens,Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation,https://huggingface.co/papers/2309.16429,10,2,1,0,0,0 +2023-09-29,2309.16585,https://github.com/gsgen3d/gsgen,Text-to-3D using Gaussian Splatting,https://huggingface.co/papers/2309.16585,30,2,0,0,0,0 +2023-09-29,2309.16653,https://github.com/dreamgaussian/dreamgaussian,DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation,https://huggingface.co/papers/2309.16653,44,5,0,0,0,17 +2023-09-29,2309.16039,,Effective Long-Context Scaling of Foundation Models,https://huggingface.co/papers/2309.16039,29,3,0,7,0,0 +2023-09-29,2309.16058,,AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model,https://huggingface.co/papers/2309.16058,53,7,0,2,0,1 +2023-09-29,2309.16414,,AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models,https://huggingface.co/papers/2309.16414,19,2,0,0,0,0 +2023-09-29,2309.16534,,MotionLM: Multi-Agent Motion Forecasting as Language Modeling,https://huggingface.co/papers/2309.16534,15,0,0,0,0,0 
+2023-09-29,2309.16668,,RealFill: Reference-Driven Generation for Authentic Image Completion,https://huggingface.co/papers/2309.16668,12,2,0,0,0,0 +2023-09-29,2309.16609,https://github.com/QwenLM/Qwen-7B,Qwen Technical Report,https://huggingface.co/papers/2309.16609,32,2,1,46,0,100 +2023-09-28,2309.15564,,Jointly Training Large Autoregressive Multimodal Models,https://huggingface.co/papers/2309.15564,8,1,0,0,0,0 +2023-09-28,2309.15426,https://github.com/oppo-us-research/NeuRBF,NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions,https://huggingface.co/papers/2309.15426,14,2,0,0,0,0 +2023-09-28,2309.15505,https://github.com/google-research/google-research,Finite Scalar Quantization: VQ-VAE Made Simple,https://huggingface.co/papers/2309.15505,21,5,0,0,0,0 +2023-09-28,2309.15273,https://github.com/sha2nkt/deco,DECO: Dense Estimation of 3D Human-Scene Contact In The Wild,https://huggingface.co/papers/2309.15273,7,1,1,0,0,1 +2023-09-28,2309.15818,https://github.com/showlab/show-1,Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation,https://huggingface.co/papers/2309.15818,18,4,1,6,0,2 +2023-09-28,2309.15807,,Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack,https://huggingface.co/papers/2309.15807,30,9,0,0,0,0 +2023-09-28,2309.15129,,Evaluating Cognitive Maps and Planning in Large Language Models with CogEval,https://huggingface.co/papers/2309.15129,6,1,0,0,0,0 +2023-09-28,2309.15251,,VPA: Fully Test-Time Visual Prompt Adaptation,https://huggingface.co/papers/2309.15251,4,1,0,0,0,0 +2023-09-28,2309.15223,,Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition,https://huggingface.co/papers/2309.15223,17,1,0,0,0,0 +2023-09-27,2309.15091,,VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning,https://huggingface.co/papers/2309.15091,32,4,0,0,0,0 +2023-09-27,2309.14509,,DeepSpeed Ulysses: System Optimizations for Enabling 
Training of Extreme Long Sequence Transformer Models,https://huggingface.co/papers/2309.14509,17,1,0,0,0,0 +2023-09-27,2309.14592,https://github.com/intellabs/fp8-emulation-toolkit,Efficient Post-training Quantization with FP8 Formats,https://huggingface.co/papers/2309.14592,10,2,0,0,0,0 +2023-09-27,2309.15098,https://github.com/microsoft/mechanistic-error-probe,Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models,https://huggingface.co/papers/2309.15098,7,1,0,0,0,0 +2023-09-27,2309.14717,,QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models,https://huggingface.co/papers/2309.14717,43,8,0,0,0,0 +2023-09-27,2309.14525,,Aligning Large Multimodal Models with Factually Augmented RLHF,https://huggingface.co/papers/2309.14525,30,2,0,0,1,0 +2023-09-27,2309.15103,https://github.com/Vchitect/LaVie,LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models,https://huggingface.co/papers/2309.15103,42,3,1,1,0,4 +2023-09-26,2309.13308,,Calibrating LLM-Based Evaluator,https://huggingface.co/papers/2309.13308,10,1,0,0,0,0 +2023-09-26,2309.13356,,Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test,https://huggingface.co/papers/2309.13356,36,4,0,0,0,0 +2023-09-26,2309.14327,https://github.com/microsoft/deepspeedexamples,DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention,https://huggingface.co/papers/2309.14327,21,2,1,0,0,0 +2023-09-26,2309.14322,,Small-scale proxies for large-scale Transformer training instabilities,https://huggingface.co/papers/2309.14322,18,2,0,3,0,1 +2023-09-26,2309.13952,,VidChapters-7M: Video Chapters at Scale,https://huggingface.co/papers/2309.13952,9,3,0,0,0,0 +2023-09-26,2309.13075,https://github.com/kumar-shridhar/screws,SCREWS: A Modular Framework for Reasoning with Revisions,https://huggingface.co/papers/2309.13075,15,2,0,0,0,0 +2023-09-25,2309.12424,,DualToken-ViT: Position-aware Efficient Vision 
Transformer with Dual Token Fusion,https://huggingface.co/papers/2309.12424,11,2,0,0,0,0 +2023-09-25,2309.13041,,Robotic Offline RL from Internet Videos via Value-Function Pre-Training,https://huggingface.co/papers/2309.13041,8,0,0,0,0,0 +2023-09-25,2309.13018,,Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model,https://huggingface.co/papers/2309.13018,9,1,0,0,0,0 +2023-09-25,2309.13042,https://github.com/jiahao000/mosaicfusion,MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation,https://huggingface.co/papers/2309.13042,9,1,0,0,0,0 +2023-09-25,2309.12499,,CodePlan: Repository-level Coding using LLMs and Planning,https://huggingface.co/papers/2309.12499,69,13,0,0,0,0 +2023-09-22,2309.12207,https://github.com/sdascoli/boolformer,Boolformer: Symbolic Regression of Logic Functions with Transformers,https://huggingface.co/papers/2309.12207,11,1,0,0,0,0 +2023-09-22,2309.12284,https://github.com/meta-math/MetaMath,MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models,https://huggingface.co/papers/2309.12284,18,4,0,19,11,31 +2023-09-22,2309.12311,https://github.com/sled-group/chat-with-nerf,LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent,https://huggingface.co/papers/2309.12311,16,2,1,0,0,0 +2023-09-22,2309.11523,https://github.com/qhfan/RMT,RMT: Retentive Networks Meet Vision Transformers,https://huggingface.co/papers/2309.11523,32,2,0,0,0,0 +2023-09-22,2309.12307,https://github.com/dvlab-research/longlora,LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models,https://huggingface.co/papers/2309.12307,84,9,1,27,1,7 +2023-09-22,2309.11998,https://github.com/lm-sys/fastchat,LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset,https://huggingface.co/papers/2309.11998,23,4,1,0,2,3 +2023-09-22,2309.11568,https://github.com/cerebras/modelzoo,BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter 
Model,https://huggingface.co/papers/2309.11568,9,2,0,3,0,8 +2023-09-22,2309.11674,https://github.com/fe1ixxu/alma,A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models,https://huggingface.co/papers/2309.11674,30,3,1,31,2,5 +2023-09-21,2309.11009,,Controllable Dynamic Appearance for Neural 3D Portraits,https://huggingface.co/papers/2309.11009,2,1,0,0,0,0 +2023-09-21,2309.11500,,A Large-scale Dataset for Audio-Language Representation Learning,https://huggingface.co/papers/2309.11500,9,1,0,0,1,0 +2023-09-21,2309.11197,,The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute,https://huggingface.co/papers/2309.11197,4,1,0,0,0,0 +2023-09-21,2309.10917,,End-to-End Speech Recognition Contextualization with Large Language Models,https://huggingface.co/papers/2309.10917,9,1,0,0,0,0 +2023-09-21,2309.11495,,Chain-of-Verification Reduces Hallucination in Large Language Models,https://huggingface.co/papers/2309.11495,38,5,0,0,0,0 +2023-09-21,2309.11419,,Kosmos-2.5: A Multimodal Literate Model,https://huggingface.co/papers/2309.11419,50,4,0,2,0,5 +2023-09-21,2309.11497,https://github.com/ChenyangSi/FreeU,FreeU: Free Lunch in Diffusion U-Net,https://huggingface.co/papers/2309.11497,63,5,1,1,0,2 +2023-09-21,2309.11499,https://github.com/RunpeiDong/DreamLLM,DreamLLM: Synergistic Multimodal Comprehension and Creation,https://huggingface.co/papers/2309.11499,58,5,1,0,0,0 +2023-09-21,2309.10952,,LMDX: Language Model-based Document Information Extraction and Localization,https://huggingface.co/papers/2309.10952,63,19,0,0,0,0 +2023-09-20,2309.10202,,Stabilizing RLHF through Advantage Model and Selective Rehearsal,https://huggingface.co/papers/2309.10202,9,1,0,0,0,0 +2023-09-20,2309.10818,https://github.com/cerebras/modelzoo,SlimPajama-DC: Understanding Data Combinations for LLM Training,https://huggingface.co/papers/2309.10818,10,1,0,1,1,0 +2023-09-20,2309.10537,,FoleyGen: Visually-Guided Audio 
Generation,https://huggingface.co/papers/2309.10537,7,1,0,0,0,0 +2023-09-20,2309.10150,,Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions,https://huggingface.co/papers/2309.10150,24,1,0,0,0,0 +2023-09-20,2309.10279,,360° Reconstruction From a Single Image Using Space Carved Outpainting,https://huggingface.co/papers/2309.10279,5,1,0,0,0,0 +2023-09-20,2309.10706,https://github.com/opennlg/openba,OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch,https://huggingface.co/papers/2309.10706,15,1,1,2,0,0 +2023-09-20,2309.10305,https://github.com/baichuan-inc/baichuan2,Baichuan 2: Open Large-scale Language Models,https://huggingface.co/papers/2309.10305,17,2,1,1,0,0 +2023-09-20,2309.10668,https://github.com/google-deepmind/language_modeling_is_compression,Language Modeling Is Compression,https://huggingface.co/papers/2309.10668,82,7,0,0,0,1 +2023-09-20,2309.10020,https://github.com/computer-vision-in-the-wild/cvinw_readings,Multimodal Foundation Models: From Specialists to General-Purpose Assistants,https://huggingface.co/papers/2309.10020,40,2,1,5,0,1 +2023-09-19,2309.06497,https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo,A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale,https://huggingface.co/papers/2309.06497,4,0,0,0,0,0 +2023-09-19,2309.08628,,Recovering from Privacy-Preserving Masking with Large Language Models,https://huggingface.co/papers/2309.08628,4,0,0,0,0,0 +2023-09-19,2309.08637,,TextBind: Multi-turn Interleaved Multimodal Instruction-following,https://huggingface.co/papers/2309.08637,7,0,0,0,0,0 +2023-09-19,2309.08646,,Cure the headache of Transformers via Collinear Constrained Attention,https://huggingface.co/papers/2309.08646,12,4,0,0,0,0 +2023-09-19,2309.08773,,Enhance audio generation controllability through representation similarity 
regularization,https://huggingface.co/papers/2309.08773,3,1,0,0,0,0 +2023-09-19,2309.08827,,S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs,https://huggingface.co/papers/2309.08827,4,0,0,0,0,0 +2023-09-19,2309.08963,https://github.com/gersteinlab/struc-bench,Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?,https://huggingface.co/papers/2309.08963,9,1,0,0,0,0 +2023-09-19,2309.08968,,Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT),https://huggingface.co/papers/2309.08968,22,1,0,0,0,0 +2023-09-19,2309.09390,,Augmenting text for spoken language understanding with Large Language Models,https://huggingface.co/papers/2309.09390,2,0,0,0,0,0 +2023-09-19,2309.09400,,"CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages",https://huggingface.co/papers/2309.09400,78,4,0,17,8,6 +2023-09-19,2309.09530,https://github.com/microsoft/lmops,Adapting Large Language Models via Reading Comprehension,https://huggingface.co/papers/2309.09530,74,3,0,76,16,26 +2023-09-19,2309.08804,,Stack-and-Delay: a new codebook pattern for music generation,https://huggingface.co/papers/2309.08804,4,0,0,0,0,0 +2023-09-19,2309.08872,,"PDFTriage: Question Answering over Long, Structured Documents",https://huggingface.co/papers/2309.08872,52,6,0,0,0,0 +2023-09-19,2309.09117,,Contrastive Decoding Improves Reasoning in Large Language Models,https://huggingface.co/papers/2309.09117,37,1,0,0,0,0 +2023-09-19,2309.09971,,MindAgent: Emergent Gaming Interaction,https://huggingface.co/papers/2309.09971,11,1,0,0,0,0 +2023-09-19,2309.09506,https://github.com/projectnuwa/layoutnuwa,LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models,https://huggingface.co/papers/2309.09506,14,1,1,0,0,0 +2023-09-19,2309.09958,https://github.com/haotian-liu/LLaVA,An Empirical Study of Scaling 
Instruct-Tuned Large Multimodal Models,https://huggingface.co/papers/2309.09958,18,1,1,5,0,1 +2023-09-18,2309.07986,,Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models,https://huggingface.co/papers/2309.07986,2,1,0,0,0,0 +2023-09-18,2309.08051,,Retrieval-Augmented Text-to-Audio Generation,https://huggingface.co/papers/2309.08051,6,0,0,0,0,0 +2023-09-18,2309.07974,https://github.com/facebookresearch/neuralmemory,A Data Source for Reasoning Embodied Agents,https://huggingface.co/papers/2309.07974,4,0,1,0,0,0 +2023-09-18,2309.07990,,Leveraging Contextual Information for Effective Entity Salience Detection,https://huggingface.co/papers/2309.07990,5,0,0,0,0,0 +2023-09-18,2309.08172,https://github.com/mayer123/laser,LASER: LLM Agent with State-Space Exploration for Web Navigation,https://huggingface.co/papers/2309.08172,10,0,0,0,0,0 +2023-09-18,2309.08587,,Compositional Foundation Models for Hierarchical Planning,https://huggingface.co/papers/2309.08587,9,1,0,0,0,0 +2023-09-18,2309.08600,https://github.com/hoagyc/sparse_coding,Sparse Autoencoders Find Highly Interpretable Features in Language Models,https://huggingface.co/papers/2309.08600,12,0,0,0,0,0 +2023-09-18,2309.08520,,Scaling Laws for Sparsely-Connected Foundation Models,https://huggingface.co/papers/2309.08520,12,0,0,0,0,0 +2023-09-18,2309.07970,,Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping,https://huggingface.co/papers/2309.07970,7,0,0,0,0,0 +2023-09-18,2309.08210,,Investigating Answerability of LLMs for Long-Form Question Answering,https://huggingface.co/papers/2309.08210,11,1,0,0,0,0 +2023-09-18,2309.08586,,Replacing softmax with ReLU in Vision Transformers,https://huggingface.co/papers/2309.08586,16,0,0,0,0,0 +2023-09-18,2309.08532,https://github.com/beeevita/evoprompt,Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers,https://huggingface.co/papers/2309.08532,51,11,0,0,0,0 
+2023-09-15,2309.07314,,AudioSR: Versatile Audio Super-resolution at Scale,https://huggingface.co/papers/2309.07314,23,4,0,0,0,0 +2023-09-15,2309.07906,,Generative Image Dynamics,https://huggingface.co/papers/2309.07906,51,11,0,0,0,0 +2023-09-15,2309.07749,https://github.com/facebookresearch/OmnimatteRF,OmnimatteRF: Robust Omnimatte with 3D Background Modeling,https://huggingface.co/papers/2309.07749,6,0,0,0,0,0 +2023-09-15,2309.07462,,Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?,https://huggingface.co/papers/2309.07462,3,0,0,0,0,0 +2023-09-15,2309.07870,https://github.com/aiwaves-cn/agents,Agents: An Open-source Framework for Autonomous Language Agents,https://huggingface.co/papers/2309.07870,39,1,0,0,0,4 +2023-09-15,2309.07900,,Ambiguity-Aware In-Context Learning with Large Language Models,https://huggingface.co/papers/2309.07900,3,1,0,0,0,0 +2023-09-15,2309.07430,,Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts,https://huggingface.co/papers/2309.07430,27,4,0,0,0,0 +2023-09-14,2309.07122,,Tree-Structured Shading Decomposition,https://huggingface.co/papers/2309.07122,6,0,0,0,0,0 +2023-09-14,2309.06933,,DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models,https://huggingface.co/papers/2309.06933,12,0,0,0,0,0 +2023-09-14,2309.06802,https://github.com/iSach/SoccerNeRFs,Dynamic NeRFs for Soccer Scenes,https://huggingface.co/papers/2309.06802,16,0,0,0,0,0 +2023-09-14,2309.07125,,Text-Guided Generation and Editing of Compositional 3D Avatars,https://huggingface.co/papers/2309.07125,6,1,0,0,0,0 +2023-09-14,2309.06895,,MagiCapture: High-Resolution Multi-Concept Portrait Customization,https://huggingface.co/papers/2309.06895,27,3,0,0,0,0 +2023-09-14,2309.07062,,Large Language Models for Compiler Optimization,https://huggingface.co/papers/2309.07062,22,4,0,0,0,0 +2023-09-14,2309.06657,,Statistical Rejection Sampling Improves Preference 
Optimization,https://huggingface.co/papers/2309.06657,13,0,0,0,0,0 +2023-09-13,2309.06440,,"LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning",https://huggingface.co/papers/2309.06440,9,0,0,0,0,0 +2023-09-13,2309.06441,,Learning Disentangled Avatars with Hybrid 3D Representations,https://huggingface.co/papers/2309.06441,4,0,0,0,0,0 +2023-09-13,2309.05767,https://github.com/microsoft/clap,Natural Language Supervision for General-Purpose Audio Representations,https://huggingface.co/papers/2309.05767,7,0,1,1,0,0 +2023-09-13,2309.05858,,Uncovering mesa-optimization algorithms in Transformers,https://huggingface.co/papers/2309.05858,12,0,0,0,0,0 +2023-09-13,2309.06126,,AstroLLaMA: Towards Specialized Foundation Models in Astronomy,https://huggingface.co/papers/2309.06126,16,0,0,0,0,1 +2023-09-13,2309.06380,https://github.com/gnobitab/instaflow,InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation,https://huggingface.co/papers/2309.06380,32,1,1,2,1,1 +2023-09-13,2309.05689,https://github.com/microsoft/LMOps/tree/main/LLM4Science,Large Language Model for Science: A Study on P vs. 
NP,https://huggingface.co/papers/2309.05689,20,34,0,0,0,1 +2023-09-13,2309.06180,https://github.com/vllm-project/vllm,Efficient Memory Management for Large Language Model Serving with PagedAttention,https://huggingface.co/papers/2309.06180,25,1,1,0,0,0 +2023-09-13,2309.05793,,PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models,https://huggingface.co/papers/2309.05793,50,6,0,1,0,0 +2023-09-12,2309.04581,,Dynamic Mesh-Aware Radiance Fields,https://huggingface.co/papers/2309.04581,5,0,0,0,0,0 +2023-09-12,2309.04564,,When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale,https://huggingface.co/papers/2309.04564,15,0,0,0,0,0 +2023-09-12,2309.04662,,MADLAD-400: A Multilingual And Document-Level Large Audited Dataset,https://huggingface.co/papers/2309.04662,21,3,0,25,4,33 +2023-09-12,2309.04663,,FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning,https://huggingface.co/papers/2309.04663,5,0,0,0,0,0 +2023-09-12,2309.04827,,"Neurons in Large Language Models: Dead, N-gram, Positional",https://huggingface.co/papers/2309.04827,16,0,0,0,0,0 +2023-09-12,2309.05516,https://github.com/intel/neural-compressor,Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs,https://huggingface.co/papers/2309.05516,8,2,1,32,0,0 +2023-09-12,2309.05463,,Textbooks Are All You Need II: phi-1.5 technical report,https://huggingface.co/papers/2309.05463,84,5,0,14,5,70 +2023-09-12,2309.05519,https://github.com/NExT-GPT/NExT-GPT,NExT-GPT: Any-to-Any Multimodal LLM,https://huggingface.co/papers/2309.05519,76,13,1,1,0,0 +2023-09-11,2309.03907,,DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs,https://huggingface.co/papers/2309.03907,6,0,0,0,0,0 +2023-09-11,2309.04354,,Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts,https://huggingface.co/papers/2309.04354,13,1,0,0,0,0 +2023-09-11,2309.04247,,Towards Practical Capture of High-Fidelity Relightable 
Avatars,https://huggingface.co/papers/2309.04247,8,0,0,0,0,0 +2023-09-11,2309.03926,,Large-Scale Automatic Audiobook Creation,https://huggingface.co/papers/2309.03926,52,1,0,0,0,0 +2023-09-11,2309.04269,,From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting,https://huggingface.co/papers/2309.04269,29,0,0,0,0,0 +2023-09-08,2309.03453,https://github.com/liuyuan-pal/syncdreamer,SyncDreamer: Generating Multiview-consistent Images from a Single-view Image,https://huggingface.co/papers/2309.03453,12,3,1,1,0,0 +2023-09-08,2309.03895,,InstructDiffusion: A Generalist Modeling Interface for Vision Tasks,https://huggingface.co/papers/2309.03895,12,0,0,0,0,0 +2023-09-08,2309.03897,https://github.com/sczhou/propainter,ProPainter: Improving Propagation and Transformer for Video Inpainting,https://huggingface.co/papers/2309.03897,25,1,1,0,0,2 +2023-09-08,2309.03315,,Robotic Table Tennis: A Case Study into a High Speed Learning System,https://huggingface.co/papers/2309.03315,6,0,0,0,0,0 +2023-09-08,2309.03903,https://github.com/hkchengrex/Tracking-Anything-with-DEVA,Tracking Anything with Decoupled Video Segmentation,https://huggingface.co/papers/2309.03903,27,2,0,0,0,0 +2023-09-08,2309.03450,https://github.com/salesforce/xgen,XGen-7B Technical Report,https://huggingface.co/papers/2309.03450,8,0,1,3,0,40 +2023-09-08,2309.03883,https://github.com/voidism/dola,DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models,https://huggingface.co/papers/2309.03883,31,3,1,0,0,0 +2023-09-08,2309.03550,,Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model,https://huggingface.co/papers/2309.03550,11,0,0,0,0,0 +2023-09-08,2309.03549,,Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation,https://huggingface.co/papers/2309.03549,5,0,0,0,0,0 +2023-09-08,2309.03241,https://github.com/thudm/mathglm,GPT Can Solve Mathematical Problems Without a 
Calculator,https://huggingface.co/papers/2309.03241,17,9,0,0,0,0 +2023-09-08,2309.03852,,FLM-101B: An Open LLM and How to Train It with $100K Budget,https://huggingface.co/papers/2309.03852,42,1,0,0,0,0 +2023-09-08,2309.03905,https://github.com/opengvlab/llama-adapter,ImageBind-LLM: Multi-modality Instruction Tuning,https://huggingface.co/papers/2309.03905,16,5,1,1,0,0 +2023-09-08,2309.03409,https://github.com/google-deepmind/opro,Large Language Models as Optimizers,https://huggingface.co/papers/2309.03409,73,3,0,0,0,0 +2023-09-07,2309.03199,https://github.com/shivammehta25/Matcha-TTS,Matcha-TTS: A fast TTS architecture with conditional flow matching,https://huggingface.co/papers/2309.03199,10,0,1,3,0,1 +2023-09-07,2309.03185,https://github.com/BayesRays/BayesRays,Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields,https://huggingface.co/papers/2309.03185,6,0,0,0,0,0 +2023-09-07,2309.02561,,Physically Grounded Vision-Language Models for Robotic Manipulation,https://huggingface.co/papers/2309.02561,8,1,0,1,0,0 +2023-09-07,2309.02591,,Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning,https://huggingface.co/papers/2309.02591,14,1,0,0,0,0 +2023-09-07,2309.03130,,MyoDex: A Generalizable Prior for Dexterous Manipulation,https://huggingface.co/papers/2309.03130,2,0,0,0,0,0 +2023-09-07,2309.03160,https://github.com/markomih/ResFields,ResFields: Residual Neural Fields for Spatiotemporal Signals,https://huggingface.co/papers/2309.03160,7,0,0,0,0,0 +2023-09-07,2309.03179,https://github.com/aliasgharkhani/slime,SLiMe: Segment Like Me,https://huggingface.co/papers/2309.03179,29,5,0,0,0,0 +2023-09-06,2309.01775,,Gated recurrent neural networks discover attention,https://huggingface.co/papers/2309.01775,6,0,0,0,0,0 +2023-09-06,2309.00987,,Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation,https://huggingface.co/papers/2309.00987,2,0,0,0,0,0 +2023-09-06,2309.02186,,AniPortraitGAN: Animatable 3D Portrait 
Generation from 2D Image Collections,https://huggingface.co/papers/2309.02186,19,3,0,0,0,0 +2023-09-06,2309.02420,https://github.com/RuojinCai/Doppelgangers,Doppelgangers: Learning to Disambiguate Images of Similar Structures,https://huggingface.co/papers/2309.02420,9,0,0,0,0,0 +2023-09-06,2309.01826,,One Wide Feedforward is All You Need,https://huggingface.co/papers/2309.01826,31,1,0,0,0,0 +2023-09-06,2309.00966,,Compositional Diffusion-Based Continuous Constraint Solvers,https://huggingface.co/papers/2309.00966,4,0,0,0,0,0 +2023-09-06,2309.02040,,Diffusion Generative Inverse Design,https://huggingface.co/papers/2309.02040,2,0,0,0,0,0 +2023-09-06,2309.02119,,Hierarchical Masked 3D Diffusion Model for Video Outpainting,https://huggingface.co/papers/2309.02119,10,0,0,1,0,0 +2023-09-06,2309.00775,,Contrastive Feature Masking Open-Vocabulary Vision Transformer,https://huggingface.co/papers/2309.00775,7,0,0,0,0,0 +2023-09-06,2309.00908,,MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation,https://huggingface.co/papers/2309.00908,4,0,0,0,0,0 +2023-09-06,2309.01700,,ControlMat: A Controlled Generative Approach to Material Capture,https://huggingface.co/papers/2309.01700,12,0,0,1,0,0 +2023-09-06,2309.01770,,StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation,https://huggingface.co/papers/2309.01770,9,1,0,0,0,0 +2023-09-06,2309.00986,https://github.com/modelscope/modelscope-agent,ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models,https://huggingface.co/papers/2309.00986,17,1,0,0,0,0 +2023-09-06,2309.00754,,Efficient RLHF: Reducing the Memory Usage of PPO,https://huggingface.co/papers/2309.00754,13,0,0,0,0,0 +2023-09-06,2309.02285,,PromptTTS 2: Describing and Generating Voices with Text Prompt,https://huggingface.co/papers/2309.02285,11,2,0,0,0,0 +2023-09-04,2309.00615,https://github.com/ziyuguo99/point-bind_point-llm,"Point-Bind & Point-LLM: Aligning Point Cloud with 
Multi-modality for 3D Understanding, Generation, and Instruction Following",https://huggingface.co/papers/2309.00615,10,1,0,0,0,0 +2023-09-04,2309.00398,,VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation,https://huggingface.co/papers/2309.00398,19,6,0,0,0,0 +2023-09-04,2309.00610,https://github.com/hzxie/CityDreamer,CityDreamer: Compositional Generative Model of Unbounded 3D Cities,https://huggingface.co/papers/2309.00610,16,0,1,1,0,1 +2023-09-04,2309.00267,,RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback,https://huggingface.co/papers/2309.00267,46,1,0,0,0,0 +2023-09-04,2309.00359,,"Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior",https://huggingface.co/papers/2309.00359,19,0,0,0,0,0 +2023-09-04,2309.00035,,FACET: Fairness in Computer Vision Evaluation Benchmark,https://huggingface.co/papers/2309.00035,13,2,0,0,0,0 +2023-09-04,2309.00071,https://github.com/jquesnelle/scaled-rope,YaRN: Efficient Context Window Extension of Large Language Models,https://huggingface.co/papers/2309.00071,59,4,1,100,0,67 +2023-09-01,2308.16582,,Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images,https://huggingface.co/papers/2308.16582,10,0,0,0,0,0 +2023-09-01,2308.16891,https://github.com/YanjieZe/GNFactor,GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields,https://huggingface.co/papers/2308.16891,8,0,0,0,0,0 +2023-09-01,2308.16876,,SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation,https://huggingface.co/papers/2308.16876,6,0,0,0,0,0 +2023-09-01,2308.16246,https://github.com/allenai/embodied-clip,Active Neural Mapping,https://huggingface.co/papers/2308.16246,8,0,0,0,0,0 +2023-09-01,2308.16271,https://github.com/ma-lab-berkeley/crate,Emergence of Segmentation with Minimalistic White-Box Transformers,https://huggingface.co/papers/2308.16271,13,0,0,0,0,0 
+2023-09-01,2308.16512,,MVDream: Multi-view Diffusion for 3D Generation,https://huggingface.co/papers/2308.16512,99,6,0,0,0,1 +2023-09-01,2308.16458,,BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge,https://huggingface.co/papers/2308.16458,9,0,0,0,0,0 +2023-09-01,2308.16824,,Can Programming Languages Boost Each Other via Instruction Tuning?,https://huggingface.co/papers/2308.16824,9,0,0,0,1,0 +2023-09-01,2308.16884,https://github.com/facebookresearch/belebele,The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants,https://huggingface.co/papers/2308.16884,8,0,1,12,3,2 +2023-08-31,2308.16185,,Learning Vision-based Pursuit-Evasion Robot Policies,https://huggingface.co/papers/2308.16185,6,0,0,0,0,0 +2023-08-31,2308.15560,https://github.com/google-research/weatherbench2,WeatherBench 2: A benchmark for the next generation of data-driven global weather models,https://huggingface.co/papers/2308.15560,8,0,0,0,0,0 +2023-08-31,2308.15975,,RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation,https://huggingface.co/papers/2308.15975,11,1,0,0,0,0 +2023-08-31,2308.15930,https://github.com/linksoul-ai/llasm,LLaSM: Large Language and Speech Model,https://huggingface.co/papers/2308.15930,29,2,1,0,0,1 +2023-08-31,2308.16137,,LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models,https://huggingface.co/papers/2308.16137,39,4,0,1,0,0 +2023-08-31,2308.16149,,Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models,https://huggingface.co/papers/2308.16149,25,6,0,13,1,8 +2023-08-29,2308.13785,https://github.com/kodenii/ores,ORES: Open-vocabulary Responsible Visual Synthesis,https://huggingface.co/papers/2308.13785,6,0,1,0,0,0 +2023-08-29,2308.14089,,MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records,https://huggingface.co/papers/2308.14089,26,3,0,0,0,0 
+2023-08-28,2308.13418,https://github.com/facebookresearch/nougat,Nougat: Neural Optical Understanding for Academic Documents,https://huggingface.co/papers/2308.13418,34,2,1,8,0,22 +2023-08-28,2308.13494,https://github.com/WISION-Lab/eventful-transformer,Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers,https://huggingface.co/papers/2308.13494,8,2,0,0,0,0 +2023-08-28,2308.13404,https://github.com/iamNCJ/NRHints,Relighting Neural Radiance Fields with Shadow and Highlight Hints,https://huggingface.co/papers/2308.13404,7,0,0,0,0,0 +2023-08-28,2308.13416,https://github.com/deepsoftwareanalytics/sotana,SoTaNa: The Open-Source Software Development Assistant,https://huggingface.co/papers/2308.13416,10,0,1,0,0,0 +2023-08-28,2308.13137,https://github.com/opengvlab/omniquant,OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models,https://huggingface.co/papers/2308.13137,17,0,1,87,0,0 +2023-08-17,2308.08258,,SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes,https://huggingface.co/papers/2308.08258,4,0,0,0,0,0 +2023-08-17,2308.07931,https://github.com/f3rm/f3rm,Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation,https://huggingface.co/papers/2308.07931,6,0,0,0,0,0 +2023-08-17,2308.08089,,"DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory",https://huggingface.co/papers/2308.08089,21,0,0,1,0,1 +2023-08-17,2308.07968,,Teach LLMs to Personalize -- An Approach inspired by Writing Education,https://huggingface.co/papers/2308.07968,24,0,0,0,0,0 +2023-08-17,2308.08316,,Dual-Stream Diffusion Net for Text-to-Video Generation,https://huggingface.co/papers/2308.08316,23,3,0,0,0,10 +2023-08-17,2308.08545,https://github.com/huangyangyi/tech,TeCH: Text-guided Reconstruction of Lifelike Clothed Humans,https://huggingface.co/papers/2308.08545,31,3,0,0,0,0 +2023-08-16,2308.07795,https://github.com/ai-initiative-kaust/videorlcs,Learning to Identify Critical States 
for Reinforcement Learning from Videos,https://huggingface.co/papers/2308.07795,6,0,0,0,0,0 +2023-08-16,2308.07395,,Text Injection for Capitalization and Turn-Taking Prediction in Speech Models,https://huggingface.co/papers/2308.07395,5,0,0,0,0,0 +2023-08-16,2308.07903,,Relightable and Animatable Neural Avatar from Sparse-View Video,https://huggingface.co/papers/2308.07903,9,0,0,0,0,0 +2023-08-16,2308.07891,https://github.com/isekai-portal/Link-Context-Learning,Link-Context Learning for Multimodal LLMs,https://huggingface.co/papers/2308.07891,14,1,1,2,2,0 +2023-08-16,2308.07922,https://github.com/jeffhj/raven,RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models,https://huggingface.co/papers/2308.07922,16,1,1,0,0,0 +2023-08-16,2308.07926,https://github.com/qiuyu96/codef,CoDeF: Content Deformation Fields for Temporally Consistent Video Processing,https://huggingface.co/papers/2308.07926,27,1,0,0,0,0 +2023-08-16,2308.07921,,Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification,https://huggingface.co/papers/2308.07921,21,1,0,0,0,0 +2023-08-15,2308.06595,https://github.com/mlfoundations/VisIT-Bench,VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use,https://huggingface.co/papers/2308.06595,5,1,1,0,1,0 +2023-08-15,2308.07286,,The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation,https://huggingface.co/papers/2308.07286,5,0,0,0,0,0 +2023-08-15,2308.07316,https://github.com/alexmartin1722/Revive-2I,Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation,https://huggingface.co/papers/2308.07316,6,1,1,0,0,0 +2023-08-15,2308.07228,https://github.com/wzhouxiff/restoreformerplusplus,RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs,https://huggingface.co/papers/2308.07228,9,0,1,0,0,2 
+2023-08-15,2308.06721,,IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models,https://huggingface.co/papers/2308.06721,26,2,0,9,1,100 +2023-08-15,2308.06912,https://github.com/google-research/causallm_icl,CausalLM is not optimal for in-context learning,https://huggingface.co/papers/2308.06912,18,1,0,0,0,0 +2023-08-15,2308.07317,https://github.com/arielnlee/Platypus,"Platypus: Quick, Cheap, and Powerful Refinement of LLMs",https://huggingface.co/papers/2308.07317,23,4,1,51,4,74 +2023-08-15,2308.06873,,SpeechX: Neural Codec Language Model as a Versatile Speech Transformer,https://huggingface.co/papers/2308.06873,25,1,0,0,0,0 +2023-08-15,2308.07124,https://github.com/bigcode-project/octopack,OctoPack: Instruction Tuning Code Large Language Models,https://huggingface.co/papers/2308.07124,28,1,1,6,6,9 +2023-08-14,2308.06261,https://github.com/microsoft/nemoeval,Enhancing Network Management Using Code Generated by Large Language Models,https://huggingface.co/papers/2308.06261,4,3,0,0,0,0 +2023-08-14,2308.06125,,Improving Joint Speech-Text Representations Without Alignment,https://huggingface.co/papers/2308.06125,5,0,0,0,0,0 +2023-08-14,2308.05960,https://github.com/salesforce/bolaa,BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents,https://huggingface.co/papers/2308.05960,18,2,0,0,0,0 +2023-08-14,2308.06103,,Composable Function-preserving Expansions for Transformer Architectures,https://huggingface.co/papers/2308.06103,18,1,0,0,0,0 +2023-08-14,2308.05884,,PIPPA: A Partially Synthetic Conversational Dataset,https://huggingface.co/papers/2308.05884,28,2,0,1,4,0 +2023-08-14,2308.06259,,Self-Alignment with Instruction Backtranslation,https://huggingface.co/papers/2308.06259,39,2,0,17,5,5 +2023-08-11,2308.05732,,PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers,https://huggingface.co/papers/2308.05732,6,0,0,0,0,0 +2023-08-11,2308.05221,,"Alexa, play with robot: Introducing the First Alexa Prize SimBot 
Challenge on Embodied AI",https://huggingface.co/papers/2308.05221,8,0,0,0,0,0 +2023-08-11,2308.05371,,Flexible Isosurface Extraction for Gradient-Based Mesh Optimization,https://huggingface.co/papers/2308.05371,8,0,0,0,0,0 +2023-08-11,2308.05326,https://github.com/aqlaboratory/openfold,OpenProteinSet: Training data for structural biology at scale,https://huggingface.co/papers/2308.05326,10,0,0,0,0,0 +2023-08-11,2308.05737,https://github.com/alaamaalouf/followanything,"Follow Anything: Open-set detection, tracking, and following in real-time",https://huggingface.co/papers/2308.05737,11,0,0,0,0,0 +2023-08-11,2308.05374,https://github.com/kevinyaobytedance/llm_eval,Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment,https://huggingface.co/papers/2308.05374,25,2,0,0,0,0 +2023-08-11,2308.05734,https://github.com/haoheliu/AudioLDM2,AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining,https://huggingface.co/papers/2308.05734,34,0,1,5,0,34 +2023-08-10,2308.04556,https://github.com/NVlabs/FocalFormer3D,FocalFormer3D : Focusing on Hard Instance for 3D Object Detection,https://huggingface.co/papers/2308.04556,8,0,0,0,0,0 +2023-08-10,2308.04623,,Accelerating LLM Inference with Staged Speculative Decoding,https://huggingface.co/papers/2308.04623,22,4,0,0,0,0 +2023-08-10,2308.04729,,JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models,https://huggingface.co/papers/2308.04729,31,6,0,0,0,0 +2023-08-10,2308.04592,https://github.com/facebookresearch/shepherd,Shepherd: A Critic for Language Model Generation,https://huggingface.co/papers/2308.04592,28,5,0,0,0,0 +2023-08-09,2308.03793,https://github.com/michiganleon/reclip_wacv,ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation,https://huggingface.co/papers/2308.03793,10,0,0,0,0,0 +2023-08-09,2308.04430,https://github.com/kernelmachine/silo-lm,SILO Language Models: Isolating Legal Risk In a 
Nonparametric Datastore,https://huggingface.co/papers/2308.04430,9,0,1,0,0,0 +2023-08-09,2308.04265,,FLIRT: Feedback Loop In-context Red Teaming,https://huggingface.co/papers/2308.04265,12,0,0,0,0,0 +2023-08-09,2308.03958,https://github.com/google/sycophancy-intervention,Simple synthetic data reduces sycophancy in large language models,https://huggingface.co/papers/2308.03958,21,0,1,0,0,0 +2023-08-09,2308.04079,https://github.com/graphdeco-inria/gaussian-splatting,3D Gaussian Splatting for Real-Time Radiance Field Rendering,https://huggingface.co/papers/2308.04079,163,13,1,0,0,0 +2023-08-08,2308.02560,,From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion,https://huggingface.co/papers/2308.02560,4,0,0,0,0,77 +2023-08-08,2308.03290,,FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search,https://huggingface.co/papers/2308.03290,5,0,0,0,0,0 +2023-08-08,2308.03421,,RecycleGPT: An Autoregressive Language Model with Recyclable Module,https://huggingface.co/papers/2308.03421,7,0,0,0,0,0 +2023-08-08,2308.03291,https://github.com/stanojevic/fast-mst-algorithm,SynJax: Structured Probability Distributions for JAX,https://huggingface.co/papers/2308.03291,5,0,0,0,0,0 +2023-08-08,2308.03028,,Pre-Trained Large Language Models for Industrial Control,https://huggingface.co/papers/2308.03028,6,0,0,0,0,0 +2023-08-08,2308.03280,,Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing,https://huggingface.co/papers/2308.03280,6,0,0,0,0,0 +2023-08-08,2308.03729,https://github.com/opengvlab/multi-modality-arena,Tiny LVLM-eHub: Early Multimodal Experiments with Bard,https://huggingface.co/papers/2308.03729,9,0,0,0,0,0 +2023-08-08,2308.03296,,Studying Large Language Model Generalization with Influence Functions,https://huggingface.co/papers/2308.03296,10,0,0,0,0,0 +2023-08-08,2308.03757,,3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance 
Fields,https://huggingface.co/papers/2308.03757,10,0,0,0,0,0 +2023-08-08,2308.03427,,TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents,https://huggingface.co/papers/2308.03427,14,0,0,0,0,0 +2023-08-08,2308.03279,,UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition,https://huggingface.co/papers/2308.03279,20,2,0,10,3,3 +2023-08-08,2308.02510,,Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals,https://huggingface.co/papers/2308.02510,20,3,0,0,0,0 +2023-08-08,2308.03688,https://github.com/thudm/agentbench,AgentBench: Evaluating LLMs as Agents,https://huggingface.co/papers/2308.03688,24,0,0,0,0,0 +2023-08-08,2308.03610,https://github.com/bytedance/AvatarVerse,AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose,https://huggingface.co/papers/2308.03610,22,1,1,1,0,0 +2023-08-08,2308.03526,,AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning,https://huggingface.co/papers/2308.03526,25,0,0,0,0,0 +2023-08-08,2308.02669,,ConceptLab: Creative Generation using Diffusion Prior Constraints,https://huggingface.co/papers/2308.02669,22,1,0,0,0,0 +2023-08-07,2308.01937,,Training Data Protection with Compositional Diffusion Models,https://huggingface.co/papers/2308.01937,5,0,0,0,0,0 +2023-08-07,2308.02453,https://github.com/srl-ethz/faive_gym_oss,Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints,https://huggingface.co/papers/2308.02453,8,0,0,0,0,0 +2023-08-07,2308.02180,,Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology,https://huggingface.co/papers/2308.02180,9,0,0,0,0,0 +2023-08-07,2308.02487,https://github.com/bytedance/fc-clip,Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP,https://huggingface.co/papers/2308.02487,12,0,1,0,0,1 +2023-08-07,2308.02490,https://github.com/yuweihao/mm-vet,MM-Vet: 
Evaluating Large Multimodal Models for Integrated Capabilities,https://huggingface.co/papers/2308.02490,16,0,1,0,2,0 +2023-08-07,2308.02151,https://github.com/weirayao/retroformer,Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization,https://huggingface.co/papers/2308.02151,18,1,1,0,0,0 +2023-08-04,2308.01499,,TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations,https://huggingface.co/papers/2308.01499,2,0,0,0,0,0 +2023-08-04,2308.01379,,Computational Long Exposure Mobile Photography,https://huggingface.co/papers/2308.01379,3,0,0,0,0,0 +2023-08-04,2308.01734,,Ambient Adventures: Teaching ChatGPT on Developing Complex Stories,https://huggingface.co/papers/2308.01734,6,0,0,0,0,0 +2023-08-04,2308.01904,https://github.com/impiga/plain-detr,DETR Doesn't Need Multi-Scale or Locality Design,https://huggingface.co/papers/2308.01904,7,0,0,0,0,0 +2023-08-04,2308.01907,https://github.com/opengvlab/all-seeing,The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World,https://huggingface.co/papers/2308.01907,10,0,1,0,2,2 +2023-08-04,2308.01477,,"HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions",https://huggingface.co/papers/2308.01477,11,0,0,0,0,0 +2023-08-04,2308.01546,https://github.com/retrocirce/musicldm,MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies,https://huggingface.co/papers/2308.01546,17,0,1,1,0,1 +2023-08-04,2308.01544,,Multimodal Neurons in Pretrained Text-Only Transformers,https://huggingface.co/papers/2308.01544,15,0,0,0,0,0 +2023-08-04,2308.01825,https://github.com/ofa-sys/gsm8k-screl,Scaling Relationship on Learning Mathematical Reasoning with Large Language Models,https://huggingface.co/papers/2308.01825,21,0,1,0,0,0 +2023-08-04,2308.01390,https://github.com/mlfoundations/open_flamingo,OpenFlamingo: An Open-Source Framework for Training Large 
Autoregressive Vision-Language Models,https://huggingface.co/papers/2308.01390,31,2,1,5,0,2 +2023-08-04,2308.01399,,Learning to Model the World with Language,https://huggingface.co/papers/2308.01399,34,0,0,0,0,0 +2023-08-04,2308.01320,https://github.com/microsoft/DeepSpeed,"DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales",https://huggingface.co/papers/2308.01320,43,3,1,0,0,0 +2023-08-03,2308.01313,,"More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes",https://huggingface.co/papers/2308.01313,7,0,0,0,0,0 +2023-08-03,2308.01300,,Revisiting DETR Pre-training for Object Detection,https://huggingface.co/papers/2308.01300,8,0,0,0,0,0 +2023-08-03,2308.01317,,ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders,https://huggingface.co/papers/2308.01317,12,1,0,0,0,0 +2023-08-03,2308.00906,,ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation,https://huggingface.co/papers/2308.00906,13,0,0,0,0,0 +2023-08-03,2308.00951,https://github.com/google-research/vmoe,From Sparse to Soft Mixtures of Experts,https://huggingface.co/papers/2308.00951,20,0,0,0,0,0 +2023-08-02,2308.00566,,Predicting masked tokens in stochastic locations improves masked image modeling,https://huggingface.co/papers/2308.00566,15,0,0,0,0,0 +2023-08-02,2308.00113,https://github.com/facebookresearch/three_bricks,Three Bricks to Consolidate Watermarks for Large Language Models,https://huggingface.co/papers/2308.00113,13,0,1,0,0,1 +2023-08-02,2308.00436,https://github.com/ningmiao/selfcheck,SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning,https://huggingface.co/papers/2308.00436,21,0,0,0,0,0 +2023-08-02,2308.00304,,Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models,https://huggingface.co/papers/2308.00304,23,1,0,0,0,0 
+2023-08-02,2308.00675,,Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models,https://huggingface.co/papers/2308.00675,35,1,0,0,0,0 +2023-08-01,2307.16125,https://github.com/ailab-cvc/seed-bench,SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension,https://huggingface.co/papers/2307.16125,6,2,1,0,0,0 +2023-08-01,2307.16890,,Discovering Adaptable Symbolic Algorithms from Scratch,https://huggingface.co/papers/2307.16890,5,0,0,0,0,0 +2023-08-01,2307.16888,,Virtual Prompt Injection for Instruction-Tuned Large Language Models,https://huggingface.co/papers/2307.16888,6,2,0,0,0,0 +2023-08-01,2307.16715,https://github.com/showlab/univtg,UniVTG: Towards Unified Video-Language Temporal Grounding,https://huggingface.co/papers/2307.16715,9,2,1,0,0,2 +2023-08-01,2307.16368,https://github.com/brown-palm/AntGPT,AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?,https://huggingface.co/papers/2307.16368,10,0,0,0,0,0 +2023-08-01,2307.16449,https://github.com/rese1f/MovieChat,MovieChat: From Dense Token to Sparse Memory for Long Video Understanding,https://huggingface.co/papers/2307.16449,15,0,1,0,0,0 +2023-08-01,2307.16686,,Guiding Image Captioning Models Toward More Specific Captions,https://huggingface.co/papers/2307.16686,15,2,0,0,0,0 +2023-08-01,2307.16184,https://github.com/mshukor/unival,"Unified Model for Image, Video, Audio and Language Tasks",https://huggingface.co/papers/2307.16184,14,0,1,0,0,1 +2023-08-01,2307.15771,,The Hydra Effect: Emergent Self-repair in Language Model Computations,https://huggingface.co/papers/2307.15771,18,0,0,0,0,0 +2023-08-01,2307.15780,,LLM-Rec: Personalized Recommendation via Prompting Large Language Models,https://huggingface.co/papers/2307.15780,24,4,0,0,0,0 +2023-08-01,2307.15818,,RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control,https://huggingface.co/papers/2307.15818,27,2,0,0,0,0 
+2023-08-01,2307.16372,https://github.com/seungheondoh/lp-music-caps,LP-MusicCaps: LLM-Based Pseudo Music Captioning,https://huggingface.co/papers/2307.16372,35,0,0,3,3,4 +2023-08-01,2307.16789,https://github.com/openbmb/toolbench,ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs,https://huggingface.co/papers/2307.16789,96,4,1,0,1,0 +2023-07-31,2307.15131,https://github.com/windingwind/seal-3d,Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields,https://huggingface.co/papers/2307.15131,5,0,0,0,0,0 +2023-07-31,2307.15504,https://github.com/thunlp/unifiedinstructiontuning,Exploring Format Consistency for Instruction Tuning,https://huggingface.co/papers/2307.15504,5,0,0,0,0,0 +2023-07-31,2307.15593,https://github.com/jthickstun/watermark,Robust Distortion-free Watermarks for Language Models,https://huggingface.co/papers/2307.15593,8,0,0,0,0,0 +2023-07-31,2307.15199,,PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization,https://huggingface.co/papers/2307.15199,11,0,0,0,0,0 +2023-07-31,2307.15189,https://github.com/snap-stanford/med-flamingo,Med-Flamingo: a Multimodal Medical Few-shot Learner,https://huggingface.co/papers/2307.15189,22,1,1,1,0,0 +2023-07-31,2307.15337,,Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding,https://huggingface.co/papers/2307.15337,36,2,0,0,0,0 +2023-07-31,2307.15217,,Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback,https://huggingface.co/papers/2307.15217,35,4,0,0,0,0 +2023-07-28,2307.15042,,TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis,https://huggingface.co/papers/2307.15042,6,0,0,0,0,0 +2023-07-28,2307.14460,https://github.com/isl-org/MiDaS,MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation,https://huggingface.co/papers/2307.14460,7,1,1,4,0,3 +2023-07-28,2307.13813,,How to Scale Your EMA,https://huggingface.co/papers/2307.13813,8,4,0,0,0,0 
+2023-07-28,2307.14620,https://github.com/facebookresearch/nerf-det,NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection,https://huggingface.co/papers/2307.14620,11,0,0,1,0,0 +2023-07-28,2307.14535,https://github.com/real-stanford/scalingup,Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition,https://huggingface.co/papers/2307.14535,13,0,0,0,0,0 +2023-07-28,2307.15063,https://github.com/MarcBotet/hamlet,To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation,https://huggingface.co/papers/2307.15063,15,1,0,0,0,0 +2023-07-28,2307.14995,,Scaling TransNormer to 175 Billion Parameters,https://huggingface.co/papers/2307.14995,21,4,0,8,0,0 +2023-07-28,2307.14936,,PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback,https://huggingface.co/papers/2307.14936,42,2,0,0,0,0 +2023-07-27,2307.13924,https://github.com/nvlabs/trajdata,trajdata: A Unified Interface to Multiple Human Trajectory Datasets,https://huggingface.co/papers/2307.13924,2,0,0,0,0,0 +2023-07-27,2307.14008,https://github.com/microsoft/TokenMixers,Adaptive Frequency Filters As Efficient Global Token Mixers,https://huggingface.co/papers/2307.14008,3,0,0,0,0,0 +2023-07-27,2307.14117,,Leveraging Implicit Feedback from Deployment Data in Dialogue,https://huggingface.co/papers/2307.14117,4,0,0,0,0,0 +2023-07-27,2307.14225,,Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences,https://huggingface.co/papers/2307.14225,8,0,0,0,0,0 +2023-07-27,2307.13720,,Composite Diffusion | whole >= Σparts,https://huggingface.co/papers/2307.13720,8,0,0,0,0,0 +2023-07-27,2307.13908,,Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation,https://huggingface.co/papers/2307.13908,8,0,0,0,0,0 +2023-07-27,2307.14334,,Towards Generalist Biomedical AI,https://huggingface.co/papers/2307.14334,11,0,0,0,0,0 
+2023-07-27,2307.13974,https://github.com/jiawen-zhu/hqtrack,Tracking Anything in High Quality,https://huggingface.co/papers/2307.13974,13,2,1,0,0,0 +2023-07-27,2307.13854,https://github.com/web-arena-x/webarena,WebArena: A Realistic Web Environment for Building Autonomous Agents,https://huggingface.co/papers/2307.13854,21,4,0,0,0,0 +2023-07-27,2307.13702,,Measuring Faithfulness in Chain-of-Thought Reasoning,https://huggingface.co/papers/2307.13702,27,1,0,0,0,0 +2023-07-27,2307.14335,https://github.com/audio-agi/wavjourney,WavJourney: Compositional Audio Creation with Large Language Models,https://huggingface.co/papers/2307.14335,42,1,1,0,0,2 +2023-07-26,2307.13101,https://github.com/khatch31/laeo,Contrastive Example-Based Control,https://huggingface.co/papers/2307.13101,4,0,0,0,0,0 +2023-07-26,2307.13226,https://github.com/zerg-overmind/strivec,Strivec: Sparse Tri-Vector Radiance Fields,https://huggingface.co/papers/2307.13226,5,0,0,0,0,0 +2023-07-26,2307.13383,https://github.com/microsoft/coverage-eval,Predicting Code Coverage without Execution,https://huggingface.co/papers/2307.13383,8,0,0,0,0,0 +2023-07-26,2307.13692,,ARB: Advanced Reasoning Benchmark for Large Language Models,https://huggingface.co/papers/2307.13692,17,0,0,0,0,0 +2023-07-26,2307.13269,https://github.com/sail-sg/lorahub,LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition,https://huggingface.co/papers/2307.13269,31,2,1,0,0,1 +2023-07-25,2307.12698,,MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features,https://huggingface.co/papers/2307.12698,6,0,0,0,0,0 +2023-07-25,2307.12612,https://github.com/huawei-noah/noah-research,Less is More: Focus Attention for Efficient DETR,https://huggingface.co/papers/2307.12612,6,0,0,0,0,0 +2023-07-25,2307.12854,,Multiscale Video Pretraining for Long-Term Activity Forecasting,https://huggingface.co/papers/2307.12854,5,0,0,0,0,0 +2023-07-25,2307.12169,,Optimized Network Architectures for 
Large Language Model Training with Billions of Parameters,https://huggingface.co/papers/2307.12169,9,0,0,0,0,0 +2023-07-25,2307.11768,https://github.com/anthropics/decompositionfaithfulnesspaper,Question Decomposition Improves the Faithfulness of Model-Generated Reasoning,https://huggingface.co/papers/2307.11768,12,0,0,0,0,0 +2023-07-25,2307.12950,,RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment,https://huggingface.co/papers/2307.12950,9,0,0,0,0,0 +2023-07-25,2307.12976,https://github.com/edenbiran/rippleedits,Evaluating the Ripple Effects of Knowledge Editing in Language Models,https://huggingface.co/papers/2307.12976,11,0,0,0,0,0 +2023-07-25,2307.12533,https://github.com/secretflow/spu/tree/main/examples/python/ml/flax_llama7b,PUMA: Secure Inference of LLaMA-7B in Five Minutes,https://huggingface.co/papers/2307.12533,13,0,0,0,0,0 +2023-07-25,2307.11795,,Prompting Large Language Models with Speech Recognition Abilities,https://huggingface.co/papers/2307.11795,16,1,0,0,0,0 +2023-07-25,2307.12560,,Interpolating between Images with Diffusion Models,https://huggingface.co/papers/2307.12560,19,0,0,0,0,0 +2023-07-25,2307.12856,,"A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis",https://huggingface.co/papers/2307.12856,35,1,0,0,0,0 +2023-07-25,2307.12981,,3D-LLM: Injecting the 3D World into Large Language Models,https://huggingface.co/papers/2307.12981,35,3,0,0,1,0 +2023-07-24,2307.11118,https://github.com/sWizad/momentum-diffusion,Diffusion Sampling with Momentum for Mitigating Divergence Artifacts,https://huggingface.co/papers/2307.11118,7,0,1,0,0,1 +2023-07-24,2307.11418,,FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields,https://huggingface.co/papers/2307.11418,7,0,0,0,0,0 +2023-07-24,2307.11526,https://github.com/luo-ziyuan/CopyRNeRF-code,CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields,https://huggingface.co/papers/2307.11526,11,1,0,0,0,0 
+2023-07-24,2307.11410,https://github.com/OPPO-Mente-Lab/Subject-Diffusion,Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning,https://huggingface.co/papers/2307.11410,15,0,1,0,0,0 +2023-07-21,2307.10558,,Instruction-following Evaluation through Verbalizer Manipulation,https://huggingface.co/papers/2307.10558,3,0,0,0,0,0 +2023-07-21,2307.10635,https://github.com/mandyyyyii/scibench,SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models,https://huggingface.co/papers/2307.10635,7,0,1,0,1,0 +2023-07-21,2307.10936,,PASTA: Pretrained Action-State Transformer Agents,https://huggingface.co/papers/2307.10936,9,0,0,0,0,0 +2023-07-21,2307.10928,https://github.com/kaistai/flask,FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets,https://huggingface.co/papers/2307.10928,11,2,0,0,0,0 +2023-07-21,2307.10350,,Improving Multimodal Datasets with Image Captioning,https://huggingface.co/papers/2307.10350,9,0,0,0,2,0 +2023-07-21,2307.10907,https://github.com/apple/ml-entropy-reconstruction,The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning,https://huggingface.co/papers/2307.10907,7,0,0,0,0,0 +2023-07-21,2307.10802,https://github.com/invictus717/MetaTransformer,Meta-Transformer: A Unified Framework for Multimodal Learning,https://huggingface.co/papers/2307.10802,41,3,1,1,0,0 +2023-07-21,2307.11078,,Brain2Music: Reconstructing Music from Human Brain Activity,https://huggingface.co/papers/2307.11078,40,0,0,0,0,0 +2023-07-21,2307.10373,,TokenFlow: Consistent Diffusion Features for Consistent Video Editing,https://huggingface.co/papers/2307.10373,55,5,0,0,0,3 +2023-07-20,2307.09638,https://github.com/chandar-lab/cmoptimizer,Promoting Exploration in Memory-Augmented Adam using Critical Momenta,https://huggingface.co/papers/2307.09638,2,0,0,0,0,0 +2023-07-20,2307.10173,https://github.com/DNA-Rendering/DNA-Rendering,DNA-Rendering: A Diverse Neural Actor 
Repository for High-Fidelity Human-centric Rendering,https://huggingface.co/papers/2307.10173,5,0,0,0,0,0 +2023-07-20,2307.10168,,LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs,https://huggingface.co/papers/2307.10168,9,0,0,0,0,0 +2023-07-20,2307.10088,https://github.com/google-research/google-research,Android in the Wild: A Large-Scale Dataset for Android Device Control,https://huggingface.co/papers/2307.10088,10,1,0,0,1,0 +2023-07-20,2307.09668,,Towards A Unified Agent with Foundation Models,https://huggingface.co/papers/2307.09668,12,0,0,0,0,1 +2023-07-20,2307.10172,https://github.com/salesforce/DialogStudio,DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI,https://huggingface.co/papers/2307.10172,11,0,1,5,2,2 +2023-07-20,2307.09781,,Text2Layer: Layered Image Generation using Latent Diffusion Model,https://huggingface.co/papers/2307.09781,13,0,0,0,0,0 +2023-07-20,2307.10159,https://github.com/sd-fabric/fabric,FABRIC: Personalizing Diffusion Models with Iterative Feedback,https://huggingface.co/papers/2307.10159,30,1,0,0,0,7 +2023-07-20,2307.10169,,Challenges and Applications of Large Language Models,https://huggingface.co/papers/2307.10169,47,2,0,0,0,0 +2023-07-20,2307.09793,,"On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models",https://huggingface.co/papers/2307.09793,46,8,0,0,0,1 +2023-07-19,2307.09320,,Biomaker CA: a Biome Maker project using Cellular Automata,https://huggingface.co/papers/2307.09320,3,0,0,0,0,0 +2023-07-19,2307.09233,,Augmenting CLIP with Improved Visio-Linguistic Reasoning,https://huggingface.co/papers/2307.09233,7,0,0,0,0,0 +2023-07-19,2307.09458,,Does Circuit Analysis Interpretability Scale? 
Evidence from Multiple Choice Capabilities in Chinchilla,https://huggingface.co/papers/2307.09458,10,0,0,0,0,0 +2023-07-19,2307.09112,https://github.com/sail-sg/numcc,NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF,https://huggingface.co/papers/2307.09112,8,0,0,0,0,0 +2023-07-19,2307.09009,https://github.com/lchen001/llmdrift,How is ChatGPT's behavior changing over time?,https://huggingface.co/papers/2307.09009,23,6,0,0,0,0 +2023-07-19,2307.09288,https://github.com/facebookresearch/llama,Llama 2: Open Foundation and Fine-Tuned Chat Models,https://huggingface.co/papers/2307.09288,238,19,1,100,10,100 +2023-07-18,2307.07947,https://github.com/Ariostgx/lctgen,Language Conditioned Traffic Generation,https://huggingface.co/papers/2307.07947,4,0,0,0,0,0 +2023-07-18,2307.08579,https://github.com/afeng-x/smt,Scale-Aware Modulation Meet Transformer,https://huggingface.co/papers/2307.08579,4,0,0,0,0,0 +2023-07-18,2307.08506,,Does Visual Pretraining Help End-to-End Reasoning?,https://huggingface.co/papers/2307.08506,6,0,0,0,0,0 +2023-07-18,2307.07663,,INVE: Interactive Neural Video Editing,https://huggingface.co/papers/2307.07663,8,0,0,0,0,0 +2023-07-18,2307.08041,https://github.com/ailab-cvc/seed,Planting a SEED of Vision in Large Language Model,https://huggingface.co/papers/2307.08041,9,1,1,0,0,0 +2023-07-18,2307.07635,https://github.com/facebookresearch/co-tracker,CoTracker: It is Better to Track Together,https://huggingface.co/papers/2307.07635,10,0,1,1,0,1 +2023-07-18,2307.08702,,Diffusion Models Beat GANs on Image Classification,https://huggingface.co/papers/2307.08702,17,1,0,0,0,0 +2023-07-18,2307.08701,https://github.com/gpt4life/alpagasus,AlpaGasus: Training A Better Alpaca with Fewer Data,https://huggingface.co/papers/2307.08701,21,0,1,9,3,0 +2023-07-18,2307.08581,,BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs,https://huggingface.co/papers/2307.08581,27,0,0,0,0,1 +2023-07-18,2307.08674,,"TableGPT: Towards Unifying Tables, 
Nature Language and Commands into One GPT",https://huggingface.co/papers/2307.08674,47,5,0,0,0,0 +2023-07-18,2307.08621,https://github.com/microsoft/unilm,Retentive Network: A Successor to Transformer for Large Language Models,https://huggingface.co/papers/2307.08621,169,34,1,9,0,2 +2023-07-17,2307.07511,,NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis,https://huggingface.co/papers/2307.07511,5,0,0,0,0,0 +2023-07-17,2307.07047,,DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations,https://huggingface.co/papers/2307.07047,15,0,0,0,1,0 +2023-07-17,2307.07487,,DreamTeacher: Pretraining Image Backbones with Deep Generative Models,https://huggingface.co/papers/2307.07487,19,0,0,0,0,0 +2023-07-17,2307.07164,https://github.com/microsoft/lmops,Learning to Retrieve In-Context Examples for Large Language Models,https://huggingface.co/papers/2307.07164,21,0,0,0,0,0 +2023-07-17,2307.07218,,Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts,https://huggingface.co/papers/2307.07218,26,10,0,0,0,0 +2023-07-17,2307.06962,https://github.com/um2ii/mist_paper,Copy Is All You Need,https://huggingface.co/papers/2307.06962,33,3,0,0,0,0 +2023-07-14,2307.06350,,T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation,https://huggingface.co/papers/2307.06350,6,0,0,0,0,0 +2023-07-14,2307.06908,https://github.com/ai21labs/factor,Generating Benchmarks for Factuality Evaluation of Language Models,https://huggingface.co/papers/2307.06908,7,0,0,0,0,0 +2023-07-14,2307.06940,https://github.com/videocrafter/animate-a-story,Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation,https://huggingface.co/papers/2307.06940,9,0,0,0,0,0 +2023-07-14,2307.06857,,Self-consistency for open-ended generations,https://huggingface.co/papers/2307.06857,9,0,0,0,0,0 +2023-07-14,2307.06439,,Distilling Large Language Models for Biomedical Knowledge Extraction: A 
Case Study on Adverse Drug Events,https://huggingface.co/papers/2307.06439,9,1,0,0,0,0 +2023-07-14,2307.06925,,Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models,https://huggingface.co/papers/2307.06925,10,0,0,0,0,0 +2023-07-14,2307.06942,https://github.com/opengvlab/internvideo,InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation,https://huggingface.co/papers/2307.06942,21,0,1,1,3,0 +2023-07-14,2307.06945,https://github.com/getao/icae,In-context Autoencoder for Context Compression in a Large Language Model,https://huggingface.co/papers/2307.06945,26,0,1,0,0,0 +2023-07-14,2307.06949,,HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models,https://huggingface.co/papers/2307.06949,50,6,0,0,0,0 +2023-07-13,2307.05973,https://github.com/huangwl18/voxposer,VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models,https://huggingface.co/papers/2307.05973,3,0,0,0,0,0 +2023-07-13,2307.05959,,Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations,https://huggingface.co/papers/2307.05959,2,0,0,0,0,0 +2023-07-13,2307.05741,,Towards Robust and Efficient Continual Language Learning,https://huggingface.co/papers/2307.05741,4,0,0,0,0,0 +2023-07-13,2307.05591,,SITTA: A Semantic Image-Text Alignment for Image Captioning,https://huggingface.co/papers/2307.05591,5,0,0,0,0,0 +2023-07-13,2307.06290,,Instruction Mining: High-Quality Instruction Data Selection for Large Language Models,https://huggingface.co/papers/2307.06290,9,0,0,0,0,0 +2023-07-13,2307.05628,,DNAGPT: A Generalized Pretrained Tool for Multiple DNA Sequence Analysis Tasks,https://huggingface.co/papers/2307.05628,8,0,0,0,0,0 +2023-07-13,2307.06135,,SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning,https://huggingface.co/papers/2307.06135,13,1,0,0,0,0 +2023-07-13,2307.05695,,Stack More Layers Differently: High-Rank Training 
Through Low-Rank Updates,https://huggingface.co/papers/2307.05695,22,0,0,0,0,1 +2023-07-13,2307.06304,,"Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution",https://huggingface.co/papers/2307.06304,25,2,0,7,0,100 +2023-07-13,2307.06018,,PolyLM: An Open Source Polyglot Large Language Model,https://huggingface.co/papers/2307.06018,25,3,0,8,0,3 +2023-07-12,2307.05014,,Test-Time Training on Video Streams,https://huggingface.co/papers/2307.05014,5,0,0,0,0,0 +2023-07-12,2307.05462,,Efficient 3D Articulated Human Generation with Layered Surface Volumes,https://huggingface.co/papers/2307.05462,7,0,0,0,0,0 +2023-07-12,2307.05454,https://github.com/google-research/multi-morph-checklist,Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features,https://huggingface.co/papers/2307.05454,6,0,0,0,0,0 +2023-07-12,2307.05463,https://github.com/facebookresearch/EgoVLPv2,EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone,https://huggingface.co/papers/2307.05463,10,0,0,0,0,0 +2023-07-12,2307.05432,https://github.com/facebookresearch/sslforpdes,Self-Supervised Learning with Lie Symmetries for Partial Differential Equations,https://huggingface.co/papers/2307.05432,13,1,1,0,0,0 +2023-07-12,2307.05473,,Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives,https://huggingface.co/papers/2307.05473,12,0,0,0,0,0 +2023-07-12,2307.05300,,Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration,https://huggingface.co/papers/2307.05300,18,0,0,0,0,0 +2023-07-12,2307.05445,https://github.com/snap-research/3dvader,AutoDecoding Latent 3D Diffusion Models,https://huggingface.co/papers/2307.05445,13,0,0,0,0,0 +2023-07-12,2307.05222,https://github.com/baaivision/emu,Generative Pretraining in Multimodality,https://huggingface.co/papers/2307.05222,21,0,0,1,0,0 +2023-07-12,2307.04964,https://github.com/openlmlab/moss-rlhf,Secrets of RLHF in 
Large Language Models Part I: PPO,https://huggingface.co/papers/2307.04964,27,1,1,4,0,0 +2023-07-12,2307.04787,,Collaborative Score Distillation for Consistent Visual Synthesis,https://huggingface.co/papers/2307.04787,27,0,0,0,0,0 +2023-07-11,2307.04577,,AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System,https://huggingface.co/papers/2307.04577,1,0,0,0,0,0 +2023-07-11,2307.04008,,Toward Interactive Dictation,https://huggingface.co/papers/2307.04008,3,0,0,0,0,0 +2023-07-11,2307.04751,,"Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement",https://huggingface.co/papers/2307.04751,3,0,0,0,0,0 +2023-07-11,2307.04087,https://github.com/baai-dcai/visual-instruction-tuning,SVIT: Scaling up Visual Instruction Tuning,https://huggingface.co/papers/2307.04087,6,0,1,0,3,0 +2023-07-11,2307.04349,https://github.com/zyq-scut/rltf,RLTF: Reinforcement Learning from Unit Test Feedback,https://huggingface.co/papers/2307.04349,4,0,1,0,0,0 +2023-07-11,2307.04603,https://github.com/kakaobrain/solvent,Solvent: A Framework for Protein Folding,https://huggingface.co/papers/2307.04603,5,0,0,0,0,0 +2023-07-11,2307.04699,,International Institutions for Advanced AI,https://huggingface.co/papers/2307.04699,4,0,0,0,0,0 +2023-07-11,2307.03917,,On decoder-only architecture for speech-to-text and large language model integration,https://huggingface.co/papers/2307.03917,6,0,0,0,0,0 +2023-07-11,2307.03875,,Large Language Models for Supply Chain Optimization,https://huggingface.co/papers/2307.03875,17,2,0,0,0,0 +2023-07-11,2307.04721,,Large Language Models as General Pattern Machines,https://huggingface.co/papers/2307.04721,14,1,0,0,0,0 +2023-07-11,2307.04767,https://github.com/ux-decoder/semantic-sam,Semantic-SAM: Segment and Recognize Anything at Any Granularity,https://huggingface.co/papers/2307.04767,20,1,0,0,0,0 +2023-07-11,2307.04686,https://github.com/hugofloresgarcia/vampnet,VampNet: Music Generation via Masked Acoustic Token 
Modeling,https://huggingface.co/papers/2307.04686,20,2,1,0,0,0 +2023-07-11,2307.03869,,Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation,https://huggingface.co/papers/2307.03869,22,1,0,0,0,0 +2023-07-11,2307.04725,https://github.com/guoyww/animatediff,AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning,https://huggingface.co/papers/2307.04725,64,7,1,1,0,13 +2023-07-10,2307.03718,,Frontier AI Regulation: Managing Emerging Risks to Public Safety,https://huggingface.co/papers/2307.03718,4,0,0,0,0,0 +2023-07-10,2307.03576,,One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention,https://huggingface.co/papers/2307.03576,6,0,0,0,0,0 +2023-07-10,2307.03659,,Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation,https://huggingface.co/papers/2307.03659,5,0,0,0,0,0 +2023-07-10,2307.03322,,BiPhone: Modeling Inter Language Phonetic Influences in Text,https://huggingface.co/papers/2307.03322,7,3,0,0,0,0 +2023-07-10,2307.03601,https://github.com/jshilong/gpt4roi,GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest,https://huggingface.co/papers/2307.03601,11,0,1,0,0,0 +2023-07-10,2307.03381,https://github.com/lee-ny/teaching_arithmetic,Teaching Arithmetic to Small Transformers,https://huggingface.co/papers/2307.03381,17,0,0,0,0,0 +2023-07-10,2307.03692,,Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning,https://huggingface.co/papers/2307.03692,24,3,0,0,0,0 +2023-07-07,2307.03166,https://github.com/tensorflow/models,VideoGLUE: Video General Understanding Evaluation of Foundation Models,https://huggingface.co/papers/2307.03166,5,0,0,0,0,0 +2023-07-07,2307.02628,,SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference,https://huggingface.co/papers/2307.02628,7,0,0,0,0,0 +2023-07-07,2307.03183,https://github.com/YuanGongND/whisper-at,Whisper-AT: 
Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers,https://huggingface.co/papers/2307.03183,8,0,1,0,0,1 +2023-07-07,2307.03170,https://github.com/cstankonrad/long_llama,Focused Transformer: Contrastive Training for Context Scaling,https://huggingface.co/papers/2307.03170,11,1,1,7,0,1 +2023-07-07,2307.02768,,"Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts",https://huggingface.co/papers/2307.02768,12,0,0,0,0,0 +2023-07-07,2307.02499,https://github.com/x-plug/mplug-docowl,mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding,https://huggingface.co/papers/2307.02499,15,1,1,0,0,0 +2023-07-07,2307.03172,https://github.com/nelson-liu/lost-in-the-middle,Lost in the Middle: How Language Models Use Long Contexts,https://huggingface.co/papers/2307.03172,34,3,0,13,1,31 +2023-07-07,2307.03109,https://github.com/mlgroupjlu/llm-eval-survey,A Survey on Evaluation of Large Language Models,https://huggingface.co/papers/2307.03109,38,1,1,0,0,0 +2023-07-06,2307.01831,,DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation,https://huggingface.co/papers/2307.01831,8,0,0,0,0,0 +2023-07-06,2307.01229,https://github.com/microsoft/muzic,EmoGen: Eliminating Subjective Bias in Emotional Music Generation,https://huggingface.co/papers/2307.01229,5,0,0,0,0,0 +2023-07-06,2307.01848,https://github.com/Gary3410/TaPA,Embodied Task Planning with Large Language Models,https://huggingface.co/papers/2307.01848,5,0,1,0,0,1 +2023-07-06,2307.02484,,Elastic Decision Transformer,https://huggingface.co/papers/2307.02484,5,0,0,0,0,0 +2023-07-06,2307.02179,,Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks,https://huggingface.co/papers/2307.02179,7,2,0,0,0,0 +2023-07-06,2307.02321,https://github.com/qualcomm-ai-research/batchshaping,MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers,https://huggingface.co/papers/2307.02321,7,0,0,0,0,0 
+2023-07-06,2307.01938,,Physics-based Motion Retargeting from Sparse Inputs,https://huggingface.co/papers/2307.01938,7,0,0,0,0,0 +2023-07-06,2307.01928,,Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners,https://huggingface.co/papers/2307.01928,10,1,0,0,0,0 +2023-07-06,2307.02483,,Jailbroken: How Does LLM Safety Training Fail?,https://huggingface.co/papers/2307.02483,13,0,0,0,0,2 +2023-07-06,2307.02485,,Building Cooperative Embodied Agents Modularly with Large Language Models,https://huggingface.co/papers/2307.02485,11,0,0,0,0,0 +2023-07-06,2307.02469,,What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?,https://huggingface.co/papers/2307.02469,12,0,0,0,0,0 +2023-07-06,2307.02053,https://github.com/declare-lab/flacuna,Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning,https://huggingface.co/papers/2307.02053,23,1,1,1,1,1 +2023-07-06,2307.02421,https://github.com/mc-e/dragondiffusion,DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models,https://huggingface.co/papers/2307.02421,34,5,1,1,0,2 +2023-07-06,2307.01952,https://github.com/stability-ai/generative-models,SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis,https://huggingface.co/papers/2307.01952,78,6,1,51,0,100 +2023-07-06,2307.02486,https://github.com/microsoft/unilm,"LongNet: Scaling Transformers to 1,000,000,000 Tokens",https://huggingface.co/papers/2307.02486,80,15,1,1,0,0 +2023-07-04,2307.00119,,Meta-training with Demonstration Retrieval for Efficient Few-shot Learning,https://huggingface.co/papers/2307.00119,5,0,0,0,0,0 +2023-07-04,2307.00804,,SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling,https://huggingface.co/papers/2307.00804,5,2,0,0,0,0 +2023-07-04,2307.00117,,Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control,https://huggingface.co/papers/2307.00117,6,0,0,0,0,0 
+2023-07-04,2307.01163,,Improving Language Plasticity via Pretraining with Active Forgetting,https://huggingface.co/papers/2307.01163,6,0,0,0,0,0 +2023-07-04,2307.01097,https://github.com/Tangshitao/MVDiffusion,MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion,https://huggingface.co/papers/2307.01097,8,0,1,0,0,0 +2023-07-04,2307.01200,,Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning,https://huggingface.co/papers/2307.01200,9,0,0,0,0,0 +2023-07-04,2307.00716,,JourneyDB: A Benchmark for Generative Image Understanding,https://huggingface.co/papers/2307.00716,18,0,0,0,5,0 +2023-07-04,2307.00184,https://github.com/google-research/google-research,Personality Traits in Large Language Models,https://huggingface.co/papers/2307.00184,20,0,0,0,0,0 +2023-07-04,2307.00040,,DisCo: Disentangled Control for Referring Human Dance Generation in Real World,https://huggingface.co/papers/2307.00040,24,2,0,0,0,0 +2023-07-04,2307.00522,,LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance,https://huggingface.co/papers/2307.00522,29,1,0,0,0,8 +2023-07-04,2307.01197,https://github.com/syscv/sam-pt,Segment Anything Meets Point Tracking,https://huggingface.co/papers/2307.01197,35,2,0,0,0,0 +2023-07-03,2306.17319,https://github.com/google-research/deeplab2,ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation,https://huggingface.co/papers/2306.17319,2,0,0,0,0,0 +2023-07-03,2306.17759,,The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit,https://huggingface.co/papers/2306.17759,3,0,0,0,0,0 +2023-07-03,2306.17492,https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/PRO,Preference Ranking Optimization for Human Alignment,https://huggingface.co/papers/2306.17492,6,4,0,2,0,0 +2023-07-03,2306.17194,https://github.com/azshue/autopoison,On the Exploitability of Instruction Tuning,https://huggingface.co/papers/2306.17194,9,0,0,0,0,0 
+2023-07-03,2306.17848,,Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing,https://huggingface.co/papers/2306.17848,8,0,0,0,2,0 +2023-07-03,2306.17563,,Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting,https://huggingface.co/papers/2306.17563,9,0,0,0,0,0 +2023-07-03,2306.17582,https://github.com/microsoft/promptcraft-robotics,ChatGPT for Robotics: Design Principles and Model Abilities,https://huggingface.co/papers/2306.17582,10,0,1,0,0,0 +2023-07-03,2306.17840,https://github.com/ripl/statler,Statler: State-Maintaining Language Models for Embodied Reasoning,https://huggingface.co/papers/2306.17840,11,0,0,0,0,0 +2023-07-03,2306.17806,,Stay on topic with Classifier-Free Guidance,https://huggingface.co/papers/2306.17806,27,3,0,0,0,0 +2023-07-03,2306.17843,https://github.com/guochengqian/magic123,Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors,https://huggingface.co/papers/2306.17843,42,4,0,0,0,0 +2023-06-30,2306.16564,,Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision,https://huggingface.co/papers/2306.16564,3,1,0,0,0,0 +2023-06-30,2306.16940,https://github.com/pixelite1201/BEDLAM,BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion,https://huggingface.co/papers/2306.16940,5,0,0,0,0,0 +2023-06-30,2306.16601,https://github.com/intel/intel-extension-for-transformers,An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs,https://huggingface.co/papers/2306.16601,4,0,1,0,0,0 +2023-06-30,2306.16700,,Dynamic-Resolution Model Learning for Object Pile Manipulation,https://huggingface.co/papers/2306.16700,5,0,0,0,0,0 +2023-06-30,2306.16869,,NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes,https://huggingface.co/papers/2306.16869,5,0,0,0,0,0 +2023-06-30,2306.16857,,ArrayBot: Reinforcement Learning for 
Generalizable Distributed Manipulation through Touch,https://huggingface.co/papers/2306.16857,5,0,0,0,0,0 +2023-06-30,2306.16793,,Benchmarking Large Language Model Capabilities for Conditional Generation,https://huggingface.co/papers/2306.16793,7,0,0,0,0,0 +2023-06-30,2306.17115,https://github.com/neuralcarver/michelangelo,Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation,https://huggingface.co/papers/2306.17115,11,0,1,1,0,1 +2023-06-30,2306.17107,https://github.com/SALT-NLP/LLaVAR,LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding,https://huggingface.co/papers/2306.17107,12,2,1,1,1,2 +2023-06-30,2306.16527,https://github.com/huggingface/obelics,OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents,https://huggingface.co/papers/2306.16527,44,4,1,11,1,100 +2023-06-30,2306.17156,,"Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors",https://huggingface.co/papers/2306.17156,21,1,0,0,0,0 +2023-06-30,2306.17154,,Generate Anything Anywhere in Any Scene,https://huggingface.co/papers/2306.17154,22,3,0,0,0,0 +2023-06-30,2306.16934,https://github.com/bbaaii/DreamDiffusion,DreamDiffusion: Generating High-Quality Images from Brain EEG Signals,https://huggingface.co/papers/2306.16934,30,3,1,0,0,0 +2023-06-30,2306.16928,,One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization,https://huggingface.co/papers/2306.16928,37,6,0,3,0,11 +2023-06-29,2306.16009,,Accelerating Transducers through Adjacent Token Merging,https://huggingface.co/papers/2306.16009,2,0,0,0,0,0 +2023-06-29,2306.16052,,SVNR: Spatially-variant Noise Removal with Denoising Diffusion,https://huggingface.co/papers/2306.16052,6,0,0,0,0,0 +2023-06-29,2306.15724,https://github.com/real-stanford/reflect,REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction,https://huggingface.co/papers/2306.15724,5,0,0,0,0,0 
+2023-06-29,2306.16388,,Towards Measuring the Representation of Subjective Global Opinions in Language Models,https://huggingface.co/papers/2306.16388,6,0,0,0,1,0 +2023-06-29,2306.15794,https://github.com/HazyResearch/hyena-dna,HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution,https://huggingface.co/papers/2306.15794,17,2,1,14,0,0 +2023-06-29,2306.16410,https://github.com/contextualai/lens,Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language,https://huggingface.co/papers/2306.16410,27,5,1,0,0,0 +2023-06-28,2306.15447,,Are aligned neural networks adversarially aligned?,https://huggingface.co/papers/2306.15447,5,0,0,0,0,0 +2023-06-28,2306.15400,,Length Generalization in Arithmetic Transformers,https://huggingface.co/papers/2306.15400,4,0,0,0,0,0 +2023-06-28,2306.15091,,Understanding In-Context Learning via Supportive Pretraining Data,https://huggingface.co/papers/2306.15091,6,1,0,0,0,0 +2023-06-28,2306.15667,,PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment,https://huggingface.co/papers/2306.15667,7,0,0,0,0,0 +2023-06-28,2306.15354,https://github.com/alibaba-damo-academy/3D-Speaker,"3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement",https://huggingface.co/papers/2306.15354,7,0,0,0,0,0 +2023-06-28,2306.15128,https://github.com/raivnlab/mimic,MIMIC: Masked Image Modeling with Image Correspondences,https://huggingface.co/papers/2306.15128,7,0,1,0,0,0 +2023-06-28,2306.15626,https://github.com/lean-dojo/leandojo,LeanDojo: Theorem Proving with Retrieval-Augmented Language Models,https://huggingface.co/papers/2306.15626,16,0,0,0,0,0 +2023-06-28,2306.15658,https://github.com/ucsc-vlaa/clipa,"CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy",https://huggingface.co/papers/2306.15658,12,1,0,7,0,0 +2023-06-28,2306.15595,,Extending
Context Window of Large Language Models via Positional Interpolation,https://huggingface.co/papers/2306.15595,53,6,0,19,0,22 +2023-06-27,2306.14066,,SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models,https://huggingface.co/papers/2306.14066,1,0,0,0,0,0 +2023-06-27,2306.14896,https://github.com/NVlabs/RVT,RVT: Robotic View Transformer for 3D Object Manipulation,https://huggingface.co/papers/2306.14896,2,0,0,0,0,0 +2023-06-27,2306.14878,https://github.com/newbeeer/diffusion_restart_sampling,Restart Sampling for Improving Generative Processes,https://huggingface.co/papers/2306.14878,5,0,1,0,0,0 +2023-06-27,2306.13754,,Zero-shot spatial layout conditioning for text-to-image diffusion models,https://huggingface.co/papers/2306.13754,6,1,0,0,0,0 +2023-06-27,2306.14153,,DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data,https://huggingface.co/papers/2306.14153,6,0,0,0,0,0 +2023-06-27,2306.14846,,ViNT: A Foundation Model for Visual Navigation,https://huggingface.co/papers/2306.14846,5,0,0,0,0,0 +2023-06-27,2306.13776,,Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window,https://huggingface.co/papers/2306.13776,5,0,0,0,0,0 +2023-06-27,2306.14565,,Aligning Large Multi-Modal Model with Robust Instruction Tuning,https://huggingface.co/papers/2306.14565,6,0,0,0,2,0 +2023-06-27,2306.14447,,RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools,https://huggingface.co/papers/2306.14447,6,0,0,0,0,0 +2023-06-27,2306.14035,,Thinking Like an Annotator: Generation of Dataset Labeling Instructions,https://huggingface.co/papers/2306.14035,8,1,0,0,0,0 +2023-06-27,2306.14048,,H_2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models,https://huggingface.co/papers/2306.14048,11,1,0,0,0,0 +2023-06-27,2306.14892,,Supervised Pretraining Can Learn In-Context Reinforcement Learning,https://huggingface.co/papers/2306.14892,8,0,0,0,0,0 
+2023-06-27,2306.14101,,Language models are weak learners,https://huggingface.co/papers/2306.14101,10,0,0,0,0,0 +2023-06-27,2306.13840,,Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data,https://huggingface.co/papers/2306.13840,11,1,0,0,0,0 +2023-06-27,2306.14289,https://github.com/chaoningzhang/mobilesam,Faster Segment Anything: Towards Lightweight SAM for Mobile Applications,https://huggingface.co/papers/2306.14289,15,1,1,2,0,1 +2023-06-27,2306.14435,https://github.com/Yujun-Shi/DragDiffusion,DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing,https://huggingface.co/papers/2306.14435,20,5,1,0,0,1 +2023-06-27,2306.14795,https://github.com/openmotionlab/motiongpt,MotionGPT: Human Motion as a Foreign Language,https://huggingface.co/papers/2306.14795,27,2,1,1,0,4 +2023-06-27,2306.14824,https://github.com/microsoft/unilm/tree/master/kosmos-2,Kosmos-2: Grounding Multimodal Large Language Models to the World,https://huggingface.co/papers/2306.14824,34,9,0,0,1,8 +2023-06-26,2306.13455,https://github.com/zjy526223908/dreameditor,DreamEditor: Text-Driven 3D Scene Editing with Neural Fields,https://huggingface.co/papers/2306.13455,8,0,1,0,0,0 +2023-06-26,2306.13631,https://github.com/OpenMask3D/openmask3d,OpenMask3D: Open-Vocabulary 3D Instance Segmentation,https://huggingface.co/papers/2306.13631,9,0,0,0,0,0 +2023-06-26,2306.13649,,GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models,https://huggingface.co/papers/2306.13649,12,1,0,1,0,0 +2023-06-26,2306.13588,https://github.com/yyy-apple/sys-nl-feedback,System-Level Natural Language Feedback,https://huggingface.co/papers/2306.13588,10,0,0,0,0,0 +2023-06-26,2306.13651,https://github.com/neelsjain/byod,Bring Your Own Data! 
Self-Supervised Evaluation for Large Language Models,https://huggingface.co/papers/2306.13651,15,0,1,0,0,0 +2023-06-26,2306.13575,https://github.com/gregorbachmann/scaling_mlps,Scaling MLPs: A Tale of Inductive Bias,https://huggingface.co/papers/2306.13575,14,0,0,0,0,0 +2023-06-26,2306.13421,,Long-range Language Modeling with Self-retrieval,https://huggingface.co/papers/2306.13421,16,0,0,0,0,0 +2023-06-23,2306.13078,,Continuous Layout Editing of Single Images with Diffusion Models,https://huggingface.co/papers/2306.13078,7,0,0,0,0,0 +2023-06-23,2306.12760,https://github.com/orig333/Blended-NeRF,Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields,https://huggingface.co/papers/2306.12760,8,0,0,0,0,0 +2023-06-23,2306.10008,https://github.com/fahadshamshad/clip2protect,CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search,https://huggingface.co/papers/2306.10008,8,0,0,0,0,0 +2023-06-23,2306.12929,,Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing,https://huggingface.co/papers/2306.12929,12,0,0,0,0,0 +2023-06-23,2306.12509,,Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference,https://huggingface.co/papers/2306.12509,14,0,0,0,0,0 +2023-06-23,2306.12672,https://github.com/gabegrand/world-models,From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought,https://huggingface.co/papers/2306.12672,25,1,0,0,0,0 +2023-06-23,2306.12925,,AudioPaLM: A Large Language Model That Can Speak and Listen,https://huggingface.co/papers/2306.12925,49,5,0,0,0,0 +2023-06-22,2306.12059,https://github.com/atomicarchitects/equiformer_v2,EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations,https://huggingface.co/papers/2306.12059,5,0,1,0,0,0 +2023-06-22,2306.11932,,Opportunities and Risks of LLMs for Scalable Deliberation with 
Polis,https://huggingface.co/papers/2306.11932,6,0,0,0,0,0 +2023-06-22,2306.12422,,DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation,https://huggingface.co/papers/2306.12422,12,1,0,0,0,0 +2023-06-22,2306.11987,https://github.com/xijiu9/Train_Transformers_with_INT4,Training Transformers with 4-bit Integers,https://huggingface.co/papers/2306.11987,21,5,0,0,0,0 +2023-06-22,2306.12156,https://github.com/casia-iva-lab/fastsam,Fast Segment Anything,https://huggingface.co/papers/2306.12156,34,3,1,4,0,21 +2023-06-21,2306.10785,,Multitrack Music Transcription with a Time-Frequency Perceiver,https://huggingface.co/papers/2306.10785,5,0,0,0,0,0 +2023-06-21,2306.10968,https://github.com/ictnlp/bayling,BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models,https://huggingface.co/papers/2306.10968,7,0,1,3,0,0 +2023-06-21,2306.10763,https://github.com/microsoft/monitors4codegen,Guiding Language Models of Code with Global Context using Monitors,https://huggingface.co/papers/2306.10763,7,2,1,0,0,0 +2023-06-21,2306.10169,https://github.com/danielchyeh/this-is-my,Meta-Personalizing Vision-Language Models to Find Named Instances in Video,https://huggingface.co/papers/2306.10169,6,0,0,0,0,0 +2023-06-21,2306.11719,,Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision,https://huggingface.co/papers/2306.11719,7,1,0,0,0,0 +2023-06-21,2306.11706,,RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation,https://huggingface.co/papers/2306.11706,7,1,0,0,0,0 +2023-06-21,2306.10231,,GLIMMER: generalized late-interaction memory reranker,https://huggingface.co/papers/2306.10231,7,0,0,0,0,0 +2023-06-21,2306.10533,https://github.com/NVlabs/sds-complete,Point-Cloud Completion with Pretrained Text-to-image Diffusion Models,https://huggingface.co/papers/2306.10533,7,0,0,0,0,0 +2023-06-21,2306.11698,,DecodingTrust: A Comprehensive Assessment of 
Trustworthiness in GPT Models,https://huggingface.co/papers/2306.11698,11,0,0,0,1,0 +2023-06-21,2306.10998,,RepoFusion: Training Code Models to Understand Your Repository,https://huggingface.co/papers/2306.10998,14,0,0,0,1,0 +2023-06-21,2306.11565,,HomeRobot: Open-Vocabulary Mobile Manipulation,https://huggingface.co/papers/2306.11565,15,0,0,0,0,0 +2023-06-21,2306.10900,,MotionGPT: Finetuned LLMs are General-Purpose Motion Generators,https://huggingface.co/papers/2306.10900,18,1,0,0,0,0 +2023-06-21,2306.11644,,Textbooks Are All You Need,https://huggingface.co/papers/2306.11644,141,14,0,1,13,0 +2023-06-19,2306.09557,,CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller,https://huggingface.co/papers/2306.09557,3,0,0,0,0,0 +2023-06-19,2306.09442,https://github.com/thestephencasper/explore_establish_exploit_llms,"Explore, Establish, Exploit: Red Teaming Language Models from Scratch",https://huggingface.co/papers/2306.09442,6,1,0,0,1,0 +2023-06-19,2306.09635,,CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models,https://huggingface.co/papers/2306.09635,6,0,0,0,0,0 +2023-06-19,2306.09682,,OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning,https://huggingface.co/papers/2306.09682,6,0,0,0,1,0 +2023-06-19,2306.09479,,Inverse Scaling: When Bigger Isn't Better,https://huggingface.co/papers/2306.09479,8,0,0,0,0,0 +2023-06-19,2306.09539,,Block-State Transformer,https://huggingface.co/papers/2306.09539,8,0,0,0,0,0 +2023-06-19,2306.09683,https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit,Scaling Open-Vocabulary Object Detection,https://huggingface.co/papers/2306.09683,12,0,0,13,0,25 +2023-06-19,2306.10007,,Robot Learning with Sensorimotor Pre-training,https://huggingface.co/papers/2306.10007,11,0,0,0,0,0 +2023-06-19,2306.09864,,AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation,https://huggingface.co/papers/2306.09864,14,1,0,0,0,0 
+2023-06-19,2306.09896,,Demystifying GPT Self-Repair for Code Generation,https://huggingface.co/papers/2306.09896,19,1,0,0,0,1 +2023-06-19,2306.09782,https://github.com/openlmlab/lomo,Full Parameter Fine-tuning for Large Language Models with Limited Resources,https://huggingface.co/papers/2306.09782,28,2,1,0,0,0 +2023-06-19,2306.10012,https://github.com/osu-nlp-group/magicbrush,MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing,https://huggingface.co/papers/2306.10012,34,6,1,0,0,0 +2023-06-16,2306.08055,,Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training,https://huggingface.co/papers/2306.08055,3,0,0,0,0,0 +2023-06-16,2306.08651,,Toward Grounded Social Reasoning,https://huggingface.co/papers/2306.08651,3,0,0,0,0,0 +2023-06-16,2306.09322,,Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient,https://huggingface.co/papers/2306.09322,3,0,0,0,0,0 +2023-06-16,2306.08129,,AVIS: Autonomous Visual Information Seeking with Large Language Models,https://huggingface.co/papers/2306.08129,5,0,0,0,0,0 +2023-06-16,2306.08133,,Large-scale Language Model Rescoring on Long-form Data,https://huggingface.co/papers/2306.08133,4,0,0,0,0,0 +2023-06-16,2306.09109,,NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations,https://huggingface.co/papers/2306.09109,4,0,0,0,0,0 +2023-06-16,2306.09349,,UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video,https://huggingface.co/papers/2306.09349,5,0,0,0,0,0 +2023-06-16,2306.08707,,VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing,https://huggingface.co/papers/2306.08707,6,0,0,0,0,0 +2023-06-16,2306.08068,,DORSal: Diffusion for Object-centric Representations of Scenes et al.,https://huggingface.co/papers/2306.08068,6,0,0,0,0,0 +2023-06-16,2306.08893,https://github.com/orrzohar/lovm,LOVM: Language-Only Vision Model Selection,https://huggingface.co/papers/2306.08893,7,0,0,0,0,0 
+2023-06-16,2306.09327,,Language-Guided Music Recommendation for Video via Prompt Analogies,https://huggingface.co/papers/2306.09327,8,0,0,0,0,0 +2023-06-16,2306.09200,https://github.com/waterhorse1/chessgpt,ChessGPT: Bridging Policy Learning and Language Modeling,https://huggingface.co/papers/2306.09200,9,0,1,3,1,3 +2023-06-16,2306.08620,,Anticipatory Music Transformer,https://huggingface.co/papers/2306.08620,9,0,0,11,0,0 +2023-06-16,2306.09316,,Diffusion Models for Zero-Shot Open-Vocabulary Segmentation,https://huggingface.co/papers/2306.09316,9,0,0,0,0,0 +2023-06-16,2306.08997,,Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models,https://huggingface.co/papers/2306.08997,10,2,0,0,0,0 +2023-06-16,2306.08647,,Language to Rewards for Robotic Skill Synthesis,https://huggingface.co/papers/2306.08647,12,0,0,0,0,0 +2023-06-16,2306.08205,,Agile Catching with Whole-Body MPC and Blackbox Policy Learning,https://huggingface.co/papers/2306.08205,9,1,0,0,0,0 +2023-06-16,2306.09093,https://github.com/lyuchenyang/macaw-llm,"Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration",https://huggingface.co/papers/2306.09093,15,4,0,0,0,0 +2023-06-16,2306.09329,,DreamHuman: Animatable 3D Avatars from Text,https://huggingface.co/papers/2306.09329,15,2,0,0,0,0 +2023-06-16,2306.08161,https://github.com/h2oai/h2o-llmstudio,h2oGPT: Democratizing Large Language Models,https://huggingface.co/papers/2306.08161,18,3,1,1,0,16 +2023-06-16,2306.08543,,Knowledge Distillation of Large Language Models,https://huggingface.co/papers/2306.08543,18,0,0,0,0,0 +2023-06-16,2306.09296,https://github.com/thu-keg/kola,KoLA: Carefully Benchmarking World Knowledge of Large Language Models,https://huggingface.co/papers/2306.09296,19,0,0,0,0,0 +2023-06-16,2306.08568,https://github.com/nlpxucan/wizardlm,WizardCoder: Empowering Code Large Language Models with Evol-Instruct,https://huggingface.co/papers/2306.08568,28,1,1,100,5,100 
+2023-06-16,2306.08640,,"AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn",https://huggingface.co/papers/2306.08640,26,2,0,0,0,0 +2023-06-16,2306.09348,,Seeing the World through Your Eyes,https://huggingface.co/papers/2306.09348,31,1,0,0,0,0 +2023-06-16,2306.08276,,TryOnDiffusion: A Tale of Two UNets,https://huggingface.co/papers/2306.08276,71,6,0,0,0,0 +2023-06-14,2306.07946,,STUDY: Socially Aware Temporally Casual Decoder Recommender Systems,https://huggingface.co/papers/2306.07946,1,0,0,0,0,0 +2023-06-14,2306.07941,,GPT-Calls: Enhancing Call Segmentation and Tagging by Generating Synthetic Conversations via Large Language Models,https://huggingface.co/papers/2306.07941,3,0,0,0,0,0 +2023-06-14,2306.07437,https://github.com/TimoBolkart/TEMPEH,Instant Multi-View Head Capture through Learnable Registration,https://huggingface.co/papers/2306.07437,3,0,0,0,0,0 +2023-06-14,2306.07552,https://github.com/facebookresearch/galactic,Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second,https://huggingface.co/papers/2306.07552,3,0,0,0,0,0 +2023-06-14,2306.07969,,GeneCIS: A Benchmark for General Conditional Image Similarity,https://huggingface.co/papers/2306.07969,4,0,0,0,0,0 +2023-06-14,2306.07944,,Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding,https://huggingface.co/papers/2306.07944,5,0,0,0,0,0 +2023-06-14,2306.07473,https://github.com/genentech/voxmol,3D molecule generation by denoising voxel grids,https://huggingface.co/papers/2306.07473,5,0,0,0,0,0 +2023-06-14,2306.07968,https://github.com/caml-lab/research/tree/main/arxiveri,arXiVeri: Automatic table verification with GPT,https://huggingface.co/papers/2306.07968,6,0,0,0,0,0 +2023-06-14,2306.07970,https://github.com/zju3dv/NeuSC,Neural Scene Chronology,https://huggingface.co/papers/2306.07970,6,0,0,0,0,0 +2023-06-14,2306.07580,,SayTap: Language to Quadrupedal 
Locomotion,https://huggingface.co/papers/2306.07580,7,0,0,0,0,0 +2023-06-14,2306.07349,,ATT3D: Amortized Text-to-3D Object Synthesis,https://huggingface.co/papers/2306.07349,9,1,0,0,0,0 +2023-06-14,2306.07915,https://github.com/google-research/big_vision,Image Captioners Are Scalable Vision Learners Too,https://huggingface.co/papers/2306.07915,10,0,0,0,0,0 +2023-06-14,2306.07536,,TART: A plug-and-play Transformer module for task-agnostic reasoning,https://huggingface.co/papers/2306.07536,10,0,0,0,0,0 +2023-06-14,2306.07906,https://github.com/thudm/webglm,WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences,https://huggingface.co/papers/2306.07906,13,0,1,2,1,2 +2023-06-14,2306.07476,,AniFaceDrawing: Anime Portrait Exploration during Your Sketching,https://huggingface.co/papers/2306.07476,17,1,0,0,0,0 +2023-06-14,2306.07967,https://github.com/arnav0400/vit-slim,One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning,https://huggingface.co/papers/2306.07967,24,0,0,2,0,0 +2023-06-14,2306.07954,,Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation,https://huggingface.co/papers/2306.07954,111,11,0,0,1,2 +2023-06-13,2306.06823,,Weakly supervised information extraction from inscrutable handwritten document images,https://huggingface.co/papers/2306.06823,4,1,0,0,0,0 +2023-06-13,2306.07196,,Retrieval-Enhanced Contrastive Vision-Text Models,https://huggingface.co/papers/2306.07196,7,0,0,0,0,0 +2023-06-13,2306.07042,,Transformers learn through gradual rank increase,https://huggingface.co/papers/2306.07042,9,0,0,0,0,0 +2023-06-13,2306.07075,,Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence,https://huggingface.co/papers/2306.07075,9,5,0,0,0,0 +2023-06-13,2306.06212,https://github.com/ianhuang0630/aladdin,Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions,https://huggingface.co/papers/2306.06212,9,0,1,0,0,0 
+2023-06-13,2306.06546,https://github.com/descriptinc/descript-audio-codec,High-Fidelity Audio Compression with Improved RVQGAN,https://huggingface.co/papers/2306.06546,9,1,0,4,0,1 +2023-06-13,2306.07279,https://github.com/crockwell/cap3d,Scalable 3D Captioning with Pretrained Models,https://huggingface.co/papers/2306.07279,15,0,1,0,1,0 +2023-06-13,2306.06638,,Face0: Instantaneously Conditioning a Text-to-Image Model on a Face,https://huggingface.co/papers/2306.06638,17,1,0,0,0,0 +2023-06-13,2306.07174,,Augmenting Language Models with Long-Term Memory,https://huggingface.co/papers/2306.07174,18,5,0,0,0,0 +2023-06-13,2306.07280,,Controlling Text-to-Image Diffusion by Orthogonal Finetuning,https://huggingface.co/papers/2306.07280,20,1,0,1,0,9 +2023-06-13,2306.07179,https://github.com/mlcommons/algorithmic-efficiency,Benchmarking Neural Network Training Algorithms,https://huggingface.co/papers/2306.07179,23,1,0,0,0,0 +2023-06-13,2306.06189,https://github.com/NVlabs/FasterViT,FasterViT: Fast Vision Transformers with Hierarchical Attention,https://huggingface.co/papers/2306.06189,30,0,1,2,0,0 +2023-06-12,2306.05696,,Embodied Executable Policy Learning with Language-based Scene Summarization,https://huggingface.co/papers/2306.05696,3,0,0,0,0,0 +2023-06-12,2306.06092,https://github.com/compphoto/RealisticImageEnhancement,Realistic Saliency Guided Image Enhancement,https://huggingface.co/papers/2306.06092,3,0,0,0,0,0 +2023-06-12,2306.05493,,Multi-Modal Classifiers for Open-Vocabulary Object Detection,https://huggingface.co/papers/2306.05493,5,1,0,0,0,0 +2023-06-12,2306.05685,https://github.com/lm-sys/fastchat,Judging LLM-as-a-judge with MT-Bench and Chatbot Arena,https://huggingface.co/papers/2306.05685,27,2,1,100,9,100 +2023-06-12,2306.05836,https://github.com/causalnlp/corr2cause,Can Large Language Models Infer Causation from Correlation?,https://huggingface.co/papers/2306.05836,5,1,1,0,0,0 +2023-06-12,2306.05949,,Evaluating the Social Impact of Generative AI Systems in 
Systems and Society,https://huggingface.co/papers/2306.05949,8,0,0,0,0,0 +2023-06-12,2306.06044,,GANeRF: Leveraging Discriminators to Optimize Neural Radiance Fields,https://huggingface.co/papers/2306.06044,4,0,0,0,0,0 +2023-06-12,2306.05544,,BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping,https://huggingface.co/papers/2306.05544,10,1,0,0,0,0 +2023-06-12,2306.06070,https://github.com/osu-nlp-group/mind2web,Mind2Web: Towards a Generalist Agent for the Web,https://huggingface.co/papers/2306.06070,19,3,1,0,2,0 +2023-06-09,2306.05427,,Grounded Text-to-Image Synthesis with Attention Refocusing,https://huggingface.co/papers/2306.05427,3,2,0,0,0,0 +2023-06-09,2306.05411,https://github.com/facebookresearch/r-mae,R-MAE: Regions Meet Masked Autoencoders,https://huggingface.co/papers/2306.05411,2,0,0,0,0,0 +2023-06-09,2306.05420,https://github.com/google-research/spherical-cnn,Scaling Spherical CNNs,https://huggingface.co/papers/2306.05420,1,0,0,0,0,0 +2023-06-09,2306.04845,,Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts,https://huggingface.co/papers/2306.04845,3,0,0,0,0,0 +2023-06-09,2306.05392,https://github.com/sanjayss34/codevqa,Modular Visual Question Answering via Code Generation,https://huggingface.co/papers/2306.05392,2,0,0,0,0,0 +2023-06-09,2306.04822,,Optimizing ViViT Training: Time and Memory Reduction for Action Recognition,https://huggingface.co/papers/2306.04822,2,0,0,0,0,0 +2023-06-09,2306.05357,,Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models,https://huggingface.co/papers/2306.05357,3,0,0,0,0,0 +2023-06-09,2306.05410,,LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs,https://huggingface.co/papers/2306.05410,2,0,0,0,0,0 +2023-06-09,2306.04751,https://github.com/allenai/open-instruct,How Far Can Camels Go? 
Exploring the State of Instruction Tuning on Open Resources,https://huggingface.co/papers/2306.04751,5,0,1,65,3,35 +2023-06-09,2306.04707,,Improving Open Language Models by Learning from Organic Interactions,https://huggingface.co/papers/2306.04707,3,1,0,0,0,0 +2023-06-09,2306.05425,https://github.com/luodian/otter,MIMIC-IT: Multi-Modal In-Context Instruction Tuning,https://huggingface.co/papers/2306.05425,10,0,1,7,2,30 +2023-06-09,2306.05178,,SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions,https://huggingface.co/papers/2306.05178,5,0,0,0,0,1 +2023-06-09,2306.05428,,Background Prompting for Improved Object Depth,https://huggingface.co/papers/2306.05428,3,0,0,0,0,0 +2023-06-09,2306.04757,https://github.com/declare-lab/instruct-eval,INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models,https://huggingface.co/papers/2306.04757,5,0,1,11,0,43 +2023-06-09,2306.05087,https://github.com/weopenml/pandalm,PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization,https://huggingface.co/papers/2306.05087,6,0,1,2,0,24 +2023-06-09,2306.05424,https://github.com/mbzuai-oryx/video-chatgpt,Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models,https://huggingface.co/papers/2306.05424,7,1,1,0,0,0 +2023-06-09,2306.05399,https://github.com/shi-labs/matting-anything,Matting Anything,https://huggingface.co/papers/2306.05399,6,3,1,1,0,1 +2023-06-09,2306.05422,https://github.com/qianqianwang68/omnimotion,Tracking Everything Everywhere All at Once,https://huggingface.co/papers/2306.05422,9,2,0,0,0,0 +2023-06-09,2306.05284,https://github.com/facebookresearch/audiocraft,Simple and Controllable Music Generation,https://huggingface.co/papers/2306.05284,135,23,1,69,0,100 +2023-06-08,2306.04009,,Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks,https://huggingface.co/papers/2306.04009,1,0,0,0,0,0 +2023-06-08,2306.04031,,Certified 
Reasoning with Language Models,https://huggingface.co/papers/2306.04031,1,0,0,0,0,0 +2023-06-08,2306.04076,,Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer,https://huggingface.co/papers/2306.04076,1,0,0,0,0,0 +2023-06-08,2306.04140,,Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions,https://huggingface.co/papers/2306.04140,2,0,0,0,0,0 +2023-06-08,2306.04634,https://github.com/jwkirchenbauer/lm-watermarking,On the Reliability of Watermarks for Large Language Models,https://huggingface.co/papers/2306.04634,5,1,1,0,0,0 +2023-06-08,2306.04632,https://github.com/buxiangzhiren/asymmetric_vqgan,Designing a Better Asymmetric VQGAN for StableDiffusion,https://huggingface.co/papers/2306.04632,3,0,1,4,0,1 +2023-06-08,2306.04362,https://github.com/x-plug/youku-mplug,Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks,https://huggingface.co/papers/2306.04362,2,0,1,0,0,0 +2023-06-08,2306.04528,,PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts,https://huggingface.co/papers/2306.04528,3,0,0,0,0,0 +2023-06-08,2306.04235,https://github.com/zjersey/lightseq-arm,MobileNMT: Enabling Translation in 15MB and 30ms,https://huggingface.co/papers/2306.04235,3,0,0,0,0,0 +2023-06-08,2306.04619,,ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections,https://huggingface.co/papers/2306.04619,4,0,0,0,0,0 +2023-06-08,2306.04050,https://github.com/vcskaushik/LLMzip,LLMZip: Lossless Text Compression using Large Language Models,https://huggingface.co/papers/2306.04050,4,0,0,0,0,0 +2023-06-08,2306.04387,,M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning,https://huggingface.co/papers/2306.04387,8,1,0,0,1,0 +2023-06-07,2306.03438,https://github.com/amazon-science/buggy-code-completion,Large Language Models of Code Fail at Completing Code with Potential 
Bugs,https://huggingface.co/papers/2306.03438,2,0,0,0,0,0 +2023-06-07,2306.03819,https://github.com/eleutherai/concept-erasure,LEACE: Perfect linear concept erasure in closed form,https://huggingface.co/papers/2306.03819,2,0,1,0,0,0 +2023-06-07,2306.03802,,Learning to Ground Instructional Articles in Videos through Narrations,https://huggingface.co/papers/2306.03802,1,0,0,0,0,0 +2023-06-07,2306.03872,https://github.com/lz1oceani/verify_cot,Deductive Verification of Chain-of-Thought Reasoning,https://huggingface.co/papers/2306.03872,4,0,0,0,0,0 +2023-06-07,2306.03203,,A Static Evaluation of Code Completion by Large Language Models,https://huggingface.co/papers/2306.03203,3,0,0,0,0,0 +2023-06-07,2306.03460,,Natural Language Commanding via Program Synthesis,https://huggingface.co/papers/2306.03460,2,2,0,0,0,0 +2023-06-07,2306.03509,,Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias,https://huggingface.co/papers/2306.03509,4,4,0,0,0,0 +2023-06-07,2306.03504,,Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis,https://huggingface.co/papers/2306.03504,7,1,0,0,0,0 +2023-06-07,2306.03881,,Emergent Correspondence from Image Diffusion,https://huggingface.co/papers/2306.03881,6,2,0,0,0,0 +2023-06-07,2306.03514,https://github.com/xinyu1205/Recognize_Anything-Tag2Text,Recognize Anything: A Strong Image Tagging Model,https://huggingface.co/papers/2306.03514,10,6,1,5,1,3 +2023-06-06,2306.01754,,"Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?",https://huggingface.co/papers/2306.01754,1,1,0,0,0,0 +2023-06-06,2306.01872,,Probabilistic Adaptation of Text-to-Video Models,https://huggingface.co/papers/2306.01872,1,0,0,0,0,0 +2023-06-06,2306.01879,,VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores,https://huggingface.co/papers/2306.01879,1,0,0,0,0,0 +2023-06-06,2306.02245,https://github.com/dyzhang09/sam3d,SAM3D: Zero-Shot 3D Object Detection via 
Segment Anything Model,https://huggingface.co/papers/2306.02245,2,0,0,0,0,0 +2023-06-06,2306.01923,,The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation,https://huggingface.co/papers/2306.01923,2,0,0,0,0,0 +2023-06-06,2306.02531,https://github.com/apple/ml-planner,PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model,https://huggingface.co/papers/2306.02531,1,0,0,0,0,0 +2023-06-06,2306.03083,,MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion,https://huggingface.co/papers/2306.03083,3,0,0,0,0,0 +2023-06-06,2306.01841,https://github.com/facebookresearch/ternary_binary_transformer,Binary and Ternary Natural Language Generation,https://huggingface.co/papers/2306.01841,2,0,1,0,0,0 +2023-06-06,2306.03092,https://github.com/NVlabs/neuralangelo,Neuralangelo: High-Fidelity Neural Surface Reconstruction,https://huggingface.co/papers/2306.03092,2,1,0,0,0,0 +2023-06-06,2306.03024,,PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge,https://huggingface.co/papers/2306.03024,2,2,0,0,0,0 +2023-06-06,2306.02982,,PolyVoice: Language Models for Speech to Speech Translation,https://huggingface.co/papers/2306.02982,4,0,0,0,0,0 +2023-06-06,2306.01741,,GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System,https://huggingface.co/papers/2306.01741,2,0,0,0,0,0 +2023-06-06,2306.03082,https://github.com/lichang-chen/instructzero,InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models,https://huggingface.co/papers/2306.03082,5,0,0,0,0,0 +2023-06-06,2306.03038,,HeadSculpt: Crafting 3D Head Avatars with Text,https://huggingface.co/papers/2306.03038,4,0,0,0,0,0 +2023-06-06,2306.02561,https://github.com/yuchenlin/LLM-Blender,LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion,https://huggingface.co/papers/2306.02561,6,2,1,9,0,5 +2023-06-06,2306.02858,https://github.com/damo-nlp-sg/video-llama,Video-LLaMA: An 
Instruction-tuned Audio-Visual Language Model for Video Understanding,https://huggingface.co/papers/2306.02858,16,7,1,6,1,4 +2023-06-06,2306.02254,,A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models,https://huggingface.co/papers/2306.02254,11,1,0,13,0,36 +2023-06-06,2306.02707,,Orca: Progressive Learning from Complex Explanation Traces of GPT-4,https://huggingface.co/papers/2306.02707,46,18,0,100,45,100 +2023-06-05,2306.01337,,An Empirical Study on Challenging Math Problem Solving with GPT-4,https://huggingface.co/papers/2306.01337,1,1,0,0,0,0 +2023-06-05,2306.01061,,Reimagining Retrieval Augmented Language Models for Answering Queries,https://huggingface.co/papers/2306.01061,1,0,0,0,0,0 +2023-06-05,2306.01160,https://github.com/epfml/dynamic-sparse-flash-attention,Faster Causal Attention Over Large Sequences Through Sparse Flash Attention,https://huggingface.co/papers/2306.01160,1,2,0,0,0,0 +2023-06-05,2306.01242,,Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators,https://huggingface.co/papers/2306.01242,2,0,0,0,0,0 +2023-06-05,2306.01684,,Harnessing large-language models to generate private synthetic text,https://huggingface.co/papers/2306.01684,2,0,0,0,0,0 +2023-06-05,2306.01736,,DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model,https://huggingface.co/papers/2306.01736,1,0,0,0,0,0 +2023-06-05,2306.01694,https://github.com/collinskatie/checkmate,Evaluating Language Models for Mathematics through Interactions,https://huggingface.co/papers/2306.01694,2,0,0,0,0,0 +2023-06-05,2306.01693,,Fine-Grained Human Feedback Gives Better Rewards for Language Model Training,https://huggingface.co/papers/2306.01693,3,0,0,0,0,0 +2023-06-05,2306.01567,https://github.com/syscv/sam-hq,Segment Anything in High Quality,https://huggingface.co/papers/2306.01567,7,2,1,3,0,1 +2023-06-05,2306.01116,,"The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data 
Only",https://huggingface.co/papers/2306.01116,30,3,0,77,13,100 +2023-06-02,2306.00622,,ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing,https://huggingface.co/papers/2306.00622,1,0,0,0,0,0 +2023-06-02,2306.00964,,Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation,https://huggingface.co/papers/2306.00964,1,0,0,0,0,0 +2023-06-02,2306.00986,,Diffusion Self-Guidance for Controllable Image Generation,https://huggingface.co/papers/2306.00986,2,0,0,0,0,0 +2023-06-02,2306.00943,,Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance,https://huggingface.co/papers/2306.00943,5,1,0,0,0,0 +2023-06-02,2306.00110,https://github.com/microsoft/muzic,MuseCoco: Generating Symbolic Music from Text,https://huggingface.co/papers/2306.00110,2,0,0,1,0,0 +2023-06-02,2306.00029,https://github.com/salesforce/codetf,CodeTF: One-stop Transformer Library for State-of-the-art Code LLM,https://huggingface.co/papers/2306.00029,2,0,1,0,0,0 +2023-06-02,2306.00107,https://github.com/yizhilll/mert,MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training,https://huggingface.co/papers/2306.00107,2,0,1,4,0,13 +2023-06-02,2306.00802,,Birth of a Transformer: A Memory Viewpoint,https://huggingface.co/papers/2306.00802,2,0,0,0,0,0 +2023-06-02,2306.00008,,Brainformers: Trading Simplicity for Efficiency,https://huggingface.co/papers/2306.00008,1,1,0,0,0,0 +2023-06-02,2306.00148,,SafeDiffuser: Safe Planning with Diffusion Probabilistic Models,https://huggingface.co/papers/2306.00148,1,0,0,0,0,0 +2023-06-02,2306.00926,https://github.com/ygtxr1997/celebbasis,Inserting Anybody in Diffusion Models via Celeb Basis,https://huggingface.co/papers/2306.00926,3,3,1,1,0,0 +2023-06-02,2306.00956,,The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects,https://huggingface.co/papers/2306.00956,1,0,0,0,0,0 +2023-06-02,2306.00971,,ViCo: Detail-Preserving Visual Condition for 
Personalized Text-to-Image Generation,https://huggingface.co/papers/2306.00971,3,0,0,0,0,0 +2023-06-02,2306.00984,https://github.com/google-research/syn-rep-learn,StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners,https://huggingface.co/papers/2306.00984,3,1,0,0,0,0 +2023-06-02,2306.00966,https://github.com/hila-chefer/Conceptor,The Hidden Language of Diffusion Models,https://huggingface.co/papers/2306.00966,5,0,0,0,0,0 +2023-06-02,2306.00637,,Wuerstchen: Efficient Pretraining of Text-to-Image Models,https://huggingface.co/papers/2306.00637,12,6,0,5,0,33 +2023-06-02,2306.00890,,LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day,https://huggingface.co/papers/2306.00890,10,1,0,11,0,7 +2023-06-02,2306.00983,,StyleDrop: Text-to-Image Generation in Any Style,https://huggingface.co/papers/2306.00983,6,3,0,2,0,5 +2023-06-02,2306.00238,https://github.com/apple/ml-cvnets,Bytes Are All You Need: Transformers Operating Directly On File Bytes,https://huggingface.co/papers/2306.00238,6,0,0,0,0,0 +2023-06-02,2306.00378,https://github.com/wyysf-98/GenMM,Example-based Motion Synthesis via Generative Motion Matching,https://huggingface.co/papers/2306.00378,6,2,1,0,0,1 +2023-06-02,2306.00739,,SQL-PaLM: Improved Large Language ModelAdaptation for Text-to-SQL,https://huggingface.co/papers/2306.00739,16,3,0,0,0,0 +2023-06-02,2306.00980,,SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds,https://huggingface.co/papers/2306.00980,14,13,0,0,0,0 +2023-06-01,2305.20082,,Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor,https://huggingface.co/papers/2305.20082,2,2,0,0,0,0 +2023-06-01,2305.20086,https://github.com/somepago/dcr,Understanding and Mitigating Copying in Diffusion Models,https://huggingface.co/papers/2305.20086,3,0,0,0,0,0 +2023-06-01,2305.19370,,Blockwise Parallel Transformer for Long Context Large 
Models,https://huggingface.co/papers/2305.19370,3,0,0,0,0,0 +2023-06-01,2305.19472,https://github.com/allenai/plasma,PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning,https://huggingface.co/papers/2305.19472,1,0,0,0,0,0 +2023-06-01,2305.19835,,Deliberate then Generate: Enhanced Prompting Framework for Text Generation,https://huggingface.co/papers/2305.19835,1,0,0,0,0,0 +2023-06-01,2305.20010,,Human or Not? A Gamified Approach to the Turing Test,https://huggingface.co/papers/2305.20010,1,0,0,0,0,0 +2023-06-01,2305.20088,https://github.com/lijiefan/laclip,Improving CLIP Training with Language Rewrites,https://huggingface.co/papers/2305.20088,2,1,0,0,0,0 +2023-06-01,2305.19452,https://github.com/google-research/google-research/tree/master/bigger_better_faster,"Bigger, Better, Faster: Human-level Atari with human-level efficiency",https://huggingface.co/papers/2305.19452,3,0,0,0,0,0 +2023-06-01,2305.20091,https://github.com/shubham-goel/4D-Humans,Humans in 4D: Reconstructing and Tracking Humans with Transformers,https://huggingface.co/papers/2305.20091,1,0,1,0,0,0 +2023-06-01,2305.20081,https://github.com/sail-sg/edp,Efficient Diffusion Policies for Offline Reinforcement Learning,https://huggingface.co/papers/2305.20081,2,0,0,0,0,0 +2023-06-01,2305.20030,https://github.com/YuxinWenRick/tree-ring-watermark,Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust,https://huggingface.co/papers/2305.20030,7,2,0,0,0,1 +2023-05-31,2305.18365,,What indeed can GPT models do in chemistry? 
A comprehensive benchmark on eight tasks,https://huggingface.co/papers/2305.18365,3,0,0,0,0,0 +2023-05-31,2305.18583,,Controllable Text-to-Image Generation with GPT-4,https://huggingface.co/papers/2305.18583,3,1,0,0,0,0 +2023-05-31,2305.18729,https://github.com/dvlab-research/rival,Real-World Image Variation by Aligning Diffusion Inversion Chain,https://huggingface.co/papers/2305.18729,4,1,0,0,0,0 +2023-05-31,2305.19066,https://github.com/noamelata/nesteddiffusion,Nested Diffusion Processes for Anytime Image Generation,https://huggingface.co/papers/2305.19066,1,0,1,0,0,1 +2023-05-31,2305.18474,,Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation,https://huggingface.co/papers/2305.18474,2,0,0,2,0,0 +2023-05-31,2305.18766,,HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance,https://huggingface.co/papers/2305.18766,5,1,0,0,0,0 +2023-05-31,2305.18373,,KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models,https://huggingface.co/papers/2305.18373,1,0,0,0,0,0 +2023-05-31,2305.18565,,PaLI-X: On Scaling up a Multilingual Vision and Language Model,https://huggingface.co/papers/2305.18565,3,0,0,0,0,0 +2023-05-31,2305.18654,https://github.com/nouhadziri/faith-and-fate,Faith and Fate: Limits of Transformers on Compositionality,https://huggingface.co/papers/2305.18654,4,1,0,0,0,0 +2023-05-31,2305.19234,https://github.com/berlino/grammar-prompting,Grammar Prompting for Domain-Specific Language Generation with Large Language Models,https://huggingface.co/papers/2305.19234,3,4,0,0,0,0 +2023-05-31,2305.18415,https://github.com/sukjulian/lab-gatr,Geometric Algebra Transformers,https://huggingface.co/papers/2305.18415,2,0,0,0,0,0 +2023-05-31,2305.19245,,AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation,https://huggingface.co/papers/2305.19245,2,0,0,0,0,0 +2023-05-31,2305.18802,,LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus,https://huggingface.co/papers/2305.18802,2,0,0,0,3,1 
+2023-05-31,2305.18752,https://github.com/stevengrove/gpt4tools,GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction,https://huggingface.co/papers/2305.18752,3,1,1,0,0,0 +2023-05-31,2305.19012,https://github.com/icoz69/styleavatar3d,StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation,https://huggingface.co/papers/2305.19012,4,2,0,0,0,0 +2023-05-31,2305.19164,https://github.com/virajprabhu/lance,LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images,https://huggingface.co/papers/2305.19164,2,0,1,0,0,0 +2023-05-30,2305.17359,https://github.com/xianjun-yang/dna-gpt,DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text,https://huggingface.co/papers/2305.17359,1,0,0,0,0,0 +2023-05-30,2305.17390,,SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks,https://huggingface.co/papers/2305.17390,2,0,0,0,0,0 +2023-05-30,2305.17493,,Model Dementia: Generated Data Makes Models Forget,https://huggingface.co/papers/2305.17493,2,0,0,0,0,0 +2023-05-30,2305.18098,,BigTrans: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages,https://huggingface.co/papers/2305.18098,4,2,0,6,0,2 +2023-05-30,2305.18259,https://github.com/aigtext/glyphcontrol-release,GlyphControl: Glyph Conditional Control for Visual Text Generation,https://huggingface.co/papers/2305.18259,2,1,1,0,0,1 +2023-05-30,2305.18286,,Photoswap: Personalized Subject Swapping in Images,https://huggingface.co/papers/2305.18286,3,0,0,0,0,0 +2023-05-30,2305.18292,,Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models,https://huggingface.co/papers/2305.18292,5,1,0,0,0,0 +2023-05-30,2305.18231,,High-Fidelity Image Compression with Score-based Generative Models,https://huggingface.co/papers/2305.18231,1,0,0,0,0,0 +2023-05-30,2305.18247,https://github.com/videocrafter/talecrafter,TaleCrafter: 
Interactive Story Visualization with Multiple Characters,https://huggingface.co/papers/2305.18247,4,0,0,0,0,0 +2023-05-30,2305.18264,https://github.com/g-u-n/gen-l-video,Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising,https://huggingface.co/papers/2305.18264,3,0,1,0,0,0 +2023-05-30,2305.17333,https://github.com/princeton-nlp/mezo,Fine-Tuning Language Models with Just Forward Passes,https://huggingface.co/papers/2305.17333,2,2,1,0,0,0 +2023-05-30,2305.17306,https://github.com/franxyao/chain-of-thought-hub,Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance,https://huggingface.co/papers/2305.17306,2,0,0,0,0,0 +2023-05-30,2305.17144,,Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory,https://huggingface.co/papers/2305.17144,2,0,0,0,0,0 +2023-05-30,2305.18274,https://github.com/medarc-ai/fmri-reconstruction-nsd,Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors,https://huggingface.co/papers/2305.18274,4,1,1,0,0,0 +2023-05-30,2305.17216,https://github.com/kohjingyu/gill,Generating Images with Multimodal Language Models,https://huggingface.co/papers/2305.17216,7,2,1,0,0,0 +2023-05-30,2305.18295,,RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths,https://huggingface.co/papers/2305.18295,7,0,0,0,0,0 +2023-05-29,2305.16867,,Playing repeated games with Large Language Models,https://huggingface.co/papers/2305.16867,2,0,0,0,0,0 +2023-05-29,2305.17126,https://github.com/ctlllll/llm-toolmaker,Large Language Models as Tool Makers,https://huggingface.co/papers/2305.17126,2,1,0,0,0,0 +2023-05-29,2305.16381,https://github.com/google-research/google-research/tree/master/dpok,DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models,https://huggingface.co/papers/2305.16381,3,0,0,0,0,0 +2023-05-29,2305.16411,,ZeroAvatar: Zero-shot 3D Avatar 
Generation from a Single Image,https://huggingface.co/papers/2305.16411,1,0,0,0,0,0 +2023-05-29,2305.17066,,Mindstorms in Natural Language-Based Societies of Mind,https://huggingface.co/papers/2305.17066,3,0,0,0,0,0 +2023-05-29,2305.16349,,Lexinvariant Language Models,https://huggingface.co/papers/2305.16349,1,0,0,0,0,0 +2023-05-29,2305.16367,,Role-Play with Large Language Models,https://huggingface.co/papers/2305.16367,3,3,0,0,0,0 +2023-05-29,2305.16635,,Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing,https://huggingface.co/papers/2305.16635,1,1,0,0,0,0 +2023-05-29,2305.16806,https://github.com/vyraun/literalness,Do GPTs Produce Less Literal Translations?,https://huggingface.co/papers/2305.16806,1,0,0,0,0,0 +2023-05-29,2305.16958,https://github.com/bloomberg/mixce-acl2023,MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies,https://huggingface.co/papers/2305.16958,1,0,1,1,0,0 +2023-05-29,2305.16960,,Training Socially Aligned Language Models in Simulated Human Society,https://huggingface.co/papers/2305.16960,2,0,0,3,0,0 +2023-05-29,2305.16999,https://github.com/google-research/big_vision,Three Towers: Flexible Contrastive Learning with Pretrained Image Models,https://huggingface.co/papers/2305.16999,2,0,0,0,0,0 +2023-05-29,2305.16380,,Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer,https://huggingface.co/papers/2305.16380,3,0,0,0,0,0 +2023-05-29,2305.16704,https://github.com/facebookresearch/iclmlp,A Closer Look at In-Context Learning under Distribution Shifts,https://huggingface.co/papers/2305.16704,1,0,0,0,0,0 +2023-05-29,2305.16843,https://github.com/deepmind/randomized_positional_encodings,Randomized Positional Encodings Boost Length Generalization of Transformers,https://huggingface.co/papers/2305.16843,2,0,0,0,0,0 +2023-05-29,2305.16311,https://github.com/google/break-a-scene,Break-A-Scene: Extracting Multiple 
Concepts from a Single Image,https://huggingface.co/papers/2305.16311,6,0,0,0,0,0 +2023-05-29,2305.16334,,OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities,https://huggingface.co/papers/2305.16334,1,0,0,0,0,0 +2023-05-29,2305.17098,,ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing,https://huggingface.co/papers/2305.17098,3,3,0,0,0,0 +2023-05-29,2305.16338,,Think Before You Act: Decision Transformers with Internal Working Memory,https://huggingface.co/papers/2305.16338,3,0,0,0,0,0 +2023-05-29,2305.16355,,PandaGPT: One Model To Instruction-Follow Them All,https://huggingface.co/papers/2305.16355,3,0,0,1,0,2 +2023-05-29,2305.16765,,Backpack Language Models,https://huggingface.co/papers/2305.16765,2,1,0,2,0,2 +2023-05-26,2305.15486,,SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning,https://huggingface.co/papers/2305.15486,1,0,0,0,0,0 +2023-05-26,2305.15717,,The False Promise of Imitating Proprietary LLMs,https://huggingface.co/papers/2305.15717,5,0,0,0,0,0 +2023-05-26,2305.15581,https://github.com/ubc-vision/LDM_correspondences,Unsupervised Semantic Correspondence Using Stable Diffusion,https://huggingface.co/papers/2305.15581,2,0,0,0,0,0 +2023-05-26,2305.15586,,Manifold Diffusion Fields,https://huggingface.co/papers/2305.15586,2,0,0,0,0,0 +2023-05-26,2305.15719,,Efficient Neural Music Generation,https://huggingface.co/papers/2305.15719,2,0,0,0,0,0 +2023-05-26,2305.15779,,Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models,https://huggingface.co/papers/2305.15779,3,0,0,0,0,0 +2023-05-26,2305.15798,,On Architectural Compression of Text-to-Image Diffusion Models,https://huggingface.co/papers/2305.15798,3,1,0,23,0,32 +2023-05-26,2305.16291,https://github.com/MineDojo/Voyager,Voyager: An Open-Ended Embodied Agent with Large Language Models,https://huggingface.co/papers/2305.16291,9,4,0,0,0,0 +2023-05-26,2305.16213,https://github.com/threestudio-project/threestudio,ProlificDreamer: 
High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation,https://huggingface.co/papers/2305.16213,9,0,1,0,0,0 +2023-05-25,2305.14540,https://github.com/salesforce/factualnlg,LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond,https://huggingface.co/papers/2305.14540,2,1,0,0,0,0 +2023-05-25,2305.14564,https://github.com/simengsun/pearl,PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents,https://huggingface.co/papers/2305.14564,1,0,0,0,0,0 +2023-05-25,2305.14878,,Leveraging GPT-4 for Automatic Translation Post-Editing,https://huggingface.co/papers/2305.14878,1,0,0,0,0,0 +2023-05-25,2305.15038,https://github.com/damo-nlp-sg/gpt4-as-dataanalyst,Is GPT-4 a Good Data Analyst?,https://huggingface.co/papers/2305.15038,4,2,0,0,0,0 +2023-05-24,2305.13534,https://github.com/nanami18/snowballed_hallucination,How Language Model Hallucinations Can Snowball,https://huggingface.co/papers/2305.13534,2,0,0,0,0,0 +2023-05-24,2305.13735,https://github.com/naver-ai/almost,Aligning Large Language Models through Synthetic Feedback,https://huggingface.co/papers/2305.13735,1,0,1,0,0,0 +2023-05-24,2305.13786,https://github.com/deepmind/perception_test,Perception Test: A Diagnostic Benchmark for Multimodal Video Models,https://huggingface.co/papers/2305.13786,1,0,0,0,0,0 +2023-05-24,2305.13579,https://github.com/drboog/profusion,Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach,https://huggingface.co/papers/2305.13579,2,0,1,0,0,0 +2023-05-24,2305.13840,https://github.com/weifeng-chen/control-a-video,Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models,https://huggingface.co/papers/2305.13840,4,1,1,6,0,2 +2023-05-24,2305.14233,https://github.com/thunlp/ultrachat,Enhancing Chat Language Models by Scaling High-quality Instructional Conversations,https://huggingface.co/papers/2305.14233,6,4,1,6,10,26 
+2023-05-24,2305.14201,https://github.com/liutiedong/goat,Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks,https://huggingface.co/papers/2305.14201,4,5,1,10,1,3 +2023-05-24,2305.14314,https://github.com/artidoro/qlora,QLoRA: Efficient Finetuning of Quantized LLMs,https://huggingface.co/papers/2305.14314,44,7,1,100,2,100 +2023-05-23,2305.13304,https://github.com/aiwaves-cn/recurrentgpt,RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text,https://huggingface.co/papers/2305.13304,1,2,0,0,0,0 +2023-05-23,2305.11938,https://github.com/google-research/xtreme-up,XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages,https://huggingface.co/papers/2305.11938,1,0,0,2,0,0 +2023-05-23,2305.12001,,OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models,https://huggingface.co/papers/2305.12001,1,0,0,0,0,0 +2023-05-23,2305.12050,,CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring,https://huggingface.co/papers/2305.12050,1,0,0,0,0,0 +2023-05-23,2305.12487,,Augmenting Autotelic Agents with Large Language Models,https://huggingface.co/papers/2305.12487,1,0,0,0,0,0 +2023-05-23,2305.13009,https://github.com/slp-rl/spokenstorycloze,Textually Pretrained Speech Language Models,https://huggingface.co/papers/2305.13009,2,0,0,0,0,0 +2023-05-23,2305.13050,https://github.com/guyyariv/AudioToken,AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation,https://huggingface.co/papers/2305.13050,3,2,1,0,0,2 +2023-05-23,2305.13048,https://github.com/BlinkDL/RWKV-LM,RWKV: Reinventing RNNs for the Transformer Era,https://huggingface.co/papers/2305.13048,11,1,1,3,0,3 +2023-05-23,2305.13301,https://github.com/kvablack/ddpo-pytorch,Training Diffusion Models with Reinforcement Learning,https://huggingface.co/papers/2305.13301,3,0,1,7,0,0 +2023-05-23,2305.13077,https://github.com/ybybzhang/controlvideo,ControlVideo: Training-free 
Controllable Text-to-Video Generation,https://huggingface.co/papers/2305.13077,6,3,1,0,0,2 +2023-05-22,2305.11337,,RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture,https://huggingface.co/papers/2305.11337,2,0,0,0,0,0 +2023-05-22,2305.11675,,Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity,https://huggingface.co/papers/2305.11675,1,1,0,0,0,0 +2023-05-22,2305.11846,https://github.com/microsoft/i-Code/tree/main/i-Code-V3,Any-to-Any Generation via Composable Diffusion,https://huggingface.co/papers/2305.11846,4,2,0,1,0,0 +2023-05-22,2305.11837,,Comparing Software Developers with ChatGPT: An Empirical Investigation,https://huggingface.co/papers/2305.11837,1,0,0,0,0,0 +2023-05-22,2305.11243,,Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses,https://huggingface.co/papers/2305.11243,1,0,0,0,0,0 +2023-05-22,2305.11308,,Counterfactuals for Design: A Model-Agnostic Method For Design Recommendations,https://huggingface.co/papers/2305.11308,1,0,0,0,0,0 +2023-05-22,2305.11364,https://github.com/pair-code/interpretability,Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models,https://huggingface.co/papers/2305.11364,2,1,0,0,0,0 +2023-05-22,2305.11541,https://github.com/keanudicap/MSQA,Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering,https://huggingface.co/papers/2305.11541,1,1,0,0,0,0 +2023-05-22,2305.11598,,Introspective Tips: Large Language Model for In-Context Decision Making,https://huggingface.co/papers/2305.11598,1,0,0,0,0,0 +2023-05-22,2305.11738,https://github.com/microsoft/ProphetNet,CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing,https://huggingface.co/papers/2305.11738,5,0,0,0,0,0 +2023-05-22,2305.11759,https://github.com/amazon-science/controlling-llm-memorization,Controlling the Extraction of Memorized Data from 
Large Language Models via Prompt-Tuning,https://huggingface.co/papers/2305.11759,1,0,1,0,0,0 +2023-05-22,2305.11778,,Cross-Lingual Supervision improves Large Language Models Pre-training,https://huggingface.co/papers/2305.11778,2,0,0,0,0,0 +2023-05-22,2305.11840,https://github.com/google-research-datasets/seegull,SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models,https://huggingface.co/papers/2305.11840,1,0,0,0,0,0 +2023-05-22,2305.11863,https://github.com/huthlab/encoding-model-scaling-laws,Scaling laws for language encoding models in fMRI,https://huggingface.co/papers/2305.11863,1,0,0,0,0,0 +2023-05-22,2305.11834,https://github.com/microsoft/pengi,Pengi: An Audio Language Model for Audio Tasks,https://huggingface.co/papers/2305.11834,2,1,0,0,0,0 +2023-05-22,2305.11841,,How Does Generative Retrieval Scale to Millions of Passages?,https://huggingface.co/papers/2305.11841,3,0,0,0,0,0 +2023-05-22,2305.11694,https://github.com/google-research/language,QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations,https://huggingface.co/papers/2305.11694,1,0,0,0,1,0 +2023-05-22,2305.11588,https://github.com/eckertzhang/text2nerf,Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields,https://huggingface.co/papers/2305.11588,3,1,1,0,0,0 +2023-05-22,2305.11854,,Multimodal Web Navigation with Instruction-Finetuned Foundation Models,https://huggingface.co/papers/2305.11854,3,0,0,0,0,0 +2023-05-22,2305.11870,https://github.com/snuvclab/chupa,Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models,https://huggingface.co/papers/2305.11870,3,0,0,0,0,0 +2023-05-22,2305.11206,,LIMA: Less Is More for Alignment,https://huggingface.co/papers/2305.11206,20,9,0,9,10,20 +2023-05-19,2305.10474,,Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models,https://huggingface.co/papers/2305.10474,1,0,0,0,0,0 
+2023-05-19,2305.10722,https://github.com/eric-ai-lab/dsd,Discriminative Diffusion Models as Few-shot Vision and Language Learners,https://huggingface.co/papers/2305.10722,3,0,1,0,0,0 +2023-05-19,2305.10841,https://github.com/microsoft/muzic,GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework,https://huggingface.co/papers/2305.10841,2,1,0,0,0,0 +2023-05-19,2305.10853,,LDM3D: Latent Diffusion Model for 3D,https://huggingface.co/papers/2305.10853,10,2,0,3,0,16 +2023-05-19,2305.10874,,VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation,https://huggingface.co/papers/2305.10874,1,0,0,0,0,0 +2023-05-19,2305.11147,https://github.com/salesforce/unicontrol,UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild,https://huggingface.co/papers/2305.11147,3,1,1,1,0,1 +2023-05-19,2305.10688,,MolXPT: Wrapping Molecules with Text for Generative Pre-training,https://huggingface.co/papers/2305.10688,1,0,0,0,0,0 +2023-05-19,2305.11175,https://github.com/opengvlab/interngpt,VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks,https://huggingface.co/papers/2305.11175,3,5,1,0,0,0 +2023-05-19,2305.10912,,A Generalist Dynamics Model for Control,https://huggingface.co/papers/2305.10912,1,0,0,0,0,0 +2023-05-19,2305.11129,https://github.com/google-research/longt5,mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences,https://huggingface.co/papers/2305.11129,2,1,0,3,0,0 +2023-05-19,2305.11000,https://github.com/0nutation/speechgpt,SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities,https://huggingface.co/papers/2305.11000,4,1,0,3,0,0 +2023-05-19,2305.11173,https://github.com/facebookresearch/vlpart,Going Denser with Open-Vocabulary Part Segmentation,https://huggingface.co/papers/2305.11173,2,1,0,0,0,0 +2023-05-19,2305.10855,,TextDiffuser: Diffusion Models as Text 
Painters,https://huggingface.co/papers/2305.10855,3,0,0,0,0,2 +2023-05-19,2305.10601,https://github.com/ysymyth/tree-of-thought-llm,Tree of Thoughts: Deliberate Problem Solving with Large Language Models,https://huggingface.co/papers/2305.10601,10,1,0,0,0,0 +2023-05-19,2305.10434,,Learning the Visualness of Text Using Large Vision-Language Models,https://huggingface.co/papers/2305.10434,2,0,0,0,0,0 +2023-05-19,2305.11171,https://github.com/google-research/google-research/tree/master/true_teacher,TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models,https://huggingface.co/papers/2305.11171,2,0,0,1,1,2 +2023-05-19,2305.10763,,CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training,https://huggingface.co/papers/2305.10763,3,4,0,0,0,0 +2023-05-19,2305.10764,,OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding,https://huggingface.co/papers/2305.10764,6,4,0,0,0,1 +2023-05-19,2305.10973,https://github.com/XingangPan/DragGAN,Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold,https://huggingface.co/papers/2305.10973,30,74,1,1,0,30 +2023-05-18,2305.10431,https://github.com/mit-han-lab/fastcomposer,FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention,https://huggingface.co/papers/2305.10431,2,0,1,0,0,0 +2023-05-18,2305.10400,https://github.com/yonatanbitton/wysiwyr,What You See is What You Read? 
Improving Text-Image Alignment Evaluation,https://huggingface.co/papers/2305.10400,2,0,1,0,1,0 +2023-05-18,2305.09758,,A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot,https://huggingface.co/papers/2305.09758,1,1,0,0,0,0 +2023-05-18,2305.09764,,Application-Agnostic Language Modeling for On-Device ASR,https://huggingface.co/papers/2305.09764,2,0,0,0,0,0 +2023-05-18,2305.10005,https://github.com/alexander-h-liu/dinosr,DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning,https://huggingface.co/papers/2305.10005,2,0,0,0,0,0 +2023-05-18,2305.10142,https://github.com/franxyao/gpt-bargaining,Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback,https://huggingface.co/papers/2305.10142,1,0,0,0,0,0 +2023-05-18,2305.10266,,Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability,https://huggingface.co/papers/2305.10266,1,0,0,0,0,0 +2023-05-18,2305.10429,https://github.com/sangmichaelxie/doremi,DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining,https://huggingface.co/papers/2305.10429,3,2,1,0,0,0 +2023-05-18,2305.10018,,Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers,https://huggingface.co/papers/2305.10018,1,0,0,0,0,0 +2023-05-18,2305.10320,,CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo,https://huggingface.co/papers/2305.10320,1,0,0,0,0,0 +2023-05-18,2305.09761,https://github.com/javieryu/nerf_bridge,"NerfBridge: Bringing Real-time, Online Neural Radiance Field Training to Robotics",https://huggingface.co/papers/2305.09761,1,0,0,0,0,0 +2023-05-18,2305.09975,https://github.com/microsoft/SmartWordSuggestions,Smart Word Suggestions for Writing Assistance,https://huggingface.co/papers/2305.09975,2,0,0,0,0,0 +2023-05-18,2305.10403,,PaLM 2 Technical Report,https://huggingface.co/papers/2305.10403,5,4,0,0,1,1 
+2023-05-18,2305.09863,https://github.com/microsoft/automated-explanations,Explaining black box text modules in natural language with language models,https://huggingface.co/papers/2305.09863,3,0,0,0,0,0 +2023-05-18,2305.10425,,SLiC-HF: Sequence Likelihood Calibration with Human Feedback,https://huggingface.co/papers/2305.10425,5,0,0,1,0,0 +2023-05-18,2305.09857,https://github.com/vipulraheja/coedit,CoEdIT: Text Editing by Task-Specific Instruction Tuning,https://huggingface.co/papers/2305.09857,6,3,1,5,3,14 +2023-05-17,2305.09662,,Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation,https://huggingface.co/papers/2305.09662,3,0,0,0,0,0 +2023-05-17,2305.09515,https://github.com/microsoft/ProphetNet,AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation,https://huggingface.co/papers/2305.09515,2,3,0,0,0,0 +2023-05-17,2305.09148,https://github.com/chillingdream/dap,Dual-Alignment Pre-training for Cross-lingual Sentence Embedding,https://huggingface.co/papers/2305.09148,1,0,0,0,0,0 +2023-05-17,2305.09664,https://github.com/JasonQSY/3DOI,Understanding 3D Object Interaction from a Single Image,https://huggingface.co/papers/2305.09664,1,0,1,0,0,1 +2023-05-17,2305.09253,https://github.com/drimpossible/acm,Online Continual Learning Without the Storage Constraint,https://huggingface.co/papers/2305.09253,2,0,0,0,0,0 +2023-05-17,2305.08891,,Common Diffusion Noise Schedules and Sample Steps are Flawed,https://huggingface.co/papers/2305.08891,7,4,0,0,0,0 +2023-05-17,2305.09636,,SoundStorm: Efficient Parallel Audio Generation,https://huggingface.co/papers/2305.09636,3,6,0,0,0,0 +2023-05-17,2305.09137,https://github.com/thu-coai/picl,Pre-Training to Learn in Context,https://huggingface.co/papers/2305.09137,2,0,1,0,0,0 +2023-05-17,2305.09617,,Towards Expert-Level Medical Question Answering with Large Language Models,https://huggingface.co/papers/2305.09617,5,1,0,0,0,0 +2023-05-17,2305.09641,,FitMe: Deep Photorealistic 3D Morphable Model 
Avatars,https://huggingface.co/papers/2305.09641,3,2,0,0,0,0 +2023-05-16,2305.08379,https://github.com/allenai/tess-diffusion,TESS: Text-to-Text Self-Conditioned Simplex Diffusion,https://huggingface.co/papers/2305.08379,1,0,0,0,0,0 +2023-05-16,2305.08850,,Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts,https://huggingface.co/papers/2305.08850,1,0,0,0,0,0 +2023-05-16,2305.08848,https://github.com/JetRunner/SuperICL,Small Models are Valuable Plug-ins for Large Language Models,https://huggingface.co/papers/2305.08848,3,0,0,0,0,0 +2023-05-16,2305.07969,https://github.com/MarkChenYutian/GPT-Sentinel-public,GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content,https://huggingface.co/papers/2305.07969,1,0,0,0,0,0 +2023-05-16,2305.08844,https://github.com/feyzaakyurek/rl4f,RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs,https://huggingface.co/papers/2305.08844,1,0,0,0,0,0 +2023-05-16,2305.07677,,Masked Audio Text Encoders are Effective Multi-Modal Rescorers,https://huggingface.co/papers/2305.07677,2,0,0,0,0,0 +2023-05-16,2305.07804,,Dr. 
LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation,https://huggingface.co/papers/2305.07804,2,1,0,0,0,0 +2023-05-16,2305.08275,https://github.com/salesforce/ulip,ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding,https://huggingface.co/papers/2305.08275,2,0,0,0,0,0 +2023-05-16,2305.08298,,Symbol tuning improves in-context learning in language models,https://huggingface.co/papers/2305.08298,3,0,0,0,0,0 +2023-05-16,2305.08677,,Natural Language Decomposition and Interpretation of Complex Utterances,https://huggingface.co/papers/2305.08677,2,0,0,0,0,0 +2023-05-16,2305.08809,https://github.com/stanfordnlp/pyvene,Interpretability at Scale: Identifying Causal Mechanisms in Alpaca,https://huggingface.co/papers/2305.08809,2,0,1,0,0,0 +2023-05-16,2305.08675,https://github.com/facebookresearch/clip-rocket,Improved baselines for vision-language pre-training,https://huggingface.co/papers/2305.08675,2,0,0,0,0,0 +2023-05-16,2305.07922,https://github.com/salesforce/codet5,CodeT5+: Open Code Large Language Models for Code Understanding and Generation,https://huggingface.co/papers/2305.07922,4,2,1,0,0,0 +2023-05-16,2305.07961,,Leveraging Large Language Models in Conversational Recommender Systems,https://huggingface.co/papers/2305.07961,2,0,0,0,0,0 +2023-05-16,2305.08810,,AutoRecon: Automated 3D Object Discovery and Reconstruction,https://huggingface.co/papers/2305.08810,2,2,0,0,0,0 +2023-05-16,2305.08596,,DarkBERT: A Language Model for the Dark Side of the Internet,https://huggingface.co/papers/2305.08596,8,11,0,0,0,0 +2023-05-16,2305.07759,,TinyStories: How Small Can Language Models Be and Still Speak Coherent English?,https://huggingface.co/papers/2305.07759,31,7,0,0,0,0 +2023-05-15,2305.07243,https://github.com/neonbjb/tortoise-tts,Better speech synthesis through scaling,https://huggingface.co/papers/2305.07243,5,0,1,0,0,0 +2023-05-15,2305.07490,,ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced 
MiniGPT-4,https://huggingface.co/papers/2305.07490,1,0,0,0,0,0 +2023-05-15,2305.07378,,Surfacing Biases in Large Language Models using Contrastive Input Decoding,https://huggingface.co/papers/2305.07378,1,0,0,0,0,0 +2023-05-15,2305.07214,,MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition,https://huggingface.co/papers/2305.07214,1,0,0,0,0,0 +2023-05-15,2305.07447,https://github.com/bytedance/uss,Universal Source Separation with Weakly Labelled Data,https://huggingface.co/papers/2305.07447,2,0,1,0,0,0 +2023-05-15,2305.07514,,BlendFields: Few-Shot Example-Driven Facial Modeling,https://huggingface.co/papers/2305.07514,1,0,0,0,0,0 +2023-05-15,2305.07615,https://github.com/griff4692/calibrating-summaries,What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization,https://huggingface.co/papers/2305.07615,1,1,1,0,0,0 +2023-05-15,2305.07558,https://github.com/e-bug/fine-grained-evals,Measuring Progress in Fine-grained Vision-and-Language Understanding,https://huggingface.co/papers/2305.07558,1,0,1,0,0,0 +2023-05-15,2305.07153,,Towards best practices in AGI safety and governance: A survey of expert opinion,https://huggingface.co/papers/2305.07153,0,0,0,0,0,0 +2023-05-15,2305.07440,,Optimizing Memory Mapping Using Deep Reinforcement Learning,https://huggingface.co/papers/2305.07440,1,0,0,0,0,0 +2023-05-15,2305.07185,,MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers,https://huggingface.co/papers/2305.07185,9,8,0,0,0,0 +2023-05-12,2305.06594,,V2Meow: Meowing to the Visual Beat via Music Generation,https://huggingface.co/papers/2305.06594,1,0,0,0,0,0 +2023-05-12,2305.06424,https://github.com/hongwang600/flair,Bot or Human? 
Detecting ChatGPT Imposters with A Single Question,https://huggingface.co/papers/2305.06424,1,0,0,0,0,0 +2023-05-12,2305.06575,,Chain-of-Dictionary Prompting Elicits Translation in Large Language Models,https://huggingface.co/papers/2305.06575,1,0,0,0,0,0 +2023-05-12,2305.06404,,LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM,https://huggingface.co/papers/2305.06404,1,0,0,0,0,0 +2023-05-12,2305.06474,,Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction,https://huggingface.co/papers/2305.06474,1,0,0,0,0,0 +2023-05-12,2305.06500,https://github.com/salesforce/lavis,InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning,https://huggingface.co/papers/2305.06500,4,0,0,0,0,0 +2023-05-12,2305.06555,https://github.com/alibabaresearch/damo-convai,Domain Incremental Lifelong Learning in an Open World,https://huggingface.co/papers/2305.06555,1,2,0,0,0,0 +2023-05-12,2305.07004,,Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting,https://huggingface.co/papers/2305.07004,1,0,0,0,0,0 +2023-05-12,2305.07021,,Simple Token-Level Confidence Improves Caption Correctness,https://huggingface.co/papers/2305.07021,1,0,0,0,0,0 +2023-05-12,2305.07027,https://github.com/microsoft/cream,EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention,https://huggingface.co/papers/2305.07027,3,1,0,0,0,0 +2023-05-12,2305.06456,,Perpetual Humanoid Control for Real-time Simulated Avatars,https://huggingface.co/papers/2305.06456,1,1,0,0,0,0 +2023-05-12,2305.07015,https://github.com/pkuliyi2015/sd-webui-stablesr,Exploiting Diffusion Prior for Real-World Image Super-Resolution,https://huggingface.co/papers/2305.07015,4,0,1,0,0,0 +2023-05-12,2305.07011,https://github.com/google-research/google-research/tree/master/fvlm/rovit,Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision 
Transformers,https://huggingface.co/papers/2305.07011,4,1,0,0,0,0 +2023-05-12,2305.07017,https://github.com/ucsc-vlaa/clipa,An Inverse Scaling Law for CLIP Training,https://huggingface.co/papers/2305.07017,3,2,0,0,0,0 +2023-05-12,2305.06908,https://github.com/zhenye234/CoMoSpeech,CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model,https://huggingface.co/papers/2305.06908,5,0,0,0,0,0 +2023-05-11,2305.06077,,Relightify: Relightable 3D Faces from a Single Image via Diffusion Models,https://huggingface.co/papers/2305.06077,2,0,0,0,0,0 +2023-05-11,2305.05845,https://github.com/rohandkn/skribble2vid,Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models,https://huggingface.co/papers/2305.05845,2,2,0,0,0,0 +2023-05-11,2305.05862,,Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks,https://huggingface.co/papers/2305.05862,4,1,0,0,0,0 +2023-05-11,2305.05973,,Privacy-Preserving Recommender Systems with Synthetic Query Generation using Differentially Private Large Language Models,https://huggingface.co/papers/2305.05973,1,0,0,0,0,0 +2023-05-11,2305.06218,,Multi-Task End-to-End Training Improves Conversational Recommendation,https://huggingface.co/papers/2305.06218,1,0,0,0,0,0 +2023-05-11,2305.06324,,Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception,https://huggingface.co/papers/2305.06324,1,0,0,0,0,0 +2023-05-11,2305.06131,,Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era,https://huggingface.co/papers/2305.06131,2,1,0,0,0,0 +2023-05-11,2305.05706,,DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects,https://huggingface.co/papers/2305.05706,1,0,0,0,0,0 +2023-05-11,2305.06351,https://github.com/lab4d-org/lab4d,Reconstructing Animatable Categories from Videos,https://huggingface.co/papers/2305.06351,1,0,0,0,0,0 
+2023-05-11,2305.06356,https://github.com/synthesiaresearch/humanrf,HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion,https://huggingface.co/papers/2305.06356,1,1,0,0,0,0 +2023-05-11,2305.06355,https://github.com/opengvlab/ask-anything,VideoChat: Chat-Centric Video Understanding,https://huggingface.co/papers/2305.06355,3,1,1,0,0,0 +2023-05-11,2305.06161,,StarCoder: may the source be with you!,https://huggingface.co/papers/2305.06161,29,3,0,0,0,0 +2023-05-10,2305.05189,https://github.com/Qrange-group/SUR-adapter,SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models,https://huggingface.co/papers/2305.05189,2,2,1,0,0,0 +2023-05-10,2305.05176,,FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance,https://huggingface.co/papers/2305.05176,4,4,0,0,0,0 +2023-05-10,2305.05644,https://github.com/jayzhang42/federatedgpt-shepherd,Towards Building the Federated GPT: Federated Instruction Tuning,https://huggingface.co/papers/2305.05644,3,0,1,0,0,0 +2023-05-10,2305.05364,,Large Language Model Programs,https://huggingface.co/papers/2305.05364,2,0,0,0,0,0 +2023-05-10,2305.05383,https://github.com/microsoft/CodeBERT,Code Execution with Pre-trained Language Models,https://huggingface.co/papers/2305.05383,2,1,1,0,0,0 +2023-05-10,2305.05658,https://github.com/jimmyyhwu/tidybot,TidyBot: Personalized Robot Assistance with Large Language Models,https://huggingface.co/papers/2305.05658,2,1,0,0,0,0 +2023-05-10,2305.05065,,Recommender Systems with Generative Retrieval,https://huggingface.co/papers/2305.05065,4,4,0,0,0,0 +2023-05-10,2305.04966,,NerfAcc: Efficient Sampling Accelerates NeRFs,https://huggingface.co/papers/2305.04966,2,0,0,0,0,0 +2023-05-10,2305.05432,https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md,WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset,https://huggingface.co/papers/2305.05432,1,0,0,0,0,0 +2023-05-10,2304.09355,,To Compress or Not to Compress- 
Self-Supervised Learning and Information Theory: A Review,https://huggingface.co/papers/2304.09355,5,0,0,0,0,0 +2023-05-10,2305.05591,,AudioSlots: A slot-centric generative model for audio separation,https://huggingface.co/papers/2305.05591,3,0,0,0,0,0 +2023-05-10,2305.05662,,InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language,https://huggingface.co/papers/2305.05662,3,0,0,0,0,0 +2023-05-09,2305.04391,https://github.com/nvlabs/red-diff,A Variational Perspective on Solving Inverse Problems with Diffusion Models,https://huggingface.co/papers/2305.04391,1,0,0,0,0,0 +2023-05-09,2305.04461,,Locally Attentional SDF Diffusion for Controllable 3D Shape Generation,https://huggingface.co/papers/2305.04461,1,0,0,0,0,0 +2023-05-09,2305.04745,,Controllable Light Diffusion for Portraits,https://huggingface.co/papers/2305.04745,3,0,0,0,0,0 +2023-05-09,2305.04160,,X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages,https://huggingface.co/papers/2305.04160,2,7,0,0,0,0 +2023-05-09,2305.03937,https://github.com/arazd/residualprompts,Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization,https://huggingface.co/papers/2305.03937,1,0,0,0,0,0 +2023-05-09,2305.03981,,Pre-training Language Model as a Multi-perspective Course Learner,https://huggingface.co/papers/2305.03981,1,0,0,0,0,0 +2023-05-09,2305.04388,https://github.com/milesaturpin/cot-unfaithfulness,Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting,https://huggingface.co/papers/2305.04388,1,0,1,0,0,0 +2023-05-09,2305.04790,https://github.com/open-mmlab/multimodal-gpt,MultiModal-GPT: A Vision and Language Model for Dialogue with Humans,https://huggingface.co/papers/2305.04790,1,4,1,0,0,0 +2023-05-09,2305.04241,,Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens,https://huggingface.co/papers/2305.04241,1,1,0,0,0,0 
+2023-05-09,2305.04268,,Multi-Space Neural Radiance Fields,https://huggingface.co/papers/2305.04268,1,0,0,0,0,0 +2023-05-09,2305.04789,,AvatarReX: Real-time Expressive Full-body Avatars,https://huggingface.co/papers/2305.04789,1,0,0,0,0,0 +2023-05-09,2305.04091,https://github.com/agi-edgerunners/plan-and-solve-prompting,Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models,https://huggingface.co/papers/2305.04091,1,1,0,0,0,0 +2023-05-08,2305.03509,https://github.com/poloclub/diffusion-explainer,Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion,https://huggingface.co/papers/2305.03509,1,1,0,0,0,0 +2023-05-08,2305.03514,https://github.com/salt-nlp/llms_for_css,Can Large Language Models Transform Computational Social Science?,https://huggingface.co/papers/2305.03514,1,0,1,0,0,0 +2023-05-08,2305.03719,,"Governance of the AI, by the AI, and for the AI",https://huggingface.co/papers/2305.03719,0,0,0,0,0,0 +2023-05-08,2305.03210,,AttentionViz: A Global View of Transformer Attention,https://huggingface.co/papers/2305.03210,1,1,0,0,0,0 +2023-05-08,2305.03689,,COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?,https://huggingface.co/papers/2305.03689,2,1,0,0,0,0 +2023-05-08,2305.03286,https://github.com/xupei0610/compositemotion,Composite Motion Learning with Task Control,https://huggingface.co/papers/2305.03286,1,0,0,0,0,0 +2023-05-08,2305.03668,https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md,A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding,https://huggingface.co/papers/2305.03668,1,4,0,0,0,0 +2023-05-08,2305.03713,,Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos,https://huggingface.co/papers/2305.03713,1,0,0,0,0,0 +2023-05-08,2305.03695,https://github.com/liujch1998/vera,Vera: A General-Purpose Plausibility Estimation Model for Commonsense 
Statements,https://huggingface.co/papers/2305.03695,2,0,1,0,0,0 +2023-05-08,2305.03726,https://github.com/luodian/otter,Otter: A Multi-Modal Model with In-Context Instruction Tuning,https://huggingface.co/papers/2305.03726,6,3,1,0,0,0 +2023-05-08,2305.03111,,Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs,https://huggingface.co/papers/2305.03111,7,0,0,0,0,0 +2023-05-05,2305.02463,https://github.com/openai/shap-e,Shap-E: Generating Conditional 3D Implicit Functions,https://huggingface.co/papers/2305.02463,2,1,0,0,0,0 +2023-05-05,2305.02483,,ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization,https://huggingface.co/papers/2305.02483,1,1,0,0,0,0 +2023-05-05,2305.03047,https://github.com/IBM/Dromedary,Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision,https://huggingface.co/papers/2305.03047,1,5,1,0,0,0 +2023-05-05,2305.02440,,Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs,https://huggingface.co/papers/2305.02440,1,0,0,0,0,0 +2023-05-05,2305.02783,,Automated Code generation for Information Technology Tasks in YAML through Large Language Models,https://huggingface.co/papers/2305.02783,1,0,0,0,0,0 +2023-05-05,2305.02412,,"Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents",https://huggingface.co/papers/2305.02412,1,0,0,0,0,0 +2023-05-05,2305.02790,,BranchNorm: Robustly Scaling Extremely Deep Transformers,https://huggingface.co/papers/2305.02790,1,0,0,0,0,0 +2023-05-05,2305.03052,https://github.com/basilevh/tcow,Tracking through Containers and Occluders in the Wild,https://huggingface.co/papers/2305.03052,1,0,0,0,0,0 +2023-05-05,2305.02678,,Real-Time Neural Appearance Models,https://huggingface.co/papers/2305.02678,1,1,0,0,0,0 +2023-05-05,2305.02968,https://github.com/facebookresearch/mtm,"Masked Trajectory Models for Prediction, Representation, and 
Control",https://huggingface.co/papers/2305.02968,1,0,0,0,0,0 +2023-05-05,2305.03027,,NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads,https://huggingface.co/papers/2305.03027,1,0,0,0,0,0 +2023-05-05,2305.03040,,TUVF: Learning Generalizable Texture UV Radiance Fields,https://huggingface.co/papers/2305.03040,1,0,0,0,0,0 +2023-05-05,2305.03049,,NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds,https://huggingface.co/papers/2305.03049,1,1,0,0,0,0 +2023-05-05,2305.02499,,AutoML-GPT: Automatic Machine Learning with GPT,https://huggingface.co/papers/2305.02499,3,5,0,0,0,0 +2023-05-05,2305.02549,,FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction,https://huggingface.co/papers/2305.02549,5,2,0,0,0,0 +2023-05-05,2305.02665,,Learning Language-Specific Layers for Multilingual Machine Translation,https://huggingface.co/papers/2305.02665,2,0,0,0,0,0 +2023-05-05,2305.03043,,Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization,https://huggingface.co/papers/2305.03043,5,0,0,0,0,0 +2023-05-05,2305.03048,https://github.com/zrrskywalker/personalize-sam,Personalize Segment Anything Model with One Shot,https://huggingface.co/papers/2305.03048,7,1,1,0,0,0