huggingface/HuggingDiscussions · [FEEDBACK] Daily Papers

Hugging Face org Jun 12, 2024

•

edited Jul 25, 2024

Note that this is not a post about adding new papers, it's about feedback on the Daily Papers community update feature.

How to submit a paper to the Daily Papers, like @akhaliq (AK)?

Submitting is available to paper authors
Only recent papers (less than 7d) can be featured on the Daily

Then drop the arxiv id in the form at https://huggingface.co/papers/submit

Add medias to the paper (images, videos) when relevant
You can start the discussion to engage with the community

Please check out the documentation

RollingPig

Jun 17, 2024

https://arxiv.org/abs/2406.01954

runninglsy

Jun 18, 2024

•

edited Jun 27, 2024

We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".

Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset

Yiwen-ntu

Jun 18, 2024

This comment has been hidden

kramp

Hugging Face org Jun 18, 2024

@Yiwen-ntu for now we support only videos as paper covers in the Daily.

renqiux0302

Jun 19, 2024

This comment has been hidden

taki555

Jun 19, 2024

This comment has been hidden

devichand

Jun 20, 2024

we are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models" : https://arxiv.org/abs/2406.12644

pietroastolfi

Jun 20, 2024

Consistency-diversity realism Pareto fronts of conditional image generative models -- http://arxiv.org/abs/2406.10429

fengyao1909

Jun 21, 2024

•

edited Jun 24, 2024

"Data Contamination Can Cross Language Barriers". -- https://arxiv.org/pdf/2406.13236

diwank

Jun 22, 2024

How do I add papers that are on Nature rather than arXiv?

alexhsu

Jun 24, 2024

Share our latest paper: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation (https://arxiv.org/abs/2406.05365)

kamwoh

Jun 24, 2024

Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas https://arxiv.org/abs/2404.13944

RishabhBhardwaj

Jun 24, 2024

🎉 We are thrilled to announce the publication of my first research paper on model merging, Della-Merging. Della employs a magnitude-based sampling approach to eliminate redundant delta parameters, reducing interference when merging homologous models (those fine-tuned from the same backbone).

Paper: https://arxiv.org/abs/2406.11617
Github: https://github.com/declare-lab/della

Della outperforms existing homologous model merging techniques such as DARE and TIES. Across three expert models (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), Della achieves an improvement of 3.6 points over TIES and 1.2 points over DARE.

kramp

Hugging Face org Jun 24, 2024

•

edited Jun 24, 2024

How do I add papers that are on Nature rather than arXiv?

@diwank we support only arXiv for now

mactavish91

Jun 25, 2024

LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks.

Paper: https://arxiv.org/abs/2406.08035
Github: https://github.com/THUDM/LVBench

keerti

Jun 26, 2024

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Paper: https://arxiv.org/abs/2406.05872
Code: https://github.com/IBM/starling-agent

simonycl

Jun 26, 2024

SIT: Fine-tuning Large Language Models with Sequential Instructions

Paper: https://arxiv.org/pdf/2403.07794
Data and model: https://seqit.github.io
Code: https://github.com/hanxuhu/SeqIns

iiiiwis

Jun 26, 2024

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

paper: https://arxiv.org/pdf/2406.17419
code: https://github.com/MozerWang/Loong

christine1729

Jun 26, 2024

TRAIT: Task Oriented In-Domain Data Augmentation (for Continual Pre-training of LLMs), https://arxiv.org/abs/2406.16694

jdps

Jun 26, 2024

•

edited Jun 26, 2024

Slot State Space Models

paper: https://arxiv.org/abs/2406.12272

yushun0410

Jun 27, 2024

•

edited Jun 27, 2024

We are excited to share our recent work: "Adam-mini: Use Fewer Learning Rates To Gain More” https://arxiv.org/abs/2406.16793

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Code available at: https://github.com/zyushun/Adam-mini

BestWishYsh

Jun 27, 2024

We have developed a new text-to-video generation benchmark for metamorphic evaluation. We specifically design four major categories for time.lapse videos (as shown below), including biological, human-created, meteorological, and physical videos.and extend these to 75 subcategories.
paper: https://arxiv.org/abs/2406.18522
leaderboard: https://huggingface.co/spaces/BestWishYsh/ChronoMagic-Bench
code: https://github.com/PKU-YuanGroup/ChronoMagic-Bench

brucewan666

Jun 28, 2024

KV cache optimization for LLMs and MLLMs:

LLMs: D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models, arxiv: https://arxiv.org/abs/2406.13035
MLLMs: look-m: look-once optimization in kv cache for efficient multimodal long-context inference, arxiv: https://arxiv.org/html/2406.18139v1

ionutmodo

Jun 28, 2024

•

edited Jun 28, 2024

We developed a generic schematic for the optimization loop to reduce the memory footprint of second order, full-matrix adaptive optimizers.💾

Our target optimizers are the ones that store a window of past gradients, such as M-FAC and GGT, which usually require storing around 500 to 1000 gradients (equivalent to this many model copies in the GPU memory).

Our technique uses sparse/low-rank gradients and Error Feedback and shows we can reduce the memory footprint of optimizers' state by 30x for GGT and 45x to 60x for M-FAC. 📉

Why is this important? 🤔
In the case of M-FAC, which is an approximation of Natural Gradient (NG) (most commonly known optimizer about this is K-FAC), our work allows using approximations of NG at larger scale, such as ResNet-18 / ImageNet and BERT-Base finetuning.

Please experiment with this NG approximation and let us know about your findings!

📄 Our arxiv paper: https://arxiv.org/pdf/2306.06098
💻 Our code on GitHub: https://github.com/IST-DASLab/EFCP

mskrt

Jul 1, 2024

was waiting for papers to get verified on my account so i could submit but then the 7 day window closed :( any chance we can still submit ours?

mukeshdalal

Jul 1, 2024

Hi AK and HF team,

I would appreciate your considering my recent ArXiv paper "Model Callers for Transforming Predictive and Generative AI Applications" for inclusion in the HF daily papers. I could not submit directly to your site, since I don't already have a paper in HF DPs.
Paper: https://arxiv.org/abs/2406.15377
Github code: https://github.com/mukdal/modelcaller
Python library: pip install modelcaller

Abstract: We introduce a novel software abstraction termed "model caller," acting as an intermediary for AI and ML model calling, advocating its transformative utility beyond existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable for both creators and users of models within both predictive and generative AI applications. Additionally, we have developed and released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.

Thanks,
Mukesh Dalal

sahupritish

Jul 3, 2024

Hello AK and HF Team,
We would to add our recent paper "Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification" in HF daily papers. I am putting this request here, since I don't already have a paper in HF daily papers.

Paper: https://arxiv.org/pdf/2407.02352
Authors: Pritish Sahu, Karan Sikka, Ajay Divakaran

Thanks,
Pritish Sahu

tono

Jul 3, 2024

Hello AK and HF Team,

We would like to share our 2022 paper now recently published in Automation in Construction, Science, Elsevier, "Vitruvio: Conditional variational autoencoder to generate building meshes via single perspective sketches"

📄 Paper: https://www.sciencedirect.com/science/article/pii/S0926580524002346?dgcid=author (50days free access).
📄 Our arxiv paper: https://arxiv.org/abs/2210.13634

We demonstrated the critical importance of considering building orientation in reconstruction projects. Additionally, we have provided a comprehensive baseline and dataset specifically for building reconstruction. Help us spread the word within the AEC industry to raise awareness about these advancements. Watch our video presenting the problem and our findings: VIDEO .

Code: https://github.com/CDInstitute/Vitruvio

Feel free to use this message on your social media, blog, or any platform where you wish to share your research and video.

Alberto Tono , Heyaojing Huang , Ashwin Agrawal, and Martin Fischer

KT313

Jul 5, 2024

This comment has been hidden

adamdad

Jul 9, 2024

@kramp @akhaliq
Dear Team,
Thanks for the nice work. But our paper just released today was marked as "older than 7 days".
Could you please kind check?
paper: https://arxiv.org/abs/2407.06182 code: https://github.com/Adamdad/vico and project page https://adamdad.github.io/vico/
Best,

zwq2018

Jul 10, 2024

•

edited Jul 10, 2024

@kramp @akhaliq
We would to share our recent paper "Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model" in HF daily papers. We find that even leading multimodal models such as gpt-4o or Claude-3.5-Sonnet have difficulty recognizing simple abstract images, e.g., reading time from a clock, understanding a flowchart, or planning a route using a road map. Therefore, we design a multimodal self-instruct to synthesize the abstract image benchmark and training set, and SFT a model to make it understand the abstract image.

If you are interested in our articles, we would appreciate it!

Paper: https://arxiv.org/abs/2407.07053
Code: https://github.com/zwq2018/Multi-modal-Self-instruct
Dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct
Leaderboard: https://multi-modal-self-instruct.github.io/

Thanks,
Wenqi Zhang

fdqerq22ds

Jul 10, 2024

🚀 Launching ORLM: the first open-source Operations Research LLM, powered by our OR-Instruct process! 🛠️

🏆 ORLMs achieves SOTA on NL4OPT, MAMO, & the new IndustryOR benchmarks based on different 7b backbones!

📄 Paper: https://arxiv.org/pdf/2405.17743
💻 Code: https://github.com/Cardinal-Operations/ORLM

cientgu

Jul 10, 2024

How to submit a paper which is not submited on arxiv?
paper: https://cientgu.github.io/files/VisualSignalDecomposition.pdf

kramp

Hugging Face org Jul 10, 2024

How to submit a paper which is not submited on arxiv?

@cientgu we support only papers from arxiv.org

hao9610

Jul 11, 2024

Hello AK and HF Team @kramp @akhaliq :
We would like to share our recent paper "OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion". We propse the OV-DINO, a novel unified open vocabulary detection approach that offers superior performance and effectiveness for practical real-world application. It entails a Unified Data Integration pipeline that integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module to improve the vision-language understanding of the model. And it shows significant performance improvement on COCO and LVIS benchmarks compared to previous methods, achieving relative improvements of +2.5% AP on COCO and +13.6% AP on LVIS compared to G-DINO in zero-shot evaluation.

If you are interested in our paper, we would great appreciate it!

Paper: https://arxiv.org/abs/2407.07844
Code: https://github.com/wanghao9610/OV-DINO

Thanks,
Hao Wang

jopan

Jul 11, 2024

Dear Team,

I will like to share our paper recently accepted by ECAI. In this paper, we introduce FlowLearn, a novel dataset designed to test the capabilities of Large Vision-Language Models (LVLMs) in understanding and interpreting flowcharts. To the best of our knowledge, this is the first release of a dataset specifically tailored for flowchart comprehension and includes a comprehensive evaluation of LVLMs. Our research reveals significant challenges these models face, such as recognizing textual and visual components and their relationships within flowcharts. Through comprehensive experiments, we assess various state-of-the-art LVLMs, highlighting the gaps and providing insights for future advancements in machine comprehension of graphical data.

Title: FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Paper: https://arxiv.org/abs/2407.05183v1
Dataset: https://huggingface.co/datasets/jopan/ FlowLearn
Code: https://github.com/Jo-Pan/FlowLearn

Lucywang720

Jul 15, 2024

Dear Team,

We propose a novel approach to modulate LLM behaviors through direct parameter editing, offering an alternative to traditional alignment methods. Our new approach achieves efficient modulation with inference-level computational cost! Achieve up to 90% detoxification with inference-level computational cost!

Paper: https://arxiv.org/abs/2407.08770
Code: https://github.com/lucywang720/model-surgery/

Gootter12

Jul 16, 2024

@akhaliq @kramp Hi AK and HF team,

We would like to share our recent paper, "AUITestAgent: Natural Language-Driven GUI Functional Bug Tester." We propose AUITestAgent, the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. Experiments on customized benchmarks demonstrate that AUITestAgent outperforms existing tools in the quality of generated GUI interactions and achieved the accuracy of verifications of 94%. Besides, field deployment in Meituan has shown AUITestAgent's practical usability.

Paper: https://arxiv.org/abs/2407.09018
Web: https://github.com/bz-lab/AUITestAgent

Thanks,
Yongxiang Hu

Bakerbunker

Jul 16, 2024

This comment has been hidden

cdminix

Jul 18, 2024

•

edited Jul 18, 2024

Hi everyone,

we'd like to share our objective evaluation of recent TTS systems.
Since many new TTS systems have been released, with a wide range of approaches, we think evaluating them is very important.
It would also be beneficial to have access to objective evaluation, since that way, models can be evaluated during training, and evaluating new models is easier.
We found that objective evaluation correlates well with human ratings when done accross a wide set of factors such as Prosody, Speaker, Environment, etc.
The TTS Arena recently released by huggingface and @mrfakename was a great human evaluation to compare against, and our methods showed a high correlation with their scores. We also showed strong correlation with MOS scores from the Blizzard 2008 challenge and the Back to the Future Blizzard Paper from 2022.

Paper: https://arxiv.org/abs/2407.12707
Web: https://ttsdsbenchmark.com
HF Space: https://huggingface.co/spaces/ttsds/benchmark
Code: https://github.com/ttsds/ttsds

It would be amazing to have our benchmark featured in the Daily Papers!

Cheers,
Christoph

prlz77

Jul 19, 2024

Hello @akhaliq @kramp

We would like to present ✨AurA✨, a zero-overhead inference-time intervention upon LLM activations to mitigate harmful behaviors, such as toxicity. We will present it next week at ICML2024.

In the figure below we show the toxicity reduction between the original model (circles) and using our AURA intervention (stars), for different LLMs. PPL stands for Perplexity and RTP refers to the Real Toxicity Prompts dataset. We also show AurA can reduce toxicity even in the presence of adversarial prompts.

It would be great if you could feature it in the Daily Papers!

📄https://arxiv.org/abs/2407.12824
🔗 Code: https://github.com/apple/ml-aura/tree/main

ZhiyuanthePony

Jul 20, 2024

@akhaliq @kramp Hi AK and HF team,

We are excited to present our ECCV 2024 paper, "ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation". This innovative approach is the first to explore score distillation in training text-to-3D generation models, marking a shift from optimization-based to learning-based generation. Unlike existing data-driven methods, our approach enables the training of a high-quality text-to-3D generator in an unsupervised manner. This method opens up numerous potential applications, and we will continue to explore its possibilities.

Figure. First row: ASD application with optimization-based text-to-3D. Second row: ASD applied to learning-based text-to-3D, enabling the training of text-to-3D generation models without real 3D data. ASD can extend the training corpus to up to 100,000 text prompts.

Paper: https://arxiv.org/pdf/2407.02040
Code: https://github.com/theEricMa/ScaleDreamer

mannykayy

Jul 20, 2024

Hi all,

We have been working on a pretty cool AI agent framework.

AutoGRAMS is a software 2.5 framework that makes it easy to build very complex AI agents using either spreadsheets or python code or the AutoGRAMS scripting language.

Paper: https://arxiv.org/abs/2407.10049
Code: https://github.com/autograms/autograms

https://www.autograms.ai/

Happy to recieve any feedback.

zhiminy

Jul 23, 2024

•

edited Jul 23, 2024

Our paper investigates various foundation model (FM) leaderboards across multiple platforms, focusing on their types, operational workflows ("LBOps"), and issues ("leaderboard smells"). We also curate an awesome list of FM leaderboards, check here.

spitis

Jul 24, 2024

Hi all, wanted to share our recent work on context-conditioned reward modeling: https://arxiv.org/abs/2407.14916 (accompanying context-conditioned preference dataset to follow shortly).

We construct a Reasonable Preference Reversal (RPR) dataset (to follow) and use it to finetune a 7B parameter reward model that outperforms Llama3-70B (as well as other reward models) on context-conditioned preference queries.

Our work could be used to improve performance when conditioning reward models on principles (e.g., for constitutional AI) or user profiles (e.g. for pluralistic alignment), or other contexts. The goal is to reduce ambiguity in preference queries, and work toward improving human preference modeling.

xiaobiaodu

Jul 25, 2024

@akhaliq @kramp Hi AK and HF team

We propose a novel method, called DreamCar, to reconstruct 3D real cars in the moving-forward scenes. I guess you would be interested in https://arxiv.org/pdf/2407.16988
Here is our project page: https://xiaobiaodu.github.io/dreamcar-project/

xiaobiaodu

Jul 25, 2024

@akhaliq @kramp Hi AK and HF team

We propose a novel dataset, called 3DRealCar, containing 2500 large-scale 3D real cars for various tasks. I guess you would be interested in https://arxiv.org/abs/2406.04875

Here is our project page: https://xiaobiaodu.github.io/3drealcar/index.html

cientgu

Jul 29, 2024

@akhaliq @kramp Hi AK and HF team
This paper (https://arxiv.org/pdf/2407.18290) discusses several key questions in current visual generation community.
However, it cannot be submitted to daily paper because the arxiv reviews this paper for three weeks, even though it just appeared on arxiv today. Could you help post it on daily paper?

jfduan

Jul 30, 2024

A survey paper (https://arxiv.org/abs/2407.20018) discusses the effcient LLM training system and infra.

li-qing

Aug 2, 2024

@akhaliq @kramp Hi AK and HF team,

Our recent work "FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models" (https://arxiv.org/abs/2407.11522) builds a new dataset FIRE that empowers VLMs to integrate user feedback into the refined responses spontaneously, and provides a comprehensive evaluation for the feedback-refining ability of existing methods. We also host our dataset, model, and demo on Huggingface.

Project: https://mm-fire.github.io/
Dataset: https://huggingface.co/datasets/PengxiangLi/FIRE
Model: https://huggingface.co/li-qing/llava-next-llama3-8b-student-fire
Gradio Demo: https://li-qing-fire.hf.space

JackyZhuo

Aug 6, 2024

@akhaliq @kramp Hi AK and HF team,
Our recent work, Lumina-mGPT (https://arxiv.org/abs/2408.02657), introduces a multimodal autoregressive transformer capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. We have released our code and model on GitHub (https://github.com/Alpha-VLLM/Lumina-mGPT)

martinezmatias

Aug 6, 2024

Hi HF team
This new paper (https://arxiv.org/abs/2408.01050 https://huggingface.co/papers/2408.01050 ) discusses hyperparameter optimization of HuggingFace pipelines and vLLM in the context of code generation.

Minami-su

Aug 30, 2024

We are excited to share our recent work "ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model".
paper already accepted
Paper: https://pubs.acs.org/doi/10.1021/acs.jcim.4c01097
Github: https://github.com/Minami-su/nBAT
Model: https://huggingface.co/Minami-su/nBAT
Data: https://huggingface.co/Minami-su/nBAT

burtenshaw

Hugging Face org Sep 2, 2024

•

edited Sep 2, 2024

Hey 👋

We're excited to share this paper on open human preferences for LLMs: https://arxiv.org/abs/2408.16961

xiuqhou

Sep 5, 2024

Hi, we are excited to share our recent work "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection".
Paper: https://arxiv.org/abs/2407.11699v1
Github: https://github.com/xiuqhou/Relation-DETR
Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k

xcjthu

Sep 6, 2024

Hi, we are excited to share our recent paper on modular large language models "Configurable Foundation Models: Building LLMs from a Modular Perspective".
In this paper, we provide a comprehensive overview of existing efforts to decompose LLMs into modules and conduct an empirical study to verify the modularity characteristic of densely trained LLMs, Llama3 and Mistral.
Paper: https://arxiv.org/abs/2409.02877

yixuantt

Sep 6, 2024

We are excited to share our recent work, "Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?"

Paper: https://arxiv.org/abs/2409.02727
Github: https://github.com/yixuantt/PoolingAndAttn

deleted

Sep 7, 2024

This comment has been hidden

KMJo

Sep 7, 2024

We're thrilled to share our latest work, "Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects"
Paper: https://arxiv.org/abs/2409.02653

deleted

Sep 7, 2024

This comment has been hidden

ZCODE0

Sep 14, 2024

We're thrilled to share our latest work, "Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models", the first first-order FL method with shared randomness that significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy.

Paper: https://arxiv.org/abs/2409.06277
Github: https://github.com/allen4747/Ferret

beeformer

Sep 17, 2024

Hi, I'd like to share our paper beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems

Paper https://arxiv.org/pdf/2409.10309
Github https://github.com/recombee/beeformer

adibvafa

Sep 17, 2024

•

edited Sep 17, 2024

🚀 Excited to share our latest preprint: "CodonTransformer: a multispecies codon optimizer using context-aware neural networks"!

CodonTransformer is a groundbreaking deep learning model that optimizes DNA sequences for heterologous protein expression across 164 species.
By leveraging Transformer architecture and a novel training stratey named STREAM, it generates host-specific DNA sequences with natural-like codon patterns, minimizing negative regulatory elements.

💥 Website
https://adibvafa.github.io/CodonTransformer/

⭐ GitHub (Please give us a :star:!)
https://github.com/Adibvafa/CodonTransformer

🤖 Colab Notebook (Try it out!)
https://adibvafa.github.io/CodonTransformer/GoogleColab

🪼 Model
https://huggingface.co/adibvafa/CodonTransformer

📝 Paper
https://www.biorxiv.org/content/10.1101/2024.09.13.612903

Please share with anyone interested!

adibvafa

Sep 17, 2024

https://www.biorxiv.org/content/10.1101/2024.09.13.612903

NoSavedDATA

Sep 19, 2024

No Saved Kaleidosope: an 100% Jitted Neural Network Coding Language with Pythonic Syntax
https://arxiv.org/abs/2409.11600

yangwang92

Sep 26, 2024

This comment has been hidden

dong1park92

Sep 29, 2024

Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration
paper: https://arxiv.org/pdf/2408.01099

Siheng99

Sep 30, 2024

I’m excited to share our recent work with everyone: A Survey on the Honesty of Large Language Models. In this paper, we systematically review the current research on LLM honesty and propose potential future research directions, aiming to contribute to the development of this field.

Paper: https://arxiv.org/pdf/2409.18786
Project Page: https://github.com/SihengLi99/LLM-Honesty-Survey

yixuantt

Oct 1, 2024

I’m excited to share our recent work with everyone: Do We Need Domain-Specific Embedding Models? An Empirical Investigation. In this paper, we introduce the FinMTEB and empirically analyze the significant performance drop of seven SOTA embedding models on domain-specific context with four controlling metrics, rethink the necessity of domain-specific llm-based embedding models and benchmarks

Paper: https://arxiv.org/pdf/2409.18511
Project Page: https://github.com/yixuantt/FinMTEB

JamalSM

Oct 1, 2024

Leveraging Foundation Models for Efficient Federated Learning in Resource-restricted Edge Networks

https://arxiv.org/pdf/2409.09273

soyeonm

Oct 1, 2024

•

edited Oct 1, 2024

Excited to share our latest preprint: "Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation"!

In recent years, we have seen progress of foundation models (RT-X models and V/LLM's) as embodied agents. However, methods to augment these models with general-purpose long-term/large-scale memory has been under-explored. Clearly, as the environment becomes larger (e.g. outdoors) in navigation/"mobile" manipulation, we need a general-purpose external memory.

We introduce Embodied-RAG, a General Non-Parametric Method for Retrieval and Generation.

Project Page: https://quanting-xie.github.io/Embodied-RAG-web/

Paper: https://arxiv.org/abs/2409.18313

Youtube Demo: https://youtu.be/LcB89Rdyxhg

More demos in the website!

ari7cr

Oct 3, 2024

Could you also enable indexing of other preprint servers like TechRxiv?

honggen

Oct 5, 2024

We are glad to share our preprint work: REAL: Response Embedding-based Alignment for LLMs. An efficient and offline high-quality data selection for LLM alignment.
Using the response embedding to select the preference and non-preference pairs for DPO fine-tuning.
The paper link is https://arxiv.org/pdf/2409.17169

tytyt

Oct 6, 2024

Hello, everyone. We are pleased to present our paper: "Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding"
To the best of our knowledge, this is the first training-free acceleration method for auto-regressive text-to-image generation models.
You can access the full paper here: https://arxiv.org/abs/2410.01699

DonJoey

Oct 8, 2024

We're thrilled to share our recent works,

''Collaborative Performance Prediction for Large Language Models'', While scaling laws have been a popular method for predicting LLM performance on downstream tasks, our research shows that simpler approaches like matrix factorization and neural collaborative filtering can yield even better results. We encourage a collaborative framework where model design information is shared, allowing for accurate predictions of future models' performance on downstream tasks. Our framework supports integration with open-source leaderboards, such as Open Leaderboard and HELM, enabling developers to predict their models' performance by leveraging historical model data. You can access the full paper here: https://arxiv.org/abs/2407.01300.
''RevisEval: Improving LLM-as-a-Judge via Response-Adapted References'', Evaluation has long been a cornerstone of progress in text generation capabilities. With the limitations of traditional metrics, LLM-as-a-Judge has become a viable method for assessing generative abilities in open-ended tasks, though it still faces significant reliability gaps compared to human evaluation. By harnessing the revision capabilities of LLMs, we unlock the potential of references in traditional evaluations, generating response-adapted references that can significantly enhance general evaluation methods on various tasks. This approach not only boosts the accuracy of LLM-as-a-Judge but also revives traditional metrics like BLEU, enabling them to effectively evaluate tasks on benchmarks such as MTBench and Alpacafarm, with results that are even comparable to those of LLM-as-a-Judge. It also performs well in using weak LLMs for evaluation and mitigating positional bias. You can access the full paper here: https://arxiv.org/abs/2410.05193

timothelaborie

Oct 9, 2024

https://arxiv.org/abs/2410.01131

luomingshuang

Oct 10, 2024

•

edited Oct 10, 2024

M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation，https://arxiv.org/pdf/2405.16273 ， accepted by NeurIPS 2024

Willpat

Oct 14, 2024

https://arxiv.org/abs/2410.05970

Mihaiii

Oct 14, 2024

I don't have a paper, but I made a small sample framework researchers could use for sampling experiments.

https://github.com/Mihaiii/backtrack_sampler

luping-liu

Oct 16, 2024

https://arxiv.org/abs/2410.11817
https://github.com/luping-liu/LongAlign

JesseTNRoberts

Oct 17, 2024

•

edited Oct 17, 2024

https://arxiv.org/abs/2305.17026

lmc22

Oct 19, 2024

Text4Seg: Reimagining Image Segmentation as Text Generation
Paper: https://arxiv.org/abs/2410.09855
Github: https://github.com/mc-lan/Text4Seg

hhyangcs

Oct 20, 2024

Depth Any Video with Scalable Synthetic Data

Depth Any Video introduces a scalable synthetic data pipeline, capturing 40,000 video clips from diverse games, and leverages powerful priors of generative video diffusion models to advance video depth estimation. By incorporating rotary position encoding, flow matching, and a mixed-duration training strategy, it robustly handles varying video lengths and frame rates. Additionally, a novel depth interpolation method enables high-resolution depth inference, achieving superior spatial accuracy and temporal consistency over previous models.

Arxiv link: https://arxiv.org/abs/2410.10815
Project page: https://depthanyvideo.github.io
Code: https://github.com/Nightmare-n/DepthAnyVideo
Huggingface gradio demo: https://huggingface.co/spaces/hhyangcs/depth-any-video

ronniecao

Oct 28, 2024

•

edited Oct 28, 2024

We are excited to share our recent proposed code completion benchmark "Codev-Bench: How Do LLMs Understand Develop-Centric Code Completion?".

📑 https://arxiv.org/abs/2410.01353
🚀 https://github.com/LingmaTongyi/Codev-Bench
🤗 https://huggingface.co/datasets/TongyiLingma/CodevBench

shallowdream204

Oct 29, 2024

📑 https://arxiv.org/abs/2410.18666
🚀 https://github.com/shallowdream204/DreamClear
🤗 https://huggingface.co/shallowdream204/DreamClear

zpschang

Nov 5, 2024

Hi AK and HF team,

Our paper https://arxiv.org/abs/2411.00785 titled "IGOR: Image-GOal Representations Are the Atomic Control Units for Foundation Models in Embodied AI" is just be made public today, although being onhold by arxiv for more than 7 days. However, the Daily paper submission website shows that it is more than 7 days old. We appreciate your help if you could help post the paper on the Daily Paper.

ionutmodo

Nov 5, 2024

•

edited Nov 5, 2024

Hi AK and HF team,

I am happy to introduce our MicroAdam optimizer, a low-memory variant of Adam optimizer that has a memory footprint of 0.9d bytes, compared to 2d bytes of AdamW-8bits. We achieve this result by only storing 99% sparse gradients and reconstructing the optimizer states at each step, which is a fast operation due to our optimized implementation using CUDA kernels. MicroAdam was mainly developed for finetuning tasks in mind. Please check out our work:

(Paper) 📑: https://arxiv.org/pdf/2405.15593
(Code) 🚀: https://github.com/IST-DASLab/MicroAdam

Wenxuuuan

Nov 7, 2024

This comment has been hidden

ionutmodo

Nov 7, 2024

Hi everyone,

I would like to introduce GridSearcher, a tool we have been developing in our DAS-Lab @ ISTA to speed up the hyper-parameter tuning process. Grid searcher is a pure python project designed to bypass the bash scripts to run grids of parameters for the ML projects. It provides a more flexible and user friendly way to manage and execute multiple programs in parallel. It is designed for systems where users have direct SSH access to machines and can run their python scripts right away.

Do you have access to your GPUs via SLURM? no problem, you can run srun --gres=gpu:8 --partition=gpu100 --time=10-00:00:00 --mem=1000G --cpus-per-task=200 --pty bash to request a bash session to your cluster to be able to use direct ssh access on the node, then use GridSearcher.

I am sure our project will help you save time, please check out our code on GitHub:

(Code) 🚀: https://github.com/IST-DASLab/GridSearcher/

Djrango

Nov 25, 2024

We are excited to introduce Qwen2vl-Flux, a state-of-the-art multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities.

🔥 Key Features:

Enhanced vision-language understanding through Qwen2VL-7B
Multiple generation modes including variation, img2img, inpainting, and ControlNet guidance
Flexible attention mechanism and structural control

📚 Technical Details:

Integrates Qwen2VL (7B) with FLUX architecture

🤗 Resources:

Model: https://huggingface.co/Djrango/Qwen2vl-Flux
Code: https://github.com/erwold/qwen2vl-flux
Technical Report: https://github.com/erwold/qwen2vl-flux/blob/main/technical-report.pdf

YUXU915

Nov 26, 2024

•

edited Nov 26, 2024

We would like to introduce our new work: "HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads", which leverages different attention heads for real image editing. Our work is based on flux and is training free!

Project page: https://yuci-gpt.github.io/headrouter/
Paper: https://arxiv.org/abs/2411.15034

SimonX

Nov 26, 2024

🔥🔥🔥 We are excited to share a new efficient small language model architecture with parallel Mamba and Attention fusion - Hymba.

We study the tradeoff between Mamba and Attention, exploring how they can be combined, how the attention sink and forced-to-attend phenomena can be mitigated, and how the KV cache can be shared across layers.

The team delivered an end-to-end solution featuring a novel architecture, selecting data, a five-stage training setup, and trained both Base and Instruct models. Release is with open license.

A standout feature is that the Hymba-1.5B Base model outperforms LLaMA 3.2-3B, despite being trained on 7× fewer tokens and achieving 12× cache reduction.

😊 Model: https://huggingface.co/collections/nvidia/hymba-673c35516c12c4b98b5e845f
📖 Paper: https://www.arxiv.org/abs/2411.13676

Bowen232

Nov 27, 2024

Inspired by "big-little core" chip design, we introduce "one-big-many-small" grouping for efficient multi-model deployment, cutting storage costs from NM to (1+rN)M!

Paper: https://arxiv.org/abs/2406.08903
Github: github.com/thunlp/Delta-CoMe

LegendBC

Nov 27, 2024

Hi AK and HF team,

I am happy to introduce our DiffusionDrive, a real-time end-to-end autonomous driving model, which is much faster (10x reduction in diffusion denoising steps), more accurate (3.5 higher PDMS on NAVSIM), and more diverse (64% higher mode diversity score) than the vanilla diffusion policy. Without bells and whistles, DiffusionDrive achieves record-breaking 88.1 PDMS on NAVSIM benchmark with the same ResNet-34 backbone by directly learning from human demonstrations, while running at a real-time speed of 45 FPS. Please check out our work:

(Paper) 📑: https://arxiv.org/abs/2411.15139
(Code) 🚀: https://github.com/hustvl/DiffusionDrive

We demonstrate robust and safe driving in the real-world application

Jinyang23

Nov 28, 2024

•

edited Nov 29, 2024

@akhaliq @kramp
Dear AK and HF team ,

🚀 We are pleased to share our latest research paper, "Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS," for your consideration, as we believe it may be of significant interest for HF Daily Paper. This work introduces HiAR-ICL, a novel paradigm to enhance the complex reasoning capabilities of large language models.

🌟 Unlike traditional in-context learning, HiAR-ICL shifts the focus from example-based analogical learning to abstract thinking patterns. It employs Monte Carlo Tree Search to explore reasoning paths and creates "thought cards" to guide inferences. By dynamically matching test problems with appropriate thought cards through a proposed cognitive complexity framework, HiAR-ICL achieves state-of-the-art accuracy of 79.6% with 7B model on the challenging MATH benchmark, surpassing both GPT-4o and Claude 3.5.

📑 Paper: https://arxiv.org/pdf/2411.18478
🌐 Project Page: https://jinyangwu.github.io/hiar-icl/

We would greatly appreciate your consideration of our paper for inclusion.

Best regards,
Jinyang Wu, Mingkuan Feng, Shuai Zhang, Feihu Che, Zengqi Wen, Jianhua Tao

Gynjn

Nov 29, 2024

This comment has been hidden

Owos

Nov 29, 2024

Note that this is not a post about adding new papers, it's about feedback on the Daily Papers community update feature.

How to submit a paper to the Daily Papers, like @akhaliq (AK)?

Submitting is available to paper authors

Only recent papers (less than 7d) can be featured on the Daily

Then drop the arxiv id in the form at https://huggingface.co/papers/submit

Add medias to the paper (images, videos) when relevant

You can start the discussion to engage with the community

Please check out the documentation

Hi @kramp and @akhaliq please could you help me verify my authorship claim for this paper? https://huggingface.co/papers/2411.15640
Today makes it 6 days and I need to be able to feature it on the paper dailies.

Jinyang23

Dec 2, 2024

Hi @kramp and @akhaliq ,

I hope you're doing well! I would like to kindly request your assistance in verifying my authorship claim for this paper: https://huggingface.co/papers/2411.18478. Today marks the 6th day, and I would appreciate it if you could help expedite the verification process so that the paper can be featured on the daily papers.

Thank you so much for your help!

Best regards,
Jinyang Wu

GregorySenay

Dec 2, 2024

This comment has been hidden

Harold328

Dec 4, 2024

Self-Supervised Unified Generation with Universal Editing: https://arxiv.org/pdf/2412.02114

xumingyu16

Dec 6, 2024

Dear AK and HF team，

I would like to kindly request your assistance in verifying my authorship claim for this paper: https://huggingface.co/papers/2411.18478. Today marks the 7th day, and I would appreciate it if you could help expedite the verification process so that the paper can be featured on the daily papers.

Thank you so much for your help!

Best regards,
Mingyu Xu

Klayand

Dec 7, 2024

@akhaliq @kramp
Dear AK and HF team ,

🚀 We would like to kindly request your assistance in sharing our latest research paper in less than 1 month(Nov. 14), "Golden Noise for Diffusion Models: A Learning Framework". We believe it may be of significant interest for HF Daily Paper.

🌟 First, we identify a new concept termed noise prompt, which aims at turning a random noise into a golden noise by adding a small desirable perturbation derived from the text prompt. The golden noise perturbation can be considered as a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Building upon this concept, we formulate a noise prompt learning framework that learns "prompted'' golden noises associated with text prompts for diffusion models.

🌟 Second, to implement the formulated noise prompt learning framework, we propose the training dataset, namely the noise prompt dataset(NPD), and the learning model, namely the noise prompt network(NPNet). Specifically, we design a noise prompt data collection pipeline via re-denoise sampling, a way to produce noise pairs. We also incorporate AI-driven feedback mechanisms to ensure that the noise pairs are highly valuable. This pipeline enables us to collect a large-scale training dataset for noise prompt learning, so the trained NPNet can directly transform a random Gaussian noise into a golden noise to boost the performance of the T2I diffusion model.

🌟 Third, we conduct extensive experiments across various mainstream diffusion models, including StableDiffusion-xl(SDXL), DreamShaper-xl-v2-turbo and Hunyuan-DiT, with 7 different samplers on 4 different datasets. We evaluate our model by utilizing 6 human preference metrics including Human Preference Score v2(HPSv2), PickScore Aesthetic Score(AES), ImageReward, CLIPScore and Multi-dimensional Preference Score(MPS). As illustrated in Fig.1, by leveraging the learned golden noises, not only is the overall quality and aesthetic style of the synthesized images visually enhanced, but all metrics also show significant improvements, demonstrating the effectiveness and generalization ability of our NPNet. For instance, on GenEval, our NPNet let SDXL improve the classical evaluation metric HPSv2 by 18%(24.04→28.41)}, which even surpasses a recent much stronger DiT-based diffusion model Hunyuan-DiT(27.78). Furthermore, the NPNet is a compact and efficient neural network that functions as a plug-and-play module, introducing only a 3% extra inference time per image compared to the standard pipeline, while requiring approximately 3% of the memory required by the standard pipeline. This efficiency underscores the practical applicability of NPNet in real-world scenarios.

📑 Paper: https://arxiv.org/abs/2411.09502
🌐 Project Page: https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models

We would greatly appreciate your assistance and consideration of our paper for inclusion.

Best regards,
Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie

ohad204

Dec 31, 2024

@akhaliq @kramp
Dear AK and HF team ,

🚀 We would like to kindly request your assistance in sharing our latest research paper, "Bringing Objects to Life: 4D generation from 3D objects".
We believe it may be of significant interest for HF Daily Paper.

🌟 Recent advancements in generative modeling now enable the creation of 4D content (moving 3D objects) controlled with text prompts.
4D generation has large potential in applications like virtual worlds, media, and gaming, but existing methods provide limited control over the appearance and geometry of generated content.

🌟 In this work, we introduce a method for animating user-provided 3D objects by conditioning on textual prompts to guide 4D generation, enabling custom animations while maintaining the identity of the original object.

🌟 We first convert a 3D mesh into a ``static" 4D Neural Radiance Field (NeRF) that preserves the visual attributes of the input object. Then, we animate the object using an Image-to-Video diffusion model driven by text. To improve motion realism, we introduce an incremental viewpoint selection protocol for sampling perspectives to promote lifelike movement and a masked Score Distillation Sampling (SDS) loss, which leverages attention maps to focus optimization on relevant regions.

🌟 We evaluate our model in terms of temporal coherence, prompt adherence, and visual fidelity and find that our method outperforms baselines that are based on other approaches, achieving up to threefold improvements in identity preservation measured using LPIPS scores, and effectively balancing visual quality with dynamic content.

📑 Paper: https://arxiv.org/abs/2412.20422
🌐 Project Page: https://3-to-4d.github.io/3-to-4d/

We would greatly appreciate your assistance and consideration of our paper for inclusion.

raannakasturi

Dec 31, 2024

https://doi.org/10.5281/zenodo.13862005

chengzeyi

Jan 3

•

edited Jan 3

👀

hilmansw

Jan 5

http://doi.org/10.11591/ijai.v13.i2.pp1753-1761

RZ412

Jan 19

Does the system determine paper publish date by latest version updated or initial submission date? I missed the deadline to put my paper on Daily Paper and now my paper has a major revision and I want to put it up again.

RealZhiqiLi

Jan 28

@akhaliq @kramp

Dear AK and HF Team,

We are writing to request the inclusion of our latest paper in the Hugging Face Daily Papers. Our paper, titled "Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models“
, was submitted to arXiv on January 20, 2025 (arxiv id). However, due to an unexpected hold from arXiv, it was not publicly available until today.

Since the Hugging Face Daily Papers typically consider papers uploaded within the past seven days, I wanted to clarify that our delay was not intentional but rather due to arXiv’s review process. Given this context, we would really appreciate it if our paper could still be considered for inclusion.

Here are interests of our paper for HF Daily Paper:

🚀 Data-Centric VLM Post-Training Strategy: Unlike most open-source models that only release final weights, we systematically design and analyze a post-training data strategy from scratch, revealing its crucial role in developing frontier-level vision-language models (VLMs).
🚀 Comprehensive Transparency & Open-Source Insights: We provide detailed insights into data curation, training recipes, and model design, offering the open-source community a reproducible framework to develop competitive VLMs.

📑 Paper: https://arxiv.org/abs/2412.20422
🌐 Project Page: https://github.com/NVlabs/EAGLE

Zhiqi Li

Sajib-006

Feb 9

PathoLM is a genome foundation model trained on 30 bacterial and viral species to predict pathogenicity directly from DNA sequences. Built using transformer-based architectures, it generalizes across diverse pathogens and non-pathogens, enabling accurate classification for emerging infectious diseases and metagenomic data. Check out the code and dataset on
GitHub: https://github.com/Sajib-006/Patho-LM

soyeonm

Feb 10

As AI become agents 🤖, how can we reliably delegate tasks to them, if they cannot communicate their limitations😭 or ask for help or test-time compute 🧑‍🚒 when needed?

We present our new pre-print Self-Regulation and Requesting Interventions that investigates how LLM agents 🤖 can assess their own limitations and determine when to leverage test-time compute, larger models, or human intervention🧑‍🚒.

Combining LLM-based process reward models (PRMs) with classical RL, we developed an off-line and hybrid method that eschews the inefficiencies of end-to-end deep learning while leveraging the robustness of PRMs. Empirically, our method matches the performance of ALWAYS using interventions, even when requiring only one intervention per task—just about 1/10 intervention usage.

Paper: https://arxiv.org/pdf/2502.04576
Webpage: https://soyeonm.github.io/self_reg/

Aurelien-Morgan

Feb 11

How about searching papers from any search bar ? Today, we have to navigate to Daily papers to be able to find one by arxiv code. Often, I forget that and, fail by first trying on the main search bar (often from homepage).

vardaan123

Feb 18

Paper title: Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Link: https://arxiv.org/pdf/2502.11357

yixuantt

Feb 18

Dear AK and HF Team,

We are excited to share our recent work on comprehensive finance text embedding. We also develop a SoTA LLM-based embedding model for the finance domain. 🤗

Title: FinMTEB: Finance Massive Text Embedding Benchmark
Link: https://arxiv.org/abs/2502.10990
Github: https://github.com/yixuantt/FinMTEB
Leaderboard: https://huggingface.co/spaces/FinanceMTEB/FinMTEB
Model: yixuantt/Fin-e5

liulj13

Feb 18

https://arxiv.org/abs/2502.11079

qqqwt

Feb 19

Dear AK and HF Team,

We are thrilled to present our recent research, which investigates and benchmarks various inference-time computation strategies to enhance reasoning performance in large language models (LLMs). With the growing interest in solving complex reasoning tasks, methods such as Best-of-N and beam search have shown promise in improving reasoning capabilities without requiring modifications to model parameters or additional training. However, challenges remain in their implementation, with many existing approaches still in the proof-of-concept stage, hindered by computational complexity and task-specific limitations.

In this work, we focus on optimizing both the candidate solution generation and the reward mechanisms that underpin these inference-time strategies. By exploring the impact of different prompting techniques, hyperparameters like temperature and top-p, and reward types such as self-evaluation and RLHF rewards, we uncover previously overlooked strategies that significantly enhance reasoning performance. Our extensive experiments—spanning over 20,000 A100-80G GPU hours and 1,000+ experiments—cover various models from the Llama, Qwen, and Mistral families. These findings demonstrate that careful tuning of hyperparameters like temperature can lead to performance gains of up to 5% in reasoning tasks.

Furthermore, we establish a standardized benchmark for evaluating inference-time computation techniques, assessing six representative methods across eight different reasoning tasks. Our work provides a robust foundation for advancing future research in this area, setting the stage for more practical and scalable applications of LLM-based reasoning systems.

Title: Bag of Tricks for Inference-time Computation of LLM Reasoning

Link: https://arxiv.org/abs/2502.07191

Github: https://github.com/usail-hkust/benchmark_inference_time_computation_LLM

Kanesblack

Feb 19

Dear AK and HF Team,

We are excited to share our work on Text-to-SQL. The information for the paper we submitted is as follows:

Title: SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
Link: https://arxiv.org/abs/2502.11741
Github: https://github.com/ShuaiLyu0110/SQL-o1

junzhang98

Feb 20

Dear AK and HF Team,

Buckle up for a wild ride into the world of large language models! 🚀 Ever wished you could fine-tune massive LLMs without needing a full-blown data center? Well, dream no more! Our new approach, LoRAM, is here to train small and infer large—bringing you memory-efficient LoRA training without sacrificing performance.

Imagine turning a 70-billion-parameter beast into a nimble, memory-efficient marvel—like transforming an elephant into a sleek race car! 🐘➡️🏎️ We take the classic LoRA method, give it a trendy haircut by pruning away those underutilized neurons 💇‍♂️, and then recover the pruned low-rank matrices to supercharge the full model during inference.

The Challenge 🤯

While LoRA offers a cost-effective fine-tuning solution, the memory footprint remains dominated by the original model parameters. Training a 70B model traditionally demands an A100-80G GPU or even a fleet of 15 GPUs. Yikes!

The LoRAM Magic 🪄

LoRAM turns this challenge on its head by:

Tiny Yet Mighty: Training on a pruned (small) model with just 20G HBM—no need for heavyweight GPUs! 🎉
Wallet-Friendly Wizardry: Using structured pruning combined with 4-bit quantization (QLoRAM) slashes storage costs by up to 16.95×, proving that efficiency and performance can indeed dance together! 💃💸
Seamless Sync: Minimal-cost continual pre-training aligns the knowledge between the pruned and original models, ensuring no magic is lost in translation. 🔗✨

The Results 🤯🚀

With LoRAM, we not only achieve dominant performance gains over both the original 70B model and smaller LoRA-trained models but also make massive model training accessible—running on a single 20G GPU!

Curious to see the magic in action? Check out our paper and code:

Paper: Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
GitHub: LoRAM on GitHub

We can’t wait for you to join us on this exhilarating journey where smart engineering meets a splash of neural magic! 😄🌟

Cheers,
The LoRAM Team

saadob12

Feb 20

•

edited Feb 20

Dear AK and HF team,

We are excited to share our new paper on estimating hallucination rates of 11 large multilingual language models across 30 languages.
The paper comes with 2 datasets that are open source and ready to be used by the community. Below is the figure showing hallucination rates across 11 LLMs for 30 languages.

Summary of our findings:

Within LLM family, smaller LLM hallucinate more than large variant.

Increasing number of supported languages correlate significantly with increasing number of hallucinations.

Smaller digital representation of a language does not necessarily mean higher hallucination rates.

Resources:
The paper releases two datasets: for 30 languages.

Multilingual Hallucination Detection: https://huggingface.co/datasets/WueNLP/mHallucination_Detection
Multilingual Hallucination Evaluation: https://huggingface.co/datasets/WueNLP/mHallucination_Evaluation

Paper, Dataset, and Code:

Archive Paper: How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
Huggingface collection: https://huggingface.co/collections/WueNLP/mhallucinations-llm-67b5aedb0e7fed1190e148d8
Github: https://github.com/WorldHellow/mHallucinations-LLM

Hopefully the community would enjoy reading and utilizing our work.

Cheers

rippleripple

Feb 25

•

edited Feb 25

Dear AK and HF Team,

We are excited to share our work on Multimodal Inconsistency Reasoning (MMIR). The information for the paper we submitted is as follows:

Title: Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Paper Link: https://arxiv.org/pdf/2502.16033
Github: https://github.com/eric-ai-lab/MMIR
Dataset: https://huggingface.co/datasets/rippleripple/MMIR

zhiminy

Feb 26

Dear AK and HF Team,

I’m super excited to recommend SE Arena, a new interactive platform for benchmarking Software Engineering chatbots.

🚀 If you’re working with AI in software dev or just passionate about improving how these models perform in real-world dev workflows, you have to check it out!

🧭 The best part? SE Arena has a transparent, open-source leaderboard, and you can actively contribute by casting your votes to shape the evaluations. Plus, with RepoChat, it pulls in real repo context (issues, commits, PRs) to make things feel real.

📣 Want to get involved and help drive the future of AI in software engineering? Head over to https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena and cast your vote today! 🙌

Our paper is published in FORGE 2025: https://conf.researchr.org/details/forge-2025/forge-2025-papers/6/SE-Arena-An-Interactive-Platform-for-Evaluating-Foundation-Models-in-Software-Engine
Check the details in https://arxiv.org/abs/2502.01860

We’d love your feedback and contributions! 🚀

mbkim

Feb 27

Can we align LLMs with personal preferences? It is hard to collect individual annotations sufficiently and train LLMs for each persona.... The answer is
Yes! 🏎️ Drift achieves personalized alignment only with 50~100 examples.

Drift Approximation: For efficient preference modeling, we first define various attributes and find the best composite of them to explain given examples.
Differential prompting: We don't need to construct attribute-dedicated datasets! We show differential prompting to evaluate each attribute in a zero-shot manner.
Drift Decoding: We can align LLM with the composite of attributes in a training-free manner! We don't need expensive LLM training and savings for each user.

We prove theoretically each objective of the approximation and decoding stages, and for all stages, there is no gradient computation in the total process.

Check the details here! https://arxiv.org/abs/2502.14289

imsanjoykb

Feb 27

🚀🚀DeepSQL-R1-distill-8B : A Quantized DeepSeek AI Model for SQL Code Generation
🔥 Outperforms Llama-3.2, Mistral-7B, and Claude-3 Sonnet in SQL generation tasks.
⚡ Superior execution accuracy and faster inference speeds for complex SQL queries.
🔍 Optimized for efficiency with quantization & distillation techniques.

Model Link: https://huggingface.co/imsanjoykb/deepSQL-R1-distill-8B
Code Link: https://github.com/imsanjoykb/deepSQL-R1-distill-8B
Paper : https://doi.org/10.6084/m9.figshare.28330301.v1
Inference: https://drive.google.com/file/d/145PP-oW50OMS1bYJaYuUphfufpsuOGWl/view?usp=sharing

luoweibetter

Mar 5

Hello AK and HF Team,
We would to add our recent paper "Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection" in HF daily papers. I am putting this request here, since I don't already have a paper in HF daily papers.

Paper: https://arxiv.org/pdf/2503.02424
Github:https://github.com/luow23/INP-Former
Thanks,
Wei Luo

davidberenstein1957

Mar 7

Hi,

I came across this paper for synthetic data generation. "TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data"

Paper: https://arxiv.org/html/2501.12012v1
GitHub: https://github.com/mostly-ai/mostlyai

Regards,
David

Deressa

Mar 10

Hello AK and HF Team,
We would like to add our paper "GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer"

paper: https://arxiv.org/abs/2307.07036
code: https://github.com/erprogs/GenConViT

Thank you!,
Deressa

Nathan9

Mar 11

doi:10.57967/hf/4791
🚀 ScrapeGoat Music Models: Now Available! 🎵
After our recent work on HCF-Aware Training to optimize memory efficiency in language models (especially beneficial for specialized models like ours), we're excited to announce that all training and weights for the ScrapeGoat Music project are now available in our Hugging Face repositories:

👉 https://lnkd.in/gVrZ56JS

👉 https://lnkd.in/g3wS3rS4

ALL training python codes and weights available in the repo on HF.

AQuarterMile

Mar 11

Dear AK and HF Team,

Exciting Update! We're thrilled to share WritingBench: A Comprehensive Framework for Evaluating Generative Writing

📃 [Paper] • 🚀 [Github Repo] • 📏 [Critic Model] • ✍️ [Writing Model]

💡WritingBench is a comprehensive benchmark for evaluating LLMs' writing capabilities across 1,239 real-world queries, spanning:

6 primary domains and 100 fine-grained subdomains
3 core writing requirements: Style / Format / Length
1,546 avg. tokens per query, integrating diverse sources of materials
Each query is paired with 5 instance-specific criteria

Regards,
Yuning

erjui

19 days ago

•

edited 19 days ago

Hello AK and HF Team,

We would to add our recent paper "Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization" in HF daily papers.
I'm putting request here because I don't have paper claimed in HF daily papers yet.

Paper: https://arxiv.org/pdf/2503.02424

Regards,
Seongjae

julien-c

Hugging Face org 15 days ago

@erjui kudos on great paper: https://huggingface.co/papers/2505.07675

m-serious

11 days ago

🤗 Model (Time-R1) | 📊 Dataset (Time-Bench) | 🚀 Code | 📖 Paper

Excited to share our new work: Time-R1 is a framework designed to endow Language Models (LLMs) with comprehensive temporal reasoning capabilities, enabling them to progressively cultivate sophisticated temporal logic from past events, predict future occurrences, and creatively generate plausible future scenarios. Our 3B parameter language model trained with a novel three-stage reinforcement learning curriculum and dynamic rewards, outperforming state-of-the-art models over 200 times its size, including DeepSeek-R1, on challenging future-oriented tasks.

qqqwt

11 days ago

Hello AK and HF Team,

We would to add our recent paper "MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem"

Mathematical modeling is more than reasoning – it requires open-ended analysis, abstraction, and principled formulation. This work introduces MM-Agent, a large language model (LLM)-powered agent framework designed to tackle real-world mathematical modeling tasks end-to-end.

Key highlights:

📊 Proposes MM-Bench: 111 curated problems from MCM/ICM (2000–2025), across physics, biology, economics, etc.

🧩 MM-Agent decomposes modeling into 4 expert-inspired stages:
Problem Analysis → Model Formulation → Problem Solving → Report Generation

🚀 Outperforms baselines by 11.88% over expert-written solutions using GPT-4o, while costing just $0.88 and 15 minutes per task.

🏆 Helped two undergrad teams win Finalist Award (top 2%) in MCM/ICM 2025.

📄 Paper: https://arxiv.org/abs/2505.14148

💻 Code: https://github.com/usail-hkust/LLM-MM-Agent

ionutmodo

11 days ago

Hi, everyone! We are happy to share with you our work SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models.

We focus on the low-rank compression of optimizer states and propose replacing the expensive SVD decomposition with a fixed orthogonal matrix that comes from the Discrete Consine Transformation (DCT).

In our work we couple the DCT matrix with a theoretically-justified approach to choose the most appropriate columns from the DCT matrix that minimize the reconstruction error for each gradient matrix G and obtain a dynamic projection matrix tailored to each gradient G.

Our numerical results show that DCT matrix not only recovers the performance of existing low-rank optimizers, but also reduces the running time by 20% and memory usage for large models, both for pretraining and finetuning.

📜 Paper: https://arxiv.org/pdf/2505.17967

🐍 Code: soon to appear in https://github.com/IST-DASLab/ISTA-DASLab-Optimizers via pip

xiaolinzi-ahu

3 days ago

•

edited 3 days ago

https://arxiv.org/abs/2505.23808
https://github.com/mulin-ahu/DenseLoRA