[FEEDBACK] Daily Papers

by kramp HF staff - opened

Note that this post is not for adding new papers; it is for feedback on the Daily Papers community update feature.

How to submit a paper to the Daily Papers, like @akhaliq (AK)?

  • Submitting is available to paper authors
  • Only recent papers (less than 7 days old) can be featured on the Daily

Then drop the arXiv ID in the form at https://huggingface.co/papers/submit

  • Add media to the paper (images, videos) when relevant
  • You can start the discussion to engage with the community
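
Since the form takes an arXiv ID rather than a link, here is a minimal sketch of pulling the ID out of the kinds of URLs posted in this thread. The function name `extract_arxiv_id` and the choice to strip the version suffix are assumptions for illustration; the thread does not say whether the form accepts a suffix such as `v1`.

```python
import re
from typing import Optional

# New-style arXiv IDs look like 2405.20797, optionally followed by a version (v1, v2, ...).
# Only arxiv.org links are matched, since other hosts are not supported for now.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf|html)/(\d{4}\.\d{4,5})(?:v\d+)?")


def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the bare arXiv ID found in `url`, or None if it is not an arXiv link."""
    match = ARXIV_URL.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for link in (
        "https://arxiv.org/abs/2405.20797",
        "https://arxiv.org/pdf/2406.13236",
        "https://arxiv.org/html/2406.18139v1",
        "https://cientgu.github.io/files/VisualSignalDecomposition.pdf",
    ):
        print(link, "->", extract_arxiv_id(link))
```

A link that is not hosted on arxiv.org yields `None`, which matches the arXiv-only restriction mentioned in the replies.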

We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".

Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset

@Yiwen-ntu for now we support only videos as paper covers in the Daily.

We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644

Consistency-diversity realism Pareto fronts of conditional image generative models -- http://arxiv.org/abs/2406.10429

"Data Contamination Can Cross Language Barriers" -- https://arxiv.org/pdf/2406.13236

How do I add papers that are on Nature rather than arXiv?

Sharing our latest paper: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation (https://arxiv.org/abs/2406.05365)

Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas https://arxiv.org/abs/2404.13944

🎉 We are thrilled to announce the publication of my first research paper on model merging, Della-Merging. Della employs a magnitude-based sampling approach to eliminate redundant delta parameters, reducing interference when merging homologous models (those fine-tuned from the same backbone).

Paper: https://arxiv.org/abs/2406.11617
Github: https://github.com/declare-lab/della

Della outperforms existing homologous model merging techniques such as DARE and TIES. Across three expert models (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), Della achieves an improvement of 3.6 points over TIES and 1.2 points over DARE.

How do I add papers that are on Nature rather than arXiv?

@diwank we support only arXiv for now

LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks.

Paper: https://arxiv.org/abs/2406.08035
Github: https://github.com/THUDM/LVBench

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Paper: https://arxiv.org/abs/2406.05872
Code: https://github.com/IBM/starling-agent

SIT: Fine-tuning Large Language Models with Sequential Instructions

Paper: https://arxiv.org/pdf/2403.07794
Data and model: https://seqit.github.io
Code: https://github.com/hanxuhu/SeqIns

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

paper: https://arxiv.org/pdf/2406.17419
code: https://github.com/MozerWang/Loong

TRAIT: Task Oriented In-Domain Data Augmentation (for Continual Pre-training of LLMs), https://arxiv.org/abs/2406.16694

Slot State Space Models

paper: https://arxiv.org/abs/2406.12272

We are excited to share our recent work: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training. The design of Adam-mini is inspired by certain Hessian structures we observed on Transformers. Code available at: https://github.com/zyushun/Adam-mini

We have developed a new text-to-video generation benchmark for metamorphic evaluation. We specifically design four major categories of time-lapse videos (biological, human-created, meteorological, and physical) and extend these to 75 subcategories.
paper: https://arxiv.org/abs/2406.18522
leaderboard: https://huggingface.co/spaces/BestWishYsh/ChronoMagic-Bench
code: https://github.com/PKU-YuanGroup/ChronoMagic-Bench

KV cache optimization for LLMs and MLLMs:

  1. LLMs: D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models, arXiv: https://arxiv.org/abs/2406.13035
  2. MLLMs: LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference, arXiv: https://arxiv.org/html/2406.18139v1

We developed a generic schematic for the optimization loop to reduce the memory footprint of second-order, full-matrix adaptive optimizers. 💾

Our target optimizers are the ones that store a window of past gradients, such as M-FAC and GGT, which usually require storing around 500 to 1000 gradients (equivalent to this many model copies in the GPU memory).

Our technique uses sparse/low-rank gradients and Error Feedback and shows we can reduce the memory footprint of the optimizer state by 30x for GGT and 45x to 60x for M-FAC. 📉

Why is this important? 🤔
In the case of M-FAC, which is an approximation of Natural Gradient (NG) (the best-known optimizer of this kind is K-FAC), our work allows using approximations of NG at larger scale, such as ResNet-18 / ImageNet and BERT-Base finetuning.

Please experiment with this NG approximation and let us know about your findings!

📄 Our arXiv paper: https://arxiv.org/pdf/2306.06098
💻 Our code on GitHub: https://github.com/IST-DASLab/EFCP

was waiting for papers to get verified on my account so i could submit but then the 7 day window closed :( any chance we can still submit ours?

Hi AK and HF team,

I would appreciate your considering my recent arXiv paper "Model Callers for Transforming Predictive and Generative AI Applications" for inclusion in the HF daily papers. I could not submit directly to your site, since I don't already have a paper in HF DPs.
Paper: https://arxiv.org/abs/2406.15377
Github code: https://github.com/mukdal/modelcaller
Python library: pip install modelcaller

Abstract: We introduce a novel software abstraction termed "model caller," acting as an intermediary for AI and ML model calling, advocating its transformative utility beyond existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable for both creators and users of models within both predictive and generative AI applications. Additionally, we have developed and released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.

Mukesh Dalal

Hello AK and HF Team,
We would like to add our recent paper "Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification" to HF daily papers. I am putting this request here, since I don't already have a paper in HF daily papers.

Paper: https://arxiv.org/pdf/2407.02352
Authors: Pritish Sahu, Karan Sikka, Ajay Divakaran

Pritish Sahu

Hello AK and HF Team,

We would like to share our 2022 paper, recently published in Automation in Construction (Elsevier), "Vitruvio: Conditional variational autoencoder to generate building meshes via single perspective sketches"

📄 Paper: https://www.sciencedirect.com/science/article/pii/S0926580524002346?dgcid=author (50 days of free access).
📄 Our arXiv paper: https://arxiv.org/abs/2210.13634

We demonstrated the critical importance of considering building orientation in reconstruction projects. Additionally, we have provided a comprehensive baseline and dataset specifically for building reconstruction. Help us spread the word within the AEC industry to raise awareness about these advancements. Watch our video presenting the problem and our findings: VIDEO .

Code: https://github.com/CDInstitute/Vitruvio

Feel free to use this message on your social media, blog, or any platform where you wish to share your research and video.

Alberto Tono, Heyaojing Huang, Ashwin Agrawal, and Martin Fischer

@kramp @akhaliq
Dear Team,
Thanks for the nice work. But our paper, released just today, was marked as "older than 7 days".
Could you please kindly check?
paper: https://arxiv.org/abs/2407.06182 code: https://github.com/Adamdad/vico and project page https://adamdad.github.io/vico/

@kramp @akhaliq
We would like to share our recent paper "Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model" in HF daily papers. We find that even leading multimodal models such as GPT-4o or Claude-3.5-Sonnet have difficulty recognizing simple abstract images, e.g., reading time from a clock, understanding a flowchart, or planning a route using a road map. Therefore, we design a multimodal self-instruct approach to synthesize an abstract image benchmark and training set, and fine-tune (SFT) a model to improve its understanding of abstract images.

If you are interested in our paper, we would appreciate your consideration!

Paper: https://arxiv.org/abs/2407.07053
Code: https://github.com/zwq2018/Multi-modal-Self-instruct
Dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct
Leaderboard: https://multi-modal-self-instruct.github.io/


Wenqi Zhang

🚀 Launching ORLM: the first open-source Operations Research LLM, powered by our OR-Instruct process! 🛠️

🏆 ORLMs achieve SOTA on the NL4OPT, MAMO, and new IndustryOR benchmarks with different 7B backbones!

📄 Paper: https://arxiv.org/pdf/2405.17743
💻 Code: https://github.com/Cardinal-Operations/ORLM

How can I submit a paper that has not been submitted to arXiv?
paper: https://cientgu.github.io/files/VisualSignalDecomposition.pdf

How can I submit a paper that has not been submitted to arXiv?

@cientgu we support only papers from arxiv.org

Hello AK and HF Team @kramp @akhaliq :
We would like to share our recent paper "OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion". We propose OV-DINO, a novel unified open-vocabulary detection approach that offers superior performance and effectiveness for practical real-world applications. It entails a Unified Data Integration pipeline that integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module to improve the vision-language understanding of the model. It shows significant performance improvements on the COCO and LVIS benchmarks compared to previous methods, achieving relative improvements of +2.5% AP on COCO and +13.6% AP on LVIS compared to G-DINO in zero-shot evaluation.

If you are interested in our paper, we would greatly appreciate it!

Paper: https://arxiv.org/abs/2407.07844
Code: https://github.com/wanghao9610/OV-DINO

Hao Wang

Dear Team,

I would like to share our paper, recently accepted by ECAI. In this paper, we introduce FlowLearn, a novel dataset designed to test the capabilities of Large Vision-Language Models (LVLMs) in understanding and interpreting flowcharts. To the best of our knowledge, this is the first release of a dataset specifically tailored for flowchart comprehension and includes a comprehensive evaluation of LVLMs. Our research reveals significant challenges these models face, such as recognizing textual and visual components and their relationships within flowcharts. Through comprehensive experiments, we assess various state-of-the-art LVLMs, highlighting the gaps and providing insights for future advancements in machine comprehension of graphical data.

Title: FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Paper: https://arxiv.org/abs/2407.05183v1
Dataset: https://huggingface.co/datasets/jopan/FlowLearn
Code: https://github.com/Jo-Pan/FlowLearn

Dear Team,

We propose a novel approach to modulate LLM behaviors through direct parameter editing, offering an alternative to traditional alignment methods. Our approach achieves up to 90% detoxification at inference-level computational cost!

Paper: https://arxiv.org/abs/2407.08770
Code: https://github.com/lucywang720/model-surgery/

@akhaliq @kramp Hi AK and HF team,

We would like to share our recent paper, "AUITestAgent: Natural Language-Driven GUI Functional Bug Tester." We propose AUITestAgent, the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. Experiments on customized benchmarks demonstrate that AUITestAgent outperforms existing tools in the quality of generated GUI interactions and achieves a verification accuracy of 94%. In addition, field deployment at Meituan has shown AUITestAgent's practical usability.

Paper: https://arxiv.org/abs/2407.09018
Web: https://github.com/bz-lab/AUITestAgent

Yongxiang Hu

Hi everyone,

we'd like to share our objective evaluation of recent TTS systems.
Since many new TTS systems have been released, with a wide range of approaches, we think evaluating them is very important.
It would also be beneficial to have access to objective evaluation, since that way, models can be evaluated during training, and evaluating new models is easier.
We found that objective evaluation correlates well with human ratings when done across a wide set of factors such as Prosody, Speaker, Environment, etc.
The TTS Arena recently released by huggingface and @mrfakename was a great human evaluation to compare against, and our methods showed a high correlation with their scores. We also showed strong correlation with MOS scores from the Blizzard 2008 challenge and the Back to the Future Blizzard Paper from 2022.

Paper: https://arxiv.org/abs/2407.12707
Web: https://ttsdsbenchmark.com
HF Space: https://huggingface.co/spaces/ttsds/benchmark
Code: https://github.com/ttsds/ttsds

It would be amazing to have our benchmark featured in the Daily Papers!



Hello @akhaliq @kramp

We would like to present ✨AurA✨, a zero-overhead inference-time intervention upon LLM activations to mitigate harmful behaviors, such as toxicity. We will present it next week at ICML 2024.

In the figure below we show the toxicity reduction between the original model (circles) and using our AurA intervention (stars), for different LLMs. PPL stands for Perplexity and RTP refers to the Real Toxicity Prompts dataset. We also show AurA can reduce toxicity even in the presence of adversarial prompts.

It would be great if you could feature it in the Daily Papers!

🔗 Code: https://github.com/apple/ml-aura/tree/main



@akhaliq @kramp Hi AK and HF team,

We are excited to present our ECCV 2024 paper, "ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation". This innovative approach is the first to explore score distillation in training text-to-3D generation models, marking a shift from optimization-based to learning-based generation. Unlike existing data-driven methods, our approach enables the training of a high-quality text-to-3D generator in an unsupervised manner. This method opens up numerous potential applications, and we will continue to explore its possibilities.

Figure (Teaser.gif). First row: ASD application with optimization-based text-to-3D. Second row: ASD applied to learning-based text-to-3D, enabling the training of text-to-3D generation models without real 3D data. ASD can extend the training corpus to up to 100,000 text prompts.

Paper: https://arxiv.org/pdf/2407.02040
Code: https://github.com/theEricMa/ScaleDreamer

Hi all,

We have been working on a pretty cool AI agent framework.

AutoGRAMS is a software 2.5 framework that makes it easy to build very complex AI agents using spreadsheets, Python code, or the AutoGRAMS scripting language.


Paper: https://arxiv.org/abs/2407.10049
Code: https://github.com/autograms/autograms



Happy to receive any feedback.

Our paper investigates various foundation model (FM) leaderboards across multiple platforms, focusing on their types, operational workflows ("LBOps"), and issues ("leaderboard smells"). We also curate an awesome list of FM leaderboards, check here.

Hi all, wanted to share our recent work on context-conditioned reward modeling: https://arxiv.org/abs/2407.14916 (accompanying context-conditioned preference dataset to follow shortly).

We construct a Reasonable Preference Reversal (RPR) dataset (to follow) and use it to finetune a 7B parameter reward model that outperforms Llama3-70B (as well as other reward models) on context-conditioned preference queries.

Our work could be used to improve performance when conditioning reward models on principles (e.g., for constitutional AI) or user profiles (e.g. for pluralistic alignment), or other contexts. The goal is to reduce ambiguity in preference queries, and work toward improving human preference modeling.

@akhaliq @kramp Hi AK and HF team

We propose a novel method, called DreamCar, to reconstruct 3D real cars in moving-forward scenes. You may be interested in our paper: https://arxiv.org/pdf/2407.16988
Here is our project page: https://xiaobiaodu.github.io/dreamcar-project/


@akhaliq @kramp Hi AK and HF team

We propose a novel dataset, called 3DRealCar, containing 2500 large-scale 3D real cars for various tasks. You may be interested in our paper: https://arxiv.org/abs/2406.04875

Here is our project page: https://xiaobiaodu.github.io/3drealcar/index.html

