Michael Barry's picture

Michael Barry

MichaelBarryUK

·

AI & ML interests

None yet

Organizations

None yet

MichaelBarryUK's activity

upvoted a paper 3 days ago

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published 4 days ago • 16

upvoted a paper 5 days ago

Yuan 2.0-M32: Mixture of Experts with Attention Router

Paper • 2405.17976 • Published 6 days ago • 16

upvoted a paper 6 days ago

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published 7 days ago • 47

upvoted a paper 17 days ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 18 days ago • 96

upvoted 25 papers about 1 month ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 103

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Paper • 2405.01481 • Published May 2 • 20

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2 • 21

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published May 2 • 57

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 115

Extending Llama-3's Context Ten-Fold Overnight

Paper • 2404.19553 • Published Apr 30 • 30

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 97

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 43

Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 115

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 64

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 49

Tele-FLM Technical Report

Paper • 2404.16645 • Published Apr 25 • 17

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 56

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 122

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 28

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

A Multimodal Automated Interpretability Agent

Paper • 2404.14394 • Published Apr 22 • 19

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 239

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 37

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 37

Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 40

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 51

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Paper • 2404.11912 • Published Apr 18 • 16

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Paper • 2404.12195 • Published Apr 18 • 11

upvoted 20 papers about 2 months ago

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15 • 27

Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15 • 80

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 27

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12 • 32

WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents

Paper • 2404.05902 • Published Apr 8 • 20

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 41

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80

CodecLM: Aligning Language Models with Tailored Synthetic Data

Paper • 2404.05875 • Published Apr 8 • 15

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9 • 18

Hash3D: Training-free Acceleration for 3D Generation

Paper • 2404.06091 • Published Apr 9 • 12

OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9 • 73

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Paper • 2404.04421 • Published Apr 5 • 14

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 58

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4 • 22

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 21

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 28

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 74

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 102

upvoted 11 papers 2 months ago

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 53

DiJiang: Efficient Large Language Models through Compact Kernelization

Paper • 2403.19928 • Published Mar 29 • 9

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 99

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 40

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 75

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 31

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20 • 72

Reverse Training to Nurse the Reversal Curse

Paper • 2403.13799 • Published Mar 20 • 12

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 16

Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 45

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 57