
kiran

kira


ki6an

AI & ML interests

AGI

Recent Activity

upvoted a collection 2 days ago

liked a model 2 days ago

mistralai/Mistral-Large-Instruct-2411

liked a model 3 days ago

Nexusflow/Athene-V2-Chat


kira's activity

upvoted a collection 2 days ago

xLAM models

xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 11 items • Updated 22 days ago • 43

upvoted a collection 9 days ago

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated 3 days ago • 223

upvoted a collection 19 days ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 17 days ago • 171

upvoted 2 collections 4 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9 • 9

Useful Pretrain-Datasets

pretrain-datasets with (maybe) good quality • 20 items • Updated Jun 12 • 1

upvoted a collection 6 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20 • 90

upvoted a collection 7 months ago

GPT-4 generated datasets

Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16 • 8

upvoted a paper 7 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 63

upvoted 4 papers 10 months ago

Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 21

Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13 • 15

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16 • 36

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13 • 25

upvoted a collection 11 months ago

Papers about model merging

referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13 • 14

upvoted 3 papers about 1 year ago

CogVLM: Visual Expert for Pretrained Language Models

Paper • 2311.03079 • Published Nov 6, 2023 • 23

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31

upvoted 3 papers over 1 year ago

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

Paper • 2307.02628 • Published Jul 5, 2023 • 10

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11

Extending Context Window of Large Language Models via Positional Interpolation

Paper • 2306.15595 • Published Jun 27, 2023 • 53