Vadim Karpenko

jrell

AI & ML interests

None yet

Organizations

None yet

jrell's activity

liked a model 1 day ago

MrDragonFox/mOrpheus_3B-1Base_early_preview

Updated 1 day ago • 336 • 30

upvoted a paper about 2 months ago

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published Feb 18 • 11

reacted to schuler's post with 🔥 2 months ago

Post

📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.
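
Here is a rough illustration of where the savings come from. A grouped pointwise (1x1) convolution splits the channels into groups and gives each group its own smaller weight matrix. With `groups` groups, the layer holds 1/`groups` of the weights of a dense layer of the same width. The Python sketch below is a generic example of that technique, not the authors' code; their implementation is in the repository linked in the post. The function names and the 4096 to 16384 layer size are hypothetical, and the sketch does not by itself reproduce the reported 77% figure.

```python
from typing import List

Matrix = List[List[float]]


def dense_params(d_in: int, d_out: int) -> int:
    """Weights in an ordinary pointwise (fully connected) layer, ignoring bias."""
    return d_in * d_out


def grouped_pointwise_params(d_in: int, d_out: int, groups: int) -> int:
    """Weights when the channels are split into `groups` independent blocks."""
    if d_in % groups or d_out % groups:
        raise ValueError("d_in and d_out must be divisible by groups")
    return groups * (d_in // groups) * (d_out // groups)


def grouped_pointwise(x: List[float], weights: List[Matrix]) -> List[float]:
    """Grouped pointwise (1x1) convolution applied to one token's feature vector.

    weights[g] has shape (d_out / groups, d_in / groups), and group g only
    sees its own contiguous slice of the input channels.
    """
    chunk = len(x) // len(weights)
    out: List[float] = []
    for g, w in enumerate(weights):
        x_g = x[g * chunk:(g + 1) * chunk]
        out.extend(sum(wi * xi for wi, xi in zip(row, x_g)) for row in w)
    return out


# Hypothetical sizes: a 4096 -> 16384 projection, dense vs. split into 4 groups.
print(dense_params(4096, 16384))                 # 67108864
print(grouped_pointwise_params(4096, 16384, 4))  # 16777216

# Tiny example with 2 groups of 2 channels each.
print(grouped_pointwise([1.0, 2.0, 3.0, 4.0],
                        [[[1.0, 0.0], [0.0, 1.0]],
                         [[2.0, 0.0], [0.0, 2.0]]]))  # [1.0, 2.0, 6.0, 8.0]
```

Each group only sees its own slice of the channels, so grouped layers are usually combined with some way of mixing information across groups. The report's handling of this is described in the paper, not in this sketch.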

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm

2 replies

upvoted a collection 4 months ago

Dolphin 3.0

Collection

Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model. • 9 items • Updated Feb 7 • 139

liked a model 5 months ago

PramaLLC/BEN

Image Segmentation • Updated Jan 26 • 79 • 86

New activity in ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2 6 months ago

LM Studio produces gibberish (GGUF)

#1 opened 6 months ago by jrell

liked a model 7 months ago

mistralai/Mistral-Small-Instruct-2409

Updated Oct 16, 2024 • 5.26k • 383

New activity in QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF 9 months ago

llama.cpp error: 'done_getting_tensors: wrong number of tensors; expected 292, got 291'

#1 opened 9 months ago by jrell

New activity in cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf 9 months ago

"llama.cpp error: 'error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b''"

#1 opened 9 months ago by jrell

reacted to singhsidhukuldeep's post with ❤️ 11 months ago

Post

You are all happy 😊 that @meta-llama released Llama 3.

Then you are sad 😔 that it only has a context length of 8K.

Then you are happy 😄 that you can scale Llama-3 (PoSE) to 96K without training, just by modifying max_position_embeddings and rope_theta.
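
The appeal of this trick is that only two fields in the model configuration change. This is a minimal Python sketch that assumes a plain dict mirroring a Hugging Face `config.json`. The base values are those of Llama-3-8B. `extend_context`, the 96 * 1024 target, and the 4x `theta_scale` are hypothetical choices for illustration, not the settings of any released PoSE model. The sketch also shows roughly why `rope_theta` matters: raising it slows the slowest-rotating RoPE frequency, so one full rotation spans many more positions.

```python
import math
from typing import Dict, List


def rope_inv_freq(head_dim: int, rope_theta: float) -> List[float]:
    """Standard RoPE inverse frequencies: rope_theta ** (-2i / head_dim)."""
    return [rope_theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, rope_theta: float) -> float:
    """Positions needed for the slowest-rotating RoPE pair to complete one full turn."""
    return 2 * math.pi / min(rope_inv_freq(head_dim, rope_theta))


def extend_context(config: Dict, new_max_positions: int, theta_scale: float) -> Dict:
    """Copy of a config.json-style dict with only the two RoPE-related fields changed."""
    extended = dict(config)
    extended["max_position_embeddings"] = new_max_positions
    extended["rope_theta"] = config["rope_theta"] * theta_scale
    return extended


# Base values as in Llama-3-8B's config.json; theta_scale=4.0 is a hypothetical choice.
base = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "max_position_embeddings": 8192,
    "rope_theta": 500000.0,
}
extended = extend_context(base, new_max_positions=96 * 1024, theta_scale=4.0)
head_dim = base["hidden_size"] // base["num_attention_heads"]  # 128

print(extended["max_position_embeddings"], extended["rope_theta"])  # 98304 2000000.0
print(longest_wavelength(head_dim, base["rope_theta"]))
print(longest_wavelength(head_dim, extended["rope_theta"]))
```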

But then you are sad 😢 that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization (doing QA and summarization).

But then you are happy 😁 that the @GradientsTechnologies community has released the long-context Llama-3-8B-Instruct-262K (context lengths from 262K to 1M+).

Now we have another paper "Extending Llama-3's Context Ten-Fold Overnight" 📜.

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning ⚙️.
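
QLoRA fine-tunes a model by training small low-rank adapter matrices while the base weights stay frozen and 4-bit quantized. The pure-Python sketch below covers only the LoRA part and leaves out quantization. The function names and the adapter rank of 8 are hypothetical and are not the paper's settings.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float) -> List[float]:
    """y = W x + (alpha / r) * B (A x), where W is frozen and only A and B are trained.

    A has shape (r, d_in) and B has shape (d_out, r). In QLoRA, W would also be
    stored in 4-bit precision; that quantization step is not modelled here.
    """
    r = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + (alpha / r) * d for y, d in zip(base, delta)]


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable weights for one adapted matrix: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# A 4096 x 4096 projection with a hypothetical adapter rank of 8.
print(lora_trainable_params(4096, 4096, 8), 4096 * 4096)  # 65536 16777216

# With B initialized to zero (the usual LoRA start), the output equals W x.
w = [[1.0, 0.0], [0.0, 1.0]]
print(lora_forward(w, a=[[1.0, 1.0]], b=[[0.0], [0.0]], x=[2.0, 3.0], alpha=16.0))  # [2.0, 3.0]
```

Only A and B receive gradients. For this one matrix, the trainable weights drop from about 16.8M to about 65.5K.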

The training cycle is highly efficient, taking "only" 😂 8 hours on a single 8xA800 (80GB) GPU machine.

The model also preserves its original capability over short contexts.

The dramatic context extension is attributed mainly to just 3.5K synthetic training samples generated by GPT-4. 📊

The paper suggests that the context length could be extended far beyond 80K with more computational resources (😅 GPU-poor).

The team plans to publicly release all resources, including data, model, data generation pipeline, and training code, to facilitate future research from the ❤️ community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time... 🌟

Extending Llama-3's Context Ten-Fold Overnight (2404.19553)