Fused Ion

fusedion

AI & ML interests

None yet

Recent Activity

liked a dataset about 2 months ago

ibm-granite/GneissWeb

liked a model about 2 months ago

DevQuasar/NovaSky-AI.Sky-T1-32B-Flash-GGUF

reacted to schuler's post with 👍 about 2 months ago

📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance. The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm

Organizations

None yet

fusedion's activity

upvoted a paper 2 months ago

ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer

Paper • 2501.15570 • Published Jan 26 • 24

upvoted 2 papers 6 months ago

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Paper • 2410.10812 • Published Oct 14, 2024 • 17

Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 150

upvoted a paper 9 months ago

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Paper • 2407.18248 • Published Jul 25, 2024 • 34