Antonio Linares's picture

3 2

Antonio Linares

fivetech

·

http://www.fivetechsoft.com

AI & ML interests

AI R+D

Recent Activity

reacted to m-ric's post with 🚀 about 2 months ago

STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨 A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year. ➡️ Tiny Recursive Model is 7M parameters ➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger and had 1,000 as many authors 😂 (Alexia is alone on the paper) What's this sorcery? In short: it's a very tiny Transformers, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning. @AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, that already showed breakthrough improvement on AGI for its small size (27M) Hierarchical Reasoning Model had introduced one main feature: 🔎 Deep supervision In their model, one part (here one layer) would run at high frequency, and another would be lower frequency, running only every n steps. They had used a recurrent architecture, where these layers would repeat many times ; but to make it work they had to do many approximations, including not fully backpropagating the loss through all layers. Alexia studied what was useful and what wasn't, and cleaned the architecture as follows : Why use a recurrent architecture, when you can just make it a loop? ➡️ She made the network recursive, looping over itself Why use 2 latent variables ? ➡️ She provides a crystal clear explanation : the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer. ➡️ She runs ablation studies to validate that 2 is indeed optimal. This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do. This might be the breakthrough we've been awaiting for so long!

updated a dataset 11 months ago

fivetech/forums_database

updated a dataset 12 months ago

fivetech/notebooklm

View all activity

Organizations

None yet

models 8

fivetech/gemma-1.1-7b-it-Q2_K-GGUF

9B • Updated Apr 6, 2024 • 15

fivetech/tinyLlama

1B • Updated Apr 5, 2024 • 11

fivetech/tinyMedical

Text Generation • 1B • Updated Jan 25, 2024 • 3

fivetech/test

Updated Jan 25, 2024

fivetech/tinyMedical-GGUF

1B • Updated Jan 25, 2024 • 15

fivetech/forums

Text Generation • 3B • Updated Jan 23, 2024 • 4

fivetech/codephi-2.7b

Text Generation • 3B • Updated Jan 23, 2024 • 7

fivetech/test1

Updated Jan 16, 2024 • 3

datasets 14

fivetech/forums_database

Updated Jan 10 • 31

fivetech/notebooklm

Viewer • Updated Dec 11, 2024 • 1 • 24 • 1

fivetech/galaxy

Viewer • Updated Nov 21, 2024 • 2 • 42

fivetech/CTMU

Viewer • Updated Nov 18, 2024 • 1 • 17

fivetech/Harbour_hvm

Viewer • Updated Oct 26, 2024 • 2 • 26

fivetech/webinar_sept2024

Updated Sep 10, 2024 • 13

fivetech/tao

Updated Feb 12, 2024 • 14

fivetech/matus

Updated Feb 12, 2024 • 40

fivetech/astrology

Updated Feb 2, 2024 • 21 • 7

fivetech/forums_json

Viewer • Updated Jan 25, 2024 • 1 • 19

View 14 datasets