Sentdex 
posted an update May 1, 2024
Okay, first pass over KAN: Kolmogorov–Arnold Networks, it looks very interesting!

Interpretability of KAN model:
Interpretability is mostly treated as a safety issue these days, but it can also serve as a form of interaction between the user and a model, as the paper argues, and I think they make a valid point. With an MLP, we only interact with the outputs, but KAN is an entirely different paradigm and I find it compelling.
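To make the "different paradigm" concrete, here is a rough sketch of a single KAN-style layer: instead of scalar weights followed by a fixed activation, every edge carries its own learnable 1-D function, and each output node just sums its incoming edges. This is not the pykan API. `TinyKANLayer`, `hat_basis`, and `edge_function` are made-up names, and piecewise-linear "hat" functions stand in for the B-splines the paper uses, purely to keep the sketch short.

```python
import numpy as np

def hat_basis(x, grid):
    """Piecewise-linear 'hat' basis on a uniform grid; shape (len(x), len(grid)).
    Assumption: a simple stand-in for the B-spline basis used in the paper."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, None)

class TinyKANLayer:
    """Hypothetical minimal layer: one learnable 1-D function per (input, output) edge."""

    def __init__(self, n_in, n_out, grid_size=8, lo=-1.0, hi=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(lo, hi, grid_size)
        # coef[j, i, g]: coefficient g of the function on the edge input i -> output j
        self.coef = rng.normal(scale=0.1, size=(n_out, n_in, grid_size))

    def edge_function(self, i, j, x):
        """Evaluate the 1-D function on edge i -> j; this is what you would plot and inspect."""
        return hat_basis(x, self.grid) @ self.coef[j, i]

    def forward(self, X):
        """X has shape (batch, n_in); each output is a sum of per-edge functions."""
        out = np.zeros((X.shape[0], self.coef.shape[0]))
        for i in range(X.shape[1]):
            B = hat_basis(X[:, i], self.grid)    # (batch, grid_size)
            out += B @ self.coef[:, i, :].T      # (batch, n_out)
        return out
```

Because every edge is just a curve over one variable, you can plot `edge_function(i, j, x)` for any edge and look at what it learned, which is roughly the kind of interaction the paper has in mind.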

Scalability:
In the paper's experiments, KAN shows better parameter efficiency than MLP. This likely also translates to needing less data. With the frontier LLMs, we're already at the point where all the data available from the internet is used and more is generated synthetically, so we kind of need something better.
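For a feel of what "parameter efficiency" means here, a back-of-the-envelope count helps. As I read the paper, each KAN edge costs roughly G + k spline coefficients (G grid intervals, spline order k), so a KAN layer is more expensive than an MLP layer of the same width; the efficiency claim rests on KANs getting away with much smaller widths. The helper names and the two network shapes below are made up for illustration. They are not results from the paper and say nothing about accuracy.

```python
def kan_param_count(widths, grid_intervals=5, spline_order=3):
    """Approximate count: (G + k) coefficients per edge.
    Assumption: ignores any extra per-edge scale or bias terms an implementation may add."""
    per_edge = grid_intervals + spline_order
    return sum(n_in * n_out * per_edge for n_in, n_out in zip(widths, widths[1:]))

def mlp_param_count(widths):
    """Weights plus biases of a plain fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

# Hypothetical shapes, chosen only to show the arithmetic
kan_n = kan_param_count([2, 5, 1])         # (2*5 + 5*1) * (5 + 3) = 120
mlp_n = mlp_param_count([2, 100, 100, 1])  # 300 + 10100 + 101 = 10501
print(kan_n, mlp_n)
```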

Continual learning:
KAN can handle new input information without catastrophic forgetting, which helps keep a model up to date without relying on some database or retraining.
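My understanding is that the paper attributes this to the locality of splines: each basis function is non-zero only on a small interval, so training on data from one region leaves coefficients elsewhere untouched. Here is a toy 1-D sketch of that idea, again with piecewise-linear hat functions standing in for B-splines. The `fit` helper and the two target functions are made up for illustration, and this is a single learnable function, not a full KAN.

```python
import numpy as np

def hat_basis(x, grid):
    """Local piecewise-linear basis (assumption: stand-in for B-splines)."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, None)

def fit(coef, x, y, grid, lr=0.5, steps=500):
    """Plain gradient descent on mean squared error, continuing from `coef`."""
    B = hat_basis(x, grid)
    for _ in range(steps):
        coef = coef - lr * B.T @ (B @ coef - y) / len(x)
    return coef

grid = np.linspace(-1.0, 1.0, 9)
coef = np.zeros(len(grid))

# Task A lives on [-1, -0.5]; task B arrives later and lives on [0.5, 1]
x_a = np.linspace(-1.0, -0.5, 50)
y_a = np.sin(3 * x_a)
x_b = np.linspace(0.5, 1.0, 50)
y_b = np.cos(3 * x_b)

coef = fit(coef, x_a, y_a, grid)
pred_a_before = hat_basis(x_a, grid) @ coef

coef = fit(coef, x_b, y_b, grid)            # only task B data from here on
pred_a_after = hat_basis(x_a, grid) @ coef

# Basis functions active on task A get zero gradient from task B data
print(np.allclose(pred_a_before, pred_a_after))
```

An MLP has no such guarantee, since every weight can affect every input region. Whether this property survives in deep KANs on realistic data is a separate question that this toy does not answer.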

Sequential data:
This is probably what most people are curious about right now. KANs have not yet been shown to work with sequential data, and it's unclear what the best approach might be to make them work well, both in training and with respect to interpretability. That said, there's a long, rich history of handling sequential data in a variety of ways, so I don't think getting the ball rolling here would be too challenging.

Mostly, I just love a new paradigm and I want to see more!

KAN: Kolmogorov-Arnold Networks (2404.19756)

Thanks for the summary, @Sentdex !
For the curious, there are some examples on their GitHub repo https://github.com/KindXiaoming/pykan

I'm concerned about the low training speed (10x slower). Do we know anything about the inference latency as well? I think that's key to figure out whether this is viable or not.

I am sure more work on inference will be done. It looks pretty exciting, and could possibly reduce model sizes quite a bit!

This year several revolutionary ideas have emerged together. Another one is this paper: https://arxiv.org/abs/2404.05903

I'm pinning a lot of hope on these networks as the next step up. I think the ability to learn easily, on small amounts of data and without catastrophic forgetting, is a massive plus. And so is the interpretability.

Hopefully, through clever research, the speed of training is upped as well, in the near future. Although, given a lower data ingestion need, is speed of training as big an issue?