Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Organizations

MLX Community

Jaward's activity

posted an update 8 days ago
In Honour of This Year's NeurIPS Test of Time Paper Awardees
This year's NeurIPS Test of Time Paper Awards went to two groundbreaking papers:
1. Generative Adversarial Nets (Goodfellow et al.)
2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)
Let's explore how these papers helped pioneer breakthroughs in today's AI:

Full Article: https://huggingface.co/blog/Jaward/nip
posted an update 9 days ago
Lightweight implementation of the seminal paper “Sequence to Sequence Learning with Neural Networks”

Built, trained, and evaluated a 2-layer seq2seq LSTM-based model (~10M params) on the German-English corpus of the Multi30K dataset. In honor of Ilya Sutskever et al. for winning this year’s NeurIPS Test of Time Paper Award 🫡

Code: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb
posted an update 16 days ago
Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

As a young researcher, I've often pondered the limitations of backpropagation, especially when compared with how learning occurs in the human brain. While backpropagation has been the workhorse of deep learning, it isn't without flaws. In this post, I aim to share some thoughts on these shortcomings from first principles.

Full article: https://huggingface.co/blog/Jaward/rethinking-backpropagation
posted an update 18 days ago
An implementation of the compute-efficient DeepPCR algorithm, which parallelizes sequential operations and thereby speeds up inference and training of neural networks. DeepPCR can significantly reduce the time complexity of operations such as denoising in latent diffusion space from O(L) to O(log2 L).
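
To make the O(L) to O(log2 L) idea concrete, here is a minimal sketch. It is not DeepPCR and not the notebook's code: DeepPCR builds on Parallel Cyclic Reduction, while this example swaps in a simpler, related technique (recursive doubling on a toy linear recurrence x_i = a_i * x_{i-1} + b_i). It only shows how L dependent steps can be resolved in about log2(L) rounds, assuming the element-wise work inside each round runs in parallel. The function names and the toy recurrence are made up for illustration.

```python
import numpy as np


def solve_recurrence_log_steps(a, b, x_init):
    """Solve x_i = a[i] * x_{i-1} + b[i], with x_{-1} = x_init, by recursive doubling.

    Hypothetical helper for illustration only. Returns (x, rounds).
    """
    # Each position i holds an affine map (A[i], B[i]) from an earlier state to x_i.
    A = np.asarray(a, dtype=np.float64).copy()
    B = np.asarray(b, dtype=np.float64).copy()
    length = len(A)
    stride, rounds = 1, 0
    while stride < length:
        A_next, B_next = A.copy(), B.copy()
        # Compose each map with the map `stride` positions earlier.
        # All positions are updated independently, so one round is one parallel step.
        A_next[stride:] = A[stride:] * A[:-stride]
        B_next[stride:] = A[stride:] * B[:-stride] + B[stride:]
        A, B = A_next, B_next
        stride *= 2
        rounds += 1
    # Every map now starts from x_init, so all states come out at once.
    return A * x_init + B, rounds


def solve_recurrence_sequential(a, b, x_init):
    """Plain O(L) loop, used only as a reference."""
    xs, x = [], x_init
    for a_i, b_i in zip(a, b):
        x = a_i * x + b_i
        xs.append(x)
    return np.array(xs)


rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=8)
b = rng.normal(size=8)
x_par, rounds = solve_recurrence_log_steps(a, b, x_init=1.0)
x_seq = solve_recurrence_sequential(a, b, x_init=1.0)
print(rounds, np.allclose(x_par, x_seq))
```

For L = 8 the doubling loop runs 3 rounds (strides 1, 2, 4) instead of 8 dependent steps. The total arithmetic is higher than the sequential loop; the gain is in depth, which only pays off when the rounds really execute in parallel.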

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
posted an update 22 days ago
posted an update about 1 month ago
posted an update about 1 month ago
Interesting Work on Reasoning 🤔
- explores a new take on few-shot reasoning while challenging assumptions that program synthesis is necessary for abstract reasoning.
- shows test-time training + smart inference tricks can match human-average performance, though at high computational cost. Key insight: proper compute allocation matters more than method (whether symbolic or neural).

Paper: https://ekinakyurek.github.io/papers/ttt.pdf
posted an update about 1 month ago
It's work like this that in some way signals the eventual “dominance” of AI over all the sciences.

“We train our model on the six-dimensional N-body phase space, predicting particle velocities as the time derivative of the model’s displacement outputs”

The emulator is capable of predicting the nonlinear displacement and velocity fields for 128^3 particles in half a second on a single GPU🤯
posted an update about 2 months ago
Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops:)

Simplified pseudocode for the parallel cross-entropy loss computation:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop1 (find max logits): update row_max with max logits.
- for-loop2 (compute softmax and loss): compute row_sum, update loss.
- add log(row_sum) and store loss.
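
The steps above can be sketched in plain NumPy. This is not the Triton kernel from the notebook, only a CPU illustration of the same two-pass structure: the outer loop stands in for one Triton program per row, and `block_size` mimics loading the logits in chunks. The function name and block size are assumptions made for the example.

```python
import numpy as np


def cross_entropy_two_pass(logits, targets, block_size=4):
    """Per-row cross-entropy using a max pass followed by a sum pass."""
    n_rows, n_cols = logits.shape
    losses = np.empty(n_rows, dtype=np.float64)
    for pid in range(n_rows):  # in a kernel, each row would be its own program
        row = logits[pid]
        target = targets[pid]
        row_max = -np.inf
        row_sum = 0.0
        loss = 0.0
        # Loop 1: find the max logit of the row, block by block.
        for start in range(0, n_cols, block_size):
            block = row[start:start + block_size]
            row_max = max(row_max, block.max())
        # Loop 2: accumulate the shifted exponentials and pick up the target term.
        for start in range(0, n_cols, block_size):
            block = row[start:start + block_size]
            row_sum += np.exp(block - row_max).sum()
            if start <= target < start + block_size:
                loss = row_max - row[target]
        # Add log(row_sum) and store the loss.
        losses[pid] = loss + np.log(row_sum)
    return losses


rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10))
targets = np.array([1, 7, 9])
loss = cross_entropy_two_pass(logits, targets)

# Reference: -log_softmax(logits)[target], computed in one shot.
shifted = logits - logits.max(axis=1, keepdims=True)
reference = np.log(np.exp(shifted).sum(axis=1)) - shifted[np.arange(3), targets]
print(np.allclose(loss, reference))
```

Subtracting `row_max` before exponentiating is what makes the first loop necessary: it keeps `exp` from overflowing on large logits without changing the result.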

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb
posted an update 2 months ago
reacted to fdaudens's post with 🔥 2 months ago
The Nobel Prize background for Hopfield and Hinton's work on neural networks is pure gold. It's a masterclass in explaining AI basics.

Key takeaways from the conclusion:
- ML applications are expanding rapidly. We're still figuring out which will stick.
- Ethical discussions are crucial as the tech develops.
- Physics 🤝 AI: A two-way street of innovation.

Some mind-blowing AI applications in physics:
- Discovering the Higgs particle
- Cleaning up gravitational wave data
- Hunting exoplanets
- Predicting molecular structures
- Designing better solar cells

We're just scratching the surface. The interplay between AI and physics is reshaping both fields.

Bonus: The illustrations accompanying the background document are really neat. (Credit: Johan Jarnestad/The Royal Swedish Academy of Sciences)

#AI #MachineLearning #Physics #Ethics #Innovation
posted an update 2 months ago
reacted to clem's post with 👍 2 months ago
Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!
reacted to clem's post with 👍 2 months ago
Very few people realize that most of the successful AI startups became successful because they were focused on open science and open-source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral and of course Hugging Face!

The reasons are not just altruistic. Sharing your science and your models also pushes you to build AI faster (which is key in a fast-moving domain like AI), attracts the best scientists & engineers, and generates much more visibility, usage and community contributions than if you were 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!

More startups and companies should release research & open-source AI. It's not just good for the world but also increases their probability of success!
posted an update 2 months ago
New hobby: creating AI research paper art lol, using PyMuPDF to extract text and add a background, then animating with Runway :) Code coming soon…
posted an update 3 months ago
Triton-accelerated nanoGPT🤕
The WHY behind this ordeal: after practicing Triton for about 2 weeks, I challenged myself to implement custom Triton kernels for Karpathy's nanoGPT. Quite an ordeal it was, but I somehow got something working. It's not perfect but getting there :) Contributions are welcome.

Code: https://github.com/Jaykef/Triton-nanoGPT
posted an update 3 months ago
This is supercool!!
LLaVA-3D: adds 3D-awareness to LVMs without compromising 2D understanding capabilities.

Method: they developed a unified architecture that maps 2D CLIP patch features to their corresponding positions in 3D space, enabling joint 2D and 3D vision-language instruction tuning.

Project: https://zcmax.github.io/projects/LLaVA-3D/
posted an update 3 months ago
Some interesting findings in this paper:
- They consider o1 a Large Reasoning Model (LRM) with a different arch from SOTA LLMs.
- Creative justifications: “It is almost as if o1 has gone from hallucinating to gaslighting!”. This is so true; I also noticed it can “hallucinate” its chains of thought lol.
- Accuracy/Cost Tradeoffs: o1 provides high accuracy but at significant computational and monetary costs due to hidden "reasoning tokens."
Paper: https://www.arxiv.org/abs/2409.13373
posted an update 3 months ago
nanoGPT with Sigmoid Self-Attention
I couldn’t resist, had to give it a try :)

Some observations on M2:
SSA was ~5-10% faster in training with similar final loss values, slightly less coherent text generation, marginally higher perplexity, and lower memory usage compared to softmax.
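
For reference, the core difference between the two attention variants can be sketched as follows. This is a minimal NumPy illustration, not the notebook's code: single head, no batching, causal masking only. The `-log(n)` bias is an assumption (it follows what the sigmoid-attention paper suggests, but this post does not confirm it was used), and the function names are hypothetical.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Causal softmax attention: each row of weights is normalized to sum to 1."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Row-wise max and sum reductions are needed for a stable softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def sigmoid_attention(q, k, v):
    """Causal sigmoid attention: each weight is computed element-wise, no row normalization."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    bias = -np.log(n)  # assumed bias, keeps the total weight per row in a sane range
    weights = 1.0 / (1.0 + np.exp(-(scores + bias)))
    mask = np.tril(np.ones((n, n), dtype=bool))
    weights = np.where(mask, weights, 0.0)
    return weights @ v


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out_soft = softmax_attention(q, k, v)
out_sig = sigmoid_attention(q, k, v)
print(out_soft.shape, out_sig.shape)
```

The sigmoid version drops the per-row max and sum reductions, which is one plausible contributor to the speed difference observed above; the sketch itself makes no performance claim.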

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
replied to their post 3 months ago

I used to think this way, but as it turned out, these models don't just model probability distributions; they actually learn features between these distributions, and using these features during inference requires some "reasoning". Capable models (GPT-4, GPT-3, Claude 3) prior to OpenAI o1 could barely reason through tasks; o1 now utilizes RL to boost reasoning during inference. Scaling at inference has been a huge challenge, but somehow OAI figured it out with RL. Obviously we are at an early stage of this breakthrough; proof of reasoning will become clearer in subsequent versions of o1.

Geoffrey Hinton gave a talk on this topic: https://www.youtube.com/watch?v=N1TEjTeQeg0