Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Articles

In Honour of This Year's NeurIPs Test of Time Paper Awardees

8 days ago

• 1

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

16 days ago

• 5

Journey With Me Into The Mind of Large Language Models: Interesting Findings in AnthropicAI's Scaling Monosemanticity paper.

May 22

• 2

On Coding Your First Attention

Apr 21

• 7

Jaward's activity

liked a dataset 3 days ago

HuggingFaceFW/fineweb-2

Viewer • Updated 10 days ago • 13.8B • 59.1k • 317

posted an update 8 days ago

Post

556

In Honour of This Year's NeurIPs Test of Time Paper Awardees
This year's NeurIPS Test of Time Paper Awards went to two groundbreaking papers:
1. Generative Adversarial Nets (Goodfellow et al.)
2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)
Let's explore how these papers helped pioneer breakthroughs in today's AI:

Full Article: https://huggingface.co/blog/Jaward/nip

published an article 8 days ago

Article

In Honour of This Year's NeurIPs Test of Time Paper Awardees

8 days ago

• 1

posted an update 9 days ago

Post

614

Lightweight implementation of the seminal paper “Sequence to Sequence Learning with Neural Networks”

Built, trained, and evaluated a 2-layer seq2seq LSTM-based model (~10M params) on the German-English corpus of the Multi30K dataset, in honor of Ilya Sutskever et al. for winning this year's NeurIPS Test of Time Paper Award 🫡

Code: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb
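For a rough sense of where a parameter count like "~10M" comes from, here is a minimal sketch that tallies the weights of a stacked-LSTM encoder-decoder. It assumes the PyTorch `nn.LSTM` layout (two bias vectors per layer) and a plain embedding + LSTM stack + output projection on each side. The function names and the vocabulary, embedding, and hidden sizes are hypothetical and are not taken from the notebook.

```python
def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and two bias
    # vectors (the layout PyTorch's nn.LSTM uses).
    return 4 * (input_size * hidden_size
                + hidden_size * hidden_size
                + 2 * hidden_size)


def seq2seq_params(src_vocab, tgt_vocab, emb_dim, hidden, layers):
    def lstm_stack(first_input):
        # First layer reads embeddings; deeper layers read hidden states.
        total = lstm_layer_params(first_input, hidden)
        total += (layers - 1) * lstm_layer_params(hidden, hidden)
        return total

    encoder = src_vocab * emb_dim + lstm_stack(emb_dim)
    # Decoder adds a linear projection (weights + bias) onto the target vocab.
    decoder = (tgt_vocab * emb_dim + lstm_stack(emb_dim)
               + hidden * tgt_vocab + tgt_vocab)
    return encoder + decoder


# Hypothetical sizes, chosen only for illustration.
print(f"{seq2seq_params(8000, 6000, 256, 512, 2):,} parameters")
```

Most of the budget sits in the embeddings and the output projection, so the total is quite sensitive to the vocabulary sizes.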

posted an update 16 days ago

Post

466

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

As a young researcher, I've often pondered the limitations of backpropagation, especially when compared with how learning occurs in the human brain. While backpropagation has been the workhorse of deep learning, it isn't without flaws. In this post, I aim to share some thoughts on these shortcomings from first principles.

Full article
https://huggingface.co/blog/Jaward/rethinking-backpropagation

posted an update 18 days ago

Post

2403

Implements the compute-efficient DeepPCR algorithm, which parallelizes sequential operations and thereby speeds up inference and training of neural networks. DeepPCR can significantly reduce the time complexity of operations such as denoising in latent diffusion space from O(L) to O(log2 L).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
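To illustrate how an O(L) chain of dependent steps can shrink to O(log2 L) rounds, here is a minimal sketch using recursive doubling on an affine recurrence x_i = a_i * x_{i-1} + b_i. This is a simpler relative of parallel cyclic reduction, not the DeepPCR algorithm itself, and the function names are made up for this example. The inner loop is written sequentially, but every index in a round is independent, so a real implementation could run each round in parallel.

```python
def sequential_scan(a, b, x0):
    # Baseline: L dependent steps, one after another.
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def doubling_scan(a, b, x0):
    # A[i], B[i] describe an affine map x -> A[i] * x + B[i].
    # After each round, entry i covers twice as many original steps.
    A, B = list(a), list(b)
    n, d, rounds = len(A), 1, 0
    while d < n:
        new_A, new_B = A[:], B[:]
        for i in range(d, n):  # independent per i, hence parallelizable
            # Compose map i with the map ending d positions earlier.
            new_A[i] = A[i] * A[i - d]
            new_B[i] = A[i] * B[i - d] + B[i]
        A, B = new_A, new_B
        d *= 2
        rounds += 1
    return [A[i] * x0 + B[i] for i in range(n)], rounds


a = [2, 3, 1, 5, 2, 1, 4, 3]
b = [1, 0, 2, 1, 3, 1, 0, 2]
xs, rounds = doubling_scan(a, b, x0=1)
print(xs == sequential_scan(a, b, x0=1), rounds)
```

With L = 8 steps the doubling version needs 3 rounds, at the price of more total arithmetic than the sequential loop.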

liked a dataset 22 days ago

osunlp/Multimodal-Mind2Web

Viewer • Updated Jun 5 • 14.2k • 1.43k • 51

posted an update 22 days ago

Post

1222

This is supercool!!
Explores o1-like multimodal reasoning.
Multi-agents with DPO is a nice touch 👍
Paper: https://arxiv.org/pdf/2411.14432
Code: https://github.com/dongyh20/Insight-V

upvoted a paper 25 days ago

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published 26 days ago • 41

upvoted a paper 30 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15 • 109

posted an update about 1 month ago

Post

1616

Ok RNNs can rap too:)

Here we implement the seminal RNN paper “Generating Text with Recurrent Neural Networks”: we train a character-level multiplicative recurrent neural network model (~250k params) for 1000 epochs with the Adam optimizer on 2pac's "Hit 'em Up". The sample was fun lol.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/generating_texts_with_rnns.ipynb
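As a rough picture of what "multiplicative" means here, this is a minimal pure-Python sketch of one hidden-state update in the style of the paper: the current character gates a factor layer, which then drives the hidden state. It is not the notebook's code. The helper names and the tiny sizes are hypothetical, and the output layer and training loop are left out.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mrnn_step(x, h_prev, W_fx, W_fh, W_hf, W_hx):
    # Factor state: the input projection multiplicatively gates the
    # hidden projection, i.e. f = (W_fx x) * (W_fh h_prev) elementwise.
    f = [p * q for p, q in zip(matvec(W_fx, x), matvec(W_fh, h_prev))]
    # New hidden state: h = tanh(W_hf f + W_hx x).
    pre = [p + q for p, q in zip(matvec(W_hf, f), matvec(W_hx, x))]
    return [math.tanh(p) for p in pre]


# Hypothetical toy sizes: 4 characters, 3 hidden units, 5 factors.
rng = random.Random(0)
V, H, F = 4, 3, 5
W_fx, W_fh = random_matrix(F, V, rng), random_matrix(F, H, rng)
W_hf, W_hx = random_matrix(H, F, rng), random_matrix(H, V, rng)

h = [0.0] * H
for char_id in [1, 2, 0]:
    h = mrnn_step(one_hot(char_id, V), h, W_fx, W_fh, W_hf, W_hx)
print(h)
```

The factor layer effectively gives each input character its own hidden-to-hidden transition.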

liked a Space about 1 month ago

Running

1.2k

🐢

Qwen2.5 Coder Artifacts

posted an update about 1 month ago

Post

1731

Interesting Work on Reasoning 🤔
- explores a new take on few-shot reasoning while challenging assumptions that program synthesis is necessary for abstract reasoning.
- shows test-time training + smart inference tricks can match human-average performance, though at high computational cost. Key insight: proper compute allocation matters more than method (whether symbolic or neural).

Paper: https://ekinakyurek.github.io/papers/ttt.pdf

posted an update about 1 month ago

Post

2100

It's work like this that in some way signals the eventual “dominance” of AI over all the sciences.

“We train our model on the six-dimensional N-body phase space, predicting particle velocities as the time derivative of the model’s displacement outputs”

The emulator is capable of predicting the nonlinear displacement and velocity fields for 128^3 particles in half a second on a single GPU🤯

liked a model about 2 months ago

microsoft/OmniParser

Image-Text-to-Text • Updated 15 days ago • 7.36k • 1.49k

upvoted 2 papers about 2 months ago

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24 • 31

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25 • 82

posted an update about 2 months ago

Post

1740

Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops:)

Simplified pseudocode for the parallel cross-entropy loss computation:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop 1 (find the max logit): update row_max with the max logit.
- for-loop 2 (compute softmax and loss): accumulate row_sum, update loss.
- add log(row_sum) and store loss.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb
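The steps in the post above can be sketched in plain Python for a single row of logits. This is a CPU illustration of the two-pass, numerically stable computation, not the Triton kernel itself. The function name and `block_size` are assumptions standing in for the blocked loads a kernel would perform, and each call plays the role of one program instance (one pid per row).

```python
import math


def cross_entropy_row(logits, target, block_size=4):
    # Pass 1: find the row maximum, one block at a time.
    row_max = float("-inf")
    for start in range(0, len(logits), block_size):
        row_max = max(row_max, max(logits[start:start + block_size]))

    # Pass 2: accumulate the shifted exp-sum and pick up the target term.
    row_sum = 0.0
    loss = 0.0
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]
        row_sum += sum(math.exp(x - row_max) for x in block)
        if start <= target < start + len(block):
            loss = row_max - logits[target]  # -(shifted target logit)

    # loss = log(sum_j exp(z_j - max)) - (z_target - max)
    return loss + math.log(row_sum)


logits = [2.0, -1.0, 0.5, 3.0, 0.0]
print(cross_entropy_row(logits, target=3, block_size=2))
```

Subtracting row_max before exponentiating is what keeps large logits from overflowing, and it is the reason the max pass has to come first.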

posted an update 2 months ago

Post

404

This has to be the first peak-performance use case of a non-autoregressive architecture for TTS. Flow matching for the win!!

Demo: mrfakename/E2-F5-TTS
Model: SWivid/E2-TTS

upvoted a paper 2 months ago

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9 • 42