Dr. Joao Paulo Schwarz Schuler

schuler

AI & ML interests

artificial intelligence

Recent Activity

updated a model 3 days ago
schuler/experimental-JP47D62B
published a model 3 days ago
schuler/experimental-JP47D62B
liked a Space 11 days ago
schuler/kphi3-talk-to-JP47D56C

Organizations

None yet

schuler's activity

reacted to AdinaY's post with 👍 17 days ago
Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
MoonShot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)
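
Both papers sparsify attention at the granularity of sequence blocks. As a rough illustration of that shared idea (not the actual algorithm from either paper), here is a minimal NumPy sketch where each query attends only to its top-scoring key blocks instead of the full sequence:

```python
# Minimal sketch of block-sparse attention: score each key block by its
# mean-pooled summary, keep the top-k blocks per query, and run softmax
# attention restricted to those blocks. Illustrative only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_size=16, top_k=2):
    # q, k, v: (seq_len, dim); seq_len must be divisible by block_size
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size
    # One summary vector per key block (mean pooling).
    k_blocks = k.reshape(n_blocks, block_size, dim).mean(axis=1)
    # Per-query block scores; keep only the top-k blocks.
    block_scores = q @ k_blocks.T                          # (seq_len, n_blocks)
    top_blocks = np.argsort(block_scores, axis=-1)[:, -top_k:]
    mask = np.zeros((seq_len, n_blocks), dtype=bool)
    np.put_along_axis(mask, top_blocks, True, axis=1)
    mask = np.repeat(mask, block_size, axis=1)             # (seq_len, seq_len)
    # Standard scaled-dot-product attention on the selected blocks only.
    scores = (q @ k.T) / np.sqrt(dim)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(64, 32))
out = block_sparse_attention(q, k, v)
print(out.shape)  # (64, 32)
```

With block_size=16 and top_k=2, each query touches 32 of the 64 keys, which is where the long-context savings come from.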

✨ TODAY:
DeepSeek unveiled Flash MLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduced Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀
reacted to onekq's post with 👀 17 days ago
Huge disappointment with Claude Sonnet 3.7 😞 Big performance regression: worse than the June 2024 version. 👎
onekq-ai/WebApp1K-models-leaderboard

I'm sure this version improves on something, just not what my leaderboard measures. This proves the point that no model can be the best at everything.
posted an update 17 days ago
📢 Old Research Alert: Making Computer Vision Models Smaller & Smarter!

Years ago, I coded an optimization for the first layers of a convolutional neural network (computer vision) and never got around to posting it here. The optimization decreases the number of parameters while increasing accuracy. It relies on separating (branching) chromatic and achromatic information through the layers of the network.
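
As a rough sketch of the input side of the idea (illustrative only, not the code from the repositories below; the papers use a light/chroma color-space separation, approximated here by a simple mean-luminance split), the image is divided into an achromatic component and chromatic components, each of which would feed its own convolutional branch:

```python
# Illustrative sketch: split an RGB image into an achromatic
# (luminance) part and chromatic residuals. Each part would feed a
# separate convolutional branch; the chromatic branch can be much
# narrower, which is where the parameter savings come from.
import numpy as np

def split_light_chroma(rgb):
    # rgb: (H, W, 3) float array in [0, 1]
    # Achromatic branch input: a single luminance channel.
    luma = rgb.mean(axis=-1, keepdims=True)   # (H, W, 1)
    # Chromatic branch input: per-channel deviation from luminance.
    chroma = rgb - luma                        # (H, W, 3)
    return luma, chroma

img = np.random.default_rng(0).random((32, 32, 3))
luma, chroma = split_light_chroma(img)
print(luma.shape, chroma.shape)  # (32, 32, 1) (32, 32, 3)
```

The two branch outputs are then concatenated before the shared deeper layers.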

YouTube videos:
https://www.youtube.com/watch?v=u4vZZmBMFLw
https://www.youtube.com/watch?v=-BD293yqdKI

Source codes:
https://github.com/joaopauloschuler/two-branch-plant-disease
https://github.com/joaopauloschuler/two-path-noise-lab-plant-disease

Research papers:
https://www.researchgate.net/publication/361511874_Color-Aware_Two-Branch_DCNN_for_Efficient_Plant_Disease_Classification
https://www.researchgate.net/publication/355215213_Reliable_Deep_Learning_Plant_Leaf_Disease_Classification_Based_on_Light-Chroma_Separated_Branches

May the force be with you.
posted an update 25 days ago
🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":
┌─────────────────────────┐
│      Input Layer        │
├─────────────────────────┤
│  Token & Positional     │
│      Embedding          │
├─────────────────────────┤
│   12x Transformer       │
│      Blocks             │
│  - 12 heads             │
│  - 768 hidden dims      │
│  - 3072 intermediate    │
├─────────────────────────┤
│      Output Layer       │
└─────────────────────────┘

Clean Pascal Implementation
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;
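
For context, these hyperparameters match the GPT-3 Small row of the paper's model table (125M parameters, 12 layers, d_model 768, 12 heads). A quick back-of-the-envelope check in Python, ignoring biases and layer norms (vocabulary 50257 and context 2048 per the paper):

```python
# Back-of-the-envelope parameter count for the GPT-3 Small
# configuration above: 12 layers, d_model=768, 12 heads,
# intermediate size 4*768=3072. Biases and layer norms are ignored.
d_model, n_layers, d_ff = 768, 12, 4 * 768
vocab, context = 50257, 2048

per_layer = 4 * d_model * d_model + 2 * d_model * d_ff  # attn (Q,K,V,O) + MLP
embeddings = vocab * d_model + context * d_model        # token + positional
total = n_layers * per_layer + embeddings

print(f"{total / 1e6:.0f}M parameters")  # 125M parameters
```

The result lines up with the 125M figure the paper reports for GPT-3 Small.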

replied to their post 29 days ago

If you run into any roadblock while modifying an existing model with this optimization so that you can train the optimized model from scratch, please feel free to ask for help.