Dr. Joao Paulo Schwarz Schuler PRO

schuler

https://www.researchgate.net/profile/Joao-Paulo-Schwarz-Schuler

joaopauloschuler

AI & ML interests

artificial intelligence

Recent Activity

updated a model about 14 hours ago

schuler/experimental-JP47D62G

published a model about 14 hours ago

schuler/experimental-JP47D62G

liked a Space about 1 month ago

schuler/kphi3-talk-to-JP47D56C

Organizations

None yet

schuler's activity

updated a model about 14 hours ago

schuler/experimental-JP47D62G

Text Generation • Updated about 14 hours ago

published a model about 14 hours ago

schuler/experimental-JP47D62G

Text Generation • Updated about 14 hours ago

liked a Space about 1 month ago

Talk to KPhi-3-JP47D56C

💬

Experimental KPhi-3-JP47D56C

updated a Space about 1 month ago

Talk to KPhi-3-JP47D56C

💬

Experimental KPhi-3-JP47D56C

reacted to AdinaY's post with 👍 about 2 months ago

Post

2489

Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
Moonshot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduces Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀

reacted to onekq's post with 👀 about 2 months ago

Post

2354

Huge disappointment with Claude Sonnet 3.7 😞 Big performance regression. Worse than the June 2024 version. 👎
onekq-ai/WebApp1K-models-leaderboard

I'm sure this version improves on something, though, just not on what my leaderboard measures. This supports the point that no model can be the best at everything.

2 replies

posted an update about 2 months ago

Post

1956

📢 Old Research Alert: Making Computer Vision Models Smaller & Smarter!

Years ago, I coded an optimization for the first layers of a convolutional neural network (computer vision) and ended up never posting it here. The optimization decreases the number of parameters while increasing accuracy. It relies on separating (branching) chromatic and achromatic information through the layers of a neural network.
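
As a rough illustration of what separating achromatic from chromatic information can look like at the input, the sketch below splits each RGB pixel into one luma value and two color-difference values using the BT.601 transform. The choice of this transform and the names `rgb_to_ycbcr` and `split_branches` are assumptions made for illustration only; the linked papers and repositories define the actual color space and branch layout.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to BT.601 luma and
    color-difference values. Y is in [0, 1]; Cb and Cr are in [-0.5, 0.5]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr


def split_branches(image):
    """Split an image (rows of RGB tuples) into two inputs:
    an achromatic one (Y only) and a chromatic one (Cb, Cr).
    Hypothetical helper: each output would feed its own branch of
    convolutional layers in a two-branch network."""
    achromatic = [[(rgb_to_ycbcr(*px)[0],) for px in row] for row in image]
    chromatic = [[rgb_to_ycbcr(*px)[1:] for px in row] for row in image]
    return achromatic, chromatic


# A 1x2 image: one mid-gray pixel and one pure red pixel.
image = [[(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]]
achromatic, chromatic = split_branches(image)
```

The gray pixel carries essentially no chromatic signal, while the red pixel does, so each branch only sees the kind of information it is meant to specialize in.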

YouTube videos:
https://www.youtube.com/watch?v=u4vZZmBMFLw
https://www.youtube.com/watch?v=-BD293yqdKI

Source code:
https://github.com/joaopauloschuler/two-branch-plant-disease
https://github.com/joaopauloschuler/two-path-noise-lab-plant-disease

Research papers:
https://www.researchgate.net/publication/361511874_Color-Aware_Two-Branch_DCNN_for_Efficient_Plant_Disease_Classification
https://www.researchgate.net/publication/355215213_Reliable_Deep_Learning_Plant_Leaf_Disease_Classification_Based_on_Light-Chroma_Separated_Branches

May the force be with you.

posted an update about 2 months ago

Post

3408

🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":

┌───────────────────────┐
│     Input Layer       │
├───────────────────────┤
│ Token & Positional    │
│     Embedding         │
├───────────────────────┤
│   12x Transformer     │
│      Blocks           │
│  - 12 heads           │
│  - 768 hidden dims    │
│  - 3072 intermediate  │
├───────────────────────┤
│   Output Layer        │
└───────────────────────┘
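
To get a feel for the size these dimensions imply, here is a small Python sketch that counts the parameters in the 12 transformer blocks. It assumes a standard block (Q, K, V and output projections, a two-layer feed-forward network, two layer norms, all with biases), which may not match the Pascal implementation exactly. Embedding and output layers are left out because the vocabulary size is not stated here.

```python
def transformer_block_params(d_model, d_ff):
    """Count weights and biases in one standard transformer block
    (assumed layout, not taken from the Pascal source)."""
    # Q, K, V and output projections: four d_model x d_model matrices plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # Feed-forward: d_model -> d_ff -> d_model, each with a bias vector.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


LAYERS, D_MODEL, D_FF = 12, 768, 3072  # values from the diagram above

per_block = transformer_block_params(D_MODEL, D_FF)
total_blocks = LAYERS * per_block
print(per_block)     # 7087872
print(total_blocks)  # 85054464
```

The number of heads does not appear in the count: splitting 768 dimensions into 12 heads changes how the projections are used, not how many weights they hold.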

Clean Pascal Implementation

for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;

updated 9 models 2 months ago

replied to their post 2 months ago

If you run into any roadblock while modifying an existing model with this optimization so that you can train the optimized model from scratch, please feel free to ask for help.

updated 2 models 2 months ago

schuler/experimental-JP47D54

Text Generation • Updated Feb 15 • 15

schuler/experimental-JP47D54B

Text Generation • Updated Feb 15 • 14