DNN

#7
by ConfusedChip - opened

I’m a student studying networking and security, but I code as much as possible in my free time. I’m wondering if I could split up the layers and attention heads of the Llama 3.2 models across multiple devices and have them communicate over HTTP on the same LAN (or just point each one at a specific host address). I’m still pretty new to the GPT architecture, though I understand some of it.
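To make the device-to-device hop concrete, here is a rough sketch of what I imagine one hop looking like: the machine that finishes its slice of layers ships its activation vector to a fixed address where the next machine's layers live. This uses plain POSIX TCP sockets rather than a real HTTP library (the payload would just be wrapped in an HTTP POST body in practice), and the host address, port, and HIDDEN_DIM value are placeholders, not anything from an actual Llama 3.2 setup.

```cpp
// Sketch: push one layer slice's output activations to the next host in the
// pipeline over a plain TCP socket on the LAN. Address, port, and HIDDEN_DIM
// are placeholders; a real version would frame this as an HTTP request.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

constexpr int HIDDEN_DIM = 2048;  // assumed hidden size, just for illustration

bool send_activations(const char* next_host, int port, const std::vector<float>& act) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, next_host, &addr.sin_addr);

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return false;
    }

    // Send raw float32 activations; the receiver knows the expected length.
    const char* buf = reinterpret_cast<const char*>(act.data());
    size_t remaining = act.size() * sizeof(float);
    while (remaining > 0) {
        ssize_t n = send(fd, buf, remaining, 0);
        if (n <= 0) { close(fd); return false; }
        buf += n;
        remaining -= static_cast<size_t>(n);
    }
    close(fd);
    return true;
}

int main() {
    std::vector<float> activations(HIDDEN_DIM, 0.0f);  // output of "my" layers
    if (!send_activations("192.168.1.42", 8080, activations))  // placeholder peer
        std::fprintf(stderr, "failed to reach next host\n");
    return 0;
}
```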
I want to do it all in C++ and use architecture-specific assembly for the calculations. I’m thinking of memory-mapping the model weights and using SIMD for the compute, roughly as in the sketch below.
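For the memory-mapping plus SIMD part, this is the kind of thing I mean: mmap a weight matrix from disk and run an AVX2/FMA dot product over each row. The file name `layer0_wq.bin`, the 2048×2048 shape, and the assumption that the file is plain little-endian float32 with no header are all placeholders for illustration (compile with something like `-O2 -mavx2 -mfma`).

```cpp
// Sketch of the mmap + SIMD idea: map a raw float32 weight matrix and compute
// row * input dot products with AVX2/FMA intrinsics.
// File name, shape, and "headerless float32" layout are assumptions.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <immintrin.h>
#include <cstdio>
#include <vector>

constexpr size_t ROWS = 2048, COLS = 2048;  // placeholder dimensions

// Dot product of one weight row with the input vector, 8 floats per step.
float dot_avx(const float* w, const float* x, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(w + i), _mm256_loadu_ps(x + i), acc);
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3] + tmp[4] + tmp[5] + tmp[6] + tmp[7];
    for (; i < n; ++i) sum += w[i] * x[i];  // scalar tail
    return sum;
}

int main() {
    int fd = open("layer0_wq.bin", O_RDONLY);  // placeholder weight file
    if (fd < 0) { std::perror("open"); return 1; }

    size_t bytes = ROWS * COLS * sizeof(float);
    // Map the weights read-only; the OS pages them in on demand.
    void* map = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { std::perror("mmap"); return 1; }
    const float* w = static_cast<const float*>(map);

    std::vector<float> x(COLS, 0.01f), y(ROWS);
    for (size_t r = 0; r < ROWS; ++r)
        y[r] = dot_avx(w + r * COLS, x.data(), COLS);

    std::printf("y[0] = %f\n", y[0]);
    munmap(map, bytes);
    close(fd);
    return 0;
}
```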
Could anybody give me some insight into how feasible the idea is and how practical it would actually be? I’m going to try it either way.