Great stuff! Any chance y'all are looking into replicating this at the 70B scale? It would be super useful to have a performant 35-40B model, as it's hard to squeeze llama-3.1-70b onto a single node.