Martin Viewegger

Viewegger

AI & ML interests

None yet

Organizations

None yet

Viewegger's activity

New activity in jpgallegoar/F5-Spanish 20 days ago
New activity in PetrosStav/F5-TTS-Greek 22 days ago
New activity in marduk-ra/F5-TTS-German 22 days ago

Training process details

#2 opened 22 days ago by Nils11
reacted to m-ric's post with 🔥 28 days ago
๐—”๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ? ๐—” ๐—ฟ๐—ฒ๐—ฝ๐—ผ๐—ฟ๐˜ ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜๐—ต๐—ฒ ๐—œ๐—ป๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ฎ๐—ป๐—ป๐—ผ๐˜‚๐—ป๐—ฐ๐—ฒ๐—ฑ ๐˜๐—ต๐—ฎ๐˜ ๐—ข๐—ฝ๐—ฒ๐—ป๐—”๐—œ ๐—ถ๐˜€ ๐˜€๐—ฒ๐—ฒ๐—ถ๐—ป๐—ด ๐—ฑ๐—ถ๐—บ๐—ถ๐—ป๐—ถ๐˜€๐—ต๐—ถ๐—ป๐—ด ๐—ฟ๐—ฒ๐˜๐˜‚๐—ฟ๐—ป๐˜€ ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐˜๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—š๐—ฃ๐—ง ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€.

📊 What are scaling laws? These are empirical laws stating that every time you increase the compute spent in training 10-fold, your LLM's performance goes up by a predictable tick. Of course, they apply only if you train your model with the right methods.
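That "predictable tick" can be sketched numerically. Below is a minimal toy model assuming a power-law fit; all coefficients are hypothetical placeholders, not taken from any published paper:

```python
# Toy scaling-law curve: loss(C) = a * C**(-b) + irreducible.
# The coefficients a, b and the irreducible term are made-up values,
# chosen only to illustrate the "predictable tick per 10x compute" behavior.

def predicted_loss(compute_flops, a=420.0, b=0.1, irreducible=1.7):
    """Reducible loss falls as a power of training compute."""
    return a * compute_flops ** (-b) + irreducible

# Each 10x in compute multiplies the reducible loss by the same factor, 10**(-b):
for c in (1e21, 1e22, 1e23, 1e24):
    print(f"{c:.0e} FLOPs -> predicted loss {predicted_loss(c):.3f}")
```

The key property is that the improvement per decade of compute is constant on a log scale, which is what makes these laws predictive in the first place, and why a genuine flattening would be such big news.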

The image below illustrates this: it comes from a Google paper, "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation", and it shows how the quality and instruction-following of models improve as you scale the model up (which is equivalent to scaling up the compute spent in training).

โžก๏ธ These scaling laws have immense impact: they triggered the largest gold rush ever, with companies pouring billions into scaling up theiur training. Microsoft and OpenAI spent 100B into their "Startgate" mega training cluster, due to start running in 2028.

🤔 So, what about these reports of scaling laws slowing down?

If true, they would mean a gigantic paradigm shift, as the hundreds of billions poured by AI companies into scaling could be a dead end. ⛔️

But I doubt it: up to the most recent publications, scaling laws showed no signs of weakness, and the researchers at the higher end of the scale-up seem to imply that scaling continues to pay off.

Wait and see!
reacted to yongchanghao's post with 🔥 about 1 month ago
We just released a paper (NeuZip) that losslessly compresses model weights in VRAM to run larger models. This should be particularly useful when VRAM is insufficient during training/inference. Specifically, we look inside each floating-point number and find that the exponent bits are highly compressible (as shown in the figure below).

Read more about the work at NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (2410.20650)
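The exponent-compressibility observation is easy to reproduce. The sketch below is not the NeuZip implementation; it just splits out the bit fields of synthetic Gaussian "weights" and compresses the exponent bytes with a generic compressor (zlib) to show how redundant they are:

```python
import zlib

import numpy as np

# Synthetic stand-in for trained weights: small, roughly Gaussian values.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=100_000).astype(np.float32)

# Reinterpret the float32 bits and slice out the IEEE 754 exponent field.
bits = weights.view(np.uint32)
exponents = ((bits >> 23) & 0xFF).astype(np.uint8)  # 8 exponent bits per weight
raw = exponents.tobytes()

# Because weights cluster around zero, only a handful of exponent values
# actually occur, so a lossless compressor shrinks them dramatically.
compressed = zlib.compress(raw, level=9)
print(f"exponent bytes: {len(raw)} -> {len(compressed)} "
      f"({len(compressed) / len(raw):.1%} of original size)")
```

The mantissa bits, by contrast, look essentially random and don't compress; as I understand the paper, NeuZip's lossless mode therefore targets the exponents and performs compression/decompression on the fly, layer by layer, during training and inference.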
New activity in gokaygokay/Flux-Seamless-Texture-LoRA about 1 month ago

Size of the dataset?

#1 opened about 1 month ago by Viewegger
New activity in nerijs/pixel-art-3.5L about 1 month ago

Thank you!

#1 opened about 1 month ago by Viewegger
New activity in kodoqmc/XTTS-v2_PeterDrury about 2 months ago

Hyperparameters

#1 opened about 2 months ago by Viewegger