Question about continued training.

#14
by neverdoubt

Thank you for sharing this great work.
How many tokens were used for the continued pretraining?

So by "continued pretraining" do you mean instruction tuning? Otherwise I can't find any info on that either. I would really appreciate it if you could clarify this point a little more.

No, for Upstage's SOLAR-10.7B, they took the base weights, depth up-scaled them, and then continued pretraining. Pretraining is different from fine-tuning; it requires far more data than fine-tuning does. Section 2, "Depth Up-Scaling", details the continued pretraining, and Section 3, "Training", details the fine-tuning and instruction tuning. With that being said, perhaps Upstage can fill in any blanks that we don't have the information to fill. 🤔
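For concreteness, here is a minimal sketch of what the up-scaling step itself looks like, assuming the Mistral 7B base and the 8-layer overlap described in the paper. The attribute names (`model.layers`, `num_hidden_layers`) are what current `transformers` Llama/Mistral implementations use; treat this as illustrative, not Upstage's actual code:

```python
# Sketch of depth up-scaling (DUS) per the SOLAR 10.7B paper: duplicate the
# base model's decoder layers, drop the last m from one copy and the first m
# from the other, then concatenate the two copies.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

m = 8                                 # overlap removed from each copy (paper uses m = 8)
layers = base.model.layers            # 32 decoder layers in Mistral 7B
front = [copy.deepcopy(l) for l in layers[: len(layers) - m]]  # layers 0..23
back = [copy.deepcopy(l) for l in layers[m:]]                  # layers 8..31

base.model.layers = nn.ModuleList(front + back)   # 24 + 24 = 48 layers
base.config.num_hidden_layers = len(base.model.layers)

# Newer transformers versions track a per-layer index for the KV cache;
# reassign it so the duplicated layers don't collide.
for i, layer in enumerate(base.model.layers):
    if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i
```

Everything after this step, i.e. which corpus and how many tokens the up-scaled model was then pretrained on, is exactly the information the thread is asking about.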

Yes, I've read that section. Unfortunately, it doesn't mention anything about the pretraining data: neither the datasets nor the number of tokens.
