From your experience, what would be a good methodology for using a 1048k-context model to filter pre-training data?

by TimeLordRaps - opened Apr 30, 2024

Idea: use the long context window to select the best document from a set of documents that fits in the window, and treat that selection as a proxy for high-quality pretraining data.
Secondary idea: use the long context window to order the documents in such a set, and use that ordering as a curriculum for high-quality pretraining data.
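
To make both ideas concrete, here is a rough sketch of how I imagine it, not a tested pipeline. Everything in it is my own assumption: the `llm` argument is a hypothetical callable that takes a prompt string and returns the model's reply, the 4-characters-per-token estimate is a crude stand-in for the real tokenizer, and the prompt wording is just a placeholder.

```python
import re
from typing import Callable, List

def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in the model's tokenizer.
    return max(1, len(text) // 4)

def pack_batches(docs: List[str], budget: int,
                 count_tokens: Callable[[str], int] = approx_tokens) -> List[List[str]]:
    """Greedily group documents so each group fits within `budget` tokens."""
    batches, current, used = [], [], 0
    for doc in docs:
        cost = count_tokens(doc)
        if cost > budget:
            continue  # a document that cannot fit on its own is skipped
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches

def build_prompt(batch: List[str]) -> str:
    # Placeholder wording; the instruction text is an assumption, not a known-good prompt.
    body = "\n\n".join(f"[DOC {i}]\n{doc}" for i, doc in enumerate(batch))
    return (body + "\n\nRank the documents above from best to worst as pretraining "
            "data. Answer with comma-separated document indices only.")

def parse_ranking(reply: str, n: int) -> List[int]:
    """Read indices from the reply; drop repeats and out-of-range values."""
    order: List[int] = []
    for tok in re.findall(r"\d+", reply):
        i = int(tok)
        if i < n and i not in order:
            order.append(i)
    # Indices the model left out go last, in their original order.
    order += [i for i in range(n) if i not in order]
    return order

def rank_batch(batch: List[str], llm: Callable[[str], str]) -> List[str]:
    """Return the batch reordered best-first according to the model's reply."""
    order = parse_ranking(llm(build_prompt(batch)), len(batch))
    return [batch[i] for i in order]
```

With this, `rank_batch(batch, llm)[0]` would be the "best document" pick from the first idea, and the full returned list would be the ordering for the curriculum idea. The `budget` passed to `pack_batches` would need to sit below the model's context length to leave room for the instruction text and the reply.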

Your thoughts?
