vladbogo posted an update Feb 22
Web Rephrase Augmented Pre-training (WRAP) improves the efficiency of language model pre-training by rephrasing web documents into structured, higher-quality formats.

Key aspects:
* Uses an instruction-tuned model to rephrase web content into styles such as Wikipedia or Q/A, creating a blend of synthetic and real data for training (see the sketch after this list).
* Achieves more than 10% lower perplexity and over a 2% gain in zero-shot question-answering accuracy.
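
Below is a minimal sketch of the rephrasing step using the `transformers` pipeline API. The model name, prompt wording, and generation settings are illustrative assumptions, not the exact recipe from the paper.

```python
from transformers import pipeline

# Illustrative sketch of WRAP's rephrasing step. The model and prompt
# are assumptions for demonstration, not the paper's exact setup.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.1",
)

def rephrase(document: str, style: str = "Wikipedia") -> str:
    # Ask the instruction-tuned model to rewrite raw web text in the target style.
    prompt = (
        f"Rephrase the following text in a high-quality {style} style:\n\n"
        f"{document}\n\nRephrased text:"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    # Strip the prompt; keep only the newly generated rephrasing.
    return out[0]["generated_text"][len(prompt):].strip()

# The training corpus is then a mix of the original web text
# and its rephrased versions.
```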

Congrats to the authors for their work!

Paper: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (2401.16380)

A more detailed overview is available on my blog: https://huggingface.co/blog/vladbogo/rephrasing-the-web. Feedback is appreciated!
