How can I get a large corpus dataset (over 200 million records) in TSV format to encode with intfloat/e5-large-v2 as the embedding model?

#15
by liorf95 - opened

The TSV file should have three tab-separated columns (id, text, title), as follows:
id text title
1 Alice likes adventures. Alice
2 Bob likes reading books. Bob
3 Charlie is a free-spirited artist. Charlie
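
For reference, here is a minimal sketch of how a file in this layout could be produced once some corpus is available. It does not answer where to find such a corpus. The function name `write_corpus_tsv`, the helper `_clean`, and the assumption that each record is a mapping with `text` and `title` keys are all hypothetical, chosen only for illustration. Records are streamed one at a time, so a 200-million-row source would not need to fit in memory.

```python
from typing import Iterable, Mapping


def _clean(field: str) -> str:
    # Collapse every run of whitespace (including tabs and newlines) into a
    # single space, so each record stays on one line with exactly 3 columns.
    return " ".join(field.split())


def write_corpus_tsv(
    records: Iterable[Mapping[str, str]],  # hypothetical input: dicts with "text" and "title"
    path: str,
    start_id: int = 1,
) -> int:
    """Stream records to a TSV file with the header id, text, title.

    Returns the number of data rows written.
    """
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("id\ttext\ttitle\n")
        for offset, rec in enumerate(records):
            row_id = start_id + offset
            f.write(f"{row_id}\t{_clean(rec['text'])}\t{_clean(rec['title'])}\n")
            count += 1
    return count


if __name__ == "__main__":
    sample = [
        {"text": "Alice likes adventures.", "title": "Alice"},
        {"text": "Bob likes reading books.", "title": "Bob"},
        {"text": "Charlie is a free-spirited artist.", "title": "Charlie"},
    ]
    print(write_corpus_tsv(sample, "corpus.tsv"), "rows written")
```

Because `records` can be any iterable, a generator that reads from a larger source could be passed in directly instead of a list.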
