Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
Freeman Lewin
EmetTheGolum
Follow
0 followers
·
3 following
https://emetresearch.ai/
Freeman_Lewin
freemanlewin
AI & ML interests
Data and Data Aquisition
Recent Activity
reacted
to
cfahlgren1
's
post
with ❤️
about 1 month ago
You can clean and format datasets entirely in the browser with a few lines of SQL. In this post, I replicate the process @mlabonne used to clean the new https://huggingface.co/datasets/microsoft/orca-agentinstruct-1M-v1 dataset. The cleaning process consists of: - Joining the separate splits together / add split column - Converting string messages into list of structs - Removing empty system prompts https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset Here's his new cleaned dataset: https://huggingface.co/datasets/mlabonne/orca-agentinstruct-1M-v1-cleaned
View all activity
Organizations
models
None public yet
datasets
1
EmetTheGolum/Test
Updated
Nov 22, 2024
•
28