Dataset inquiry

#1
by drja23 - opened

Hi Prateek, I just wanted to ask where you got the dataset you used to train this model.
If possible, could you please share it?

Owner • edited May 18

It's available on Kaggle (kaggle.com); just type "thin section dataset" into the Kaggle search. But remember, the dataset is for research use only, not for commercial use. I am also going to upload it to HF.

Owner • edited May 18

Are you working on a geology language model? If so, please let me know; we can collaborate.

I'm working on fine-tuning LLaVA to correctly analyze thin-section images and give a brief summary of them. It would be great if you could upload the dataset to HF.

I couldn't find any thin-section dataset on Kaggle. Could you please link me to it?

Owner

Great, I am also working on the same thing, but with Idefics2 and LLaVA. Here is the link: https://www.kaggle.com/datasets/prateekvyas/petrology-thin-section-data
If you want to collaborate, please let me know.

Great! Thank you so much! I'm still quite new to fine-tuning and ML in general, so I might just slow you down, but if I have any questions, I'll make sure to message you again. Thanks once again!

Owner

Any time! 👍

Owner

I've finished uploading it. You can now get the data from Hugging Face at https://huggingface.co/datasets/pvyas96/Thin_Section_Dataset
Thank you.

Thank you!

Hey Prateek, I have a question.
Whenever I try to load the dataset, the label for each image shows up as a single number instead of the folder name. Am I doing something wrong, or is it a formatting mistake? Even the dataset viewer on HF shows a number in front of the mineral label.

Owner

HF automatically converts the data to the Parquet format. If you want to keep the folder structure exactly as it is, download it from Kaggle; that's why I uploaded it to Kaggle initially. Let me look at it tomorrow and then I will tell you.
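If you take the Kaggle route, the mineral name never becomes a number: it is simply the folder each image sits in. Below is a minimal standard-library sketch of reading such a download. It assumes a `root/<mineral name>/<image file>` layout (implied by the "folder name" labels discussed in this thread) and a guessed set of image extensions; `list_labeled_images` is a hypothetical helper, not part of any library or of the dataset itself.

```python
import tempfile
from pathlib import Path

# Assumed image extensions; adjust to match the files in your download.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def list_labeled_images(root):
    """Return (image_path, label) pairs for a one-folder-per-class layout.

    Assumes root/<label>/<image file>: the label of each image is the name
    of the folder that contains it, so no integer encoding is involved.
    """
    pairs = []
    # Sort folders and files so the result order is deterministic.
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs


# Quick self-check on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["argillaceous", "andalusite"]:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "sample_01.jpg").touch()
    (Path(tmp) / "notes.txt").touch()  # stray top-level file, not a class folder
    demo_labels = [label for _, label in list_labeled_images(tmp)]

print(demo_labels)  # ['andalusite', 'argillaceous']
```

Point `list_labeled_images` at the unzipped Kaggle folder (its exact local path depends on where you extract it) and you get string labels straight from the folder names.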

Owner

Could you share a screenshot of the code you use to load the dataset?

image.png
This is the code output.

image.png
It's the same here: a number is followed by the true label. I'm not sure why that happens or how to fix it.

Owner • edited May 19

Yes, the labels are correct; I checked. The labels are automatically encoded as numbers in alphabetical order: 0 means andalusite, 1 means argillaceous, and so on. You can get the label names (the keys) from the dataset to map the numbers back.
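To turn the numbers back into mineral names, you only need the alphabetical list of class names. The sketch below is plain Python and assumes ids are assigned to the sorted folder names, as described above; `build_label_maps` is a hypothetical helper. In the Hugging Face `datasets` library the same mapping is already stored in the dataset's features as a `ClassLabel`, whose `names` attribute and `int2str()` method return the mineral name for an id. The split name `"train"` and column name `"label"` in the comments are assumptions, so check them in the dataset viewer first.

```python
def build_label_maps(class_names):
    """Assign integer ids to class names in alphabetical order.

    Mirrors the encoding described above (0 -> first name alphabetically,
    1 -> second, ...); this is a sketch, not the datasets library's own code.
    """
    names = sorted(set(class_names))
    id2label = dict(enumerate(names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


# Folder names in arbitrary order; only the two minerals named in this thread.
id2label, label2id = build_label_maps(["argillaceous", "andalusite"])

encoded = [1, 0, 1]  # integer labels as they appear after loading
decoded = [id2label[i] for i in encoded]
print(decoded)  # ['argillaceous', 'andalusite', 'argillaceous']

# With the datasets library, the mapping lives on the dataset itself
# (assuming a "train" split and a "label" column):
#   from datasets import load_dataset
#   ds = load_dataset("pvyas96/Thin_Section_Dataset")
#   label_feature = ds["train"].features["label"]
#   label_feature.names        # class names; list index = integer label
#   label_feature.int2str(0)   # name for integer label 0
```

The `id2label`/`label2id` pair is also a convenient shape to hand to a fine-tuning script that expects explicit label mappings.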

Oh okay, thanks!
