# Training
* Download the TSV files listed here: https://github.com/google-research-datasets/wit/blob/main/DATA.md
* Use `prepare_wit.py` to download the images from Wikipedia that each TSV file references.
* Use `scale_converter.py` to remove corrupt images and resize the valid ones to 224x224.
* Use `join_datasets_custom_split.py` to merge the JSON files from the different subsets of the dataset.
* Use `discard_incorrect_files.py` to filter out images that could not be converted.
* Finally, use `run-clip.sh` to train.
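
The merge and filter steps can be pictured with the rough sketch below. It is not the code in `join_datasets_custom_split.py` or `discard_incorrect_files.py`; it only illustrates the idea under stated assumptions. The function name `join_and_filter`, the JSON-lines layout (one object per line), and the `image_path` key are all hypothetical, so check the actual scripts for the real file format.

```python
import json
from pathlib import Path


def join_and_filter(json_paths, image_dir):
    """Merge JSON-lines files and keep only records whose image exists.

    Assumption (hypothetical format): each line is one JSON object with an
    "image_path" key holding a file name relative to image_dir.
    """
    image_dir = Path(image_dir)
    kept = []
    discarded = 0
    for json_path in json_paths:
        with open(json_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                # Keep the record only if its image survived conversion.
                if (image_dir / record["image_path"]).is_file():
                    kept.append(record)
                else:
                    discarded += 1
    return kept, discarded
```

Calling `join_and_filter(["subset_a.json", "subset_b.json"], "images")` (file names here are placeholders) returns the merged records plus a count of the ones dropped, which is a handy sanity check before starting training.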