# mini_datathon/config.py
# Presentation of the challenge
context_markdown = """
The goal of this first challenge is to predict the category of an uploaded YouTube video.
"""
content_markdown = """
### Multi-class Problem
The dataset has the following features and target:
#### Features
- local_path: The local path to the upload's data.
- upload_id: The unique identifier of the upload.
- clean_upload_id: The upload_id with the "suicide_out_" prefix removed.
- upload_type: An enumeration representing the type of upload. Default is UploadType.GENERAL.
- features: A dictionary containing additional features associated with the upload.
- title: The title of the upload.
- playlist_title: The title of the playlist the upload belongs to.
- description: The description of the upload.
- duration_string: The duration of the upload in string format.
- duration: The duration of the upload in seconds.
- upload_date: The date when the upload was uploaded.
- view_count: The number of views the upload has received.
- comment_count: The number of comments on the upload.
- like_count: The number of likes on the upload.
- tags: The tags associated with the upload.
#### Target
- categories: The categories associated with the upload.
You can find the details about the context/data/challenge [here](https://drive.google.com/file/d/1qyEmi6UUWlyzeVPhPnqY2JNRHBPutak-/view?usp=sharing)
"""
#------------------------------------------------------------------------------------------------------------------#
# Guide for the participants to get X_train, y_train and X_test
# Upload the data files to your Google Drive, get the shared links, and place them here.
data_instruction_commands = """
The data can be parsed using the [youtube_modules.py](https://drive.google.com/file/d/1FCKpBTvTdL2RoNpIp9fHY18006CiglT2/view?usp=drive_link) script.
You can find the readme [here](https://drive.google.com/file/d/1wBJmwfZ9JzcQ0MxvwamYxBjwYbpEsgMx/view?usp=drive_link).
```python
from typing import List
import pickle

from youtube_modules import Upload

with open("<path/to/data>/train_uploads.pkl", "rb") as f:
    train_uploads: List[Upload] = pickle.load(f)
with open("<path/to/data>/test_uploads.pkl", "rb") as f:
    test_uploads: List[Upload] = pickle.load(f)
```
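To explore the data in tabular form, here is a minimal sketch that flattens the uploads into a pandas DataFrame. It assumes each `Upload` exposes attributes named as in the feature list above (`upload_id`, `title`, `duration`, `view_count`, ...); adjust the column list to the real class.
```python
# Sketch: flatten Upload objects into a pandas DataFrame.
# Assumes attribute names match the feature list above; adjust as needed.
import pandas as pd

def uploads_to_frame(uploads):
    """Build a DataFrame with one row per upload and one column per feature."""
    return pd.DataFrame(
        {
            "upload_id": [u.upload_id for u in uploads],
            "title": [u.title for u in uploads],
            "duration": [u.duration for u in uploads],
            "view_count": [u.view_count for u in uploads],
        }
    )
```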
Make sure to upload your predictions as a .csv file with two columns: "id" (0 to len(test_uploads) - 1) and "label" (the predicted category: 1, 2, or 3).
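For example, a minimal sketch of writing the submission file with pandas (`y_pred` is assumed to be your predicted labels for the test uploads, in order):
```python
# Sketch: write predictions in the required submission format
# ("id" = 0..len(test_uploads)-1, "label" = predicted category).
import pandas as pd

def write_submission(y_pred, path="submission.csv"):
    """Save predictions as a two-column CSV in the expected format."""
    submission = pd.DataFrame({"id": range(len(y_pred)), "label": y_pred})
    submission.to_csv(path, index=False)
    return submission
```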
## Quickstart: use the notebook remotely
1. Activate the environment: `conda activate py38_default`
2. Start the notebook server on the remote machine: `jupyter notebook --ip=0.0.0.0 --no-browser`
3. Copy the URL it prints, e.g.
   https://127.0.0.1:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed2433
   then replace 127.0.0.1 with the remote machine's IP, e.g.
   https://1.222.333.4:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed24336
   and open it in your browser.
"""
# Target on test (hidden from the participants)
Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1gQ3_ywJElpcBrewCFhVUM-fnV4SN62na/view?usp=sharing'
#------------------------------------------------------------------------------------------------------------------#
# Evaluation metric and content
from sklearn.metrics import f1_score
GREATER_IS_BETTER = True  # True when higher scores are better (e.g. ROC-AUC); False for error metrics (e.g. MSE)
SKLEARN_SCORER = f1_score
SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'weighted'}
evaluation_content = """
The predictions are evaluated with the weighted F1 score.
You can compute it with:
```python
from sklearn.metrics import f1_score
f1_score(y_train, y_pred_train, average='weighted')
```
More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).
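For example, a small worked example (the labels are illustrative toy data, not from the challenge):
```python
from sklearn.metrics import f1_score

y_true = [1, 1, 2, 3]
y_pred = [1, 2, 2, 3]
# Per-class F1: class 1 -> 2/3, class 2 -> 2/3, class 3 -> 1.0;
# weighting by support (2, 1, 1) gives (2*2/3 + 2/3 + 1) / 4 = 0.75
print(f1_score(y_true, y_pred, average="weighted"))  # 0.75
```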
"""
#------------------------------------------------------------------------------------------------------------------#
# leaderboard benchmark score, will be displayed to everyone
BENCHMARK_SCORE = 0.2
#------------------------------------------------------------------------------------------------------------------#