# Presentation of the challenge
context_markdown = """

The goal of the first challenge is to predict the category of an uploaded YouTube video.



"""
content_markdown = """

### Multi-class Problem



The dataset has the following features and target:



#### Features



- local_path: The local path to the upload's data.

- upload_id: The unique identifier of the upload.

- clean_upload_id: The upload_id with the "suicide_out_" prefix removed.

- upload_type: An enumeration representing the type of upload. Default is UploadType.GENERAL.

- features: A dictionary containing additional features associated with the upload.

- title: The title of the upload.

- playlist_title: The title of the playlist the upload belongs to.

- description: The description of the upload.

- duration_string: The duration of the upload in string format.

- duration: The duration of the upload in seconds.

- upload_date: The date on which the video was uploaded.

- view_count: The number of views the upload has received.

- comment_count: The number of comments on the upload.

- like_count: The number of likes on the upload.

- tags: The tags associated with the upload.



#### Target

- categories: The categories associated with the upload.





Details about the context, data, and challenge are available [here](https://drive.google.com/file/d/1qyEmi6UUWlyzeVPhPnqY2JNRHBPutak-/view?usp=sharing).

"""
#------------------------------------------------------------------------------------------------------------------#

# Guide for the participants to get X_train, y_train and X_test
# Upload the data files to your Google Drive, generate shareable links, and paste them here.
data_instruction_commands = """

The data can be parsed using the [youtube_modules.py](https://drive.google.com/file/d/1FCKpBTvTdL2RoNpIp9fHY18006CiglT2/view?usp=drive_link) script.

The README is available [here](https://drive.google.com/file/d/1wBJmwfZ9JzcQ0MxvwamYxBjwYbpEsgMx/view?usp=drive_link).



```python

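from typing import List
import pickle
import random

import numpy as np

# The star import brings in the Upload class, which pickle needs to rebuild the objects.
from youtube_modules import *

# Context managers close the files once the data is loaded.
with open("<path/to/data>/train_uploads.pkl", "rb") as f:
    train_uploads: List[Upload] = pickle.load(f)

with open("<path/to/data>/test_uploads.pkl", "rb") as f:
    test_uploads: List[Upload] = pickle.load(f)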

```
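
Once the uploads are loaded, the attributes described in the challenge presentation can be flattened into one row per upload. The sketch below is one possible way to do this and is not part of `youtube_modules.py`: `upload_to_row` and `NUMERIC_FIELDS` are hypothetical names, and `SimpleNamespace` objects stand in for real `Upload` instances so that the snippet runs on its own. It assumes the attributes are exposed under the listed names and that missing values may be `None`.

```python
from types import SimpleNamespace

# Attribute names taken from the feature list; assumed to exist on Upload.
NUMERIC_FIELDS = ('duration', 'view_count', 'comment_count', 'like_count')


def upload_to_row(upload):
    # Flatten one upload into a dict of simple features (hypothetical helper).
    row = {'upload_id': upload.upload_id}
    for field in NUMERIC_FIELDS:
        value = getattr(upload, field, None)
        # Treat missing or None values as 0 so every row has the same keys.
        row[field] = 0 if value is None else value
    title = getattr(upload, 'title', None) or ''
    row['title_length'] = len(title)
    tags = getattr(upload, 'tags', None) or []
    row['n_tags'] = len(tags)
    return row


# Stand-in objects; replace with train_uploads loaded above.
demo_uploads = [
    SimpleNamespace(upload_id='a1', title='Intro to chess', duration=120,
                    view_count=10, comment_count=None, like_count=3,
                    tags=['chess', 'tutorial']),
    SimpleNamespace(upload_id='b2', title=None, duration=45,
                    view_count=0, comment_count=1, like_count=None,
                    tags=None),
]
rows = [upload_to_row(u) for u in demo_uploads]
```

The resulting list of dicts can be passed to a tabular library or vectorizer of your choice; text fields such as the title, description, and tags will likely need their own encoding.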



Upload your predictions as a .csv file with two columns: "id" (the row index, i.e. range(len(test_uploads))) and "label" (the predicted class: 1, 2, or 3).
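
As a sketch of the expected file layout, the snippet below writes a submission with the standard-library `csv` module. The helper `write_submission`, the file name `submission.csv`, and the example predictions are illustrative assumptions; only the column names `id` and `label` come from the instructions above.

```python
import csv


def write_submission(labels, path='submission.csv'):
    # One row per test upload: id is the position in the test set, label the prediction.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'label'])
        for idx, label in enumerate(labels):
            writer.writerow([idx, label])
    return path


# Placeholder predictions; in practice use your model's output for test_uploads.
example_labels = [1, 3, 2, 1]
submission_path = write_submission(example_labels)
```

The predictions must be in the same order as the test uploads, since the `id` column is only the row position.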



## Quickstart: using a notebook remotely

1. Activate the conda environment: conda activate py38_default

2. Start the notebook server on the remote machine:

$ jupyter notebook --ip=0.0.0.0 --no-browser

After the server prints its URL, copy it into your browser:

 https://127.0.0.1:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed2433



Then replace 127.0.0.1 with your IP address, e.g.:

 https://1.222.333.4:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed24336

"""

# Target on test (hidden from the participants)
Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1gQ3_ywJElpcBrewCFhVUM-fnV4SN62na/view?usp=sharing'
#------------------------------------------------------------------------------------------------------------------#

# Evaluation metric and content 
from sklearn.metrics import f1_score
GREATER_IS_BETTER = True  # True for scores such as ROC-AUC or F1; False for errors such as MSE
SKLEARN_SCORER = f1_score
SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'weighted'}

evaluation_content = """

Predictions are evaluated with the weighted F1 score.



You can compute it on your training predictions with:

```python

from sklearn.metrics import f1_score



f1_score(y_train, y_pred_train, average='weighted')

```

More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).
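
To make the weighted averaging concrete, the sketch below recomputes the score in plain Python: it takes the F1 of each class and averages them with weights equal to each class's share of the true labels. `weighted_f1` is a hypothetical helper written for illustration, and the labels are made up; use `f1_score` from scikit-learn for actual scoring.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    # Per-class F1, averaged with weights equal to each class's support in y_true.
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        # F1 = 2 * TP / (true count + predicted count); n_true >= 1, so no division by zero.
        f1 = 2 * tp / (n_true + n_pred)
        total += f1 * n_true
    return total / len(y_true)


y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
score = weighted_f1(y_true, y_pred)
```

For these made-up labels the per-class F1 values are 0.5, 0.8, and 2/3, and each class has the same support, so the weighted score is about 0.656. With imbalanced classes, frequent classes contribute more to the final score.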

"""
#------------------------------------------------------------------------------------------------------------------#

# leaderboard benchmark score, will be displayed to everyone
BENCHMARK_SCORE = 0.2
#------------------------------------------------------------------------------------------------------------------#