jeremyadd committed
Commit ad5fa96 · verified · 1 Parent(s): 8166422

Update config.py

Files changed (1):
  1. config.py +61 -92
config.py CHANGED
@@ -1,93 +1,62 @@
- # Presentation of the challenge
- context_markdown = """
- The goal of the first challenge is to estimate the category of the uploaded YouTube video.
-
- """
- content_markdown = """
- ### Multi-class Problem
-
- It has the following features/target:
-
- #### Features
-
- - local_path: The local path to the upload's data.
- - upload_id: The unique identifier of the upload.
- - clean_upload_id: The upload_id with the "suicide_out_" prefix removed.
- - upload_type: An enumeration representing the type of upload. Default is UploadType.GENERAL.
- - features: A dictionary containing additional features associated with the upload.
- - title: The title of the upload.
- - playlist_title: The title of the playlist the upload belongs to.
- - description: The description of the upload.
- - duration_string: The duration of the upload in string format.
- - duration: The duration of the upload in seconds.
- - upload_date: The date when the upload was uploaded.
- - view_count: The number of views the upload has received.
- - comment_count: The number of comments on the upload.
- - like_count: The number of likes on the upload.
- - tags: The tags associated with the upload.
-
- ### Target
- - categories: The categories associated with the upload.
-
-
- You can find the details about the context/data/challenge [here](https://drive.google.com/file/d/1qyEmi6UUWlyzeVPhPnqY2JNRHBPutak-/view?usp=sharing)
- """
- #------------------------------------------------------------------------------------------------------------------#
-
- # Guide for the participants to get X_train, y_train and X_test
- # The google link can be placed in your google drive => get the shared links and place them here.
- data_instruction_commands = """
- The data can be parsed using the [youtube_modules.py](https://drive.google.com/file/d/1FCKpBTvTdL2RoNpIp9fHY18006CiglT2/view?usp=drive_link) script.
- You can find the readme [here](https://drive.google.com/file/d/1wBJmwfZ9JzcQ0MxvwamYxBjwYbpEsgMx/view?usp=drive_link)
-
- ```python
- from youtube_modules import *
- from typing import List
- import pickle
- import random
- import numpy as np
-
- train_uploads: List[Upload] = pickle.load(open("<path/to/data>/train_uploads.pkl", 'rb'))
-
- test_uploads: List[Upload] = pickle.load(open("<path/to/data>/test_uploads.pkl", 'rb'))
- ```
-
- Make sure to upload your predictions as a .csv file with the columns: "id" (range(len(test_file))) and "label" (1, 2, 3).
-
- ## Quickstart: use notebook remotely
- 1. conda activate py38_default
- 2. Launch the notebook server for remote access:
- $ jupyter notebook --ip=0.0.0.0 --no-browser
-
- Then copy the URL you receive and paste it into your browser:
- https://127.0.0.1:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed2433
-
- Then replace 127.0.0.1 with your machine's IP, e.g.
- https://1.222.333.4:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed24336
- """
-
- # Target on test (hidden from the participants)
- Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1gQ3_ywJElpcBrewCFhVUM-fnV4SN62na/view?usp=sharing'
- #------------------------------------------------------------------------------------------------------------------#
-
- # Evaluation metric and content
- from sklearn.metrics import f1_score
- GREATER_IS_BETTER = True # example for ROC-AUC == True, for MSE == False, etc.
- SKLEARN_SCORER = f1_score
- SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'weighted'}
-
- evaluation_content = """
- The predictions are evaluated according to the f1-score (weighted).
-
- You can get it using
- ```python
- from sklearn.metrics import f1_score
-
- f1_score(y_train, y_pred_train, average='weighted')
- ```
- More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).
- """
- #------------------------------------------------------------------------------------------------------------------#
-
- # leaderboard benchmark score, will be displayed to everyone
- BENCHMARK_SCORE = 0.2
+ # Presentation of the challenge
+ context_markdown = """
+ Manufacturing process feature selection and categorization
+ """
+ content_markdown = """
+ Abstract: Data from a semi-conductor manufacturing process
+ - Data Set Characteristics: Multivariate
+ - Number of Instances: 1567
+ - Area: Computer
+ - Attribute Characteristics: Real
+ - Number of Attributes: 591
+ - Date Donated: 2008-11-19
+ - Associated Tasks: Classification, Causal-Discovery
+ - Missing Values: Yes
+
+ A complex modern semi-conductor manufacturing process is normally under consistent
+ surveillance via the monitoring of signals/variables collected from sensors and/or
+ process measurement points. However, not all of these signals are equally valuable
+ in a specific monitoring system. The measured signals contain a combination of
+ useful information, irrelevant information, as well as noise. It is often the case
+ that useful information is buried in the latter two. Engineers typically have a
+ much larger number of signals than are actually required. If we consider each type
+ of signal as a feature, then feature selection may be applied to identify the most
+ relevant signals. The process engineers may then use these signals to determine key
+ factors contributing to yield excursions downstream in the process. This will
+ enable increased process throughput, decreased time to learning and reduced
+ per-unit production costs.
+ """
+ #------------------------------------------------------------------------------------------------------------------#
+
+ # Guide for the participants to get X_train, y_train and X_test
+ # The google link can be placed in your google drive => get the shared links and place them here.
+ data_instruction_commands = """
+ In order to get the data, simply run the following commands:
+ ```python
+ import pandas as pd
+
+ df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom.data', sep=' ', header=None)
+ ```
+ Please ask the admin in order to get the target and the random seed used for the train/test split.
+ """
+
+ # Target on test (hidden from the participants)
+ Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1-3X4eN_xk00GY4Bf6YU4mGtvQ8s_MDCQ/view?usp=sharing'
+ #------------------------------------------------------------------------------------------------------------------#
+
+ # Evaluation metric and content
+ from sklearn.metrics import average_precision_score as prauc
+ GREATER_IS_BETTER = True # example for ROC-AUC == True, for MSE == False, etc.
+ SKLEARN_SCORER = prauc
+ SKLEARN_ADDITIONAL_PARAMETERS = {}
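config.py only declares these constants; how the grading harness consumes them is not part of this file. A plausible sketch of that call, with the helper names being assumptions rather than the platform's actual API:

```python
from sklearn.metrics import average_precision_score as prauc

# Mirrors the constants defined above.
SKLEARN_SCORER = prauc
SKLEARN_ADDITIONAL_PARAMETERS = {}
GREATER_IS_BETTER = True

def leaderboard_score(y_true, y_scored):
    # Hypothetical helper: apply the configured scorer with its extra keyword arguments.
    return SKLEARN_SCORER(y_true, y_scored, **SKLEARN_ADDITIONAL_PARAMETERS)

def beats(new_score, best_score):
    # Hypothetical helper: GREATER_IS_BETTER flips the comparison for error-style metrics (e.g. MSE).
    return new_score > best_score if GREATER_IS_BETTER else new_score < best_score
```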
+
+ evaluation_content = """
+ The predictions are evaluated according to the PR-AUC (average precision) score.
+ You can compute it using
+ ```python
+ from sklearn.metrics import average_precision_score as prauc
+
+ prauc(y_train, y_score_train)
+ ```
+ Note that PR-AUC is computed from predicted scores/probabilities for the positive class, not from hard labels.
+ More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html).
+ """
+ #------------------------------------------------------------------------------------------------------------------#
+
+ # leaderboard benchmark score, will be displayed to everyone
+ BENCHMARK_SCORE = 0.7
  #------------------------------------------------------------------------------------------------------------------#