Update config.py
config.py
CHANGED
@@ -1,93 +1,62 @@
-# Presentation of the challenge
-context_markdown = """
-
-
-"""
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-import
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-https://127.0.0.1:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed2433
-
-Then replace 127.0.0.1 with your I.P. e.g
-https://1.222.333.4:8888/?token=7de849a953befd20682d57ac33b3e6cd9024ca25eed24336
-"""
-
-# Target on test (hidden from the participants)
-Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1gQ3_ywJElpcBrewCFhVUM-fnV4SN62na/view?usp=sharing'
-#------------------------------------------------------------------------------------------------------------------#
-
-# Evaluation metric and content
-from sklearn.metrics import f1_score
-GREATER_IS_BETTER = True # example for ROC-AUC == True, for MSE == False, etc.
-SKLEARN_SCORER = f1_score
-SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'weighted'}
-
-evaluation_content = """
-The predictions are evaluated according to the f1-score (weighted).
-
-You can get it using
-```python
-from sklearn.metrics import f1_score
-
-f1_score(y_train, y_pred_train, average='weighted')
-```
-More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).
-"""
-#------------------------------------------------------------------------------------------------------------------#
-
-# leaderboard benchmark score, will be displayed to everyone
-BENCHMARK_SCORE = 0.2
+# Presentation of the challenge
+context_markdown = """
+Manufacturing process feature selection and categorization
+"""
+content_markdown = """
+Abstract: Data from a semi-conductor manufacturing process
+Data Set Characteristics: Multivariate
+Number of Instances: 1567
+Area: Computer
+Attribute Characteristics: Real
+Number of Attributes: 591
+Date Donated: 2008-11-19
+Associated Tasks: Classification, Causal-Discovery
+Missing Values? Yes
+A complex modern semi-conductor manufacturing process is normally under consistent
+surveillance via the monitoring of signals/variables collected from sensors and/or
+process measurement points. However, not all of these signals are equally valuable
+in a specific monitoring system. The measured signals contain a combination of
+useful information, irrelevant information, and noise. It is often the case
+that useful information is buried in the latter two. Engineers typically have a
+much larger number of signals than are actually required. If we consider each type
+of signal as a feature, then feature selection may be applied to identify the most
+relevant signals. Process engineers may then use these signals to determine key
+factors contributing to yield excursions downstream in the process. This will
+increase process throughput, decrease time to learning, and reduce per-unit
+production costs.
+"""
+#------------------------------------------------------------------------------------------------------------------#
+
+# Guide for the participants to get X_train, y_train and X_test
+# Upload the files to your Google Drive, then paste their shared links here.
+data_instruction_commands = """
+To get the data, run the following command (with pandas imported as `pd`):
+```python
+df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom.data', sep=' ', header=None)
+```
+Ask the admin for the target and for the random seed used in the train/test split.
+"""
+
+# Target on test (hidden from the participants)
+Y_TEST_GOOGLE_PUBLIC_LINK = 'https://drive.google.com/file/d/1-3X4eN_xk00GY4Bf6YU4mGtvQ8s_MDCQ/view?usp=sharing'
+#------------------------------------------------------------------------------------------------------------------#
+
+# Evaluation metric and content
+from sklearn.metrics import average_precision_score as prauc
+GREATER_IS_BETTER = True # example for ROC-AUC == True, for MSE == False, etc.
+SKLEARN_SCORER = prauc
+SKLEARN_ADDITIONAL_PARAMETERS = {}
+
+evaluation_content = """
+The predictions are evaluated according to the PR-AUC, computed as the average precision score.
+You can compute it with
+```python
+from sklearn.metrics import average_precision_score as prauc
+```
+More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html).
+"""
+#------------------------------------------------------------------------------------------------------------------#
+
+# leaderboard benchmark score, will be displayed to everyone
+BENCHMARK_SCORE = 0.7
 #------------------------------------------------------------------------------------------------------------------#
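A minimal sketch of how the evaluation fields in this config could fit together. It assumes (this is not shown in the diff) that the platform calls `SKLEARN_SCORER(y_true, y_score, **SKLEARN_ADDITIONAL_PARAMETERS)` and compares the result with `BENCHMARK_SCORE` in the direction set by `GREATER_IS_BETTER`. The helpers `score_submission` and `beats_benchmark` are hypothetical names, and the six labels and scores are toy values, not SECOM rows; they use the -1/1 coding of the SECOM labels file, where 1 marks a fail. `average_precision_score` returns a single number, whereas `precision_recall_curve` returns the curve itself as three arrays, so only the former can act as a one-number leaderboard scorer.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Values mirroring config.py; the scorer here is average_precision_score.
SKLEARN_SCORER = average_precision_score
SKLEARN_ADDITIONAL_PARAMETERS = {}
GREATER_IS_BETTER = True
BENCHMARK_SCORE = 0.7


def score_submission(y_true, y_score):
    """Hypothetical helper: apply the configured scorer to one submission."""
    return float(SKLEARN_SCORER(y_true, y_score, **SKLEARN_ADDITIONAL_PARAMETERS))


def beats_benchmark(score):
    """Hypothetical helper: compare a score with the benchmark in the configured direction."""
    if GREATER_IS_BETTER:
        return score > BENCHMARK_SCORE
    return score < BENCHMARK_SCORE


# Toy labels in the -1 (pass) / 1 (fail) coding; these are not real SECOM rows.
y_true = np.array([-1, -1, 1, -1, 1, -1])
# Scores should be higher for the positive class (1); probabilities or decision
# scores work, while hard 0/1 predictions throw away the ranking information.
y_score = np.array([0.1, 0.4, 0.8, 0.3, 0.35, 0.2])

ap = score_submission(y_true, y_score)
print(round(ap, 4), beats_benchmark(ap))  # 0.8333 True

# precision_recall_curve returns the curve itself, not a single number.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(len(precision) == len(recall) == len(thresholds) + 1)  # True
```

Average precision summarizes the precision-recall curve as a weighted mean of the precisions at each threshold; it is close to, but not identical with, the trapezoidal area under the curve.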