jeremyadd committed
Commit 13618cf · verified · Parent: b5f6a08

Update README.md

Files changed (1): README.md +79 -107
README.md CHANGED
@@ -1,107 +1,79 @@
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
- [Heroku web app](https://minidatathon.herokuapp.com/)
-
- ![](mini_datathon.gif)
-
- # Mini Datathon
-
- This datathon platform is fully developed in Python using *Streamlit*, in very few lines of code!
-
- As the name suggests, it is designed for *small datathons* (but it can easily scale) and the scripts are easy to understand.
-
- ## Installation
-
- 1) Easy way => using Docker Hub:
- `docker pull spotep/mini_datathon:latest`
-
- 2) Alternative way => clone the repo onto your server:
- `git clone <this-repo-url>; cd mini_datathon`
-
- ## Usage
-
- You need 3 simple steps to set up your mini hackathon:
-
- 1) Edit the password of the **admin** user in [users.csv](users.csv), and the logins & passwords for the participants
- 2) Edit the [config.py](config.py) file\
- a) The **presentation** & the **context** of the challenge \
- b) The **data content** and `X_train`, `y_train`, `X_test` & `y_test`, which you can upload to Google Drive and just **share the links**. \
- c) The **evaluation metric** & **benchmark score**
- 3) Run the scripts\
- a) If you installed it the _alternative way_: `streamlit run main.py` \
- b) If you pulled the Docker image, just **build** and **run** the container.
-
- Please do not forget to notify the participants that the submission file needs to be a CSV **ordered the same way as given in `y_train`**.
-
- _P.S.: at any time, the admin user can **pause** the challenge; in that case, participants won't be able to upload their submissions._
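Since submissions are scored positionally, a minimal ordering/length sanity check could look like this (a hypothetical helper for illustration, not part of the repo):

```python
def validate_submission(pred_rows, expected_n):
    # Predictions are matched to the ground truth by position,
    # so the submission must contain exactly one row per sample,
    # in the same order as the provided file.
    if len(pred_rows) != expected_n:
        raise ValueError(
            f"expected {expected_n} rows, got {len(pred_rows)}"
        )
    return True
```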
-
- ## Example
-
- An example version of the code is deployed on Heroku here: [web app](https://minidatathon.herokuapp.com/)
-
- In the deployed version, we use the [UCI SECOM](https://archive.ics.uci.edu/ml/datasets/SECOM) imbalanced dataset (binary classification), evaluated with the [PR-AUC score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score).
-
- In the [config.py](config.py) file, you would need to fill in the following parameters:
-
- - `GREATER_IS_BETTER = True`
- - `SKLEARN_SCORER = average_precision_score`
- - `SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'micro'}`
- - upload the relevant data to your Google Drive & share the links.
-
- ## Behind the scenes
- ### Databases
- The platform needs to persist only 2 components:
- #### The leaderboard
- The leaderboard is in fact a CSV file that is updated every time a user submits predictions.
- The CSV file contains 4 columns:
- - _id_: the login of the team
- - _score_: the **best** score of the team
- - _nb\_submissions_: the number of submissions the team has uploaded
- - _rank_: the live rank of the team
-
- We will have only 1 row per team, since only the best score is saved.
-
- By default, a benchmark score is pushed to the leaderboard:
-
- | id | score |
- |-----------|-------|
- | benchmark | 0.6 |
-
- For more details, please refer to the script [leaderboard](leaderboard.py).
-
- #### The users
- Like the leaderboard, it is a CSV file.
- It is supposed to be defined by the admin of the competition.
- It contains 2 columns:
- - login
- - password
-
- A default user is created at first so you can start playing with the platform:
-
- | login | password |
- |-----------|----------|
- | admin | password |
-
- In order to add new participants, simply add rows to the current users.csv file.
-
- For more details, please refer to the script [users](users.py).
-
- ## Next steps
-
- - [ ] allow having a *private* and a *public* leaderboard, as on kaggle.com
- - [ ] allow connecting using OAuth
-
- ## License
- MIT License [here](LICENSE).
-
- ## Credits
- We could not find an easy implementation for our yearly internal hackathon at Intel.
- The idea originally came from my dear DevOps coworker [Elhay Efrat](https://github.com/shdowofdeath), and I took the responsibility of developing it.
-
- If you like this project, let me know by [buying me a coffee](https://www.buymeacoffee.com/jeremyatia) :)
-
- <a href="https://www.buymeacoffee.com/jeremyatia" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 100px !important;width: 300px !important;" ></a>
 
+ ---
+ title: Mini Datathon
+ emoji: 💙
+ colorFrom: blue
+ colorTo: indigo
+ sdk: streamlit
+ sdk_version: "1.11.0"
+ app_file: app.py
+ pinned: true
+ ---
+
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+
+ ![](mini_datathon.gif)
+
+ # Mini Datathon
+
+ This datathon platform is fully developed in Python using *Streamlit*, in very few lines of code!
+
+ As the name suggests, it is designed for *small datathons* (but it can easily scale) and the scripts are easy to understand.
+
+ ## Example
+
+ In the deployed version, we use the [UCI SECOM](https://archive.ics.uci.edu/ml/datasets/SECOM) imbalanced dataset (binary classification), evaluated with the [PR-AUC score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score).
+
+ In the [config.py](config.py) file, you would need to fill in the following parameters:
+
+ - `GREATER_IS_BETTER = True`
+ - `SKLEARN_SCORER = average_precision_score`
+ - `SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'micro'}`
+ - upload the relevant data to your Google Drive & share the links.
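As a rough sketch of how these settings plug into scoring (the exact wiring inside the platform's code may differ), the configured scorer is simply called with the extra parameters unpacked:

```python
from sklearn.metrics import average_precision_score

# Values mirroring the config.py example above
GREATER_IS_BETTER = True
SKLEARN_SCORER = average_precision_score
SKLEARN_ADDITIONAL_PARAMETERS = {'average': 'micro'}

# Toy ground truth and predicted scores (illustrative only)
y_true = [0, 1, 1, 0]
y_pred = [0.1, 0.8, 0.4, 0.3]

score = SKLEARN_SCORER(y_true, y_pred, **SKLEARN_ADDITIONAL_PARAMETERS)
```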
+
+ ## Behind the scenes
+ ### Databases
+ The platform needs to persist only 2 components:
+ #### The leaderboard
+ The leaderboard is in fact a CSV file that is updated every time a user submits predictions.
+ The CSV file contains 4 columns:
+ - _id_: the login of the team
+ - _score_: the **best** score of the team
+ - _nb\_submissions_: the number of submissions the team has uploaded
+ - _rank_: the live rank of the team
+
+ We will have only 1 row per team, since only the best score is saved.
+
+ By default, a benchmark score is pushed to the leaderboard:
+
+ | id | score |
+ |-----------|-------|
+ | benchmark | 0.6 |
+
+ For more details, please refer to the script [leaderboard](leaderboard.py).
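The update logic described above can be sketched as follows (a simplified stand-in for what leaderboard.py does; the function name and details are illustrative, not the repo's actual code):

```python
import csv

def update_leaderboard(path, team_id, new_score, greater_is_better=True):
    # Read the current leaderboard (columns: id, score, nb_submissions, rank).
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        if row['id'] == team_id:
            best = float(row['score'])
            improved = new_score > best if greater_is_better else new_score < best
            if improved:
                row['score'] = str(new_score)  # keep only the team's best score
            row['nb_submissions'] = str(int(row['nb_submissions']) + 1)
            break
    else:
        # First submission of this team: add a fresh row.
        rows.append({'id': team_id, 'score': str(new_score),
                     'nb_submissions': '1', 'rank': '0'})
    # Recompute the live ranks (rank 1 = best score).
    rows.sort(key=lambda r: float(r['score']), reverse=greater_is_better)
    for i, row in enumerate(rows, start=1):
        row['rank'] = str(i)
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(
            f, fieldnames=['id', 'score', 'nb_submissions', 'rank'])
        writer.writeheader()
        writer.writerows(rows)
```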
+
+ #### The users
+ Like the leaderboard, it is a CSV file.
+ It is supposed to be defined by the admin of the competition.
+ It contains 2 columns:
+ - login
+ - password
+
+ A default user is created at first so you can start playing with the platform:
+
+ | login | password |
+ |-----------|----------|
+ | admin | password |
+
+ In order to add new participants, simply add rows to the current users.csv file.
+
+ For more details, please refer to the script [users](users.py).
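Authentication against this file can be as simple as the following (a hypothetical sketch, not the repo's actual users.py code):

```python
import csv

def check_login(path, login, password):
    # Look for a matching login/password row in users.csv.
    # Note: the default file stores plaintext passwords.
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            if row['login'] == login and row['password'] == password:
                return True
    return False
```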
+
+ ## License
+ MIT License [here](LICENSE).
+
+
+ If you like this project, let me know by [buying me a coffee](https://www.buymeacoffee.com/jeremyatia) :)
+
+ <a href="https://www.buymeacoffee.com/jeremyatia" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 100px !important;width: 300px !important;" ></a>