NBA Shots Prediction
The main objective of this project is to predict the shooting performance of NBA players.
Project Organization
├── data
├── download_data.py
├── LICENSE
├── Notebooks
│   ├── Dataframe generation and preparation.ipynb
│   ├── Exploratory Data Analysis.ipynb
│   └── Models Training.ipynb
├── README.md
├── References
│   └── Rapport Final.docx
├── requirements.txt
├── streamlit_nba
│   ├── app.py
│   ├── a_re.png
│   ├── assets
│   ├── Ballon.png
│   ├── github-mark.png
│   ├── github-mark-white.png
│   ├── gradient_boosting_features.png
│   ├── Image panier.png
│   ├── __pycache__
│   │   ├── config.cpython-310.pyc
│   │   └── member.cpython-310.pyc
│   ├── style.css
│   ├── tabs
│   │   ├── conclusions.py
│   │   ├── credits.py
│   │   ├── Exploratory_data_analysis.py
│   │   ├── intro.py
│   │   ├── machine_learning_tab.py
│   │   └── pred_own_shot.py
│   └── XGboost_features.png
About this project
This project focuses on analyzing the shooting performance of players in the National Basketball Association (NBA) using data on shots taken between 1997 and 2020. The main objective is to develop a classification model to predict the probability of a shot being successful. The data includes information such as the location of shots on the court, variables related to shooting actions, and other player characteristics.
The team trained several models, including XGBoost. The analysis shows that accuracy varies by shot class, with better performance on missed shots than on made shots. Feature importances, such as action type and shot distance, were examined to interpret the results.
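To make the idea of class-dependent accuracy concrete, here is a minimal sketch that computes recall separately for missed and made shots. The labels are toy values invented for illustration (0 = missed, 1 = made), not results from this project, and `per_class_recall` is a hypothetical helper rather than a function from the repository.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels, invented for illustration: 0 = missed shot, 1 = made shot.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 0, 1]

recall = per_class_recall(y_true, y_pred)
print(recall)
```

With these toy values the missed class is recovered more often than the made class, which is the kind of imbalance that a single overall accuracy figure would hide.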
Possible improvements were explored, including training an individual model for each player, which increased accuracy. Adding new variables, such as defender distance and ball possession time, was also considered to improve the model's performance. Despite limited data availability and computing power, the project produced a model that predicts made and missed shots with reasonable accuracy and provides insight into the factors that influence outcomes in professional basketball.
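The per-player idea can be sketched as "split the shots by player, then fit one model per group". In the sketch below a simple make rate stands in for the real classifier so that it runs with the standard library only; it is not the project's model. The player names, the data, and `fit_per_player_baselines` are hypothetical.

```python
from collections import defaultdict


def fit_per_player_baselines(shots):
    """Fit one stand-in 'model' per player: that player's fraction of made shots.

    shots: iterable of (player, made) pairs, where made is 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # player -> [made, attempts]
    for player, made in shots:
        counts[player][0] += made
        counts[player][1] += 1
    return {player: made / attempts for player, (made, attempts) in counts.items()}


# Toy data, invented for illustration.
shots = [
    ("Player A", 1), ("Player A", 0), ("Player A", 1), ("Player A", 1),
    ("Player B", 0), ("Player B", 0), ("Player B", 1), ("Player B", 0),
]

baselines = fit_per_player_baselines(shots)
print(baselines)
```

Replacing the make rate with a classifier trained on each player's group follows the same grouping step; only the fitting call changes.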
We invite you to read the entire report (References/Rapport Final.docx) for more details.
How to use it
1 - Set up the environment
Clone the repository
Create and activate a virtual environment, then install the dependencies, by running the following commands:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
2 - Download the data
First solution: download with the script
To automatically download all the datasets and the models, you will need an access token. Please contact @willymaillot87.
Once you have the token, run the script 'download_data.py'.
Second solution: manually download the datasets and create the models
Download all the necessary CSV files from the following links into the 'data' folder. You may need to create a Kaggle account to download these datasets:
- 'Seasons_Stats.csv' : https://www.kaggle.com/datasets/drgilermo/nba-players-stats?select=Seasons_Stats.csv
- 'player_data.csv' : https://www.kaggle.com/datasets/drgilermo/nba-players-stats?select=player_data.csv
- 'NBA Shot Locations 1997 - 2020.csv' : https://www.kaggle.com/datasets/jonathangmwl/nba-shot-locations?resource=download
- 'teams.csv' : https://www.kaggle.com/datasets/nathanlauga/nba-games?select=teams.csv
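Before running the notebooks, it can help to confirm that the four files listed above are in place. The following is a small sketch, assuming it is run from the repository root and that the files keep the names shown in the list; `missing_datasets` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names taken from the download list above.
REQUIRED_FILES = [
    "Seasons_Stats.csv",
    "player_data.csv",
    "NBA Shot Locations 1997 - 2020.csv",
    "teams.csv",
]


def missing_datasets(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from 'data':", ", ".join(missing))
    else:
        print("All datasets found.")
```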
Once you have the datasets, run all the cells of the following notebooks:
- 'Notebooks/Dataframe generation and preparation.ipynb'
- 'Notebooks/Models Training.ipynb'
3 - Run the streamlit app
To launch the Streamlit app, run the following commands:
cd streamlit_nba
streamlit run app.py
Open your browser at this address:
localhost:8501