Update README.md
README.md
# Tennis Match Win Streak Analysis

This project analyzes historical tennis winning streaks to identify which one followed the most challenging path, using machine learning and deep learning techniques. It combines feature engineering, data preprocessing, neural networks, and ensemble models to assess the level of competition faced during a streak.
## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
- [Model Architecture](#model-architecture)
- [Hyperparameter Tuning](#hyperparameter-tuning)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)
## Features

- Advanced feature engineering, including rank and Elo rating differences.
- Data preprocessing with error handling and missing value management.
- PyTorch Lightning framework for building and training neural networks.
- Hyperparameter optimization using Optuna.
- Ensemble methods for improved accuracy.
- Winning streak analysis using clustering techniques.
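As a rough illustration of the rank and Elo rating difference features listed above, the sketch below derives them from a single match record. The helper name `difference_features` and the output keys are hypothetical; the input keys follow the column names given in the Data section. The expected-score line uses the standard Elo formula and is included only as an example of a derived feature, not as something this project is confirmed to compute.

```python
def difference_features(match: dict) -> dict:
    """Derive difference features from one match row (hypothetical helper)."""
    rank_diff = match["winner_rank"] - match["loser_rank"]
    elo_diff = match["winner_eloRating"] - match["loser_eloRating"]
    # Standard Elo expected score for the eventual winner.
    expected = 1.0 / (1.0 + 10 ** (-elo_diff / 400))
    return {
        "rank_diff": rank_diff,
        "elo_diff": elo_diff,
        "winner_expected_score": expected,
    }


# Made-up values, for illustration only.
example = {
    "winner_rank": 3,
    "loser_rank": 10,
    "winner_eloRating": 2100,
    "loser_eloRating": 1900,
}
features = difference_features(example)
```

A negative `rank_diff` means the winner was the better-ranked player, so wins with a positive `rank_diff` or a low expected score are the ones that would count as harder.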
## Installation

This project requires Python 3.x. Install the required packages with pip:

    pip install -r requirements.txt
## Usage

Place your match data CSV files (named `PlayerMatches2.csv` through `PlayerMatches15.csv`) in the project directory.

Run the script to load the data, preprocess it, and train the models:

    python main.py

The script saves the best model and its configuration for later analysis.
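How `main.py` reads the files is not shown in this README. The sketch below is one possible way to collect the rows from `PlayerMatches2.csv` through `PlayerMatches15.csv` using only the standard library. The function name `load_matches` and the choice to skip missing files are assumptions made for illustration.

```python
import csv
from pathlib import Path


def load_matches(data_dir: str = ".") -> list:
    """Read PlayerMatches2.csv through PlayerMatches15.csv into a list of dicts."""
    rows = []
    for i in range(2, 16):
        path = Path(data_dir) / f"PlayerMatches{i}.csv"
        if not path.exists():
            continue  # Assumption: a missing file is skipped rather than fatal.
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Calling `load_matches()` with no argument reads from the current directory, which matches the instruction to place the files in the project directory.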
## Data

The project uses historical player match data in CSV format. Each file should contain the following columns:

- `date`
- `tournament`
- `winner_name`
- `winner_rank`
- `winner_eloRating`
- `loser_name`
- `loser_rank`
- `loser_eloRating`

Optional columns for enhanced feature engineering can also be included.
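A minimal sketch of how rows could be checked against the required columns before training. The helper `clean_rows` and the policy of dropping incomplete or non-numeric rows are assumptions; the project's actual missing-value handling may differ (for example, it may impute values instead).

```python
REQUIRED_COLUMNS = [
    "date", "tournament",
    "winner_name", "winner_rank", "winner_eloRating",
    "loser_name", "loser_rank", "loser_eloRating",
]
NUMERIC_COLUMNS = ["winner_rank", "winner_eloRating", "loser_rank", "loser_eloRating"]


def clean_rows(rows):
    """Keep rows that have every required column, converting numeric fields to float."""
    cleaned = []
    for row in rows:
        if any(not row.get(col) for col in REQUIRED_COLUMNS):
            continue  # Assumption: rows with a missing or empty value are dropped.
        try:
            converted = {col: float(row[col]) for col in NUMERIC_COLUMNS}
        except ValueError:
            continue  # A rank or rating that is not a number: skip the row.
        cleaned.append({**row, **converted})
    return cleaned
```

Any optional columns present in a row are passed through unchanged.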
## Model Architecture

The project utilizes a custom neural network with:

- Categorical embeddings for player names and other categorical features.
- Fully connected layers to process both embedded and numerical input features.
- Dropout layers for regularization.
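The three components above can be sketched in plain PyTorch as follows. This is an illustrative network, not the project's actual model: the class name `MatchNet`, the layer sizes, and the single regression output are assumptions, and the real implementation is built with PyTorch Lightning rather than a bare `nn.Module`.

```python
import torch
from torch import nn


class MatchNet(nn.Module):
    """Embeddings for categorical inputs, then dense layers with dropout."""

    def __init__(self, cardinalities, num_numeric, embed_dim=8, hidden=32, dropout=0.2):
        super().__init__()
        # One embedding table per categorical feature (e.g. player name, tournament).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, embed_dim) for n in cardinalities]
        )
        in_features = embed_dim * len(cardinalities) + num_numeric
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),  # Single regression output (assumed).
        )

    def forward(self, categorical, numeric):
        # categorical: (batch, n_categorical) integer ids
        # numeric: (batch, num_numeric) floats
        embedded = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        combined = torch.cat(embedded + [numeric], dim=1)
        return self.layers(combined).squeeze(-1)


# Random inputs, only to show the expected tensor shapes.
model = MatchNet(cardinalities=[100, 20], num_numeric=2)
categorical = torch.randint(0, 20, (4, 2))
numeric = torch.randn(4, 2)
output = model(categorical, numeric)
```

Concatenating the embedded and numerical features before the first linear layer lets the fully connected layers process both kinds of input together, as the list above describes.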
## Hyperparameter Tuning

Hyperparameter optimization is performed using Optuna, allowing for fine-tuning of:

- Embedding dimensions
- Hidden layer sizes
- Learning rates
- Dropout rates
- Batch sizes
## Results

The model's performance is evaluated using mean squared error (MSE) on the validation set. Ensemble models are also trained and compared for additional insights.
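For reference, MSE is the average of the squared differences between targets and predictions. The sketch below computes it for two sets of predictions so they can be compared on the same validation targets. All numbers are made up for illustration and are not results from this project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up validation targets and predictions, for illustration only.
y_val = [1.0, 2.0, 3.0]
network_pred = [1.5, 2.0, 2.0]
ensemble_pred = [1.0, 2.5, 3.0]

network_mse = mean_squared_error(y_val, network_pred)
ensemble_mse = mean_squared_error(y_val, ensemble_pred)
```

The lower of the two values indicates the predictions that sit closer to the validation targets.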
## Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to submit a pull request or open an issue.
## License

This project is licensed under the MIT License. See the LICENSE file for more information.