sunga25 committed on
Commit 0dd9f74 · verified · 1 Parent(s): 291224c

Update README.md

Files changed (1)
  1. README.md +94 -3
README.md CHANGED
@@ -1,3 +1,94 @@
1
- ---
2
- license: mit
3
- ---
1
+ # Tennis Match Win Streak Analysis
2
+
3
+ This project aims to identify the most challenging path to a historical tennis winning streak using machine learning and deep learning techniques. The model combines feature engineering, data preprocessing, and neural networks with ensemble models to assess the level of competition faced during a streak.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Features](#features)
8
+ - [Installation](#installation)
9
+ - [Usage](#usage)
10
+ - [Data](#data)
11
+ - [Model Architecture](#model-architecture)
12
+ - [Hyperparameter Tuning](#hyperparameter-tuning)
13
+ - [Results](#results)
14
+ - [Contributing](#contributing)
15
+ - [License](#license)
16
+
17
+ ## Features
18
+
19
+ - Advanced feature engineering, including rank and Elo rating differences.
20
+ - Data preprocessing with error handling and missing value management.
21
+ - PyTorch Lightning framework for building and training neural networks.
22
+ - Hyperparameter optimization using Optuna.
23
+ - Ensemble methods for improved accuracy.
24
+ - Winning streak analysis using clustering techniques.
25
+
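As a rough illustration of the rank and Elo difference features listed above, the sketch below computes both gaps for one match. The `Match` class, the `difference_features` helper, and the sign convention (winner minus loser) are assumptions made for this example; the project's own feature code may differ.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical container for the numeric fields of one match."""
    winner_rank: int
    winner_elo: float
    loser_rank: int
    loser_elo: float


def difference_features(match: Match) -> dict:
    """Return rank and Elo gaps from the winner's point of view."""
    return {
        # Positive when the winner was ranked worse (larger number) than the loser.
        "rank_diff": match.winner_rank - match.loser_rank,
        # Positive when the winner had the higher Elo rating.
        "elo_diff": match.winner_elo - match.loser_elo,
    }


# Made-up values, purely for illustration.
example = Match(winner_rank=5, winner_elo=2100.0, loser_rank=1, loser_elo=2250.0)
features = difference_features(example)
```

A positive `rank_diff` together with a negative `elo_diff` marks an upset, which is the kind of match that would make a streak harder to sustain.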
26
+ ## Installation
27
+
28
+ To get started with this project, you need to have Python 3.x installed. You can then install the required packages using pip:
29
+ `pip install -r requirements.txt`
30
+
31
+ ## Usage
32
+ Place your match data CSV files (named `PlayerMatches2.csv` through `PlayerMatches15.csv`) in the project directory.
33
+
34
+ Run the script to load the data, preprocess it, and train the models:
35
+
36
+ `python main.py`
37
+
38
+ The script will save the best model and configuration for later analysis.
39
+
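Before running the script, it can help to confirm that the expected input files are present. The sketch below assumes the file range in the Usage section is inclusive (2 through 15, so 14 files); `EXPECTED_FILES` and `missing_files` are hypothetical names and are not part of `main.py`.

```python
from pathlib import Path

# Assumption: "PlayerMatches2.csv to PlayerMatches15.csv" is an inclusive range.
EXPECTED_FILES = [f"PlayerMatches{i}.csv" for i in range(2, 16)]


def missing_files(directory="."):
    """Return the expected CSV names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

Calling `missing_files()` from the project directory returns an empty list when every expected file is in place.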
40
+ ## Data
41
+ The project uses historical player match data in CSV format. Each file should contain the following columns:
42
+
43
+ - `date`
44
+
45
+ - `tournament`
46
+
47
+ - `winner_name`
48
+
49
+ - `winner_rank`
50
+
51
+ - `winner_eloRating`
52
+
53
+ - `loser_name`
54
+
55
+ - `loser_rank`
56
+
57
+ - `loser_eloRating`
58
+
59
+ Optional columns for enhanced feature engineering can also be included.
60
+
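A minimal sketch of how the required columns above could be checked when reading a file, using only the standard library. The `read_matches` helper and the sample row are hypothetical; the project's actual loader may use a different library and different error handling.

```python
import csv
import io

# Column names taken from the list above.
REQUIRED_COLUMNS = [
    "date", "tournament",
    "winner_name", "winner_rank", "winner_eloRating",
    "loser_name", "loser_rank", "loser_eloRating",
]


def read_matches(handle):
    """Read match rows from an open CSV file, checking required columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


# Placeholder row with made-up values, standing in for a real file.
sample = io.StringIO(
    "date,tournament,winner_name,winner_rank,winner_eloRating,"
    "loser_name,loser_rank,loser_eloRating\n"
    "2020-01-01,Example Open,Player A,3,2150,Player B,10,2010\n"
)
rows = read_matches(sample)
```

Extra columns pass through untouched, which matches the note that optional columns may be included.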
61
+ ## Model Architecture
62
+ The project uses a custom neural network with:
63
+
64
+ - Categorical embeddings for player names and other categorical features.
65
+
66
+ - Fully connected layers to process both embedded and numerical input features.
67
+
68
+ - Dropout layers for regularization.
69
+
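The three components above can be sketched in plain PyTorch as follows. This is an illustration only: the class name `MatchNet`, the layer sizes, the single regression output, and the use of a bare `nn.Module` instead of a PyTorch Lightning module are all assumptions, not the project's actual architecture.

```python
import torch
from torch import nn


class MatchNet(nn.Module):
    """Embeds categorical inputs, concatenates numeric ones, applies an MLP."""

    def __init__(self, cardinalities, num_numeric, emb_dim=8, hidden=32, dropout=0.2):
        super().__init__()
        # One embedding table per categorical feature (e.g. player name, tournament).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, emb_dim) for n in cardinalities]
        )
        in_features = emb_dim * len(cardinalities) + num_numeric
        self.mlp = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),   # regularization
            nn.Linear(hidden, 1),  # assumed single regression output
        )

    def forward(self, categorical, numeric):
        # categorical: (batch, n_categorical) int64 indices
        # numeric:     (batch, num_numeric) floats
        embedded = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [numeric], dim=1)
        return self.mlp(x).squeeze(-1)


# Toy batch of four matches with two categorical and two numeric features.
model = MatchNet(cardinalities=[100, 20], num_numeric=2)
categorical = torch.tensor([[3, 1], [42, 7], [0, 19], [99, 0]])
numeric = torch.randn(4, 2)
output = model(categorical, numeric)
```

Each categorical index must be smaller than the matching entry in `cardinalities`, so unseen player names would need a reserved index in practice.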
70
+ ## Hyperparameter Tuning
71
+
72
+ Hyperparameter optimization is performed using Optuna, allowing for fine-tuning of:
73
+
74
+ - Embedding dimensions
75
+
76
+ - Hidden layer sizes
77
+
78
+ - Learning rates
79
+
80
+ - Dropout rates
81
+
82
+ - Batch sizes
83
+
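To show the shape of a search over the five hyperparameters above, here is a stand-in that uses plain random search from the standard library rather than Optuna itself. The value ranges, the function names, and the placeholder objective are assumptions for illustration; in the real project the objective would train the model and return its validation loss.

```python
import random


def sample_config(rng):
    """Draw one candidate configuration; all ranges are illustrative."""
    return {
        "embedding_dim": rng.choice([4, 8, 16, 32]),
        "hidden_size": rng.choice([32, 64, 128, 256]),
        "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform in [1e-4, 1e-2]
        "dropout": rng.uniform(0.0, 0.5),
        "batch_size": rng.choice([32, 64, 128]),
    }


def placeholder_objective(config):
    """Stand-in for 'train the model and return the validation loss'."""
    return (config["dropout"] - 0.2) ** 2 + abs(config["learning_rate"] - 1e-3)


def random_search(n_trials=20, seed=0):
    """Keep the configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = placeholder_objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search()
```

Optuna replaces the random sampling with an adaptive sampler and can prune unpromising trials, but the search space is declared in much the same way.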
84
+ ## Results
85
+
86
+ The model's performance is evaluated using mean squared error (MSE) on the validation set. Ensemble models are also trained and compared for additional insights.
87
+
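For reference, MSE is the mean of the squared differences between targets and predictions. The sketch below computes it and compares one model against a simple averaging ensemble. The numbers are made up for illustration and are not results from this project, and averaging is only one possible way to combine models.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_predictions(*prediction_lists):
    """Simple ensemble: element-wise mean of several models' predictions."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


# Toy values only.
y_true = [1.0, 2.0, 3.0]
model_a = [1.0, 2.0, 5.0]
model_b = [1.0, 2.0, 1.0]

mse_a = mean_squared_error(y_true, model_a)
mse_ensemble = mean_squared_error(y_true, average_predictions(model_a, model_b))
```

In this toy case the two models err in opposite directions, so their average has a lower MSE than either model alone.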
88
+ ## Contributing
89
+
90
+ Contributions are welcome! If you have suggestions for improvements or new features, feel free to submit a pull request or open an issue.
91
+
92
+ ## License
93
+
94
+ This project is licensed under the MIT License. See the LICENSE file for more information.