joao-victor-campos committed
Commit bb9369a
1 parent: 7150daf

add application file
.github/pull_request_template.md ADDED
@@ -0,0 +1,25 @@
+ # Description
+
+ Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
+
+ Fixes # (issue)
+
+ ## Type of change
+
+ Please delete options that are not relevant.
+
+ - [ ] Bug fix (non-breaking change which fixes an issue)
+ - [ ] New feature (non-breaking change which adds functionality)
+ - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
+ - [ ] This change requires a documentation update
+
+ # Checklist:
+
+ - [ ] My code follows the style guidelines of this project
+ - [ ] I have performed a self-review of my own code
+ - [ ] I have commented my code, particularly in hard-to-understand areas
+ - [ ] I have made corresponding changes to the documentation
+ - [ ] My changes generate no new warnings
+ - [ ] I have added tests that prove my fix is effective or that my feature works
+ - [ ] New and existing unit tests pass locally with my changes
+ - [ ] Any dependent changes have been merged and published in downstream modules
.gitignore ADDED
@@ -0,0 +1,135 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ pip-wheel-metadata/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ .vscode/
+
+ *.deb
+
+ data/output/
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2022 João Victor Campos
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
Makefile ADDED
@@ -0,0 +1,55 @@
+ # globals
+ VERSION := $(shell grep __version__ recommendation_app/__metadata__.py | head -1 | cut -d \" -f2 | cut -d \' -f2)
+
+ .PHONY: requirements-dev
+ ## install development requirements
+ requirements-dev:
+ 	@python -m pip install -U -r requirements.dev.txt
+
+ .PHONY: requirements-minimum
+ ## install prod requirements
+ requirements-minimum:
+ 	@python -m pip install -U -r requirements.txt
+
+ .PHONY: requirements
+ ## install requirements
+ requirements: requirements-dev requirements-minimum
+
+ .PHONY: style-check
+ ## run code style checks with black
+ style-check:
+ 	@echo ""
+ 	@echo "Code Style"
+ 	@echo "=========="
+ 	@echo ""
+ 	@python -m black --check --exclude="build/|buck-out/|dist/|_build/|pip/|\.pip/|\.git/|\.hg/|\.mypy_cache/|\.tox/|\.venv/" . && echo "\n\nSuccess" || (echo "\n\nFailure\n\nRun \"make apply-style\" to apply style formatting to your code"; exit 1)
+
+ .PHONY: quality-check
+ ## run code quality checks with flake8
+ quality-check:
+ 	@echo ""
+ 	@echo "Flake 8"
+ 	@echo "======="
+ 	@echo ""
+ 	@python -m flake8 && echo "Success"
+ 	@echo ""
+
+ .PHONY: type-check
+ ## run code type checks with mypy
+ type-check:
+ 	@echo ""
+ 	@echo "Mypy"
+ 	@echo "===="
+ 	@echo ""
+ 	@python -m mypy --install-types --non-interactive recommendation_app && echo "Success"
+ 	@echo ""
+
+ .PHONY: checks
+ ## run all code checks
+ checks: style-check quality-check type-check
+
+ .PHONY: apply-style
+ ## fix stylistic errors with black and isort
+ apply-style:
+ 	@python -m black --exclude="build/|buck-out/|dist/|_build/|pip/|\.pip/|\.git/|\.hg/|\.mypy_cache/|\.tox/|\.venv/" .
+ 	@python -m isort recommendation_app/ tests/
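The `VERSION` line can be checked in isolation. This sketch recreates the grep/cut pipeline against a sample `__metadata__.py` (hypothetical contents and path, since the committed file is empty):

```shell
# Write a sample metadata file (hypothetical version string).
printf '__version__ = "0.1.0"\n' > /tmp/__metadata__.py
# Same pipeline as the Makefile's VERSION variable: take the first
# __version__ line, then strip double quotes, then single quotes.
VERSION=$(grep __version__ /tmp/__metadata__.py | head -1 | cut -d \" -f2 | cut -d \' -f2)
echo "$VERSION"  # 0.1.0
```

Chaining both `cut` calls works because `cut` passes a line through unchanged when its delimiter is absent, so the pipeline handles either quote style.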
app.py ADDED
@@ -0,0 +1,57 @@
+ import gradio as gr
+ import pandas as pd
+ from recommendation_app.core.data_handler.data_handler import DataHandler
+ from recommendation_app.core.model import Model
+
+ PATH = "../netflix-recommendation-app/data/output/df_titles.csv"
+ df2 = pd.read_csv(PATH)
+ movie_names = df2["title"].tolist()
+
+
+ def gradio(movie_name, n_rec):
+     features = [
+         "type",
+         "release_year",
+         "age_certification",
+         "runtime",
+         "seasons",
+         "imdb_score",
+         "tmdb_popularity",
+         "tmdb_score",
+         "genres_transformed",
+         "production_countries_transformed",
+     ]
+     df = pd.read_csv(PATH)
+     df_model = df[features].copy()
+     x = DataHandler(df_model)
+     numeric_features = [
+         "release_year",
+         "runtime",
+         "seasons",
+         "imdb_score",
+         "tmdb_popularity",
+         "tmdb_score",
+     ]
+     x.normalize(numeric_features)
+     categorical_features = [
+         "age_certification",
+         "type",
+         "genres_transformed",
+         "production_countries_transformed",
+     ]
+     x.one_hot_encode(categorical_features)
+     mdl = Model(x.df)
+     n_rec = int(n_rec)
+     movie_name = str(movie_name)
+     movie_id = df.index[df["title"] == movie_name].tolist()
+     recommendations = mdl.recommend(movie_id, n_rec)
+     top_index = list(recommendations.index)[1:]
+     return df[["title", "description"]].loc[top_index]
+
+
+ app = gr.Interface(
+     fn=gradio,
+     inputs=[gr.Dropdown(choices=movie_names), gr.inputs.Number()],
+     outputs=[gr.outputs.Dataframe()],
+ )
+ app.launch()
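The title lookup in `gradio()` collects every positional index whose title matches, so duplicate titles yield more than one id. A minimal sketch with hypothetical titles (the real `df_titles.csv` is not part of this diff):

```python
import pandas as pd

# Hypothetical titles column standing in for df_titles.csv.
df = pd.DataFrame({"title": ["Dark", "Ozark", "Dark"]})

# Same pattern as app.py: boolean mask over the index, then tolist().
movie_id = df.index[df["title"] == "Dark"].tolist()
print(movie_id)  # [0, 2] -- both rows titled "Dark"
```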
data/input/credits.csv ADDED
The diff for this file is too large to render. See raw diff
data/input/titles.csv ADDED
The diff for this file is too large to render. See raw diff
recommendation/data_exploration.ipynb ADDED
The diff for this file is too large to render. See raw diff
recommendation_app/__init__.py ADDED
File without changes
recommendation_app/__metadata__.py ADDED
File without changes
recommendation_app/core/__inity__.py ADDED
File without changes
recommendation_app/core/data_handler/__inity__.py ADDED
File without changes
recommendation_app/core/data_handler/data_handler.py ADDED
@@ -0,0 +1,38 @@
+ from typing import List
+
+ import pandas as pd
+ from sklearn import preprocessing
+
+
+ class DataHandler:
+     def __init__(self, df: pd.DataFrame) -> None:
+         self.df = df
+
+     def normalize(self, features: List[str]) -> pd.DataFrame:
+         """Normalize a list of columns of the DataFrame in place.
+
+         Args:
+             features (List[str]): List of DataFrame column names.
+
+         Returns:
+             pd.DataFrame: DataFrame with normalized columns.
+         """
+         normalized_arr = preprocessing.normalize(self.df[features], axis=0)
+         self.df[features] = normalized_arr
+         return self.df
+
+     def one_hot_encode(self, features: List[str]) -> pd.DataFrame:
+         """One-hot encode a list of columns of the DataFrame in place.
+
+         Args:
+             features (List[str]): List of DataFrame column names.
+
+         Returns:
+             pd.DataFrame: DataFrame with one-hot encoded columns.
+         """
+         for feature in features:
+             ohe_df = pd.get_dummies(self.df[feature])
+             ohe_df.reset_index(drop=True, inplace=True)
+             self.df = pd.concat([self.df, ohe_df], axis=1)
+             self.df.drop(columns=feature, inplace=True)
+         return self.df
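The two transformations can be sketched on a toy frame (hypothetical data): `preprocessing.normalize(..., axis=0)` scales each column to unit L2 norm, and `pd.get_dummies` supplies the one-hot columns that replace the categorical one.

```python
import pandas as pd
from sklearn import preprocessing

# Toy frame with one numeric and one categorical column (illustrative values).
df = pd.DataFrame({"score": [3.0, 4.0], "genre": ["drama", "comedy"]})

# Column-wise L2 normalization, as in DataHandler.normalize (axis=0):
# [3, 4] / sqrt(3**2 + 4**2) -> [0.6, 0.8]
df[["score"]] = preprocessing.normalize(df[["score"]], axis=0)

# One-hot encoding via pd.get_dummies, as in DataHandler.one_hot_encode:
# the "genre" column is replaced by one indicator column per category.
df = pd.concat([df.drop(columns="genre"), pd.get_dummies(df["genre"])], axis=1)
print(df)
```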
recommendation_app/core/model.py ADDED
@@ -0,0 +1,39 @@
+ from typing import List
+
+ import numpy as np
+ import pandas as pd
+ from sklearn.metrics.pairwise import cosine_similarity
+
+
+ class Model:
+     def __init__(self, df: pd.DataFrame):
+         self.df = df
+
+     def movie_similarity(self, chosen_movie: np.ndarray, sim_movies: np.ndarray) -> np.ndarray:
+         """Calculate the cosine similarity between the chosen movie and all movies.
+
+         Args:
+             chosen_movie (np.ndarray): Feature vector of the movie chosen by the user.
+             sim_movies (np.ndarray): 2-D array with the feature vectors of all movies.
+
+         Returns:
+             np.ndarray: Cosine similarity between chosen_movie and each row of sim_movies.
+         """
+         chosen_movie = chosen_movie.reshape(1, -1)
+         return cosine_similarity(chosen_movie, sim_movies, dense_output=True)
+
+     def recommend(self, movie_id: List[int], n_rec: int) -> pd.DataFrame:
+         """Return the n_rec movies most similar to movie_id.
+
+         Args:
+             movie_id (List[int]): Index of the movie to be compared.
+             n_rec (int): Number of recommendations the user wants.
+
+         Returns:
+             pd.DataFrame: DataFrame with the n_rec recommendations.
+         """
+         movie_info = self.df.loc[movie_id].values
+         similarities = self.movie_similarity(movie_info, self.df.values)
+         self.df["similarity"] = similarities.tolist()[0]
+         # n_rec + 1 rows: the most similar "recommendation" is the movie itself
+         return self.df.nlargest(columns="similarity", n=n_rec + 1)
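The recommend flow can be sketched on a hypothetical 3-title feature matrix. The chosen movie always has similarity 1.0 with itself and therefore ranks first, which is why `recommend` fetches `n_rec + 1` rows and the caller drops the first one:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature matrix: 3 titles, 2 features each (illustrative values).
df = pd.DataFrame([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], index=["a", "b", "c"])

chosen = df.loc[["a"]].values                 # shape (1, 2), like df.loc[movie_id].values
df["similarity"] = cosine_similarity(chosen, df.values)[0]
top = df.nlargest(columns="similarity", n=2)  # n_rec + 1 rows; row 0 is "a" itself
print(top.index.tolist())  # ['a', 'b']
```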
recommendation_app/main.py ADDED
@@ -0,0 +1,57 @@
+ import gradio as gr
+ import pandas as pd
+ from core.data_handler.data_handler import DataHandler
+ from core.model import Model
+
+ PATH = "../netflix-recommendation-app/data/output/df_titles.csv"
+ df2 = pd.read_csv(PATH)
+ movie_names = df2["title"].tolist()
+
+
+ def gradio(movie_name, n_rec):
+     features = [
+         "type",
+         "release_year",
+         "age_certification",
+         "runtime",
+         "seasons",
+         "imdb_score",
+         "tmdb_popularity",
+         "tmdb_score",
+         "genres_transformed",
+         "production_countries_transformed",
+     ]
+     df = pd.read_csv(PATH)
+     df_model = df[features].copy()
+     x = DataHandler(df_model)
+     numeric_features = [
+         "release_year",
+         "runtime",
+         "seasons",
+         "imdb_score",
+         "tmdb_popularity",
+         "tmdb_score",
+     ]
+     x.normalize(numeric_features)
+     categorical_features = [
+         "age_certification",
+         "type",
+         "genres_transformed",
+         "production_countries_transformed",
+     ]
+     x.one_hot_encode(categorical_features)
+     mdl = Model(x.df)
+     n_rec = int(n_rec)
+     movie_name = str(movie_name)
+     movie_id = df.index[df["title"] == movie_name].tolist()
+     recommendations = mdl.recommend(movie_id, n_rec)
+     top_index = list(recommendations.index)[1:]
+     return df[["title", "description"]].loc[top_index]
+
+
+ app = gr.Interface(
+     fn=gradio,
+     inputs=[gr.Dropdown(choices=movie_names), gr.inputs.Number()],
+     outputs=[gr.outputs.Dataframe()],
+ )
+ app.launch()
requirements.dev.txt ADDED
@@ -0,0 +1,17 @@
+ # setup
+ setuptools
+ wheel
+
+ # tests
+ pytest
+ pytest-cov
+
+ # code quality
+ black
+ isort
+ flake8
+ flake8-isort
+ flake8-docstrings
+ pep8-naming
+ mypy
+ black[jupyter]
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ ipykernel==6.15.1
+ pandas
+ seaborn
+ matplotlib
+ pandas_profiling
+ ipywidgets
+ plotly
+ scikit-learn
+ numpy
+ jupyter
+ gradio
tests/test_data_handler.py ADDED
@@ -0,0 +1,13 @@
+ import pandas as pd
+
+ from recommendation_app.core.data_handler.data_handler import DataHandler
+
+ df3 = pd.DataFrame(
+     [["c", 3, 10, "cat"], ["d", 4, 50, "dog"]],
+     columns=["letter", "number", "number2", "animal"],
+ )
+
+
+ x = DataHandler(df3)
+ x.normalize(["number", "number2"])
+ print(x.one_hot_encode(["letter", "animal"]))