{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Movie_Recommender_System.ipynb", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "## Importing Dependencies" ], "metadata": { "id": "wcXW74VZvGPR" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kf8FPQ7NuXUQ" }, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "source": [ "## Loading Data" ], "metadata": { "id": "yIKFqnWgvJqe" } }, { "cell_type": "code", "source": [ "movies = pd.read_csv('tmdb_5000_movies.csv')\n", "credits = pd.read_csv('tmdb_5000_credits.csv')" ], "metadata": { "id": "zMj0obSiuugt" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 774 }, "id": "b0nk306ivHVL", "outputId": "7e03e5d7-3d99-42e0-af81-cfd48d3d244f" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
budgetgenreshomepageidkeywordsoriginal_languageoriginal_titleoverviewpopularityproduction_companiesproduction_countriesrelease_daterevenueruntimespoken_languagesstatustaglinetitlevote_averagevote_count
0237000000[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...http://www.avatarmovie.com/19995[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...enAvatarIn the 22nd century, a paraplegic Marine is di...150.437577[{\"name\": \"Ingenious Film Partners\", \"id\": 289...[{\"iso_3166_1\": \"US\", \"name\": \"United States o...2009-12-102787965087162.0[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...ReleasedEnter the World of Pandora.Avatar7.211800
1300000000[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...http://disney.go.com/disneypictures/pirates/285[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...enPirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...139.082615[{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...[{\"iso_3166_1\": \"US\", \"name\": \"United States o...2007-05-19961000000169.0[{\"iso_639_1\": \"en\", \"name\": \"English\"}]ReleasedAt the end of the world, the adventure begins.Pirates of the Caribbean: At World's End6.94500
2245000000[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...http://www.sonypictures.com/movies/spectre/206647[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...enSpectreA cryptic message from Bond’s past sends him o...107.376788[{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...[{\"iso_3166_1\": \"GB\", \"name\": \"United Kingdom\"...2015-10-26880674609148.0[{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...ReleasedA Plan No One EscapesSpectre6.34466
3250000000[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...http://www.thedarkknightrises.com/49026[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...enThe Dark Knight RisesFollowing the death of District Attorney Harve...112.312950[{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...[{\"iso_3166_1\": \"US\", \"name\": \"United States o...2012-07-161084939099165.0[{\"iso_639_1\": \"en\", \"name\": \"English\"}]ReleasedThe Legend EndsThe Dark Knight Rises7.69106
4260000000[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...http://movies.disney.com/john-carter49529[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...enJohn CarterJohn Carter is a war-weary, former military ca...43.926995[{\"name\": \"Walt Disney Pictures\", \"id\": 2}][{\"iso_3166_1\": \"US\", \"name\": \"United States o...2012-03-07284139100132.0[{\"iso_639_1\": \"en\", \"name\": \"English\"}]ReleasedLost in our world, found in another.John Carter6.12124
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " budget ... vote_count\n", "0 237000000 ... 11800\n", "1 300000000 ... 4500\n", "2 245000000 ... 4466\n", "3 250000000 ... 9106\n", "4 260000000 ... 2124\n", "\n", "[5 rows x 20 columns]" ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "code", "source": [ "movies.info()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "NOpwsiZXvIxp", "outputId": "441ea5e4-9b15-4611-e7ea-f962906e5200" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 4803 entries, 0 to 4802\n", "Data columns (total 20 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 budget 4803 non-null int64 \n", " 1 genres 4803 non-null object \n", " 2 homepage 1712 non-null object \n", " 3 id 4803 non-null int64 \n", " 4 keywords 4803 non-null object \n", " 5 original_language 4803 non-null object \n", " 6 original_title 4803 non-null object \n", " 7 overview 4800 non-null object \n", " 8 popularity 4803 non-null float64\n", " 9 production_companies 4803 non-null object \n", " 10 production_countries 4803 non-null object \n", " 11 release_date 4802 non-null object \n", " 12 revenue 4803 non-null int64 \n", " 13 runtime 4801 non-null float64\n", " 14 spoken_languages 4803 non-null object \n", " 15 status 4803 non-null object \n", " 16 tagline 3959 non-null object \n", " 17 title 4803 non-null object \n", " 18 vote_average 4803 non-null float64\n", " 19 vote_count 4803 non-null int64 \n", "dtypes: float64(3), int64(4), object(13)\n", "memory usage: 750.6+ KB\n" ] } ] }, { "cell_type": "code", "source": [ "credits.info()" ], "metadata": { "id": "7N6A5vhgvQQD", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "0bf2afa1-0904-419e-afb4-00258254bbf3" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 4803 entries, 0 to 4802\n", "Data columns (total 4 columns):\n", 
" # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 movie_id 4803 non-null int64 \n", " 1 title 4803 non-null object\n", " 2 cast 4803 non-null object\n", " 3 crew 4803 non-null object\n", "dtypes: int64(1), object(3)\n", "memory usage: 150.2+ KB\n" ] } ] }, { "cell_type": "code", "source": [ "credits.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "WJDRJz9cT5rT", "outputId": "c02326d0-79d4-4977-abd3-356a82b9cac2" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitlecastcrew
019995Avatar[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...
1285Pirates of the Caribbean: At World's End[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...
2206647Spectre[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...
349026The Dark Knight Rises[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...
449529John Carter[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...\n", "1 285 ... [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...\n", "2 206647 ... [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...\n", "3 49026 ... [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...\n", "4 49529 ... [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...\n", "\n", "[5 rows x 4 columns]" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "markdown", "source": [ "## Merging Both DataFrames: Movies & Credits" ], "metadata": { "id": "VDzt49G4Uz2G" } }, { "cell_type": "code", "source": [ "# Inner join on 'title'. Titles are not unique, so this yields 4809 rows from 4803 inputs;\n", "# joining on the id columns (movies.id / credits.movie_id) would likely avoid the extra rows.\n", "movies = movies.merge(credits, on='title')" ], "metadata": { "id": "iW8RUZd7Uxdl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XC-LIM9YU9je", "outputId": "c5ebf6c3-a8bc-4272-b796-1d79a2892d79" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(4809, 23)" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "code", "source": [ "movies.head(1)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 231 }, "id": "zHv0h2qOVIdq", "outputId": "259f7377-8997-4699-f300-bbda80033473" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
budgetgenreshomepageidkeywordsoriginal_languageoriginal_titleoverviewpopularityproduction_companiesproduction_countriesrelease_daterevenueruntimespoken_languagesstatustaglinetitlevote_averagevote_countmovie_idcastcrew
0237000000[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...http://www.avatarmovie.com/19995[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...enAvatarIn the 22nd century, a paraplegic Marine is di...150.437577[{\"name\": \"Ingenious Film Partners\", \"id\": 289...[{\"iso_3166_1\": \"US\", \"name\": \"United States o...2009-12-102787965087162.0[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...ReleasedEnter the World of Pandora.Avatar7.21180019995[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " budget ... crew\n", "0 237000000 ... [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...\n", "\n", "[1 rows x 23 columns]" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "source": [ "## Data Pre-Processing" ], "metadata": { "id": "ypuF24ZlVPCT" } }, { "cell_type": "code", "source": [ "# Columns kept for the recommendation system:\n", "\n", "# movie_id\n", "# title\n", "# overview\n", "# genres\n", "# cast\n", "# keywords\n", "# crew" ], "metadata": { "id": "B6fIVn5EVKZ0" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies = movies[['movie_id','title','overview','genres','cast','keywords','crew']]" ], "metadata": { "id": "TVwF-TuVXMiU" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 774 }, "id": "qTgoczQ0Xd7i", "outputId": "a633c033-01ff-48d8-9ec0-4e315efcf240" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995AvatarIn the 22nd century, a paraplegic Marine is di...[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...
1285Pirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...
2206647SpectreA cryptic message from Bond’s past sends him o...[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...
349026The Dark Knight RisesFollowing the death of District Attorney Harve...[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...
449529John CarterJohn Carter is a war-weary, former military ca...[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...\n", "1 285 ... [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...\n", "2 206647 ... [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...\n", "3 49026 ... [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...\n", "4 49529 ... [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "source": [ "movies.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "tBFjGKgDXf1S", "outputId": "3979bc25-c437-4442-8107-1ff27c3450c0" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "movie_id 0\n", "title 0\n", "overview 3\n", "genres 0\n", "cast 0\n", "keywords 0\n", "crew 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "code", "source": [ "movies.dropna(inplace=True)" ], "metadata": { "id": "0KjODpuHYQWn" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1NB--S0PYYH6", "outputId": "ee8ad5f8-823a-4ac7-c224-d9e21337ebb9" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "movie_id 0\n", "title 0\n", "overview 0\n", "genres 0\n", "cast 0\n", "keywords 0\n", "crew 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "code", "source": [ "movies.duplicated().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vMFZcfm6YZBJ", "outputId": "306ebffa-308b-4429-b6c9-87cd506afe89" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0" ] }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "code", "source": [ "movies.iloc[0].genres" ], 
"metadata": { "id": "h5R9lcgcYc3C", "colab": { "base_uri": "https://localhost:8080/", "height": 53 }, "outputId": "e0640c41-6947-451e-b21b-4ec51b7c66d4" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"name\": \"Fantasy\"}, {\"id\": 878, \"name\": \"Science Fiction\"}]'" ] }, "metadata": {}, "execution_count": 20 } ] }, { "cell_type": "code", "source": [ "import ast" ], "metadata": { "id": "yHh4GXph0wD-" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# extracting genres from raw data for the creation of tags\n", "\n", "def convert(obj):\n", " L = []\n", " for i in ast.literal_eval(obj):\n", " L.append(i['name'])\n", " return L" ], "metadata": { "id": "TIvEMMg60F8Q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies['genres'] = movies['genres'].apply(convert)" ], "metadata": { "id": "rk1iIobi0pZx" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies['keywords'] = movies['keywords'].apply(convert)" ], "metadata": { "id": "gBuiUgfV0_Vt" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 774 }, "id": "OreLM2eS1GVo", "outputId": "ee411530-d0a6-45a8-c258-bc244fb37782" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995AvatarIn the 22nd century, a paraplegic Marine is di...[Action, Adventure, Fantasy, Science Fiction][{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...[culture clash, future, space war, space colon...[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...
1285Pirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...[Adventure, Fantasy, Action][{\"cast_id\": 4, \"character\": \"Captain Jack Spa...[ocean, drug abuse, exotic island, east india ...[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...
2206647SpectreA cryptic message from Bond’s past sends him o...[Action, Adventure, Crime][{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...[spy, based on novel, secret agent, sequel, mi...[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...
349026The Dark Knight RisesFollowing the death of District Attorney Harve...[Action, Crime, Drama, Thriller][{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...[dc comics, crime fighter, terrorist, secret i...[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...
449529John CarterJohn Carter is a war-weary, former military ca...[Action, Adventure, Science Fiction][{\"cast_id\": 5, \"character\": \"John Carter\", \"c...[based on novel, mars, medallion, space travel...[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...\n", "1 285 ... [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...\n", "2 206647 ... [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...\n", "3 49026 ... [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...\n", "4 49529 ... [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 25 } ] }, { "cell_type": "code", "source": [ "# Extract the names of the first three cast members listed for the movie\n", "\n", "def convert3(obj):\n", "    L = []\n", "    counter = 0\n", "    for i in ast.literal_eval(obj):\n", "        if counter != 3:\n", "            L.append(i['name'])\n", "            counter += 1\n", "        else:\n", "            break\n", "    return L" ], "metadata": { "id": "Pk0AQywG1SjZ" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies['cast'] = movies['cast'].apply(convert3)" ], "metadata": { "id": "_rbxxI8G1y77" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 774 }, "id": "OlYk8AmZ14H1", "outputId": "236c70c6-2234-4bc0-9173-12b7758135b5" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995AvatarIn the 22nd century, a paraplegic Marine is di...[Action, Adventure, Fantasy, Science Fiction][Sam Worthington, Zoe Saldana, Sigourney Weaver][culture clash, future, space war, space colon...[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...
1285Pirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...[Adventure, Fantasy, Action][Johnny Depp, Orlando Bloom, Keira Knightley][ocean, drug abuse, exotic island, east india ...[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...
2206647SpectreA cryptic message from Bond’s past sends him o...[Action, Adventure, Crime][Daniel Craig, Christoph Waltz, Léa Seydoux][spy, based on novel, secret agent, sequel, mi...[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...
349026The Dark Knight RisesFollowing the death of District Attorney Harve...[Action, Crime, Drama, Thriller][Christian Bale, Michael Caine, Gary Oldman][dc comics, crime fighter, terrorist, secret i...[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...
449529John CarterJohn Carter is a war-weary, former military ca...[Action, Adventure, Science Fiction][Taylor Kitsch, Lynn Collins, Samantha Morton][based on novel, mars, medallion, space travel...[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [{\"credit_id\": \"52fe48009251416c750aca23\", \"de...\n", "1 285 ... [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...\n", "2 206647 ... [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...\n", "3 49026 ... [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...\n", "4 49529 ... [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 28 } ] }, { "cell_type": "code", "source": [ "# Fetch the director from the crew column (the first crew entry whose job is 'Director')\n", "def fetch_director(obj):\n", "    L = []\n", "    for i in ast.literal_eval(obj):\n", "        if i['job'] == 'Director':\n", "            L.append(i['name'])\n", "            break\n", "    return L" ], "metadata": { "id": "KL9NDb_C2Joc" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies['crew'] = movies['crew'].apply(fetch_director)" ], "metadata": { "id": "hB6PDWNh20XA" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head(5)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 718 }, "id": "annE-rHI24jg", "outputId": "56df4ddb-e2b1-4ee6-c27f-5e683f2258a0" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995AvatarIn the 22nd century, a paraplegic Marine is di...[Action, Adventure, Fantasy, Science Fiction][Sam Worthington, Zoe Saldana, Sigourney Weaver][culture clash, future, space war, space colon...[James Cameron]
1285Pirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...[Adventure, Fantasy, Action][Johnny Depp, Orlando Bloom, Keira Knightley][ocean, drug abuse, exotic island, east india ...[Gore Verbinski]
2206647SpectreA cryptic message from Bond’s past sends him o...[Action, Adventure, Crime][Daniel Craig, Christoph Waltz, Léa Seydoux][spy, based on novel, secret agent, sequel, mi...[Sam Mendes]
349026The Dark Knight RisesFollowing the death of District Attorney Harve...[Action, Crime, Drama, Thriller][Christian Bale, Michael Caine, Gary Oldman][dc comics, crime fighter, terrorist, secret i...[Christopher Nolan]
449529John CarterJohn Carter is a war-weary, former military ca...[Action, Adventure, Science Fiction][Taylor Kitsch, Lynn Collins, Samantha Morton][based on novel, mars, medallion, space travel...[Andrew Stanton]
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [James Cameron]\n", "1 285 ... [Gore Verbinski]\n", "2 206647 ... [Sam Mendes]\n", "3 49026 ... [Christopher Nolan]\n", "4 49529 ... [Andrew Stanton]\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 31 } ] }, { "cell_type": "code", "source": [ "movies['overview'] = movies['overview'].apply(lambda x:x.split())" ], "metadata": { "id": "5k6A8Whp2-n1" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 718 }, "id": "1mKqJ2i_3KvO", "outputId": "a02b3c68-ff5f-40c2-aa10-9b1c68ec7913" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995Avatar[In, the, 22nd, century,, a, paraplegic, Marin...[Action, Adventure, Fantasy, Science Fiction][Sam Worthington, Zoe Saldana, Sigourney Weaver][culture clash, future, space war, space colon...[James Cameron]
1285Pirates of the Caribbean: At World's End[Captain, Barbossa,, long, believed, to, be, d...[Adventure, Fantasy, Action][Johnny Depp, Orlando Bloom, Keira Knightley][ocean, drug abuse, exotic island, east india ...[Gore Verbinski]
2206647Spectre[A, cryptic, message, from, Bond’s, past, send...[Action, Adventure, Crime][Daniel Craig, Christoph Waltz, Léa Seydoux][spy, based on novel, secret agent, sequel, mi...[Sam Mendes]
349026The Dark Knight Rises[Following, the, death, of, District, Attorney...[Action, Crime, Drama, Thriller][Christian Bale, Michael Caine, Gary Oldman][dc comics, crime fighter, terrorist, secret i...[Christopher Nolan]
449529John Carter[John, Carter, is, a, war-weary,, former, mili...[Action, Adventure, Science Fiction][Taylor Kitsch, Lynn Collins, Samantha Morton][based on novel, mars, medallion, space travel...[Andrew Stanton]
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [James Cameron]\n", "1 285 ... [Gore Verbinski]\n", "2 206647 ... [Sam Mendes]\n", "3 49026 ... [Christopher Nolan]\n", "4 49529 ... [Andrew Stanton]\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 33 } ] }, { "cell_type": "code", "source": [ "# applying a transformation to remove spaces between words \n", "\n", "movies['genres'] = movies['genres'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\n", "movies['keywords'] = movies['keywords'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\n", "movies['cast'] = movies['cast'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\n", "movies['crew'] = movies['crew'].apply(lambda x:[i.replace(\" \",\"\") for i in x])" ], "metadata": { "id": "nEjP5OjK3P6t" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 634 }, "id": "59I5jLlO4dHh", "outputId": "3fb010bd-21dc-4102-95d2-6639bb38ce8c" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idtitleoverviewgenrescastkeywordscrew
019995Avatar[In, the, 22nd, century,, a, paraplegic, Marin...[Action, Adventure, Fantasy, ScienceFiction][SamWorthington, ZoeSaldana, SigourneyWeaver][cultureclash, future, spacewar, spacecolony, ...[JamesCameron]
1285Pirates of the Caribbean: At World's End[Captain, Barbossa,, long, believed, to, be, d...[Adventure, Fantasy, Action][JohnnyDepp, OrlandoBloom, KeiraKnightley][ocean, drugabuse, exoticisland, eastindiatrad...[GoreVerbinski]
2206647Spectre[A, cryptic, message, from, Bond’s, past, send...[Action, Adventure, Crime][DanielCraig, ChristophWaltz, LéaSeydoux][spy, basedonnovel, secretagent, sequel, mi6, ...[SamMendes]
349026The Dark Knight Rises[Following, the, death, of, District, Attorney...[Action, Crime, Drama, Thriller][ChristianBale, MichaelCaine, GaryOldman][dccomics, crimefighter, terrorist, secretiden...[ChristopherNolan]
449529John Carter[John, Carter, is, a, war-weary,, former, mili...[Action, Adventure, ScienceFiction][TaylorKitsch, LynnCollins, SamanthaMorton][basedonnovel, mars, medallion, spacetravel, p...[AndrewStanton]
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... crew\n", "0 19995 ... [JamesCameron]\n", "1 285 ... [GoreVerbinski]\n", "2 206647 ... [SamMendes]\n", "3 49026 ... [ChristopherNolan]\n", "4 49529 ... [AndrewStanton]\n", "\n", "[5 rows x 7 columns]" ] }, "metadata": {}, "execution_count": 35 } ] }, { "cell_type": "code", "source": [ "movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']" ], "metadata": { "id": "BVu4ySYZ4v2Q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "movies.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 634 }, "id": "-KNbVA_Z5CMa", "outputId": "15fa3f55-c1cb-4309-a059-0f6c109188b2" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>
<thead><tr><th></th><th>movie_id</th><th>title</th><th>overview</th><th>genres</th><th>cast</th><th>keywords</th><th>crew</th><th>tags</th></tr></thead>
<tbody>
<tr><th>0</th><td>19995</td><td>Avatar</td><td>[In, the, 22nd, century,, a, paraplegic, Marin...</td><td>[Action, Adventure, Fantasy, ScienceFiction]</td><td>[SamWorthington, ZoeSaldana, SigourneyWeaver]</td><td>[cultureclash, future, spacewar, spacecolony, ...</td><td>[JamesCameron]</td><td>[In, the, 22nd, century,, a, paraplegic, Marin...</td></tr>
<tr><th>1</th><td>285</td><td>Pirates of the Caribbean: At World's End</td><td>[Captain, Barbossa,, long, believed, to, be, d...</td><td>[Adventure, Fantasy, Action]</td><td>[JohnnyDepp, OrlandoBloom, KeiraKnightley]</td><td>[ocean, drugabuse, exoticisland, eastindiatrad...</td><td>[GoreVerbinski]</td><td>[Captain, Barbossa,, long, believed, to, be, d...</td></tr>
<tr><th>2</th><td>206647</td><td>Spectre</td><td>[A, cryptic, message, from, Bond’s, past, send...</td><td>[Action, Adventure, Crime]</td><td>[DanielCraig, ChristophWaltz, LéaSeydoux]</td><td>[spy, basedonnovel, secretagent, sequel, mi6, ...</td><td>[SamMendes]</td><td>[A, cryptic, message, from, Bond’s, past, send...</td></tr>
<tr><th>3</th><td>49026</td><td>The Dark Knight Rises</td><td>[Following, the, death, of, District, Attorney...</td><td>[Action, Crime, Drama, Thriller]</td><td>[ChristianBale, MichaelCaine, GaryOldman]</td><td>[dccomics, crimefighter, terrorist, secretiden...</td><td>[ChristopherNolan]</td><td>[Following, the, death, of, District, Attorney...</td></tr>
<tr><th>4</th><td>49529</td><td>John Carter</td><td>[John, Carter, is, a, war-weary,, former, mili...</td><td>[Action, Adventure, ScienceFiction]</td><td>[TaylorKitsch, LynnCollins, SamanthaMorton]</td><td>[basedonnovel, mars, medallion, spacetravel, p...</td><td>[AndrewStanton]</td><td>[John, Carter, is, a, war-weary,, former, mili...</td></tr>
</tbody>
</table>
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... tags\n", "0 19995 ... [In, the, 22nd, century,, a, paraplegic, Marin...\n", "1 285 ... [Captain, Barbossa,, long, believed, to, be, d...\n", "2 206647 ... [A, cryptic, message, from, Bond’s, past, send...\n", "3 49026 ... [Following, the, death, of, District, Attorney...\n", "4 49529 ... [John, Carter, is, a, war-weary,, former, mili...\n", "\n", "[5 rows x 8 columns]" ] }, "metadata": {}, "execution_count": 37 } ] }, { "cell_type": "code", "source": [ "# Keep only the columns the recommender needs; .copy() makes new_df an independent DataFrame rather than a slice of movies\n", "new_df = movies[['movie_id','title','tags']].copy()" ], "metadata": { "id": "VQbIIfro5DtT" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "new_df['tags'] = new_df['tags'].apply(lambda x:\" \".join(x))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dKDGEiM15Odx", "outputId": "0d8d376a-cc7b-42dc-b838-0af90f8546cd" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ] }, { "cell_type": "code", "source": [ "new_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 337 }, "id": "HIeNbXxa5V-u", "outputId": "9c64ebc7-6223-41aa-e974-8af23967e3cc" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
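<table>
<thead><tr><th></th><th>movie_id</th><th>title</th><th>tags</th></tr></thead>
<tbody>
<tr><th>0</th><td>19995</td><td>Avatar</td><td>In the 22nd century, a paraplegic Marine is di...</td></tr>
<tr><th>1</th><td>285</td><td>Pirates of the Caribbean: At World's End</td><td>Captain Barbossa, long believed to be dead, ha...</td></tr>
<tr><th>2</th><td>206647</td><td>Spectre</td><td>A cryptic message from Bond’s past sends him o...</td></tr>
<tr><th>3</th><td>49026</td><td>The Dark Knight Rises</td><td>Following the death of District Attorney Harve...</td></tr>
<tr><th>4</th><td>49529</td><td>John Carter</td><td>John Carter is a war-weary, former military ca...</td></tr>
</tbody>
</table>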
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " movie_id ... tags\n", "0 19995 ... In the 22nd century, a paraplegic Marine is di...\n", "1 285 ... Captain Barbossa, long believed to be dead, ha...\n", "2 206647 ... A cryptic message from Bond’s past sends him o...\n", "3 49026 ... Following the death of District Attorney Harve...\n", "4 49529 ... John Carter is a war-weary, former military ca...\n", "\n", "[5 rows x 3 columns]" ] }, "metadata": {}, "execution_count": 40 } ] }, { "cell_type": "code", "source": [ "new_df['tags'][0]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 122 }, "id": "v1j10FGB5mVj", "outputId": "78df999a-8f18-44d1-959a-649299b052e8" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. 
Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'" ] }, "metadata": {}, "execution_count": 41 } ] }, { "cell_type": "code", "source": [ "new_df['tags'] = new_df['tags'].apply(lambda x:x.lower())" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WJhTXTrM5sg7", "outputId": "079dfda5-c0e2-4232-88d5-6ea9e81633cd" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ] }, { "cell_type": "code", "source": [ "new_df['tags'][0]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 122 }, "id": "3utNb9Jy55GL", "outputId": "ce8333e4-633d-44c4-aa69-8890a6cc8b2d" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. 
action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver jamescameron'" ] }, "metadata": {}, "execution_count": 43 } ] }, { "cell_type": "markdown", "source": [ "## Text Vectorization" ], "metadata": { "id": "JrEbOhky5_7W" } }, { "cell_type": "code", "source": [ "from sklearn.feature_extraction.text import CountVectorizer" ], "metadata": { "id": "f3GU0NQN56zF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "cv = CountVectorizer(max_features=5000,stop_words='english')" ], "metadata": { "id": "hDu-NVx-JnlC" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Note: new_df['tags'] is stemmed only in the stemming section further down; re-run this cell after that step so the vectors reflect the stemmed tags\n", "vectors = cv.fit_transform(new_df['tags']).toarray()" ], "metadata": { "id": "r0N_hmQLJ0li" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# The 5000 most frequent words (the vocabulary kept by CountVectorizer)\n", "# cv.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2" ], "metadata": { "id": "806pkYpTKJzM" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Applying Stemming Process" ], "metadata": { "id": "nSs6Q1RmKyIy" } }, { "cell_type": "code", "source": [ "import nltk  # used for stemming" ], "metadata": { "id": "-HHasevjKYrt" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from nltk.stem.porter import PorterStemmer\n", "ps = PorterStemmer()" ], "metadata": { "id": "s0cFnGkpLFnV" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Stemming function: replace each whitespace-separated word with its Porter stem\n", "def stem(text):\n", "    y = []\n", "\n", "    for i in text.split():\n", "        y.append(ps.stem(i))\n", "\n", "    return \" \".join(y)" ], "metadata": { "id": "wXt3DYGXLLKf" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "stem('In the 22nd century, a paraplegic Marine is dispatched to the moon 
Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 122 }, "id": "ixQUKuqwMDyk", "outputId": "db1d90b3-9683-4408-bece-a4c420603409" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'In the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'" ] }, "metadata": {}, "execution_count": 84 } ] }, { "cell_type": "code", "source": [ "new_df['tags'] = new_df['tags'].apply(stem)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jh-NWDT3MI6k", "outputId": "c9e82664-7ea4-4cf3-846d-20357290a114" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ] }, { "cell_type": "markdown", 
"source": [ "## Similarity Measures" ], "metadata": { "id": "hXieKPkWvxhZ" } }, { "cell_type": "code", "source": [ "# For calculating similarity, the cosine distance between different vectors will be used. " ], "metadata": { "id": "-OenwgbxMn-P" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from sklearn.metrics.pairwise import cosine_similarity" ], "metadata": { "id": "415kuyAoNzp9" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "similarity = cosine_similarity(vectors)" ], "metadata": { "id": "litwuAZHN-Bc" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Making the recommendation function" ], "metadata": { "id": "mwwYXyZjOiVK" } }, { "cell_type": "code", "source": [ "def recommend(movie):\n", " movie_index = new_df[new_df['title'] == movie].index[0]\n", " distances = similarity[movie_index]\n", " movies_list = sorted(list(enumerate(distances)),reverse=True, key=lambda x:x[1])[1:6]\n", "\n", " for i in movies_list:\n", " print(new_df.iloc[i[0]].title)" ], "metadata": { "id": "2Aq8eEiROCDN" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Recommendation" ], "metadata": { "id": "CIxLV5VOwD4H" } }, { "cell_type": "code", "source": [ "recommend('Batman Begins') #enter movies only which are in the dataset, otherwise it would result in error" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ni0iCj1gOSIj", "outputId": "9974815f-92af-4229-d1d3-988fdc69eff4" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The Dark Knight\n", "The Dark Knight Rises\n", "Batman\n", "Batman & Robin\n", "Batman\n" ] } ] }, { "cell_type": "code", "source": [ "new_df.iloc[1216]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fGmk7Ir5Tb2Y", "outputId": "693e020a-fd64-4a86-e15b-0f18672508c5" }, "execution_count": null, "outputs": [ { "output_type": 
"execute_result", "data": { "text/plain": [ "movie_id 440\n", "title Aliens vs Predator: Requiem\n", "tags a sequel to 2004' alien vs. predator, the icon...\n", "Name: 1216, dtype: object" ] }, "metadata": {}, "execution_count": 91 } ] }, { "cell_type": "markdown", "source": [ "## Exporting the Model" ], "metadata": { "id": "6S1uQYyQv__u" } }, { "cell_type": "code", "source": [ "import pickle" ], "metadata": { "id": "SfXjUkZ7Tvdv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "pickle.dump(new_df,open('movies.pkl','wb'))" ], "metadata": { "id": "GZQnyq1AZSGx" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "pickle.dump(new_df.to_dict(),open('movie_dict.pkl','wb'))" ], "metadata": { "id": "eiQ-IgmVZZSv" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "pickle.dump(similarity,open('similarity.pkl','wb'))" ], "metadata": { "id": "96J2Ke84adA3" }, "execution_count": null, "outputs": [] } ] }