{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Copie de Coding Challenge for Fatima Fellowship","provenance":[{"file_id":"1SrD6jmZQc-gZky9-6r7JP518haXX6Dpo","timestamp":1649120015242}],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"eBpjBBZc6IvA"},"source":["# Fatima Fellowship Quick Coding Challenge (Pick 1)\n","\n","Thank you for applying to the Fatima Fellowship. To help us select the Fellows and assess your ability to do machine learning research, we are asking that you complete a short coding challenge. Please pick **1 of these 5** coding challenges, whichever is most aligned with your interests. \n","\n","**Due date: 1 week**\n","\n","**How to submit**: Please make a copy of this Colab notebook, add your code and results, and submit your Colab notebook to the submission link below. If you have never used a Colab notebook, [check out this video](https://www.youtube.com/watch?v=i-HnvsehuSw).\n","\n","**Submission link**: https://airtable.com/shrXy3QKSsO2yALd3"]},{"cell_type":"markdown","metadata":{"id":"braBzmRpMe7_"},"source":["# 1. Deep Learning for Vision"]},{"cell_type":"markdown","metadata":{"id":"1IWw-NZf5WfF"},"source":["**Upside down detector**: Train a model to detect if images are upside down\n","\n","* Pick a dataset of natural images (we suggest looking at datasets on the [Hugging Face Hub](https://huggingface.co/datasets?task_categories=task_categories:image-classification&sort=downloads))\n","* Synthetically turn some of the images upside down. Create a training and test set.\n","* Build a neural network (using TensorFlow, PyTorch, or any framework you like)\n","* Train it to classify image orientation until a reasonable accuracy is reached\n","* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n","* Look at some of the images that were classified incorrectly. 
Please explain what you might do to improve your model's performance on these images in the future (you do not need to implement these suggestions)\n","\n","**Submission instructions**: Please write your code below and include some examples of images that were classified incorrectly"]},{"cell_type":"code","source":["### WRITE YOUR CODE TO TRAIN THE MODEL HERE"],"metadata":{"id":"K2GJaYBpw91T"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["**Write up**: \n","* Link to the model on Hugging Face Hub: \n","* Include some examples of misclassified images. Please explain what you might do to improve your model's performance on these images in the future (you do not need to implement these suggestions)"],"metadata":{"id":"qSeLed2JxvGI"}},{"cell_type":"markdown","metadata":{"id":"sFU9LTOyMiMj"},"source":["# 2. Deep Learning for NLP\n","\n","**Fake news classifier**: Train a text classification model to detect fake news articles!\n","\n","* Download the dataset here: https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset\n","* Develop an NLP model for classification that uses a pretrained language model\n","* Finetune your model on the dataset, and generate an AUC curve of your model on the test set of your choice. \n","* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n","* *Answer the following question*: Look at some of the news articles that were classified incorrectly. 
Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to implement these suggestions)"]},{"cell_type":"code","source":["### WRITE YOUR CODE TO TRAIN THE MODEL HERE\n","\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import nltk\n","from nltk.corpus import stopwords\n","from nltk.stem.porter import PorterStemmer\n","from nltk.stem import WordNetLemmatizer\n","import sklearn\n","from sklearn.metrics import accuracy_score, precision_recall_fscore_support\n","from transformers import Trainer, TrainingArguments\n","from transformers import DistilBertTokenizer, DistilBertForSequenceClassification\n","import re\n","import string\n","import pandas as pd\n","import seaborn as sns\n","\n","\n","from google.colab import drive\n","drive.mount('/content/gdrive')\n","\n"],"metadata":{"id":"E90i018KyJH3","executionInfo":{"status":"ok","timestamp":1649160180211,"user_tz":-120,"elapsed":7259,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"3de3bcec-8b1a-4a20-c78e-95fe48f4dbc9","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":117,"outputs":[{"output_type":"stream","name":"stdout","text":["Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount(\"/content/gdrive\", force_remount=True).\n"]}]},{"cell_type":"code","source":["fake_df = pd.read_csv(\"/content/gdrive/MyDrive/FATIMA FELLOWSHIP /Classifier_News/Fake.csv\" )\n","true_df = pd.read_csv(\"/content/gdrive/MyDrive/FATIMA FELLOWSHIP /Classifier_News/True.csv\" )"],"metadata":{"id":"M_-58cTb8l0H","executionInfo":{"status":"ok","timestamp":1649160189638,"user_tz":-120,"elapsed":3496,"user":{"displayName":"Uriel Nguefack 
Yefou","userId":"13596555282661082935"}}},"execution_count":118,"outputs":[]},{"cell_type":"code","source":["fake_df.head()"],"metadata":{"id":"oSu0PDOZ-vgN","executionInfo":{"status":"ok","timestamp":1649160189641,"user_tz":-120,"elapsed":11,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"2083b75d-4a20-484b-dc92-83c1dc940216","colab":{"base_uri":"https://localhost:8080/","height":206}},"execution_count":119,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" title \\\n","0 Donald Trump Sends Out Embarrassing New Year’... \n","1 Drunk Bragging Trump Staffer Started Russian ... \n","2 Sheriff David Clarke Becomes An Internet Joke... \n","3 Trump Is So Obsessed He Even Has Obama’s Name... \n","4 Pope Francis Just Called Out Donald Trump Dur... \n","\n"," text subject \\\n","0 Donald Trump just couldn t wish all Americans ... News \n","1 House Intelligence Committee Chairman Devin Nu... News \n","2 On Friday, it was revealed that former Milwauk... News \n","3 On Christmas day, Donald Trump announced that ... News \n","4 Pope Francis used his annual Christmas Day mes... 
News \n","\n"," date \n","0 December 31, 2017 \n","1 December 31, 2017 \n","2 December 30, 2017 \n","3 December 29, 2017 \n","4 December 25, 2017 "],"text/html":["\n"," <div id=\"df-49b54d29-f892-4215-be8d-cbdaf048f4a8\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>title</th>\n"," <th>text</th>\n"," <th>subject</th>\n"," <th>date</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>Donald Trump Sends Out Embarrassing New Year’...</td>\n"," <td>Donald Trump just couldn t wish all Americans ...</td>\n"," <td>News</td>\n"," <td>December 31, 2017</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>Drunk Bragging Trump Staffer Started Russian ...</td>\n"," <td>House Intelligence Committee Chairman Devin Nu...</td>\n"," <td>News</td>\n"," <td>December 31, 2017</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>Sheriff David Clarke Becomes An Internet Joke...</td>\n"," <td>On Friday, it was revealed that former Milwauk...</td>\n"," <td>News</td>\n"," <td>December 30, 2017</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>Trump Is So Obsessed He Even Has Obama’s Name...</td>\n"," <td>On Christmas day, Donald Trump announced that ...</td>\n"," <td>News</td>\n"," <td>December 29, 2017</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>Pope Francis Just Called Out Donald Trump Dur...</td>\n"," <td>Pope Francis used his annual Christmas Day mes...</td>\n"," <td>News</td>\n"," <td>December 25, 2017</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-49b54d29-f892-4215-be8d-cbdaf048f4a8')\"\n"," title=\"Convert this 
dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n"," fill: #FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-49b54d29-f892-4215-be8d-cbdaf048f4a8 button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 
'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-49b54d29-f892-4215-be8d-cbdaf048f4a8');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":119}]},{"cell_type":"code","source":["fake_df.describe()"],"metadata":{"id":"2y1xse5b-7rK","executionInfo":{"status":"ok","timestamp":1649160189642,"user_tz":-120,"elapsed":10,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"e6b41bdc-119e-413d-8728-a6b737cb6f4f","colab":{"base_uri":"https://localhost:8080/","height":175}},"execution_count":120,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" title text subject \\\n","count 23481 23481 23481 \n","unique 17903 17455 6 \n","top MEDIA IGNORES Time That Bill Clinton FIRED His... 
News \n","freq 6 626 9050 \n","\n"," date \n","count 23481 \n","unique 1681 \n","top May 10, 2017 \n","freq 46 "],"text/html":["\n"," <div id=\"df-f3698be6-13c1-41d5-9e91-71aed4d8cf3e\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>title</th>\n"," <th>text</th>\n"," <th>subject</th>\n"," <th>date</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>count</th>\n"," <td>23481</td>\n"," <td>23481</td>\n"," <td>23481</td>\n"," <td>23481</td>\n"," </tr>\n"," <tr>\n"," <th>unique</th>\n"," <td>17903</td>\n"," <td>17455</td>\n"," <td>6</td>\n"," <td>1681</td>\n"," </tr>\n"," <tr>\n"," <th>top</th>\n"," <td>MEDIA IGNORES Time That Bill Clinton FIRED His...</td>\n"," <td></td>\n"," <td>News</td>\n"," <td>May 10, 2017</td>\n"," </tr>\n"," <tr>\n"," <th>freq</th>\n"," <td>6</td>\n"," <td>626</td>\n"," <td>9050</td>\n"," <td>46</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f3698be6-13c1-41d5-9e91-71aed4d8cf3e')\"\n"," title=\"Convert this dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 
1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n"," fill: #FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-f3698be6-13c1-41d5-9e91-71aed4d8cf3e button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-f3698be6-13c1-41d5-9e91-71aed4d8cf3e');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? 
Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":120}]},{"cell_type":"code","source":["fake_df.info()"],"metadata":{"id":"XxaCSeo-_A0_","executionInfo":{"status":"ok","timestamp":1649160189642,"user_tz":-120,"elapsed":9,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"a819af84-5a6b-49ea-82c5-e2aa4c95b285","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":121,"outputs":[{"output_type":"stream","name":"stdout","text":["<class 'pandas.core.frame.DataFrame'>\n","RangeIndex: 23481 entries, 0 to 23480\n","Data columns (total 4 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------- ----- \n"," 0 title 23481 non-null object\n"," 1 text 23481 non-null object\n"," 2 subject 23481 non-null object\n"," 3 date 23481 non-null object\n","dtypes: object(4)\n","memory usage: 733.9+ KB\n"]}]},{"cell_type":"code","source":["true_df['target'] = 1\n","\n","fake_df['target'] = 0\n","\n","df_final = pd.concat([true_df, fake_df]).reset_index(drop = True)\n","\n","df_final['merged'] = df_final['title'] + ' ' + df_final['text']\n","\n","df_final.head()"],"metadata":{"id":"_YbqjK118iFV","executionInfo":{"status":"ok","timestamp":1649160191285,"user_tz":-120,"elapsed":1650,"user":{"displayName":"Uriel Nguefack 
Yefou","userId":"13596555282661082935"}},"outputId":"de19d327-6256-4dc5-e394-696617675721","colab":{"base_uri":"https://localhost:8080/","height":206}},"execution_count":122,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" title \\\n","0 As U.S. budget fight looms, Republicans flip t... \n","1 U.S. military to accept transgender recruits o... \n","2 Senior U.S. Republican senator: 'Let Mr. Muell... \n","3 FBI Russia probe helped by Australian diplomat... \n","4 Trump wants Postal Service to charge 'much mor... \n","\n"," text subject \\\n","0 WASHINGTON (Reuters) - The head of a conservat... politicsNews \n","1 WASHINGTON (Reuters) - Transgender people will... politicsNews \n","2 WASHINGTON (Reuters) - The special counsel inv... politicsNews \n","3 WASHINGTON (Reuters) - Trump campaign adviser ... politicsNews \n","4 SEATTLE/WASHINGTON (Reuters) - President Donal... politicsNews \n","\n"," date target \\\n","0 December 31, 2017 1 \n","1 December 29, 2017 1 \n","2 December 31, 2017 1 \n","3 December 30, 2017 1 \n","4 December 29, 2017 1 \n","\n"," merged \n","0 As U.S. budget fight looms, Republicans flip t... \n","1 U.S. military to accept transgender recruits o... \n","2 Senior U.S. Republican senator: 'Let Mr. Muell... \n","3 FBI Russia probe helped by Australian diplomat... \n","4 Trump wants Postal Service to charge 'much mor... 
"],"text/html":["\n"," <div id=\"df-56eede01-7846-4ad5-b0ed-1ba4b4dd515a\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>title</th>\n"," <th>text</th>\n"," <th>subject</th>\n"," <th>date</th>\n"," <th>target</th>\n"," <th>merged</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>As U.S. budget fight looms, Republicans flip t...</td>\n"," <td>WASHINGTON (Reuters) - The head of a conservat...</td>\n"," <td>politicsNews</td>\n"," <td>December 31, 2017</td>\n"," <td>1</td>\n"," <td>As U.S. budget fight looms, Republicans flip t...</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>U.S. military to accept transgender recruits o...</td>\n"," <td>WASHINGTON (Reuters) - Transgender people will...</td>\n"," <td>politicsNews</td>\n"," <td>December 29, 2017</td>\n"," <td>1</td>\n"," <td>U.S. military to accept transgender recruits o...</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>Senior U.S. Republican senator: 'Let Mr. Muell...</td>\n"," <td>WASHINGTON (Reuters) - The special counsel inv...</td>\n"," <td>politicsNews</td>\n"," <td>December 31, 2017</td>\n"," <td>1</td>\n"," <td>Senior U.S. Republican senator: 'Let Mr. 
Muell...</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>FBI Russia probe helped by Australian diplomat...</td>\n"," <td>WASHINGTON (Reuters) - Trump campaign adviser ...</td>\n"," <td>politicsNews</td>\n"," <td>December 30, 2017</td>\n"," <td>1</td>\n"," <td>FBI Russia probe helped by Australian diplomat...</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>Trump wants Postal Service to charge 'much mor...</td>\n"," <td>SEATTLE/WASHINGTON (Reuters) - President Donal...</td>\n"," <td>politicsNews</td>\n"," <td>December 29, 2017</td>\n"," <td>1</td>\n"," <td>Trump wants Postal Service to charge 'much mor...</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-56eede01-7846-4ad5-b0ed-1ba4b4dd515a')\"\n"," title=\"Convert this dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px 
rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n"," fill: #FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-56eede01-7846-4ad5-b0ed-1ba4b4dd515a button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-56eede01-7846-4ad5-b0ed-1ba4b4dd515a');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":122}]},{"cell_type":"code","source":["true_df['target'] = 1\n","\n","fake_df['target'] = 0\n","\n"],"metadata":{"id":"3S1cSTyk_zmg","executionInfo":{"status":"ok","timestamp":1649160191286,"user_tz":-120,"elapsed":8,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":123,"outputs":[]},{"cell_type":"code","source":["#drop columns\n","df_final.drop(['title','text', 'subject', 
'date'],axis=1,inplace=True)\n","df_final.head()"],"metadata":{"id":"iqYwGdXd9Ly6","executionInfo":{"status":"ok","timestamp":1649160191289,"user_tz":-120,"elapsed":11,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"c2c8df2d-c35f-43aa-a73f-861613f931dd","colab":{"base_uri":"https://localhost:8080/","height":206}},"execution_count":124,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" target merged\n","0 1 As U.S. budget fight looms, Republicans flip t...\n","1 1 U.S. military to accept transgender recruits o...\n","2 1 Senior U.S. Republican senator: 'Let Mr. Muell...\n","3 1 FBI Russia probe helped by Australian diplomat...\n","4 1 Trump wants Postal Service to charge 'much mor..."],"text/html":["\n"," <div id=\"df-deb70683-7956-4b2f-ba53-b7844403049a\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>target</th>\n"," <th>merged</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>1</td>\n"," <td>As U.S. budget fight looms, Republicans flip t...</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>1</td>\n"," <td>U.S. military to accept transgender recruits o...</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>1</td>\n"," <td>Senior U.S. Republican senator: 'Let Mr. 
Muell...</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>1</td>\n"," <td>FBI Russia probe helped by Australian diplomat...</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>1</td>\n"," <td>Trump wants Postal Service to charge 'much mor...</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-deb70683-7956-4b2f-ba53-b7844403049a')\"\n"," title=\"Convert this dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 
0, 0, 0.3));\n"," fill: #FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-deb70683-7956-4b2f-ba53-b7844403049a button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-deb70683-7956-4b2f-ba53-b7844403049a');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":124}]},{"cell_type":"code","source":["df_final = df_final[['merged','target']]\n","df_final.head()"],"metadata":{"id":"VftmPW3ABk6R","executionInfo":{"status":"ok","timestamp":1649160195694,"user_tz":-120,"elapsed":1022,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"3800b986-2176-4c4a-bec2-f023162c2d8f","colab":{"base_uri":"https://localhost:8080/","height":206}},"execution_count":125,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" merged target\n","0 As U.S. budget fight looms, Republicans flip t... 1\n","1 U.S. military to accept transgender recruits o... 1\n","2 Senior U.S. Republican senator: 'Let Mr. Muell... 1\n","3 FBI Russia probe helped by Australian diplomat... 1\n","4 Trump wants Postal Service to charge 'much mor... 
1"],"text/html":["\n"," <div id=\"df-0ba4cd8b-7852-40ae-8d59-c524c0a59abb\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>merged</th>\n"," <th>target</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>As U.S. budget fight looms, Republicans flip t...</td>\n"," <td>1</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>U.S. military to accept transgender recruits o...</td>\n"," <td>1</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>Senior U.S. Republican senator: 'Let Mr. Muell...</td>\n"," <td>1</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>FBI Russia probe helped by Australian diplomat...</td>\n"," <td>1</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>Trump wants Postal Service to charge 'much mor...</td>\n"," <td>1</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-0ba4cd8b-7852-40ae-8d59-c524c0a59abb')\"\n"," title=\"Convert this dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 
0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n"," fill: #FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-0ba4cd8b-7852-40ae-8d59-c524c0a59abb button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-0ba4cd8b-7852-40ae-8d59-c524c0a59abb');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? 
Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":125}]},{"cell_type":"code","source":["##divide into Input and target\n","Input=df_final.drop('target',axis=1)\n","\n","Output=df_final['target']"],"metadata":{"id":"VKVSUwn49SQ5","executionInfo":{"status":"ok","timestamp":1649160205588,"user_tz":-120,"elapsed":1094,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":128,"outputs":[]},{"cell_type":"code","source":["Input.shape"],"metadata":{"id":"OmG8H-NejSED","executionInfo":{"status":"ok","timestamp":1649160214394,"user_tz":-120,"elapsed":549,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"f7b174a8-4ff5-4d3f-ad2b-4102746b0cbf","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":129,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(44898, 1)"]},"metadata":{},"execution_count":129}]},{"cell_type":"code","source":["Output.shape"],"metadata":{"id":"D56fLr_tjUxU","executionInfo":{"status":"ok","timestamp":1649160216164,"user_tz":-120,"elapsed":6,"user":{"displayName":"Uriel Nguefack 
Yefou","userId":"13596555282661082935"}},"outputId":"09042925-b87c-4c12-c6d4-f401890fa52f","colab":{"base_uri":"https://localhost:8080/"}},"execution_count":130,"outputs":[{"output_type":"execute_result","data":{"text/plain":["(44898,)"]},"metadata":{},"execution_count":130}]},{"cell_type":"code","source":["Input.head()"],"metadata":{"id":"pU6hvCzD72uD","executionInfo":{"status":"ok","timestamp":1649160218366,"user_tz":-120,"elapsed":13,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"d24c714e-a57a-4255-92cb-68649e2453db","colab":{"base_uri":"https://localhost:8080/","height":206}},"execution_count":131,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" merged\n","0 As U.S. budget fight looms, Republicans flip t...\n","1 U.S. military to accept transgender recruits o...\n","2 Senior U.S. Republican senator: 'Let Mr. Muell...\n","3 FBI Russia probe helped by Australian diplomat...\n","4 Trump wants Postal Service to charge 'much mor..."],"text/html":["\n"," <div id=\"df-3a1100f2-5e7c-4f3a-bd81-87c1a01f4a2c\">\n"," <div class=\"colab-df-container\">\n"," <div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>merged</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>As U.S. budget fight looms, Republicans flip t...</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>U.S. military to accept transgender recruits o...</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>Senior U.S. Republican senator: 'Let Mr. 
Muell...</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>FBI Russia probe helped by Australian diplomat...</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>Trump wants Postal Service to charge 'much mor...</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>\n"," <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3a1100f2-5e7c-4f3a-bd81-87c1a01f4a2c')\"\n"," title=\"Convert this dataframe to an interactive table.\"\n"," style=\"display:none;\">\n"," \n"," <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n"," width=\"24px\">\n"," <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n"," <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n"," </svg>\n"," </button>\n"," \n"," <style>\n"," .colab-df-container {\n"," display:flex;\n"," flex-wrap:wrap;\n"," gap: 12px;\n"," }\n","\n"," .colab-df-convert {\n"," background-color: #E8F0FE;\n"," border: none;\n"," border-radius: 50%;\n"," cursor: pointer;\n"," display: none;\n"," fill: #1967D2;\n"," height: 32px;\n"," padding: 0 0 0 0;\n"," width: 32px;\n"," }\n","\n"," .colab-df-convert:hover {\n"," background-color: #E2EBFA;\n"," box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n"," fill: #174EA6;\n"," }\n","\n"," [theme=dark] .colab-df-convert {\n"," background-color: #3B4455;\n"," fill: #D2E3FC;\n"," }\n","\n"," [theme=dark] .colab-df-convert:hover {\n"," background-color: #434B5C;\n"," box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n"," filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n"," fill: 
#FFFFFF;\n"," }\n"," </style>\n","\n"," <script>\n"," const buttonEl =\n"," document.querySelector('#df-3a1100f2-5e7c-4f3a-bd81-87c1a01f4a2c button.colab-df-convert');\n"," buttonEl.style.display =\n"," google.colab.kernel.accessAllowed ? 'block' : 'none';\n","\n"," async function convertToInteractive(key) {\n"," const element = document.querySelector('#df-3a1100f2-5e7c-4f3a-bd81-87c1a01f4a2c');\n"," const dataTable =\n"," await google.colab.kernel.invokeFunction('convertToInteractive',\n"," [key], {});\n"," if (!dataTable) return;\n","\n"," const docLinkHtml = 'Like what you see? Visit the ' +\n"," '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n"," + ' to learn more about interactive tables.';\n"," element.innerHTML = '';\n"," dataTable['output_type'] = 'display_data';\n"," await google.colab.output.renderOutput(dataTable, element);\n"," const docLink = document.createElement('div');\n"," docLink.innerHTML = docLinkHtml;\n"," element.appendChild(docLink);\n"," }\n"," </script>\n"," </div>\n"," </div>\n"," "]},"metadata":{},"execution_count":131}]},{"cell_type":"code","source":["Output.head()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"HBOK0VOWm_sG","executionInfo":{"status":"ok","timestamp":1649160237743,"user_tz":-120,"elapsed":498,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"3221e7a2-24c6-4987-a5a1-57e5ff1639b1"},"execution_count":132,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0 1\n","1 1\n","2 1\n","3 1\n","4 1\n","Name: target, dtype: int64"]},"metadata":{},"execution_count":132}]},{"cell_type":"code","source":["import nltk\n","nltk.download('stopwords')\n","nltk.download('punkt')\n","nltk.download('wordnet')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"M9rWaoC9Dhkw","executionInfo":{"status":"ok","timestamp":1649160242122,"user_tz":-120,"elapsed":564,"user":{"displayName":"Uriel 
Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"d3bc0000-e22c-4378-af73-866a72276a9f"},"execution_count":133,"outputs":[{"output_type":"stream","name":"stdout","text":["[nltk_data] Downloading package stopwords to /root/nltk_data...\n","[nltk_data] Package stopwords is already up-to-date!\n","[nltk_data] Downloading package punkt to /root/nltk_data...\n","[nltk_data] Package punkt is already up-to-date!\n","[nltk_data] Downloading package wordnet to /root/nltk_data...\n","[nltk_data] Package wordnet is already up-to-date!\n"]},{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{},"execution_count":133}]},{"cell_type":"code","source":["# Libraries for tokenization\n","import re\n","import string\n","from nltk.tokenize import word_tokenize\n","from nltk.corpus import stopwords\n","from nltk.stem import WordNetLemmatizer\n","from tensorflow.keras.preprocessing.text import Tokenizer\n","\n","stop_words = set(stopwords.words('english'))\n","punctuations = list(string.punctuation)\n","lemma = WordNetLemmatizer() "],"metadata":{"id":"UkM3Q01ukMiU","executionInfo":{"status":"ok","timestamp":1649160245398,"user_tz":-120,"elapsed":6,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":134,"outputs":[]},{"cell_type":"code","source":["##We keep only letters, lowercase the text and collapse whitespace (stop words are not removed here)\n","from nltk.stem import WordNetLemmatizer\n","wn=WordNetLemmatizer()\n","Text=[]\n","for i in range(0,len(Input)):\n"," review=re.sub('[^a-zA-Z]',' ',Input['merged'][i])\n"," review=review.lower()\n"," review=review.split()\n"," review= ' '.join(review)\n"," Text.append(review)"],"metadata":{"id":"kH4eMceMkMfF","executionInfo":{"status":"ok","timestamp":1649160263833,"user_tz":-120,"elapsed":8356,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":135,"outputs":[]},{"cell_type":"code","source":["##We split the Data using train_test_split\n","from sklearn.model_selection import
train_test_split\n","X_train,X_test,y_train,y_test=train_test_split(Text,Output,test_size=0.25,random_state=42)"],"metadata":{"id":"LZq8h13lkMcv","executionInfo":{"status":"ok","timestamp":1649160267462,"user_tz":-120,"elapsed":684,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":136,"outputs":[]},{"cell_type":"code","source":["##using tokenizer of bert\n","from transformers import DistilBertTokenizerFast\n","tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"l4tA4HUKkmzx","executionInfo":{"status":"ok","timestamp":1649160270872,"user_tz":-120,"elapsed":673,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"3a6c6510-0df2-4b4e-c88d-c189f19059f2"},"execution_count":137,"outputs":[{"output_type":"stream","name":"stderr","text":["loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n","loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n","loading file https://huggingface.co/distilbert-base-uncased/resolve/main/added_tokens.json from cache at None\n","loading file https://huggingface.co/distilbert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n","loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n","loading configuration 
file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333\n","Model config DistilBertConfig {\n"," \"_name_or_path\": \"distilbert-base-uncased\",\n"," \"activation\": \"gelu\",\n"," \"architectures\": [\n"," \"DistilBertForMaskedLM\"\n"," ],\n"," \"attention_dropout\": 0.1,\n"," \"dim\": 768,\n"," \"dropout\": 0.1,\n"," \"hidden_dim\": 3072,\n"," \"initializer_range\": 0.02,\n"," \"max_position_embeddings\": 512,\n"," \"model_type\": \"distilbert\",\n"," \"n_heads\": 12,\n"," \"n_layers\": 6,\n"," \"pad_token_id\": 0,\n"," \"qa_dropout\": 0.1,\n"," \"seq_classif_dropout\": 0.2,\n"," \"sinusoidal_pos_embds\": false,\n"," \"tie_weights_\": true,\n"," \"transformers_version\": \"4.17.0\",\n"," \"vocab_size\": 30522\n","}\n","\n"]}]},{"cell_type":"code","source":["##We create our tokenize function\n","def text_token(data):\n"," encoded=tokenizer(data,padding=True,truncation=True,return_tensors='np')\n"," return encoded.data"],"metadata":{"id":"5e5CLqFkkuSa","executionInfo":{"status":"ok","timestamp":1649160316545,"user_tz":-120,"elapsed":596,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":140,"outputs":[]},{"cell_type":"code","source":["train_data=text_token(X_train) #We tokenize our X_train\n","test_data=text_token(X_test) #We tokenize our X_test"],"metadata":{"id":"HXEFKqlYkuPW","executionInfo":{"status":"ok","timestamp":1649160383483,"user_tz":-120,"elapsed":65050,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":141,"outputs":[]},{"cell_type":"markdown","source":["# We build the Model"],"metadata":{"id":"XYvYFbp0ndU_"}},{"cell_type":"code","source":["##We now import our pretrained model which is distilbert\n","from transformers import 
TFDistilBertForSequenceClassification\n","\n","model1 = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"gMnyRzlgkuHg","executionInfo":{"status":"ok","timestamp":1649160395357,"user_tz":-120,"elapsed":5216,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"a2215471-b837-4a7d-bdb6-a1daeb6bc49b"},"execution_count":142,"outputs":[{"output_type":"stream","name":"stderr","text":["loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333\n","Model config DistilBertConfig {\n"," \"activation\": \"gelu\",\n"," \"architectures\": [\n"," \"DistilBertForMaskedLM\"\n"," ],\n"," \"attention_dropout\": 0.1,\n"," \"dim\": 768,\n"," \"dropout\": 0.1,\n"," \"hidden_dim\": 3072,\n"," \"initializer_range\": 0.02,\n"," \"max_position_embeddings\": 512,\n"," \"model_type\": \"distilbert\",\n"," \"n_heads\": 12,\n"," \"n_layers\": 6,\n"," \"pad_token_id\": 0,\n"," \"qa_dropout\": 0.1,\n"," \"seq_classif_dropout\": 0.2,\n"," \"sinusoidal_pos_embds\": false,\n"," \"tie_weights_\": true,\n"," \"transformers_version\": \"4.17.0\",\n"," \"vocab_size\": 30522\n","}\n","\n","loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/tf_model.h5 from cache at /root/.cache/huggingface/transformers/fa107dc22c014df078a1b75235144a927f7e9764916222711f239b7ee6092ec9.bc4b731be56d8422e12b1d5bfa86fbd81d18d2770da1f5ac4f33640a17b7dde9.h5\n","Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertForSequenceClassification: ['vocab_transform', 'vocab_layer_norm', 'activation_13', 'vocab_projector']\n","- This IS expected if you are initializing 
TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n","- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n","Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier', 'dropout_39', 'classifier']\n","You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"]}]},{"cell_type":"code","source":["##Importing Libraries\n","from tensorflow.keras.losses import SparseCategoricalCrossentropy\n","import tensorflow as tf"],"metadata":{"id":"aOc0ybEylfDZ","executionInfo":{"status":"ok","timestamp":1649160400860,"user_tz":-120,"elapsed":671,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}}},"execution_count":143,"outputs":[]},{"cell_type":"code","source":["model1.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4), loss=model1.compute_loss,metrics=['acc'])\n","model1.fit(train_data, np.array(y_train), \n"," validation_data=(test_data,np.array(y_test)),\n"," batch_size=64,epochs=2)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":589},"id":"Q5K0YbvBlaX3","executionInfo":{"status":"error","timestamp":1649160423774,"user_tz":-120,"elapsed":3270,"user":{"displayName":"Uriel Nguefack Yefou","userId":"13596555282661082935"}},"outputId":"ef2960a8-761a-44d2-80fb-eaa57abed644"},"execution_count":145,"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch
1/2\n"]},{"output_type":"error","ename":"AttributeError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)","\u001b[0;32m<ipython-input-145-b940afd232ff>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m model1.fit(train_data, np.array(y_train), \n\u001b[1;32m 3\u001b[0m \u001b[0mvalidation_data\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtest_data\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m batch_size=64,epochs=2)\n\u001b[0m","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py\u001b[0m in \u001b[0;36merror_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# pylint: disable=broad-except\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 66\u001b[0m \u001b[0mfiltered_tb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_process_traceback_frames\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__traceback__\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 67\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwith_traceback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfiltered_tb\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 68\u001b[0m 
\u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mfiltered_tb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py\u001b[0m in \u001b[0;36mautograph_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 1145\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# pylint:disable=broad-except\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1146\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"ag_error_metadata\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1147\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mag_error_metadata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto_exception\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1148\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1149\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mAttributeError\u001b[0m: in user code:\n\n File \"/usr/local/lib/python3.7/dist-packages/keras/engine/training.py\", line 1021, in train_function *\n return step_function(self, iterator)\n File \"/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py\", line 875, in compute_loss *\n return super().compute_loss(*args, **kwargs)\n File \"/usr/local/lib/python3.7/dist-packages/keras/engine/training.py\", line 919, in compute_loss **\n y, y_pred, sample_weight, 
regularization_losses=self.losses)\n File \"/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py\", line 199, in __call__\n y_t, y_p, sw = match_dtype_and_rank(y_t, y_p, sw)\n File \"/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py\", line 684, in match_dtype_and_rank\n if ((y_t.dtype.is_floating and y_p.dtype.is_floating) or\n\n AttributeError: 'NoneType' object has no attribute 'dtype'\n"]}]},{"cell_type":"markdown","source":["## I tried several methods to fix this error but could not resolve it. However, I continued writing the rest of the code."],"metadata":{"id":"hS_7UcJkn0U5"}},{"cell_type":"code","source":["##saving model\n","model1.save_pretrained('/content/gdrive/MyDrive/FATIMA FELLOWSHIP /Classifier_News/Uriel')"],"metadata":{"id":"3jZXalm7DEKz"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["y_predict=model1.predict(test_data)"],"metadata":{"id":"Nvlrjc3rDEHd"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["classes=np.argmax(y_predict['logits'],axis=1)"],"metadata":{"id":"MJhn4b-6DEGh"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["##accuracy\n","from sklearn import metrics\n","metrics.accuracy_score(y_test,classes)"],"metadata":{"id":"9UI4DBwvDEBo"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["#confusion matrix\n","metrics.confusion_matrix(y_test,classes)"],"metadata":{"id":"u9X65Ek2DEAC"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["import numpy as np\n","from sklearn import metrics\n","from sklearn.metrics import roc_curve\n","import matplotlib.pyplot as plt\n","##score for class 1: the logit difference ranks examples the same way as the class-1 probability\n","scores = y_predict['logits'][:,1]-y_predict['logits'][:,0]\n","fpr, tpr, _ = metrics.roc_curve(y_test, scores)\n","auc = metrics.roc_auc_score(y_test, scores)\n","\n","#create ROC curve\n","plt.plot(fpr,tpr,label=\"AUC=\"+str(auc))\n","plt.ylabel('True Positive Rate')\n",
"plt.xlabel('False Positive Rate')\n","plt.legend(loc=4)\n","plt.show()"],"metadata":{"id":"5HLMUSqxDUYb"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## To improve the model, we could try other loss functions and optimizers, or another model such as an LSTM."],"metadata":{"id":"EEVz5ZbJptby"}},{"cell_type":"markdown","source":["**Write up**: \n","* Link to the model on Hugging Face Hub: \n","* Include some examples of misclassified news articles. Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to implement these suggestions)"],"metadata":{"id":"kpInVUMLyJ24"}},{"cell_type":"markdown","metadata":{"id":"jTfHpo6BOmE8"},"source":["# 3. Deep RL / Robotics"]},{"cell_type":"markdown","metadata":{"id":"saB64bbTXWgZ"},"source":["**RL for Classical Control:** Using any of the [classical control](https://github.com/openai/gym/blob/master/docs/environments.md#classic-control) environments from OpenAI's `gym`, implement a deep NN that learns an optimal policy which maximizes the reward of the environment.\n","\n","* Describe the NN you implemented and the behavior you observe from the agent as the model converges (or diverges).\n","* Plot the reward as a function of steps (or Epochs).\n","Compare your results to a random agent.\n","* Discuss whether you think your model has learned the optimal policy and potential methods for improving it and/or where it might fail.\n","* (Optional) [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n","\n","\n","You may use any frameworks you like, but you must implement your NN on your own (no pre-defined/trained models like [`stable_baselines`](https://stable-baselines.readthedocs.io/en/master/)).\n","\n","You may use any simulator other than `gym` _however_:\n","* The environment has to be similar to the classical control environments (or more complex like
[`robosuite`](https://github.com/ARISE-Initiative/robosuite)).\n","* You cannot choose a game/Atari/text-based environment. The purpose of this challenge is to demonstrate an understanding of basic kinematic/dynamic systems."]},{"cell_type":"code","source":["### WRITE YOUR CODE TO TRAIN THE MODEL HERE"],"metadata":{"id":"CUhkTcoeynVv"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["**Write up**: \n","* (Optional) link to the model on Hugging Face Hub: \n","* Discuss whether you think your model has learned the optimal policy and potential methods for improving it and/or where it might fail."],"metadata":{"id":"bWllPZhJyotg"}},{"cell_type":"markdown","metadata":{"id":"rbrRbrISa5J_"},"source":["# 4. Theory / Linear Algebra "]},{"cell_type":"markdown","metadata":{"id":"KFkLRCzTXTzL"},"source":["**Implement Contrastive PCA** Read [this paper](https://www.nature.com/articles/s41467-018-04608-8) and implement contrastive PCA in Python.\n","\n","* First, please discuss what kind of dataset it would make sense to use this method on\n","* Implement the method in Python (do not use previous implementations of the method if they already exist)\n","* Then create a synthetic dataset and apply the method to the synthetic data. Compare with standard PCA.\n"]},{"cell_type":"markdown","source":["**Write up**: Discuss what kind of dataset it would make sense to use Contrastive PCA on"],"metadata":{"id":"TpyqWl-ly0wy"}},{"cell_type":"code","source":["### WRITE YOUR CODE HERE"],"metadata":{"id":"1CQzUSfQywRk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["# 5.
Systems"],"metadata":{"id":"dlqmZS5Hy6q-"}},{"cell_type":"markdown","source":["**Inference on the edge**: Measure the inference times in various computationally-constrained settings\n","\n","* Pick a few different speech detection models (we suggest looking at models on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads))\n","* Simulate different memory constraints and CPU allocations that are realistic for edge devices that might run such models, such as smart speakers or microcontrollers, and measure what is the average inference time of the models under these conditions \n","* How does the inference time vary with (1) choice of model (2) available system memory (3) available CPU (4) size of input?\n","\n","Are there any surprising discoveries? (Note that this coding challenge is fairly open-ended, so we will be considering the amount of effort invested in discovering something interesting here)."],"metadata":{"id":"QW_eiDFw1QKm"}},{"cell_type":"code","source":["### WRITE YOUR CODE HERE"],"metadata":{"id":"OYp94wLP1kWJ"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["**Write up**: What surprising discoveries do you see?"],"metadata":{"id":"yoHmutWx2jer"}}]}