{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "eBpjBBZc6IvA"
},
"source": [
"# Fatima Fellowship Coding Challenge (Pick 1)\n",
"\n",
"Thank you for applying to the Fatima Fellowship. To help us select the Fellows and assess your ability to do machine learning research, we are asking that you complete a short coding challenge. Please pick **1 of these 5** coding challenges, whichever is most aligned with your interests. These coding challenges are not meant to take too long, do NOT spend more than 4-6 hours on them -- you can submit whatever you have.\n",
"\n",
"**How to submit**: Please make a copy of this colab notebook, add your code and results, and submit your colab notebook along with your application. If you have never used a colab notebook, [check out this video](https://www.youtube.com/watch?v=i-HnvsehuSw)"
]
},
{
"cell_type": "markdown",
"source": [
"\n",
"\n",
"---\n",
"\n",
"\n",
"### **Important**: Beore you get started, please make sure to make a **copy of this notebook** and set sharing permissions so that **anyone with the link can view**. Otherwise, we will NOT be able to assess your application.\n",
"\n",
"\n",
"\n",
"---\n",
"\n"
],
"metadata": {
"id": "lQNUZjvuRt-m"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "braBzmRpMe7_"
},
"source": [
"# 1. Deep Learning for Vision"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1IWw-NZf5WfF"
},
"source": [
"**Generated by AI detector**: Train a model to detect if images are generated by AI\n",
"\n",
"* Find a dataset of natural images and images generated by AI (here is one such dataset on the [Hugging Face Hub](https://huggingface.co/datasets/competitions/aiornot) but you're welcome to use any dataset you've found.\n",
"* Create a training and test set.\n",
"* Build a neural network (using Tensorflow, PyTorch, or any framework you like)\n",
"* Train it to classify the image as being generated by an AI or not until a reasonable accuracy is reached\n",
"* [Upload the the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n",
"* Look at some of the images that were classified incorrectly. Please explain what you might do to improve your model's performance on these images in the future (you do not need to impelement these suggestions)\n",
"\n",
"**Submission instructions**: Please write your code below and include some examples of images that were classified"
]
},
{
"cell_type": "code",
"source": [
"### WRITE YOUR CODE TO TRAIN THE MODEL HERE\n",
"print('Hi')"
],
"metadata": {
"id": "K2GJaYBpw91T",
"outputId": "f26681e5-f682-42d2-e837-f949a159c779",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Hi\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"**Write up**: \n",
"* Link to the model on Hugging Face Hub: \n",
"* Include some examples of misclassified images. Please explain what you might do to improve your model's performance on these images in the future (you do not need to impelement these suggestions)\n",
"\n",
"[Please put your write up here]"
],
"metadata": {
"id": "qSeLed2JxvGI"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "sFU9LTOyMiMj"
},
"source": [
"# 2. Deep Learning for NLP\n",
"\n",
"**Fake news classifier**: Train a text classification model to detect fake news articles!\n",
"\n",
"* Download the dataset here: https://www.kaggle.com/datasets/sadikaljarif/fake-news-detection-dataset-english (if you'd like, you can also look at fake news datasets in other languages, which are available on the Huggingface Hub)\n",
"* Develop an NLP model for classification that uses a pretrained language model and the *text* of the article. It should *NOT* use the URL\n",
"* Finetune your model on the dataset, and generate an AUC curve of your model on the test set of your choice. \n",
"* [Upload the the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n",
"* *Answer the following question*: Look at some of the news articles that were classified incorrectly. Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to impelement these suggestions)"
]
},
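{
"cell_type": "markdown",
"metadata": {},
"source": [
"The evaluation step in the task list above asks for an ROC curve and its AUC. The block below is a minimal sketch of that step only, assuming scikit-learn and matplotlib are available. The arrays `y_true` and `y_score` are hypothetical placeholders, not results from this notebook; in practice they would be the test-set labels and the model's predicted probability of the `fake` class.\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from sklearn.metrics import roc_auc_score, roc_curve\n",
"\n",
"# Hypothetical labels (1 = fake, 0 = true) and predicted probabilities of 'fake'.\n",
"# These are illustrative values, not output of the model in this notebook.\n",
"y_true = np.array([0, 0, 1, 1, 0, 1])\n",
"y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])\n",
"\n",
"# roc_curve sweeps the decision threshold; roc_auc_score gives the area under that curve.\n",
"fpr, tpr, thresholds = roc_curve(y_true, y_score)\n",
"auc = roc_auc_score(y_true, y_score)\n",
"\n",
"plt.plot(fpr, tpr, label=f'ROC (AUC = {auc:.3f})')\n",
"plt.plot([0, 1], [0, 1], linestyle='--', label='chance')\n",
"plt.xlabel('False positive rate')\n",
"plt.ylabel('True positive rate')\n",
"plt.legend()\n",
"plt.show()\n",
"```\n",
"\n",
"Passing continuous scores rather than hard 0/1 predictions matters here: with hard predictions the curve collapses to a single operating point."
]
},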
{
"cell_type": "code",
"source": [
"#installing libraries\n",
"!pip install opendatasets\n",
"!pip install pandas\n",
"!pip install -q kaggle\n",
"!pip install transformers\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MvlmVtfz8LY5",
"outputId": "14ca9f13-661c-45b7-9ed0-62c1ce5aaef4"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting opendatasets\n",
" Downloading opendatasets-0.1.22-py3-none-any.whl (15 kB)\n",
"Requirement already satisfied: kaggle in /usr/local/lib/python3.9/dist-packages (from opendatasets) (1.5.13)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.9/dist-packages (from opendatasets) (4.65.0)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.9/dist-packages (from opendatasets) (8.1.3)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (2.25.1)\n",
"Requirement already satisfied: certifi in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (2022.12.7)\n",
"Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (1.15.0)\n",
"Requirement already satisfied: python-dateutil in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (2.8.2)\n",
"Requirement already satisfied: python-slugify in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (8.0.1)\n",
"Requirement already satisfied: urllib3 in /usr/local/lib/python3.9/dist-packages (from kaggle->opendatasets) (1.26.14)\n",
"Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.9/dist-packages (from python-slugify->kaggle->opendatasets) (1.3)\n",
"Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.9/dist-packages (from requests->kaggle->opendatasets) (4.0.0)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->kaggle->opendatasets) (2.10)\n",
"Installing collected packages: opendatasets\n",
"Successfully installed opendatasets-0.1.22\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: pandas in /usr/local/lib/python3.9/dist-packages (1.4.4)\n",
"Requirement already satisfied: numpy>=1.18.5 in /usr/local/lib/python3.9/dist-packages (from pandas) (1.22.4)\n",
"Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.9/dist-packages (from pandas) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.9/dist-packages (from pandas) (2022.7.1)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.9/dist-packages (from python-dateutil>=2.8.1->pandas) (1.15.0)\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers\n",
" Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.3/6.3 MB\u001b[0m \u001b[31m33.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.9.0)\n",
"Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
" Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m81.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.25.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.0)\n",
"Collecting huggingface-hub<1.0,>=0.11.0\n",
" Downloading huggingface_hub-0.13.2-py3-none-any.whl (199 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m199.2/199.2 KB\u001b[0m \u001b[31m21.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.6.2)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
"Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (4.0.0)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.10)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.14)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)\n",
"Installing collected packages: tokenizers, huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.13.2 tokenizers-0.13.2 transformers-4.26.1\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"#Importing libraries\n",
"import opendatasets as od\n",
"from tensorflow.keras.models import Model, Sequential\n",
"from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D,Input\n",
"from tensorflow.keras.callbacks import EarlyStopping\n",
"from tensorflow.python.ops.numpy_ops import np_utils\n",
"from transformers import BertModel, TFBertModel \n",
"import tensorflow as tf\n",
"from tensorflow.keras.optimizers import Adam\n",
"from transformers import BertTokenizer, BertForSequenceClassification, AdamW, TFBertModel\n",
"from sklearn.metrics import roc_auc_score\n",
"from torch.utils.data import DataLoader, RandomSampler, SequentialSampler\n",
"\n",
"from tensorflow.keras import regularizers\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"import pandas as pd\n",
"from matplotlib import rcParams\n",
"import seaborn as sns\n",
"import numpy as np\n",
"from PIL import Image\n",
"from sklearn.model_selection import train_test_split\n",
"from matplotlib import pyplot as plt\n",
"from transformers import AutoTokenizer"
],
"metadata": {
"id": "OaxYb0_T8Wn4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#loading the datasets\n",
"fake_news=pd.read_csv(\"/Fake.csv\")\n",
"true_news=pd.read_csv(\"/content/True.csv\")"
],
"metadata": {
"id": "bJU3ck0SIQqx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#Exploring the datasets\n",
"fake_news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "fhucg9DT1VDX",
"outputId": "a9be23df-d443-4a0e-dd41-f5547f674810"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Donald Trump Sends Out Embarrassing New Year’... \n",
"1 Drunk Bragging Trump Staffer Started Russian ... \n",
"2 Sheriff David Clarke Becomes An Internet Joke... \n",
"3 Trump Is So Obsessed He Even Has Obama’s Name... \n",
"4 Pope Francis Just Called Out Donald Trump Dur... \n",
"\n",
" text subject \\\n",
"0 Donald Trump just couldn t wish all Americans ... News \n",
"1 House Intelligence Committee Chairman Devin Nu... News \n",
"2 On Friday, it was revealed that former Milwauk... News \n",
"3 On Christmas day, Donald Trump announced that ... News \n",
"4 Pope Francis used his annual Christmas Day mes... News \n",
"\n",
" date \n",
"0 December 31, 2017 \n",
"1 December 31, 2017 \n",
"2 December 30, 2017 \n",
"3 December 29, 2017 \n",
"4 December 25, 2017 "
],
"text/html": [
"\n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
title
\n",
"
text
\n",
"
subject
\n",
"
date
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Donald Trump Sends Out Embarrassing New Year’...
\n",
"
Donald Trump just couldn t wish all Americans ...
\n",
"
News
\n",
"
December 31, 2017
\n",
"
\n",
"
\n",
"
1
\n",
"
Drunk Bragging Trump Staffer Started Russian ...
\n",
"
House Intelligence Committee Chairman Devin Nu...
\n",
"
News
\n",
"
December 31, 2017
\n",
"
\n",
"
\n",
"
2
\n",
"
Sheriff David Clarke Becomes An Internet Joke...
\n",
"
On Friday, it was revealed that former Milwauk...
\n",
"
News
\n",
"
December 30, 2017
\n",
"
\n",
"
\n",
"
3
\n",
"
Trump Is So Obsessed He Even Has Obama’s Name...
\n",
"
On Christmas day, Donald Trump announced that ...
\n",
"
News
\n",
"
December 29, 2017
\n",
"
\n",
"
\n",
"
4
\n",
"
Pope Francis Just Called Out Donald Trump Dur...
\n",
"
Pope Francis used his annual Christmas Day mes...
\n",
"
News
\n",
"
December 25, 2017
\n",
"
\n",
" \n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"\n",
" \n",
"
\n",
"
\n",
" "
]
},
"metadata": {},
"execution_count": 51
}
]
},
{
"cell_type": "code",
"source": [
"#Exploring the datasets\n",
"true_news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "KcSJxAah1gv2",
"outputId": "0d13a388-7693-437e-933e-c153043b3037"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 As U.S. budget fight looms, Republicans flip t... \n",
"1 U.S. military to accept transgender recruits o... \n",
"2 Senior U.S. Republican senator: 'Let Mr. Muell... \n",
"3 FBI Russia probe helped by Australian diplomat... \n",
"4 Trump wants Postal Service to charge 'much mor... \n",
"\n",
" text subject \\\n",
"0 WASHINGTON (Reuters) - The head of a conservat... politicsNews \n",
"1 WASHINGTON (Reuters) - Transgender people will... politicsNews \n",
"2 WASHINGTON (Reuters) - The special counsel inv... politicsNews \n",
"3 WASHINGTON (Reuters) - Trump campaign adviser ... politicsNews \n",
"4 SEATTLE/WASHINGTON (Reuters) - President Donal... politicsNews \n",
"\n",
" date \n",
"0 December 31, 2017 \n",
"1 December 29, 2017 \n",
"2 December 31, 2017 \n",
"3 December 30, 2017 \n",
"4 December 29, 2017 "
],
"text/html": [
"\n",
"
"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"# Assigning label to the datasets\n",
"fake_news[\"label\"]=\"fake\"\n",
"true_news[\"label\"]=\"true\"\n",
"\n",
"# Merging the real and true news datasets to create the final one\n",
"final_news_dataset= pd.concat([fake_news,true_news])\n",
"\n",
"#Shuffling\n",
"final_news_dataset = final_news_dataset.sample(frac=1).reset_index(drop=True)\n",
"\n",
"# Exploring the final dataset\n",
"final_news_dataset.head(10)\n",
"\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 363
},
"id": "noYYMxvF2Ag8",
"outputId": "ce5142c8-631f-4420-e201-63d769ec11d6"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 U.S. responds in court fight over illegal Indo... \n",
"1 Numbskull Republican Ignores History, Says Re... \n",
"2 US-UK DIRTY WAR: ‘Latin American-style’ Death ... \n",
"3 SUPREME COURT JUSTICE Goes All Creepy Predicti... \n",
"4 Hillary Clinton: ‘Israel First’ (and no peace ... \n",
"5 Boiler Room EP #119 – Zombie Disneyland & The ... \n",
"6 EU Parliament calls on Myanmar to free Reuters... \n",
"7 Donald Trump Releases Statement On Cruz Sex S... \n",
"8 McConnell Just ADMITTED The NRA Must Approve ... \n",
"9 Putin says question of who hacked Democratic p... \n",
"\n",
" text subject \\\n",
"0 BOSTON (Reuters) - U.S. immigration officials ... politicsNews \n",
"1 Republican Rep. Ted Poe (R-Texas) spoke with F... News \n",
"2 Patrick Henningsen 21st Century WireThis week... Middle-east \n",
"3 What the heck is wrong with these loony libera... politics \n",
"4 Robert Fantina CounterpunchAlthough the United... US_News \n",
"5 Tune in to the Alternate Current Radio Network... US_News \n",
"6 BRUSSELS (Reuters) - The president of the Euro... worldnews \n",
"7 With the sex scandal allegations piling up aga... News \n",
"8 We could already make the assumption that Sena... News \n",
"9 MOSCOW (Reuters) - Russian President Vladimir ... politicsNews \n",
"\n",
" date label \n",
"0 December 21, 2017 true \n",
"1 January 20, 2016 fake \n",
"2 July 14, 2016 fake \n",
"3 Jul 10, 2016 fake \n",
"4 January 18, 2016 fake \n",
"5 July 29, 2017 fake \n",
"6 December 14, 2017 true \n",
"7 March 25, 2016 fake \n",
"8 July 6, 2016 fake \n",
"9 December 23, 2016 true "
],
"text/html": [
"\n",
"