DataScienceProject/CNN · Hugging Face

Real art vs AI-Generated art image classification

This project provides a Convolutional Neural Network (CNN) model that classifies images as either 'real art' or 'fake art' (AI-generated). A CNN is a type of deep learning model designed to process and analyze visual data: its convolutional layers automatically detect patterns and features in images. Our goal is to classify the source of an image with at least 85% accuracy and at least 80% recall.

Installation instructions

The following libraries or packages are required: numpy, pandas, tensorflow, keras, matplotlib, sklearn, cv2. Note that sklearn and cv2 are import names; the corresponding pip packages are scikit-learn and opencv-python.

We prepare the data for the model by sorting the images into two folders of equal size, one per class (real art, labeled 0; fake art, labeled 1). Our CNN model is based on 2,800 images in PNG and JPG format that have been resized and normalized. The images are divided into a training set containing 90% of the data and a testing set containing the remaining 10%.
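The folder layout and the 90/10 split described above could be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code: the folder names `real` and `fake`, the helper names, and the choice to split each class separately (so both sets stay balanced) are assumptions.

```python
import random
from pathlib import Path

# Assumed folder names; the card only says there is one folder per class.
CLASS_FOLDERS = {"real": 0, "fake": 1}  # real art -> 0, fake art -> 1
IMAGE_SUFFIXES = {".png", ".jpg"}


def list_labeled_images(root):
    """Return (path, label) pairs for every PNG/JPG file in root/<class folder>."""
    samples = []
    for folder, label in CLASS_FOLDERS.items():
        for path in sorted(Path(root, folder).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def stratified_split(samples, test_fraction=0.10, seed=42):
    """Split into (train, test), holding out test_fraction of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lbl for _, lbl in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With 2,800 images divided equally between the two classes, a split like this gives 2,520 training images and 280 test images.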

CNN model architecture

Convolutional Layers: extract features from the images by applying 32 or 64 filters of size 3x3 with the ReLU activation function.

MaxPooling Layers: reduce the spatial dimensions using a pool size of 2x2.

Flatten: converts the multi-dimensional output of the previous layers into a one-dimensional vector for input into the fully connected layers.

Dropout Layer: prevents overfitting by dropping units at a rate of 0.5 after the first Dense layer.

Dense Layer: the final Dense layer performs the classification with a sigmoid activation function.
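One way the layers listed above might fit together in Keras is sketched below. The card does not state the input size, the number of convolutional blocks, or the width of the first Dense layer, so the 128x128x3 input, the two blocks (32 then 64 filters), and the 64 ReLU units are assumptions made for illustration. The small helper shows how the spatial size shrinks through the blocks.

```python
def spatial_size_after_blocks(size, n_blocks=2, kernel=3, pool=2):
    """Side length of the feature map after n_blocks of (valid conv -> max-pool)."""
    for _ in range(n_blocks):
        size = size - (kernel - 1)  # a 3x3 'valid' convolution trims 2 pixels
        size = size // pool         # 2x2 max-pooling halves the size, rounding down
    return size


def build_cnn(input_shape=(128, 128, 3), dense_units=64):
    """Conv(32) -> Pool -> Conv(64) -> Pool -> Flatten -> Dense -> Dropout -> Dense(1)."""
    # Imported inside the function so the size helper above runs without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    return keras.Sequential([
        keras.Input(shape=input_shape),                # assumed input size
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(dense_units, activation="relu"),  # assumed width and activation
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),         # probability of label 1 (fake art)
    ])
```

Under these assumptions, `spatial_size_after_blocks(128)` is 30, so the Flatten layer would produce 30 * 30 * 64 = 57,600 values.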

Training Details

The model is trained using binary cross-entropy loss and the Adam optimizer, with 20% of the training data reserved for validation. It employs 4-fold cross-validation to ensure robust performance. The following callbacks are used during training:

EarlyStopping: stops training if the validation accuracy ceases to improve for a specified patience period.

ModelCheckpoint: saves the best weights during training based on validation accuracy.

The best-performing model from each fold is saved, and the model with the best weights overall is selected for final testing.
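The training setup described above could look roughly like the sketch below. The fold helper is plain Python; the second function uses the Keras `EarlyStopping` and `ModelCheckpoint` callbacks. The patience value, the weights file name, and the helper names are hypothetical, and the card does not say how the 20% validation split and the four folds interact, so this sketch simply uses the held-out fold as validation data.

```python
import random


def kfold_indices(n_samples, n_folds=4, seed=0):
    """Yield (train_indices, val_indices); every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx


def compile_and_make_callbacks(model, weights_path, patience=5):
    """Compile with Adam and binary cross-entropy, then build the two callbacks."""
    from tensorflow import keras  # lazy import: the fold helper needs no TensorFlow

    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return [
        keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=patience),
        keras.callbacks.ModelCheckpoint(
            weights_path,            # e.g. "fold_1.weights.h5" (hypothetical name)
            monitor="val_accuracy",
            save_best_only=True,     # keep only the best epoch of this fold
            save_weights_only=True,
        ),
    ]
```

For each fold, a fresh model would be built, passed to `compile_and_make_callbacks`, and trained with `model.fit(..., validation_data=..., callbacks=...)` on the arrays selected by `train_idx` and `val_idx`. Recent Keras versions expect the weights path to end in `.weights.h5` when `save_weights_only=True`.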

Performance Evaluation

After training, the model is evaluated on the test set. The following metrics are used to measure performance:

Accuracy: the percentage of correct classifications.

Precision, Recall, F1-Score: evaluate the model's classification ability on both real and AI-generated images.

Confusion Matrix: displays true positives, false positives, true negatives, and false negatives.

Instructions

To run the project

Place the images in the respective training and testing folders. Preprocess the images by resizing and normalizing them. Train the model using the provided code. Evaluate the model on the test set.

Visualization results

Confusion Matrix: visualizes the classification performance.

Training and Validation Metrics: plots of accuracy and loss over the epochs.
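Both plots could be produced with matplotlib along the lines of the sketch below. It assumes the training history is available as a dict with the keys `accuracy`, `val_accuracy`, `loss`, and `val_loss` (the keys Keras records in `History.history` when the model is compiled with `metrics=["accuracy"]`), and that the confusion matrix is a 2x2 nested list with true labels as rows. The function names and class names are illustrative.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training vs. validation accuracy and loss over the epochs."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, key, title in ((ax_acc, "accuracy", "Accuracy"), (ax_loss, "loss", "Loss")):
        ax.plot(history[key], label="training")
        ax.plot(history["val_" + key], label="validation")
        ax.set_xlabel("Epoch")
        ax.set_title(title)
        ax.legend()
    return fig


def plot_confusion_matrix(matrix, class_names=("real art", "fake art")):
    """Draw the confusion matrix as a heat map with the counts written in each cell."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(matrix, cmap="Blues")
    ticks = list(range(len(class_names)))
    ax.set_xticks(ticks)
    ax.set_xticklabels(class_names)
    ax.set_yticks(ticks)
    ax.set_yticklabels(class_names)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    for row, counts in enumerate(matrix):
        for col, count in enumerate(counts):
            ax.text(col, row, str(count), ha="center", va="center")
    return fig
```

Each function returns the figure, so it can be shown with `plt.show()` or written to disk with `fig.savefig(...)`.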

Results

Test accuracy = 0.77

Test loss = 0.49

Precision = 0.77

Recall = 0.77

F1 = 0.77

[Figure: confusion matrix]
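Accuracy, precision, recall, and F1 can all be derived from the four confusion-matrix counts. The sketch below shows the standard formulas in plain Python. Treating label 1 (fake art) as the positive class is an assumption, since the card does not say whether the reported precision and recall are per-class values or averages over both classes.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating label 1 (fake art) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted fakes, how many are fake
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual fakes, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

The predicted labels would typically come from thresholding the sigmoid output at 0.5. scikit-learn's `classification_report` and `confusion_matrix` offer the same calculations, including per-class and averaged variants.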
