dariog committed on
Commit
34b9ee4
1 Parent(s): b5d4397

First commit

README.md CHANGED
@@ -1,14 +1,317 @@
1
  ---
2
- title: SuSyGame
3
- emoji: 🏃
4
- colorFrom: pink
5
- colorTo: red
6
  sdk: gradio
7
- sdk_version: 5.3.0
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: A game of synthetic image detection against SuSy
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: SuSy
3
+ emoji: 🔎
4
+ colorFrom: gray
5
+ colorTo: pink
6
  sdk: gradio
7
+ sdk_version: 4.44.1
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
+ short_description: Spot AI-Generated images with SuSy!
12
  ---
13
 
14
+ # SuSy - Synthetic Image Detector
15
+
16
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/NobqlpFbFkTyBi1LsT9JE.png" alt="image" width="300" height="auto">
17
+
18
+
19
+ - **Paper:** https://arxiv.org/abs/2409.14128
20
+ - **Model:** https://huggingface.co/HPAI-BSC/SuSy
21
+ - **Code:** https://github.com/HPAI-BSC/SuSy
22
+ - **Dataset:** https://huggingface.co/datasets/HPAI-BSC/SuSy-Dataset
23
+
24
+ **Model Results**
25
+
26
+ | Dataset | Type | Model | Year | Recall |
27
+ |:-------------------:|:---------:|:-------------------------:|:----:|:------:|
28
+ | Flickr30k | Authentic | - | 2014 | 90.53 |
29
+ | Google Landmarks v2 | Authentic | - | 2020 | 64.54 |
30
+ | Synthbuster | Synthetic | Glide | 2021 | 53.50 |
31
+ | Synthbuster | Synthetic | Stable Diffusion 1.3 | 2022 | 87.00 |
32
+ | Synthbuster | Synthetic | Stable Diffusion 1.4 | 2022 | 87.10 |
33
+ | Synthbuster | Synthetic | Stable Diffusion 2 | 2022 | 68.40 |
34
+ | Synthbuster | Synthetic | DALL-E 2 | 2022 | 20.70 |
35
+ | Synthbuster | Synthetic | MidJourney V5 | 2023 | 73.10 |
36
+ | Synthbuster | Synthetic | Stable Diffusion XL | 2023 | 79.50 |
37
+ | Synthbuster | Synthetic | Firefly | 2023 | 40.90 |
38
+ | Synthbuster | Synthetic | DALL-E 3 | 2023 | 88.60 |
39
+ | Authors | Synthetic | Stable Diffusion 3 Medium | 2024 | 93.23 |
40
+ | Authors | Synthetic | Flux.1-dev | 2024 | 96.46 |
41
+ | In-the-wild | Synthetic | Mixed/Unknown | 2024 | 89.90 |
42
+ | In-the-wild | Authentic | - | 2024 | 33.06 |
43
+
44
+ ## Model Details
45
+
46
+ <!-- Provide a longer summary of what this model is. -->
47
+
48
+ SuSy is a Spatial-Based Synthetic Image Detection and Recognition Model, designed and trained to detect synthetic images and attribute them to one of five generative models (two Stable Diffusion models, two Midjourney versions and DALL·E 3). The model takes image patches of size 224x224 as input and outputs the probability of the image being authentic or having been created by each of these generative models.
49
+
50
+ <img src="model_architecture.png" alt="image" width="900" height="auto">
51
+
52
+ The model is based on a CNN architecture and is trained using a supervised learning approach. Its design is based on [previous work](https://upcommons.upc.edu/handle/2117/395959), originally intended for video super-resolution detection and adapted here for the tasks of synthetic image detection and recognition. The architecture is lightweight and consists of two modules: a feature extractor and a multi-layer perceptron (MLP). SuSy has a total of 12.7M parameters, with the feature extractor accounting for 12.5M parameters and the MLP accounting for the remaining 197K.
53
+
54
+ The CNN feature extractor consists of five stages following a ResNet-18 scheme. The output of each of the blocks is used as input for various bottleneck modules that are arranged in a staircase pattern. The bottleneck modules consist of three 2D convolutional layers. Each level of bottlenecks takes input at a later stage than the previous level, and each bottleneck module takes input from the current stage and, except the first bottleneck of each level, from the previous bottleneck module.
55
+
56
+ The outputs of each level of bottlenecks and stage 4 are passed to a 2D adaptive average pooling layer and then concatenated to form the feature map feeding the MLP. The MLP consists of three fully connected layers with 512, 256 and 256 units, respectively. Between consecutive layers, a dropout layer (rate of 0.5) prevents overfitting. The output of the MLP has 6 units, corresponding to the number of classes in the dataset (5 synthetic models and 1 real image class).
57
+
58
+ The model can be used as a detector by either taking the class with the highest probability as the output or summing the probabilities of the synthetic classes and comparing the sum to the probability of the real class. The model can also be used as a recognition model by taking the class with the highest probability as the output.
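
The two detection rules described above can be sketched as follows. This is a minimal illustration and not code from the SuSy repository: the function names are hypothetical, and placing the authentic class at index 0 is an assumption taken from the `classes` list in this Space's `app.py`.

```python
AUTHENTIC_INDEX = 0  # assumption: the authentic class comes first, as in app.py


def detect_by_argmax(probs):
    """Flag the image as synthetic if the most probable class is not the authentic one."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best != AUTHENTIC_INDEX


def detect_by_sum(probs):
    """Flag the image as synthetic if the synthetic classes together outweigh the authentic class."""
    synthetic_mass = sum(p for i, p in enumerate(probs) if i != AUTHENTIC_INDEX)
    return synthetic_mass > probs[AUTHENTIC_INDEX]


# Hypothetical output: authentic is the single most likely class,
# yet the five synthetic classes together are more probable.
probs = [0.40, 0.15, 0.15, 0.10, 0.10, 0.10]
print(detect_by_argmax(probs))  # False
print(detect_by_sum(probs))     # True
```

The two rules can disagree, as in this example, so the summed rule is the more sensitive detector whenever the synthetic probability mass is spread over several generators.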
59
+
60
+ ### Model Description
61
+
62
+ - **Developed by:** [Pablo Bernabeu Perez](https://huggingface.co/pabberpe), [Enrique Lopez Cuena](https://huggingface.co/Cuena) and [Dario Garcia Gasulla](https://huggingface.co/dariog) from [HPAI](https://hpai.bsc.es/)
63
+ - **Model type:** Spatial-Based Synthetic Image Detection and Recognition Convolutional Neural Network
64
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
65
+
66
+ ## Uses
67
+
68
+ This model can be used to detect synthetic images in a scalable manner, thanks to its small size. Since it operates on patches of 224x224, a moving window should be implemented at inference time when it is applied to larger inputs (the most likely scenario, and the one it was trained under). This also enables the localization of synthetic content within a high-resolution input.
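
One way to implement such a moving window is to enumerate crop boxes over the input. The sketch below is illustrative only: `window_boxes` and its `stride` parameter are hypothetical names, and dropping windows that do not fit entirely inside the image is an assumption that mirrors the integer division used in this Space's `app.py`.

```python
def window_boxes(width, height, patch=224, stride=224):
    """Return (left, upper, right, lower) boxes covering the image with a moving window.

    Only windows that fit entirely inside the image are returned.
    """
    boxes = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            boxes.append((left, top, left + patch, top + patch))
    return boxes


boxes = window_boxes(1024, 768)
print(len(boxes))  # 12 (4 columns x 3 rows)
print(boxes[0])    # (0, 0, 224, 224)
```

Each box has the layout expected by `PIL.Image.Image.crop`. A stride smaller than the patch size yields overlapping windows, which gives finer localization at the cost of more model calls.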
69
+
70
+ Any individual or organization seeking support in identifying synthetic content can use this model. However, it should not be used as the only source of evidence, particularly when applied to inputs produced by generative models not included in its training (see details in Training Data below).
71
+
72
+ ### Intended Uses
73
+
74
+ Intended uses include the following:
75
+
76
+ * Detection of authentic and synthetic images
77
+ * Attribution of synthetic images to their generative model (if included in the training data)
78
+ * Localization of image patches likely to be synthetic or tampered.
79
+
80
+ ### Out-of-Scope Uses
81
+
82
+ Out-of-scope uses include the following:
83
+
84
+ * Detection of manually edited images using traditional tools.
85
+ * Detection of images automatically downscaled and/or upscaled. These are considered as non-synthetic samples in the model training phase.
86
+ * Detection of inpainted images.
87
+ * Detection of synthetic vs manually crafted illustrations. The model is trained mainly on photorealistic samples.
88
+ * Attribution of synthetic images to their generative model if the model was not included in the training data. Although some generalization capabilities are expected, reliability in this case cannot be estimated.
89
+
90
+ ### Forbidden Uses
91
+
92
+ This model may not be used to train generative models or tools aimed at purposefully deceiving the model or creating misleading content.
93
+
94
+ ## Bias, Risks, and Limitations
95
+
96
+ The model may be biased in the following ways:
97
+
98
+ * The model may be biased towards the training data, which may not be representative of all authentic and synthetic images. This applies particularly to the class of real-world images, which were obtained from a single source.
99
+ * The model may be biased towards the generative models included in the training data, which may not be representative of all possible generative models. This applies particularly to newer ones, since all models included were released between 2022 and 2023.
100
+ * The model may be biased towards certain types of images or content. While it is trained using roughly 18K synthetic images, no assessment was made of which domains and profiles those images cover.
101
+
102
+ The model has the following technical limitations:
103
+
104
+ * The performance of the model may be influenced by transformations and edits applied to the images. While the model was trained with some alterations (blur, brightness, compression and gamma), other alterations applicable to images could reduce the model's accuracy.
105
+ * The performance of the model might vary depending on the type and source of the images.
106
+ * The model will not be able to attribute synthetic images to their generative model if the model was not included in the training data.
107
+ * The model is trained on patches with high gray-level contrast. For images composed entirely of low-contrast regions, the model may not work as expected.
108
+
109
+ ### Recommendations
110
+
111
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
112
+
113
+ ## How to Get Started with the Model
114
+
115
+ Use the code below to get started with the model.
116
+
117
+ ```python
118
+ import torch
119
+ from PIL import Image
120
+ from torchvision import transforms
121
+
122
+ # Load the model
123
+ model = torch.jit.load("SuSy.pt")
124
+
125
+ # Load patch
126
+ patch = Image.open("midjourney-images-example-patch0.png")
127
+
128
+ # Transform patch to tensor
129
+ patch = transforms.PILToTensor()(patch).unsqueeze(0) / 255.
130
+
131
+ # Predict patch
132
+ model.eval()
133
+ with torch.no_grad():
134
+     preds = model(patch)
135
+
136
+ print(preds)
137
+ ```
138
+
139
+ See `test_image.py` and `test_patch.py` for other examples on how to use the model.
140
+
141
+ ## Training Details
142
+
143
+ ### Training Data
144
+
145
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
146
+
147
+ The dataset is available at: https://huggingface.co/datasets/HPAI-BSC/SuSy-Dataset
148
+
149
+ | Dataset | Year | Train | Validation | Test | Total |
150
+ |:-----------------:|:----:|:-----:|:----------:|:-----:|:-----:|
151
+ | COCO | 2017 | 2,967 | 1,234 | 1,234 | 5,435 |
152
+ | dalle-3-images | 2023 | 987 | 330 | 330 | 1,647 |
153
+ | diffusiondb | 2022 | 2,967 | 1,234 | 1,234 | 5,435 |
154
+ | realisticSDXL | 2023 | 2,967 | 1,234 | 1,234 | 5,435 |
155
+ | midjourney-tti | 2022 | 2,718 | 906 | 906 | 4,530 |
156
+ | midjourney-images | 2023 | 1,845 | 617 | 617 | 3,079 |
157
+
158
+ #### Authentic Images
159
+
160
+ - [COCO](https://cocodataset.org/)
161
+
162
+ We use a random subset of the COCO dataset, containing 5,435 images, for the authentic images in our training dataset. The partitions respect the original COCO splits, with 2,967 images in the training partition and 1,234 in each of the validation and test partitions.
163
+
164
+ #### Synthetic Images
165
+
166
+ - [dalle-3-images](https://huggingface.co/datasets/ehristoforu/dalle-3-images)
167
+ - [diffusiondb](https://poloclub.github.io/diffusiondb/)
168
+ - [midjourney-images](https://huggingface.co/datasets/ehristoforu/midjourney-images)
169
+ - [midjourney-texttoimage](https://www.kaggle.com/datasets/succinctlyai/midjourney-texttoimage)
170
+ - [realistic-SDXL](https://huggingface.co/datasets/DucHaiten/DucHaiten-realistic-SDXL)
171
+
172
+ For the diffusiondb dataset, we use a random subset of 5,435 images, with 2,967 in the training partition and 1,234 in each of the validation and test partitions. We use only the realistic images from the realisticSDXL dataset, with images in the realistic-2.2 split in our training data and the realistic-1 split for our test partition. The remaining datasets are used in their entirety, with 60% of the images in the training partition, 20% in the validation partition and 20% in the test partition.
173
+
174
+ ### Training Procedure
175
+
176
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
177
+
178
+ The training code is available at: https://github.com/HPAI-BSC/SuSy
179
+
180
+ #### Preprocessing
181
+
182
+ **Patch Extraction**
183
+
184
+ To prepare the training data, we extract 240x240 patches from the images, minimizing the overlap between them. We then select the most informative patches by calculating the gray-level co-occurrence matrix (GLCM) for each patch. Given the GLCM, we calculate the contrast and select the five patches with the highest contrast. These patches are then passed to the model in their original RGB format and cropped to 224x224.
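
The contrast-based selection can be illustrated without any image library. For a normalized GLCM `P` computed at a horizontal offset, the contrast `sum_{i,j} P(i, j) * (i - j)**2` equals the mean squared difference between each pixel and its neighbour at that offset, which is what the sketch below computes. The function names and the default `distance=1` are assumptions made for this toy example; this Space's `app.py` uses scikit-image's `graycomatrix` with a distance of 5 and an angle of 0.

```python
def glcm_contrast(gray, distance=1):
    """Contrast of the normalized horizontal GLCM of a 2D grid of gray levels.

    Contrast is sum_{i,j} P(i, j) * (i - j)**2, which equals the mean squared
    difference between each pixel and its neighbour `distance` columns to the right.
    """
    total, pairs = 0, 0
    for row in gray:
        for x in range(len(row) - distance):
            total += (row[x] - row[x + distance]) ** 2
            pairs += 1
    return total / pairs


def top_k_by_contrast(patches, k=5):
    """Return the k patches with the highest GLCM contrast, highest first."""
    return sorted(patches, key=glcm_contrast, reverse=True)[:k]


flat = [[10, 10, 10], [10, 10, 10]]    # uniform patch: no contrast
edges = [[0, 255, 0], [255, 0, 255]]   # alternating patch: maximal contrast
print(glcm_contrast(flat))    # 0.0
print(glcm_contrast(edges))   # 65025.0
print(top_k_by_contrast([flat, edges], k=1) == [edges])  # True
```

Uniform regions such as sky or blank walls score close to zero, so they are the first patches to be discarded.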
185
+
186
+ **Data Augmentation**
187
+
188
+ | Technique | Probability | Other Parameters |
189
+ |:------------------------:|:-----------:|:-----------------------------------------:|
190
+ | HorizontalFlip | 0.50 | - |
191
+ | RandomBrightnessContrast | 0.20 | brightness\_limit=0.2 contrast\_limit=0.2 |
192
+ | RandomGamma | 0.20 | gamma\_limit=(80, 120) |
193
+ | AdvancedBlur | 0.20 | |
194
+ | GaussianBlur | 0.20 | |
195
+ | JPEGCompression | 0.20 | quality\_lower=75 quality\_upper=100 |
196
+
197
+
198
+ #### Training Hyperparameters
199
+
200
+ - Loss Function: Cross-Entropy Loss
201
+ - Optimizer: Adam
202
+ - Learning Rate: 0.0001
203
+ - Weight Decay: 0
204
+ - Scheduler: ReduceLROnPlateau
205
+ - Factor: 0.1
206
+ - Patience: 4
207
+ - Batch Size: 128
208
+ - Epochs: 10
209
+ - Early Stopping: 2
210
+
211
+ ## Evaluation
212
+
213
+ <!-- This section describes the evaluation protocols and provides the results. -->
214
+
215
+ The evaluation code is available at: https://github.com/HPAI-BSC/SuSy
216
+
217
+ ### Testing Data, Factors & Metrics
218
+
219
+ #### Testing Data
220
+
221
+ <!-- This should link to a Dataset Card if possible. -->
222
+
223
+ - Test Split of our Training Dataset
224
+ - Synthetic Images generated with [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) and [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) using prompts from [Gustavosta/Stable-Diffusion-Prompts](https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts)
225
+ - Synthetic Images in the Wild: Dataset containing 210 Authentic and Synthetic Images obtained from Social Media Platforms
226
+ - [Flickr 30k Dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset)
227
+ - [Google Landmarks v2](https://github.com/cvdfoundation/google-landmark)
228
+ - [Synthbuster](https://zenodo.org/records/10066460)
229
+
230
+ #### Metrics
231
+
232
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
233
+
234
+ - Recall: The proportion of correctly classified positive instances out of all actual positive instances in a dataset.
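
As a concrete reading of this definition, recall for a given class can be computed as below. The helper name and the toy labels are hypothetical and are not taken from the evaluation code.

```python
def recall(predicted, actual, positive):
    """Fraction of samples whose true label is `positive` that were predicted as `positive`."""
    relevant = [p for p, a in zip(predicted, actual) if a == positive]
    if not relevant:
        raise ValueError("no samples with the positive label")
    return sum(p == positive for p in relevant) / len(relevant)


actual = ["synthetic"] * 4 + ["authentic"] * 2
predicted = ["synthetic", "synthetic", "authentic", "synthetic", "authentic", "synthetic"]
print(recall(predicted, actual, "synthetic"))  # 0.75
print(recall(predicted, actual, "authentic"))  # 0.5
```

Because each evaluation set below contains a single class (all authentic or all synthetic), its recall is simply the share of its images that the model labels correctly.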
235
+
236
+ ### Results
237
+
238
+ <!-- This section provides the results of the evaluation. -->
239
+
240
+ #### Authentic Sources
241
+
242
+ | Dataset | Year | Recall |
243
+ |:-------------------:|:----:|:------:|
244
+ | Flickr30k | 2014 | 90.53 |
245
+ | Google Landmarks v2 | 2020 | 64.54 |
246
+ | In-the-wild | 2024 | 33.06 |
247
+
248
+ #### Synthetic Sources
249
+
250
+ | Dataset | Model | Year | Recall |
251
+ |:-----------:|:-------------------------:|:----:|:------:|
252
+ | Synthbuster | Glide | 2021 | 53.50 |
253
+ | Synthbuster | Stable Diffusion 1.3 | 2022 | 87.00 |
254
+ | Synthbuster | Stable Diffusion 1.4 | 2022 | 87.10 |
255
+ | Synthbuster | Stable Diffusion 2 | 2022 | 68.40 |
256
+ | Synthbuster | DALL-E 2 | 2022 | 20.70 |
257
+ | Synthbuster | MidJourney V5 | 2023 | 73.10 |
258
+ | Synthbuster | Stable Diffusion XL | 2023 | 79.50 |
259
+ | Synthbuster | Firefly | 2023 | 40.90 |
260
+ | Synthbuster | DALL-E 3 | 2023 | 88.60 |
261
+ | Authors | Stable Diffusion 3 Medium | 2024 | 93.23 |
262
+ | Authors | Flux.1-dev | 2024 | 96.46 |
263
+ | In-the-wild | Mixed/Unknown | 2024 | 89.90 |
264
+
265
+ ### Summary
266
+
267
+ The results for authentic image datasets reveal varying detection performance across different sources. Recall rates range from 33.06% for the In-the-wild dataset to 90.53% for the Flickr30k dataset. The Google Landmarks v2 dataset shows an intermediate recall rate of 64.54%. These results indicate a significant disparity in the detectability of authentic images across different datasets, with the In-the-wild dataset presenting the most challenging case for SuSy.
268
+
269
+ The results for synthetic image datasets show varying detection performance across different image generation models. Recall rates range from 20.70% for DALL-E 2 (2022) to 96.46% for Flux.1-dev (2024). Stable Diffusion models generally exhibited high detectability, with versions 1.3 and 1.4 (2022) showing recall rates above 87%. More recent models tested by the authors, such as Stable Diffusion 3 Medium (2024) and Flux.1-dev (2024), demonstrate even higher detectability with recall rates above 93%. The in-the-wild mixed/unknown synthetic dataset from 2024 showed a high recall of 89.90%, indicating effective detection across various unknown generation methods. These results suggest an overall trend of improving detection capabilities for synthetic images, with newer generation models generally being more easily detectable.
270
+
271
+ It must be noted that these metrics were computed using the center patch of each image, instead of the patch voting mechanisms described previously. This strategy allows a fairer comparison with other state-of-the-art methods, although it hinders the performance of SuSy.
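
For reference, a center patch can be located as in the following sketch. How the evaluation code extracts it is not stated here, so the helper name and the integer rounding are assumptions.

```python
def center_patch_box(width, height, patch=224):
    """Return the (left, upper, right, lower) box of the patch centered in the image."""
    left = (width - patch) // 2
    top = (height - patch) // 2
    return (left, top, left + patch, top + patch)


print(center_patch_box(1024, 768))  # (400, 272, 624, 496)
```

The returned box has the layout expected by `PIL.Image.Image.crop`.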
272
+
273
+ ## Environmental Impact
274
+
275
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
276
+
277
+ - **Hardware Type:** H100
278
+ - **Hours used:** 16
279
+ - **Hardware Provider:** Barcelona Supercomputing Center (BSC)
280
+ - **Compute Region:** Spain
281
+ - **Carbon Emitted:** 0.63kg
282
+
283
+ ## Citation
284
+
285
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
286
+
287
+ **BibTeX:**
288
+
289
+ ```bibtex
290
+ @misc{bernabeu2024susy,
291
+ title={Present and Future Generalization of Synthetic Image Detectors},
292
+ author={Pablo Bernabeu-Perez and Enrique Lopez-Cuena and Dario Garcia-Gasulla},
293
+ year={2024},
294
+ eprint={2409.14128},
295
+ archivePrefix={arXiv},
296
+ primaryClass={cs.CV},
297
+ url={https://arxiv.org/abs/2409.14128},
298
+ }
299
+ ```
300
+
301
+ ```bibtex
302
+ @thesis{bernabeu2024aidetection,
303
+ title={Detecting and Attributing AI-Generated Images with Machine Learning},
304
+ author={Bernabeu Perez, Pablo},
305
+ school={UPC, Facultat d'Informàtica de Barcelona, Departament de Ciències de la Computació},
306
+ year={2024},
307
+ month={06}
308
+ }
309
+ ```
310
+
311
+ ## Model Card Authors
312
+
313
+ [Pablo Bernabeu Perez](https://huggingface.co/pabberpe) and [Dario Garcia Gasulla](https://huggingface.co/dariog)
314
+
315
+ ## Model Card Contact
316
+
317
+ For further inquiries, please contact [HPAI](mailto:hpai@bsc.es)
app.py ADDED
@@ -0,0 +1,270 @@
+ import gradio as gr
+ import numpy as np
+ import torch
+ import random
+ from PIL import Image
+ from skimage.feature import graycomatrix, graycoprops
+ from torchvision import transforms
+ import os
+
+ NUM_ROUNDS = 5  # Number of game rounds (not used yet: GameState.total_rounds is fixed at 2)
+ PROB_THRESHOLD = 0.5  # Probability threshold for the model's Real/Fake decision
+
+ # Load the model
+ model = torch.jit.load("SuSy.pt")
+
+ def process_image(image):
+     # Set parameters
+     top_k_patches = 5
+     patch_size = 224
+
+     # Get the image dimensions
+     width, height = image.size
+
+     # Calculate the number of non-overlapping patches that fit in the image
+     num_patches_x = width // patch_size
+     num_patches_y = height // patch_size
+
+     # Divide the image into patches
+     patches = np.zeros((num_patches_x * num_patches_y, patch_size, patch_size, 3), dtype=np.uint8)
+     for i in range(num_patches_x):
+         for j in range(num_patches_y):
+             x = i * patch_size
+             y = j * patch_size
+             patch = image.crop((x, y, x + patch_size, y + patch_size))
+             patches[i * num_patches_y + j] = np.array(patch)
+
+     # Score each patch by the contrast of its gray-level co-occurrence matrix (GLCM)
+     transform_patch = transforms.Compose([transforms.PILToTensor(), transforms.Grayscale()])
+     contrast_scores = []
+     for patch in patches:
+         grayscale_patch = transform_patch(Image.fromarray(patch)).squeeze(0)
+         glcm = graycomatrix(grayscale_patch, [5], [0], 256, symmetric=True, normed=True)
+         contrast_scores.append(graycoprops(glcm, "contrast")[0, 0])
+
+     # Sort patch indices by contrast score, highest first
+     sorted_indices = np.argsort(contrast_scores)[::-1]
+
+     # Extract the top k patches and convert them to a float tensor in [0, 1]
+     top_patches = patches[sorted_indices[:top_k_patches]]
+     top_patches = torch.from_numpy(np.transpose(top_patches, (0, 3, 1, 2))) / 255.0
+
+     # Predict patches
+     model.eval()
+     with torch.no_grad():
+         preds = model(top_patches)
+
+     # Average the per-patch probabilities into one prediction per image
+     classes = ['Authentic', 'DALL·E 3', 'Stable Diffusion 1.x', 'MJ V5/V6', 'MJ V1/V2', 'Stable Diffusion XL']
+     mean_probs = preds.mean(dim=0).numpy()
+
+     # Create a dictionary of class probabilities
+     class_probs = {cls: prob for cls, prob in zip(classes, mean_probs)}
+
+     # Sort probabilities in descending order
+     sorted_probs = dict(sorted(class_probs.items(), key=lambda item: item[1], reverse=True))
+
+     return sorted_probs
+
+
+ class GameState:
+     def __init__(self):
+         self.user_score = 0
+         self.model_score = 0
+         self.current_round = 0
+         self.total_rounds = 2
+         self.game_images = []
+         self.is_game_active = False
+         self.last_results = None
+         self.waiting_for_input = True
+
+     def reset(self):
+         self.__init__()
+
+ game_state = GameState()
+
+ def load_images():
+     real_image_folder = "real_images"
+     fake_image_folder = "fake_images"
+     real_images = [os.path.join(real_image_folder, img) for img in os.listdir(real_image_folder)]
+     fake_images = [os.path.join(fake_image_folder, img) for img in os.listdir(fake_image_folder)]
+     selected_images = random.sample(real_images, 1) + random.sample(fake_images, 1)
+     random.shuffle(selected_images)
+     return selected_images
+
+ def create_score_html():
+     results_html = ""
+     if game_state.last_results:
+         results_html = f"""
+         <div style='margin-top: 1rem; padding: 1rem; background-color: #e0e0e0; border-radius: 8px; color: #333;'>
+             <h4 style='color: #333; margin-bottom: 0.5rem;'>Last Round Results:</h4>
+             <p style='color: #333;'>Your guess: {game_state.last_results['user_guess']}</p>
+             <p style='color: #333;'>Model's guess: {game_state.last_results['model_guess']}</p>
+             <p style='color: #333;'>Correct answer: {game_state.last_results['correct_answer']}</p>
+         </div>
+         """
+
+     current_display_round = min(game_state.current_round + 1, game_state.total_rounds)
+
+     return f"""
+     <div style='padding: 1rem; background-color: #f0f0f0; border-radius: 8px; color: #333;'>
+         <h3 style='margin-bottom: 1rem; color: #333;'>Score Board</h3>
+         <div style='display: flex; justify-content: space-around;'>
+             <div>
+                 <h4 style='color: #333;'>You</h4>
+                 <p style='font-size: 1.5rem; color: #333;'>{game_state.user_score}</p>
+             </div>
+             <div>
+                 <h4 style='color: #333;'>AI Model</h4>
+                 <p style='font-size: 1.5rem; color: #333;'>{game_state.model_score}</p>
+             </div>
+         </div>
+         <div style='margin-top: 1rem;'>
+             <p style='color: #333;'>Round: {current_display_round}/{game_state.total_rounds}</p>
+         </div>
+         {results_html}
+     </div>
+     """
+
+ def start_game():
+     game_state.reset()
+     game_state.game_images = load_images()
+     game_state.is_game_active = True
+     game_state.waiting_for_input = True
+     current_image = Image.open(game_state.game_images[0])
+
+     return (
+         gr.update(value=current_image, visible=True),  # Show image
+         gr.update(visible=False),  # Hide start button
+         gr.update(interactive=True, visible=True, value=None),  # Show radio buttons
+         gr.update(visible=True, interactive=True),  # Show submit button
+         create_score_html(),
+         gr.update(visible=False)  # Hide feedback
+     )
+
+ def submit_guess(user_guess):
+     if not game_state.is_game_active or not game_state.waiting_for_input or user_guess is None:
+         return [gr.update()] * 6  # Return no updates if invalid state
+
+     current_image = Image.open(game_state.game_images[game_state.current_round])
+     model_prediction = process_image(current_image)
+     correct_answer = "Real" if "real_images" in game_state.game_images[game_state.current_round] else "Fake"
+
+     # Determine the model's guess from the probability of the authentic class
+     model_guess = "Real" if model_prediction['Authentic'] > PROB_THRESHOLD else "Fake"
+
+     # Update scores
+     if user_guess == correct_answer:
+         game_state.user_score += 1
+     if model_guess == correct_answer:
+         game_state.model_score += 1
+
+     # Store last results for display
+     game_state.last_results = {
+         'user_guess': user_guess,
+         'model_guess': model_guess,
+         'correct_answer': correct_answer
+     }
+
+     game_state.current_round += 1
+     game_state.waiting_for_input = True
+
+     # Check if game is over
+     if game_state.current_round >= game_state.total_rounds:
+         game_state.is_game_active = False
+         return (
+             gr.update(value=None, visible=False),  # Hide image
+             gr.update(visible=True),  # Show start button
+             gr.update(interactive=False, visible=False, value=None),  # Hide radio
+             gr.update(visible=False),  # Hide submit button
+             create_score_html(),
+             gr.update(visible=True, value="<div style='text-align: center; margin-top: 20px; font-size: 1.2em;'>Game Over! Click 'Start New Game' to play again.</div>")
+         )
+
+     # Continue to next round
+     next_image = Image.open(game_state.game_images[game_state.current_round])
+     return (
+         gr.update(value=next_image, visible=True),  # Show next image
+         gr.update(visible=False),  # Keep start button hidden
+         gr.update(interactive=True, visible=True, value=None),  # Reset radio
+         gr.update(visible=True, interactive=True),  # Show submit button
+         create_score_html(),
+         gr.update(visible=False)  # Keep feedback hidden
+     )
+
+ # Custom CSS
+ custom_css = """
+ #game-container {
+     max-width: 1200px;
+     margin: 0 auto;
+     padding: 20px;
+ }
+ #start-button {
+     max-width: 200px;
+     margin: 0 auto;
+ }
+ """
+
+ # Define Gradio interface
+ with gr.Blocks(css=custom_css) as iface:
+     with gr.Column(elem_id="game-container"):
+         gr.Markdown("# Real or Fake Image Challenge")
+         gr.Markdown("Can you beat the AI at detecting synthetic images?")
+
+         with gr.Row():
+             with gr.Column(scale=2):
+                 image_display = gr.Image(
+                     type="pil",
+                     label="Current Image",
+                     interactive=False,
+                     visible=False
+                 )
+                 guess_input = gr.Radio(
+                     choices=["Real", "Fake"],
+                     label="Your Guess",
+                     interactive=False,
+                     visible=False
+                 )
+                 submit_button = gr.Button(
+                     "Submit Guess",
+                     visible=False,
+                     variant="primary"
+                 )
+
+             with gr.Column(scale=1):
+                 score_display = gr.HTML()
+
+         with gr.Row():
+             with gr.Column(elem_id="start-button"):
+                 start_button = gr.Button("Start New Game", variant="primary", size="sm")
+
+         feedback_display = gr.Markdown(visible=False)
+
+     # Event handlers
+     start_button.click(
+         fn=start_game,
+         outputs=[
+             image_display,
+             start_button,
+             guess_input,
+             submit_button,
+             score_display,
+             feedback_display
+         ]
+     )
+
+     submit_button.click(
+         fn=submit_guess,
+         inputs=[guess_input],
+         outputs=[
+             image_display,
+             start_button,
+             guess_input,
+             submit_button,
+             score_display,
+             feedback_display
+         ]
+     )
+
+ # Launch the interface
+ iface.launch()
config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "model_architecture": "ResNet18",
+   "num_classes": 6,
+   "input_size": [224, 224],
+   "pretrained": true,
+   "learning_rate": 0.0001,
+   "batch_size": 256
+ }
fake_images/example_mjv5.jpg ADDED
fake_images/example_sdxl.jpg ADDED
real_images/example_authentic.jpg ADDED
real_images/example_dalle3.jpg ADDED
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ torch
+ torchvision
+ pillow
+ scikit-image
+ gradio