#+TITLE: Spoof Detect

Detect spoofed websites by spotting logos of banks and financial entities on
pages whose =SSL certificate= does not match the one collected for that entity.

The process is pretty simple:
 - [1/2] scrape government websites to get a list of entities.
   - [x] 🇦🇷  BCRA
   - [ ] other countries 
 - [x] get logos, names and url
 - [x] navigate to each url, extract the ssl certificate and look for =img= tags and
   tags with =id= or =class= logo (needs more heuristics) to build a db of logos
 - [x] screenshot the page and slice it into tiles generating YOLO annotations for
   the detected logos
 - [x] augment data using the logos database and the logoless tiles as background images
 - [x] train yolov5s
 - [x] feed everything to a web extension that detects the logos in any page and
   shows a warning if the =SSL certificate= does not match the collected one.
   (this is actually a bit hacky: yolov5 has A LOT of postprocessing and i can't
   be bothered to reimplement it all in JS for TFJS to work, so it currently
   relies on a rather hackish local daemon written in python)
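
The tiling step above can be sketched as follows. This is only an illustration,
not the code the project runs: the function name =tile_annotations=, the box
layout and the assumption that edge tiles are padded to the full tile size are
all hypothetical. It clips each detected logo box to every tile it touches and
emits YOLO label lines (=class x_center y_center width height=, normalised to
the tile).

#+begin_src python
from typing import Dict, List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in page pixels.
Box = Tuple[int, int, int, int]


def tile_annotations(page_w: int, page_h: int, tile: int,
                     boxes: List[Box], class_id: int = 0
                     ) -> Dict[Tuple[int, int], List[str]]:
    """Map each tile origin (x, y) to its YOLO label lines.

    Tiles without any logo map to an empty list, so they can be reused
    as logoless backgrounds. Assumes edge tiles are padded to tile size.
    """
    labels: Dict[Tuple[int, int], List[str]] = {}
    for ty in range(0, page_h, tile):
        for tx in range(0, page_w, tile):
            lines = []
            for x0, y0, x1, y1 in boxes:
                # Clip the logo box to the current tile.
                cx0, cy0 = max(x0, tx), max(y0, ty)
                cx1, cy1 = min(x1, tx + tile), min(y1, ty + tile)
                if cx0 >= cx1 or cy0 >= cy1:
                    continue  # the box does not overlap this tile
                # Convert to tile-relative, normalised centre/size.
                xc = ((cx0 + cx1) / 2 - tx) / tile
                yc = ((cy0 + cy1) / 2 - ty) / tile
                w = (cx1 - cx0) / tile
                h = (cy1 - cy0) / tile
                lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
            labels[(tx, ty)] = lines
    return labels


if __name__ == "__main__":
    # One 100x50 logo on a page that is two 416px tiles wide.
    for origin, lines in tile_annotations(832, 416, 416,
                                          [(100, 100, 200, 150)]).items():
        print(origin, lines)
#+end_src

A logo that straddles a tile border ends up as a partial box in each tile; a
real pipeline would probably drop slivers below some minimum size.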

* running
#+begin_src sh
  # build the training dataset
  docker-compose up --build --remove-orphans -d
  docker-compose exec python ./run

  # run the training on your machine or on colab
  # https://colab.research.google.com/drive/10R7uwVJJ1R1k6oTjbkkhxPDka7COK-WE
  git clone https://github.com/ultralytics/yolov5  # clone repo
  pip install -U -r yolov5/requirements.txt  # install dependencies
  python3 yolov5/train.py --img 416 --batch 80 --epochs 100 --data ./ia/data.yaml  --cfg ./ia/yolov5s.yaml --weights ''

#+end_src

* research
** yolo
https://github.com/ModelDepot/tfjs-yolo-tiny
https://github.com/Hyuto/yolov5-tfjs

** augmentation
there are a lot of augmentation solutions out there; we went with imgaug because
it has better pipelines and multicore support:
 - https://github.com/aleju/imgaug

leaving the others here for reference:
 - https://github.com/srp-31/Data-Augmentation-for-Object-Detection-YOLO-
 - https://github.com/mdbloice/Augmentor 
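
Whatever library does the pixel work, each synthetic sample needs a label for
the pasted logo. The sketch below shows only that bookkeeping, under
assumptions: =place_logo= is a hypothetical helper, not part of imgaug or of
this repo, and the logo is pasted unrotated and unscaled onto a logoless tile.

#+begin_src python
import random
from typing import Tuple


def place_logo(bg_size: Tuple[int, int], logo_size: Tuple[int, int],
               rng: random.Random, class_id: int = 0
               ) -> Tuple[Tuple[int, int], str]:
    """Pick a random paste offset and return it with the YOLO label line."""
    bg_w, bg_h = bg_size
    logo_w, logo_h = logo_size
    if logo_w > bg_w or logo_h > bg_h:
        raise ValueError("logo does not fit in the background tile")
    # Top-left corner, chosen so the logo stays fully inside the tile.
    x = rng.randint(0, bg_w - logo_w)
    y = rng.randint(0, bg_h - logo_h)
    # YOLO labels use the normalised box centre and size.
    xc = (x + logo_w / 2) / bg_w
    yc = (y + logo_h / 2) / bg_h
    label = (f"{class_id} {xc:.6f} {yc:.6f} "
             f"{logo_w / bg_w:.6f} {logo_h / bg_h:.6f}")
    return (x, y), label


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    print(place_logo((416, 416), (120, 40), rng))
#+end_src

If the augmentation pipeline also scales, rotates or crops the logo, the box has
to go through the same transform before the label is written.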

** providers
http://www.bcra.gov.ar/SistemasFinancierosYdePagos/Proveedores-servicios-de-pago-ofrecen-cuentas-de-pago.asp
http://www.bcra.gov.ar/SistemasFinancierosYdePagos/Proveedores-servicios-de-billeteras-digitales-Interoperables.asp

http://www.bcra.gob.ar/SistemasFinancierosYdePagos/Entidades_financieras.asp

** certs in browsers
https://stackoverflow.com/questions/6566545/is-there-any-way-to-access-certificate-information-from-a-chrome-extension
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest#accessing_security_information
https://chromium-review.googlesource.com/c/chromium/src/+/644858
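
Since browsers expose certificate details unevenly, the comparison could also
live in the local python daemon. A minimal sketch using only the standard
library, assuming certificates are compared by the SHA-256 fingerprint of the
leaf certificate; the helper names and that choice of fingerprint are
assumptions, not what the extension currently does.

#+begin_src python
import hashlib
import socket
import ssl
from typing import Set


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def fetch_der_cert(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Open a TLS connection and return the server's leaf certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


def is_suspicious(seen: str, known: Set[str]) -> bool:
    """True when a page shows a known logo but an unknown certificate."""
    return seen not in known


if __name__ == "__main__":
    # Needs network access; the host is one of the entity-list sources above.
    seen = fingerprint(fetch_der_cert("www.bcra.gob.ar"))
    print(seen, is_suspicious(seen, known=set()))
#+end_src

Leaf certificates are rotated regularly, so an exact fingerprint match goes
stale quickly; comparing the subject or issuer fields may hold up better.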

** papers
https://logomotive.sidnlabs.nl/downloads/LogoMotive_paper.pdf