#+TITLE: Spoof Detect

Detect spoofed websites by spotting logos of banks and financial entities on
pages whose =SSL certificate= does not match the one collected for that entity.

The process is pretty simple:
 - [1/2] scrape government websites to get a list of entities
   - [x] 🇦🇷 BCRA ok
   - [ ] other countries
 - [x] get logos, names and URLs
 - [x] navigate to each URL, extract the SSL certificate and look for =img= tags
   and other tags whose =id= or =class= contains "logo" (needs more heuristics)
   to build a db of logos
 - [x] screenshot the page and slice it into tiles, generating YOLO annotations
   for the detected logos
 - [x] augment the data using the logos database, with the logoless tiles as
   background images
 - [2/3] train YOLO
   - [x] v5
   - [x] v6
   - [ ] v7 (actually slower than v6)
 - [ ] feed everything to a web extension that detects the logos in any page
   and shows a warning if the =SSL certificate= does not match the collected one
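
The tiling and annotation step can be sketched roughly as below. This is a
minimal illustration, not the code in this repo: =Box=, =tile_grid= and
=yolo_label= are hypothetical names, the tile size of 416 is assumed to match
the =--img 416= training flag, and the 50% visibility threshold is an arbitrary
choice. Each label line follows the usual YOLO text format
(=class cx cy w h=, normalised to the tile).

#+begin_src python
  from dataclasses import dataclass


  @dataclass(frozen=True)
  class Box:
      """Axis-aligned box in pixels: left, top, right, bottom."""
      x0: int
      y0: int
      x1: int
      y1: int


  def tile_grid(width, height, tile=416):
      """Yield non-overlapping tile boxes covering a screenshot.

      Tiles on the right and bottom edges are clipped to the image bounds.
      """
      for top in range(0, height, tile):
          for left in range(0, width, tile):
              yield Box(left, top, min(left + tile, width), min(top + tile, height))


  def yolo_label(logo, tile_box, class_id=0, min_visible=0.5):
      """Return 'class cx cy w h' normalised to the tile, or None.

      None is returned when less than min_visible of the logo area
      falls inside the tile (assumed threshold, tune as needed).
      """
      # Intersection of the logo with the tile, in page coordinates.
      ix0, iy0 = max(logo.x0, tile_box.x0), max(logo.y0, tile_box.y0)
      ix1, iy1 = min(logo.x1, tile_box.x1), min(logo.y1, tile_box.y1)
      if ix1 <= ix0 or iy1 <= iy0:
          return None
      logo_area = (logo.x1 - logo.x0) * (logo.y1 - logo.y0)
      if (ix1 - ix0) * (iy1 - iy0) < min_visible * logo_area:
          return None
      tile_w = tile_box.x1 - tile_box.x0
      tile_h = tile_box.y1 - tile_box.y0
      # Centre and size of the visible part, relative to the tile.
      cx = ((ix0 + ix1) / 2 - tile_box.x0) / tile_w
      cy = ((iy0 + iy1) / 2 - tile_box.y0) / tile_h
      w = (ix1 - ix0) / tile_w
      h = (iy1 - iy0) / tile_h
      return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
#+end_src

Tiles for which every logo yields =None= would be the "logoless" tiles that can
serve as augmentation backgrounds.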

* running
#+begin_src sh
  # build the training dataset
  docker-compose up --build --remove-orphans -d
  docker-compose exec python ./run

  # run the training on your machine or on Colab
  # https://colab.research.google.com/drive/10R7uwVJJ1R1k6oTjbkkhxPDka7COK-WE
  git clone https://github.com/ultralytics/yolov5  # clone repo
  pip install -U -r yolov5/requirements.txt  # install dependencies
  python3 yolov5/train.py --img 416 --batch 80 --epochs 100 --data ./ia/data.yaml  --cfg ./ia/yolov5s.yaml --weights ''

#+end_src

* research
** yolo
https://github.com/ModelDepot/tfjs-yolo-tiny
https://github.com/Hyuto/yolov5-tfjs

** augmentation
There are a lot of augmentation solutions out there. We went with imgaug
because it has better pipelines and multicore support:
 - https://github.com/aleju/imgaug

The others are kept here for reference:
 - https://github.com/srp-31/Data-Augmentation-for-Object-Detection-YOLO-
 - https://github.com/mdbloice/Augmentor 
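
Independently of the library used, the bookkeeping behind "paste a logo on a
logoless tile" is small. The sketch below is standard-library Python only and
does not use imgaug: it only picks a position and writes the matching YOLO
label, while the actual pixel compositing is left to the image library.
=place_logo= and =synth_label= are hypothetical names, and sizes are assumed to
be =(width, height)= in pixels.

#+begin_src python
  import random


  def place_logo(bg_size, logo_size, rng):
      """Pick a random top-left corner so the logo fits inside the background.

      Returns (left, top) in pixels.
      """
      bg_w, bg_h = bg_size
      logo_w, logo_h = logo_size
      if logo_w > bg_w or logo_h > bg_h:
          raise ValueError("logo is larger than the background tile")
      return rng.randint(0, bg_w - logo_w), rng.randint(0, bg_h - logo_h)


  def synth_label(bg_size, logo_size, corner, class_id=0):
      """Return the YOLO line 'class cx cy w h' for a logo pasted at corner."""
      bg_w, bg_h = bg_size
      logo_w, logo_h = logo_size
      left, top = corner
      cx = (left + logo_w / 2) / bg_w
      cy = (top + logo_h / 2) / bg_h
      return f"{class_id} {cx:.6f} {cy:.6f} {logo_w / bg_w:.6f} {logo_h / bg_h:.6f}"


  # Usage: a seeded generator keeps the synthetic dataset reproducible.
  rng = random.Random(0)
  corner = place_logo((416, 416), (104, 52), rng)
  label = synth_label((416, 416), (104, 52), corner)
#+end_src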

** providers
http://www.bcra.gov.ar/SistemasFinancierosYdePagos/Proveedores-servicios-de-pago-ofrecen-cuentas-de-pago.asp
http://www.bcra.gov.ar/SistemasFinancierosYdePagos/Proveedores-servicios-de-billeteras-digitales-Interoperables.asp

http://www.bcra.gob.ar/SistemasFinancierosYdePagos/Entidades_financieras.asp

** certs in browsers
https://stackoverflow.com/questions/6566545/is-there-any-way-to-access-certificate-information-from-a-chrome-extension
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest#accessing_security_information
https://chromium-review.googlesource.com/c/chromium/src/+/644858
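
On the collection side (outside the browser), reading a certificate and
comparing it with the stored one might look like the sketch below. It uses only
the Python standard library; the function names are hypothetical, and comparing
SHA-256 fingerprints of the leaf certificate is an assumption about how the
"mismatch" check could be done, not necessarily what this project does. Note
that an exact fingerprint changes whenever a site renews its certificate, so
the collected set would need refreshing.

#+begin_src python
  import hashlib
  import socket
  import ssl


  def fetch_cert_der(host, port=443, timeout=10.0):
      """Return the server's leaf certificate as DER bytes (needs network)."""
      context = ssl.create_default_context()
      with socket.create_connection((host, port), timeout=timeout) as sock:
          with context.wrap_socket(sock, server_hostname=host) as tls:
              return tls.getpeercert(binary_form=True)


  def fingerprint(der_bytes):
      """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
      return hashlib.sha256(der_bytes).hexdigest()


  def is_spoof_suspect(logo_detected, seen_fingerprint, known_fingerprints):
      """True when a known logo is shown under a certificate we did not collect."""
      return bool(logo_detected) and seen_fingerprint not in known_fingerprints
#+end_src

For example, =fingerprint(fetch_cert_der("www.bcra.gob.ar"))= would give the
value to store for that entity at scrape time.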

** papers
https://logomotive.sidnlabs.nl/downloads/LogoMotive_paper.pdf