karma689 committed on
Commit a1d2f62 · verified · 1 Parent(s): 0662896

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+patches_clahe/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+patches_color/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+whole_page/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,150 @@
# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned Tibetan script classification checkpoints for 18 classes, trained from the DINOv3 ViT-S backbone:

- Backbone: `facebook/dinov3-vits16-pretrain-lvd1689m`
- Task: 18-way script classification
- Training script included: `finetune_dinov3.py`

**Hugging Face access:** DINOv3 requires access approval at [huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) before `from_pretrained` / downloads will work. Anyone cloning this repo will see the same gated-model error until their HF account is granted access and they are logged in (`huggingface-cli login` or `HF_TOKEN`).

## Label Set

`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing (per experiment)

Images for training were produced as follows (see `preprocess.py` in the parent project):

- **`whole_page`:** resize so the **short edge is 224 px**, then **center crop** to 224×224 (one crop per source page).
- **`patches_color`:** same short-edge resize to 224, then **sliding-window** 224×224 patches with **25% overlap** between windows (multiple crops per page).
- **`patches_clahe`:** identical patch layout to `patches_color`; each patch is converted to grayscale and **CLAHE** contrast normalization is applied (`clipLimit=2.0`, `tileGridSize=(8,8)`), then saved as BGR/RGB for training.
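
For the patch variants, the only geometry involved is where the 224×224 windows start along each axis. A minimal sketch of that layout, assuming stride = patch × (1 − overlap) = 168 px and a flush-right final window; `window_origins` is an illustrative helper, not the actual `preprocess.py` code:

```python
def window_origins(length, patch=224, overlap=0.25):
    """Top-left offsets for `patch`-px windows with fractional `overlap`
    along one axis. Stride is patch * (1 - overlap) = 168 px for the
    defaults; the last window is shifted back so it ends exactly at
    `length` instead of running out of bounds."""
    stride = int(patch * (1 - overlap))
    if length <= patch:
        return [0]  # image smaller than one window: single crop
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)  # final flush-right window
    return origins
```

Applied to both image axes, the cross product of the two origin lists gives the full patch grid for one page.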

## Training recipe

- **Progressive unfreezing (defaults in `finetune_dinov3.py`):**
  - **Stage A — head only:** 20 epochs, backbone frozen, classifier head LR **1e-3** (backbone LR 0).
  - **Stage B — last 2 blocks:** 10 epochs, backbone LR **1e-5**, head LR **1e-3**.
  - **Stage C — last 4 blocks:** 10 epochs, backbone LR **5e-6**, head LR **5e-4**.
- **Loss:** class-weighted **cross-entropy** with inverse-frequency weights over the training split (`nn.CrossEntropyLoss(weight=...)`).
- **Sampling:** the published runs use a standard `DataLoader` with **`shuffle=True`**. The script also defines **`get_weighted_sampler` → `WeightedRandomSampler`** if you want to switch the train loader to explicit class-balanced sampling.
- **Document-aware augmentations (train only):** `RandomRotation` **±5°** (fill white), `ColorJitter` brightness/contrast **±20%** (`0.2`), plus `RandomResizedCrop` and light `RandomErasing` as in `ScriptDataset`; **no horizontal flip**.
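
The inverse-frequency weighting behind the loss is easiest to see on a toy example. This is a plain-Python sketch of the same formula as `get_class_weights` in `finetune_dinov3.py` (`total / (num_classes * count)` per class); the real code builds a `torch` tensor ordered by `label_to_idx`:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """weight[c] = N / (num_classes * count[c]); a perfectly balanced
    label list yields 1.0 for every class, rarer classes get more."""
    counts = Counter(labels)
    total = len(labels)
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Toy split: 3 'peri' pages vs 1 'trinyig' page.
w = inverse_frequency_weights(["peri"] * 3 + ["trinyig"])
# peri: 4 / (2 * 3) ≈ 0.667, trinyig: 4 / (2 * 1) = 2.0
```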

## Class distribution (`whole_page` split totals)

The whole-page split has **5,684** samples in total (**train 3,996 / val 844 / test 844**). The **844** figure in test metrics is **only the test split** (roughly 15% of pages per class held out for testing), not the full dataset.

The per-class table below sums **train + val + test** counts. A benchmark exclusion manifest (`benchmark_page_ids.json`, **88** page IDs across 18 classes) is consulted when building splits: any file whose page ID matches is skipped and counted in `splits.json` under `skipped_excluded_files_by_class`. For the published `whole_page/splits.json`, that skip map is **empty**: either those pages were **not present** under the training `data_dir`, or they had **already been removed** before this split was built. The **5,684** totals are whatever files remained for stratified splitting.

| Class | Samples |
|---|---:|
| dhumri | 98 |
| difficult | 170 |
| drathung | 129 |
| drudring | 132 |
| druring | 119 |
| druthung | 207 |
| khyuyig | 113 |
| multi_scripts | 235 |
| non_tibetan | 192 |
| peri | 614 |
| petsuk | 1388 |
| trinyig | 42 |
| tsegdrig | 749 |
| tsugchung | 77 |
| tsumachug | 178 |
| uchen_sugdring | 835 |
| uchen_sugthung | 240 |
| yigchung | 166 |

## Experiments Included

### 1) `whole_page`

- Files: `whole_page/final_model.pt`, `results.json`, `confusion_matrix.png`, `confusion_matrix.csv`, `splits.json`
- Test (image-level) macro-F1: **0.5124**
- Test accuracy: **0.5711**

### 2) `patches_color`

- Files: `patches_color/final_model.pt`, `results.json`, `confusion_matrix.png`, `checkpoint_page_eval.json`, `splits.json`
- Test (patch-level) macro-F1: **0.4899**
- Re-eval **page-level** macro-F1 for shipped `final_model.pt` (`checkpoint_page_eval.json`): **0.5017**
- Best **page-level** macro-F1 among stage checkpoints on the same grid: **0.5043** (**Stage A**)

### 3) `patches_clahe`

- Files: `patches_clahe/final_model.pt`, `results.json`, `confusion_matrix.png`, `checkpoint_page_eval.json`, `splits.json`
- Test (patch-level) macro-F1: **0.4911**
- Re-eval page-level macro-F1 for shipped `final_model.pt`: **0.5261**
- Best **page-level** macro-F1 among stage checkpoints: **0.529** (**Stage B**)

## Which stage produced which checkpoint?

- **`final_model.pt` in each folder** is the stage with the highest **validation macro-F1** among `best_stage_*.pt` checkpoints (see `best_val_checkpoint` in each `results.json`): **Stage B** for `whole_page`, **Stage C** for both `patches_color` and `patches_clahe`.
- For **page-level** quality on the patch runs, the best single stage on the re-eval grid differs: **Stage A** (`patches_color`) and **Stage B** (`patches_clahe`) beat their respective `final_model.pt` page scores. Use `checkpoint_page_eval.json` if you want to deploy a stage checkpoint instead of the val-selected default.

## Which experiment won?

CLAHE patches achieved the highest **page-level** macro-F1 (**0.529** on the best stage checkpoint), while **whole page** achieved the best **image-level** macro-F1 (**0.512**). **Whole page** is recommended for production due to simpler inference.

## How To Load a Checkpoint

```python
import torch
from pathlib import Path
from finetune_dinov3 import DINOv3Classifier, DINOV3_MODEL_ID

ckpt_path = Path("whole_page/final_model.pt")
payload = torch.load(ckpt_path, map_location="cpu")

label_to_idx = payload["label_to_idx"]
idx_to_label = {v: k for k, v in label_to_idx.items()}
num_classes = len(label_to_idx)

model = DINOv3Classifier(DINOV3_MODEL_ID, num_classes)
model.load_state_dict(payload["model_state_dict"])
model.eval()
```

## Inference (Single Image)

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

# `DINOV3_MODEL_ID`, `model`, and `idx_to_label` come from the loading snippet above.
processor = AutoImageProcessor.from_pretrained(DINOV3_MODEL_ID)
img = Image.open("example.png").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs["pixel_values"])

probs = torch.softmax(logits, dim=1)[0].cpu().numpy()
pred_idx = int(probs.argmax())
pred_label = idx_to_label[pred_idx]
print(pred_label, float(probs[pred_idx]))
```

## Page-Level Inference (Patch Aggregation)

For patch experiments (`patches_color`, `patches_clahe`), aggregate by page stem:

1. group patch probabilities by page ID (strip the `_pN` suffix),
2. average the probabilities per page,
3. take the `argmax` of the averaged probabilities.

This is the same logic used in the re-evaluation script output (`checkpoint_page_eval.json`).
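
As a dependency-free sketch of those three steps (`aggregate_pages` is an illustrative name; the repo's `evaluate_page_level` in `finetune_dinov3.py` implements the same mean-probability vote with NumPy):

```python
import re
from collections import defaultdict

def aggregate_pages(patch_probs):
    """patch_probs: {filename_stem: [p_class0, p_class1, ...]}.
    Returns {page_id: predicted_class_index} via a mean-probability vote."""
    by_page = defaultdict(list)
    for stem, probs in patch_probs.items():
        page = re.sub(r"_p\d+$", "", stem)  # strip the _pN patch suffix
        by_page[page].append(probs)
    preds = {}
    for page, plist in by_page.items():
        # Average each class column across the page's patches, then argmax.
        avg = [sum(col) / len(plist) for col in zip(*plist)]
        preds[page] = max(range(len(avg)), key=avg.__getitem__)
    return preds
```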

## Known Limitations

- Class imbalance is high (for example, `petsuk` and `uchen_sugdring` dominate, while `trinyig` is small).
- Results vary by preprocessing variant and by patch- vs page-level evaluation protocol.
- Patch-level and page-level metrics are not directly interchangeable.
- The model expects Tibetan manuscript-style inputs; performance can drop on out-of-domain scans or mixed/noisy pages.
- Checkpoints are tied to the exact label mapping saved in each payload (`label_to_idx`).

## Reproducibility Notes

- Exclusion manifest support is enabled in training (`benchmark_page_ids.json`).
- The full training code used for these artifacts is included at `finetune_dinov3.py`.
finetune_dinov3.py ADDED
@@ -0,0 +1,965 @@
"""
DINOv3 fine-tuning for script classification
============================================

Progressive fine-tuning with page-level train/val/test split.
Runs on three preprocessed variants:
    - whole_page/
    - patches_color/
    - patches_clahe/

Usage:
    # Exp1: whole_page
    python finetune_dinov3.py --data_dir ./Data/output/whole_page --experiment whole_page

    # Exp2: patches_color
    python finetune_dinov3.py --data_dir ./Data/output/patches_color --experiment patches_color

    # Exp3: patches_clahe
    python finetune_dinov3.py --data_dir ./Data/output/patches_clahe --experiment patches_clahe

Outputs (under --output_dir/<experiment>/):
    best_<stage_slug>.pt — best val macro-F1 per stage
    history_stage_{a,b,c}.json — per-epoch metrics per stage
    training_history_stage_{a,b,c}.png — curves per stage
    final_model.pt — weights chosen by best val across stages + test_metrics metadata
    results.json, confusion_matrix.*, training_history.png (full run)

Requirements:
    pip install torch torchvision transformers scikit-learn matplotlib seaborn
    # DINOv3 requires transformers >= 4.56.0
    # If not available: pip install --upgrade git+https://github.com/huggingface/transformers.git
"""
import os
import re
import json
import argparse
import random
from pathlib import Path
from collections import Counter, defaultdict
from datetime import datetime

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import transforms
from PIL import Image
from sklearn.metrics import (
    classification_report, confusion_matrix, f1_score, accuracy_score
)

try:
    from transformers import AutoImageProcessor, AutoModel
except ImportError:
    raise ImportError(
        "transformers >= 4.56.0 required for DINOv3.\n"
        "Install: pip install --upgrade git+https://github.com/huggingface/transformers.git"
    )

# =====================
# CONFIG
# =====================

DINOV3_MODEL_ID = "facebook/dinov3-vits16-pretrain-lvd1689m"
EMBEDDING_DIM = 384
VALID_EXT = {'.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp', '.webp'}
SEED = 42


def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# ======================
# Page-level splitting
# ======================

def get_page_name(filepath):
    """
    Extract the original page name from a patch filename.
    e.g., 'manuscript001_p3.png' → 'manuscript001'
    e.g., 'manuscript001.png' → 'manuscript001'

    This ensures all patches from the same page stay in the same split.
    """
    stem = Path(filepath).stem
    page_name = re.sub(r'_p\d+$', '', stem)
    return page_name

def normalize_label_key(label: str) -> str:
    """Normalize class names for manifest lookup."""
    return re.sub(r'[^a-z0-9]+', '_', label.lower()).strip('_')


def load_exclusion_manifest(manifest_path: str):
    """
    Load class -> page_ids exclusions from JSON.
    Returns a dict keyed by normalized class labels.
    """
    if not manifest_path:
        return {}

    path = Path(manifest_path)
    if not path.is_file():
        print(f" Exclusion manifest not found, skipping exclusions: {path}")
        return {}

    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    if not isinstance(raw, dict):
        raise ValueError(f"Exclusion manifest must be a JSON object: {path}")

    manifest = {}
    for label, ids in raw.items():
        if not isinstance(ids, list):
            continue
        norm_label = normalize_label_key(str(label))
        manifest[norm_label] = {str(x).strip() for x in ids if str(x).strip()}
    return manifest

def create_page_level(data_dir, val_ratio=0.15, test_ratio=0.15, seed=SEED, excluded_pages_by_label=None):
    """
    Split at the PAGE level, not the image/patch level.
    All patches from one page go into the same split.

    Returns:
        splits: dict with 'train', 'val', 'test' keys;
            each value is a list of (filepath, label) tuples
        label_to_idx: dict mapping label strings to integers
        idx_to_label: the inverse mapping
        skipped_by_label: per-class count of files skipped via the exclusion manifest
    """
    set_seed(seed)
    data_dir = Path(data_dir)

    class_pages = defaultdict(lambda: defaultdict(list))
    skipped_by_label = Counter()

    for cls_dir in sorted(data_dir.iterdir()):
        if not cls_dir.is_dir() or cls_dir.name.startswith('.'):
            continue
        label = cls_dir.name
        excluded_pages = set()
        if excluded_pages_by_label:
            excluded_pages = excluded_pages_by_label.get(normalize_label_key(label), set())
        for img_path in sorted(cls_dir.iterdir()):
            if img_path.suffix.lower() in VALID_EXT:
                page = get_page_name(str(img_path))
                if page in excluded_pages:
                    skipped_by_label[label] += 1
                    continue
                class_pages[label][page].append(str(img_path))

    # Create label mapping
    labels = sorted(class_pages.keys())
    label_to_idx = {label: idx for idx, label in enumerate(labels)}
    idx_to_label = {idx: label for label, idx in label_to_idx.items()}

    # Split pages per class (stratified)
    splits = {'train': [], 'val': [], 'test': []}

    for label in labels:
        pages = list(class_pages[label].keys())
        random.shuffle(pages)

        n_pages = len(pages)
        n_test = max(1, int(n_pages * test_ratio))
        n_val = max(1, int(n_pages * val_ratio))
        n_train = n_pages - n_test - n_val

        test_pages = pages[:n_test]
        val_pages = pages[n_test:n_test + n_val]
        train_pages = pages[n_test + n_val:]

        for page in train_pages:
            for fpath in class_pages[label][page]:
                splits['train'].append((fpath, label))
        for page in val_pages:
            for fpath in class_pages[label][page]:
                splits['val'].append((fpath, label))
        for page in test_pages:
            for fpath in class_pages[label][page]:
                splits['test'].append((fpath, label))

    return splits, label_to_idx, idx_to_label, dict(skipped_by_label)

class ScriptDataset(Dataset):
    def __init__(self, samples, label_to_idx, processor, augment=False):
        self.samples = samples
        self.label_to_idx = label_to_idx
        self.processor = processor
        self.augment = augment

        # Document-aware augmentation: geometric/photometric ops on the PIL
        # image (so fill=255 really is white), then RandomErasing on a tensor.
        if augment:
            self.aug_transform = transforms.Compose([
                transforms.RandomRotation(degrees=5, fill=255),
                transforms.ColorJitter(brightness=0.2, contrast=0.2),
                transforms.RandomResizedCrop(224, scale=(0.7, 1.0), ratio=(0.9, 1.1)),
                transforms.ToTensor(),
                transforms.RandomErasing(p=0.1, scale=(0.02, 0.08)),
            ])
        else:
            self.aug_transform = None

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        file_path, label_str = self.samples[idx]

        # Load image
        img = Image.open(file_path).convert('RGB')

        if self.aug_transform is not None and self.augment:
            img = self.aug_transform(img)        # ends as a tensor (RandomErasing)
            img = transforms.ToPILImage()(img)

        # Process with DINOv3 processor (resize, normalize)
        inputs = self.processor(images=img, return_tensors="pt")
        pixel_values = inputs['pixel_values'].squeeze(0)

        label_idx = self.label_to_idx[label_str]

        return pixel_values, label_idx

class DINOv3Classifier(nn.Module):
    """
    DINOv3 ViT-S backbone + MLP classification head.

    The backbone outputs:
        - CLS token: 384-dim embedding (used for classification)
        - Patch tokens: 196 × 384-dim (not used in this version)
        - Register tokens: 4 × 384-dim (not used)

    Classification head: 384 → 128 → num_classes
    """

    def __init__(self, model_id, num_classes, dropout=0.1):
        super().__init__()

        # Load pretrained backbone
        self.backbone = AutoModel.from_pretrained(model_id)

        # Get embedding dim
        hidden_size = self.backbone.config.hidden_size

        # Classification head
        self.head = nn.Sequential(
            nn.LayerNorm(hidden_size),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 128),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

        self.freeze_backbone()

    def freeze_backbone(self):
        """Freeze all backbone parameters."""
        for param in self.backbone.parameters():
            param.requires_grad = False

    def unfreeze_last_n_blocks(self, n):
        """
        Unfreeze the last N transformer blocks.
        DINOv3 ViT-S has 12 blocks (layers).
        """
        # First freeze everything
        self.freeze_backbone()

        # HF DINOv3ViTModel: blocks at backbone.model.layer, final norm at backbone.norm
        # (not ViT/BERT-style backbone.encoder.layer).
        if hasattr(self.backbone, "model") and hasattr(self.backbone.model, "layer"):
            layers = self.backbone.model.layer
        elif hasattr(self.backbone, "encoder") and hasattr(self.backbone.encoder, "layer"):
            layers = self.backbone.encoder.layer
        else:
            raise AttributeError(
                "Backbone has no recognizable transformer blocks "
                "(expected .model.layer for DINOv3 or .encoder.layer for ViT/BERT)."
            )

        total_layers = len(layers)
        for i in range(max(0, total_layers - n), total_layers):
            for param in layers[i].parameters():
                param.requires_grad = True

        if hasattr(self.backbone, "norm"):
            for param in self.backbone.norm.parameters():
                param.requires_grad = True
        elif hasattr(self.backbone, "layernorm"):
            for param in self.backbone.layernorm.parameters():
                param.requires_grad = True

    def forward(self, pixel_values):
        # Get backbone outputs
        outputs = self.backbone(pixel_values=pixel_values)

        # Use CLS token (first token)
        cls_embedding = outputs.last_hidden_state[:, 0, :]

        # Classify
        logits = self.head(cls_embedding)
        return logits

# ====================================
# Training
# ====================================

def get_class_weights(samples, label_to_idx, device):
    """Compute inverse-frequency class weights for balanced training."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    weights = torch.zeros(len(label_to_idx), device=device)
    for label, idx in label_to_idx.items():
        cnt = max(counts.get(label, 1), 1)
        weights[idx] = total / (len(label_to_idx) * cnt)
    return weights


def get_weighted_sampler(samples, label_to_idx):
    """WeightedRandomSampler for balanced batches."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    class_weights = {label: total / count for label, count in counts.items()}
    sample_weights = [class_weights[label] for _, label in samples]
    return WeightedRandomSampler(sample_weights, len(samples), replacement=True)

def train_one_epoch(model, loader, criterion, optimizer, device, scaler=None):
    """Train for one epoch with optional mixed precision."""
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    for batch_idx, (images, labels) in enumerate(loader):
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        if scaler:
            with torch.autocast(device_type='cuda', dtype=torch.float16):
                logits = model(images)
                loss = criterion(logits, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            logits = model(images)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()

        total_loss += loss.item() * images.size(0)
        _, predicted = logits.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)

        if (batch_idx + 1) % 50 == 0:
            print(f"    batch {batch_idx+1}/{len(loader)} | "
                  f"loss: {loss.item():.4f} | acc: {correct/total:.3f}")

    return total_loss / total, correct / total

def _stage_checkpoint_slug(stage_name: str) -> str:
    """Stable filename fragment (no spaces/colons) for checkpoint paths."""
    s = re.sub(r"[^a-z0-9]+", "_", stage_name.lower())
    return re.sub(r"_+", "_", s).strip("_")


@torch.no_grad()
def evaluate(model, loader, criterion, device, idx_to_label=None):
    """Return validation/test metrics and per-sample preds, labels, probs."""
    model.eval()
    total_loss = 0.0
    total = 0
    all_preds = []
    all_labels = []
    all_probs = []

    for images, labels in loader:
        images = images.to(device)
        labels = labels.to(device)
        logits = model(images)
        loss = criterion(logits, labels)
        bs = images.size(0)
        total_loss += loss.item() * bs
        total += bs
        probs = torch.softmax(logits, dim=1)
        pred = logits.argmax(dim=1)
        all_preds.extend(pred.cpu().numpy().tolist())
        all_labels.extend(labels.cpu().numpy().tolist())
        all_probs.extend(probs.cpu().numpy().tolist())

    avg_loss = total_loss / max(total, 1)
    acc = accuracy_score(all_labels, all_preds)
    macro_f1 = f1_score(all_labels, all_preds, average="macro", zero_division=0)
    weighted_f1 = f1_score(all_labels, all_preds, average="weighted", zero_division=0)
    metrics = {
        "loss": float(avg_loss),
        "accuracy": float(acc),
        "macro_f1": float(macro_f1),
        "weighted_f1": float(weighted_f1),
    }
    return metrics, all_preds, all_labels, all_probs

def evaluate_page_level(samples, probs, label_to_idx, idx_to_label):
    """
    Aggregate patch-level probabilities to page-level predictions.

    Args:
        samples: list of (filepath, label_str) for the evaluated split.
        probs: list of per-sample probability vectors (same order as samples).
    """
    if len(samples) != len(probs):
        raise ValueError(
            f"samples/probs length mismatch: {len(samples)} != {len(probs)}"
        )

    page_preds = defaultdict(list)
    page_labels = {}

    # Page-level true labels from file stems
    for filepath, label_str in samples:
        page = get_page_name(filepath)
        page_labels[page] = label_to_idx[label_str]

    # Group probabilities by page
    for (filepath, _), p in zip(samples, probs):
        page = get_page_name(filepath)
        page_preds[page].append(np.asarray(p, dtype=np.float32))

    pages_sorted = sorted(page_preds.keys())
    all_page_true = []
    all_page_pred = []
    page_avg_probs = {}

    for page in pages_sorted:
        avg_probs = np.mean(page_preds[page], axis=0)
        pred_idx = int(np.argmax(avg_probs))
        true_idx = int(page_labels[page])
        all_page_true.append(true_idx)
        all_page_pred.append(pred_idx)
        page_avg_probs[page] = avg_probs.tolist()

    acc = accuracy_score(all_page_true, all_page_pred)
    macro_f1 = f1_score(all_page_true, all_page_pred, average="macro", zero_division=0)
    weighted_f1 = f1_score(all_page_true, all_page_pred, average="weighted", zero_division=0)

    metrics = {
        "accuracy": float(acc),
        "macro_f1": float(macro_f1),
        "weighted_f1": float(weighted_f1),
        "num_pages": int(len(pages_sorted)),
        "num_samples": int(len(samples)),
    }

    return {
        "metrics": metrics,
        "pages": pages_sorted,
        "page_true": all_page_true,
        "page_pred": all_page_pred,
        "page_avg_probs": page_avg_probs,
    }

# ============================
# Progressive fine-tuning
# ============================

def run_stage(model, train_loader, val_loader, criterion, device, stage_name,
              lr_backbone, lr_head, epochs, output_dir, idx_to_label, use_amp=True):
    """Run one stage of progressive fine-tuning."""

    print(f"\n{'='*60}")
    print(f" {stage_name}")
    print(f"{'='*60}")

    # Set up optimizer with different LRs for backbone and head
    param_groups = []

    backbone_params = [p for p in model.backbone.parameters() if p.requires_grad]
    head_params = list(model.head.parameters())

    if backbone_params:
        param_groups.append({'params': backbone_params, 'lr': lr_backbone})
        print(f" Backbone params (trainable): {sum(p.numel() for p in backbone_params):,}")

    param_groups.append({'params': head_params, 'lr': lr_head})
    print(f" Head params: {sum(p.numel() for p in head_params):,}")
    print(f" LR backbone: {lr_backbone}, LR head: {lr_head}")
    print(f" Epochs: {epochs}")

    optimizer = torch.optim.AdamW(param_groups, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    scaler = torch.amp.GradScaler() if use_amp and device.type == 'cuda' else None

    slug = _stage_checkpoint_slug(stage_name)
    checkpoint_path = output_dir / f'best_{slug}.pt'

    best_val_f1 = 0
    best_epoch = 0
    history = []

    for epoch in range(epochs):
        print(f"\n Epoch {epoch+1}/{epochs}")

        train_loss, train_acc = train_one_epoch(
            model, train_loader, criterion, optimizer, device, scaler
        )

        val_metrics, _, _, _ = evaluate(model, val_loader, criterion, device)

        scheduler.step()

        print(f" Train loss: {train_loss:.4f} | acc: {train_acc:.3f}")
        print(f" Val loss: {val_metrics['loss']:.4f} | "
              f"acc: {val_metrics['accuracy']:.3f} | "
              f"macro-F1: {val_metrics['macro_f1']:.3f}")

        history.append({
            'epoch': epoch + 1,
            'train_loss': train_loss,
            'train_acc': train_acc,
            'val_macro_f1': val_metrics['macro_f1'],
            'val_loss': val_metrics['loss'],
            'val_accuracy': val_metrics['accuracy'],
        })

        # Save best model (always use slug path so load paths in main() match)
        if val_metrics['macro_f1'] > best_val_f1:
            best_val_f1 = val_metrics['macro_f1']
            best_epoch = epoch + 1
            torch.save({
                'model_state_dict': model.state_dict(),
                'epoch': epoch + 1,
                'val_macro_f1': best_val_f1,
                'val_accuracy': val_metrics['accuracy'],
                'stage_name': stage_name,
                'stage_slug': slug,
            }, checkpoint_path)
            print(f" * New best! Saved to {checkpoint_path}")

    print(f"\n {stage_name} complete. Best: epoch {best_epoch}, macro-F1: {best_val_f1:.3f}")
    return history, best_val_f1

# ==========================
# MAIN
# ==========================

def _torch_load(path):
    try:
        return torch.load(path, weights_only=False)
    except TypeError:  # older torch without the weights_only kwarg
        return torch.load(path)


def _save_stage_history_json(output_dir: Path, stage_key: str, history: list) -> None:
    """Write one JSON file per training stage (loss / val metrics per epoch)."""
    path = output_dir / f'history_{stage_key}.json'
    with open(path, 'w') as f:
        json.dump(history, f, indent=2, default=str)
    print(f" Stage history saved: {path}")


def _plot_stage_history(output_dir: Path, stage_key: str, history: list, experiment: str) -> None:
    """Save train loss + val macro-F1 curves for a single stage."""
    if not history:
        return
    try:
        import matplotlib
        matplotlib.use('Agg')
        import matplotlib.pyplot as plt

        epochs = [h['epoch'] for h in history]
        train_loss = [h['train_loss'] for h in history]
        val_f1 = [h['val_macro_f1'] for h in history]

        fig, axes = plt.subplots(1, 2, figsize=(12, 4))
        axes[0].plot(epochs, train_loss, 'b-')
        axes[0].set_xlabel('Epoch')
        axes[0].set_ylabel('Train loss')
        axes[0].set_title(f'{stage_key} — train loss')

        axes[1].plot(epochs, val_f1, 'g-')
        axes[1].set_xlabel('Epoch')
        axes[1].set_ylabel('Val macro-F1')
        axes[1].set_title(f'{stage_key} — validation')

        fig.suptitle(f'{experiment} / {stage_key}')
        plt.tight_layout()
        out_path = output_dir / f'training_history_{stage_key}.png'
        plt.savefig(out_path, dpi=150)
        plt.close()
        print(f" Stage plot saved: {out_path}")
    except Exception as e:
        print(f" (Skipping stage plot for {stage_key}: {e})")


def _save_stage_artifacts(output_dir: Path, stage_key: str, history: list, experiment: str) -> None:
    _save_stage_history_json(output_dir, stage_key, history)
    _plot_stage_history(output_dir, stage_key, history, experiment)
612
+def main():
+    parser = argparse.ArgumentParser(description="Fine-tune DINOv3 ViT-S")
+    parser.add_argument(
+        "--data_dir", type=str, required=True,
+        help="Path to processed data (e.g., ./Data/output/whole_page)",
+    )
+    parser.add_argument(
+        "--experiment", type=str, required=True,
+        choices=["whole_page", "patches_color", "patches_clahe"],
+        help="Which experiment variant",
+    )
+    parser.add_argument("--output_dir", type=str, default="./results",
+                        help="Where to save checkpoints and results")
+    parser.add_argument("--batch_size", type=int, default=32,
+                        help="Batch size (reduce if OOM)")
+    parser.add_argument("--epochs_a", type=int, default=20,
+                        help="Epochs for Stage A (head only)")
+    parser.add_argument("--epochs_b", type=int, default=10,
+                        help="Epochs for Stage B (last 2 blocks)")
+    parser.add_argument("--epochs_c", type=int, default=10,
+                        help="Epochs for Stage C (last 4 blocks)")
+    parser.add_argument("--num_workers", type=int, default=4)
+    parser.add_argument("--no_amp", action="store_true",
+                        help="Disable mixed precision")
+    parser.add_argument("--skip_stage_c", action="store_true",
+                        help="Skip Stage C (last 4 blocks)")
+    parser.add_argument(
+        "--exclude_manifest",
+        type=str,
+        default="./benchmark_page_ids.json",
+        help="Optional class->page_ids JSON; excluded pages are skipped during split creation",
+    )
+    args = parser.parse_args()
+
+    stage_a_name = "Stage A: Head only"
+    stage_b_name = "Stage B: Last 2 blocks"
+    stage_c_name = "Stage C: Last 4 blocks"
+
+    set_seed(SEED)
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    output_dir = Path(args.output_dir) / args.experiment
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    print(f"\n{'='*60}")
+    print(f" DINOv3 ViT-S Fine-Tuning")
+    print(f" {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"{'='*60}")
+    print(f" Experiment: {args.experiment}")
+    print(f" Data dir: {args.data_dir}")
+    print(f" Device: {device}")
+    print(f" Batch size: {args.batch_size}")
+    print(f" AMP: {not args.no_amp}")
+    print(f" Exclusions: {args.exclude_manifest}")
+
+    # Page-level split
+    print(f"\n Creating page-level split")
+    excluded_pages_by_label = load_exclusion_manifest(args.exclude_manifest)
+    excluded_label_count = len(excluded_pages_by_label)
+    excluded_id_count = sum(len(v) for v in excluded_pages_by_label.values())
+    if excluded_label_count:
+        print(f" Loaded exclusions: {excluded_label_count} labels, {excluded_id_count} page IDs")
+    splits, label_to_idx, idx_to_label, skipped_by_label = create_page_level(
+        args.data_dir,
+        excluded_pages_by_label=excluded_pages_by_label,
+    )
+    num_classes = len(label_to_idx)
+
+    print(f" Classes: {num_classes}")
+    print(f" Train: {len(splits['train'])} | Val: {len(splits['val'])} | Test: {len(splits['test'])}")
+    if skipped_by_label:
+        print("\n Skipped excluded files by class:")
+        for label, count in sorted(skipped_by_label.items()):
+            print(f" {label:<20s} {count:>6d}")
+
+    # Print per-class split counts
+    for split_name in ['train', 'val', 'test']:
+        counts = Counter(label for _, label in splits[split_name])
+        print(f"\n {split_name}:")
+        for label in sorted(counts.keys()):
+            print(f" {label:<20s} {counts[label]:>6d}")
+
+    # Save splits for reproducibility
+    splits_info = {
+        split_name: [(fp, label) for fp, label in samples]
+        for split_name, samples in splits.items()
+    }
+    with open(output_dir / 'splits.json', 'w') as f:
+        json.dump({
+            'label_to_idx': label_to_idx,
+            'idx_to_label': {str(k): v for k, v in idx_to_label.items()},
+            'split_counts': {
+                name: dict(Counter(l for _, l in samples))
+                for name, samples in splits.items()
+            },
+            'exclude_manifest': str(args.exclude_manifest),
+            'excluded_label_count': excluded_label_count,
+            'excluded_page_id_count': excluded_id_count,
+            'skipped_excluded_files_by_class': dict(skipped_by_label),
+        }, f, indent=2)
+
+    print(f"Loading DINOv3 processor: {DINOV3_MODEL_ID}")
+    processor = AutoImageProcessor.from_pretrained(DINOV3_MODEL_ID)
+
+    train_dataset = ScriptDataset(splits['train'], label_to_idx, processor, augment=True)
+    val_dataset = ScriptDataset(splits['val'], label_to_idx, processor, augment=False)
+    test_dataset = ScriptDataset(splits['test'], label_to_idx, processor, augment=False)
+
+    train_loader = DataLoader(
+        train_dataset, batch_size=args.batch_size, shuffle=True,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+    val_loader = DataLoader(
+        val_dataset, batch_size=args.batch_size, shuffle=False,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+    test_loader = DataLoader(
+        test_dataset, batch_size=args.batch_size, shuffle=False,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+
+    print(f"\n Building DINOv3 classifier ({num_classes} classes)...")
+    model = DINOv3Classifier(DINOV3_MODEL_ID, num_classes, dropout=0.1)
+    model = model.to(device)
+
+    total_params = sum(p.numel() for p in model.parameters())
+    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
+    print(f" Total params: {total_params:,}")
+    print(f" Trainable params: {trainable_params:,} (head only)")
+
+    class_weights = get_class_weights(splits['train'], label_to_idx, device)
+    criterion = nn.CrossEntropyLoss(weight=class_weights)
+
+    use_amp = not args.no_amp and device.type == 'cuda'
+    all_history = {}
+
+    # Stage A: Head only (backbone frozen)
+    model.freeze_backbone()
+    history_a, best_f1_a = run_stage(
+        model, train_loader, val_loader, criterion, device,
+        stage_name=stage_a_name,
+        lr_backbone=0, lr_head=1e-3,
+        epochs=args.epochs_a, output_dir=output_dir,
+        idx_to_label=idx_to_label, use_amp=use_amp,
+    )
+    all_history['stage_a'] = history_a
+    _save_stage_artifacts(output_dir, 'stage_a', history_a, args.experiment)
+
+    ckpt_a = output_dir / f"best_{_stage_checkpoint_slug(stage_a_name)}.pt"
+    best_a = _torch_load(ckpt_a)
+    model.load_state_dict(best_a['model_state_dict'])
+
+    # Stage B: last 2 backbone blocks + head
+    model.unfreeze_last_n_blocks(2)
+    history_b, best_f1_b = run_stage(
+        model, train_loader, val_loader, criterion, device,
+        stage_name=stage_b_name,
+        lr_backbone=1e-5, lr_head=1e-3,
+        epochs=args.epochs_b, output_dir=output_dir,
+        idx_to_label=idx_to_label, use_amp=use_amp,
+    )
+    all_history['stage_b'] = history_b
+    _save_stage_artifacts(output_dir, 'stage_b', history_b, args.experiment)
+
+    if not args.skip_stage_c:
+        ckpt_b = output_dir / f"best_{_stage_checkpoint_slug(stage_b_name)}.pt"
+        best_b = _torch_load(ckpt_b)
+        model.load_state_dict(best_b['model_state_dict'])
+
+        model.unfreeze_last_n_blocks(4)
+        history_c, best_f1_c = run_stage(
+            model, train_loader, val_loader, criterion, device,
+            stage_name=stage_c_name,
+            lr_backbone=5e-6, lr_head=5e-4,
+            epochs=args.epochs_c, output_dir=output_dir,
+            idx_to_label=idx_to_label, use_amp=use_amp,
+        )
+        all_history['stage_c'] = history_c
+        _save_stage_artifacts(output_dir, 'stage_c', history_c, args.experiment)
+
+    # Final evaluation on test set
+    print(f"\n{'='*60}")
+    print(f" FINAL TEST EVALUATION")
+    print(f"{'='*60}")
+
+    best_checkpoints = list(output_dir.glob('best_*.pt'))
+    best_f1 = 0.0
+    best_ckpt = None
+    for ckpt_path in best_checkpoints:
+        ckpt = _torch_load(ckpt_path)
+        if ckpt.get('val_macro_f1', 0) > best_f1:
+            best_f1 = ckpt['val_macro_f1']
+            best_ckpt = ckpt_path
+
+    if best_ckpt is None:
+        raise RuntimeError("No checkpoint found under output_dir; cannot run test evaluation.")
+
+    print(f" Loading best checkpoint: {best_ckpt} (val F1: {best_f1:.3f})")
+    model.load_state_dict(_torch_load(best_ckpt)['model_state_dict'])
+
+    test_metrics, test_preds, test_labels, test_probs = evaluate(
+        model, test_loader, criterion, device, idx_to_label
+    )
+    page_eval = evaluate_page_level(
+        splits['test'],
+        test_probs,
+        label_to_idx=label_to_idx,
+        idx_to_label=idx_to_label,
+    )
+    page_metrics = page_eval["metrics"]
+
+    # Canonical weights for this experiment (same as loaded best val checkpoint, after test eval)
+    final_model_path = output_dir / 'final_model.pt'
+    torch.save(
+        {
+            'model_state_dict': model.state_dict(),
+            'experiment': args.experiment,
+            'model_id': DINOV3_MODEL_ID,
+            'num_classes': num_classes,
+            'label_to_idx': label_to_idx,
+            'source_val_checkpoint': str(best_ckpt),
+            'val_macro_f1_at_selection': float(best_f1),
+            'test_metrics': test_metrics,
+            'page_test_metrics': page_metrics,
+        },
+        final_model_path,
+    )
+    print(f"\n Final model (for deployment / comparison) saved: {final_model_path}")
+
+    print(f"\n Test accuracy: {test_metrics['accuracy']:.3f}")
+    print(f" Test macro-F1: {test_metrics['macro_f1']:.3f}")
+    print(f" Test weighted-F1: {test_metrics['weighted_f1']:.3f}")
+    print(f" Page accuracy: {page_metrics['accuracy']:.3f} "
+          f"| Page macro-F1: {page_metrics['macro_f1']:.3f} "
+          f"| Pages: {page_metrics['num_pages']}")
+
+    # Classification report
+    target_names = [idx_to_label[i] for i in range(num_classes)]
+    report = classification_report(
+        test_labels, test_preds, target_names=target_names, zero_division=0
+    )
+    print(f"\n{report}")
+
+    # Confusion matrices (patch level and page level)
+    cm = confusion_matrix(test_labels, test_preds)
+    page_cm = confusion_matrix(page_eval["page_true"], page_eval["page_pred"])
+
+    # Save everything
+    results = {
+        'experiment': args.experiment,
+        'model': DINOV3_MODEL_ID,
+        'num_classes': num_classes,
+        'best_val_checkpoint': str(best_ckpt),
+        'val_macro_f1_at_selection': float(best_f1),
+        'final_model_path': str(final_model_path),
+        'test_metrics': test_metrics,
+        'page_test_metrics': page_metrics,
+        'history': all_history,
+        'confusion_matrix': cm.tolist(),
+        'page_confusion_matrix': page_cm.tolist(),
+        'label_to_idx': label_to_idx,
+        'classification_report': report,
+        'page_classification_report': classification_report(
+            page_eval["page_true"], page_eval["page_pred"], target_names=target_names, zero_division=0
+        ),
+    }
+
+    with open(output_dir / 'results.json', 'w') as f:
+        json.dump(results, f, indent=2, default=str)
+
+    # Save confusion matrices as CSV
+    import pandas as pd
+    cm_df = pd.DataFrame(cm, index=target_names, columns=target_names)
+    cm_df.to_csv(output_dir / 'confusion_matrix.csv')
+    page_cm_df = pd.DataFrame(page_cm, index=target_names, columns=target_names)
+    page_cm_df.to_csv(output_dir / 'page_confusion_matrix.csv')
+
+    # Plot confusion matrix
+    try:
+        import matplotlib
+        matplotlib.use('Agg')
+        import matplotlib.pyplot as plt
+        import seaborn as sns
+
+        fig, ax = plt.subplots(figsize=(14, 12))
+        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
+                    xticklabels=target_names, yticklabels=target_names, ax=ax)
+        ax.set_xlabel('Predicted label')
+        ax.set_ylabel('True label')
+        ax.set_title(f'Confusion Matrix — {args.experiment} (macro-F1: {test_metrics["macro_f1"]:.3f})')
+        plt.tight_layout()
+        plt.savefig(output_dir / 'confusion_matrix.png', dpi=150)
+        plt.close()
+        print(f"\n Confusion matrix saved: {output_dir / 'confusion_matrix.png'}")
+    except ImportError:
+        print(" (matplotlib/seaborn not available, skipping plot)")
+
+    # Plot page-level confusion matrix (reuses plt/sns imported above; if that
+    # import failed, the resulting NameError is swallowed by the broad except)
+    try:
+        fig, ax = plt.subplots(figsize=(14, 12))
+        sns.heatmap(page_cm, annot=True, fmt='d', cmap='Greens',
+                    xticklabels=target_names, yticklabels=target_names, ax=ax)
+        ax.set_xlabel('Predicted label')
+        ax.set_ylabel('True label')
+        ax.set_title(
+            f'Page Confusion Matrix — {args.experiment} '
+            f'(macro-F1: {page_metrics["macro_f1"]:.3f})'
+        )
+        plt.tight_layout()
+        plt.savefig(output_dir / 'page_confusion_matrix.png', dpi=150)
+        plt.close()
+        print(f" Page confusion matrix saved: {output_dir / 'page_confusion_matrix.png'}")
+    except Exception:
+        pass
+
+    # Plot training history across all stages
+    try:
+        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+        all_epochs = []
+        all_train_loss = []
+        all_val_f1 = []
+        offset = 0
+
+        for stage_name, stage_history in all_history.items():
+            for entry in stage_history:
+                all_epochs.append(entry['epoch'] + offset)
+                all_train_loss.append(entry['train_loss'])
+                all_val_f1.append(entry['val_macro_f1'])
+            offset += len(stage_history)
+
+        axes[0].plot(all_epochs, all_train_loss, 'b-')
+        axes[0].set_xlabel('Epoch')
+        axes[0].set_ylabel('Train Loss')
+        axes[0].set_title('Training Loss')
+
+        axes[1].plot(all_epochs, all_val_f1, 'g-')
+        axes[1].set_xlabel('Epoch')
+        axes[1].set_ylabel('Macro F1')
+        axes[1].set_title('Validation Macro-F1')
+
+        plt.suptitle(f'{args.experiment} — Progressive Fine-Tuning')
+        plt.tight_layout()
+        plt.savefig(output_dir / 'training_history.png', dpi=150)
+        plt.close()
+        print(f" Training history saved: {output_dir / 'training_history.png'}")
+    except Exception:
+        pass
+
+    print(f"\n{'='*60}")
+    print(f" All results saved to: {output_dir}")
+    print(f"{'='*60}\n")
+
+
+if __name__ == "__main__":
+    main()
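Every checkpoint the script writes (`best_*.pt`, `final_model.pt`) stores `label_to_idx` next to the weights, so predictions can be decoded without re-deriving the class order from the data directory. A minimal stdlib-only sketch of that decoding step (the mapping is truncated to a few of the 18 classes for illustration):

```python
# label_to_idx as saved in the checkpoints / splits.json (truncated here).
label_to_idx = {"dhumri": 0, "peri": 9, "uchen_sugdring": 15, "yigchung": 17}

# Invert it once, then map integer predictions from the classifier head
# back to script names.
idx_to_label = {v: k for k, v in label_to_idx.items()}

preds = [9, 17, 0]  # example argmax outputs
print([idx_to_label[i] for i in preds])  # ['peri', 'yigchung', 'dhumri']
```

In the full script this inverse mapping is what `idx_to_label` holds; loading it from the checkpoint keeps inference consistent with the training-time class indices.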
patches_clahe/checkpoint_page_eval.json ADDED
@@ -0,0 +1,76 @@
+{
+  "experiment": "patches_clahe",
+  "data_dir": "./Data/output/patches_clahe",
+  "exclude_manifest": "./benchmark_page_ids.json",
+  "num_classes": 18,
+  "checkpoint_results": {
+    "best_stage_a_head_only.pt": {
+      "patch_metrics": {
+        "loss": 1.252472641594146,
+        "accuracy": 0.5506856023506367,
+        "macro_f1": 0.47513258745169057,
+        "weighted_f1": 0.5599438485560059
+      },
+      "page_metrics": {
+        "accuracy": 0.5793838862559242,
+        "macro_f1": 0.48727754394876716,
+        "weighted_f1": 0.5878127224538057,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5136114729148367,
+      "epoch_at_save": 18
+    },
+    "best_stage_b_last_2_blocks.pt": {
+      "patch_metrics": {
+        "loss": 1.2555613597156252,
+        "accuracy": 0.5614593535749265,
+        "macro_f1": 0.49372885021461854,
+        "weighted_f1": 0.5659541837806491
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5293513765903048,
+        "weighted_f1": 0.6026293588445379,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5008343237274931,
+      "epoch_at_save": 7
+    },
+    "best_stage_c_last_4_blocks.pt": {
+      "patch_metrics": {
+        "loss": 1.2517013753546324,
+        "accuracy": 0.5631733594515181,
+        "macro_f1": 0.49105727133084387,
+        "weighted_f1": 0.5689654809805166
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5260995491182711,
+        "weighted_f1": 0.6006737611724785,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5169021957485435,
+      "epoch_at_save": 5
+    },
+    "final_model.pt": {
+      "patch_metrics": {
+        "loss": 1.2517013753546324,
+        "accuracy": 0.5631733594515181,
+        "macro_f1": 0.49105727133084387,
+        "weighted_f1": 0.5689654809805166
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5260995491182711,
+        "weighted_f1": 0.6006737611724785,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": -1.0,
+      "epoch_at_save": null
+    }
+  }
+}
patches_clahe/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 65a951b7cbf3810bba1f3768fa687abb92acf718842c352daabab3701767a45a
  • Pointer size: 131 Bytes
  • Size of remote file: 199 kB
patches_clahe/final_model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3aac9a9de02a2c21fcd88e5788c1a8389ccea84332381b75e168f784cabb3531
+size 86680521
patches_clahe/results.json ADDED
@@ -0,0 +1,725 @@
1
+ {
2
+ "experiment": "patches_clahe",
3
+ "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
4
+ "num_classes": 18,
5
+ "best_val_checkpoint": "results/patches_clahe/best_stage_c_last_4_blocks.pt",
6
+ "val_macro_f1_at_selection": 0.5169021957485435,
7
+ "final_model_path": "results/patches_clahe/final_model.pt",
8
+ "test_metrics": {
9
+ "loss": 1.2517013753546324,
10
+ "accuracy": 0.5631733594515181,
11
+ "macro_f1": 0.49105727133084387,
12
+ "weighted_f1": 0.5689654809805166
13
+ },
14
+ "history": {
15
+ "stage_a": [
16
+ {
17
+ "epoch": 1,
18
+ "train_loss": 1.8593816012204836,
19
+ "train_acc": 0.362362413371382,
20
+ "val_macro_f1": 0.4319974508023712,
21
+ "val_loss": 1.4901656766267863,
22
+ "val_accuracy": 0.4875974486180014
23
+ },
24
+ {
25
+ "epoch": 2,
26
+ "train_loss": 1.5385081531852787,
27
+ "train_acc": 0.4296269873624134,
28
+ "val_macro_f1": 0.4376218690431116,
29
+ "val_loss": 1.3288157734638857,
30
+ "val_accuracy": 0.5206709189699976
31
+ },
32
+ {
33
+ "epoch": 3,
34
+ "train_loss": 1.462932827907147,
35
+ "train_acc": 0.4516918059518956,
36
+ "val_macro_f1": 0.500725408171822,
37
+ "val_loss": 1.2287645069611655,
38
+ "val_accuracy": 0.5598866052445074
39
+ },
40
+ {
41
+ "epoch": 4,
42
+ "train_loss": 1.4217422395349277,
43
+ "train_acc": 0.46045658377496945,
44
+ "val_macro_f1": 0.48274125175979077,
45
+ "val_loss": 1.2352880899777459,
46
+ "val_accuracy": 0.5454760217339948
47
+ },
48
+ {
49
+ "epoch": 5,
50
+ "train_loss": 1.3553591080213439,
51
+ "train_acc": 0.47141255605381166,
52
+ "val_macro_f1": 0.48175273735132706,
53
+ "val_loss": 1.2118230883516996,
54
+ "val_accuracy": 0.5553980628395937
55
+ },
56
+ {
57
+ "epoch": 6,
58
+ "train_loss": 1.318518064190211,
59
+ "train_acc": 0.4869037912759886,
60
+ "val_macro_f1": 0.4869184819503965,
61
+ "val_loss": 1.1853021681836782,
62
+ "val_accuracy": 0.5648476257973069
63
+ },
64
+ {
65
+ "epoch": 7,
66
+ "train_loss": 1.3179870111525327,
67
+ "train_acc": 0.48303098247044435,
68
+ "val_macro_f1": 0.48765648195901945,
69
+ "val_loss": 1.274599445843398,
70
+ "val_accuracy": 0.5426411528466808
71
+ },
72
+ {
73
+ "epoch": 8,
74
+ "train_loss": 1.273892571670495,
75
+ "train_acc": 0.4956685690990624,
76
+ "val_macro_f1": 0.48543837069544804,
77
+ "val_loss": 1.2382452835718523,
78
+ "val_accuracy": 0.5476021733994803
79
+ },
80
+ {
81
+ "epoch": 9,
82
+ "train_loss": 1.2597664558688233,
83
+ "train_acc": 0.49821646962902566,
84
+ "val_macro_f1": 0.4922089846332509,
85
+ "val_loss": 1.2238617013623296,
86
+ "val_accuracy": 0.5695724072761634
87
+ },
88
+ {
89
+ "epoch": 10,
90
+ "train_loss": 1.2202024454007185,
91
+ "train_acc": 0.5098858540562576,
92
+ "val_macro_f1": 0.512737966359169,
93
+ "val_loss": 1.21785488924658,
94
+ "val_accuracy": 0.5695724072761634
95
+ },
96
+ {
97
+ "epoch": 11,
98
+ "train_loss": 1.2321005250743784,
99
+ "train_acc": 0.5068793314309009,
100
+ "val_macro_f1": 0.503358840147939,
101
+ "val_loss": 1.1998826106607619,
102
+ "val_accuracy": 0.5679187337585637
103
+ },
104
+ {
105
+ "epoch": 12,
106
+ "train_loss": 1.192835262295085,
107
+ "train_acc": 0.5158479412963718,
108
+ "val_macro_f1": 0.4992412596002297,
109
+ "val_loss": 1.2032046473507845,
110
+ "val_accuracy": 0.5653201039451925
111
+ },
112
+ {
113
+ "epoch": 13,
114
+ "train_loss": 1.188643198959982,
115
+ "train_acc": 0.5185487158581329,
116
+ "val_macro_f1": 0.4925479807046458,
117
+ "val_loss": 1.236337137727796,
118
+ "val_accuracy": 0.5530356721001654
119
+ },
120
+ {
121
+ "epoch": 14,
122
+ "train_loss": 1.169746606387851,
123
+ "train_acc": 0.5211475743986955,
124
+ "val_macro_f1": 0.49465692373316666,
125
+ "val_loss": 1.2227921510824153,
126
+ "val_accuracy": 0.5653201039451925
127
+ },
128
+ {
129
+ "epoch": 15,
130
+ "train_loss": 1.1576695175400278,
131
+ "train_acc": 0.5165103954341622,
132
+ "val_macro_f1": 0.49926720552181625,
133
+ "val_loss": 1.235735728896709,
134
+ "val_accuracy": 0.5622489959839357
135
+ },
136
+ {
137
+ "epoch": 16,
138
+ "train_loss": 1.1471103678369736,
139
+ "train_acc": 0.5271606196494089,
140
+ "val_macro_f1": 0.5011249988005891,
141
+ "val_loss": 1.22383168439352,
142
+ "val_accuracy": 0.5608315615402788
143
+ },
144
+ {
145
+ "epoch": 17,
146
+ "train_loss": 1.122234392982573,
147
+ "train_acc": 0.5326131267835303,
148
+ "val_macro_f1": 0.49651596391481484,
149
+ "val_loss": 1.2158161995894177,
150
+ "val_accuracy": 0.5579966926529648
151
+ },
152
+ {
153
+ "epoch": 18,
154
+ "train_loss": 1.1230225592693697,
155
+ "train_acc": 0.5311353444761516,
156
+ "val_macro_f1": 0.5136114729148367,
157
+ "val_loss": 1.2290783647580714,
158
+ "val_accuracy": 0.5657925820930781
159
+ },
160
+ {
161
+ "epoch": 19,
162
+ "train_loss": 1.1248822599054111,
163
+ "train_acc": 0.5294537301263759,
164
+ "val_macro_f1": 0.5098656024509526,
165
+ "val_loss": 1.223455751830034,
166
+ "val_accuracy": 0.5650838648712497
167
+ },
168
+ {
169
+ "epoch": 20,
170
+ "train_loss": 1.1545256153204662,
171
+ "train_acc": 0.5255299633102324,
172
+ "val_macro_f1": 0.5110926162945351,
173
+ "val_loss": 1.221439600145735,
174
+ "val_accuracy": 0.5650838648712497
175
+ }
176
+ ],
177
+ "stage_b": [
178
+ {
179
+ "epoch": 1,
180
+ "train_loss": 1.2200194986266018,
181
+ "train_acc": 0.5027517325723604,
182
+ "val_macro_f1": 0.46506942353605496,
183
+ "val_loss": 1.3105420627064268,
184
+ "val_accuracy": 0.5171273328608552
185
+ },
186
+ {
187
+ "epoch": 2,
188
+ "train_loss": 1.1641008270258228,
189
+ "train_acc": 0.5174785976355483,
190
+ "val_macro_f1": 0.488298599121119,
191
+ "val_loss": 1.2541892450947922,
192
+ "val_accuracy": 0.5622489959839357
193
+ },
194
+ {
195
+ "epoch": 3,
196
+ "train_loss": 1.1500101727074719,
197
+ "train_acc": 0.5224215246636771,
198
+ "val_macro_f1": 0.4956599255579888,
199
+ "val_loss": 1.2533673877510343,
200
+ "val_accuracy": 0.5719347980155918
201
+ },
202
+ {
203
+ "epoch": 4,
204
+ "train_loss": 1.109286910652677,
205
+ "train_acc": 0.5318997146351406,
206
+ "val_macro_f1": 0.4869455753172703,
207
+ "val_loss": 1.2878183309783502,
208
+ "val_accuracy": 0.5369714150720529
209
+ },
210
+ {
211
+ "epoch": 5,
212
+ "train_loss": 1.1026983434965403,
213
+ "train_acc": 0.5340909090909091,
214
+ "val_macro_f1": 0.49383977694366493,
215
+ "val_loss": 1.2160760782361595,
216
+ "val_accuracy": 0.563193952279707
217
+ },
218
+ {
219
+ "epoch": 6,
220
+ "train_loss": 1.0861972221976341,
221
+ "train_acc": 0.5346004891969017,
222
+ "val_macro_f1": 0.4928857194617713,
223
+ "val_loss": 1.2282306231631368,
224
+ "val_accuracy": 0.5667375383888495
225
+ },
226
+ {
227
+ "epoch": 7,
228
+ "train_loss": 1.0392189267407717,
229
+ "train_acc": 0.5468304117407257,
230
+ "val_macro_f1": 0.5008343237274931,
231
+ "val_loss": 1.2261127899210578,
232
+ "val_accuracy": 0.5705173635719348
233
+ },
234
+ {
235
+ "epoch": 8,
236
+ "train_loss": 1.0304037320142467,
237
+ "train_acc": 0.5592641663269466,
238
+ "val_macro_f1": 0.49877089719135825,
239
+ "val_loss": 1.233303443623098,
240
+ "val_accuracy": 0.566028821167021
241
+ },
242
+ {
243
+ "epoch": 9,
244
+ "train_loss": 1.009766974281011,
245
+ "train_acc": 0.5577863840195679,
246
+ "val_macro_f1": 0.4996623108709126,
247
+ "val_loss": 1.2278242774405415,
248
+ "val_accuracy": 0.5719347980155918
249
+ },
250
+ {
251
+ "epoch": 10,
252
+ "train_loss": 1.0051777931169448,
253
+ "train_acc": 0.5638503872808805,
254
+ "val_macro_f1": 0.500006795107852,
255
+ "val_loss": 1.2239024188231342,
256
+ "val_accuracy": 0.5705173635719348
257
+ }
258
+ ],
259
+ "stage_c": [
260
+ {
261
+ "epoch": 1,
262
+ "train_loss": 1.0662864717813205,
263
+ "train_acc": 0.5465756216877293,
264
+ "val_macro_f1": 0.4984768337725192,
265
+ "val_loss": 1.2097023466138617,
266
+ "val_accuracy": 0.5759508622726199
267
+ },
268
+ {
269
+ "epoch": 2,
270
+ "train_loss": 1.028777121720001,
271
+ "train_acc": 0.5545760293518142,
272
+ "val_macro_f1": 0.4938877916675499,
273
+ "val_loss": 1.2337657653856666,
274
+ "val_accuracy": 0.5624852350578786
275
+ },
276
+ {
277
+ "epoch": 3,
278
+ "train_loss": 1.0153135317858024,
279
+ "train_acc": 0.5535059111292295,
280
+ "val_macro_f1": 0.49907722687144723,
281
+ "val_loss": 1.2192433534331748,
282
+ "val_accuracy": 0.5742971887550201
283
+ },
284
+ {
285
+ "epoch": 4,
286
+ "train_loss": 1.0038959708351822,
287
+ "train_acc": 0.5601304525071341,
288
+ "val_macro_f1": 0.5059140741600023,
289
+ "val_loss": 1.2162246975293984,
290
+ "val_accuracy": 0.5790219702338767
291
+ },
292
+ {
293
+ "epoch": 5,
294
+ "train_loss": 0.9943306450935174,
295
+ "train_acc": 0.5619649408887077,
296
+ "val_macro_f1": 0.5169021957485435,
297
+ "val_loss": 1.2105160319626318,
298
+ "val_accuracy": 0.5794944483817623
299
+ },
300
+ {
301
+ "epoch": 6,
302
+ "train_loss": 0.9618599005149814,
303
+ "train_acc": 0.570067264573991,
304
+ "val_macro_f1": 0.504840497803461,
305
+ "val_loss": 1.228955523002679,
306
+ "val_accuracy": 0.5752421450507914
307
+ },
308
+ {
309
+ "epoch": 7,
310
+ "train_loss": 0.9511148705562182,
311
+ "train_acc": 0.5765389319200979,
312
+ "val_macro_f1": 0.4990006211421147,
313
+ "val_loss": 1.233863600319176,
314
+ "val_accuracy": 0.568863690054335
315
+ },
316
+ {
317
+ "epoch": 8,
318
+ "train_loss": 0.9406011274917437,
319
+ "train_acc": 0.5768446799836935,
320
+ "val_macro_f1": 0.5112059339127408,
321
+ "val_loss": 1.2131912938006506,
322
+ "val_accuracy": 0.5811481218993622
323
+ },
324
+ {
325
+ "epoch": 9,
326
+ "train_loss": 0.9396456855683729,
327
+ "train_acc": 0.5827558092132084,
328
+ "val_macro_f1": 0.5095793102793661,
329
+ "val_loss": 1.2089267937183124,
330
+ "val_accuracy": 0.581384360973305
331
+ },
332
+ {
333
+ "epoch": 10,
334
+ "train_loss": 0.9486989188063062,
335
+ "train_acc": 0.5774052181002853,
336
+ "val_macro_f1": 0.5112753932863323,
337
+ "val_loss": 1.2089088627612696,
338
+ "val_accuracy": 0.581384360973305
339
+ }
340
+ ]
341
+ },
342
+ "confusion_matrix": [
343
+ [
344
+ 55,
345
+ 0,
346
+ 0,
347
+ 1,
348
+ 0,
349
+ 17,
350
+ 0,
351
+ 0,
352
+ 0,
353
+ 2,
354
+ 0,
355
+ 0,
356
+ 6,
357
+ 0,
358
+ 0,
359
+ 0,
360
+ 0,
361
+ 0
362
+ ],
363
+ [
364
+ 3,
365
+ 65,
366
+ 3,
367
+ 4,
368
+ 1,
369
+ 0,
370
+ 0,
371
+ 2,
372
+ 0,
373
+ 0,
374
+ 0,
375
+ 0,
376
+ 8,
377
+ 0,
378
+ 4,
379
+ 0,
380
+ 0,
381
+ 0
382
+ ],
383
+ [
384
+ 0,
385
+ 0,
386
+ 24,
387
+ 2,
388
+ 3,
389
+ 0,
390
+ 3,
391
+ 4,
392
+ 0,
393
+ 15,
394
+ 2,
395
+ 2,
396
+ 13,
397
+ 6,
398
+ 5,
399
+ 0,
400
+ 0,
401
+ 5
402
+ ],
403
+ [
404
+ 0,
405
+ 4,
406
+ 2,
407
+ 42,
408
+ 7,
409
+ 19,
410
+ 1,
411
+ 1,
412
+ 0,
413
+ 2,
414
+ 1,
415
+ 0,
416
+ 0,
417
+ 2,
418
+ 0,
419
+ 0,
420
+ 0,
421
+ 0
422
+ ],
423
+ [
424
+ 0,
425
+ 2,
426
+ 0,
427
+ 30,
428
+ 42,
429
+ 0,
430
+ 0,
431
+ 4,
432
+ 0,
433
+ 2,
434
+ 0,
435
+ 0,
436
+ 0,
437
+ 1,
438
+ 0,
439
+ 0,
440
+ 0,
441
+       0
+     ],
+     [32, 0, 7, 15, 4, 76, 0, 4, 1, 4, 2, 0, 8, 0, 0, 0, 0, 0],
+     [0, 3, 0, 0, 0, 0, 40, 1, 0, 0, 0, 14, 0, 0, 14, 0, 0, 1],
+     [0, 5, 5, 5, 3, 4, 0, 67, 1, 13, 20, 3, 12, 4, 6, 0, 2, 6],
+     [1, 0, 0, 1, 0, 2, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+     [6, 0, 8, 3, 2, 2, 0, 7, 0, 308, 85, 0, 55, 7, 1, 3, 1, 3],
+     [9, 11, 21, 1, 2, 7, 0, 62, 0, 235, 595, 0, 147, 9, 1, 0, 0, 4],
+     [0, 2, 0, 0, 1, 0, 2, 0, 0, 3, 0, 9, 0, 9, 6, 0, 0, 0],
+     [13, 3, 58, 6, 2, 5, 0, 28, 0, 133, 74, 7, 235, 8, 3, 1, 0, 0],
+     [0, 2, 3, 2, 0, 0, 0, 1, 0, 5, 0, 16, 6, 18, 0, 0, 0, 3],
+     [0, 2, 6, 0, 3, 0, 43, 5, 0, 0, 0, 9, 0, 2, 54, 0, 0, 8],
+     [0, 3, 0, 2, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 513, 28, 0],
+     [0, 1, 1, 2, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 75, 89, 0],
+     [0, 7, 16, 10, 0, 3, 8, 13, 1, 6, 0, 8, 4, 3, 17, 0, 0, 15]
+   ],
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "classification_report": "                precision    recall  f1-score   support\n\n        dhumri       0.46      0.68      0.55        81\n     difficult       0.59      0.72      0.65        90\n      drathung       0.16      0.29      0.20        84\n      drudring       0.33      0.52      0.41        81\n       druring       0.59      0.52      0.55        81\n      druthung       0.56      0.50      0.53       153\n       khyuyig       0.41      0.55      0.47        73\n multi_scripts       0.32      0.43      0.37       156\n   non_tibetan       0.93      0.93      0.93        57\n          peri       0.42      0.63      0.50       491\n        petsuk       0.76      0.54      0.63      1104\n       trinyig       0.13      0.28      0.18        32\n      tsegdrig       0.48      0.41      0.44       576\n     tsugchung       0.26      0.32      0.29        56\n     tsumachug       0.49      0.41      0.44       132\nuchen_sugdring       0.87      0.93      0.90       551\nuchen_sugthung       0.74      0.51      0.60       175\n      yigchung       0.33      0.14      0.19       111\n\n      accuracy                           0.56      4084\n     macro avg       0.49      0.52      0.49      4084\n  weighted avg       0.60      0.56      0.57      4084\n"
+ }
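Each `results.json` stores the raw `confusion_matrix` alongside `label_to_idx`, so per-class precision and recall can be recomputed offline without rerunning evaluation. A minimal sketch of that computation follows; the tiny 3×3 matrix is illustrative only, whereas in practice you would load the JSON file and read its `confusion_matrix` and `label_to_idx` fields:

```python
# Recompute per-class precision/recall from a results.json-style payload.
# The 3x3 matrix below is illustrative, not taken from these results.
results = {
    "label_to_idx": {"a": 0, "b": 1, "c": 2},
    "confusion_matrix": [  # rows = true class, columns = predicted class
        [8, 1, 1],
        [2, 6, 2],
        [0, 1, 9],
    ],
}

labels = sorted(results["label_to_idx"], key=results["label_to_idx"].get)
cm = results["confusion_matrix"]

for i, name in enumerate(labels):
    tp = cm[i][i]
    support = sum(cm[i])                   # all true instances of class i
    predicted = sum(row[i] for row in cm)  # all predictions of class i
    recall = tp / support if support else 0.0
    precision = tp / predicted if predicted else 0.0
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

Summing a row gives the class support reported in `classification_report`; summing a column gives the number of times that class was predicted.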
patches_clahe/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 363, "difficult": 410, "drathung": 458, "drudring": 451, "druring": 421, "druthung": 710, "khyuyig": 324, "multi_scripts": 824, "non_tibetan": 291, "peri": 2348, "petsuk": 5086, "trinyig": 165, "tsegdrig": 2766, "tsugchung": 293, "tsumachug": 604, "uchen_sugdring": 2705, "uchen_sugthung": 807, "yigchung": 598},
+     "val": {"dhumri": 74, "difficult": 72, "drathung": 95, "drudring": 97, "druring": 89, "druthung": 169, "khyuyig": 70, "multi_scripts": 182, "non_tibetan": 52, "peri": 514, "petsuk": 1091, "trinyig": 33, "tsegdrig": 613, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 601, "uchen_sugthung": 180, "yigchung": 113},
+     "test": {"dhumri": 81, "difficult": 90, "drathung": 84, "drudring": 81, "druring": 81, "druthung": 153, "khyuyig": 73, "multi_scripts": 156, "non_tibetan": 57, "peri": 491, "petsuk": 1104, "trinyig": 32, "tsegdrig": 576, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 551, "uchen_sugthung": 175, "yigchung": 111}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
patches_color/checkpoint_page_eval.json ADDED
@@ -0,0 +1,76 @@
+ {
+   "experiment": "patches_color",
+   "data_dir": "./Data/output/patches_color",
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "num_classes": 18,
+   "checkpoint_results": {
+     "best_stage_a_head_only.pt": {
+       "patch_metrics": {"loss": 1.2655372293292018, "accuracy": 0.548971596474045, "macro_f1": 0.48338756435970365, "weighted_f1": 0.5595663273001261},
+       "page_metrics": {"accuracy": 0.5817535545023697, "macro_f1": 0.5042916757569069, "weighted_f1": 0.5865342057784698, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5201207198529141,
+       "epoch_at_save": 20
+     },
+     "best_stage_b_last_2_blocks.pt": {
+       "patch_metrics": {"loss": 1.2968507967196061, "accuracy": 0.5572967678746327, "macro_f1": 0.48656564769193644, "weighted_f1": 0.563328932982191},
+       "page_metrics": {"accuracy": 0.5853080568720379, "macro_f1": 0.4969784132736875, "weighted_f1": 0.5889015907282624, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5246287483695703,
+       "epoch_at_save": 7
+     },
+     "best_stage_c_last_4_blocks.pt": {
+       "patch_metrics": {"loss": 1.2944221436160084, "accuracy": 0.5604799216454457, "macro_f1": 0.48988450083500523, "weighted_f1": 0.5677717007365548},
+       "page_metrics": {"accuracy": 0.5924170616113744, "macro_f1": 0.5017240427906837, "weighted_f1": 0.5960050045768394, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5268057868156721,
+       "epoch_at_save": 10
+     },
+     "final_model.pt": {
+       "patch_metrics": {"loss": 1.2944221436160084, "accuracy": 0.5604799216454457, "macro_f1": 0.48988450083500523, "weighted_f1": 0.5677717007365548},
+       "page_metrics": {"accuracy": 0.5924170616113744, "macro_f1": 0.5017240427906837, "weighted_f1": 0.5960050045768394, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": -1.0,
+       "epoch_at_save": null
+     }
+   }
+ }
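`checkpoint_page_eval.json` reports both `patch_metrics` and `page_metrics` (844 pages covering 4084 patch samples), which implies some patch-to-page aggregation step. The exact rule used by `finetune_dinov3.py` is not shown in this file, so the sketch below is an assumption: majority voting over the patch predictions of each page (averaging patch logits would be an equally plausible alternative). The `(page_id, predicted_label)` pairs are illustrative:

```python
from collections import Counter, defaultdict

# Assumed patch -> page aggregation by majority vote; the rule actually used
# by the training script may differ (e.g. averaging patch logits instead).
patch_preds = [
    ("page_001", "petsuk"), ("page_001", "petsuk"), ("page_001", "peri"),
    ("page_002", "uchen_sugdring"), ("page_002", "uchen_sugdring"),
]

by_page = defaultdict(list)
for page_id, label in patch_preds:
    by_page[page_id].append(label)

# One prediction per page: the most common patch-level label.
page_preds = {pid: Counter(lbls).most_common(1)[0][0] for pid, lbls in by_page.items()}
```

Page-level accuracy and F1 would then be computed against one ground-truth label per page, which is why `page_metrics` tends to run slightly above `patch_metrics` here.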
patches_color/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: d3defea66443207c1bc4c69dee4fd9c07027422d84b58076afd2d492b83eb3f1
  • Pointer size: 131 Bytes
  • Size of remote file: 202 kB
patches_color/final_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65b3c9f81177e34c74aad910c545a0b97f097dbacd84da0e0c5af0d29eddd54a
+ size 86680521
patches_color/results.json ADDED
@@ -0,0 +1,725 @@
+ {
+   "experiment": "patches_color",
+   "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
+   "num_classes": 18,
+   "best_val_checkpoint": "results/patches_color/best_stage_c_last_4_blocks.pt",
+   "val_macro_f1_at_selection": 0.5268057868156721,
+   "final_model_path": "results/patches_color/final_model.pt",
+   "test_metrics": {
+     "loss": 1.2944221436160084,
+     "accuracy": 0.5604799216454457,
+     "macro_f1": 0.48988450083500523,
+     "weighted_f1": 0.5677717007365548
+   },
+   "history": {
+     "stage_a": [
+       {"epoch": 1, "train_loss": 1.8228148812427358, "train_acc": 0.37194251936404404, "val_macro_f1": 0.4478217149694628, "val_loss": 1.4409239574326935, "val_accuracy": 0.500590597684857},
+       {"epoch": 2, "train_loss": 1.4974706918426401, "train_acc": 0.4455768446799837, "val_macro_f1": 0.45642959802097366, "val_loss": 1.311343546190246, "val_accuracy": 0.531301677297425},
+       {"epoch": 3, "train_loss": 1.4202212384414827, "train_acc": 0.4585201793721973, "val_macro_f1": 0.5198020336380464, "val_loss": 1.1873418371530393, "val_accuracy": 0.5714623198677061},
+       {"epoch": 4, "train_loss": 1.3796070234366353, "train_acc": 0.47457195271096614, "val_macro_f1": 0.48635628120132335, "val_loss": 1.2194681716935447, "val_accuracy": 0.5398062839593669},
+       {"epoch": 5, "train_loss": 1.3158884044574126, "train_acc": 0.48797390949857317, "val_macro_f1": 0.485095009606241, "val_loss": 1.1974138276123025, "val_accuracy": 0.5613040396881644},
+       {"epoch": 6, "train_loss": 1.2685181100272278, "train_acc": 0.5026498165511618, "val_macro_f1": 0.5151064930461298, "val_loss": 1.181795822871427, "val_accuracy": 0.5773682967162769},
+       {"epoch": 7, "train_loss": 1.264474732896235, "train_acc": 0.4974011414594374, "val_macro_f1": 0.49794473606184897, "val_loss": 1.2368834672133577, "val_accuracy": 0.5499645641389086},
+       {"epoch": 8, "train_loss": 1.2266237828569222, "train_acc": 0.5090705258866693, "val_macro_f1": 0.5073845743840514, "val_loss": 1.1930924508656393, "val_accuracy": 0.5622489959839357},
+       {"epoch": 9, "train_loss": 1.2126428141090464, "train_acc": 0.5157969832857725, "val_macro_f1": 0.5029403569611346, "val_loss": 1.199680348883319, "val_accuracy": 0.5714623198677061},
+       {"epoch": 10, "train_loss": 1.1708803988941534, "train_acc": 0.5228801467590706, "val_macro_f1": 0.5074790060262432, "val_loss": 1.2058235416010623, "val_accuracy": 0.5620127569099929},
+       {"epoch": 11, "train_loss": 1.174615762037401, "train_acc": 0.5223196086424786, "val_macro_f1": 0.5064378041379447, "val_loss": 1.179450622276827, "val_accuracy": 0.5653201039451925},
+       {"epoch": 12, "train_loss": 1.1377201861281614, "train_acc": 0.5295046881369752, "val_macro_f1": 0.501551417883034, "val_loss": 1.187849443206093, "val_accuracy": 0.568863690054335},
+       {"epoch": 13, "train_loss": 1.1421819735049619, "train_acc": 0.5347024052181003, "val_macro_f1": 0.5125219961449731, "val_loss": 1.208868742721345, "val_accuracy": 0.5613040396881644},
+       {"epoch": 14, "train_loss": 1.1138945578848445, "train_acc": 0.5381165919282511, "val_macro_f1": 0.5131290447373811, "val_loss": 1.188052623450207, "val_accuracy": 0.5738247106071345},
+       {"epoch": 15, "train_loss": 1.0984022547845884, "train_acc": 0.5382694659600489, "val_macro_f1": 0.5108666463642549, "val_loss": 1.2041859181424897, "val_accuracy": 0.5679187337585637},
+       {"epoch": 16, "train_loss": 1.0941936582193246, "train_acc": 0.5438748471259682, "val_macro_f1": 0.5103848985029631, "val_loss": 1.1977766572746182, "val_accuracy": 0.5672100165367352},
+       {"epoch": 17, "train_loss": 1.0736914152424528, "train_acc": 0.5451487973909499, "val_macro_f1": 0.5188176724642701, "val_loss": 1.189426226291903, "val_accuracy": 0.5695724072761634},
+       {"epoch": 18, "train_loss": 1.080425682599526, "train_acc": 0.5493273542600897, "val_macro_f1": 0.5193581019534623, "val_loss": 1.1903373014411547, "val_accuracy": 0.5716985589416489},
+       {"epoch": 19, "train_loss": 1.0716409188020584, "train_acc": 0.5456583774969426, "val_macro_f1": 0.5196107585917038, "val_loss": 1.1931781820898328, "val_accuracy": 0.5705173635719348},
+       {"epoch": 20, "train_loss": 1.092901507543633, "train_acc": 0.5422441907867916, "val_macro_f1": 0.5201207198529141, "val_loss": 1.1916698552940912, "val_accuracy": 0.5709898417198205}
+     ],
+     "stage_b": [
+       {"epoch": 1, "train_loss": 1.1536900823183367, "train_acc": 0.520688952303302, "val_macro_f1": 0.49201257424371936, "val_loss": 1.2939789023856703, "val_accuracy": 0.5263406567446256},
+       {"epoch": 2, "train_loss": 1.106891256276802, "train_acc": 0.531593966571545, "val_macro_f1": 0.5033889357742809, "val_loss": 1.2282823414699973, "val_accuracy": 0.5716985589416489},
+       {"epoch": 3, "train_loss": 1.0965523051088117, "train_acc": 0.536740725642071, "val_macro_f1": 0.5085179283247842, "val_loss": 1.2278816308227767, "val_accuracy": 0.5695724072761634},
+       {"epoch": 4, "train_loss": 1.0530349347623473, "train_acc": 0.5524357929066449, "val_macro_f1": 0.49767342372352924, "val_loss": 1.2748388547732803, "val_accuracy": 0.5386250885896527},
+       {"epoch": 5, "train_loss": 1.0473334150706997, "train_acc": 0.555697105584998, "val_macro_f1": 0.5109896618724019, "val_loss": 1.204903576978056, "val_accuracy": 0.5705173635719348},
+       {"epoch": 6, "train_loss": 1.0396876881378212, "train_acc": 0.558754586220954, "val_macro_f1": 0.5100822540510055, "val_loss": 1.2285107204947112, "val_accuracy": 0.5615402787621072},
+       {"epoch": 7, "train_loss": 0.995308576143376, "train_acc": 0.5689461883408071, "val_macro_f1": 0.5246287483695703, "val_loss": 1.2055314002915454, "val_accuracy": 0.5783132530120482},
+       {"epoch": 8, "train_loss": 0.9806407083188861, "train_acc": 0.5772523440684876, "val_macro_f1": 0.5177401992468048, "val_loss": 1.2124249308257626, "val_accuracy": 0.5719347980155918},
+       {"epoch": 9, "train_loss": 0.9580581295116065, "train_acc": 0.5807174887892377, "val_macro_f1": 0.5221761914804327, "val_loss": 1.202580178335147, "val_accuracy": 0.5797306874557052},
+       {"epoch": 10, "train_loss": 0.9406894806497592, "train_acc": 0.5895332246229107, "val_macro_f1": 0.5170603831264983, "val_loss": 1.198815422158573, "val_accuracy": 0.5783132530120482}
+     ],
+     "stage_c": [
+       {"epoch": 1, "train_loss": 1.009486005250843, "train_acc": 0.5692519364044027, "val_macro_f1": 0.5106719108224338, "val_loss": 1.1896873432258441, "val_accuracy": 0.5797306874557052},
+       {"epoch": 2, "train_loss": 0.9734869377009489, "train_acc": 0.5805136567468406, "val_macro_f1": 0.501623256775662, "val_loss": 1.214058896983005, "val_accuracy": 0.5620127569099929},
+       {"epoch": 3, "train_loss": 0.9619662467431302, "train_acc": 0.5770994700366898, "val_macro_f1": 0.5229433422000689, "val_loss": 1.2075352982749703, "val_accuracy": 0.5780770139381054},
+       {"epoch": 4, "train_loss": 0.9472926469224353, "train_acc": 0.5816856909906237, "val_macro_f1": 0.5203069133009239, "val_loss": 1.19137683074447, "val_accuracy": 0.5835105126387905},
+       {"epoch": 5, "train_loss": 0.935545642668502, "train_acc": 0.5829596412556054, "val_macro_f1": 0.5257938381975142, "val_loss": 1.184187990306801, "val_accuracy": 0.5832742735648476},
+       {"epoch": 6, "train_loss": 0.9094976569942392, "train_acc": 0.5960558499796168, "val_macro_f1": 0.5200620740355609, "val_loss": 1.2047344187387663, "val_accuracy": 0.5830380344909047},
+       {"epoch": 7, "train_loss": 0.8874516203799055, "train_acc": 0.6018141051773339, "val_macro_f1": 0.525057744430925, "val_loss": 1.2007991145867674, "val_accuracy": 0.5818568391211907},
+       {"epoch": 8, "train_loss": 0.8860346450156403, "train_acc": 0.6054830819404811, "val_macro_f1": 0.523414227808246, "val_loss": 1.1990280199972854, "val_accuracy": 0.5844554689345618},
+       {"epoch": 9, "train_loss": 0.8849416517793428, "train_acc": 0.6044639217284957, "val_macro_f1": 0.5248536522877667, "val_loss": 1.1869062702771045, "val_accuracy": 0.5882352941176471},
+       {"epoch": 10, "train_loss": 0.8992554767249411, "train_acc": 0.6046167957602935, "val_macro_f1": 0.5268057868156721, "val_loss": 1.1879879839941876, "val_accuracy": 0.5891802504134184}
+     ]
+   },
+   "confusion_matrix": [
+     [51, 0, 0, 0, 1, 21, 0, 0, 0, 1, 0, 0, 7, 0, 0, 0, 0, 0],
+     [0, 58, 3, 2, 2, 0, 1, 0, 0, 1, 0, 1, 6, 0, 4, 4, 4, 4],
+     [0, 1, 25, 0, 4, 0, 7, 5, 0, 7, 3, 0, 16, 1, 6, 0, 0, 9],
+     [0, 4, 2, 36, 11, 18, 1, 1, 0, 3, 1, 0, 0, 3, 0, 1, 0, 0],
+     [0, 2, 0, 22, 48, 0, 0, 5, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3],
+     [20, 0, 6, 15, 6, 83, 0, 4, 0, 3, 0, 0, 9, 5, 0, 1, 0, 1],
+     [0, 1, 0, 0, 0, 0, 43, 0, 0, 0, 0, 5, 0, 0, 18, 0, 0, 6],
+     [0, 5, 1, 6, 1, 1, 0, 66, 0, 13, 14, 2, 17, 10, 5, 1, 4, 10],
+     [1, 0, 0, 1, 0, 2, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+     [4, 1, 9, 3, 0, 5, 0, 9, 0, 308, 71, 3, 64, 6, 3, 3, 0, 2],
+     [3, 9, 17, 1, 1, 8, 0, 64, 0, 209, 607, 0, 168, 6, 1, 0, 0, 10],
+     [0, 2, 0, 0, 1, 0, 2, 0, 0, 1, 0, 4, 1, 9, 11, 0, 0, 1],
+     [9, 5, 50, 6, 3, 4, 0, 22, 0, 123, 80, 11, 243, 6, 3, 2, 0, 9],
+     [0, 1, 3, 1, 1, 0, 0, 0, 0, 2, 0, 13, 5, 24, 0, 0, 0, 6],
+     [0, 2, 1, 0, 3, 0, 49, 4, 0, 0, 0, 4, 0, 1, 47, 0, 0, 21],
+     [1, 2, 0, 1, 0, 0, 0, 2, 2, 0, 0, 0, 1, 0, 0, 479, 62, 1],
+     [0, 0, 0, 2, 1, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 77, 87, 0],
+     [0, 14, 4, 2, 0, 2, 6, 13, 1, 1, 1, 7, 3, 9, 20, 1, 0, 27]
+   ],
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "classification_report": "                precision    recall  f1-score   support\n\n        dhumri       0.57      0.63      0.60        81\n     difficult       0.54      0.64      0.59        90\n      drathung       0.21      0.30      0.24        84\n      drudring       0.37      0.44      0.40        81\n       druring       0.58      0.59      0.59        81\n      druthung       0.58      0.54      0.56       153\n       khyuyig       0.39      0.59      0.47        73\n multi_scripts       0.33      0.42      0.37       156\n   non_tibetan       0.95      0.93      0.94        57\n          peri       0.46      0.63      0.53       491\n        petsuk       0.78      0.55      0.65      1104\n       trinyig       0.08      0.12      0.10        32\n      tsegdrig       0.45      0.42      0.44       576\n     tsugchung       0.30      0.43      0.35        56\n     tsumachug       0.40      0.36      0.38       132\nuchen_sugdring       0.84      0.87      0.86       551\nuchen_sugthung       0.55      0.50      0.52       175\n      yigchung       0.25      0.24      0.24       111\n\n      accuracy                           0.56      4084\n     macro avg       0.48      0.51      0.49      4084\n  weighted avg       0.59      0.56      0.57      4084\n"
+ }
patches_color/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 363, "difficult": 410, "drathung": 458, "drudring": 451, "druring": 421, "druthung": 710, "khyuyig": 324, "multi_scripts": 824, "non_tibetan": 291, "peri": 2348, "petsuk": 5086, "trinyig": 165, "tsegdrig": 2766, "tsugchung": 293, "tsumachug": 604, "uchen_sugdring": 2705, "uchen_sugthung": 807, "yigchung": 598},
+     "val": {"dhumri": 74, "difficult": 72, "drathung": 95, "drudring": 97, "druring": 89, "druthung": 169, "khyuyig": 70, "multi_scripts": 182, "non_tibetan": 52, "peri": 514, "petsuk": 1091, "trinyig": 33, "tsegdrig": 613, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 601, "uchen_sugthung": 180, "yigchung": 113},
+     "test": {"dhumri": 81, "difficult": 90, "drathung": 84, "drudring": 81, "druring": 81, "druthung": 153, "khyuyig": 73, "multi_scripts": 156, "non_tibetan": 57, "peri": 491, "petsuk": 1104, "trinyig": 32, "tsegdrig": 576, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 551, "uchen_sugthung": 175, "yigchung": 111}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
whole_page/confusion_matrix.csv ADDED
@@ -0,0 +1,19 @@
+ ,dhumri,difficult,drathung,drudring,druring,druthung,khyuyig,multi_scripts,non_tibetan,peri,petsuk,trinyig,tsegdrig,tsugchung,tsumachug,uchen_sugdring,uchen_sugthung,yigchung
+ dhumri,7,0,0,0,0,5,0,0,0,0,0,0,2,0,0,0,0,0
+ difficult,0,22,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0
+ drathung,0,0,5,0,1,0,1,0,0,1,1,0,3,1,3,0,0,3
+ drudring,0,0,0,8,4,4,2,0,0,1,0,0,0,0,0,0,0,0
+ druring,0,0,0,1,16,0,0,0,0,0,0,0,0,0,0,0,0,0
+ druthung,6,0,3,3,0,16,0,1,0,2,0,0,0,0,0,0,0,0
+ khyuyig,0,2,0,0,0,0,11,0,0,0,0,0,0,0,3,0,0,0
+ multi_scripts,0,0,0,2,2,0,0,16,0,2,2,1,1,2,3,1,3,0
+ non_tibetan,0,0,0,1,0,0,0,0,27,0,0,0,0,0,0,0,0,0
+ peri,0,0,5,0,0,0,0,1,0,56,15,0,11,2,1,0,0,1
+ petsuk,0,2,3,2,0,1,0,19,0,44,105,0,30,1,1,0,0,0
+ trinyig,0,0,0,0,0,0,0,0,0,1,0,2,0,1,2,0,0,0
+ tsegdrig,1,0,16,2,0,1,0,9,0,20,13,2,43,2,0,0,0,3
+ tsugchung,0,0,0,0,0,0,0,0,0,0,0,4,0,6,1,0,0,0
+ tsumachug,0,0,0,0,1,0,9,1,0,0,0,3,0,0,8,0,0,4
+ uchen_sugdring,0,2,0,0,0,0,0,0,1,0,0,0,0,1,0,107,14,0
+ uchen_sugthung,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,13,21,0
+ yigchung,0,2,0,0,0,0,1,2,0,0,0,3,0,6,4,0,0,6
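`whole_page/confusion_matrix.csv` stores true labels as rows and predicted labels as columns, so the largest off-diagonal cells identify the dominant confusion pairs (here, for example, the `petsuk` row shows sizeable spill into `peri` and `tsegdrig`). A short sketch of ranking those cells; the 3-class CSV text is an excerpt of the matrix above (the `peri`/`petsuk`/`tsegdrig` sub-block), not the full file:

```python
import csv
import io

# Rank the largest off-diagonal cells of a confusion_matrix.csv-style file.
# This 3-class excerpt is illustrative; pass the real file's contents instead.
csv_text = """,peri,petsuk,tsegdrig
peri,56,15,11
petsuk,44,105,30
tsegdrig,20,13,43
"""

rows = list(csv.reader(io.StringIO(csv_text)))
labels = rows[0][1:]
confusions = []
for r, row in enumerate(rows[1:]):
    true_label = row[0]
    for c, count in enumerate(row[1:]):
        if c != r and int(count) > 0:  # skip the diagonal (correct predictions)
            confusions.append((int(count), true_label, labels[c]))

confusions.sort(reverse=True)
for count, true_label, pred_label in confusions[:3]:
    print(f"{true_label} -> {pred_label}: {count}")
```

On the excerpt this surfaces `petsuk -> peri` as the single largest confusion, matching the per-class precision drop visible in the classification reports.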
whole_page/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 8685c8b1287e0816eea11ed8a5b2959be657e296af1201ef444c51f9fbc1f7dd
  • Pointer size: 131 Bytes
  • Size of remote file: 169 kB
whole_page/final_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:987f2d4139415491e3323eb8a6a622365d1b336897dfb07383d35146a2afb38f
+ size 86680521
whole_page/results.json ADDED
@@ -0,0 +1,725 @@
+ {
+   "experiment": "whole_page",
+   "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
+   "num_classes": 18,
+   "best_val_checkpoint": "results/whole_page/best_stage_b_last_2_blocks.pt",
+   "val_macro_f1_at_selection": 0.5526159517101854,
+   "final_model_path": "results/whole_page/final_model.pt",
+   "test_metrics": {
+     "loss": 1.191628887755046,
+     "accuracy": 0.5710900473933649,
+     "macro_f1": 0.5123725993698094,
+     "weighted_f1": 0.5781527270486508
+   },
+   "history": {
+     "stage_a": [
+       {"epoch": 1, "train_loss": 2.20866057321474, "train_acc": 0.31456456456456455, "val_macro_f1": 0.39456076575655513, "val_loss": 1.4671607864976495, "val_accuracy": 0.48696682464454977},
+       {"epoch": 2, "train_loss": 1.706618641589855, "train_acc": 0.4216716716716717, "val_macro_f1": 0.43415460414512386, "val_loss": 1.276340951286786, "val_accuracy": 0.5651658767772512},
+       {"epoch": 3, "train_loss": 1.5752889993073824, "train_acc": 0.4471971971971972, "val_macro_f1": 0.44861434264049055, "val_loss": 1.2922945689251073, "val_accuracy": 0.5509478672985783},
+       {"epoch": 4, "train_loss": 1.51657935490718, "train_acc": 0.45595595595595595, "val_macro_f1": 0.477222997128718, "val_loss": 1.2938398139736664, "val_accuracy": 0.5379146919431279},
+       {"epoch": 5, "train_loss": 1.474987536937267, "train_acc": 0.4637137137137137, "val_macro_f1": 0.4801019078317697, "val_loss": 1.2509917919104698, "val_accuracy": 0.5604265402843602},
+       {"epoch": 6, "train_loss": 1.3946579505015422, "train_acc": 0.4964964964964965, "val_macro_f1": 0.4615615264486478, "val_loss": 1.2553648507990542, "val_accuracy": 0.5545023696682464},
+       {"epoch": 7, "train_loss": 1.4076038572761986, "train_acc": 0.47597597597597596, "val_macro_f1": 0.5026596529501132, "val_loss": 1.2138842044848401, "val_accuracy": 0.5746445497630331},
+       {"epoch": 8, "train_loss": 1.367771333283013, "train_acc": 0.49124124124124124, "val_macro_f1": 0.502396524188393, "val_loss": 1.1715130546081687, "val_accuracy": 0.590047393364929},
+       {"epoch": 9, "train_loss": 1.3250896964106593, "train_acc": 0.49924924924924924, "val_macro_f1": 0.4870819897689354, "val_loss": 1.2463707618803774, "val_accuracy": 0.5639810426540285},
+       {"epoch": 10, "train_loss": 1.2848107659661614, "train_acc": 0.5075075075075075, "val_macro_f1": 0.5136245272249241, "val_loss": 1.175841974421135, "val_accuracy": 0.5829383886255924},
+       {"epoch": 11, "train_loss": 1.247214116491713, "train_acc": 0.5185185185185185, "val_macro_f1": 0.5022088771755583, "val_loss": 1.210867343920667, "val_accuracy": 0.5758293838862559},
+       {"epoch": 12, "train_loss": 1.2895851612568379, "train_acc": 0.5092592592592593, "val_macro_f1": 0.5025588475579801, "val_loss": 1.1879199217846044, "val_accuracy": 0.5770142180094787},
+       {"epoch": 13, "train_loss": 1.2661696529245234, "train_acc": 0.5125125125125125, "val_macro_f1": 0.49546766292382316, "val_loss": 1.173302908644292, "val_accuracy": 0.5699052132701422},
+       {"epoch": 14, "train_loss": 1.2424312025696427, "train_acc": 0.5175175175175175, "val_macro_f1": 0.5122207152797997, "val_loss": 1.1634096268793983, "val_accuracy": 0.5912322274881516},
+       {"epoch": 15, "train_loss": 1.2361648449072011, "train_acc": 0.5232732732732732, "val_macro_f1": 0.5298881548218799, "val_loss": 1.148757901801882, "val_accuracy": 0.6042654028436019},
+       {"epoch": 16, "train_loss": 1.2283220601392102, "train_acc": 0.5232732732732732, "val_macro_f1": 0.530170441052968, "val_loss": 1.1368969322945834, "val_accuracy": 0.6054502369668247},
+       {"epoch": 17, "train_loss": 1.2229472460212172, "train_acc": 0.5377877877877878, "val_macro_f1": 0.5278305486788782, "val_loss": 1.1526551461332781, "val_accuracy": 0.6018957345971564},
+       {"epoch": 18, "train_loss": 1.2219269069226775, "train_acc": 0.5325325325325325, "val_macro_f1": 0.5243067602535394, "val_loss": 1.1563605318702228, "val_accuracy": 0.5971563981042654},
+       {"epoch": 19, "train_loss": 1.1975611028251227, "train_acc": 0.5392892892892893, "val_macro_f1": 0.5189097715335087, "val_loss": 1.1478128331532411, "val_accuracy": 0.5947867298578199},
+       {"epoch": 20, "train_loss": 1.1934834290314484, "train_acc": 0.5317817817817818, "val_macro_f1": 0.5176953192505984, "val_loss": 1.150610593257922, "val_accuracy": 0.5924170616113744}
+     ],
+     "stage_b": [
+       {"epoch": 1, "train_loss": 1.3155108488596476, "train_acc": 0.501001001001001, "val_macro_f1": 0.4685903796568659, "val_loss": 1.23213265405447, "val_accuracy": 0.566350710900474},
+       {
+         "epoch": 2,
+         "train_loss": 1.2663482601220186,
+         "train_acc": 0.5135135135135135,
+         "val_macro_f1": 0.51159501034612,
+         "val_loss": 1.2073333534584225,
192
+ "val_accuracy": 0.5770142180094787
193
+ },
194
+ {
195
+ "epoch": 3,
196
+ "train_loss": 1.2039633168353214,
197
+ "train_acc": 0.5212712712712713,
198
+ "val_macro_f1": 0.5001515484504192,
199
+ "val_loss": 1.212121329601342,
200
+ "val_accuracy": 0.5758293838862559
201
+ },
202
+ {
203
+ "epoch": 4,
204
+ "train_loss": 1.237001917920671,
205
+ "train_acc": 0.5132632632632632,
206
+ "val_macro_f1": 0.5071135037099957,
207
+ "val_loss": 1.2374758641301737,
208
+ "val_accuracy": 0.5675355450236966
209
+ },
210
+ {
211
+ "epoch": 5,
212
+ "train_loss": 1.1571017785353943,
213
+ "train_acc": 0.5402902902902903,
214
+ "val_macro_f1": 0.5320860576681696,
215
+ "val_loss": 1.2000342935182473,
216
+ "val_accuracy": 0.5770142180094787
217
+ },
218
+ {
219
+ "epoch": 6,
220
+ "train_loss": 1.1839140787258282,
221
+ "train_acc": 0.5355355355355356,
222
+ "val_macro_f1": 0.5198682878490827,
223
+ "val_loss": 1.1651039225230284,
224
+ "val_accuracy": 0.5864928909952607
225
+ },
226
+ {
227
+ "epoch": 7,
228
+ "train_loss": 1.1373721201856573,
229
+ "train_acc": 0.5402902902902903,
230
+ "val_macro_f1": 0.5301861817792031,
231
+ "val_loss": 1.1298535276928219,
232
+ "val_accuracy": 0.6007109004739336
233
+ },
234
+ {
235
+ "epoch": 8,
236
+ "train_loss": 1.1343186245308265,
237
+ "train_acc": 0.5508008008008008,
238
+ "val_macro_f1": 0.5499040784647757,
239
+ "val_loss": 1.141982549174702,
240
+ "val_accuracy": 0.6018957345971564
241
+ },
242
+ {
243
+ "epoch": 9,
244
+ "train_loss": 1.076228421729606,
245
+ "train_acc": 0.5603103103103103,
246
+ "val_macro_f1": 0.5507236741532686,
247
+ "val_loss": 1.1353786454946508,
248
+ "val_accuracy": 0.6125592417061612
249
+ },
250
+ {
251
+ "epoch": 10,
252
+ "train_loss": 1.0556169597952216,
253
+ "train_acc": 0.5615615615615616,
254
+ "val_macro_f1": 0.5526159517101854,
255
+ "val_loss": 1.1300031246167224,
256
+ "val_accuracy": 0.6125592417061612
257
+ }
258
+ ],
259
+ "stage_c": [
260
+ {
261
+ "epoch": 1,
262
+ "train_loss": 1.1528259001455985,
263
+ "train_acc": 0.5357857857857858,
264
+ "val_macro_f1": 0.5497651661380282,
265
+ "val_loss": 1.1855918479756722,
266
+ "val_accuracy": 0.5924170616113744
267
+ },
268
+ {
269
+ "epoch": 2,
270
+ "train_loss": 1.0998183033607147,
271
+ "train_acc": 0.5528028028028028,
272
+ "val_macro_f1": 0.5383073695552381,
273
+ "val_loss": 1.2090052756088039,
274
+ "val_accuracy": 0.5817535545023697
275
+ },
276
+ {
277
+ "epoch": 3,
278
+ "train_loss": 1.0954274918820646,
279
+ "train_acc": 0.5598098098098098,
280
+ "val_macro_f1": 0.5516726999736293,
281
+ "val_loss": 1.1517153537668887,
282
+ "val_accuracy": 0.6137440758293838
283
+ },
284
+ {
285
+ "epoch": 4,
286
+ "train_loss": 1.0348763885918084,
287
+ "train_acc": 0.5695695695695696,
288
+ "val_macro_f1": 0.5324538841027082,
289
+ "val_loss": 1.1520833731827578,
290
+ "val_accuracy": 0.5924170616113744
291
+ },
292
+ {
293
+ "epoch": 5,
294
+ "train_loss": 1.0805193128528539,
295
+ "train_acc": 0.5645645645645646,
296
+ "val_macro_f1": 0.5350905236046971,
297
+ "val_loss": 1.1407714692337254,
298
+ "val_accuracy": 0.5959715639810427
299
+ },
300
+ {
301
+ "epoch": 6,
302
+ "train_loss": 1.054177247845494,
303
+ "train_acc": 0.5615615615615616,
304
+ "val_macro_f1": 0.5308509777096463,
305
+ "val_loss": 1.1249282184935294,
306
+ "val_accuracy": 0.5995260663507109
307
+ },
308
+ {
309
+ "epoch": 7,
310
+ "train_loss": 1.0641172317651895,
311
+ "train_acc": 0.5685685685685685,
312
+ "val_macro_f1": 0.5462144827909508,
313
+ "val_loss": 1.1365883440767983,
314
+ "val_accuracy": 0.6030805687203792
315
+ },
316
+ {
317
+ "epoch": 8,
318
+ "train_loss": 1.006467128480161,
319
+ "train_acc": 0.5795795795795796,
320
+ "val_macro_f1": 0.5399544885145189,
321
+ "val_loss": 1.1192362528841642,
322
+ "val_accuracy": 0.6054502369668247
323
+ },
324
+ {
325
+ "epoch": 9,
326
+ "train_loss": 0.9975554783781011,
327
+ "train_acc": 0.5818318318318318,
328
+ "val_macro_f1": 0.5521462690027934,
329
+ "val_loss": 1.1212286163845333,
330
+ "val_accuracy": 0.6101895734597157
331
+ },
332
+ {
333
+ "epoch": 10,
334
+ "train_loss": 1.0012436508535743,
335
+ "train_acc": 0.5745745745745746,
336
+ "val_macro_f1": 0.5513181019590729,
337
+ "val_loss": 1.122439599715138,
338
+ "val_accuracy": 0.6078199052132701
339
+ }
340
+ ]
341
+ },
+  "confusion_matrix": [
+    [7, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
+    [0, 22, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 5, 0, 1, 0, 1, 0, 0, 1, 1, 0, 3, 1, 3, 0, 0, 3],
+    [0, 0, 0, 8, 4, 4, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 0, 1, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [6, 0, 3, 3, 0, 16, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 2, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0],
+    [0, 0, 0, 2, 2, 0, 0, 16, 0, 2, 2, 1, 1, 2, 3, 1, 3, 0],
+    [0, 0, 0, 1, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 5, 0, 0, 0, 0, 1, 0, 56, 15, 0, 11, 2, 1, 0, 0, 1],
+    [0, 2, 3, 2, 0, 1, 0, 19, 0, 44, 105, 0, 30, 1, 1, 0, 0, 0],
+    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 1, 2, 0, 0, 0],
+    [1, 0, 16, 2, 0, 1, 0, 9, 0, 20, 13, 2, 43, 2, 0, 0, 0, 3],
+    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 6, 1, 0, 0, 0],
+    [0, 0, 0, 0, 1, 0, 9, 1, 0, 0, 0, 3, 0, 0, 8, 0, 0, 4],
+    [0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 107, 14, 0],
+    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 13, 21, 0],
+    [0, 2, 0, 0, 0, 0, 1, 2, 0, 0, 0, 3, 0, 6, 4, 0, 0, 6]
+  ],
+  "label_to_idx": {
+    "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+    "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+    "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+  },
+ "classification_report": " precision recall f1-score support\n\n dhumri 0.50 0.50 0.50 14\n difficult 0.73 0.88 0.80 25\n drathung 0.15 0.26 0.19 19\n drudring 0.42 0.42 0.42 19\n druring 0.62 0.94 0.74 17\n druthung 0.59 0.52 0.55 31\n khyuyig 0.46 0.69 0.55 16\n multi_scripts 0.32 0.46 0.38 35\n non_tibetan 0.93 0.96 0.95 28\n peri 0.44 0.61 0.51 92\n petsuk 0.77 0.50 0.61 208\n trinyig 0.13 0.33 0.19 6\n tsegdrig 0.48 0.38 0.43 112\n tsugchung 0.27 0.55 0.36 11\n tsumachug 0.31 0.31 0.31 26\nuchen_sugdring 0.88 0.86 0.87 125\nuchen_sugthung 0.55 0.58 0.57 36\n yigchung 0.35 0.25 0.29 24\n\n accuracy 0.57 844\n macro avg 0.50 0.56 0.51 844\n weighted avg 0.61 0.57 0.58 844\n"
+ }
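
The `classification_report` above follows directly from the confusion matrix (rows are true labels, columns are predictions, in `label_to_idx` order). As a minimal sketch of that derivation, here is how per-class precision/recall/F1 and overall accuracy fall out of such a matrix; the function is a generic illustration (not code from this repo), demonstrated on a hypothetical 3-class matrix rather than the full 18-class one:

```python
def metrics_from_confusion(cm):
    """Per-class precision/recall/F1 plus overall accuracy.

    Assumes cm[i][j] counts samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        support = sum(cm[i])                          # row sum: true count
        predicted = sum(cm[r][i] for r in range(n))   # column sum: predicted count
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append({"precision": precision, "recall": recall,
                          "f1": f1, "support": support})
    macro_f1 = sum(c["f1"] for c in per_class) / n
    return accuracy, macro_f1, per_class

# Toy 3-class example (not the matrix above):
cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 4]]
acc, macro_f1, per_class = metrics_from_confusion(cm)
print(round(acc, 2))  # 0.75
```

Applied to the 18x18 matrix above, the same computation reproduces the report's accuracy (482 correct out of 844 validation pages, about 0.57).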
whole_page/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 70, "difficult": 120, "drathung": 91, "drudring": 94, "druring": 85, "druthung": 145, "khyuyig": 81, "multi_scripts": 165, "non_tibetan": 136, "peri": 430, "petsuk": 972, "trinyig": 30, "tsegdrig": 525, "tsugchung": 55, "tsumachug": 126, "uchen_sugdring": 585, "uchen_sugthung": 168, "yigchung": 118},
+     "val": {"dhumri": 14, "difficult": 25, "drathung": 19, "drudring": 19, "druring": 17, "druthung": 31, "khyuyig": 16, "multi_scripts": 35, "non_tibetan": 28, "peri": 92, "petsuk": 208, "trinyig": 6, "tsegdrig": 112, "tsugchung": 11, "tsumachug": 26, "uchen_sugdring": 125, "uchen_sugthung": 36, "yigchung": 24},
+     "test": {"dhumri": 14, "difficult": 25, "drathung": 19, "drudring": 19, "druring": 17, "druthung": 31, "khyuyig": 16, "multi_scripts": 35, "non_tibetan": 28, "peri": 92, "petsuk": 208, "trinyig": 6, "tsegdrig": 112, "tsugchung": 11, "tsumachug": 26, "uchen_sugdring": 125, "uchen_sugthung": 36, "yigchung": 24}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
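
Since `splits.json` ships both `label_to_idx` and `idx_to_label`, downstream code can map classifier outputs back to script names without rebuilding the mapping. A minimal loading sketch, assuming the file sits at `whole_page/splits.json` as in this commit (the inline copy of the mapping below is only there to demonstrate the round trip without touching the filesystem):

```python
import json

def load_label_maps(path="whole_page/splits.json"):
    """Load label <-> index maps from a splits.json in this repo's layout."""
    with open(path, "r", encoding="utf-8") as f:
        splits = json.load(f)
    label_to_idx = splits["label_to_idx"]
    # JSON object keys are strings; convert "0".."17" back to ints.
    idx_to_label = {int(k): v for k, v in splits.get("idx_to_label", {}).items()}
    if not idx_to_label:  # fall back to inverting label_to_idx
        idx_to_label = {v: k for k, v in label_to_idx.items()}
    return label_to_idx, idx_to_label

# Round-trip check against an inline copy of the 18-class mapping:
label_to_idx = {
    "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4,
    "druthung": 5, "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9,
    "petsuk": 10, "trinyig": 11, "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14,
    "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17,
}
idx_to_label = {v: k for k, v in label_to_idx.items()}
print(idx_to_label[10])  # petsuk
```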