karma689 committed on
Commit a1d2f62 · verified · 1 Parent(s): 0662896

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+patches_clahe/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+patches_color/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+whole_page/confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,150 @@
# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned Tibetan script classification checkpoints for 18 classes, trained from the DINOv3 ViT-S backbone:

- Backbone: `facebook/dinov3-vits16-pretrain-lvd1689m`
- Task: 18-way script classification
- Training script included: `finetune_dinov3.py`

**Hugging Face access:** DINOv3 requires access approval at [huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) before `from_pretrained` / downloads will work. Anyone cloning this repo will see the same gated-model error until their HF account is granted access and they are logged in (`huggingface-cli login` or `HF_TOKEN`).

## Label Set

`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing (per experiment)

Images for training were produced as follows (see `preprocess.py` in the parent project):

- **`whole_page`:** resize so the **short edge is 224 px**, then **center crop** to 224×224 (one crop per source page).
- **`patches_color`:** same short-edge resize to 224, then **sliding-window** 224×224 patches with **25% overlap** between windows (multiple crops per page).
- **`patches_clahe`:** identical patch layout to `patches_color`; each patch is converted to grayscale and **CLAHE** contrast normalization is applied (`clipLimit=2.0`, `tileGridSize=(8,8)`), then saved as BGR/RGB for training.
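
For the patch variants, the only geometry involved is where the 224×224 windows start along each axis. A minimal sketch of that layout, assuming stride = patch × (1 − overlap) = 168 px and a flush-right final window; `window_origins` is an illustrative helper, not the actual `preprocess.py` code:

```python
def window_origins(length, patch=224, overlap=0.25):
    """Top-left offsets for `patch`-px windows with fractional `overlap`
    along one axis. Stride is patch * (1 - overlap) = 168 px for the
    defaults; the last window is shifted back so it ends exactly at
    `length` instead of running out of bounds."""
    stride = int(patch * (1 - overlap))
    if length <= patch:
        return [0]  # image smaller than one window: single crop
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)  # final flush-right window
    return origins
```

Applied to both image axes, the cross product of the two origin lists gives the full patch grid for one page.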

## Training recipe

- **Progressive unfreezing (defaults in `finetune_dinov3.py`):**
  - **Stage A — head only:** 20 epochs, backbone frozen, classifier head LR **1e-3** (backbone LR 0).
  - **Stage B — last 2 blocks:** 10 epochs, backbone LR **1e-5**, head LR **1e-3**.
  - **Stage C — last 4 blocks:** 10 epochs, backbone LR **5e-6**, head LR **5e-4**.
- **Loss:** class-weighted **cross-entropy** with inverse-frequency weights over the training split (`nn.CrossEntropyLoss(weight=...)`).
- **Sampling:** the published runs use a standard `DataLoader` with **`shuffle=True`**. The script also defines **`get_weighted_sampler` → `WeightedRandomSampler`** if you want to switch the train loader to explicit class-balanced sampling.
- **Document-aware augmentations (train only):** `RandomRotation` **±5°** (fill white), `ColorJitter` brightness/contrast **±20%** (`0.2`), plus `RandomResizedCrop` and light `RandomErasing` as in `ScriptDataset`; **no horizontal flip**.
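
The inverse-frequency weighting behind the loss is easiest to see on a toy example. This is a plain-Python sketch of the same formula as `get_class_weights` in `finetune_dinov3.py` (`total / (num_classes * count)` per class); the real code builds a `torch` tensor ordered by `label_to_idx`:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """weight[c] = N / (num_classes * count[c]); a perfectly balanced
    label list yields 1.0 for every class, rarer classes get more."""
    counts = Counter(labels)
    total = len(labels)
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Toy split: 3 'peri' pages vs 1 'trinyig' page.
w = inverse_frequency_weights(["peri"] * 3 + ["trinyig"])
# peri: 4 / (2 * 3) ≈ 0.667, trinyig: 4 / (2 * 1) = 2.0
```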

## Class distribution (`whole_page` split totals)

The whole-page split has **5,684** samples in total (**train 3,996 / val 844 / test 844**). The **844** figure in test metrics is **only the test split** (roughly 15% of pages per class held out for testing), not the full dataset.

The per-class table below sums **train + val + test** counts. A benchmark exclusion manifest (`benchmark_page_ids.json`, **88** page IDs across 18 classes) is consulted when building splits: any file whose page ID matches is skipped and counted in `splits.json` under `skipped_excluded_files_by_class`. For the published `whole_page/splits.json`, that skip map is **empty**: either those pages were **not present** under the training `data_dir`, or they had **already been removed** before this split was built. The **5,684** totals are whatever files remained for stratified splitting.

| Class | Samples |
|---|---:|
| dhumri | 98 |
| difficult | 170 |
| drathung | 129 |
| drudring | 132 |
| druring | 119 |
| druthung | 207 |
| khyuyig | 113 |
| multi_scripts | 235 |
| non_tibetan | 192 |
| peri | 614 |
| petsuk | 1388 |
| trinyig | 42 |
| tsegdrig | 749 |
| tsugchung | 77 |
| tsumachug | 178 |
| uchen_sugdring | 835 |
| uchen_sugthung | 240 |
| yigchung | 166 |

## Experiments Included

### 1) `whole_page`

- Files: `whole_page/final_model.pt`, `results.json`, `confusion_matrix.png`, `confusion_matrix.csv`, `splits.json`
- Test (image-level) macro-F1: **0.5124**
- Test accuracy: **0.5711**

### 2) `patches_color`

- Files: `patches_color/final_model.pt`, `results.json`, `confusion_matrix.png`, `checkpoint_page_eval.json`, `splits.json`
- Test (patch-level) macro-F1: **0.4899**
- Re-eval **page-level** macro-F1 for shipped `final_model.pt` (`checkpoint_page_eval.json`): **0.5017**
- Best **page-level** macro-F1 among stage checkpoints on the same grid: **0.5043** (**Stage A**)

### 3) `patches_clahe`

- Files: `patches_clahe/final_model.pt`, `results.json`, `confusion_matrix.png`, `checkpoint_page_eval.json`, `splits.json`
- Test (patch-level) macro-F1: **0.4911**
- Re-eval page-level macro-F1 for shipped `final_model.pt`: **0.5261**
- Best **page-level** macro-F1 among stage checkpoints: **0.529** (**Stage B**)

## Which stage produced which checkpoint?

- **`final_model.pt` in each folder** is the stage with the highest **validation macro-F1** among `best_stage_*.pt` checkpoints (see `best_val_checkpoint` in each `results.json`): **Stage B** for `whole_page`, **Stage C** for both `patches_color` and `patches_clahe`.
- For **page-level** quality on the patch runs, the best single stage on the re-eval grid differs: **Stage A** (`patches_color`) and **Stage B** (`patches_clahe`) beat their respective `final_model.pt` page scores. Use `checkpoint_page_eval.json` if you want to deploy a stage checkpoint instead of the val-selected default.

## Which experiment won?

CLAHE patches achieved the highest **page-level** macro-F1 (**0.529** on the best stage checkpoint), while **whole page** achieved the best **image-level** macro-F1 (**0.512**). **Whole page** is recommended for production due to simpler inference.

## How To Load a Checkpoint

```python
import torch
from pathlib import Path
from finetune_dinov3 import DINOv3Classifier, DINOV3_MODEL_ID

ckpt_path = Path("whole_page/final_model.pt")
payload = torch.load(ckpt_path, map_location="cpu")

label_to_idx = payload["label_to_idx"]
idx_to_label = {v: k for k, v in label_to_idx.items()}
num_classes = len(label_to_idx)

model = DINOv3Classifier(DINOV3_MODEL_ID, num_classes)
model.load_state_dict(payload["model_state_dict"])
model.eval()
```

## Inference (Single Image)

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

# `DINOV3_MODEL_ID`, `model`, and `idx_to_label` come from the loading snippet above.
processor = AutoImageProcessor.from_pretrained(DINOV3_MODEL_ID)
img = Image.open("example.png").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs["pixel_values"])

probs = torch.softmax(logits, dim=1)[0].cpu().numpy()
pred_idx = int(probs.argmax())
pred_label = idx_to_label[pred_idx]
print(pred_label, float(probs[pred_idx]))
```

## Page-Level Inference (Patch Aggregation)

For patch experiments (`patches_color`, `patches_clahe`), aggregate by page stem:

1. group patch probabilities by page ID (strip the `_pN` suffix),
2. average the probabilities per page,
3. take the `argmax` of the averaged probabilities.

This is the same logic used in the re-evaluation script output (`checkpoint_page_eval.json`).
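
As a dependency-free sketch of those three steps (`aggregate_pages` is an illustrative name; the repo's `evaluate_page_level` in `finetune_dinov3.py` implements the same mean-probability vote with NumPy):

```python
import re
from collections import defaultdict

def aggregate_pages(patch_probs):
    """patch_probs: {filename_stem: [p_class0, p_class1, ...]}.
    Returns {page_id: predicted_class_index} via a mean-probability vote."""
    by_page = defaultdict(list)
    for stem, probs in patch_probs.items():
        page = re.sub(r"_p\d+$", "", stem)  # strip the _pN patch suffix
        by_page[page].append(probs)
    preds = {}
    for page, plist in by_page.items():
        # Average each class column across the page's patches, then argmax.
        avg = [sum(col) / len(plist) for col in zip(*plist)]
        preds[page] = max(range(len(avg)), key=avg.__getitem__)
    return preds
```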

## Known Limitations

- Class imbalance is high (for example, `petsuk` and `uchen_sugdring` dominate, while `trinyig` is small).
- Results vary by preprocessing variant and by patch- vs page-level evaluation protocol.
- Patch-level and page-level metrics are not directly interchangeable.
- The model expects Tibetan manuscript-style inputs; performance can drop on out-of-domain scans or mixed/noisy pages.
- Checkpoints are tied to the exact label mapping saved in each payload (`label_to_idx`).

## Reproducibility Notes

- Exclusion manifest support is enabled in training (`benchmark_page_ids.json`).
- The full training code used for these artifacts is included at `finetune_dinov3.py`.
finetune_dinov3.py ADDED
@@ -0,0 +1,965 @@
"""
DINOv3 fine-tuning for script classification
============================================

Progressive fine-tuning with page-level train/val/test split.
Runs on three preprocessed variants:
    - whole_page/
    - patches_color/
    - patches_clahe/

Usage:
    # Exp1: whole_page
    python finetune_dinov3.py --data_dir ./Data/output/whole_page --experiment whole_page

    # Exp2: patches_color
    python finetune_dinov3.py --data_dir ./Data/output/patches_color --experiment patches_color

    # Exp3: patches_clahe
    python finetune_dinov3.py --data_dir ./Data/output/patches_clahe --experiment patches_clahe

Outputs (under --output_dir/<experiment>/):
    best_<stage_slug>.pt — best val macro-F1 per stage
    history_stage_{a,b,c}.json — per-epoch metrics per stage
    training_history_stage_{a,b,c}.png — curves per stage
    final_model.pt — weights chosen by best val across stages + test_metrics metadata
    results.json, confusion_matrix.*, training_history.png (full run)

Requirements:
    pip install torch torchvision transformers scikit-learn matplotlib seaborn
    # DINOv3 requires transformers >= 4.56.0
    # If not available: pip install --upgrade git+https://github.com/huggingface/transformers.git
"""
import os
import re
import json
import argparse
import random
from pathlib import Path
from collections import Counter, defaultdict
from datetime import datetime

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import transforms
from PIL import Image
from sklearn.metrics import (
    classification_report, confusion_matrix, f1_score, accuracy_score
)

try:
    from transformers import AutoImageProcessor, AutoModel
except ImportError:
    raise ImportError(
        "transformers >= 4.56.0 required for DINOv3.\n"
        "Install: pip install --upgrade git+https://github.com/huggingface/transformers.git"
    )

# =====================
# CONFIG
# =====================

DINOV3_MODEL_ID = "facebook/dinov3-vits16-pretrain-lvd1689m"
EMBEDDING_DIM = 384
VALID_EXT = {'.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp', '.webp'}
SEED = 42


def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# ======================
# Page-level splitting
# ======================

def get_page_name(filepath):
    """
    Extract the original page name from a patch filename.
    e.g., 'manuscript001_p3.png' → 'manuscript001'
    e.g., 'manuscript001.png' → 'manuscript001'

    This ensures all patches from the same page stay in the same split.
    """
    stem = Path(filepath).stem
    page_name = re.sub(r'_p\d+$', '', stem)
    return page_name

def normalize_label_key(label: str) -> str:
    """Normalize class names for manifest lookup."""
    return re.sub(r'[^a-z0-9]+', '_', label.lower()).strip('_')


def load_exclusion_manifest(manifest_path: str):
    """
    Load class -> page_ids exclusions from JSON.
    Returns a dict keyed by normalized class labels.
    """
    if not manifest_path:
        return {}

    path = Path(manifest_path)
    if not path.is_file():
        print(f" Exclusion manifest not found, skipping exclusions: {path}")
        return {}

    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    if not isinstance(raw, dict):
        raise ValueError(f"Exclusion manifest must be a JSON object: {path}")

    manifest = {}
    for label, ids in raw.items():
        if not isinstance(ids, list):
            continue
        norm_label = normalize_label_key(str(label))
        manifest[norm_label] = {str(x).strip() for x in ids if str(x).strip()}
    return manifest

def create_page_level(data_dir, val_ratio=0.15, test_ratio=0.15, seed=SEED, excluded_pages_by_label=None):
    """
    Split at the PAGE level, not the image/patch level.
    All patches from one page go into the same split.

    Returns:
        splits: dict with 'train', 'val', 'test' keys;
            each value is a list of (filepath, label) tuples
        label_to_idx: dict mapping label strings to integers
        idx_to_label: the inverse mapping
        skipped_by_label: per-class count of files skipped via the exclusion manifest
    """
    set_seed(seed)
    data_dir = Path(data_dir)

    class_pages = defaultdict(lambda: defaultdict(list))
    skipped_by_label = Counter()

    for cls_dir in sorted(data_dir.iterdir()):
        if not cls_dir.is_dir() or cls_dir.name.startswith('.'):
            continue
        label = cls_dir.name
        excluded_pages = set()
        if excluded_pages_by_label:
            excluded_pages = excluded_pages_by_label.get(normalize_label_key(label), set())
        for img_path in sorted(cls_dir.iterdir()):
            if img_path.suffix.lower() in VALID_EXT:
                page = get_page_name(str(img_path))
                if page in excluded_pages:
                    skipped_by_label[label] += 1
                    continue
                class_pages[label][page].append(str(img_path))

    # Create label mapping
    labels = sorted(class_pages.keys())
    label_to_idx = {label: idx for idx, label in enumerate(labels)}
    idx_to_label = {idx: label for label, idx in label_to_idx.items()}

    # Split pages per class (stratified)
    splits = {'train': [], 'val': [], 'test': []}

    for label in labels:
        pages = list(class_pages[label].keys())
        random.shuffle(pages)

        n_pages = len(pages)
        n_test = max(1, int(n_pages * test_ratio))
        n_val = max(1, int(n_pages * val_ratio))
        n_train = n_pages - n_test - n_val

        test_pages = pages[:n_test]
        val_pages = pages[n_test:n_test + n_val]
        train_pages = pages[n_test + n_val:]

        for page in train_pages:
            for fpath in class_pages[label][page]:
                splits['train'].append((fpath, label))
        for page in val_pages:
            for fpath in class_pages[label][page]:
                splits['val'].append((fpath, label))
        for page in test_pages:
            for fpath in class_pages[label][page]:
                splits['test'].append((fpath, label))

    return splits, label_to_idx, idx_to_label, dict(skipped_by_label)

class ScriptDataset(Dataset):
    def __init__(self, samples, label_to_idx, processor, augment=False):
        self.samples = samples
        self.label_to_idx = label_to_idx
        self.processor = processor
        self.augment = augment

        # Document-aware augmentation: geometric/photometric ops on the PIL
        # image (so fill=255 really is white), then RandomErasing on a tensor.
        if augment:
            self.aug_transform = transforms.Compose([
                transforms.RandomRotation(degrees=5, fill=255),
                transforms.ColorJitter(brightness=0.2, contrast=0.2),
                transforms.RandomResizedCrop(224, scale=(0.7, 1.0), ratio=(0.9, 1.1)),
                transforms.ToTensor(),
                transforms.RandomErasing(p=0.1, scale=(0.02, 0.08)),
            ])
        else:
            self.aug_transform = None

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        file_path, label_str = self.samples[idx]

        # Load image
        img = Image.open(file_path).convert('RGB')

        if self.aug_transform is not None and self.augment:
            img = self.aug_transform(img)        # ends as a tensor (RandomErasing)
            img = transforms.ToPILImage()(img)

        # Process with DINOv3 processor (resize, normalize)
        inputs = self.processor(images=img, return_tensors="pt")
        pixel_values = inputs['pixel_values'].squeeze(0)

        label_idx = self.label_to_idx[label_str]

        return pixel_values, label_idx

class DINOv3Classifier(nn.Module):
    """
    DINOv3 ViT-S backbone + MLP classification head.

    The backbone outputs:
        - CLS token: 384-dim embedding (used for classification)
        - Patch tokens: 196 × 384-dim (not used in this version)
        - Register tokens: 4 × 384-dim (not used)

    Classification head: 384 → 128 → num_classes
    """

    def __init__(self, model_id, num_classes, dropout=0.1):
        super().__init__()

        # Load pretrained backbone
        self.backbone = AutoModel.from_pretrained(model_id)

        # Get embedding dim
        hidden_size = self.backbone.config.hidden_size

        # Classification head
        self.head = nn.Sequential(
            nn.LayerNorm(hidden_size),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 128),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

        self.freeze_backbone()

    def freeze_backbone(self):
        """Freeze all backbone parameters."""
        for param in self.backbone.parameters():
            param.requires_grad = False

    def unfreeze_last_n_blocks(self, n):
        """
        Unfreeze the last N transformer blocks.
        DINOv3 ViT-S has 12 blocks (layers).
        """
        # First freeze everything
        self.freeze_backbone()

        # HF DINOv3ViTModel: blocks at backbone.model.layer, final norm at backbone.norm
        # (not ViT/BERT-style backbone.encoder.layer).
        if hasattr(self.backbone, "model") and hasattr(self.backbone.model, "layer"):
            layers = self.backbone.model.layer
        elif hasattr(self.backbone, "encoder") and hasattr(self.backbone.encoder, "layer"):
            layers = self.backbone.encoder.layer
        else:
            raise AttributeError(
                "Backbone has no recognizable transformer blocks "
                "(expected .model.layer for DINOv3 or .encoder.layer for ViT/BERT)."
            )

        total_layers = len(layers)
        for i in range(max(0, total_layers - n), total_layers):
            for param in layers[i].parameters():
                param.requires_grad = True

        if hasattr(self.backbone, "norm"):
            for param in self.backbone.norm.parameters():
                param.requires_grad = True
        elif hasattr(self.backbone, "layernorm"):
            for param in self.backbone.layernorm.parameters():
                param.requires_grad = True

    def forward(self, pixel_values):
        # Get backbone outputs
        outputs = self.backbone(pixel_values=pixel_values)

        # Use CLS token (first token)
        cls_embedding = outputs.last_hidden_state[:, 0, :]

        # Classify
        logits = self.head(cls_embedding)
        return logits

# ====================================
# Training
# ====================================

def get_class_weights(samples, label_to_idx, device):
    """Compute inverse-frequency class weights for balanced training."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    weights = torch.zeros(len(label_to_idx), device=device)
    for label, idx in label_to_idx.items():
        cnt = max(counts.get(label, 1), 1)
        weights[idx] = total / (len(label_to_idx) * cnt)
    return weights


def get_weighted_sampler(samples, label_to_idx):
    """WeightedRandomSampler for balanced batches."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    class_weights = {label: total / count for label, count in counts.items()}
    sample_weights = [class_weights[label] for _, label in samples]
    return WeightedRandomSampler(sample_weights, len(samples), replacement=True)

def train_one_epoch(model, loader, criterion, optimizer, device, scaler=None):
    """Train for one epoch with optional mixed precision."""
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    for batch_idx, (images, labels) in enumerate(loader):
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        if scaler:
            with torch.autocast(device_type='cuda', dtype=torch.float16):
                logits = model(images)
                loss = criterion(logits, labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            logits = model(images)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()

        total_loss += loss.item() * images.size(0)
        _, predicted = logits.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)

        if (batch_idx + 1) % 50 == 0:
            print(f"    batch {batch_idx+1}/{len(loader)} | "
                  f"loss: {loss.item():.4f} | acc: {correct/total:.3f}")

    return total_loss / total, correct / total

def _stage_checkpoint_slug(stage_name: str) -> str:
    """Stable filename fragment (no spaces/colons) for checkpoint paths."""
    s = re.sub(r"[^a-z0-9]+", "_", stage_name.lower())
    return re.sub(r"_+", "_", s).strip("_")


@torch.no_grad()
def evaluate(model, loader, criterion, device, idx_to_label=None):
    """Return validation/test metrics and per-sample preds, labels, probs."""
    model.eval()
    total_loss = 0.0
    total = 0
    all_preds = []
    all_labels = []
    all_probs = []

    for images, labels in loader:
        images = images.to(device)
        labels = labels.to(device)
        logits = model(images)
        loss = criterion(logits, labels)
        bs = images.size(0)
        total_loss += loss.item() * bs
        total += bs
        probs = torch.softmax(logits, dim=1)
        pred = logits.argmax(dim=1)
        all_preds.extend(pred.cpu().numpy().tolist())
        all_labels.extend(labels.cpu().numpy().tolist())
        all_probs.extend(probs.cpu().numpy().tolist())

    avg_loss = total_loss / max(total, 1)
    acc = accuracy_score(all_labels, all_preds)
    macro_f1 = f1_score(all_labels, all_preds, average="macro", zero_division=0)
    weighted_f1 = f1_score(all_labels, all_preds, average="weighted", zero_division=0)
    metrics = {
        "loss": float(avg_loss),
        "accuracy": float(acc),
        "macro_f1": float(macro_f1),
        "weighted_f1": float(weighted_f1),
    }
    return metrics, all_preds, all_labels, all_probs

def evaluate_page_level(samples, probs, label_to_idx, idx_to_label):
    """
    Aggregate patch-level probabilities to page-level predictions.

    Args:
        samples: list of (filepath, label_str) for the evaluated split.
        probs: list of per-sample probability vectors (same order as samples).
    """
    if len(samples) != len(probs):
        raise ValueError(
            f"samples/probs length mismatch: {len(samples)} != {len(probs)}"
        )

    page_preds = defaultdict(list)
    page_labels = {}

    # Page-level true labels from file stems
    for filepath, label_str in samples:
        page = get_page_name(filepath)
        page_labels[page] = label_to_idx[label_str]

    # Group probabilities by page
    for (filepath, _), p in zip(samples, probs):
        page = get_page_name(filepath)
        page_preds[page].append(np.asarray(p, dtype=np.float32))

    pages_sorted = sorted(page_preds.keys())
    all_page_true = []
    all_page_pred = []
    page_avg_probs = {}

    for page in pages_sorted:
        avg_probs = np.mean(page_preds[page], axis=0)
        pred_idx = int(np.argmax(avg_probs))
        true_idx = int(page_labels[page])
        all_page_true.append(true_idx)
        all_page_pred.append(pred_idx)
        page_avg_probs[page] = avg_probs.tolist()

    acc = accuracy_score(all_page_true, all_page_pred)
    macro_f1 = f1_score(all_page_true, all_page_pred, average="macro", zero_division=0)
    weighted_f1 = f1_score(all_page_true, all_page_pred, average="weighted", zero_division=0)

    metrics = {
        "accuracy": float(acc),
        "macro_f1": float(macro_f1),
        "weighted_f1": float(weighted_f1),
        "num_pages": int(len(pages_sorted)),
        "num_samples": int(len(samples)),
    }

    return {
        "metrics": metrics,
        "pages": pages_sorted,
        "page_true": all_page_true,
        "page_pred": all_page_pred,
        "page_avg_probs": page_avg_probs,
    }

# ============================
# Progressive fine-tuning
# ============================

def run_stage(model, train_loader, val_loader, criterion, device, stage_name,
              lr_backbone, lr_head, epochs, output_dir, idx_to_label, use_amp=True):
    """Run one stage of progressive fine-tuning."""

    print(f"\n{'='*60}")
    print(f" {stage_name}")
    print(f"{'='*60}")

    # Set up optimizer with different LRs for backbone and head
    param_groups = []

    backbone_params = [p for p in model.backbone.parameters() if p.requires_grad]
    head_params = list(model.head.parameters())

    if backbone_params:
        param_groups.append({'params': backbone_params, 'lr': lr_backbone})
        print(f" Backbone params (trainable): {sum(p.numel() for p in backbone_params):,}")

    param_groups.append({'params': head_params, 'lr': lr_head})
    print(f" Head params: {sum(p.numel() for p in head_params):,}")
    print(f" LR backbone: {lr_backbone}, LR head: {lr_head}")
    print(f" Epochs: {epochs}")

    optimizer = torch.optim.AdamW(param_groups, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    scaler = torch.amp.GradScaler() if use_amp and device.type == 'cuda' else None

    slug = _stage_checkpoint_slug(stage_name)
    checkpoint_path = output_dir / f'best_{slug}.pt'

    best_val_f1 = 0
    best_epoch = 0
    history = []

    for epoch in range(epochs):
        print(f"\n Epoch {epoch+1}/{epochs}")

        train_loss, train_acc = train_one_epoch(
            model, train_loader, criterion, optimizer, device, scaler
        )

        val_metrics, _, _, _ = evaluate(model, val_loader, criterion, device)

        scheduler.step()

        print(f" Train loss: {train_loss:.4f} | acc: {train_acc:.3f}")
        print(f" Val loss: {val_metrics['loss']:.4f} | "
              f"acc: {val_metrics['accuracy']:.3f} | "
              f"macro-F1: {val_metrics['macro_f1']:.3f}")

        history.append({
            'epoch': epoch + 1,
            'train_loss': train_loss,
            'train_acc': train_acc,
            'val_macro_f1': val_metrics['macro_f1'],
            'val_loss': val_metrics['loss'],
            'val_accuracy': val_metrics['accuracy'],
        })

        # Save best model (always use slug path so load paths in main() match)
        if val_metrics['macro_f1'] > best_val_f1:
            best_val_f1 = val_metrics['macro_f1']
            best_epoch = epoch + 1
            torch.save({
                'model_state_dict': model.state_dict(),
                'epoch': epoch + 1,
                'val_macro_f1': best_val_f1,
                'val_accuracy': val_metrics['accuracy'],
                'stage_name': stage_name,
                'stage_slug': slug,
            }, checkpoint_path)
            print(f" * New best! Saved to {checkpoint_path}")

    print(f"\n {stage_name} complete. Best: epoch {best_epoch}, macro-F1: {best_val_f1:.3f}")
    return history, best_val_f1

# ==========================
# MAIN
# ==========================

def _torch_load(path):
    try:
        return torch.load(path, weights_only=False)
    except TypeError:  # older torch without the weights_only kwarg
        return torch.load(path)


def _save_stage_history_json(output_dir: Path, stage_key: str, history: list) -> None:
    """Write one JSON file per training stage (loss / val metrics per epoch)."""
    path = output_dir / f'history_{stage_key}.json'
    with open(path, 'w') as f:
        json.dump(history, f, indent=2, default=str)
    print(f" Stage history saved: {path}")


def _plot_stage_history(output_dir: Path, stage_key: str, history: list, experiment: str) -> None:
    """Save train loss + val macro-F1 curves for a single stage."""
    if not history:
        return
    try:
        import matplotlib
        matplotlib.use('Agg')
        import matplotlib.pyplot as plt

        epochs = [h['epoch'] for h in history]
        train_loss = [h['train_loss'] for h in history]
        val_f1 = [h['val_macro_f1'] for h in history]

        fig, axes = plt.subplots(1, 2, figsize=(12, 4))
        axes[0].plot(epochs, train_loss, 'b-')
        axes[0].set_xlabel('Epoch')
        axes[0].set_ylabel('Train loss')
        axes[0].set_title(f'{stage_key} — train loss')

        axes[1].plot(epochs, val_f1, 'g-')
        axes[1].set_xlabel('Epoch')
        axes[1].set_ylabel('Val macro-F1')
        axes[1].set_title(f'{stage_key} — validation')

        fig.suptitle(f'{experiment} / {stage_key}')
        plt.tight_layout()
        out_path = output_dir / f'training_history_{stage_key}.png'
        plt.savefig(out_path, dpi=150)
        plt.close()
        print(f" Stage plot saved: {out_path}")
    except Exception as e:
        print(f" (Skipping stage plot for {stage_key}: {e})")


def _save_stage_artifacts(output_dir: Path, stage_key: str, history: list, experiment: str) -> None:
    _save_stage_history_json(output_dir, stage_key, history)
    _plot_stage_history(output_dir, stage_key, history, experiment)
612
+def main():
+    parser = argparse.ArgumentParser(description="Fine-tune DINOv3 ViT-S")
+    parser.add_argument(
+        "--data_dir", type=str, required=True,
+        help="Path to processed data (e.g., ./Data/output/whole_page)",
+    )
+    parser.add_argument(
+        "--experiment", type=str, required=True,
+        choices=["whole_page", "patches_color", "patches_clahe"],
+        help="Which experiment variant",
+    )
+    parser.add_argument("--output_dir", type=str, default="./results",
+                        help="Where to save checkpoints and results")
+    parser.add_argument("--batch_size", type=int, default=32,
+                        help="Batch size (reduce if OOM)")
+    parser.add_argument("--epochs_a", type=int, default=20,
+                        help="Epochs for Stage A (head only)")
+    parser.add_argument("--epochs_b", type=int, default=10,
+                        help="Epochs for Stage B (last 2 blocks)")
+    parser.add_argument("--epochs_c", type=int, default=10,
+                        help="Epochs for Stage C (last 4 blocks)")
+    parser.add_argument("--num_workers", type=int, default=4)
+    parser.add_argument("--no_amp", action="store_true",
+                        help="Disable mixed precision")
+    parser.add_argument("--skip_stage_c", action="store_true",
+                        help="Skip Stage C (last 4 blocks)")
+    parser.add_argument(
+        "--exclude_manifest",
+        type=str,
+        default="./benchmark_page_ids.json",
+        help="Optional class->page_ids JSON; excluded pages are skipped during split creation",
+    )
+    args = parser.parse_args()
+
+    stage_a_name = "Stage A: Head only"
+    stage_b_name = "Stage B: Last 2 blocks"
+    stage_c_name = "Stage C: Last 4 blocks"
+
+    set_seed(SEED)
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    output_dir = Path(args.output_dir) / args.experiment
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    print(f"\n{'='*60}")
+    print(f" DINOv3 ViT-S Fine-Tuning")
+    print(f" {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"{'='*60}")
+    print(f" Experiment: {args.experiment}")
+    print(f" Data dir: {args.data_dir}")
+    print(f" Device: {device}")
+    print(f" Batch size: {args.batch_size}")
+    print(f" AMP: {not args.no_amp}")
+    print(f" Exclusions: {args.exclude_manifest}")
+
+    # Page-level split
+    print(f"\n Creating page-level split")
+    excluded_pages_by_label = load_exclusion_manifest(args.exclude_manifest)
+    excluded_label_count = len(excluded_pages_by_label)
+    excluded_id_count = sum(len(v) for v in excluded_pages_by_label.values())
+    if excluded_label_count:
+        print(f" Loaded exclusions: {excluded_label_count} labels, {excluded_id_count} page IDs")
+    splits, label_to_idx, idx_to_label, skipped_by_label = create_page_level(
+        args.data_dir,
+        excluded_pages_by_label=excluded_pages_by_label,
+    )
+    num_classes = len(label_to_idx)
+
+    print(f" Classes: {num_classes}")
+    print(f" Train: {len(splits['train'])} | Val: {len(splits['val'])} | Test: {len(splits['test'])}")
+    if skipped_by_label:
+        print("\n Skipped excluded files by class:")
+        for label, count in sorted(skipped_by_label.items()):
+            print(f" {label:<20s} {count:>6d}")
+
+    # Print per-class split counts
+    for split_name in ['train', 'val', 'test']:
+        counts = Counter(label for _, label in splits[split_name])
+        print(f"\n {split_name}:")
+        for label in sorted(counts.keys()):
+            print(f" {label:<20s} {counts[label]:>6d}")
+
+    # Save splits for reproducibility
+    splits_info = {
+        split_name: [(fp, label) for fp, label in samples]
+        for split_name, samples in splits.items()
+    }
+    with open(output_dir / 'splits.json', 'w') as f:
+        json.dump({
+            'label_to_idx': label_to_idx,
+            'idx_to_label': {str(k): v for k, v in idx_to_label.items()},
+            'split_counts': {
+                name: dict(Counter(l for _, l in samples))
+                for name, samples in splits.items()
+            },
+            'exclude_manifest': str(args.exclude_manifest),
+            'excluded_label_count': excluded_label_count,
+            'excluded_page_id_count': excluded_id_count,
+            'skipped_excluded_files_by_class': dict(skipped_by_label),
+        }, f, indent=2)
+
+    print(f"Loading DINOv3 processor: {DINOV3_MODEL_ID}")
+    processor = AutoImageProcessor.from_pretrained(DINOV3_MODEL_ID)
+
+    train_dataset = ScriptDataset(splits['train'], label_to_idx, processor, augment=True)
+    val_dataset = ScriptDataset(splits['val'], label_to_idx, processor, augment=False)
+    test_dataset = ScriptDataset(splits['test'], label_to_idx, processor, augment=False)
+
+    train_loader = DataLoader(
+        train_dataset, batch_size=args.batch_size, shuffle=True,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+    val_loader = DataLoader(
+        val_dataset, batch_size=args.batch_size, shuffle=False,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+    test_loader = DataLoader(
+        test_dataset, batch_size=args.batch_size, shuffle=False,
+        num_workers=args.num_workers, pin_memory=(device.type == 'cuda'),
+    )
+
+    print(f"\n Building DINOv3 classifier ({num_classes} classes)...")
+    model = DINOv3Classifier(DINOV3_MODEL_ID, num_classes, dropout=0.1)
+    model = model.to(device)
+
+    total_params = sum(p.numel() for p in model.parameters())
+    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
+    print(f" Total params: {total_params:,}")
+    print(f" Trainable params: {trainable_params:,} (head only)")
+
+    class_weights = get_class_weights(splits['train'], label_to_idx, device)
+    criterion = nn.CrossEntropyLoss(weight=class_weights)
+
+    use_amp = not args.no_amp and device.type == 'cuda'
+    all_history = {}
+
+    # Stage A: Head only (backbone frozen)
+    model.freeze_backbone()
+    history_a, best_f1_a = run_stage(
+        model, train_loader, val_loader, criterion, device,
+        stage_name=stage_a_name,
+        lr_backbone=0, lr_head=1e-3,
+        epochs=args.epochs_a, output_dir=output_dir,
+        idx_to_label=idx_to_label, use_amp=use_amp,
+    )
+    all_history['stage_a'] = history_a
+    _save_stage_artifacts(output_dir, 'stage_a', history_a, args.experiment)
+
+    ckpt_a = output_dir / f"best_{_stage_checkpoint_slug(stage_a_name)}.pt"
+    best_a = _torch_load(ckpt_a)
+    model.load_state_dict(best_a['model_state_dict'])
+
+    # Stage B: last 2 backbone blocks + head
+    model.unfreeze_last_n_blocks(2)
+    history_b, best_f1_b = run_stage(
+        model, train_loader, val_loader, criterion, device,
+        stage_name=stage_b_name,
+        lr_backbone=1e-5, lr_head=1e-3,
+        epochs=args.epochs_b, output_dir=output_dir,
+        idx_to_label=idx_to_label, use_amp=use_amp,
+    )
+    all_history['stage_b'] = history_b
+    _save_stage_artifacts(output_dir, 'stage_b', history_b, args.experiment)
+
+    if not args.skip_stage_c:
+        ckpt_b = output_dir / f"best_{_stage_checkpoint_slug(stage_b_name)}.pt"
+        best_b = _torch_load(ckpt_b)
+        model.load_state_dict(best_b['model_state_dict'])
+
+        model.unfreeze_last_n_blocks(4)
+        history_c, best_f1_c = run_stage(
+            model, train_loader, val_loader, criterion, device,
+            stage_name=stage_c_name,
+            lr_backbone=5e-6, lr_head=5e-4,
+            epochs=args.epochs_c, output_dir=output_dir,
+            idx_to_label=idx_to_label, use_amp=use_amp,
+        )
+        all_history['stage_c'] = history_c
+        _save_stage_artifacts(output_dir, 'stage_c', history_c, args.experiment)
+
+    # Final evaluation on test set
+    print(f"\n{'='*60}")
+    print(f" FINAL TEST EVALUATION")
+    print(f"{'='*60}")
+
+    best_checkpoints = list(output_dir.glob('best_*.pt'))
+    best_f1 = 0.0
+    best_ckpt = None
+    for ckpt_path in best_checkpoints:
+        ckpt = _torch_load(ckpt_path)
+        if ckpt.get('val_macro_f1', 0) > best_f1:
+            best_f1 = ckpt['val_macro_f1']
+            best_ckpt = ckpt_path
+
+    if best_ckpt is None:
+        raise RuntimeError("No checkpoint found under output_dir; cannot run test evaluation.")
+
+    print(f" Loading best checkpoint: {best_ckpt} (val F1: {best_f1:.3f})")
+    model.load_state_dict(_torch_load(best_ckpt)['model_state_dict'])
+
+    test_metrics, test_preds, test_labels, test_probs = evaluate(
+        model, test_loader, criterion, device, idx_to_label
+    )
+    page_eval = evaluate_page_level(
+        splits['test'],
+        test_probs,
+        label_to_idx=label_to_idx,
+        idx_to_label=idx_to_label,
+    )
+    page_metrics = page_eval["metrics"]
+
+    # Canonical weights for this experiment (same as loaded best val checkpoint, after test eval)
+    final_model_path = output_dir / 'final_model.pt'
+    torch.save(
+        {
+            'model_state_dict': model.state_dict(),
+            'experiment': args.experiment,
+            'model_id': DINOV3_MODEL_ID,
+            'num_classes': num_classes,
+            'label_to_idx': label_to_idx,
+            'source_val_checkpoint': str(best_ckpt),
+            'val_macro_f1_at_selection': float(best_f1),
+            'test_metrics': test_metrics,
+            'page_test_metrics': page_metrics,
+        },
+        final_model_path,
+    )
+    print(f"\n Final model (for deployment / comparison) saved: {final_model_path}")
+
+    print(f"\n Test accuracy: {test_metrics['accuracy']:.3f}")
+    print(f" Test macro-F1: {test_metrics['macro_f1']:.3f}")
+    print(f" Test weighted-F1: {test_metrics['weighted_f1']:.3f}")
+    print(f" Page accuracy: {page_metrics['accuracy']:.3f} "
+          f"| Page macro-F1: {page_metrics['macro_f1']:.3f} "
+          f"| Pages: {page_metrics['num_pages']}")
+
+    # Classification report
+    target_names = [idx_to_label[i] for i in range(num_classes)]
+    report = classification_report(
+        test_labels, test_preds, target_names=target_names, zero_division=0
+    )
+    print(f"\n{report}")
+
+    # Confusion matrices (patch level and page level)
+    cm = confusion_matrix(test_labels, test_preds)
+    page_cm = confusion_matrix(page_eval["page_true"], page_eval["page_pred"])
+
+    # Save everything
+    results = {
+        'experiment': args.experiment,
+        'model': DINOV3_MODEL_ID,
+        'num_classes': num_classes,
+        'best_val_checkpoint': str(best_ckpt),
+        'val_macro_f1_at_selection': float(best_f1),
+        'final_model_path': str(final_model_path),
+        'test_metrics': test_metrics,
+        'page_test_metrics': page_metrics,
+        'history': all_history,
+        'confusion_matrix': cm.tolist(),
+        'page_confusion_matrix': page_cm.tolist(),
+        'label_to_idx': label_to_idx,
+        'classification_report': report,
+        'page_classification_report': classification_report(
+            page_eval["page_true"], page_eval["page_pred"], target_names=target_names, zero_division=0
+        ),
+    }
+
+    with open(output_dir / 'results.json', 'w') as f:
+        json.dump(results, f, indent=2, default=str)
+
+    # Save confusion matrices as CSV
+    import pandas as pd
+    cm_df = pd.DataFrame(cm, index=target_names, columns=target_names)
+    cm_df.to_csv(output_dir / 'confusion_matrix.csv')
+    page_cm_df = pd.DataFrame(page_cm, index=target_names, columns=target_names)
+    page_cm_df.to_csv(output_dir / 'page_confusion_matrix.csv')
+
+    # Plot confusion matrix
+    try:
+        import matplotlib
+        matplotlib.use('Agg')
+        import matplotlib.pyplot as plt
+        import seaborn as sns
+
+        fig, ax = plt.subplots(figsize=(14, 12))
+        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
+                    xticklabels=target_names, yticklabels=target_names, ax=ax)
+        ax.set_xlabel('Predicted label')
+        ax.set_ylabel('True label')
+        ax.set_title(f'Confusion Matrix — {args.experiment} (macro-F1: {test_metrics["macro_f1"]:.3f})')
+        plt.tight_layout()
+        plt.savefig(output_dir / 'confusion_matrix.png', dpi=150)
+        plt.close()
+        print(f"\n Confusion matrix saved: {output_dir / 'confusion_matrix.png'}")
+    except ImportError:
+        print(" (matplotlib/seaborn not available, skipping plot)")
+
+    # Plot page-level confusion matrix (reuses plt/sns imported above; if that
+    # import failed, the resulting NameError is swallowed by the broad except)
+    try:
+        fig, ax = plt.subplots(figsize=(14, 12))
+        sns.heatmap(page_cm, annot=True, fmt='d', cmap='Greens',
+                    xticklabels=target_names, yticklabels=target_names, ax=ax)
+        ax.set_xlabel('Predicted label')
+        ax.set_ylabel('True label')
+        ax.set_title(
+            f'Page Confusion Matrix — {args.experiment} '
+            f'(macro-F1: {page_metrics["macro_f1"]:.3f})'
+        )
+        plt.tight_layout()
+        plt.savefig(output_dir / 'page_confusion_matrix.png', dpi=150)
+        plt.close()
+        print(f" Page confusion matrix saved: {output_dir / 'page_confusion_matrix.png'}")
+    except Exception:
+        pass
+
+    # Plot training history across all stages
+    try:
+        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+        all_epochs = []
+        all_train_loss = []
+        all_val_f1 = []
+        offset = 0
+
+        for stage_name, stage_history in all_history.items():
+            for entry in stage_history:
+                all_epochs.append(entry['epoch'] + offset)
+                all_train_loss.append(entry['train_loss'])
+                all_val_f1.append(entry['val_macro_f1'])
+            offset += len(stage_history)
+
+        axes[0].plot(all_epochs, all_train_loss, 'b-')
+        axes[0].set_xlabel('Epoch')
+        axes[0].set_ylabel('Train Loss')
+        axes[0].set_title('Training Loss')
+
+        axes[1].plot(all_epochs, all_val_f1, 'g-')
+        axes[1].set_xlabel('Epoch')
+        axes[1].set_ylabel('Macro F1')
+        axes[1].set_title('Validation Macro-F1')
+
+        plt.suptitle(f'{args.experiment} — Progressive Fine-Tuning')
+        plt.tight_layout()
+        plt.savefig(output_dir / 'training_history.png', dpi=150)
+        plt.close()
+        print(f" Training history saved: {output_dir / 'training_history.png'}")
+    except Exception:
+        pass
+
+    print(f"\n{'='*60}")
+    print(f" All results saved to: {output_dir}")
+    print(f"{'='*60}\n")
+
+
+if __name__ == "__main__":
+    main()
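Every checkpoint the script writes (`best_*.pt`, `final_model.pt`) stores `label_to_idx` next to the weights, so predictions can be decoded without re-deriving the class order from the data directory. A minimal stdlib-only sketch of that decoding step (the mapping is truncated to a few of the 18 classes for illustration):

```python
# label_to_idx as saved in the checkpoints / splits.json (truncated here).
label_to_idx = {"dhumri": 0, "peri": 9, "uchen_sugdring": 15, "yigchung": 17}

# Invert it once, then map integer predictions from the classifier head
# back to script names.
idx_to_label = {v: k for k, v in label_to_idx.items()}

preds = [9, 17, 0]  # example argmax outputs
print([idx_to_label[i] for i in preds])  # ['peri', 'yigchung', 'dhumri']
```

In the full script this inverse mapping is what `idx_to_label` holds; loading it from the checkpoint keeps inference consistent with the training-time class indices.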
patches_clahe/checkpoint_page_eval.json ADDED
@@ -0,0 +1,76 @@
+{
+  "experiment": "patches_clahe",
+  "data_dir": "./Data/output/patches_clahe",
+  "exclude_manifest": "./benchmark_page_ids.json",
+  "num_classes": 18,
+  "checkpoint_results": {
+    "best_stage_a_head_only.pt": {
+      "patch_metrics": {
+        "loss": 1.252472641594146,
+        "accuracy": 0.5506856023506367,
+        "macro_f1": 0.47513258745169057,
+        "weighted_f1": 0.5599438485560059
+      },
+      "page_metrics": {
+        "accuracy": 0.5793838862559242,
+        "macro_f1": 0.48727754394876716,
+        "weighted_f1": 0.5878127224538057,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5136114729148367,
+      "epoch_at_save": 18
+    },
+    "best_stage_b_last_2_blocks.pt": {
+      "patch_metrics": {
+        "loss": 1.2555613597156252,
+        "accuracy": 0.5614593535749265,
+        "macro_f1": 0.49372885021461854,
+        "weighted_f1": 0.5659541837806491
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5293513765903048,
+        "weighted_f1": 0.6026293588445379,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5008343237274931,
+      "epoch_at_save": 7
+    },
+    "best_stage_c_last_4_blocks.pt": {
+      "patch_metrics": {
+        "loss": 1.2517013753546324,
+        "accuracy": 0.5631733594515181,
+        "macro_f1": 0.49105727133084387,
+        "weighted_f1": 0.5689654809805166
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5260995491182711,
+        "weighted_f1": 0.6006737611724785,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": 0.5169021957485435,
+      "epoch_at_save": 5
+    },
+    "final_model.pt": {
+      "patch_metrics": {
+        "loss": 1.2517013753546324,
+        "accuracy": 0.5631733594515181,
+        "macro_f1": 0.49105727133084387,
+        "weighted_f1": 0.5689654809805166
+      },
+      "page_metrics": {
+        "accuracy": 0.5995260663507109,
+        "macro_f1": 0.5260995491182711,
+        "weighted_f1": 0.6006737611724785,
+        "num_pages": 844,
+        "num_samples": 4084
+      },
+      "val_macro_f1_at_save": -1.0,
+      "epoch_at_save": null
+    }
+  }
+}
patches_clahe/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 65a951b7cbf3810bba1f3768fa687abb92acf718842c352daabab3701767a45a
  • Pointer size: 131 Bytes
  • Size of remote file: 199 kB
patches_clahe/final_model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3aac9a9de02a2c21fcd88e5788c1a8389ccea84332381b75e168f784cabb3531
+size 86680521
patches_clahe/results.json ADDED
@@ -0,0 +1,725 @@
1
+ {
2
+ "experiment": "patches_clahe",
3
+ "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
4
+ "num_classes": 18,
5
+ "best_val_checkpoint": "results/patches_clahe/best_stage_c_last_4_blocks.pt",
6
+ "val_macro_f1_at_selection": 0.5169021957485435,
7
+ "final_model_path": "results/patches_clahe/final_model.pt",
8
+ "test_metrics": {
9
+ "loss": 1.2517013753546324,
10
+ "accuracy": 0.5631733594515181,
11
+ "macro_f1": 0.49105727133084387,
12
+ "weighted_f1": 0.5689654809805166
13
+ },
14
+ "history": {
15
+ "stage_a": [
16
+ {
17
+ "epoch": 1,
18
+ "train_loss": 1.8593816012204836,
19
+ "train_acc": 0.362362413371382,
20
+ "val_macro_f1": 0.4319974508023712,
21
+ "val_loss": 1.4901656766267863,
22
+ "val_accuracy": 0.4875974486180014
23
+ },
24
+ {
25
+ "epoch": 2,
26
+ "train_loss": 1.5385081531852787,
27
+ "train_acc": 0.4296269873624134,
28
+ "val_macro_f1": 0.4376218690431116,
29
+ "val_loss": 1.3288157734638857,
30
+ "val_accuracy": 0.5206709189699976
31
+ },
32
+ {
33
+ "epoch": 3,
34
+ "train_loss": 1.462932827907147,
35
+ "train_acc": 0.4516918059518956,
36
+ "val_macro_f1": 0.500725408171822,
37
+ "val_loss": 1.2287645069611655,
38
+ "val_accuracy": 0.5598866052445074
39
+ },
40
+ {
41
+ "epoch": 4,
42
+ "train_loss": 1.4217422395349277,
43
+ "train_acc": 0.46045658377496945,
44
+ "val_macro_f1": 0.48274125175979077,
45
+ "val_loss": 1.2352880899777459,
46
+ "val_accuracy": 0.5454760217339948
47
+ },
48
+ {
49
+ "epoch": 5,
50
+ "train_loss": 1.3553591080213439,
51
+ "train_acc": 0.47141255605381166,
52
+ "val_macro_f1": 0.48175273735132706,
53
+ "val_loss": 1.2118230883516996,
54
+ "val_accuracy": 0.5553980628395937
55
+ },
56
+ {
57
+ "epoch": 6,
58
+ "train_loss": 1.318518064190211,
59
+ "train_acc": 0.4869037912759886,
60
+ "val_macro_f1": 0.4869184819503965,
61
+ "val_loss": 1.1853021681836782,
62
+ "val_accuracy": 0.5648476257973069
63
+ },
64
+ {
65
+ "epoch": 7,
66
+ "train_loss": 1.3179870111525327,
67
+ "train_acc": 0.48303098247044435,
68
+ "val_macro_f1": 0.48765648195901945,
69
+ "val_loss": 1.274599445843398,
70
+ "val_accuracy": 0.5426411528466808
71
+ },
72
+ {
73
+ "epoch": 8,
74
+ "train_loss": 1.273892571670495,
75
+ "train_acc": 0.4956685690990624,
76
+ "val_macro_f1": 0.48543837069544804,
77
+ "val_loss": 1.2382452835718523,
78
+ "val_accuracy": 0.5476021733994803
79
+ },
80
+ {
81
+ "epoch": 9,
82
+ "train_loss": 1.2597664558688233,
83
+ "train_acc": 0.49821646962902566,
84
+ "val_macro_f1": 0.4922089846332509,
85
+ "val_loss": 1.2238617013623296,
86
+ "val_accuracy": 0.5695724072761634
87
+ },
88
+ {
89
+ "epoch": 10,
90
+ "train_loss": 1.2202024454007185,
91
+ "train_acc": 0.5098858540562576,
92
+ "val_macro_f1": 0.512737966359169,
93
+ "val_loss": 1.21785488924658,
94
+ "val_accuracy": 0.5695724072761634
95
+ },
96
+ {
97
+ "epoch": 11,
98
+ "train_loss": 1.2321005250743784,
99
+ "train_acc": 0.5068793314309009,
100
+ "val_macro_f1": 0.503358840147939,
101
+ "val_loss": 1.1998826106607619,
102
+ "val_accuracy": 0.5679187337585637
103
+ },
104
+ {
105
+ "epoch": 12,
106
+ "train_loss": 1.192835262295085,
107
+ "train_acc": 0.5158479412963718,
108
+ "val_macro_f1": 0.4992412596002297,
109
+ "val_loss": 1.2032046473507845,
110
+ "val_accuracy": 0.5653201039451925
111
+ },
112
+ {
113
+ "epoch": 13,
114
+ "train_loss": 1.188643198959982,
115
+ "train_acc": 0.5185487158581329,
116
+ "val_macro_f1": 0.4925479807046458,
117
+ "val_loss": 1.236337137727796,
118
+ "val_accuracy": 0.5530356721001654
119
+ },
120
+ {
121
+ "epoch": 14,
122
+ "train_loss": 1.169746606387851,
123
+ "train_acc": 0.5211475743986955,
124
+ "val_macro_f1": 0.49465692373316666,
125
+ "val_loss": 1.2227921510824153,
126
+ "val_accuracy": 0.5653201039451925
127
+ },
128
+ {
129
+ "epoch": 15,
130
+ "train_loss": 1.1576695175400278,
131
+ "train_acc": 0.5165103954341622,
132
+ "val_macro_f1": 0.49926720552181625,
133
+ "val_loss": 1.235735728896709,
134
+ "val_accuracy": 0.5622489959839357
135
+ },
136
+ {
137
+ "epoch": 16,
138
+ "train_loss": 1.1471103678369736,
139
+ "train_acc": 0.5271606196494089,
140
+ "val_macro_f1": 0.5011249988005891,
141
+ "val_loss": 1.22383168439352,
142
+ "val_accuracy": 0.5608315615402788
143
+ },
144
+ {
145
+ "epoch": 17,
146
+ "train_loss": 1.122234392982573,
147
+ "train_acc": 0.5326131267835303,
148
+ "val_macro_f1": 0.49651596391481484,
149
+ "val_loss": 1.2158161995894177,
150
+ "val_accuracy": 0.5579966926529648
151
+ },
152
+ {
153
+ "epoch": 18,
154
+ "train_loss": 1.1230225592693697,
155
+ "train_acc": 0.5311353444761516,
156
+ "val_macro_f1": 0.5136114729148367,
157
+ "val_loss": 1.2290783647580714,
158
+ "val_accuracy": 0.5657925820930781
159
+ },
160
+ {
161
+ "epoch": 19,
162
+ "train_loss": 1.1248822599054111,
163
+ "train_acc": 0.5294537301263759,
164
+ "val_macro_f1": 0.5098656024509526,
165
+ "val_loss": 1.223455751830034,
166
+ "val_accuracy": 0.5650838648712497
167
+ },
168
+ {
169
+ "epoch": 20,
170
+ "train_loss": 1.1545256153204662,
171
+ "train_acc": 0.5255299633102324,
172
+ "val_macro_f1": 0.5110926162945351,
173
+ "val_loss": 1.221439600145735,
174
+ "val_accuracy": 0.5650838648712497
175
+ }
176
+ ],
177
+ "stage_b": [
178
+ {
179
+ "epoch": 1,
180
+ "train_loss": 1.2200194986266018,
181
+ "train_acc": 0.5027517325723604,
182
+ "val_macro_f1": 0.46506942353605496,
183
+ "val_loss": 1.3105420627064268,
184
+ "val_accuracy": 0.5171273328608552
185
+ },
186
+ {
187
+ "epoch": 2,
188
+ "train_loss": 1.1641008270258228,
189
+ "train_acc": 0.5174785976355483,
190
+ "val_macro_f1": 0.488298599121119,
191
+ "val_loss": 1.2541892450947922,
192
+ "val_accuracy": 0.5622489959839357
193
+ },
194
+ {
195
+ "epoch": 3,
196
+ "train_loss": 1.1500101727074719,
197
+ "train_acc": 0.5224215246636771,
198
+ "val_macro_f1": 0.4956599255579888,
199
+ "val_loss": 1.2533673877510343,
200
+ "val_accuracy": 0.5719347980155918
201
+ },
202
+ {
203
+ "epoch": 4,
204
+ "train_loss": 1.109286910652677,
205
+ "train_acc": 0.5318997146351406,
206
+ "val_macro_f1": 0.4869455753172703,
207
+ "val_loss": 1.2878183309783502,
208
+ "val_accuracy": 0.5369714150720529
209
+ },
210
+ {
211
+ "epoch": 5,
212
+ "train_loss": 1.1026983434965403,
213
+ "train_acc": 0.5340909090909091,
214
+ "val_macro_f1": 0.49383977694366493,
215
+ "val_loss": 1.2160760782361595,
216
+ "val_accuracy": 0.563193952279707
217
+ },
218
+ {
219
+ "epoch": 6,
220
+ "train_loss": 1.0861972221976341,
221
+ "train_acc": 0.5346004891969017,
222
+ "val_macro_f1": 0.4928857194617713,
223
+ "val_loss": 1.2282306231631368,
224
+ "val_accuracy": 0.5667375383888495
225
+ },
226
+ {
227
+ "epoch": 7,
228
+ "train_loss": 1.0392189267407717,
229
+ "train_acc": 0.5468304117407257,
230
+ "val_macro_f1": 0.5008343237274931,
231
+ "val_loss": 1.2261127899210578,
232
+ "val_accuracy": 0.5705173635719348
233
+ },
234
+ {
235
+ "epoch": 8,
236
+ "train_loss": 1.0304037320142467,
237
+ "train_acc": 0.5592641663269466,
238
+ "val_macro_f1": 0.49877089719135825,
239
+ "val_loss": 1.233303443623098,
240
+ "val_accuracy": 0.566028821167021
241
+ },
242
+ {
243
+ "epoch": 9,
244
+ "train_loss": 1.009766974281011,
245
+ "train_acc": 0.5577863840195679,
246
+ "val_macro_f1": 0.4996623108709126,
247
+ "val_loss": 1.2278242774405415,
248
+ "val_accuracy": 0.5719347980155918
249
+ },
250
+ {
251
+ "epoch": 10,
252
+ "train_loss": 1.0051777931169448,
253
+ "train_acc": 0.5638503872808805,
254
+ "val_macro_f1": 0.500006795107852,
255
+ "val_loss": 1.2239024188231342,
256
+ "val_accuracy": 0.5705173635719348
257
+ }
258
+ ],
259
+ "stage_c": [
260
+ {
261
+ "epoch": 1,
262
+ "train_loss": 1.0662864717813205,
263
+ "train_acc": 0.5465756216877293,
264
+ "val_macro_f1": 0.4984768337725192,
265
+ "val_loss": 1.2097023466138617,
266
+ "val_accuracy": 0.5759508622726199
267
+ },
268
+ {
269
+ "epoch": 2,
270
+ "train_loss": 1.028777121720001,
271
+ "train_acc": 0.5545760293518142,
272
+ "val_macro_f1": 0.4938877916675499,
273
+ "val_loss": 1.2337657653856666,
274
+ "val_accuracy": 0.5624852350578786
275
+ },
276
+ {
277
+ "epoch": 3,
278
+ "train_loss": 1.0153135317858024,
279
+ "train_acc": 0.5535059111292295,
280
+ "val_macro_f1": 0.49907722687144723,
281
+ "val_loss": 1.2192433534331748,
282
+ "val_accuracy": 0.5742971887550201
283
+ },
284
+ {
285
+ "epoch": 4,
286
+ "train_loss": 1.0038959708351822,
287
+ "train_acc": 0.5601304525071341,
288
+ "val_macro_f1": 0.5059140741600023,
289
+ "val_loss": 1.2162246975293984,
290
+ "val_accuracy": 0.5790219702338767
291
+ },
292
+ {
293
+ "epoch": 5,
294
+ "train_loss": 0.9943306450935174,
295
+ "train_acc": 0.5619649408887077,
296
+ "val_macro_f1": 0.5169021957485435,
297
+ "val_loss": 1.2105160319626318,
298
+ "val_accuracy": 0.5794944483817623
299
+ },
300
+ {
301
+ "epoch": 6,
302
+ "train_loss": 0.9618599005149814,
303
+ "train_acc": 0.570067264573991,
304
+ "val_macro_f1": 0.504840497803461,
305
+ "val_loss": 1.228955523002679,
306
+ "val_accuracy": 0.5752421450507914
307
+ },
308
+ {
309
+ "epoch": 7,
310
+ "train_loss": 0.9511148705562182,
311
+ "train_acc": 0.5765389319200979,
312
+ "val_macro_f1": 0.4990006211421147,
313
+ "val_loss": 1.233863600319176,
314
+ "val_accuracy": 0.568863690054335
315
+ },
316
+ {
317
+ "epoch": 8,
318
+ "train_loss": 0.9406011274917437,
319
+ "train_acc": 0.5768446799836935,
320
+ "val_macro_f1": 0.5112059339127408,
321
+ "val_loss": 1.2131912938006506,
322
+ "val_accuracy": 0.5811481218993622
323
+ },
324
+ {
325
+ "epoch": 9,
326
+ "train_loss": 0.9396456855683729,
327
+ "train_acc": 0.5827558092132084,
328
+ "val_macro_f1": 0.5095793102793661,
329
+ "val_loss": 1.2089267937183124,
330
+ "val_accuracy": 0.581384360973305
331
+ },
332
+ {
333
+ "epoch": 10,
334
+ "train_loss": 0.9486989188063062,
335
+ "train_acc": 0.5774052181002853,
336
+ "val_macro_f1": 0.5112753932863323,
337
+ "val_loss": 1.2089088627612696,
338
+ "val_accuracy": 0.581384360973305
339
+ }
340
+ ]
341
+ },
342
+ "confusion_matrix": [
343
+ [
344
+ 55,
345
+ 0,
346
+ 0,
347
+ 1,
348
+ 0,
349
+ 17,
350
+ 0,
351
+ 0,
352
+ 0,
353
+ 2,
354
+ 0,
355
+ 0,
356
+ 6,
357
+ 0,
358
+ 0,
359
+ 0,
360
+ 0,
361
+ 0
362
+ ],
363
+ [
364
+ 3,
365
+ 65,
366
+ 3,
367
+ 4,
368
+ 1,
369
+ 0,
370
+ 0,
371
+ 2,
372
+ 0,
373
+ 0,
374
+ 0,
375
+ 0,
376
+ 8,
377
+ 0,
378
+ 4,
379
+ 0,
380
+ 0,
381
+ 0
382
+ ],
383
+ [
384
+ 0,
385
+ 0,
386
+ 24,
387
+ 2,
388
+ 3,
389
+ 0,
390
+ 3,
391
+ 4,
392
+ 0,
393
+ 15,
394
+ 2,
395
+ 2,
396
+ 13,
397
+ 6,
398
+ 5,
399
+ 0,
400
+ 0,
401
+ 5
402
+ ],
403
+ [
404
+ 0,
405
+ 4,
406
+ 2,
407
+ 42,
408
+ 7,
409
+ 19,
410
+ 1,
411
+ 1,
412
+ 0,
413
+ 2,
414
+ 1,
415
+ 0,
416
+ 0,
417
+ 2,
418
+ 0,
419
+ 0,
420
+ 0,
421
+ 0
422
+ ],
423
+ [
424
+ 0,
425
+ 2,
426
+ 0,
427
+ 30,
428
+ 42,
429
+ 0,
430
+ 0,
431
+ 4,
432
+ 0,
433
+ 2,
434
+ 0,
435
+ 0,
436
+ 0,
437
+ 1,
438
+ 0,
439
+ 0,
440
+ 0,
441
+       0
+     ],
+     [32, 0, 7, 15, 4, 76, 0, 4, 1, 4, 2, 0, 8, 0, 0, 0, 0, 0],
+     [0, 3, 0, 0, 0, 0, 40, 1, 0, 0, 0, 14, 0, 0, 14, 0, 0, 1],
+     [0, 5, 5, 5, 3, 4, 0, 67, 1, 13, 20, 3, 12, 4, 6, 0, 2, 6],
+     [1, 0, 0, 1, 0, 2, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+     [6, 0, 8, 3, 2, 2, 0, 7, 0, 308, 85, 0, 55, 7, 1, 3, 1, 3],
+     [9, 11, 21, 1, 2, 7, 0, 62, 0, 235, 595, 0, 147, 9, 1, 0, 0, 4],
+     [0, 2, 0, 0, 1, 0, 2, 0, 0, 3, 0, 9, 0, 9, 6, 0, 0, 0],
+     [13, 3, 58, 6, 2, 5, 0, 28, 0, 133, 74, 7, 235, 8, 3, 1, 0, 0],
+     [0, 2, 3, 2, 0, 0, 0, 1, 0, 5, 0, 16, 6, 18, 0, 0, 0, 3],
+     [0, 2, 6, 0, 3, 0, 43, 5, 0, 0, 0, 9, 0, 2, 54, 0, 0, 8],
+     [0, 3, 0, 2, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 513, 28, 0],
+     [0, 1, 1, 2, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 75, 89, 0],
+     [0, 7, 16, 10, 0, 3, 8, 13, 1, 6, 0, 8, 4, 3, 17, 0, 0, 15]
+   ],
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "classification_report": "                precision    recall  f1-score   support\n\n        dhumri       0.46      0.68      0.55        81\n     difficult       0.59      0.72      0.65        90\n      drathung       0.16      0.29      0.20        84\n      drudring       0.33      0.52      0.41        81\n       druring       0.59      0.52      0.55        81\n      druthung       0.56      0.50      0.53       153\n       khyuyig       0.41      0.55      0.47        73\n multi_scripts       0.32      0.43      0.37       156\n   non_tibetan       0.93      0.93      0.93        57\n          peri       0.42      0.63      0.50       491\n        petsuk       0.76      0.54      0.63      1104\n       trinyig       0.13      0.28      0.18        32\n      tsegdrig       0.48      0.41      0.44       576\n     tsugchung       0.26      0.32      0.29        56\n     tsumachug       0.49      0.41      0.44       132\nuchen_sugdring       0.87      0.93      0.90       551\nuchen_sugthung       0.74      0.51      0.60       175\n      yigchung       0.33      0.14      0.19       111\n\n      accuracy                           0.56      4084\n     macro avg       0.49      0.52      0.49      4084\n  weighted avg       0.60      0.56      0.57      4084\n"
+ }
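Each `results.json` stores the raw `confusion_matrix` alongside `label_to_idx`, so per-class precision and recall can be recomputed offline without rerunning evaluation. A minimal sketch of that computation follows; the tiny 3×3 matrix is illustrative only, whereas in practice you would load the JSON file and read its `confusion_matrix` and `label_to_idx` fields:

```python
# Recompute per-class precision/recall from a results.json-style payload.
# The 3x3 matrix below is illustrative, not taken from these results.
results = {
    "label_to_idx": {"a": 0, "b": 1, "c": 2},
    "confusion_matrix": [  # rows = true class, columns = predicted class
        [8, 1, 1],
        [2, 6, 2],
        [0, 1, 9],
    ],
}

labels = sorted(results["label_to_idx"], key=results["label_to_idx"].get)
cm = results["confusion_matrix"]

for i, name in enumerate(labels):
    tp = cm[i][i]
    support = sum(cm[i])                   # all true instances of class i
    predicted = sum(row[i] for row in cm)  # all predictions of class i
    recall = tp / support if support else 0.0
    precision = tp / predicted if predicted else 0.0
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

Summing a row gives the class support reported in `classification_report`; summing a column gives the number of times that class was predicted.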
patches_clahe/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 363, "difficult": 410, "drathung": 458, "drudring": 451, "druring": 421, "druthung": 710, "khyuyig": 324, "multi_scripts": 824, "non_tibetan": 291, "peri": 2348, "petsuk": 5086, "trinyig": 165, "tsegdrig": 2766, "tsugchung": 293, "tsumachug": 604, "uchen_sugdring": 2705, "uchen_sugthung": 807, "yigchung": 598},
+     "val": {"dhumri": 74, "difficult": 72, "drathung": 95, "drudring": 97, "druring": 89, "druthung": 169, "khyuyig": 70, "multi_scripts": 182, "non_tibetan": 52, "peri": 514, "petsuk": 1091, "trinyig": 33, "tsegdrig": 613, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 601, "uchen_sugthung": 180, "yigchung": 113},
+     "test": {"dhumri": 81, "difficult": 90, "drathung": 84, "drudring": 81, "druring": 81, "druthung": 153, "khyuyig": 73, "multi_scripts": 156, "non_tibetan": 57, "peri": 491, "petsuk": 1104, "trinyig": 32, "tsegdrig": 576, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 551, "uchen_sugthung": 175, "yigchung": 111}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
patches_color/checkpoint_page_eval.json ADDED
@@ -0,0 +1,76 @@
+ {
+   "experiment": "patches_color",
+   "data_dir": "./Data/output/patches_color",
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "num_classes": 18,
+   "checkpoint_results": {
+     "best_stage_a_head_only.pt": {
+       "patch_metrics": {"loss": 1.2655372293292018, "accuracy": 0.548971596474045, "macro_f1": 0.48338756435970365, "weighted_f1": 0.5595663273001261},
+       "page_metrics": {"accuracy": 0.5817535545023697, "macro_f1": 0.5042916757569069, "weighted_f1": 0.5865342057784698, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5201207198529141,
+       "epoch_at_save": 20
+     },
+     "best_stage_b_last_2_blocks.pt": {
+       "patch_metrics": {"loss": 1.2968507967196061, "accuracy": 0.5572967678746327, "macro_f1": 0.48656564769193644, "weighted_f1": 0.563328932982191},
+       "page_metrics": {"accuracy": 0.5853080568720379, "macro_f1": 0.4969784132736875, "weighted_f1": 0.5889015907282624, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5246287483695703,
+       "epoch_at_save": 7
+     },
+     "best_stage_c_last_4_blocks.pt": {
+       "patch_metrics": {"loss": 1.2944221436160084, "accuracy": 0.5604799216454457, "macro_f1": 0.48988450083500523, "weighted_f1": 0.5677717007365548},
+       "page_metrics": {"accuracy": 0.5924170616113744, "macro_f1": 0.5017240427906837, "weighted_f1": 0.5960050045768394, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": 0.5268057868156721,
+       "epoch_at_save": 10
+     },
+     "final_model.pt": {
+       "patch_metrics": {"loss": 1.2944221436160084, "accuracy": 0.5604799216454457, "macro_f1": 0.48988450083500523, "weighted_f1": 0.5677717007365548},
+       "page_metrics": {"accuracy": 0.5924170616113744, "macro_f1": 0.5017240427906837, "weighted_f1": 0.5960050045768394, "num_pages": 844, "num_samples": 4084},
+       "val_macro_f1_at_save": -1.0,
+       "epoch_at_save": null
+     }
+   }
+ }
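`checkpoint_page_eval.json` reports both `patch_metrics` and `page_metrics` (844 pages covering 4084 patch samples), which implies some patch-to-page aggregation step. The exact rule used by `finetune_dinov3.py` is not shown in this file, so the sketch below is an assumption: majority voting over the patch predictions of each page (averaging patch logits would be an equally plausible alternative). The `(page_id, predicted_label)` pairs are illustrative:

```python
from collections import Counter, defaultdict

# Assumed patch -> page aggregation by majority vote; the rule actually used
# by the training script may differ (e.g. averaging patch logits instead).
patch_preds = [
    ("page_001", "petsuk"), ("page_001", "petsuk"), ("page_001", "peri"),
    ("page_002", "uchen_sugdring"), ("page_002", "uchen_sugdring"),
]

by_page = defaultdict(list)
for page_id, label in patch_preds:
    by_page[page_id].append(label)

# One prediction per page: the most common patch-level label.
page_preds = {pid: Counter(lbls).most_common(1)[0][0] for pid, lbls in by_page.items()}
```

Page-level accuracy and F1 would then be computed against one ground-truth label per page, which is why `page_metrics` tends to run slightly above `patch_metrics` here.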
patches_color/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: d3defea66443207c1bc4c69dee4fd9c07027422d84b58076afd2d492b83eb3f1
  • Pointer size: 131 Bytes
  • Size of remote file: 202 kB
patches_color/final_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65b3c9f81177e34c74aad910c545a0b97f097dbacd84da0e0c5af0d29eddd54a
+ size 86680521
patches_color/results.json ADDED
@@ -0,0 +1,725 @@
+ {
+   "experiment": "patches_color",
+   "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
+   "num_classes": 18,
+   "best_val_checkpoint": "results/patches_color/best_stage_c_last_4_blocks.pt",
+   "val_macro_f1_at_selection": 0.5268057868156721,
+   "final_model_path": "results/patches_color/final_model.pt",
+   "test_metrics": {
+     "loss": 1.2944221436160084,
+     "accuracy": 0.5604799216454457,
+     "macro_f1": 0.48988450083500523,
+     "weighted_f1": 0.5677717007365548
+   },
+   "history": {
+     "stage_a": [
+       {"epoch": 1, "train_loss": 1.8228148812427358, "train_acc": 0.37194251936404404, "val_macro_f1": 0.4478217149694628, "val_loss": 1.4409239574326935, "val_accuracy": 0.500590597684857},
+       {"epoch": 2, "train_loss": 1.4974706918426401, "train_acc": 0.4455768446799837, "val_macro_f1": 0.45642959802097366, "val_loss": 1.311343546190246, "val_accuracy": 0.531301677297425},
+       {"epoch": 3, "train_loss": 1.4202212384414827, "train_acc": 0.4585201793721973, "val_macro_f1": 0.5198020336380464, "val_loss": 1.1873418371530393, "val_accuracy": 0.5714623198677061},
+       {"epoch": 4, "train_loss": 1.3796070234366353, "train_acc": 0.47457195271096614, "val_macro_f1": 0.48635628120132335, "val_loss": 1.2194681716935447, "val_accuracy": 0.5398062839593669},
+       {"epoch": 5, "train_loss": 1.3158884044574126, "train_acc": 0.48797390949857317, "val_macro_f1": 0.485095009606241, "val_loss": 1.1974138276123025, "val_accuracy": 0.5613040396881644},
+       {"epoch": 6, "train_loss": 1.2685181100272278, "train_acc": 0.5026498165511618, "val_macro_f1": 0.5151064930461298, "val_loss": 1.181795822871427, "val_accuracy": 0.5773682967162769},
+       {"epoch": 7, "train_loss": 1.264474732896235, "train_acc": 0.4974011414594374, "val_macro_f1": 0.49794473606184897, "val_loss": 1.2368834672133577, "val_accuracy": 0.5499645641389086},
+       {"epoch": 8, "train_loss": 1.2266237828569222, "train_acc": 0.5090705258866693, "val_macro_f1": 0.5073845743840514, "val_loss": 1.1930924508656393, "val_accuracy": 0.5622489959839357},
+       {"epoch": 9, "train_loss": 1.2126428141090464, "train_acc": 0.5157969832857725, "val_macro_f1": 0.5029403569611346, "val_loss": 1.199680348883319, "val_accuracy": 0.5714623198677061},
+       {"epoch": 10, "train_loss": 1.1708803988941534, "train_acc": 0.5228801467590706, "val_macro_f1": 0.5074790060262432, "val_loss": 1.2058235416010623, "val_accuracy": 0.5620127569099929},
+       {"epoch": 11, "train_loss": 1.174615762037401, "train_acc": 0.5223196086424786, "val_macro_f1": 0.5064378041379447, "val_loss": 1.179450622276827, "val_accuracy": 0.5653201039451925},
+       {"epoch": 12, "train_loss": 1.1377201861281614, "train_acc": 0.5295046881369752, "val_macro_f1": 0.501551417883034, "val_loss": 1.187849443206093, "val_accuracy": 0.568863690054335},
+       {"epoch": 13, "train_loss": 1.1421819735049619, "train_acc": 0.5347024052181003, "val_macro_f1": 0.5125219961449731, "val_loss": 1.208868742721345, "val_accuracy": 0.5613040396881644},
+       {"epoch": 14, "train_loss": 1.1138945578848445, "train_acc": 0.5381165919282511, "val_macro_f1": 0.5131290447373811, "val_loss": 1.188052623450207, "val_accuracy": 0.5738247106071345},
+       {"epoch": 15, "train_loss": 1.0984022547845884, "train_acc": 0.5382694659600489, "val_macro_f1": 0.5108666463642549, "val_loss": 1.2041859181424897, "val_accuracy": 0.5679187337585637},
+       {"epoch": 16, "train_loss": 1.0941936582193246, "train_acc": 0.5438748471259682, "val_macro_f1": 0.5103848985029631, "val_loss": 1.1977766572746182, "val_accuracy": 0.5672100165367352},
+       {"epoch": 17, "train_loss": 1.0736914152424528, "train_acc": 0.5451487973909499, "val_macro_f1": 0.5188176724642701, "val_loss": 1.189426226291903, "val_accuracy": 0.5695724072761634},
+       {"epoch": 18, "train_loss": 1.080425682599526, "train_acc": 0.5493273542600897, "val_macro_f1": 0.5193581019534623, "val_loss": 1.1903373014411547, "val_accuracy": 0.5716985589416489},
+       {"epoch": 19, "train_loss": 1.0716409188020584, "train_acc": 0.5456583774969426, "val_macro_f1": 0.5196107585917038, "val_loss": 1.1931781820898328, "val_accuracy": 0.5705173635719348},
+       {"epoch": 20, "train_loss": 1.092901507543633, "train_acc": 0.5422441907867916, "val_macro_f1": 0.5201207198529141, "val_loss": 1.1916698552940912, "val_accuracy": 0.5709898417198205}
+     ],
+     "stage_b": [
+       {"epoch": 1, "train_loss": 1.1536900823183367, "train_acc": 0.520688952303302, "val_macro_f1": 0.49201257424371936, "val_loss": 1.2939789023856703, "val_accuracy": 0.5263406567446256},
+       {"epoch": 2, "train_loss": 1.106891256276802, "train_acc": 0.531593966571545, "val_macro_f1": 0.5033889357742809, "val_loss": 1.2282823414699973, "val_accuracy": 0.5716985589416489},
+       {"epoch": 3, "train_loss": 1.0965523051088117, "train_acc": 0.536740725642071, "val_macro_f1": 0.5085179283247842, "val_loss": 1.2278816308227767, "val_accuracy": 0.5695724072761634},
+       {"epoch": 4, "train_loss": 1.0530349347623473, "train_acc": 0.5524357929066449, "val_macro_f1": 0.49767342372352924, "val_loss": 1.2748388547732803, "val_accuracy": 0.5386250885896527},
+       {"epoch": 5, "train_loss": 1.0473334150706997, "train_acc": 0.555697105584998, "val_macro_f1": 0.5109896618724019, "val_loss": 1.204903576978056, "val_accuracy": 0.5705173635719348},
+       {"epoch": 6, "train_loss": 1.0396876881378212, "train_acc": 0.558754586220954, "val_macro_f1": 0.5100822540510055, "val_loss": 1.2285107204947112, "val_accuracy": 0.5615402787621072},
+       {"epoch": 7, "train_loss": 0.995308576143376, "train_acc": 0.5689461883408071, "val_macro_f1": 0.5246287483695703, "val_loss": 1.2055314002915454, "val_accuracy": 0.5783132530120482},
+       {"epoch": 8, "train_loss": 0.9806407083188861, "train_acc": 0.5772523440684876, "val_macro_f1": 0.5177401992468048, "val_loss": 1.2124249308257626, "val_accuracy": 0.5719347980155918},
+       {"epoch": 9, "train_loss": 0.9580581295116065, "train_acc": 0.5807174887892377, "val_macro_f1": 0.5221761914804327, "val_loss": 1.202580178335147, "val_accuracy": 0.5797306874557052},
+       {"epoch": 10, "train_loss": 0.9406894806497592, "train_acc": 0.5895332246229107, "val_macro_f1": 0.5170603831264983, "val_loss": 1.198815422158573, "val_accuracy": 0.5783132530120482}
+     ],
+     "stage_c": [
+       {"epoch": 1, "train_loss": 1.009486005250843, "train_acc": 0.5692519364044027, "val_macro_f1": 0.5106719108224338, "val_loss": 1.1896873432258441, "val_accuracy": 0.5797306874557052},
+       {"epoch": 2, "train_loss": 0.9734869377009489, "train_acc": 0.5805136567468406, "val_macro_f1": 0.501623256775662, "val_loss": 1.214058896983005, "val_accuracy": 0.5620127569099929},
+       {"epoch": 3, "train_loss": 0.9619662467431302, "train_acc": 0.5770994700366898, "val_macro_f1": 0.5229433422000689, "val_loss": 1.2075352982749703, "val_accuracy": 0.5780770139381054},
+       {"epoch": 4, "train_loss": 0.9472926469224353, "train_acc": 0.5816856909906237, "val_macro_f1": 0.5203069133009239, "val_loss": 1.19137683074447, "val_accuracy": 0.5835105126387905},
+       {"epoch": 5, "train_loss": 0.935545642668502, "train_acc": 0.5829596412556054, "val_macro_f1": 0.5257938381975142, "val_loss": 1.184187990306801, "val_accuracy": 0.5832742735648476},
+       {"epoch": 6, "train_loss": 0.9094976569942392, "train_acc": 0.5960558499796168, "val_macro_f1": 0.5200620740355609, "val_loss": 1.2047344187387663, "val_accuracy": 0.5830380344909047},
+       {"epoch": 7, "train_loss": 0.8874516203799055, "train_acc": 0.6018141051773339, "val_macro_f1": 0.525057744430925, "val_loss": 1.2007991145867674, "val_accuracy": 0.5818568391211907},
+       {"epoch": 8, "train_loss": 0.8860346450156403, "train_acc": 0.6054830819404811, "val_macro_f1": 0.523414227808246, "val_loss": 1.1990280199972854, "val_accuracy": 0.5844554689345618},
+       {"epoch": 9, "train_loss": 0.8849416517793428, "train_acc": 0.6044639217284957, "val_macro_f1": 0.5248536522877667, "val_loss": 1.1869062702771045, "val_accuracy": 0.5882352941176471},
+       {"epoch": 10, "train_loss": 0.8992554767249411, "train_acc": 0.6046167957602935, "val_macro_f1": 0.5268057868156721, "val_loss": 1.1879879839941876, "val_accuracy": 0.5891802504134184}
+     ]
+   },
+   "confusion_matrix": [
+     [51, 0, 0, 0, 1, 21, 0, 0, 0, 1, 0, 0, 7, 0, 0, 0, 0, 0],
+     [0, 58, 3, 2, 2, 0, 1, 0, 0, 1, 0, 1, 6, 0, 4, 4, 4, 4],
+     [0, 1, 25, 0, 4, 0, 7, 5, 0, 7, 3, 0, 16, 1, 6, 0, 0, 9],
+     [0, 4, 2, 36, 11, 18, 1, 1, 0, 3, 1, 0, 0, 3, 0, 1, 0, 0],
+     [0, 2, 0, 22, 48, 0, 0, 5, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3],
+     [20, 0, 6, 15, 6, 83, 0, 4, 0, 3, 0, 0, 9, 5, 0, 1, 0, 1],
+     [0, 1, 0, 0, 0, 0, 43, 0, 0, 0, 0, 5, 0, 0, 18, 0, 0, 6],
+     [0, 5, 1, 6, 1, 1, 0, 66, 0, 13, 14, 2, 17, 10, 5, 1, 4, 10],
+     [1, 0, 0, 1, 0, 2, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+     [4, 1, 9, 3, 0, 5, 0, 9, 0, 308, 71, 3, 64, 6, 3, 3, 0, 2],
+     [3, 9, 17, 1, 1, 8, 0, 64, 0, 209, 607, 0, 168, 6, 1, 0, 0, 10],
+     [0, 2, 0, 0, 1, 0, 2, 0, 0, 1, 0, 4, 1, 9, 11, 0, 0, 1],
+     [9, 5, 50, 6, 3, 4, 0, 22, 0, 123, 80, 11, 243, 6, 3, 2, 0, 9],
+     [0, 1, 3, 1, 1, 0, 0, 0, 0, 2, 0, 13, 5, 24, 0, 0, 0, 6],
+     [0, 2, 1, 0, 3, 0, 49, 4, 0, 0, 0, 4, 0, 1, 47, 0, 0, 21],
+     [1, 2, 0, 1, 0, 0, 0, 2, 2, 0, 0, 0, 1, 0, 0, 479, 62, 1],
+     [0, 0, 0, 2, 1, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 77, 87, 0],
+     [0, 14, 4, 2, 0, 2, 6, 13, 1, 1, 1, 7, 3, 9, 20, 1, 0, 27]
+   ],
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "classification_report": "                precision    recall  f1-score   support\n\n        dhumri       0.57      0.63      0.60        81\n     difficult       0.54      0.64      0.59        90\n      drathung       0.21      0.30      0.24        84\n      drudring       0.37      0.44      0.40        81\n       druring       0.58      0.59      0.59        81\n      druthung       0.58      0.54      0.56       153\n       khyuyig       0.39      0.59      0.47        73\n multi_scripts       0.33      0.42      0.37       156\n   non_tibetan       0.95      0.93      0.94        57\n          peri       0.46      0.63      0.53       491\n        petsuk       0.78      0.55      0.65      1104\n       trinyig       0.08      0.12      0.10        32\n      tsegdrig       0.45      0.42      0.44       576\n     tsugchung       0.30      0.43      0.35        56\n     tsumachug       0.40      0.36      0.38       132\nuchen_sugdring       0.84      0.87      0.86       551\nuchen_sugthung       0.55      0.50      0.52       175\n      yigchung       0.25      0.24      0.24       111\n\n      accuracy                           0.56      4084\n     macro avg       0.48      0.51      0.49      4084\n  weighted avg       0.59      0.56      0.57      4084\n"
+ }
patches_color/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 363, "difficult": 410, "drathung": 458, "drudring": 451, "druring": 421, "druthung": 710, "khyuyig": 324, "multi_scripts": 824, "non_tibetan": 291, "peri": 2348, "petsuk": 5086, "trinyig": 165, "tsegdrig": 2766, "tsugchung": 293, "tsumachug": 604, "uchen_sugdring": 2705, "uchen_sugthung": 807, "yigchung": 598},
+     "val": {"dhumri": 74, "difficult": 72, "drathung": 95, "drudring": 97, "druring": 89, "druthung": 169, "khyuyig": 70, "multi_scripts": 182, "non_tibetan": 52, "peri": 514, "petsuk": 1091, "trinyig": 33, "tsegdrig": 613, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 601, "uchen_sugthung": 180, "yigchung": 113},
+     "test": {"dhumri": 81, "difficult": 90, "drathung": 84, "drudring": 81, "druring": 81, "druthung": 153, "khyuyig": 73, "multi_scripts": 156, "non_tibetan": 57, "peri": 491, "petsuk": 1104, "trinyig": 32, "tsegdrig": 576, "tsugchung": 56, "tsumachug": 132, "uchen_sugdring": 551, "uchen_sugthung": 175, "yigchung": 111}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
whole_page/confusion_matrix.csv ADDED
@@ -0,0 +1,19 @@
+ ,dhumri,difficult,drathung,drudring,druring,druthung,khyuyig,multi_scripts,non_tibetan,peri,petsuk,trinyig,tsegdrig,tsugchung,tsumachug,uchen_sugdring,uchen_sugthung,yigchung
+ dhumri,7,0,0,0,0,5,0,0,0,0,0,0,2,0,0,0,0,0
+ difficult,0,22,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0
+ drathung,0,0,5,0,1,0,1,0,0,1,1,0,3,1,3,0,0,3
+ drudring,0,0,0,8,4,4,2,0,0,1,0,0,0,0,0,0,0,0
+ druring,0,0,0,1,16,0,0,0,0,0,0,0,0,0,0,0,0,0
+ druthung,6,0,3,3,0,16,0,1,0,2,0,0,0,0,0,0,0,0
+ khyuyig,0,2,0,0,0,0,11,0,0,0,0,0,0,0,3,0,0,0
+ multi_scripts,0,0,0,2,2,0,0,16,0,2,2,1,1,2,3,1,3,0
+ non_tibetan,0,0,0,1,0,0,0,0,27,0,0,0,0,0,0,0,0,0
+ peri,0,0,5,0,0,0,0,1,0,56,15,0,11,2,1,0,0,1
+ petsuk,0,2,3,2,0,1,0,19,0,44,105,0,30,1,1,0,0,0
+ trinyig,0,0,0,0,0,0,0,0,0,1,0,2,0,1,2,0,0,0
+ tsegdrig,1,0,16,2,0,1,0,9,0,20,13,2,43,2,0,0,0,3
+ tsugchung,0,0,0,0,0,0,0,0,0,0,0,4,0,6,1,0,0,0
+ tsumachug,0,0,0,0,1,0,9,1,0,0,0,3,0,0,8,0,0,4
+ uchen_sugdring,0,2,0,0,0,0,0,0,1,0,0,0,0,1,0,107,14,0
+ uchen_sugthung,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,13,21,0
+ yigchung,0,2,0,0,0,0,1,2,0,0,0,3,0,6,4,0,0,6
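`whole_page/confusion_matrix.csv` stores true labels as rows and predicted labels as columns, so the largest off-diagonal cells identify the dominant confusion pairs (here, for example, the `petsuk` row shows sizeable spill into `peri` and `tsegdrig`). A short sketch of ranking those cells; the 3-class CSV text is an excerpt of the matrix above (the `peri`/`petsuk`/`tsegdrig` sub-block), not the full file:

```python
import csv
import io

# Rank the largest off-diagonal cells of a confusion_matrix.csv-style file.
# This 3-class excerpt is illustrative; pass the real file's contents instead.
csv_text = """,peri,petsuk,tsegdrig
peri,56,15,11
petsuk,44,105,30
tsegdrig,20,13,43
"""

rows = list(csv.reader(io.StringIO(csv_text)))
labels = rows[0][1:]
confusions = []
for r, row in enumerate(rows[1:]):
    true_label = row[0]
    for c, count in enumerate(row[1:]):
        if c != r and int(count) > 0:  # skip the diagonal (correct predictions)
            confusions.append((int(count), true_label, labels[c]))

confusions.sort(reverse=True)
for count, true_label, pred_label in confusions[:3]:
    print(f"{true_label} -> {pred_label}: {count}")
```

On the excerpt this surfaces `petsuk -> peri` as the single largest confusion, matching the per-class precision drop visible in the classification reports.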
whole_page/confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 8685c8b1287e0816eea11ed8a5b2959be657e296af1201ef444c51f9fbc1f7dd
  • Pointer size: 131 Bytes
  • Size of remote file: 169 kB
whole_page/final_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:987f2d4139415491e3323eb8a6a622365d1b336897dfb07383d35146a2afb38f
+ size 86680521
whole_page/results.json ADDED
@@ -0,0 +1,725 @@
+ {
+   "experiment": "whole_page",
+   "model": "facebook/dinov3-vits16-pretrain-lvd1689m",
+   "num_classes": 18,
+   "best_val_checkpoint": "results/whole_page/best_stage_b_last_2_blocks.pt",
+   "val_macro_f1_at_selection": 0.5526159517101854,
+   "final_model_path": "results/whole_page/final_model.pt",
+   "test_metrics": {
+     "loss": 1.191628887755046,
+     "accuracy": 0.5710900473933649,
+     "macro_f1": 0.5123725993698094,
+     "weighted_f1": 0.5781527270486508
+   },
+   "history": {
+     "stage_a": [
+       {"epoch": 1, "train_loss": 2.20866057321474, "train_acc": 0.31456456456456455, "val_macro_f1": 0.39456076575655513, "val_loss": 1.4671607864976495, "val_accuracy": 0.48696682464454977},
+       {"epoch": 2, "train_loss": 1.706618641589855, "train_acc": 0.4216716716716717, "val_macro_f1": 0.43415460414512386, "val_loss": 1.276340951286786, "val_accuracy": 0.5651658767772512},
+       {"epoch": 3, "train_loss": 1.5752889993073824, "train_acc": 0.4471971971971972, "val_macro_f1": 0.44861434264049055, "val_loss": 1.2922945689251073, "val_accuracy": 0.5509478672985783},
+       {"epoch": 4, "train_loss": 1.51657935490718, "train_acc": 0.45595595595595595, "val_macro_f1": 0.477222997128718, "val_loss": 1.2938398139736664, "val_accuracy": 0.5379146919431279},
+       {"epoch": 5, "train_loss": 1.474987536937267, "train_acc": 0.4637137137137137, "val_macro_f1": 0.4801019078317697, "val_loss": 1.2509917919104698, "val_accuracy": 0.5604265402843602},
+       {"epoch": 6, "train_loss": 1.3946579505015422, "train_acc": 0.4964964964964965, "val_macro_f1": 0.4615615264486478, "val_loss": 1.2553648507990542, "val_accuracy": 0.5545023696682464},
+       {"epoch": 7, "train_loss": 1.4076038572761986, "train_acc": 0.47597597597597596, "val_macro_f1": 0.5026596529501132, "val_loss": 1.2138842044848401, "val_accuracy": 0.5746445497630331},
+       {"epoch": 8, "train_loss": 1.367771333283013, "train_acc": 0.49124124124124124, "val_macro_f1": 0.502396524188393, "val_loss": 1.1715130546081687, "val_accuracy": 0.590047393364929},
+       {"epoch": 9, "train_loss": 1.3250896964106593, "train_acc": 0.49924924924924924, "val_macro_f1": 0.4870819897689354, "val_loss": 1.2463707618803774, "val_accuracy": 0.5639810426540285},
+       {"epoch": 10, "train_loss": 1.2848107659661614, "train_acc": 0.5075075075075075, "val_macro_f1": 0.5136245272249241, "val_loss": 1.175841974421135, "val_accuracy": 0.5829383886255924},
+       {"epoch": 11, "train_loss": 1.247214116491713, "train_acc": 0.5185185185185185, "val_macro_f1": 0.5022088771755583, "val_loss": 1.210867343920667, "val_accuracy": 0.5758293838862559},
+       {"epoch": 12, "train_loss": 1.2895851612568379, "train_acc": 0.5092592592592593, "val_macro_f1": 0.5025588475579801, "val_loss": 1.1879199217846044, "val_accuracy": 0.5770142180094787},
+       {"epoch": 13, "train_loss": 1.2661696529245234, "train_acc": 0.5125125125125125, "val_macro_f1": 0.49546766292382316, "val_loss": 1.173302908644292, "val_accuracy": 0.5699052132701422},
+       {"epoch": 14, "train_loss": 1.2424312025696427, "train_acc": 0.5175175175175175, "val_macro_f1": 0.5122207152797997, "val_loss": 1.1634096268793983, "val_accuracy": 0.5912322274881516},
+       {"epoch": 15, "train_loss": 1.2361648449072011, "train_acc": 0.5232732732732732, "val_macro_f1": 0.5298881548218799, "val_loss": 1.148757901801882, "val_accuracy": 0.6042654028436019},
+       {"epoch": 16, "train_loss": 1.2283220601392102, "train_acc": 0.5232732732732732, "val_macro_f1": 0.530170441052968, "val_loss": 1.1368969322945834, "val_accuracy": 0.6054502369668247},
+       {"epoch": 17, "train_loss": 1.2229472460212172, "train_acc": 0.5377877877877878, "val_macro_f1": 0.5278305486788782, "val_loss": 1.1526551461332781, "val_accuracy": 0.6018957345971564},
+       {"epoch": 18, "train_loss": 1.2219269069226775, "train_acc": 0.5325325325325325, "val_macro_f1": 0.5243067602535394, "val_loss": 1.1563605318702228, "val_accuracy": 0.5971563981042654},
+       {"epoch": 19, "train_loss": 1.1975611028251227, "train_acc": 0.5392892892892893, "val_macro_f1": 0.5189097715335087, "val_loss": 1.1478128331532411, "val_accuracy": 0.5947867298578199},
+       {"epoch": 20, "train_loss": 1.1934834290314484, "train_acc": 0.5317817817817818, "val_macro_f1": 0.5176953192505984, "val_loss": 1.150610593257922, "val_accuracy": 0.5924170616113744}
+     ],
+     "stage_b": [
+       {"epoch": 1, "train_loss": 1.3155108488596476, "train_acc": 0.501001001001001, "val_macro_f1": 0.4685903796568659, "val_loss": 1.23213265405447, "val_accuracy": 0.566350710900474},
+       {
+         "epoch": 2,
+         "train_loss": 1.2663482601220186,
+         "train_acc": 0.5135135135135135,
+         "val_macro_f1": 0.51159501034612,
+         "val_loss": 1.2073333534584225,
192
+ "val_accuracy": 0.5770142180094787
193
+ },
194
+ {
195
+ "epoch": 3,
196
+ "train_loss": 1.2039633168353214,
197
+ "train_acc": 0.5212712712712713,
198
+ "val_macro_f1": 0.5001515484504192,
199
+ "val_loss": 1.212121329601342,
200
+ "val_accuracy": 0.5758293838862559
201
+ },
202
+ {
203
+ "epoch": 4,
204
+ "train_loss": 1.237001917920671,
205
+ "train_acc": 0.5132632632632632,
206
+ "val_macro_f1": 0.5071135037099957,
207
+ "val_loss": 1.2374758641301737,
208
+ "val_accuracy": 0.5675355450236966
209
+ },
210
+ {
211
+ "epoch": 5,
212
+ "train_loss": 1.1571017785353943,
213
+ "train_acc": 0.5402902902902903,
214
+ "val_macro_f1": 0.5320860576681696,
215
+ "val_loss": 1.2000342935182473,
216
+ "val_accuracy": 0.5770142180094787
217
+ },
218
+ {
219
+ "epoch": 6,
220
+ "train_loss": 1.1839140787258282,
221
+ "train_acc": 0.5355355355355356,
222
+ "val_macro_f1": 0.5198682878490827,
223
+ "val_loss": 1.1651039225230284,
224
+ "val_accuracy": 0.5864928909952607
225
+ },
226
+ {
227
+ "epoch": 7,
228
+ "train_loss": 1.1373721201856573,
229
+ "train_acc": 0.5402902902902903,
230
+ "val_macro_f1": 0.5301861817792031,
231
+ "val_loss": 1.1298535276928219,
232
+ "val_accuracy": 0.6007109004739336
233
+ },
234
+ {
235
+ "epoch": 8,
236
+ "train_loss": 1.1343186245308265,
237
+ "train_acc": 0.5508008008008008,
238
+ "val_macro_f1": 0.5499040784647757,
239
+ "val_loss": 1.141982549174702,
240
+ "val_accuracy": 0.6018957345971564
241
+ },
242
+ {
243
+ "epoch": 9,
244
+ "train_loss": 1.076228421729606,
245
+ "train_acc": 0.5603103103103103,
246
+ "val_macro_f1": 0.5507236741532686,
247
+ "val_loss": 1.1353786454946508,
248
+ "val_accuracy": 0.6125592417061612
249
+ },
250
+ {
251
+ "epoch": 10,
252
+ "train_loss": 1.0556169597952216,
253
+ "train_acc": 0.5615615615615616,
254
+ "val_macro_f1": 0.5526159517101854,
255
+ "val_loss": 1.1300031246167224,
256
+ "val_accuracy": 0.6125592417061612
257
+ }
258
+ ],
259
+ "stage_c": [
260
+ {
261
+ "epoch": 1,
262
+ "train_loss": 1.1528259001455985,
263
+ "train_acc": 0.5357857857857858,
264
+ "val_macro_f1": 0.5497651661380282,
265
+ "val_loss": 1.1855918479756722,
266
+ "val_accuracy": 0.5924170616113744
267
+ },
268
+ {
269
+ "epoch": 2,
270
+ "train_loss": 1.0998183033607147,
271
+ "train_acc": 0.5528028028028028,
272
+ "val_macro_f1": 0.5383073695552381,
273
+ "val_loss": 1.2090052756088039,
274
+ "val_accuracy": 0.5817535545023697
275
+ },
276
+ {
277
+ "epoch": 3,
278
+ "train_loss": 1.0954274918820646,
279
+ "train_acc": 0.5598098098098098,
280
+ "val_macro_f1": 0.5516726999736293,
281
+ "val_loss": 1.1517153537668887,
282
+ "val_accuracy": 0.6137440758293838
283
+ },
284
+ {
285
+ "epoch": 4,
286
+ "train_loss": 1.0348763885918084,
287
+ "train_acc": 0.5695695695695696,
288
+ "val_macro_f1": 0.5324538841027082,
289
+ "val_loss": 1.1520833731827578,
290
+ "val_accuracy": 0.5924170616113744
291
+ },
292
+ {
293
+ "epoch": 5,
294
+ "train_loss": 1.0805193128528539,
295
+ "train_acc": 0.5645645645645646,
296
+ "val_macro_f1": 0.5350905236046971,
297
+ "val_loss": 1.1407714692337254,
298
+ "val_accuracy": 0.5959715639810427
299
+ },
300
+ {
301
+ "epoch": 6,
302
+ "train_loss": 1.054177247845494,
303
+ "train_acc": 0.5615615615615616,
304
+ "val_macro_f1": 0.5308509777096463,
305
+ "val_loss": 1.1249282184935294,
306
+ "val_accuracy": 0.5995260663507109
307
+ },
308
+ {
309
+ "epoch": 7,
310
+ "train_loss": 1.0641172317651895,
311
+ "train_acc": 0.5685685685685685,
312
+ "val_macro_f1": 0.5462144827909508,
313
+ "val_loss": 1.1365883440767983,
314
+ "val_accuracy": 0.6030805687203792
315
+ },
316
+ {
317
+ "epoch": 8,
318
+ "train_loss": 1.006467128480161,
319
+ "train_acc": 0.5795795795795796,
320
+ "val_macro_f1": 0.5399544885145189,
321
+ "val_loss": 1.1192362528841642,
322
+ "val_accuracy": 0.6054502369668247
323
+ },
324
+ {
325
+ "epoch": 9,
326
+ "train_loss": 0.9975554783781011,
327
+ "train_acc": 0.5818318318318318,
328
+ "val_macro_f1": 0.5521462690027934,
329
+ "val_loss": 1.1212286163845333,
330
+ "val_accuracy": 0.6101895734597157
331
+ },
332
+ {
333
+ "epoch": 10,
334
+ "train_loss": 1.0012436508535743,
335
+ "train_acc": 0.5745745745745746,
336
+ "val_macro_f1": 0.5513181019590729,
337
+ "val_loss": 1.122439599715138,
338
+ "val_accuracy": 0.6078199052132701
339
+ }
340
+ ]
341
+ },
+  "confusion_matrix": [
+    [7, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
+    [0, 22, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 5, 0, 1, 0, 1, 0, 0, 1, 1, 0, 3, 1, 3, 0, 0, 3],
+    [0, 0, 0, 8, 4, 4, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 0, 1, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [6, 0, 3, 3, 0, 16, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 2, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0],
+    [0, 0, 0, 2, 2, 0, 0, 16, 0, 2, 2, 1, 1, 2, 3, 1, 3, 0],
+    [0, 0, 0, 1, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+    [0, 0, 5, 0, 0, 0, 0, 1, 0, 56, 15, 0, 11, 2, 1, 0, 0, 1],
+    [0, 2, 3, 2, 0, 1, 0, 19, 0, 44, 105, 0, 30, 1, 1, 0, 0, 0],
+    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 1, 2, 0, 0, 0],
+    [1, 0, 16, 2, 0, 1, 0, 9, 0, 20, 13, 2, 43, 2, 0, 0, 0, 3],
+    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 6, 1, 0, 0, 0],
+    [0, 0, 0, 0, 1, 0, 9, 1, 0, 0, 0, 3, 0, 0, 8, 0, 0, 4],
+    [0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 107, 14, 0],
+    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 13, 21, 0],
+    [0, 2, 0, 0, 0, 0, 1, 2, 0, 0, 0, 3, 0, 6, 4, 0, 0, 6]
+  ],
+  "label_to_idx": {
+    "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+    "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+    "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+  },
+ "classification_report": " precision recall f1-score support\n\n dhumri 0.50 0.50 0.50 14\n difficult 0.73 0.88 0.80 25\n drathung 0.15 0.26 0.19 19\n drudring 0.42 0.42 0.42 19\n druring 0.62 0.94 0.74 17\n druthung 0.59 0.52 0.55 31\n khyuyig 0.46 0.69 0.55 16\n multi_scripts 0.32 0.46 0.38 35\n non_tibetan 0.93 0.96 0.95 28\n peri 0.44 0.61 0.51 92\n petsuk 0.77 0.50 0.61 208\n trinyig 0.13 0.33 0.19 6\n tsegdrig 0.48 0.38 0.43 112\n tsugchung 0.27 0.55 0.36 11\n tsumachug 0.31 0.31 0.31 26\nuchen_sugdring 0.88 0.86 0.87 125\nuchen_sugthung 0.55 0.58 0.57 36\n yigchung 0.35 0.25 0.29 24\n\n accuracy 0.57 844\n macro avg 0.50 0.56 0.51 844\n weighted avg 0.61 0.57 0.58 844\n"
+ }
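
The `classification_report` above follows directly from the confusion matrix (rows are true labels, columns are predictions, in `label_to_idx` order). As a minimal sketch of that derivation, here is how per-class precision/recall/F1 and overall accuracy fall out of such a matrix; the function is a generic illustration (not code from this repo), demonstrated on a hypothetical 3-class matrix rather than the full 18-class one:

```python
def metrics_from_confusion(cm):
    """Per-class precision/recall/F1 plus overall accuracy.

    Assumes cm[i][j] counts samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        support = sum(cm[i])                          # row sum: true count
        predicted = sum(cm[r][i] for r in range(n))   # column sum: predicted count
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append({"precision": precision, "recall": recall,
                          "f1": f1, "support": support})
    macro_f1 = sum(c["f1"] for c in per_class) / n
    return accuracy, macro_f1, per_class

# Toy 3-class example (not the matrix above):
cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 4]]
acc, macro_f1, per_class = metrics_from_confusion(cm)
print(round(acc, 2))  # 0.75
```

Applied to the 18x18 matrix above, the same computation reproduces the report's accuracy (482 correct out of 844 validation pages, about 0.57).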
whole_page/splits.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "label_to_idx": {
+     "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4, "druthung": 5,
+     "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9, "petsuk": 10, "trinyig": 11,
+     "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14, "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17
+   },
+   "idx_to_label": {
+     "0": "dhumri", "1": "difficult", "2": "drathung", "3": "drudring", "4": "druring", "5": "druthung",
+     "6": "khyuyig", "7": "multi_scripts", "8": "non_tibetan", "9": "peri", "10": "petsuk", "11": "trinyig",
+     "12": "tsegdrig", "13": "tsugchung", "14": "tsumachug", "15": "uchen_sugdring", "16": "uchen_sugthung", "17": "yigchung"
+   },
+   "split_counts": {
+     "train": {"dhumri": 70, "difficult": 120, "drathung": 91, "drudring": 94, "druring": 85, "druthung": 145, "khyuyig": 81, "multi_scripts": 165, "non_tibetan": 136, "peri": 430, "petsuk": 972, "trinyig": 30, "tsegdrig": 525, "tsugchung": 55, "tsumachug": 126, "uchen_sugdring": 585, "uchen_sugthung": 168, "yigchung": 118},
+     "val": {"dhumri": 14, "difficult": 25, "drathung": 19, "drudring": 19, "druring": 17, "druthung": 31, "khyuyig": 16, "multi_scripts": 35, "non_tibetan": 28, "peri": 92, "petsuk": 208, "trinyig": 6, "tsegdrig": 112, "tsugchung": 11, "tsumachug": 26, "uchen_sugdring": 125, "uchen_sugthung": 36, "yigchung": 24},
+     "test": {"dhumri": 14, "difficult": 25, "drathung": 19, "drudring": 19, "druring": 17, "druthung": 31, "khyuyig": 16, "multi_scripts": 35, "non_tibetan": 28, "peri": 92, "petsuk": 208, "trinyig": 6, "tsegdrig": 112, "tsugchung": 11, "tsumachug": 26, "uchen_sugdring": 125, "uchen_sugthung": 36, "yigchung": 24}
+   },
+   "exclude_manifest": "./benchmark_page_ids.json",
+   "excluded_label_count": 18,
+   "excluded_page_id_count": 88,
+   "skipped_excluded_files_by_class": {}
+ }
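
Since `splits.json` ships both `label_to_idx` and `idx_to_label`, downstream code can map classifier outputs back to script names without rebuilding the mapping. A minimal loading sketch, assuming the file sits at `whole_page/splits.json` as in this commit (the inline copy of the mapping below is only there to demonstrate the round trip without touching the filesystem):

```python
import json

def load_label_maps(path="whole_page/splits.json"):
    """Load label <-> index maps from a splits.json in this repo's layout."""
    with open(path, "r", encoding="utf-8") as f:
        splits = json.load(f)
    label_to_idx = splits["label_to_idx"]
    # JSON object keys are strings; convert "0".."17" back to ints.
    idx_to_label = {int(k): v for k, v in splits.get("idx_to_label", {}).items()}
    if not idx_to_label:  # fall back to inverting label_to_idx
        idx_to_label = {v: k for k, v in label_to_idx.items()}
    return label_to_idx, idx_to_label

# Round-trip check against an inline copy of the 18-class mapping:
label_to_idx = {
    "dhumri": 0, "difficult": 1, "drathung": 2, "drudring": 3, "druring": 4,
    "druthung": 5, "khyuyig": 6, "multi_scripts": 7, "non_tibetan": 8, "peri": 9,
    "petsuk": 10, "trinyig": 11, "tsegdrig": 12, "tsugchung": 13, "tsumachug": 14,
    "uchen_sugdring": 15, "uchen_sugthung": 16, "yigchung": 17,
}
idx_to_label = {v: k for k, v in label_to_idx.items()}
print(idx_to_label[10])  # petsuk
```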