Commit 24e2b9e by Molbap (HF Staff) · 1 parent: 9236438

nicer model card

  1. README.md +94 -2
README.md CHANGED
@@ -8,7 +8,99 @@ sdk_version: 5.39.0
8
  app_file: app.py
9
  pinned: false
10
  license: mit
11
- short_description: Find duplicated codepaths and possible refactorings in graph
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
8
  app_file: app.py
9
  pinned: false
10
  license: mit
11
+ short_description: Interactive analyzer for modular refactoring opportunities in HuggingFace Transformers
12
  ---
13
 
14
+ # 🔍 Transformers Modular Refactor Analyzer
15
+
16
+ This interactive tool helps analyze modular refactoring opportunities in the HuggingFace Transformers library by visualizing model relationships, similarity patterns, and the impact of modularization on code maintainability.
17
+
18
+ ## 📊 Features Overview
19
+
20
+ ### 🕒 **Tab 1: Chronological Timeline**
21
+ Interactive timeline showing the evolution of transformer models with modular dependencies positioned by their creation dates.
22
+
23
+ **Key Features:**
24
+ - Models positioned chronologically by git history
25
+ - Modular dependency connections between models
26
+ - Similarity scores between candidate models (red dashed edges)
27
+ - Timeline axis with year/month markers
28
+ - **Modular Logic Milestone**: May 31, 2024 marker showing when modular logic was introduced
29
+ - Search functionality to highlight specific models and their connections
30
+ - Zoom and pan to explore the full timeline
31
+
32
+ **Visual Legend:**
33
+ - 🟡 **Base models**: Foundation models that others depend on
34
+ - 🔵 **Modular models**: Models with existing `modular_*.py` implementations
35
+ - 🔴 **Candidate models**: Models without modular implementations (refactoring opportunities)
36
+ - **Blue edges**: Import dependencies between modular implementations
37
+ - **Red dashed edges**: High similarity scores indicating refactoring potential
38
+
39
+ ### 📈 **Tab 2: LOC Growth**
40
+ Chart visualizing how modular refactoring impacts Lines of Code (LOC) over time in the transformers repository.
41
+
42
+ **Metrics Tracked:**
43
+ - **Effective LOC**: Total maintainable code (modeling LOC for non-modular + modular LOC)
44
+ - **Modular LOC**: Lines of code in `modular_*.py` files
45
+ - **Modeling LOC (all)**: Total lines in all `modeling_*.py` files
46
+ - **Modeling LOC (included)**: Lines in `modeling_*.py` files for models without modular versions
47
+
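The relationship between the four metrics above can be sketched as follows. This is a minimal illustration, not the Space's actual code: the `ModelLoc` record, the `loc_metrics` function, and the sample numbers are all hypothetical, and the sketch assumes a model counts as "modular" exactly when it has a `modular_*.py` file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLoc:
    """Hypothetical per-model record; not a structure taken from the Space."""
    name: str
    modeling_loc: int                  # lines in modeling_<name>.py
    modular_loc: Optional[int] = None  # lines in modular_<name>.py, None if absent


def loc_metrics(models: List[ModelLoc]) -> dict:
    """Compute the four metrics as the README describes them."""
    modular = sum(m.modular_loc for m in models if m.modular_loc is not None)
    modeling_all = sum(m.modeling_loc for m in models)
    # "Included" modeling LOC: only models without a modular version.
    modeling_included = sum(m.modeling_loc for m in models if m.modular_loc is None)
    return {
        "modular_loc": modular,
        "modeling_loc_all": modeling_all,
        "modeling_loc_included": modeling_included,
        "effective_loc": modeling_included + modular,
    }


# Made-up numbers for illustration only, not measurements.
sample = [
    ModelLoc("model_a", 1000),
    ModelLoc("model_b", 1200, modular_loc=300),
    ModelLoc("model_c", 800, modular_loc=150),
]
metrics = loc_metrics(sample)
```

With these sample values, Effective LOC is 1000 + 450 = 1450 against 3000 total modeling lines, which is the kind of gap the chart is meant to show.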
48
+ **Key Insights:**
49
+ - Shows the trajectory toward reduced code duplication
50
+ - Demonstrates how modular refactoring can reduce total maintainable code
51
+ - May 31, 2024 annotation marks the introduction of modular logic
52
+ - Interactive chart with time-series data from git history
53
+
54
+ ### 🌐 **Tab 3: Dependency Graph**
55
+ Static network visualization focusing on model relationships and similarity patterns without chronological constraints.
56
+
57
+ **Features:**
58
+ - Force-directed graph layout optimized for relationship visibility
59
+ - Toggle to show/hide candidate models and similarity edges
60
+ - Node sizes reflect connection degree (more connected = larger)
61
+ - Interactive drag-and-drop for graph exploration
62
+ - Zoom and pan capabilities
63
+
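As a rough sketch of how node size could follow connection degree: the function names, the linear `base + scale * degree` mapping, and its constants are assumptions made for illustration, since the README does not state the actual sizing formula.

```python
from collections import Counter
from typing import Iterable, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many edges touch each node (undirected degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def node_radius(degree: int, base: float = 4.0, scale: float = 1.5) -> float:
    """Hypothetical linear mapping: more connections give a larger node."""
    return base + scale * degree


# Placeholder node names, not real dependency data.
edges = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_d")]
degrees = node_degrees(edges)
radii = {name: node_radius(d) for name, d in degrees.items()}
```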
64
+ **Analysis Capabilities:**
65
+ - Identify clusters of highly similar models (refactoring targets)
66
+ - Understand modular dependency patterns
67
+ - Spot potential consolidation opportunities
68
+ - Explore the current modular architecture
69
+
70
+ ## 🛠️ Technical Details
71
+
72
+ ### Similarity Methods
73
+ - **Jaccard Similarity**: Token-based similarity using identifier overlap in source code
74
+ - **Embedding Similarity**: CodeBERT-based semantic similarity (when available)
75
+
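The Jaccard method listed above can be illustrated with a short sketch. It assumes "identifiers" means Python `NAME` tokens with language keywords excluded; the Space may tokenize differently, and the function names here are hypothetical.

```python
import io
import keyword
import tokenize
from typing import Set


def identifiers(source: str) -> Set[str]:
    """Collect the distinct non-keyword NAME tokens in a Python source string."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|, defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


src_a = "def forward(x, mask):\n    return x * mask\n"
src_b = "def forward(x, bias):\n    return x + bias\n"
score = jaccard(identifiers(src_a), identifiers(src_b))  # 2 shared of 4 total -> 0.5
```

Because the score depends only on which identifiers appear, not on their order or frequency, it is cheap to compute but blind to structural differences, which is presumably why an embedding-based score is offered alongside it.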
76
+ ### Data Sources
77
+ - **Git History**: Model creation dates from transformers repository commits
78
+ - **Source Analysis**: AST parsing of `modeling_*.py` and `modular_*.py` files
79
+ - **Dependency Tracking**: Import analysis to build modular dependency graphs
80
+ - **Cached Embeddings**: Pre-computed similarity matrices for performance
81
+
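Import analysis of this kind can be sketched with the standard `ast` module. The sketch assumes a modular file pulls in a parent model through a two-level relative import of the form `from ..<model>.modeling_<model> import ...`; the function name and the example module `base_model` are hypothetical.

```python
import ast
from typing import List


def modular_dependencies(source: str) -> List[str]:
    """Return the sibling model packages a modular file imports from."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        # level == 2 corresponds to a "from ..package.module import X" import.
        if isinstance(node, ast.ImportFrom) and node.level == 2 and node.module:
            parts = node.module.split(".")
            if len(parts) == 2 and parts[1].startswith(("modeling_", "modular_")):
                deps.add(parts[0])
    return sorted(deps)


# Illustrative source text; parsing does not require the modules to exist.
example = (
    "import torch\n"
    "from ...utils import logging\n"
    "from ..base_model.modeling_base_model import BaseModelAttention\n"
)
deps = modular_dependencies(example)  # ['base_model']
```

Each returned name would become one directed edge from the analyzed model to its parent in the dependency graph.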
82
+ ### Filtering Options
83
+ - **Similarity Threshold**: Adjustable cutoff for showing similarity edges (0.5-0.95)
84
+ - **Multimodal Filter**: Focus on models with multimodal capabilities (models mentioning "pixel_values")
85
+ - **Show/Hide Candidates**: Toggle visibility of non-modular models and their similarities
86
+
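A minimal sketch of the threshold and multimodal filters follows. The edge tuple layout, the function names, and the inclusive comparison at the threshold are assumptions; only the 0.5 to 0.95 range and the "pixel_values" heuristic come from the README.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]  # (model_a, model_b, similarity score)


def filter_similarity_edges(edges: List[Edge], threshold: float = 0.5) -> List[Edge]:
    """Keep edges whose score meets the threshold (inclusive, by assumption)."""
    return [(a, b, score) for a, b, score in edges if score >= threshold]


def is_multimodal(source: str) -> bool:
    """Heuristic from the README: the model source mentions "pixel_values"."""
    return "pixel_values" in source


# Placeholder names and scores, not real similarity data.
edges = [("model_a", "model_b", 0.91), ("model_a", "model_c", 0.62), ("model_b", "model_c", 0.40)]
strong = filter_similarity_edges(edges, threshold=0.8)
```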
87
+ ## 🎯 Use Cases
88
+
89
+ 1. **Refactoring Planning**: Identify which models would benefit most from modularization
90
+ 2. **Architecture Analysis**: Understand current modular dependencies and patterns
91
+ 3. **Code Reduction**: Quantify the impact of modular refactoring on maintainability
92
+ 4. **Timeline Analysis**: See how the transformers library evolved toward modular architecture
93
+
94
+ ## 📚 How to Use
95
+
96
+ 1. **Chronological Timeline**: Use the search box to find specific models, zoom to explore different time periods, click nodes to highlight connections
97
+ 2. **LOC Growth**: Hover over data points to see exact metrics, observe the trend toward code reduction
98
+ 3. **Dependency Graph**: Drag nodes to reorganize the layout, toggle candidates on/off, use zoom for detailed exploration
99
+
100
+ ## 🔬 Research Context
101
+
102
+ This tool supports analysis of modular refactoring in large-scale ML libraries, helping identify code duplication patterns and measure the effectiveness of architectural improvements in reducing maintenance burden.
103
+
104
+ ---
105
+
106
+ *Built with Gradio, D3.js, and ApexCharts for interactive data visualization*