loc_1

Molbap HF Staff committed on Sep 26

Commit

24e2b9e

1 Parent(s): 9236438

nicer model card

Files changed (1)

README.md +94 -2

README.md CHANGED

@@ -8,7 +8,99 @@ sdk_version: 5.39.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: Find duplicated codepaths and possible refactorings in graph
+short_description: Interactive analyzer for modular refactoring opportunities in HuggingFace Transformers
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🔍 Transformers Modular Refactor Analyzer
+This interactive tool helps analyze modular refactoring opportunities in the HuggingFace Transformers library by visualizing model relationships, similarity patterns, and the impact of modularization on code maintainability.
+## 📊 Features Overview
+### 🕒 **Tab 1: Chronological Timeline**
+Interactive timeline showing the evolution of transformer models with modular dependencies positioned by their creation dates.
+**Key Features:**
+- Models positioned chronologically by git history
+- Modular dependency connections between models
+- Similarity scores between candidate models (red dashed edges)
+- Timeline axis with year/month markers
+- **Modular Logic Milestone**: May 31, 2024 marker showing when modular logic was introduced
+- Search functionality to highlight specific models and their connections
+- Zoom and pan to explore the full timeline
+**Visual Legend:**
+- 🟡 **Base models**: Foundation models that others depend on
+- 🔵 **Modular models**: Models with existing `modular_*.py` implementations
+- 🔴 **Candidate models**: Models without modular implementations (refactoring opportunities)
+- **Blue edges**: Import dependencies between modular implementations
+- **Red dashed edges**: High similarity scores indicating refactoring potential
+### 📈 **Tab 2: LOC Growth**
+Chart visualizing how modular refactoring impacts Lines of Code (LOC) over time in the transformers repository.
+**Metrics Tracked:**
+- **Effective LOC**: Total code that must be maintained by hand (modeling LOC of models without a modular file, plus modular LOC)
+- **Modular LOC**: Lines of code in `modular_*.py` files
+- **Modeling LOC (all)**: Total lines in all `modeling_*.py` files
+- **Modeling LOC (included)**: Lines in `modeling_*.py` files for models without modular versions
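The relationship between the four metrics above can be sketched in a few lines. This is a minimal illustration, not the Space's actual code: the `ModelLoc` record and its field names are hypothetical, and a model is assumed to count as modular whenever it has a modular LOC value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelLoc:
    """Hypothetical per-model record; field names are illustrative."""
    name: str
    modeling_loc: int                  # lines in modeling_*.py
    modular_loc: Optional[int] = None  # lines in modular_*.py, None if absent


def loc_metrics(models: List[ModelLoc]) -> Dict[str, int]:
    # Lines in modular_*.py files.
    modular = sum(m.modular_loc for m in models if m.modular_loc is not None)
    # Lines in every modeling_*.py file, modular or not.
    modeling_all = sum(m.modeling_loc for m in models)
    # Lines in modeling_*.py files of models that have no modular version.
    modeling_included = sum(m.modeling_loc for m in models if m.modular_loc is None)
    return {
        "modular_loc": modular,
        "modeling_loc_all": modeling_all,
        "modeling_loc_included": modeling_included,
        # Effective LOC = modeling LOC of non-modular models + modular LOC.
        "effective_loc": modeling_included + modular,
    }
```

Under this reading, converting a model to a modular file moves its `modeling_*.py` lines out of the effective total and replaces them with the (usually shorter) modular file.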
+**Key Insights:**
+- Shows the trajectory toward reduced code duplication
+- Demonstrates how modular refactoring can reduce total maintainable code
+- May 31, 2024 annotation marks the introduction of modular logic
+- Interactive chart with time-series data from git history
+### 🌐 **Tab 3: Dependency Graph**
+Network visualization of model relationships and similarity patterns, laid out by connectivity rather than by creation date.
+**Features:**
+- Force-directed graph layout optimized for relationship visibility
+- Toggle to show/hide candidate models and similarity edges
+- Node sizes reflect connection degree (more connected = larger)
+- Interactive drag-and-drop for graph exploration
+- Zoom and pan capabilities
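The degree-based node sizing described above can be sketched as follows. The linear scaling and the `base` and `step` values are assumptions chosen for illustration; the README only states that more connected nodes are drawn larger.

```python
from collections import Counter
from typing import Iterable, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many edges touch each node in an undirected edge list."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def node_radius(degree: int, base: float = 6.0, step: float = 2.0) -> float:
    """Hypothetical sizing rule: radius grows linearly with degree."""
    return base + step * degree
```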
+**Analysis Capabilities:**
+- Identify clusters of highly similar models (refactoring targets)
+- Understand modular dependency patterns
+- Spot potential consolidation opportunities
+- Explore the current modular architecture
+## 🛠️ Technical Details
+### Similarity Methods
+- **Jaccard Similarity**: Token-based similarity using identifier overlap in source code
+- **Embedding Similarity**: CodeBERT-based semantic similarity (when available)
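As a rough illustration of the Jaccard method, the sketch below compares the sets of identifiers found in two source strings. The regex-based tokenization and the exclusion of Python keywords are assumptions made for this example; the Space may tokenize differently.

```python
import keyword
import re
from typing import Set

# Matches Python-style identifiers (assumed tokenization for this sketch).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(source: str) -> Set[str]:
    """Return the set of non-keyword identifiers appearing in the source."""
    return {tok for tok in IDENTIFIER.findall(source) if not keyword.iskeyword(tok)}


def jaccard(source_a: str, source_b: str) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over identifier sets."""
    ids_a, ids_b = identifiers(source_a), identifiers(source_b)
    union = ids_a | ids_b
    if not union:
        return 0.0  # two empty sources: define similarity as 0
    return len(ids_a & ids_b) / len(union)
```

For example, `jaccard("x = foo(y)", "x = bar(y)")` shares two of four distinct identifiers and yields 0.5.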
+### Data Sources
+- **Git History**: Model creation dates from transformers repository commits
+- **Source Analysis**: AST parsing of `modeling_*.py` and `modular_*.py` files
+- **Dependency Tracking**: Import analysis to build modular dependency graphs
+- **Cached Embeddings**: Pre-computed similarity matrices for performance
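The import analysis can be approximated with Python's standard `ast` module. The sketch below assumes that a modular file pulls in its parents through relative imports of the form `from ..llama.modeling_llama import ...`; that convention and the `modular_dependencies` helper are assumptions for illustration, not the Space's actual implementation.

```python
import ast
from typing import Set


def modular_dependencies(source: str) -> Set[str]:
    """Return names of models whose modeling files are imported relatively."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        # Only `from <module> import ...` statements carry a module path.
        if isinstance(node, ast.ImportFrom) and node.module:
            parts = node.module.split(".")
            # Assumed pattern: relative import of "<model>.modeling_<model>".
            if node.level >= 1 and len(parts) == 2 and parts[1].startswith("modeling_"):
                deps.add(parts[0])
    return deps
```

Each returned name would become a dependency edge from the analyzed model to the model it imports from.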
+### Filtering Options
+- **Similarity Threshold**: Adjustable cutoff for showing similarity edges (0.5-0.95)
+- **Multimodal Filter**: Restrict the view to multimodal models, detected as models whose source mentions `pixel_values`
+- **Show/Hide Candidates**: Toggle visibility of non-modular models and their similarities
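The first two filters above reduce to simple predicates. In this sketch the `scores` mapping, the function names, and the default threshold of 0.7 are hypothetical; only the 0.5-0.95 range and the `pixel_values` check come from the README.

```python
from typing import Dict, List, Tuple


def similarity_edges(
    scores: Dict[Tuple[str, str], float], threshold: float = 0.7
) -> List[Tuple[str, str, float]]:
    """Keep only model pairs whose similarity meets the threshold."""
    # Clamp to the adjustable range stated in the README (0.5 to 0.95).
    threshold = min(max(threshold, 0.5), 0.95)
    return sorted((a, b, s) for (a, b), s in scores.items() if s >= threshold)


def is_multimodal(source: str) -> bool:
    """Treat a model as multimodal if its source mentions pixel_values."""
    return "pixel_values" in source
```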
+## 🎯 Use Cases
+1. **Refactoring Planning**: Identify which models would benefit most from modularization
+2. **Architecture Analysis**: Understand current modular dependencies and patterns
+3. **Code Reduction**: Quantify the impact of modular refactoring on maintainability
+4. **Timeline Analysis**: See how the transformers library evolved toward a modular architecture
+## 📚 How to Use
+1. **Chronological Timeline**: Use the search box to find specific models, zoom to explore different time periods, and click nodes to highlight their connections
+2. **LOC Growth**: Hover over data points to see exact metrics and observe the trend toward code reduction
+3. **Dependency Graph**: Drag nodes to reorganize the layout, toggle candidates on or off, and zoom in for detailed exploration
+## 🔬 Research Context
+This tool supports analysis of modular refactoring in large-scale ML libraries, helping identify code duplication patterns and measure the effectiveness of architectural improvements in reducing maintenance burden.
+---
+*Built with Gradio, D3.js, and ApexCharts for interactive data visualization*