Johnny committed on
Commit e15fb9b · 1 parent: c2f9ec8

added utils directory

Files changed (1)
  1. UTILS_DIRECTORY_GUIDE.md +0 -129
UTILS_DIRECTORY_GUIDE.md CHANGED
@@ -65,11 +65,6 @@ pages/Format_Resume.py
  | `data/job_titles.json` | **Job title patterns** - Used by extractor_fixed.py for regex matching | When all AI methods fail (fallback) | 🟡 BACKUP |
  | `data/skills.json` | **Skills database** - Used by extractor_fixed.py for spaCy skill matching | When all AI methods fail (fallback) | 🟡 BACKUP |
 
- ### **❌ NOT NEEDED - Other Features**
-
- | File | Purpose | Why Not Needed |
- |------|---------|----------------|
- | `screening.py` | Resume evaluation, scoring, candidate screening | Used by TalentLens.py, not Format_Resume.py |
 
  ## 🚀 **Format_Resume.py Extraction Flow**
 
@@ -83,127 +78,3 @@ pages/Format_Resume.py
  └── If all fail → Use extractor_fixed.py (regex fallback) → uses data/*.json
  3. builder.py generates formatted Word document with preserved template headers/footers
  4. User downloads formatted resume with Qvell branding and proper formatting
- ```
-
- ## 🏗️ **Document Builder Enhancements**
-
- The `builder.py` has been enhanced to properly handle template preservation:
-
- ### **Header/Footer Preservation**
- - ✅ **Preserves Qvell logo** and branding in header
- - ✅ **Maintains footer address** (6001 Tain Dr. Suite 203, Dublin, OH, 43016)
- - ✅ **Eliminates blank pages** by clearing only body content
- - ✅ **Preserves image references** to prevent broken images
-
- ### **Content Generation Features**
- - ✅ **Professional Summary** extraction and formatting
- - ✅ **Skills table** with 3-column layout
- - ✅ **Professional Experience** with job titles, companies, dates
- - ✅ **Career Timeline** chronological job history
- - ✅ **Education and Training** sections
- - ✅ **Proper date formatting** (e.g., "February 2017 – Present")
-
- ## 📊 **File Usage Statistics**
-
- - **Total utils files**: 11
- - **Required for Format_Resume.py**: 10 files (91%)
- - **Not needed for Format_Resume.py**: 1 file (9%)
-
- ## 🧹 **Cleanup Recommendations**
-
- If you want to **minimize the utils folder** for Format_Resume.py only:
-
- ### **Keep These 10 Files:**
- ```
- utils/
- ├── hybrid_extractor.py # Main orchestrator
- ├── openai_extractor.py # OpenAI GPT-4o (primary)
- ├── hf_cloud_extractor.py # HF Cloud (backup)
- ├── ai_extractor.py # HF AI (fallback)
- ├── hf_extractor_simple.py # Simple HF (fallback)
- ├── extractor_fixed.py # Regex (last resort)
- ├── builder.py # Document generation with template preservation
- ├── parser.py # File parsing
- └── data/
-     ├── job_titles.json # Job title patterns for regex fallback
-     └── skills.json # Skills database for spaCy fallback
- ```
-
- ### **Can Remove This 1 File (if only using Format_Resume.py):**
- ```
- utils/
- └── screening.py # Only used by TalentLens.py
- ```
-
- ## 💡 **Best Practices for Format_Resume.py**
-
- 1. **Always use `hybrid_extractor.py`** as your main entry point
- 2. **Set environment variables** for best results:
-    - `OPENAI_API_KEY` for OpenAI GPT-4o (primary)
-    - `HF_API_TOKEN` for Hugging Face Cloud (backup)
- 3. **Use this configuration** in Format_Resume.py:
-    ```python
-    data = extract_resume_sections(
-        resume_text,
-        prefer_ai=True,
-        use_openai=True, # Try OpenAI GPT-4o first (best results)
-        use_hf_cloud=True # Fallback to HF Cloud (good backup)
-    )
-    ```
- 4. **Template preservation** is automatic - headers and footers are maintained
- 5. **Fallback system** ensures extraction never completely fails
-
- ## 🔧 **Recent System Improvements**
-
- ### **Header/Footer Preservation (Latest Fix)**
- - **Problem**: Template headers and footers were being lost during document generation
- - **Solution**: Conservative content clearing that preserves document structure
- - **Result**: Qvell branding and footer address now properly maintained
-
- ### **Extraction Quality Enhancements**
- - **OpenAI GPT-4o Integration**: Primary extraction method with structured prompts
- - **Contact Info Extraction**: Automatic email, phone, LinkedIn detection
- - **Skills Cleaning**: Improved filtering to remove company names and broken fragments
- - **Experience Structuring**: Better job title, company, and date extraction
-
- ### **Fallback System Reliability**
- - **JSON Dependencies**: job_titles.json and skills.json required for regex fallback
- - **Quality Validation**: Each extraction method is validated before acceptance
- - **Graceful Degradation**: System never fails completely, always produces output
-
- ## 🧪 **Testing Format_Resume.py Dependencies**
-
- ```python
- # Test all required components for Format_Resume.py
- from utils.hybrid_extractor import extract_resume_sections, HybridResumeExtractor
- from utils.builder import build_resume_from_data
- from utils.parser import parse_resume
-
- # Test extraction with all fallbacks
- sample_text = "John Doe\nSoftware Engineer\nPython, Java, React"
- result = extract_resume_sections(sample_text, prefer_ai=True, use_openai=True, use_hf_cloud=True)
-
- # Test document building with template preservation
- template_path = "templates/blank_resume.docx"
- doc = build_resume_from_data(template_path, result)
-
- print("✅ All Format_Resume.py dependencies working!")
- print(f"✅ Extraction method used: {result.get('extraction_method', 'unknown')}")
- print(f"✅ Headers/footers preserved: {len(doc.sections)} sections")
- ```
-
- ## 🎯 **System Architecture Summary**
-
- The Format_Resume.py system now provides:
-
- 1. **Robust Extraction**: 5-tier fallback system (OpenAI → HF Cloud → HF AI → HF Simple → Regex)
- 2. **Template Preservation**: Headers, footers, and branding maintained perfectly
- 3. **Quality Assurance**: Each extraction method validated for completeness
- 4. **Professional Output**: Properly formatted Word documents with consistent styling
- 5. **Reliability**: System never fails completely, always produces usable output
-
- ---
-
- **The utils directory analysis shows 10 out of 11 files are needed for Format_Resume.py functionality! 🎯**
-
- **Recent improvements ensure perfect template preservation and reliable extraction quality.** ✨
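The guide deleted in this commit described a five-tier extraction order (OpenAI → HF Cloud → HF AI → HF Simple → Regex) in which each method's output is validated before it is accepted. That pattern can be sketched as a validated fallback chain. This is a minimal illustration under stated assumptions, not the code in `utils/hybrid_extractor.py`. `extract_with_fallback`, `is_valid`, `REQUIRED_KEYS` and the three stub tiers are hypothetical. The validity rule (non-empty `name` and `skills`) is an assumed stand-in for whatever quality checks the real extractors apply.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sections = Dict[str, object]
Extractor = Callable[[str], Optional[Sections]]

# Assumption: a result is acceptable only if these sections are present and non-empty.
REQUIRED_KEYS = ("name", "skills")


def is_valid(result: Optional[Sections]) -> bool:
    """Hypothetical quality gate applied to every tier's output."""
    return bool(result) and all(result.get(key) for key in REQUIRED_KEYS)


def extract_with_fallback(text: str, tiers: List[Tuple[str, Extractor]]) -> Sections:
    """Try each (label, extractor) pair in order and return the first valid result.

    An exception from a tier (missing API key, network error, ...) is treated
    like an invalid result, so the chain moves on instead of crashing.
    """
    for label, extractor in tiers:
        try:
            result = extractor(text)
        except Exception:
            continue
        if is_valid(result):
            return {**result, "extraction_method": label}
    # No tier passed validation: return a labelled empty result rather than raising.
    return {"extraction_method": "none"}


# Stub tiers standing in for the real extractors (all hypothetical).
def openai_tier(text: str) -> Sections:
    raise RuntimeError("OPENAI_API_KEY not set")  # simulate an unavailable API


def hf_cloud_tier(text: str) -> Sections:
    return {"name": "John Doe", "skills": []}  # incomplete, so it fails validation


def regex_tier(text: str) -> Sections:
    lines = text.splitlines()
    return {"name": lines[0], "skills": [s.strip() for s in lines[-1].split(",")]}


sample_text = "John Doe\nSoftware Engineer\nPython, Java, React"
result = extract_with_fallback(
    sample_text,
    [("openai", openai_tier), ("hf_cloud", hf_cloud_tier), ("regex", regex_tier)],
)
print(result["extraction_method"])  # regex
print(result["skills"])  # ['Python', 'Java', 'React']
```

Treating an exception the same way as a failed validation lets a chain like this degrade gracefully instead of crashing. This diff does not show whether `hybrid_extractor.py` handles errors the same way.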
 
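The unchanged context lines keep `data/job_titles.json` and `data/skills.json` as the data behind the last-resort `extractor_fixed.py` path. This diff does not show their schema. The sketch below therefore assumes each file is a flat JSON array of strings, and it uses plain regular expressions with word boundaries in place of the spaCy skill matching the table mentions. It illustrates dictionary-based fallback matching in general. `find_terms`, `regex_fallback` and the inline sample data are hypothetical.

```python
import json
import re
from typing import Dict, List

# Assumption: both data files are flat JSON arrays of strings. These inline
# strings stand in for the contents of utils/data/job_titles.json and
# utils/data/skills.json (e.g. loaded with json.load(open(path))).
JOB_TITLES_JSON = '["Software Engineer", "Data Scientist", "Project Manager"]'
SKILLS_JSON = '["Python", "Java", "React", "C++"]'


def find_terms(text: str, terms: List[str]) -> List[str]:
    """Return the canonical spelling of each known term found in text, in first-seen order."""
    if not terms:
        return []  # an empty alternation would match the empty string
    canonical = {t.lower(): t for t in terms}
    # Longest terms first so a multi-word entry wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    # (?<!\w) and (?!\w) act as word boundaries that also work for "C++".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    hits = [canonical[m.group(0).lower()] for m in pattern.finditer(text)]
    return list(dict.fromkeys(hits))  # de-duplicate while keeping order


def regex_fallback(text: str, titles: List[str], skills: List[str]) -> Dict[str, List[str]]:
    """Hypothetical last-resort extractor: dictionary lookup only, no AI."""
    return {"titles": find_terms(text, titles), "skills": find_terms(text, skills)}


titles = json.loads(JOB_TITLES_JSON)
skills = json.loads(SKILLS_JSON)
sample = "John Doe\nSenior software engineer\nPython, Java, React, JavaScript"
result = regex_fallback(sample, titles, skills)
print(result)  # {'titles': ['Software Engineer'], 'skills': ['Python', 'Java', 'React']}
```

The boundary checks keep `JavaScript` from being reported as `Java`. Matching is case-insensitive, but results are returned in the spelling stored in the data file.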