Commit 5bc5f16 (verified), committed by lemms
1 Parent(s): 7a71b42

Add Space setup guide

Files changed (1)
  1. HUGGINGFACE_SPACE_SETUP_GUIDE.md +396 -0
HUGGINGFACE_SPACE_SETUP_GUIDE.md ADDED
@@ -0,0 +1,396 @@
# 🚀 Hugging Face Space Setup Guide for OpenLLM Training (GitHub Secrets)

This guide shows how to set up authentication for Hugging Face Spaces using GitHub secrets so that OpenLLM training runs can upload their models.

## 🎯 Overview

Training completed successfully in Hugging Face Spaces, but the model upload failed because of authentication problems. This guide configures an `HF_TOKEN` through GitHub secrets so that future training runs in Spaces authenticate correctly and their uploads succeed.

## 🔧 Step-by-Step Setup

### Step 1: Get Your Hugging Face Token

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Click "New token"
3. Give it a name (e.g., "OpenLLM Space Training")
4. Select the "Write" role so the token can create repositories and upload files
5. Copy the generated token

### Step 2: Set Up GitHub Repository Secrets

1. Go to your GitHub repository:
   ```
   https://github.com/your-username/your-repo
   ```

2. Click on the "Settings" tab

3. In the left sidebar, click "Secrets and variables" → "Actions"

4. Click "New repository secret"

5. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token from Step 1

6. Click "Add secret"
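
**Note**: GitHub repository secrets are exposed only to GitHub Actions workflows, not to a running Space. A Space reads `HF_TOKEN` from its own environment, so if the check in Step 3 reports the token as missing, also add `HF_TOKEN` as a secret in the Space's own Settings page.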
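
The steps above only pay off if the running code can actually see the token. Below is a minimal sketch of that lookup, assuming only that the token is exposed as the `HF_TOKEN` environment variable. The helper name `load_hf_token` and its `env` parameter are hypothetical, introduced so the lookup can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def load_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise outside a Space.
    """
    env = os.environ if env is None else env
    # Strip whitespace that is easy to pick up when pasting a token.
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set in this environment. "
            "Add it as a secret and restart the process."
        )
    return token
```

Failing early with an explicit message keeps a missing secret from surfacing later as an opaque upload error after training has already finished.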

### Step 3: Verify Authentication in Your Space

Add this code to your Space to verify authentication is working:

```python
# Add this to your Space's main script or run it separately
import os
from huggingface_hub import HfApi, login, whoami

def verify_space_auth():
    """Verify authentication is working in the Space using GitHub secrets."""
    print("🔍 Verifying Space Authentication (GitHub Secrets)")

    # Check if HF_TOKEN is set (from GitHub secrets)
    token = os.getenv("HF_TOKEN")
    if not token:
        print("❌ HF_TOKEN not found in Space environment")
        print("   - Please set HF_TOKEN in your GitHub repository secrets")
        print("   - Go to GitHub repository → Settings → Secrets and variables → Actions")
        return False

    try:
        # Test authentication
        login(token=token)

        user_info = whoami()
        username = user_info["name"]

        print("✅ Authentication successful!")
        print(f"   - Username: {username}")
        print(f"   - Token: {token[:8]}...{token[-4:]}")
        print("   - Source: GitHub secrets")

        # Test API access
        api = HfApi()
        print("✅ API access working")

        return True

    except Exception as e:
        print(f"❌ Authentication failed: {e}")
        return False

# Run verification
if __name__ == "__main__":
    verify_space_auth()
```

### Step 4: Update Your Training Script

Modify your training script to authenticate with the `HF_TOKEN` secret before uploading:

```python
import os
from huggingface_hub import HfApi, login, create_repo, whoami
import json

class SpaceTrainingManager:
    """Manages training and upload in Hugging Face Spaces using GitHub secrets."""

    def __init__(self):
        self.api = None
        self.username = None
        self.setup_authentication()

    def setup_authentication(self):
        """Set up authentication for the Space using GitHub secrets."""
        try:
            # Read the token from the Space environment (HF_TOKEN secret from Step 2)
            token = os.getenv("HF_TOKEN")
            if not token:
                raise ValueError("HF_TOKEN not found in Space environment. Please set it in GitHub repository secrets.")

            # Login
            login(token=token)

            # Initialize API
            self.api = HfApi()
            user_info = whoami()
            self.username = user_info["name"]

            print(f"✅ Space authentication successful: {self.username}")
            print("   - Source: GitHub secrets")

        except Exception as e:
            print(f"❌ Authentication failed: {e}")
            raise

    def upload_model(self, model_dir: str, model_size: str = "small", steps: int = 8000):
        """Upload the trained model to Hugging Face Hub."""
        try:
            # Create repository name
            repo_name = f"openllm-{model_size}-extended-{steps//1000}k"
            repo_id = f"{self.username}/{repo_name}"

            print(f"📤 Uploading model to {repo_id}")

            # Create repository (public; exist_ok allows re-running the upload)
            create_repo(
                repo_id=repo_id,
                repo_type="model",
                exist_ok=True,
                private=False
            )

            # Create model configuration
            self.create_model_config(model_dir, model_size)

            # Create model card
            self.create_model_card(model_dir, repo_id, model_size, steps)

            # Upload all files
            self.api.upload_folder(
                folder_path=model_dir,
                repo_id=repo_id,
                repo_type="model",
                commit_message=f"Add OpenLLM {model_size} model ({steps} steps)"
            )

            print("✅ Model uploaded successfully!")
            print(f"   - Repository: https://huggingface.co/{repo_id}")

            return repo_id

        except Exception as e:
            print(f"❌ Upload failed: {e}")
            raise

    def create_model_config(self, model_dir: str, model_size: str):
        """Create Hugging Face compatible configuration."""
        config = {
            "architectures": ["GPTModel"],
            "model_type": "gpt",
            "vocab_size": 32000,
            "n_positions": 2048,
            "n_embd": 768 if model_size == "small" else 1024 if model_size == "medium" else 1280,
            "n_layer": 12 if model_size == "small" else 24 if model_size == "medium" else 32,
            "n_head": 12 if model_size == "small" else 16 if model_size == "medium" else 20,
            "bos_token_id": 1,
            "eos_token_id": 2,
            "pad_token_id": 0,
            "unk_token_id": 3,
            "transformers_version": "4.35.0",
            "use_cache": True
        }

        config_path = os.path.join(model_dir, "config.json")
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)

    def create_model_card(self, model_dir: str, repo_id: str, model_size: str, steps: int):
        """Create model card (README.md)."""
        model_card = f"""# OpenLLM {model_size.capitalize()} Model ({steps} steps)

This is a trained OpenLLM {model_size} model with extended training.

## Model Details

- **Model Type**: GPT-style decoder-only transformer
- **Architecture**: Custom OpenLLM implementation
- **Training Data**: SQuAD dataset (Wikipedia passages)
- **Vocabulary Size**: 32,000 tokens
- **Sequence Length**: 2,048 tokens
- **Model Size**: {model_size.capitalize()}
- **Training Steps**: {steps:,}

## Usage

This model can be used with the OpenLLM framework for text generation and language modeling tasks.

## Training

The model was trained using the OpenLLM training pipeline with:
- SentencePiece tokenization
- Custom GPT architecture
- SQuAD dataset for training
- Extended training for improved performance

## License

This model is released under the GNU General Public License v3.0.

## Repository

This model is hosted on Hugging Face Hub: https://huggingface.co/{repo_id}
"""

        readme_path = os.path.join(model_dir, "README.md")
        with open(readme_path, "w") as f:
            f.write(model_card)

# Usage in your training script
def main():
    # Initialize training manager
    training_manager = SpaceTrainingManager()

    # Your training code here...
    # ... (training logic) ...

    # After training completes, upload the model
    model_dir = "./openllm-trained"  # Your model directory
    repo_id = training_manager.upload_model(model_dir, "small", 8000)

    print("🎉 Training and upload completed!")
    print(f"   - Model available at: https://huggingface.co/{repo_id}")

if __name__ == "__main__":
    main()
```
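
The chained conditional expressions in `create_model_config` silently treat any unrecognized size as the largest model. One possible alternative is a lookup table, sketched below with the dimensions copied from the script above. The names `MODEL_SIZES`, `size_dimensions`, and `repo_name` are hypothetical, and the key `"large"` is an assumption: the script only names `"small"` and `"medium"` and uses the third set of values as the fallback.

```python
from typing import Dict

# Dimensions copied from create_model_config; "large" is an assumed name
# for the fallback branch of the original conditionals.
MODEL_SIZES: Dict[str, Dict[str, int]] = {
    "small": {"n_embd": 768, "n_layer": 12, "n_head": 12},
    "medium": {"n_embd": 1024, "n_layer": 24, "n_head": 16},
    "large": {"n_embd": 1280, "n_layer": 32, "n_head": 20},
}


def size_dimensions(model_size: str) -> Dict[str, int]:
    """Return the architecture dimensions for a size, rejecting unknown names."""
    if model_size not in MODEL_SIZES:
        raise ValueError(
            f"Unknown model size {model_size!r}; expected one of {sorted(MODEL_SIZES)}"
        )
    # Return a copy so callers cannot mutate the shared table.
    return dict(MODEL_SIZES[model_size])


def repo_name(model_size: str, steps: int) -> str:
    """Build the repository name the same way upload_model does."""
    return f"openllm-{model_size}-extended-{steps // 1000}k"
```

Because the name uses integer division, step counts such as 8000 and 8500 map to the same repository name, which is worth keeping in mind when uploading several checkpoints.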

### Step 5: Test the Setup

Run the authentication verification script in your Space to ensure everything is working. The example assumes the `setup_hf_space_auth` module is available in the Space:

```python
# Add this to your Space to test
from setup_hf_space_auth import HuggingFaceSpaceAuthSetup

def test_space_setup():
    """Test the Space authentication setup with GitHub secrets."""
    auth_setup = HuggingFaceSpaceAuthSetup()

    if not auth_setup.setup_space_authentication():
        print("❌ Authentication setup failed")
        return False
    print("✅ Space authentication working")

    # Test repository creation
    repo_ok = auth_setup.test_repository_creation()
    if repo_ok:
        print("✅ Repository creation working")

    # Test model upload
    upload_ok = auth_setup.test_model_upload()
    if upload_ok:
        print("✅ Model upload working")

    # Report success only when every check passed
    if repo_ok and upload_ok:
        print("🎉 All tests passed! Ready for training.")
        return True
    print("❌ Some tests failed; check the Space logs")
    return False

# Run the test
test_space_setup()
```
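
If more checks are added later, the pass/fail bookkeeping can be factored out. The following is a minimal sketch of a generic check runner, not part of the `setup_hf_space_auth` module; the names `run_checks` and `all_passed` are hypothetical. Each check is a zero-argument callable returning a truthy value on success, and an exception counts as a failure.

```python
from typing import Callable, Dict, List, Tuple


def run_checks(checks: List[Tuple[str, Callable[[], bool]]]) -> Dict[str, bool]:
    """Run each named check in order and record whether it passed."""
    results: Dict[str, bool] = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception as exc:
            # A crashing check is reported and counted as a failure.
            print(f"{name}: error: {exc}")
            results[name] = False
    return results


def all_passed(results: Dict[str, bool]) -> bool:
    """True only if at least one check ran and none failed."""
    return bool(results) and all(results.values())
```

With the object from the test above, this could be called as `run_checks([("repository", auth_setup.test_repository_creation), ("upload", auth_setup.test_model_upload)])`.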

## 🔍 Troubleshooting

### Common Issues

1. **"HF_TOKEN not found"**
   - **Solution**: Make sure you've added the `HF_TOKEN` secret in your GitHub repository secrets
   - **Check**: Go to GitHub repository → Settings → Secrets and variables → Actions

2. **"401 Unauthorized"**
   - **Solution**: Verify your token has "Write" permissions
   - **Check**: Go to https://huggingface.co/settings/tokens and ensure the token has "Write" role

3. **"Repository creation failed"**
   - **Solution**: Check if the repository name is unique
   - **Check**: Ensure you have permission to create repositories

4. **"Upload failed"**
   - **Solution**: Check Space logs for detailed error messages
   - **Check**: Verify network connectivity and file permissions

5. **"GitHub secrets not accessible"**
   - **Solution**: Ensure your Space is connected to the GitHub repository
   - **Check**: Verify the Space is created from the GitHub repository
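
The issues above can often be told apart by the HTTP status attached to the error. The sketch below maps a few standard status codes to the hints from this list. It assumes the raised exception carries a `response` object with a `status_code` attribute, as HTTP client errors commonly do; `STATUS_HINTS`, `status_of`, and `hint_for` are hypothetical names, and the hint wording is illustrative rather than an official mapping.

```python
from typing import Dict, Optional

# Standard HTTP status codes mapped to the troubleshooting hints above.
STATUS_HINTS: Dict[int, str] = {
    401: "Unauthorized: the token is missing, expired, or invalid.",
    403: "Forbidden: check that the token has the Write role.",
    404: "Not found: check the repository ID and namespace.",
    409: "Conflict: the repository name is already taken; pass exist_ok=True.",
}


def status_of(exc: BaseException) -> Optional[int]:
    """Extract an HTTP status code from an exception, if it carries a response."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None)


def hint_for(exc: BaseException) -> str:
    """Return a human-readable hint for an upload or repository error."""
    status = status_of(exc)
    if status is None:
        return "No HTTP status available: check network connectivity and the Space logs."
    return STATUS_HINTS.get(status, f"HTTP {status}: check the Space logs for details.")
```

Calling `print(hint_for(e))` inside the `except Exception as e:` blocks of the training manager would add the hint next to the raw error message.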

### Verification Steps

1. **Check Space Environment**:
   ```python
   import os
   print("Space Environment Variables:")
   for var in ["SPACE_ID", "SPACE_HOST", "HF_TOKEN"]:
       value = os.getenv(var)
       print(f"  {var}: {'✅ Set' if value else '❌ Not set'}")
   ```

2. **Test Authentication**:
   ```python
   from huggingface_hub import whoami
   try:
       user_info = whoami()
       print(f"✅ Authenticated as: {user_info['name']}")
   except Exception as e:
       print(f"❌ Authentication failed: {e}")
   ```

3. **Test Repository Creation**:
   ```python
   from huggingface_hub import create_repo, delete_repo
   try:
       repo_id = "lemms/test-repo"
       create_repo(repo_id, repo_type="model", private=True)
       print("✅ Repository creation working")
       delete_repo(repo_id, repo_type="model")
   except Exception as e:
       print(f"❌ Repository creation failed: {e}")
   ```

## 📋 Complete Workflow

1. **Set up GitHub repository secrets** with your `HF_TOKEN`
2. **Verify authentication** using the test script
3. **Run your training** with the updated training manager
4. **Monitor upload progress** in the Space logs
5. **Verify the model** appears on Hugging Face Hub
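
Since the upload is the last step of a long training run, a transient network error there is costly. The following is a minimal sketch of a retry wrapper with exponential backoff that could guard the upload step. The `retry` helper and its parameters are hypothetical and not part of `huggingface_hub`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error unchanged.
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    raise RuntimeError("unreachable")
```

In the training script this might wrap the upload as `retry(lambda: training_manager.upload_model(model_dir, "small", 8000))`. Retrying does not help with authentication errors, so those are better fixed at the token level first.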

## 🎯 Expected Results

After successful setup, you should see:

```
✅ Running in Hugging Face Space environment
✅ HF_TOKEN found: hf_xxxx...xxxx
   - Source: GitHub secrets
✅ Authentication successful!
   - Username: lemms
✅ API access working

🧪 Testing Repository Creation
🔄 Creating test repository: lemms/test-openllm-verification
✅ Repository created successfully
🔄 Cleaning up test repository...
✅ Repository deleted

🎉 All verification tests passed!
   - Authentication: ✅ Working
   - Repository Creation: ✅ Working
   - GitHub Secrets Integration: ✅ Working
   - Ready for training and model uploads!

📤 Uploading model to lemms/openllm-small-extended-8k
✅ Model uploaded successfully!
   - Repository: https://huggingface.co/lemms/openllm-small-extended-8k
```

Your model will then be available at: `https://huggingface.co/lemms/openllm-small-extended-8k`
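
## 🔒 Security Notes

- **Token Security**: The `HF_TOKEN` is stored as an encrypted GitHub repository secret; never commit it to the repository or print it in full in logs
- **Repository Access**: The upload code creates public repositories (`private=False`), so anyone can read them, while only accounts with write access can change them; pass `private=True` to restrict read access
- **Cleanup**: Test repositories are deleted at the end of each test
- **Monitoring**: Check Space logs for any authentication issues
- **GitHub Integration**: GitHub secrets are exposed to GitHub Actions workflows; confirm that `HF_TOKEN` is also present in the Space environment before training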
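
Because Space logs may be visible to other people, it can help to mask the token before printing it. The verification script in Step 3 prints the first eight and last four characters; the sketch below shows a stricter variant that reveals only the last few. The helper name `mask_token` and its `visible` parameter are hypothetical.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe form of a token that shows only its last characters."""
    if not token:
        return "<missing>"
    # Very short values are masked completely so nothing meaningful leaks.
    if len(token) <= visible * 2:
        return "*" * len(token)
    return "*" * 8 + token[-visible:]
```

A log line such as `print(f"Token: {mask_token(token)}")` still confirms which token is in use without exposing enough of it to be useful to anyone else.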

## 🚀 Benefits of GitHub Secrets

1. **Centralized Management**: All secrets are managed in one place
2. **Automatic Access**: Workflows in the repository read secrets at run time, so no token is hard-coded in scripts
3. **Repository Scope**: Secrets are tied to your repository without being committed to it
4. **Security**: GitHub stores secrets encrypted
5. **Easy Updates**: Rotate the token by updating a single secret value

---

**Next Steps**: Once you've set up the GitHub repository secrets, you can re-run your training and the model upload should work correctly!