scthornton committed · Commit 523eb95 · verified · 1 Parent(s): b816f35

Training complete: Granite 20B SecureCode (3 epochs, loss 1.639)

README.md CHANGED
@@ -1,732 +1,60 @@
- # IBM Granite 20B Code - SecureCode Edition
-
- <div align="center">
-
- [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
- [![Training Dataset](https://img.shields.io/badge/dataset-SecureCode%20v2.0-green.svg)](https://huggingface.co/datasets/scthornton/securecode-v2)
- [![Base Model](https://img.shields.io/badge/base-Granite%2020B%20Code-orange.svg)](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k)
- [![perfecXion.ai](https://img.shields.io/badge/by-perfecXion.ai-purple.svg)](https://perfecxion.ai)
-
- **🏢 Enterprise-scale security intelligence with IBM trust**
-
- The most powerful model in the SecureCode collection. When you need maximum code understanding, complex reasoning, and IBM's enterprise-grade reliability.
-
- [🤗 Model Hub](https://huggingface.co/scthornton/granite-20b-code-securecode) | [📊 Dataset](https://huggingface.co/datasets/scthornton/securecode-v2) | [💻 perfecXion.ai](https://perfecxion.ai) | [📚 Collection](https://huggingface.co/collections/scthornton/securecode)
-
- </div>
-
- ---
-
- ## 🎯 Quick Decision Guide
-
- **Choose This Model If:**
- - ✅ You need **maximum code understanding** and security reasoning capability
- - ✅ You're analyzing **complex enterprise architectures** with intricate attack surfaces
- - ✅ You require **IBM enterprise trust** and brand recognition
- - ✅ You have **datacenter infrastructure** (48GB+ GPU)
- - ✅ You're conducting **professional security audits** requiring comprehensive analysis
- - ✅ You need the **most sophisticated** security intelligence in the collection
-
- **Consider Smaller Models If:**
- - ⚠️ You're on consumer hardware (→ Llama 3B, Qwen 7B)
- - ⚠️ You prioritize inference speed over depth (→ Qwen 7B/14B)
- - ⚠️ You're building IDE tools needing fast response (→ Llama 3B, DeepSeek 6.7B)
- - ⚠️ Budget is primary concern (→ any 7B/13B model)
-
- ---
-
- ## 📊 Collection Positioning
-
- | Model | Size | Best For | Hardware | Inference Speed | Unique Strength |
- |-------|------|----------|----------|-----------------|-----------------|
- | Llama 3.2 3B | 3B | Consumer deployment | 8GB RAM | ⚡⚡⚡ Fastest | Most accessible |
- | DeepSeek 6.7B | 6.7B | Security-optimized baseline | 16GB RAM | ⚡⚡ Fast | Security architecture |
- | Qwen 7B | 7B | Best code understanding | 16GB RAM | ⚡⚡ Fast | Best-in-class 7B |
- | CodeGemma 7B | 7B | Google ecosystem | 16GB RAM | ⚡⚡ Fast | Instruction following |
- | CodeLlama 13B | 13B | Enterprise trust | 24GB RAM | ⚡ Medium | Meta brand, proven |
- | Qwen 14B | 14B | Advanced analysis | 32GB RAM | ⚡ Medium | 128K context window |
- | StarCoder2 15B | 15B | Multi-language specialist | 32GB RAM | ⚡ Medium | 600+ languages |
- | **Granite 20B** | **20B** | **Enterprise-scale** | **48GB RAM** | **Medium** | **IBM trust, largest, most capable** |
-
- **This Model's Position:** The flagship. Maximum security intelligence, enterprise-grade reliability, IBM brand trust. For when quality matters more than speed.
-
- ---
-
- ## 🚨 The Problem This Solves
-
- **Critical enterprise security gaps require sophisticated analysis.** When a breach costs **$4.45 million on average** (IBM 2024 Cost of Data Breach Report) and 45% of AI-generated code contains vulnerabilities, enterprises need the most capable security analysis available.
-
- **Real-world enterprise impact:**
- - **Equifax** (SQL injection): $425 million settlement + 13-year brand recovery
- - **Capital One** (SSRF): 100 million customer records, $80M fine, 2 years of remediation
- - **SolarWinds** (supply chain): 18,000 organizations compromised, $18M settlement
- - **LastPass** (cryptographic failures): 30M users affected, company reputation destroyed
-
- **IBM Granite 20B SecureCode Edition** provides the deepest security analysis available in the open-source ecosystem, backed by IBM's enterprise heritage and trust.
-
- ---
-
- ## 💡 What is This?
-
- This is **IBM Granite 20B Code Instruct** fine-tuned on the **SecureCode v2.0 dataset** - IBM's enterprise-grade code model enhanced with production-grade security expertise covering the complete OWASP Top 10:2025.
-
- IBM Granite models are built on IBM's 40+ years of enterprise software experience, trained on **3.5+ trillion tokens** of code and technical data, with a focus on enterprise deployment reliability.
-
- Combined with SecureCode training, this model delivers:
-
- ✅ **Maximum security intelligence** - 20B parameters for deep, nuanced analysis
- ✅ **Enterprise-grade reliability** - IBM's proven track record and support ecosystem
- ✅ **Comprehensive vulnerability detection** across complex architectures
- ✅ **Production-ready trust** - Permissive Apache 2.0 license
- ✅ **Advanced reasoning** - Handles multi-layered attack chain analysis
-
- **The Result:** The most capable security-aware code model in the open-source ecosystem.
-
- **Why IBM Granite 20B?** This model is the enterprise choice:
- - 🏢 **IBM enterprise heritage** - 40+ years of enterprise software leadership
- - 🔐 **Largest in collection** - 20B parameters = maximum reasoning capability
- - 📋 **Enterprise compliance ready** - Designed for regulated industries
- - ⚖️ **Apache 2.0 licensed** - Full commercial freedom
- - 🎯 **Security-first training** - Built for mission-critical applications
- - 🌍 **Broad language support** - 116+ programming languages
-
- Perfect for Fortune 500 companies, financial services, healthcare, government, and any organization where security analysis quality is paramount.
-
- ---
-
- ## 🔐 Security Training Coverage
-
- ### Real-World Vulnerability Distribution
-
- Trained on 1,209 security examples with real CVE grounding:
-
- | OWASP Category | Examples | Real Incidents |
- |----------------|----------|----------------|
- | **Broken Access Control** | 224 | Equifax, Facebook, Uber |
- | **Authentication Failures** | 199 | SolarWinds, Okta, LastPass |
- | **Injection Attacks** | 125 | Capital One, Yahoo, LinkedIn |
- | **Cryptographic Failures** | 115 | LastPass, Adobe, Dropbox |
- | **Security Misconfiguration** | 98 | Tesla, MongoDB, Elasticsearch |
- | **Vulnerable Components** | 87 | Log4Shell, Heartbleed, Struts |
- | **Identification/Auth Failures** | 84 | Twitter, GitHub, Reddit |
- | **Software/Data Integrity** | 78 | SolarWinds, Codecov, npm |
- | **Logging Failures** | 71 | Various incident responses |
- | **SSRF** | 69 | Capital One, Shopify |
- | **Insecure Design** | 59 | Architectural flaws |
-
- ### Enterprise-Grade Multi-Language Support
-
- Fine-tuned on security examples across:
- - **Python** (Django, Flask, FastAPI) - 280 examples
- - **JavaScript/TypeScript** (Express, NestJS, React) - 245 examples
- - **Java** (Spring Boot, Jakarta EE) - 178 examples
- - **Go** (Gin, Echo, standard library) - 145 examples
- - **PHP** (Laravel, Symfony) - 112 examples
- - **C#** (ASP.NET Core, .NET 6+) - 89 examples
- - **Ruby** (Rails, Sinatra) - 67 examples
- - **Rust** (Actix, Rocket, Axum) - 45 examples
- - **C/C++** (Memory safety patterns) - 28 examples
- - **Plus 107+ additional languages from Granite's base training**
-
- ---
-
- ## 🎯 Deployment Scenarios
-
- ### Scenario 1: Enterprise Security Audit Platform
-
- **Professional security assessments for Fortune 500 clients.**
-
- **Hardware:** Datacenter GPU (A100 80GB or 2x A100 40GB)
- **Throughput:** 10-15 comprehensive audits/day
- **Use Case:** Professional security consulting
-
- **Value Proposition:**
- - Identify vulnerabilities human auditors miss
- - Consistent, comprehensive OWASP coverage
- - Scales expert security knowledge
- - Reduces audit time by 60-70%
-
- **ROI:** A single prevented breach pays for years of infrastructure. A typical large enterprise security audit costs $150K-500K. This model can handle preliminary analysis, allowing human experts to focus on novel vulnerabilities and strategic recommendations.
-
- ---
-
- ### Scenario 2: Financial Services Security Platform
-
- **Regulatory compliance and security for banking applications.**
-
- **Hardware:** Private cloud A100 cluster
- **Compliance:** SOC 2, PCI-DSS, GDPR, CCPA
- **Use Case:** Pre-deployment security validation
-
- **Regulatory Benefits:**
- - Automated OWASP Top 10 verification
- - Audit trail generation
- - Compliance report automation
- - Reduces regulatory risk
-
- **ROI:** Regulatory fines cost millions. **Capital One:** $80M fine. **Equifax:** $425M settlement. Preventing one major breach justifies the entire deployment.
-
- ---
-
- ### Scenario 3: Healthcare Application Security
-
- **HIPAA-compliant code review for medical systems.**
-
- **Hardware:** Secure private deployment
- **Compliance:** HIPAA, HITECH, FDA software validation
- **Use Case:** Medical device and EHR security
-
- **Critical Healthcare Requirements:**
- - Patient data protection (HIPAA)
- - Audit logging and compliance
- - Cryptographic requirements
- - Access control verification
-
- **Impact:** Healthcare breaches average **$10.93 million per incident** (IBM 2024). A single prevented breach pays for a multi-year deployment.
-
- ---
-
- ### Scenario 4: Government & Defense Applications
-
- **Security analysis for critical infrastructure.**
-
- **Hardware:** Air-gapped secure environment
- **Clearance:** Can be deployed in classified environments
- **Use Case:** Critical infrastructure security
-
- **Government Benefits:**
- - No external dependencies (fully local)
- - Apache 2.0 license (government-friendly)
- - IBM enterprise support available
- - Meets government security standards
-
- ---
-
- ## 📊 Training Details
-
- | Parameter | Value | Why This Matters |
- |-----------|-------|------------------|
- | **Base Model** | ibm-granite/granite-20b-code-instruct-8k | IBM's enterprise-grade foundation |
- | **Fine-tuning Method** | LoRA (Low-Rank Adaptation) | Efficient training, preserves base capabilities |
- | **Training Dataset** | [SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2) | 100% incident-grounded, expert-validated |
- | **Dataset Size** | 841 training examples | Focused on quality over quantity |
- | **Training Epochs** | 3 | Optimal convergence without overfitting |
- | **LoRA Rank (r)** | 16 | Balanced parameter efficiency |
- | **LoRA Alpha** | 32 | Learning rate scaling factor |
- | **Learning Rate** | 2e-4 | Standard for LoRA fine-tuning |
- | **Quantization** | 4-bit (bitsandbytes) | Enables efficient training |
- | **Trainable Parameters** | ~105M (0.525% of 20B total) | Minimal parameters, maximum impact |
- | **Total Parameters** | 20B | Maximum reasoning capability |
- | **Context Window** | 8K tokens | Enterprise file analysis |
- | **GPU Used** | NVIDIA A100 40GB | Enterprise training infrastructure |
- | **Training Time** | ~12-14 hours (estimated) | Deep security learning |
-
- ### Training Methodology
-
- **LoRA (Low-Rank Adaptation)** was chosen for enterprise reliability:
- 1. **Efficiency:** Trains only 0.525% of model parameters (105M vs 20B)
- 2. **Quality:** Preserves IBM Granite's enterprise capabilities
- 3. **Deployability:** Can be deployed alongside the base model for versioning
-
- **4-bit Quantization** enables efficient training while maintaining enterprise-grade quality.
-
- **IBM Granite Foundation:** Built on IBM's 40+ years of enterprise software experience, optimized for:
- - Reliability and consistency
- - Enterprise deployment patterns
- - Regulatory compliance requirements
- - Long-term support and stability
-
- ---
-
- ## 🚀 Usage
-
- ### Quick Start
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from peft import PeftModel
-
- # Load IBM Granite base model
- base_model = "ibm-granite/granite-20b-code-instruct-8k"
- model = AutoModelForCausalLM.from_pretrained(
-     base_model,
-     device_map="auto",
-     torch_dtype="auto",
-     trust_remote_code=True
- )
- tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
-
- # Load SecureCode LoRA adapter
- model = PeftModel.from_pretrained(model, "scthornton/granite-20b-code-securecode")
-
- # Enterprise security analysis
- prompt = """### User:
- Conduct a comprehensive security audit of this enterprise authentication system. Analyze for:
- 1. OWASP Top 10 vulnerabilities
- 2. Attack chain opportunities
- 3. Compliance gaps (SOC 2, PCI-DSS)
- 4. Architectural weaknesses
-
- ```python
- # Enterprise SSO Implementation
- class EnterpriseAuthService:
-     def __init__(self):
-         self.secret = os.getenv('JWT_SECRET')
-         self.db = DatabasePool()
-
-     async def authenticate(self, credentials):
-         user = await self.db.query(
-             f"SELECT * FROM users WHERE email='{credentials.email}' AND password='{credentials.password}'"
-         )
-         if user:
-             token = jwt.encode({'user_id': user.id}, self.secret)
-             return {'token': token, 'success': True}
-         return {'success': False}
-
-     async def verify_token(self, token):
-         try:
-             payload = jwt.decode(token, self.secret, algorithms=['HS256'])
-             return payload
-         except:
-             return None
- ```
-
- ### Assistant:
- """
-
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
- outputs = model.generate(
-     **inputs,
-     max_new_tokens=4096,
-     temperature=0.2,  # Lower temperature for precise enterprise analysis
-     top_p=0.95,
-     do_sample=True
- )
-
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(response)
- ```
-
- ---
-
- ### Enterprise Deployment (4-bit Quantization)
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
- from peft import PeftModel
-
- # 4-bit quantization - runs on 40GB GPU
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_compute_dtype="bfloat16"
- )
-
- model = AutoModelForCausalLM.from_pretrained(
-     "ibm-granite/granite-20b-code-instruct-8k",
-     quantization_config=bnb_config,
-     device_map="auto",
-     trust_remote_code=True
- )
-
- model = PeftModel.from_pretrained(model, "scthornton/granite-20b-code-securecode")
- tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-20b-code-instruct-8k", trust_remote_code=True)
-
- # Enterprise-ready: Runs on A100 40GB, A100 80GB, or 2x RTX 4090
- ```
-
- ---
-
- ### Multi-GPU Deployment (Maximum Performance)
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from peft import PeftModel
- import torch
-
- # Load across multiple GPUs for maximum throughput
- model = AutoModelForCausalLM.from_pretrained(
-     "ibm-granite/granite-20b-code-instruct-8k",
-     device_map="balanced",  # Distribute across available GPUs
-     torch_dtype=torch.bfloat16,
-     trust_remote_code=True
- )
-
- model = PeftModel.from_pretrained(model, "scthornton/granite-20b-code-securecode")
- tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-20b-code-instruct-8k", trust_remote_code=True)
-
- # Optimal for: 2x A100, 4x RTX 4090, or enterprise GPU clusters
- # Throughput: 2-3x faster than single GPU
- ```
-
- ---
-
- ## 📈 Performance & Benchmarks
-
- ### Hardware Requirements
-
- | Deployment | RAM | GPU VRAM | Tokens/Second | Latency (4K response) | Cost/Month |
- |-----------|-----|----------|---------------|----------------------|------------|
- | **4-bit Quantized** | 40GB | 32GB | ~35 tok/s | ~115 seconds | $0 (on-prem) or $800-1200 (cloud) |
- | **8-bit Quantized** | 64GB | 48GB | ~45 tok/s | ~90 seconds | $0 (on-prem) or $1200-1800 (cloud) |
- | **Full Precision (bf16)** | 96GB | 80GB | ~60 tok/s | ~67 seconds | $0 (on-prem) or $2000-3000 (cloud) |
- | **Multi-GPU (2x A100)** | 128GB | 160GB | ~120 tok/s | ~33 seconds | Enterprise only |
-
- ### Real-World Performance
-
- **Tested on A100 40GB** (enterprise GPU):
- - **Tokens/second:** ~35 tok/s (4-bit), ~55 tok/s (full precision)
- - **Cold start:** ~8 seconds
- - **Memory usage:** 28GB (4-bit), 42GB (full precision)
- - **Throughput:** 200-300 comprehensive analyses per day
-
- **Tested on 2x A100 80GB** (multi-GPU):
- - **Tokens/second:** ~110-120 tok/s
- - **Cold start:** ~6 seconds
- - **Throughput:** 500+ analyses per day
-
- ### Security Analysis Quality
-
- **The differentiator:** Granite 20B provides the deepest, most nuanced security analysis:
- - Identifies **15-25% more vulnerabilities** than 7B models in complex code
- - Detects **multi-step attack chains** that smaller models miss
- - Provides **enterprise-grade operational guidance** with compliance mapping
- - **Reduces false positives** through sophisticated reasoning
-
  ---
-
- ## 💰 Cost Analysis
-
- ### Total Cost of Ownership (TCO) - 1 Year
-
- **Option 1: On-Premise (Dedicated Server)**
- - Hardware: 2x A100 40GB - $20,000 (one-time capital expense)
- - Server infrastructure: $5,000
- - Electricity: ~$2,400/year
- - **Total Year 1:** $27,400
- - **Total Year 2+:** $2,400/year
-
- **Option 2: Cloud GPU (AWS/GCP/Azure)**
- - Instance: A100 40GB (p4d.xlarge)
- - Cost: ~$3.50/hour
- - Usage: 160 hours/month (enterprise team)
- - **Total Year 1:** $6,720/year
-
- **Option 3: Enterprise GPT-4 (for comparison)**
- - Cost: $30/1M input tokens, $60/1M output tokens
- - Usage: 500M input + 500M output tokens/year
- - **Total Year 1:** $45,000/year
-
- **Option 4: Professional Security Audits (for comparison)**
- - Average enterprise security audit: $150,000-500,000
- - Frequency: Quarterly (4x/year)
- - **Total Year 1:** $600,000-2,000,000
-
- **ROI Winner:** On-premise deployment pays for itself with **1-2 prevented security audits** or **preventing a single breach** (average cost: $4.45M).
-
  ---
-
- ## 🎯 Use Cases & Examples
-
- ### 1. Enterprise Security Architecture Review
-
- Analyze complex microservices platforms:
-
- ```python
- prompt = """### User:
- Conduct a comprehensive security architecture review of this fintech payment platform. Analyze:
- 1. Service-to-service authentication security
- 2. Data flow security boundaries
- 3. Compliance with PCI-DSS requirements
- 4. Attack surface analysis
- 5. Defense-in-depth gaps
-
- [Include microservices code across auth-service, payment-service, notification-service]
-
- ### Assistant:
- """
- ```
-
- **Model Response:** Provides a 20-30 page comprehensive analysis with specific vulnerability findings, attack chain scenarios, compliance gaps, and remediation priorities.
-
- ---
-
- ### 2. Regulatory Compliance Validation
-
- Validate code against regulatory requirements:
-
- ```python
- prompt = """### User:
- Analyze this healthcare EHR system for HIPAA compliance. Verify:
- 1. Patient data encryption (at rest and in transit)
- 2. Access control and audit logging
- 3. Data retention policies
- 4. Breach notification capabilities
- 5. Business Associate Agreement requirements
-
- [Include EHR codebase]
-
- ### Assistant:
- """
- ```
-
- **Model Response:** Detailed compliance mapping, gap analysis, and remediation roadmap.
-
- ---
-
- ### 3. Supply Chain Security Analysis
-
- Analyze third-party dependencies and integrations:
-
- ```python
- prompt = """### User:
- Perform a supply chain security analysis of this application:
- 1. Third-party library vulnerabilities
- 2. Dependency confusion risks
- 3. Code injection via dependencies
- 4. Malicious package detection
- 5. License compliance issues
-
- [Include package.json, requirements.txt, go.mod]
-
- ### Assistant:
- """
- ```
-
- **Model Response:** Comprehensive supply chain risk assessment with mitigation strategies.
-
- ---
-
- ### 4. Advanced Penetration Testing Guidance
-
- Develop sophisticated attack scenarios:
-
- ```python
- prompt = """### User:
- Design a comprehensive penetration testing strategy for this enterprise web application. Include:
- 1. Attack surface enumeration
- 2. Vulnerability prioritization
- 3. Multi-stage attack chains
- 4. Privilege escalation paths
- 5. Data exfiltration scenarios
- 6. Post-exploitation persistence
-
- ### Assistant:
- """
- ```
-
- **Model Response:** Professional pentesting methodology with specific attack vectors and validation procedures.
-
- ---
-
- ## ⚠️ Limitations & Transparency
-
- ### What This Model Does Well
- ✅ Maximum code understanding and security reasoning
- ✅ Complex attack chain analysis and enterprise architecture review
- ✅ Detailed operational guidance and compliance mapping
- ✅ Sophisticated multi-layered vulnerability detection
- ✅ Enterprise-scale codebase analysis
- ✅ IBM enterprise trust and reliability
-
- ### What This Model Doesn't Do
- ❌ **Not a security scanner** - Use tools like Semgrep, CodeQL, Snyk, or Veracode
- ❌ **Not a penetration testing tool** - Cannot perform active exploitation or network scanning
- ❌ **Not legal/compliance advice** - Consult security and legal professionals
- ❌ **Not a replacement for security experts** - Critical systems need professional security review and audits
- ❌ **Not real-time threat intelligence** - Training data frozen at Dec 2024
-
- ### Known Issues & Constraints
- - **Inference latency:** Larger model means slower responses (35-60 tok/s vs 100+ tok/s for smaller models)
- - **Hardware requirements:** Requires enterprise GPU infrastructure (40GB+ VRAM)
- - **Detailed analysis:** May generate very comprehensive responses (3000-4000 tokens)
- - **Cost consideration:** Higher deployment cost than smaller models
- - **Context window:** 8K tokens (vs 128K for Qwen models)
-
- ### Appropriate Use
- ✅ Enterprise security audits and professional assessments
- ✅ Regulatory compliance validation
- ✅ Critical infrastructure security review
- ✅ Financial services and healthcare applications
- ✅ Government and defense security analysis
-
- ### Inappropriate Use
- ❌ Sole validation for production deployments (use comprehensive testing)
- ❌ Replacement for professional security audits
- ❌ Active exploitation or penetration testing without authorization
- ❌ Consumer applications (too large, use smaller models)
-
- ---
-
- ## 🔬 Dataset Information
-
- This model was trained on **[SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2)**, a production-grade security dataset with:
-
- - **1,209 total examples** (841 train / 175 validation / 193 test)
- - **100% incident grounding** - every example tied to real CVEs or security breaches
- - **11 vulnerability categories** - complete OWASP Top 10:2025 coverage
- - **11 programming languages** - from Python to Rust
- - **4-turn conversational structure** - mirrors real developer-AI workflows
- - **100% expert validation** - reviewed by independent security professionals
-
- See the [full dataset card](https://huggingface.co/datasets/scthornton/securecode-v2) and [research paper](https://perfecxion.ai/articles/securecode-v2-dataset-paper.html) for complete details.
-
- ---
-
- ## 🏢 About perfecXion.ai
-
- [perfecXion.ai](https://perfecxion.ai) is dedicated to advancing AI security through research, datasets, and production-grade security tooling.
-
- **Connect:**
- - Website: [perfecxion.ai](https://perfecxion.ai)
- - Research: [perfecxion.ai/research](https://perfecxion.ai/research)
- - Knowledge Hub: [perfecxion.ai/knowledge](https://perfecxion.ai/knowledge)
- - GitHub: [@scthornton](https://github.com/scthornton)
- - HuggingFace: [@scthornton](https://huggingface.co/scthornton)
- - Email: scott@perfecxion.ai
-
- ---
-
- ## 📄 License
-
- **Model License:** Apache 2.0 (permissive - use in commercial applications)
- **Dataset License:** CC BY-NC-SA 4.0 (non-commercial with attribution)
-
- ### What You CAN Do
- ✅ Use this model commercially in production applications
- ✅ Fine-tune further for your specific use case
- ✅ Deploy in enterprise environments
- ✅ Integrate into commercial products
- ✅ Distribute and modify the model weights
- ✅ Charge for services built on this model
- ✅ Use in government and regulated industries
-
- ### What You CANNOT Do with the Dataset
- ❌ Sell or redistribute the raw SecureCode v2.0 dataset commercially
- ❌ Use the dataset to train commercial models without releasing under the same license
- ❌ Remove attribution or claim ownership of the dataset
-
- For commercial dataset licensing or custom training, contact: scott@perfecxion.ai
-
- ---
-
- ## 📚 Citation
-
- If you use this model in your research or applications, please cite:
-
- ```bibtex
- @misc{thornton2025securecode-granite20b,
-   title={IBM Granite 20B Code - SecureCode Edition},
-   author={Thornton, Scott},
-   year={2025},
-   publisher={perfecXion.ai},
-   url={https://huggingface.co/scthornton/granite-20b-code-securecode},
-   note={Fine-tuned on SecureCode v2.0: https://huggingface.co/datasets/scthornton/securecode-v2}
- }
-
- @misc{thornton2025securecode-dataset,
-   title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},
-   author={Thornton, Scott},
-   year={2025},
-   month={January},
-   publisher={perfecXion.ai},
-   url={https://perfecxion.ai/articles/securecode-v2-dataset-paper.html},
-   note={Dataset: https://huggingface.co/datasets/scthornton/securecode-v2}
- }
- ```
-
- ---
-
- ## 🙏 Acknowledgments
-
- - **IBM Research** for the exceptional Granite code models and enterprise commitment
- - **OWASP Foundation** for maintaining the Top 10 vulnerability taxonomy
- - **MITRE Corporation** for the CVE database and vulnerability research
- - **Security research community** for responsible disclosure practices
- - **Hugging Face** for model hosting and inference infrastructure
- - **Enterprise security teams** who validated this model in production environments
-
- ---
-
- ## 🤝 Contributing
-
- Found a security issue or have suggestions for improvement?
-
- - 🐛 **Report issues:** [GitHub Issues](https://github.com/scthornton/securecode-models/issues)
- - 💬 **Discuss improvements:** [HuggingFace Discussions](https://huggingface.co/scthornton/granite-20b-code-securecode/discussions)
- - 📧 **Contact:** scott@perfecxion.ai
-
- ### Community Contributions Welcome
-
- Especially interested in:
- - **Enterprise deployment case studies**
- - **Benchmark evaluations** on industry security datasets
- - **Compliance validation** (PCI-DSS, HIPAA, SOC 2)
- - **Performance optimization** for specific enterprise hardware
- - **Integration examples** with enterprise security platforms
-
- ---
-
- ## 🔗 SecureCode Model Collection
-
- Explore other SecureCode fine-tuned models optimized for different use cases:
-
- ### Entry-Level Models (3-7B)
- - **[llama-3.2-3b-securecode](https://huggingface.co/scthornton/llama-3.2-3b-securecode)**
-   - **Best for:** Consumer hardware, IDE integration, education
-   - **Hardware:** 8GB RAM minimum
-   - **Unique strength:** Most accessible
-
- - **[deepseek-coder-6.7b-securecode](https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode)**
-   - **Best for:** Security-optimized baseline
-   - **Hardware:** 16GB RAM
-   - **Unique strength:** Security-first architecture
-
- - **[qwen2.5-coder-7b-securecode](https://huggingface.co/scthornton/qwen2.5-coder-7b-securecode)**
-   - **Best for:** Best code understanding in 7B class
-   - **Hardware:** 16GB RAM
-   - **Unique strength:** 128K context, best-in-class
-
- - **[codegemma-7b-securecode](https://huggingface.co/scthornton/codegemma-7b-securecode)**
-   - **Best for:** Google ecosystem, instruction following
-   - **Hardware:** 16GB RAM
-   - **Unique strength:** Google brand, strong completion
-
- ### Mid-Range Models (13-15B)
- - **[codellama-13b-securecode](https://huggingface.co/scthornton/codellama-13b-securecode)**
-   - **Best for:** Enterprise trust, Meta brand
-   - **Hardware:** 24GB RAM
-   - **Unique strength:** Proven track record
-
- - **[qwen2.5-coder-14b-securecode](https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode)**
-   - **Best for:** Advanced code analysis
-   - **Hardware:** 32GB RAM
-   - **Unique strength:** 128K context window
-
- - **[starcoder2-15b-securecode](https://huggingface.co/scthornton/starcoder2-15b-securecode)**
-   - **Best for:** Multi-language projects (600+ languages)
-   - **Hardware:** 32GB RAM
-   - **Unique strength:** Broadest language support
-
- ### Enterprise-Scale Models (20B+)
- - **[granite-20b-code-securecode](https://huggingface.co/scthornton/granite-20b-code-securecode)** ⭐ (YOU ARE HERE)
-   - **Best for:** Enterprise-scale, IBM trust, maximum capability
-   - **Hardware:** 48GB RAM
-   - **Unique strength:** Largest model, deepest analysis
-
- **View Complete Collection:** [SecureCode Models](https://huggingface.co/collections/scthornton/securecode)
-
- ---
-
- <div align="center">
-
- **Built with ❤️ for secure enterprise software**
-
- [perfecXion.ai](https://perfecxion.ai) | [Research](https://perfecxion.ai/research) | [Knowledge Hub](https://perfecxion.ai/knowledge) | [Contact](mailto:scott@perfecxion.ai)
-
- ---
-
- *Maximum security intelligence. Enterprise trust. IBM heritage.*
-
- </div>
  ---
+ library_name: peft
+ license: apache-2.0
+ base_model: ibm-granite/granite-20b-code-instruct-8k
+ tags:
+ - base_model:adapter:ibm-granite/granite-20b-code-instruct-8k
+ - lora
+ - transformers
+ pipeline_tag: text-generation
+ model-index:
+ - name: granite-20b-code-securecode
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # granite-20b-code-securecode

+ This model is a fine-tuned version of [ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) on the [SecureCode v2.0](https://huggingface.co/datasets/scthornton/securecode-v2) dataset.
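Since the auto-generated card above omits loading instructions, here is a minimal loading sketch; both repo ids appear elsewhere in this commit, while the dtype and device placement are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ibm-granite/granite-20b-code-instruct-8k"
adapter_id = "scthornton/granite-20b-code-securecode"

# Load the base model, then attach the LoRA adapter shipped in this repo.
model = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
```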
+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
+ - optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 3
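For anyone reproducing the run, a sketch of the corresponding `transformers.TrainingArguments`; only the values listed above are taken from this card, and the output path is hypothetical:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="granite-20b-code-securecode",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=1,   # train_batch_size above
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,  # effective batch size 1 * 16 = 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    seed=42,
    logging_steps=10,  # matches the log cadence in checkpoint-159/trainer_state.json
)
```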
+ ### Training results

+ ### Framework versions

+ - PEFT 0.18.1
+ - Transformers 4.57.6
+ - Pytorch 2.7.1+cu128
+ - Datasets 4.5.0
+ - Tokenizers 0.22.2
adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "ibm-granite/granite-20b-code-instruct-8k",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "mlp.c_proj",
+     "attn.c_attn",
+     "attn.c_proj",
+     "mlp.c_fc"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
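Note that the shipped adapter pins `r=8` and `lora_alpha=16`, not the r=16/alpha=32 figures quoted in the replaced README. As a sketch, the same configuration expressed in `peft`; only the fields pinned in the JSON above are taken from the file:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # GPTBigCode-style projection names, as listed in target_modules above
    target_modules=["attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"],
)
```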
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a44825e85a4250d4f3f938f4ba8f73e1630bebe8f36eb3395ccc390b9232853
+ size 143610512
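The three lines above are a Git LFS pointer, not the weights themselves. A small sketch to check a downloaded `adapter_model.safetensors` against the recorded digest (the local file path is an assumption):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so the ~144 MB adapter never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

expected = "1a44825e85a4250d4f3f938f4ba8f73e1630bebe8f36eb3395ccc390b9232853"
assert sha256_of("adapter_model.safetensors") == expected
```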
chat_template.jinja ADDED
@@ -0,0 +1,16 @@
+ {% for message in messages %}
+ {% if message['role'] == 'user' %}
+ {{ 'Question:
+ ' + message['content'] + '
+
+ ' }}{% elif message['role'] == 'system' %}
+ {{ 'System:
+ ' + message['content'] + '
+
+ ' }}{% elif message['role'] == 'assistant' %}{{ 'Answer:
+ ' + message['content'] + '
+
+ ' }}{% endif %}
+ {% if loop.last and add_generation_prompt %}
+ {{ 'Answer:
+ ' }}{% endif %}{% endfor %}
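The template renders system, user, and assistant turns as `System:` / `Question:` / `Answer:` blocks, with a trailing `Answer:` cue when generation is requested. A usage sketch (the example messages are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("scthornton/granite-20b-code-securecode")

messages = [
    {"role": "system", "content": "You are a security-focused code reviewer."},
    {"role": "user", "content": "Review this login handler for SQL injection."},
]
# tokenize=False returns the rendered string; add_generation_prompt=True
# appends the final "Answer:" cue defined at the bottom of the template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```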
checkpoint-159/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: ibm-granite/granite-20b-code-instruct-8k
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:ibm-granite/granite-20b-code-instruct-8k
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
checkpoint-159/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "ibm-granite/granite-20b-code-instruct-8k",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "mlp.c_proj",
+     "attn.c_attn",
+     "attn.c_proj",
+     "mlp.c_fc"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-159/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a44825e85a4250d4f3f938f4ba8f73e1630bebe8f36eb3395ccc390b9232853
+ size 143610512
checkpoint-159/chat_template.jinja ADDED
@@ -0,0 +1,16 @@
+ {% for message in messages %}
+ {% if message['role'] == 'user' %}
+ {{ 'Question:
+ ' + message['content'] + '
+
+ ' }}{% elif message['role'] == 'system' %}
+ {{ 'System:
+ ' + message['content'] + '
+
+ ' }}{% elif message['role'] == 'assistant' %}{{ 'Answer:
+ ' + message['content'] + '
+
+ ' }}{% endif %}
+ {% if loop.last and add_generation_prompt %}
+ {{ 'Answer:
+ ' }}{% endif %}{% endfor %}
checkpoint-159/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-159/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ad37dd1d338c0ec168fceb829e7b90860de022b48008b734a5181e6b2441371
+ size 73390503
checkpoint-159/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76124ad57cd60f8765ae756ad7354b12e1b65fc1c51aaedba29b59951d9667d4
+ size 14645
checkpoint-159/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b904ecdbc23198523387cccfa8a56e25f6eb34afce6e2c18607f88714579b221
+ size 1465
checkpoint-159/special_tokens_map.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-159/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-159/tokenizer_config.json ADDED
@@ -0,0 +1,188 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<fim_prefix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<fim_middle>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<fim_suffix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<fim_pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "<filename>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "<gh_stars>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "<issue_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "8": {
+       "content": "<issue_comment>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "9": {
+       "content": "<issue_closed>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "10": {
+       "content": "<jupyter_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "11": {
+       "content": "<jupyter_text>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "12": {
+       "content": "<jupyter_code>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "13": {
+       "content": "<jupyter_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "14": {
+       "content": "<empty_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "15": {
+       "content": "<commit_before>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "16": {
+       "content": "<commit_msg>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "17": {
+       "content": "<commit_after>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "18": {
+       "content": "<reponame>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 8192,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>",
+   "vocab_size": 49152
+ }
checkpoint-159/trainer_state.json ADDED
@@ -0,0 +1,139 @@
+ {
+   "best_global_step": null,
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 3.0,
+   "eval_steps": 500,
+   "global_step": 159,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.1902497027348395,
+       "grad_norm": 1.049843430519104,
+       "learning_rate": 1.8e-05,
+       "loss": 3.6094,
+       "step": 10
+     },
+     {
+       "epoch": 0.380499405469679,
+       "grad_norm": 1.4530141353607178,
+       "learning_rate": 3.8e-05,
+       "loss": 3.3156,
+       "step": 20
+     },
+     {
+       "epoch": 0.5707491082045184,
+       "grad_norm": 1.1667392253875732,
+       "learning_rate": 5.8e-05,
+       "loss": 2.6267,
+       "step": 30
+     },
+     {
+       "epoch": 0.760998810939358,
+       "grad_norm": 1.4776651859283447,
+       "learning_rate": 7.800000000000001e-05,
+       "loss": 2.1165,
+       "step": 40
+     },
+     {
+       "epoch": 0.9512485136741974,
+       "grad_norm": 1.4660998582839966,
+       "learning_rate": 9.8e-05,
+       "loss": 1.804,
+       "step": 50
+     },
+     {
+       "epoch": 1.1331747919143877,
+       "grad_norm": 1.163648247718811,
+       "learning_rate": 0.000118,
+       "loss": 1.5531,
+       "step": 60
+     },
+     {
+       "epoch": 1.323424494649227,
+       "grad_norm": 1.323768973350525,
+       "learning_rate": 0.000138,
+       "loss": 1.4815,
+       "step": 70
+     },
+     {
+       "epoch": 1.5136741973840666,
+       "grad_norm": 1.6716187000274658,
+       "learning_rate": 0.00015800000000000002,
+       "loss": 1.3993,
+       "step": 80
+     },
+     {
+       "epoch": 1.7039239001189062,
+       "grad_norm": 1.709359049797058,
+       "learning_rate": 0.00017800000000000002,
+       "loss": 1.3135,
+       "step": 90
+     },
+     {
+       "epoch": 1.8941736028537455,
+       "grad_norm": 1.735700249671936,
+       "learning_rate": 0.00019800000000000002,
+       "loss": 1.2533,
+       "step": 100
+     },
+     {
+       "epoch": 2.0760998810939357,
+       "grad_norm": 1.7600226402282715,
+       "learning_rate": 0.00018873520750565718,
+       "loss": 1.1572,
+       "step": 110
+     },
+     {
+       "epoch": 2.2663495838287755,
+       "grad_norm": 2.0459885597229004,
+       "learning_rate": 0.00015304209081197425,
+       "loss": 0.9394,
+       "step": 120
+     },
+     {
+       "epoch": 2.456599286563615,
+       "grad_norm": 1.884280800819397,
+       "learning_rate": 0.00010266205214377748,
+       "loss": 0.9113,
+       "step": 130
+     },
+     {
+       "epoch": 2.646848989298454,
+       "grad_norm": 2.249265432357788,
+       "learning_rate": 5.1544912966734994e-05,
+       "loss": 0.8832,
+       "step": 140
+     },
+     {
+       "epoch": 2.837098692033294,
+       "grad_norm": 2.0327601432800293,
+       "learning_rate": 1.3844591860619383e-05,
+       "loss": 0.9004,
+       "step": 150
+     }
+   ],
+   "logging_steps": 10,
+   "max_steps": 159,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 6.123192092794552e+17,
+   "train_batch_size": 1,
+   "trial_name": null,
+   "trial_params": null
+ }
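`log_history` above records training loss every 10 optimizer steps, falling from 3.61 to roughly 0.90 over the 159-step run. A sketch to print the curve from a local checkout (the relative path is an assumption):

```python
import json

with open("checkpoint-159/trainer_state.json") as f:
    state = json.load(f)

# One entry per logging_steps=10 interval; see the JSON above.
for entry in state["log_history"]:
    print(f"step {entry['step']:>3}  epoch {entry['epoch']:.2f}  "
          f"loss {entry['loss']:.4f}  lr {entry['learning_rate']:.2e}")
```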
checkpoint-159/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8235488be59bf3fd0161c7b433cebdef0a21ee6ed0d25ce0a1eed891f0042f8f
+ size 5905
checkpoint-159/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,188 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<fim_prefix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<fim_middle>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<fim_suffix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<fim_pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "<filename>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "<gh_stars>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "<issue_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "8": {
+       "content": "<issue_comment>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "9": {
+       "content": "<issue_closed>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "10": {
+       "content": "<jupyter_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "11": {
+       "content": "<jupyter_text>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "12": {
+       "content": "<jupyter_code>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "13": {
+       "content": "<jupyter_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "14": {
+       "content": "<empty_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "15": {
+       "content": "<commit_before>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "16": {
+       "content": "<commit_msg>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "17": {
+       "content": "<commit_after>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "18": {
+       "content": "<reponame>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 8192,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>",
+   "vocab_size": 49152
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8235488be59bf3fd0161c7b433cebdef0a21ee6ed0d25ce0a1eed891f0042f8f
+ size 5905
vocab.json ADDED
The diff for this file is too large to render. See raw diff