neural-thinker committed
Commit 1fbb4fe · 1 Parent(s): 3c8ed07

feat(security): establish ML security and governance framework

- Add MIT License for open source compatibility and broader adoption
- Implement comprehensive SECURITY.md with ML-specific security guidelines
- Create .github/SECURITY.md template for GitHub security issue reporting
- Add extensive .gitignore for ML/AI development security best practices

ML Security features:
- Model integrity verification with SHA-256 checksums
- Adversarial robustness testing and bias detection protocols
- Data privacy and anonymization procedures for training datasets
- LGPD compliance for sensitive government data handling
- Secure model serving and deployment guidelines
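The SHA-256 integrity step listed above can be sketched as follows; the helper name `sha256_checksum` and the chunked-read approach are illustrative assumptions, not this repository's actual code:

```python
# Illustrative sketch of SHA-256 model-checksum generation; the helper
# name and chunked-read approach are assumptions, not the repo's code.
import hashlib

def sha256_checksum(path, chunk_size=8192):
    """Compute a file's SHA-256 digest without loading it into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex digest would be stored alongside the model artifact and compared at load time.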

Development security enhancements:
- Protection against accidental commit of model artifacts and datasets
- Security patterns for ML pipelines and training infrastructure
- Comprehensive coverage of ML/AI specific files and directories
- Support for MLOps tools (MLflow, Weights & Biases, DVC)

Files changed (4)
  1. .github/SECURITY.md +39 -0
  2. .gitignore +271 -0
  3. LICENSE +21 -0
  4. SECURITY.md +212 -0
.github/SECURITY.md ADDED
@@ -0,0 +1,39 @@
+ # 🔒 Security Policy
+
+ ## 🚨 Reporting Security Vulnerabilities
+
+ **Do not report security vulnerabilities through public GitHub issues.**
+
+ Instead, please report them by email to: **security@cidadao.ai**
+
+ Please include the following information:
+ - Description of the vulnerability
+ - Affected models or components
+ - Steps to reproduce
+ - Potential impact on model security
+ - Data samples (if safe to share)
+ - Suggested fix (if any)
+
+ ## 📋 Supported Versions
+
+ | Version | Supported |
+ | ------- | ------------------ |
+ | 1.0.x | :white_check_mark: |
+
+ ## 🛡️ ML Security Features
+
+ - Model integrity verification (SHA-256)
+ - Adversarial robustness testing
+ - Data privacy and anonymization
+ - Secure model serving
+ - Bias detection and mitigation
+ - LGPD compliance for training data
+
+ ## 📞 Contact
+
+ - **Security Team**: security@cidadao.ai
+ - **ML Security**: ml-security@cidadao.ai
+ - **Response Time**: Within 48 hours
+ - **Coordinated Disclosure**: We practice responsible disclosure
+
+ For more details, see our full [SECURITY.md](../SECURITY.md) file.
.gitignore ADDED
@@ -0,0 +1,271 @@
+ # Cidadão.AI Models - .gitignore
+ # Machine Learning and MLOps specific gitignore
+
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/#use-with-ide
+ .pdm.toml
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # ML/AI Specific Files
+ # ===================
+
+ # Model artifacts
+ models/
+ *.pkl
+ *.joblib
+ *.h5
+ *.hdf5
+ *.pb
+ *.pth
+ *.pt
+ *.onnx
+ *.tflite
+ *.mlmodel
+ *.coreml
+
+ # Large datasets
+ datasets/
+ data/
+ *.csv
+ *.json
+ *.parquet
+ *.feather
+ *.hdf
+ *.h5
+
+ # Training artifacts
+ logs/
+ runs/
+ experiments/
+ checkpoints/
+ artifacts/
+ outputs/
+
+ # MLflow
+ mlruns/
+ mlflow.db
+ .mlflow/
+
+ # Weights & Biases
+ wandb/
+
+ # TensorBoard
+ tensorboard/
+ tb_logs/
+
+ # DVC (Data Version Control)
+ .dvc/
+ .dvcignore
+
+ # Jupyter notebook outputs
+ *checkpoint.ipynb
+
+ # Large files that shouldn't be in git
+ *.zip
+ *.tar.gz
+ *.rar
+ *.7z
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS generated files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Security
+ .secrets/
+ secrets.yaml
+ secrets.json
+ *.key
+ *.pem
+ *.crt
+ *.p12
+ *.pfx
+
+ # Docker
+ .dockerignore
+ docker-compose.override.yml
+
+ # Temporary files
+ tmp/
+ temp/
+ *.tmp
+ *.temp
+
+ # HuggingFace cache
+ .cache/
+ transformers_cache/
+
+ # Custom model configs that may contain secrets
+ *config.secret.yaml
+ *config.secret.json
+ *config.local.yaml
+ *config.local.json
+
+ # Training data that may be sensitive
+ training_data/
+ raw_data/
+ sensitive_data/
+
+ # Model evaluation reports (may contain sensitive info)
+ evaluation_reports/
+ performance_reports/
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Anderson Henrique da Silva
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
SECURITY.md ADDED
@@ -0,0 +1,212 @@
+ # 🔒 Security Policy - Cidadão.AI Models
+
+ ## 📋 Overview
+
+ This document outlines the security practices and vulnerability disclosure process for the Cidadão.AI Models repository, which contains machine learning models and MLOps infrastructure for government transparency analysis.
+
+ ## ⚠️ Supported Versions
+
+ | Version | Supported |
+ | ------- | ------------------ |
+ | 1.0.x | :white_check_mark: |
+
+ ## 🛡️ Security Features
+
+ ### ML Model Security
+ - **Model Integrity**: SHA-256 checksums for all model artifacts
+ - **Supply Chain Security**: Verified model provenance and lineage
+ - **Input Validation**: Robust validation of all model inputs
+ - **Output Sanitization**: Safe handling of model predictions
+ - **Adversarial Robustness**: Testing against adversarial attacks
+
+ ### Data Security
+ - **Data Privacy**: Personal data anonymization in training datasets
+ - **LGPD Compliance**: Brazilian data protection law compliance
+ - **Secure Storage**: Encrypted storage of sensitive training data
+ - **Access Controls**: Role-based access to model artifacts
+ - **Audit Trails**: Complete logging of model training and deployment
+
+ ### Infrastructure Security
+ - **Container Security**: Secure Docker images with minimal attack surface
+ - **Dependency Scanning**: Regular vulnerability scanning of Python packages
+ - **Secret Management**: Secure handling of API keys and model credentials
+ - **Network Security**: Encrypted communications for all model serving
+ - **Environment Isolation**: Separate environments for training and production
+
+ ## 🚨 Reporting Security Vulnerabilities
+
+ ### How to Report
+ 1. **DO NOT** create a public GitHub issue for security vulnerabilities
+ 2. Send an email to: **security@cidadao.ai** (or andersonhs27@gmail.com)
+ 3. Include detailed information about the vulnerability
+ 4. We will acknowledge receipt within 48 hours
+
+ ### What to Include
+ - Description of the vulnerability
+ - Affected models or components
+ - Steps to reproduce the issue
+ - Potential impact on model performance or security
+ - Data samples (if safe to share)
+ - Suggested remediation (if available)
+ - Your contact information
+
+ ### Response Timeline
+ - **Initial Response**: Within 48 hours
+ - **Investigation**: 1-7 days depending on severity
+ - **Model Retraining**: 1-14 days if required
+ - **Deployment**: 1-3 days after fix verification
+ - **Public Disclosure**: After fix is deployed (coordinated disclosure)
+
+ ## 🛠️ Security Best Practices
+
+ ### Model Development Security
+ ```python
+ # Example: secure model loading with integrity verification
+ import hashlib
+ import pickle
+
+ class SecurityError(Exception):
+     """Raised when a model artifact fails its integrity check."""
+
+ def secure_model_load(model_path, expected_hash):
+     """Safely load a model only after verifying its SHA-256 checksum."""
+     with open(model_path, 'rb') as f:
+         model_data = f.read()
+
+     # Verify model integrity before deserializing; unpickling
+     # untrusted bytes can execute arbitrary code.
+     model_hash = hashlib.sha256(model_data).hexdigest()
+     if model_hash != expected_hash:
+         raise SecurityError("Model integrity check failed")
+
+     return pickle.loads(model_data)
+ ```
+
+ ### Data Handling Security
+ ```python
+ # Example: data anonymization for government records
+ import hashlib
+
+ def anonymize_government_data(records):
+     """Remove or hash personally identifiable information."""
+     anonymized = []
+     for record in records:
+         # Remove CPF, names, addresses; preserve analytical utility
+         cleaned = {k: v for k, v in record.items()
+                    if k not in ("cpf", "name", "address")}
+         # Hash vendor IDs to keep records linkable without exposing identity
+         if "vendor_id" in cleaned:
+             cleaned["vendor_id"] = hashlib.sha256(
+                 str(cleaned["vendor_id"]).encode()).hexdigest()
+         anonymized.append(cleaned)
+     return anonymized
+ ```
+
+ ### Deployment Security
+ ```bash
+ # Security checks before model deployment
+ pip-audit        # Check for vulnerable dependencies
+ bandit -r src/   # Security linting
+ safety check     # Known security vulnerabilities
+ docker scan cidadao-ai-models:latest  # Container vulnerability scan
+ ```
+
+ ## 🔍 Security Testing
102
+
103
+ ### Model Security Testing
104
+ - **Adversarial Testing**: Robustness against adversarial examples
105
+ - **Data Poisoning**: Detection of malicious training data
106
+ - **Model Extraction**: Protection against model stealing attacks
107
+ - **Membership Inference**: Privacy testing for training data
108
+ - **Fairness Testing**: Bias detection across demographic groups
109
+
110
+ ### Infrastructure Testing
111
+ - **Penetration Testing**: Regular security assessments
112
+ - **Dependency Scanning**: Automated vulnerability detection
113
+ - **Container Security**: Image scanning and hardening
114
+ - **API Security**: Authentication and authorization testing
115
+ - **Network Security**: Encryption and secure communications
116
+
117
+ ## 🎯 Model-Specific Security Considerations
118
+
119
+ ### Corruption Detection Models
120
+ - **False Positive Impact**: Careful calibration to minimize false accusations
121
+ - **Bias Prevention**: Regular testing for demographic and regional bias
122
+ - **Transparency**: Explainable AI for all corruption predictions
123
+ - **Audit Trail**: Complete logging of all corruption detections
124
+
125
+ ### Anomaly Detection Models
126
+ - **Threshold Management**: Secure configuration of anomaly thresholds
127
+ - **Feature Security**: Protection of sensitive features from exposure
128
+ - **Model Drift**: Monitoring for performance degradation over time
129
+ - **Validation**: Human expert validation of anomaly predictions
130
+
131
+ ### Natural Language Models
132
+ - **Text Sanitization**: Safe handling of government document text
133
+ - **Information Extraction**: Secure extraction without data leakage
134
+ - **Language Security**: Protection against prompt injection attacks
135
+ - **Content Filtering**: Removal of personally identifiable information
136
+
137
+ ## 📊 Privacy and Ethics
138
+
139
+ ### Data Privacy
140
+ - **Anonymization**: Personal data removed or hashed in all models
141
+ - **Minimal Collection**: Only necessary data used for model training
142
+ - **Retention Limits**: Training data deleted after model deployment
143
+ - **Access Logs**: Complete audit trail of data access
144
+ - **Consent Management**: Respect for data subject rights under LGPD
145
+
146
+ ### Ethical AI
147
+ - **Fairness**: Regular bias testing and mitigation
148
+ - **Transparency**: Explainable predictions for all model outputs
149
+ - **Accountability**: Clear responsibility for model decisions
150
+ - **Human Oversight**: Human review required for high-impact predictions
151
+ - **Social Impact**: Assessment of model impact on society
152
+
153
+ ## 📞 Contact Information
154
+
155
+ ### Security Team
156
+ - **Primary Contact**: security@cidadao.ai
157
+ - **ML Security**: ml-security@cidadao.ai (or andersonhs27@gmail.com)
158
+ - **Data Privacy**: privacy@cidadao.ai (or andersonhs27@gmail.com)
159
+ - **Response SLA**: 48 hours for critical model security issues
160
+
161
+ ### Emergency Contact
162
+ For critical security incidents affecting production models:
163
+ - **Email**: security@cidadao.ai (Priority: CRITICAL)
164
+ - **Subject**: [URGENT ML SECURITY] Brief description
165
+
166
+ ## 🔬 Model Governance
167
+
168
+ ### Model Registry Security
169
+ - **Version Control**: Secure versioning of all model artifacts
170
+ - **Access Control**: Role-based access to model registry
171
+ - **Audit Logging**: Complete history of model updates
172
+ - **Approval Process**: Required approval for production deployments
173
+
174
+ ### Monitoring and Alerting
175
+ - **Performance Monitoring**: Real-time model performance tracking
176
+ - **Security Monitoring**: Detection of anomalous model behavior
177
+ - **Data Drift Detection**: Monitoring for changes in input distributions
178
+ - **Alert System**: Immediate notification of security incidents
179
+
180
+ ## 📚 Security Resources
181
+
182
+ ### ML Security Documentation
183
+ - [OWASP Machine Learning Security Top 10](https://owasp.org/www-project-machine-learning-security-top-10/)
184
+ - [NIST AI Risk Management Framework](https://www.nist.gov/itl/ai-risk-management-framework)
185
+ - [Google ML Security Best Practices](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)
186
+
187
+ ### Security Tools
188
+ - **Model Scanning**: TensorFlow Privacy, PyTorch Security
189
+ - **Data Validation**: TensorFlow Data Validation (TFDV)
190
+ - **Bias Detection**: Fairness Indicators, AI Fairness 360
191
+ - **Adversarial Testing**: Foolbox, CleverHans
192
+
193
+ ## 🔄 Incident Response
194
+
195
+ ### Model Security Incidents
196
+ 1. **Immediate Response**: Isolate affected models from production
197
+ 2. **Assessment**: Evaluate impact and scope of security breach
198
+ 3. **Containment**: Prevent further damage or data exposure
199
+ 4. **Investigation**: Determine root cause and affected systems
200
+ 5. **Recovery**: Retrain or redeploy secure models
201
+ 6. **Post-Incident**: Review and improve security measures
202
+
203
+ ### Communication Plan
204
+ - **Internal**: Immediate notification to security team and stakeholders
205
+ - **External**: Coordinated disclosure to affected users and regulators
206
+ - **Public**: Transparent communication about resolved issues
207
+
208
+ ---
209
+
210
+ **Note**: This security policy is reviewed quarterly and updated as needed. Last updated: January 2025.
211
+
212
+ For questions about this security policy, contact: security@cidadao.ai