# SpadaLab

**French legal datasets, curated for AI builders.**

SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools, with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.

---
## Why SpadaLab

French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, RGPD, accessibility, food safety, trade regulations…), most of it scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.

We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into the vector store of its choice (Qdrant, Weaviate, pgvector, etc.), with **no embedding lock-in**.
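To illustrate what "chunking" means in practice, here is a generic sketch (not SpadaLab's actual pipeline) that greedily packs blank-line-separated paragraphs of a legal text into chunks of at most `max_chars` characters, cutting any oversized paragraph into fixed-size pieces. The function name `chunk_paragraphs` and the 1,000-character default are hypothetical choices.

```python
def chunk_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_chars.

    Hypothetical helper for illustration only; not SpadaLab's production chunker.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A paragraph longer than the limit is cut into fixed-size pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # The piece still fits: extend the current chunk.
                current = candidate
            else:
                # Close the current chunk and start a new one with this piece.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps them intact where possible; real pipelines may instead chunk per article or by token count.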
## What we ship

Four commercial packs (launching May 2026):

| Pack | Coverage | Format |
|---|---|---|
| **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code de commerce, Code de la consommation, Code de la sécurité sociale, CNIL, RGPD (GDPR) | JSON / Parquet |
| **Artisanat Reglemente** | Code de l'artisanat + related LODA texts | JSON / Parquet |
| **HACCP & Hygiene Alimentaire** | EU Regulations 178/2002, 852/2004 and 853/2004 + related LODA texts | JSON / Parquet |
| **Accessibilite PMR** | LODA accessibility texts covering ERP (establishments open to the public) and buildings | JSON / Parquet |

**Format**: pre-chunked text plus metadata, documented with Gebru-style datasheets (following *Datasheets for Datasets*). **No embeddings are included by default**: you bring your own model, which avoids lock-in to OpenAI or any single embedding provider.

**Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.
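Because no embeddings ship with the data, the embedding step stays on your side. The sketch below assumes a JSON Lines export with hypothetical `id`, `text` and `metadata` fields (the published schema may differ) and takes the embedding model as a plain callable, so any hosted or local model can be plugged in.

```python
import json
from typing import Callable, Iterable, Iterator

# Hypothetical record layout (an assumption; the published schema may differ):
# {"id": "...", "text": "...", "metadata": {...}}
Record = dict
EmbedFn = Callable[[list[str]], list[list[float]]]


def iter_records(lines: Iterable[str]) -> Iterator[Record]:
    """Parse JSON Lines input, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def _embed_batch(batch: list[Record], embed: EmbedFn) -> Iterator[tuple[str, list[float], dict]]:
    """Embed one batch and pair each vector with its record id and metadata."""
    vectors = embed([r["text"] for r in batch])
    if len(vectors) != len(batch):
        raise ValueError("embedding function returned the wrong number of vectors")
    for record, vector in zip(batch, vectors):
        yield record["id"], vector, record.get("metadata", {})


def embed_records(
    records: Iterable[Record], embed: EmbedFn, batch_size: int = 32
) -> Iterator[tuple[str, list[float], dict]]:
    """Yield (id, vector, metadata) triples using a caller-supplied embedding function."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield from _embed_batch(batch, embed)
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield from _embed_batch(batch, embed)
```

For example, `embed_records(iter_records(open("pack.jsonl", encoding="utf-8")), my_model_embed)` yields triples ready to upsert into Qdrant, Weaviate, or pgvector; `pack.jsonl` and `my_model_embed` are placeholders.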
## How we work

- **Source-of-truth only**: the Legifrance API (PISTE production environment) and EU CELLAR, never scraped websites
- **Reproducible pipelines**: every dataset ships with a manifest, SHA-256 hashes, and its ingestion scripts
- **Versioned**: semantic versioning per dataset, with a transparent changelog
- **Local-first AI**: built with on-premise models (Ollama) where appropriate
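The shipped SHA-256 hashes let a consumer check that downloaded files are intact. This is a minimal sketch assuming a hypothetical manifest layout, `{"files": {"<relative path>": "<sha256 hex>"}}`; the real manifest format may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return relative paths that are missing or whose SHA-256 differs from the manifest.

    Assumes the hypothetical layout {"files": {"relative/path": "<sha256 hex>"}},
    with paths resolved relative to the manifest's own directory.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    root = manifest_path.parent
    mismatches: list[str] = []
    for rel_path, expected in manifest["files"].items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list means every file listed in the manifest is present and unmodified.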
## Stay in touch

- Email: `contact@spadalab.fr`
- Website: *coming soon*
- Marketplace: datasets will also be available on Datarade *(coming soon)*

---

*SpadaLab is a micro-enterprise registered in France (SIREN 103 696 993).*