bclement-spadalab committed on
Commit
ac23463
·
verified ·
1 Parent(s): c0aaff6

Create README.md

Files changed (1)
  1. README.md +47 -3
README.md CHANGED
@@ -1,3 +1,47 @@
- ---
- license: mit
- ---
+ # SpadaLab
+
+ **French legal datasets, curated for AI builders.**
+
+ SpadaLab produces **vector-ready French legal datasets** for builders of legal AI assistants, RAG pipelines, and decision-support tools, with a focus on the realities of **micro-entrepreneurs** and **regulated professions**.
+
+ 🇫🇷 *SpadaLab produit des datasets juridiques français prêts à vectoriser pour les développeurs d'assistants IA, pipelines RAG et outils d'aide à la décision.*
+
+ ---
+
+ ## Why SpadaLab
+
+ French micro-entrepreneurs and small business owners face a wall of legal complexity (URSSAF, RGPD, accessibility, food safety, trade regulations…). Most of it is scattered across Legifrance and EU CELLAR in formats that don't fit modern AI pipelines.
+
+ We curate, clean, and chunk this content into **production-ready datasets** that any team can drop into their vector store of choice (Qdrant, Weaviate, pgvector, etc.), with **no embedding lock-in**.
+
+ ## What we ship
+
+ Four commercial packs (launching May 2026):
+
+ | Pack | Coverage | Format |
+ |---|---|---|
+ | **Micro-Entrepreneur Complet** | 7 collections: CGI, LPF, Code de commerce, Code de la consommation, Code de la sécurité sociale, CNIL, RGPD | JSON / Parquet |
+ | **Artisanat Reglemente** | Code de l'artisanat + related LODA texts | JSON / Parquet |
+ | **HACCP & Hygiene Alimentaire** | EU Reg. 178/2002, 852/2004, 853/2004 + LODA | JSON / Parquet |
+ | **Accessibilite PMR** | LODA accessibility texts (ERP, bâtiments) | JSON / Parquet |
+
+ **Format**: pre-chunked text, metadata, and Gebru-style datasheets. **No embeddings are included by default**: you bring your own model, which avoids lock-in to a single provider such as OpenAI.
+
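As a rough illustration of the "bring your own model" workflow, the sketch below reads pre-chunked records and attaches vectors from a caller-supplied embedding function. The record layout (`id`, `text`, `metadata`), the JSON Lines framing, and the helper names `load_chunks`, `to_points`, and `toy_embed` are all assumptions made for this example; the README does not specify the actual schema.

```python
import json
from typing import Callable, Iterable

# Hypothetical sample data: field names and values are illustrative only.
SAMPLE_JSONL = """\
{"id": "chunk-0001", "text": "Texte du premier extrait.", "metadata": {"collection": "CGI"}}
{"id": "chunk-0002", "text": "Texte du second extrait.", "metadata": {"collection": "LPF"}}
"""


def load_chunks(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def to_points(chunks: list[dict], embed: Callable[[str], list[float]]) -> list[dict]:
    """Attach a vector from the caller's own model to every chunk."""
    return [
        {
            "id": chunk["id"],
            "vector": embed(chunk["text"]),
            # Keep the text and metadata together so they can be stored as a payload.
            "payload": {"text": chunk["text"], **chunk["metadata"]},
        }
        for chunk in chunks
    ]


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding; swap in a real model of your choice."""
    return [float(len(text)), float(text.count(" "))]


chunks = load_chunks(SAMPLE_JSONL.splitlines())
points = to_points(chunks, toy_embed)
```

Because `embed` is just a callable, the same records could be fed to any embedding model and then written to Qdrant, Weaviate, pgvector, or another store using that store's own client.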
+ **Sample datasets** (CC BY-NC 4.0) and **gated full datasets** (custom commercial license) will be published here.
+
+ ## How we work
+
+ - **Source-of-truth-only**: Legifrance PISTE Production API + EU CELLAR (not scraped websites)
+ - **Reproducible pipelines**: every dataset ships with a manifest, SHA-256 hashes, and ingestion scripts
+ - **Versioned**: semantic versioning per dataset, with a transparent changelog
+ - **Local-first AI**: built using on-premises models (Ollama) where appropriate
+
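A consumer could check the shipped SHA-256 hashes along the lines of the sketch below. The manifest layout (a `manifest.json` file with a `files` mapping from relative path to hex digest) and the function names are assumptions for illustration; the actual manifest format is not described in this README.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in fixed-size blocks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(dataset_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return relative paths that are missing or whose hash differs from the manifest."""
    manifest = json.loads((dataset_dir / manifest_name).read_text(encoding="utf-8"))
    return [
        rel
        for rel, expected in manifest["files"].items()
        if not (dataset_dir / rel).is_file() or sha256_of(dataset_dir / rel) != expected
    ]


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny hypothetical dataset, verify it, then tamper with it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "chunks.jsonl"
        data.write_bytes(b'{"id": "chunk-0001"}\n')
        manifest = {"version": "1.0.0", "files": {"chunks.jsonl": sha256_of(data)}}
        (root / "manifest.json").write_text(json.dumps(manifest), encoding="utf-8")
        clean = verify(root)
        data.write_bytes(b"tampered\n")
        tampered = verify(root)
        return clean, tampered


clean, tampered = demo()
```

An empty list means every listed file matched its recorded digest; any returned path points at a file that should be downloaded again or investigated.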
+ ## Stay in touch
+
+ - Email: `contact@spadalab.fr`
+ - Website: *coming soon*
+ - Dataset marketplace: Datarade listing *(coming soon)*
+
+ ---
+
+ *SpadaLab is a micro-enterprise based in France (SIREN 103 696 993).*