Optitransfer committed on
Commit 5ec5e37 · verified · 1 Parent(s): f650406

Add organisation profile card

Files changed (1)
  1. README.md +69 -4
README.md CHANGED
@@ -1,10 +1,75 @@
1
  ---
2
  title: README
3
- emoji: 🌖
4
- colorFrom: pink
5
- colorTo: red
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: README
3
+ emoji: 🏔️
4
+ colorFrom: blue
5
+ colorTo: green
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
+ <div style="max-width: 800px; margin: 0 auto;">
11
+
12
+ <h2>🏔️ OptiTransferData — Sovereign AI Data for Europe</h2>
13
+
14
+ <p style="font-size: 1.1em; color: #555;">
15
+ Production-grade, EU AI Act-compliant web corpora for LLM training, RAG pipelines, and NLP research. Curated in Switzerland 🇨🇭 with independent quality assurance.
16
+ </p>
17
+
18
+ ---
19
+
20
+ ### 🎯 What We Do
21
+
22
+ We build **gold-standard national web corpora** — comprehensive, deduplicated, and quality-scored datasets covering entire country-level web domains. Each dataset is independently audited and delivered with full provenance tracking, SHA-256 integrity verification, and commercial licensing.
23
+
24
+ **Our focus areas:**
25
+ - 🤖 **LLM Pre-training & Fine-tuning** — Sovereign language data at scale
26
+ - 🔍 **RAG Pipelines** — Pre-chunked, embedding-ready corpora with quality scores
27
+ - 🏛️ **Government & Regulatory NLP** — Domain-classified, jurisdiction-specific data
28
+ - 📊 **Academic Research** — Reproducible, well-documented datasets with full metadata
29
+
30
+ ---
31
+
32
+ ### 📦 Available Datasets
33
+
34
+ | Dataset | Records | Coverage | Format |
35
+ |---|---|---|---|
36
+ | 🇱🇮 [Liechtenstein Ultra Premium](https://huggingface.co/datasets/OptiTransferData/liechtenstein-ultra-premium-li) | 35,748 | Full `.li` domain | JSONL · 37 fields |
37
+ | 🇫🇷 [France Sovereign RAG Chunks](https://huggingface.co/datasets/OptiTransferData/france-sovereign-rag-chunks) | 348,829 | French government & institutional web | JSONL · 8 fields |
38
+
39
+ > **Free gated samples** available on each dataset — request access to evaluate before purchasing.
40
+
41
+ **Coming soon:** 🇩🇪 Germany · 🇦🇹 Austria · 🇨🇭 Switzerland · 🇮🇹 Italy · 🇪🇸 Spain
42
+
43
+ ---
44
+
45
+ ### ✅ Quality Standards
46
+
47
+ - 📋 **Independent QA audits** with documented accuracy metrics
48
+ - 🔐 **SHA-256 integrity verification** on all production files
49
+ - 📊 **Quality scoring** per record (0–100 scale)
50
+ - 🏷️ **Domain classification** and language detection
51
+ - 📜 **EU AI Act compliance** — full data provenance and licensing transparency
52
+ - 🧹 **Deduplication** — content-level and URL-level
53
+
54
+ ---
55
+
56
+ ### 💼 Licensing & Access
57
+
58
+ | Tier | Access |
59
+ |---|---|
60
+ | **Sample** | Free with gated access — evaluate data quality |
61
+ | **Full Dataset** | Commercial licence — complete production data |
62
+ | **Enterprise** | Custom pricing — dedicated support, SLA, bespoke corpora |
63
+
64
+ 📧 **Contact us for a quote:** [data@optitransfer.ch](mailto:data@optitransfer.ch)
65
+
66
+ **Payment methods:**
67
+ 🏦 Bank Transfer (SEPA/SWIFT) · 📱 TWINT (Swiss) · ₿ Crypto (BTC/ETH/SOL — addresses on request)
68
+
69
+ ---
70
+
71
+ <p style="text-align: center; color: #888; font-size: 0.9em;">
72
+ 🏔️ Curated in Switzerland · <a href="https://optitransfer.ch">optitransfer.ch</a> · <a href="mailto:data@optitransfer.ch">data@optitransfer.ch</a>
73
+ </p>
74
+
75
+ </div>
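
The Quality Standards list in the new README mentions SHA-256 integrity verification on production files and a per-record quality score on a 0–100 scale. A minimal sketch of how a consumer might check both in Python follows. It assumes the publisher supplies a hex digest alongside each file and that each JSONL record carries a numeric score field. The field name `quality_score`, the default threshold of 80, and the function names are hypothetical and are not taken from the dataset cards.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large corpus files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Compare a file's digest with a published hex digest (assumed format)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


def iter_high_quality(path, min_score=80, score_field="quality_score"):
    """Yield JSONL records whose score is at least min_score.

    `score_field` is a hypothetical field name; check the dataset card
    for the real schema before relying on it.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            # Records without the score field are treated as score 0.
            if record.get(score_field, 0) >= min_score:
                yield record
```

A typical use would be to call `verify_file` once on the downloaded file with the digest provided by the publisher, and only then stream records through `iter_high_quality`, so that a corrupted or truncated download is caught before any filtering or training work begins.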