Full-text search
26 results

wizenheimer / invoice-yc
README.md
dataset
11 matches

wizenheimer / doclaynet_bench
README.md
dataset
11 matches

wizenheimer / funsd_layoutlmv3
README.md
dataset
11 matches

wizenheimer / layoutlm_resume_data
README.md
dataset
11 matches

wizenheimer / invoices_receipts_ocr_v1
README.md
dataset
11 matches

cyrustt / translateddatallama
dataset
1 match

cyrusknopf / uom-reports
dataset
1 match

FrancophonIA / Cyprus_Europe
README.md
dataset
4 matches
tags:
task_categories:translation, language:eng, language:deu, language:ell, language:fra, region:us
ion "Cyprus has always been Europe 2017" of the Press and Information Office of Cyprus.
## Citation
```
PIO Publication "Cyprus has always been Europe 2017" (2018, December 18). Version 1.0. [Dataset (Text corpus)]. Source: European Language Grid. https://live.european-language-grid.eu/catalogue/corpus/19081

FrancophonIA / COVID-19_PIO-CY
README.md
dataset
1 match
tags:
task_categories:translation, language:eng, language:el, language:fra, region:us
e of Cyprus (30th April 2020). It contains 1692 TUs in total.
## Citation
```
COVID-19 PIO-CY dataset. Multilingual (EN, FR, EL) (2020, May 04). Version 1.0. [Dataset (Text corpus)]. Source: European Language Grid. https://live.european-language-grid.eu/catalogue/corpus/21077

FrancophonIA / Herein_System_Thesaurus
README.md
dataset
1 match
tags:
task_categories:translation, language:bul, language:deu, language:ell, language:eng, language:spa, language:fin, language:fra, language:hrv, language:hun, language:nld, language:nor, language:pol, language:por, language:ron, language:slv, region:us
and Cyprus) and French (Belgium, Switzerland and France) thesauri.
Synonyms have therefore been introduced for these three languages to express the different national concepts.
Between 500 and 600 words per language are divided into 9 groups:

cmammides / BIOMON
README.md
dataset
2 matches
tags:
task_categories:feature-extraction, language:en, license:cc-by-4.0, size_categories:n<1K, format:audiofolder, modality:audio, library:datasets, library:mlcroissant, doi:10.57967/hf/2613, region:us, biology
d of Cyprus in the Mediterranean Basin biodiversity hotspot. Scientific Data. 12, 461. https://doi.org/10.1038/s41597-025-04807-1.
See also:
1. Mammides, C. et al. (2025). The Combined Effectiveness of Acoustic Indices in Measuring Bird Species Richness in Biodiverse Sites in Cyprus, China, and Australia. Ecological Indicators. 170, 113105. https://doi.org/10.1016/j.ecolind.2025.113105.

paperswithbacktest / Indices-Daily-Price
README.md
dataset
3 matches
tags:
task_categories:tabular-regression, language:en, license:other, region:us
APA: Cyprus Stock Market (Cyprus General) - Cyprus (CY)
- DARSDSEI: Tanzania All Share Index DSEI - Tanzania (TZ)
- DAX: Germany DAX 30 Stock Market Index - Germany (DE)
- DFMGI: DFM general - United Arab Emirates
- DSEX: DSE Broad - Bangladesh (BD)

habdine / Prot2Text-Data
README.md
dataset
2 matches
tags:
license:cc-by-nc-4.0, size_categories:100K<n<1M, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2307.14367, region:us
y of Cyprus, Nicosia, Cyprus.
**Prot2Text** paper is published in **AAAI 2024**. Preliminary versions of the paper were accepted as a spotlight at [DGM4H@NeurIPS 2023](https://sites.google.com/ethz.ch/dgm4h-neurips2023/home?authuser=0) and [AI4Science@NeurIPS 2023](https://ai4sciencecommunity.github.io/neurips23.html).
## Dataset Description

facebook / cyberseceval3-visual-prompt-injection
README.md
dataset
1 match
tags:
task_categories:text-generation, language:en, license:mit, size_categories:1K<n<10K, format:json, modality:image, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2408.01605, arxiv:2311.17600, region:us, ai security, prompt injection
and Cyrus Nikolaidis and Daniel Song and David Molnar and James Crnkovich and Jayson Grace and Manish Bhatt and Sahana Chennabasappa and Spencer Whitman and Stephanie Ding and Vlad Ionescu and Yue Li and Joshua Saxe},
year={2024},
eprint={2408.01605},
archivePrefix={arXiv},
primaryClass={cs.CR},

unimelb-nlp / Multi-EuP
README.md
dataset
1 match
tags:
task_categories:text-retrieval, language:en, language:de, language:fr, language:it, language:es, language:pl, language:ro, language:nl, language:el, language:hu, language:pt, language:cs, language:sv, language:bg, language:da, language:fi, language:sk, language:lt, language:hr, language:sl, language:et, language:lv, language:mt, language:ga, license:apache-2.0, size_categories:10K<n<100K, modality:image, arxiv:2311.01870, region:us
ece, Cyprus | 3% | 4% | 707 | 209/205 |
| Hungarian| HU | Hungary | 3% | 3% | 614 | 126/128 |
| Portuguese| PT | Portugal | 2% | 3% | 1176 | 179/167 |
| Czech | CS | Czech Republic | 2% | 3% | 397 | 167/149 |
| Swedish | SV | Sweden | 2% | 3% | 531 | 175/165 |

walledai / CyberSecEval
README.md
dataset
1 match
tags:
language:en, license:mit, size_categories:1K<n<10K, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2404.13161, region:us
dis, Cyrus and Song, Daniel and Wan, Shengye and Ahmad, Faizan and Aschermann, Cornelius and Chen, Yaohui and Kapil, Dhaval and others},
journal={arXiv preprint arXiv:2404.13161},
year={2024}
}
```

jtatman / one_twenty_five_faces
README.md
dataset
1 match

coastalcph / multi_eurlex
README.md
dataset
1 match
tags:
task_categories:text-classification, task_ids:multi-label-classification, task_ids:topic-classification, annotations_creators:found, language_creators:found, multilinguality:multilingual, source_datasets:original, language:bg, language:cs, language:da, language:de, language:el, language:en, language:es, language:et, language:fi, language:fr, language:hr, language:hu, language:it, language:lt, language:lv, language:mt, language:nl, language:pl, language:pt, language:ro, language:sk, language:sl, language:sv, license:cc-by-sa-4.0, size_categories:10K<n<100K, arxiv:2109.00904, region:us
81), Cyprus (2008) </td> <td> 3/4% </td> <td> 55,000 / 5,000 / 5,000 </td> </tr>
<tr><td> Hungarian </td> <td> <b>hu</b> </td> <td> Hungary (2004) </td> <td> 3/3% </td> <td> 22,664 / 5,000 / 5,000 </td> </tr>
<tr><td> Portuguese </td> <td> <b>pt</b> </td> <td> Portugal (1986) </td> <td> 2/3% </td> <td> 23,188 / 5,000 / 5,000 </td> </tr>
<tr><td> Czech </td> <td> <b>cs</b> </td> <td> Czech Republic (2004) </td> <td> 2/3% </td> <td> 23,187 / 5,000 / 5,000 </td> </tr>
<tr><td> Swedish </td> <td> <b>sv</b> </td> <td> Sweden (1995) </td> <td> 2/3% </td> <td> 42,490 / 5,000 / 5,000 </td> </tr>

nlpaueb / multi_eurlex
README.md
dataset
1 match
tags:
task_categories:text-classification, task_ids:multi-label-classification, task_ids:topic-classification, annotations_creators:found, language_creators:found, language_creators:machine-generated, multilinguality:multilingual, source_datasets:extended|multi_eurlex, language:en, language:de, language:fr, language:el, language:sk, license:cc-by-sa-4.0, size_categories:10K<n<100K, region:us
81), Cyprus (2008) </td> <td> 3/4% </td> <td> 11,000 / 1,000 / 5,000 </td> </tr>
<tr><td> Slovak </td> <td> <b>sk</b> </td> <td> Slovakia (2004) </td> <td> 1/1% </td> <td> 11,000 / 1,000 / 5,000 </td> </tr>
</table>
[1] Native and Total EU speakers percentage (%)