INFO_MAIN = "Welcome to the Polish ASR Survey dashboard! <br> \
You can use it to learn about the state of Polish ASR speech data and benchmarks. <br> \
The dashboard is built upon the [*Polish ASR Speech Datasets Catalog*](https://github.com/goodmike31/pl-asr-speech-data-survey) and [*Polish ASR Benchmarks Catalog*](https://docs.google.com/spreadsheets/d/1fVsE98Ulmt-EIEe4wx8sUdo7RLigDdAVjQxNpAJIrH8/edit?usp=sharing). <br><br> \
The dashboard is divided into the following tabs: <br> \
* **About Polish ASR Survey** - general information about the survey, references, and contact points <br> \
* **Polish ASR Speech Data Catalog** - detailed information about the speech data available for Polish ASR <br> \
* **Polish ASR Speech Data Survey** - analysis of the state of Polish ASR speech data <br> \
* **ASR Speech Data Taxonomy** - explanation of the columns in the *Polish ASR Speech Datasets Catalog* <br> \
* **Polish ASR Benchmarks Catalog** - detailed information about the benchmarks available for Polish ASR <br> \
* **Polish ASR Benchmarks Survey** - analysis of the state of Polish ASR benchmarks <br> \
* **ASR Benchmarks Taxonomy** - explanation of the columns in the *Polish ASR Benchmarks Catalog* <br> \
Please visit the respective tab to learn how to use it and how to provide feedback. <br><br> \
If you want to share your feedback regarding the Speech Data catalog, please use this [FORM](https://forms.gle/EWJ6YfbJJTyEzQs66). <br><br> \
If you are looking for the latest ASR benchmarks for Polish, please visit the [AMU ASR leaderboard](https://huggingface.co/spaces/amu-cai/pl-asr-leaderboard). <br><br> \
You can also contact the author via [email](mailto:michal.junczyk@amu.edu.pl) or [LinkedIn](https://www.linkedin.com/in/michaljunczyk/).<br>"
CITATION_MAIN = "@misc{junczyk-2024-pl-asr-survey, <br> \
title = {Polish ASR Survey}, <br> \
author = {Michał Junczyk}, <br> \
year = {2024}, <br> \
publisher = {Hugging Face}, <br> \
url = {https://huggingface.co/spaces/amu-cai/pl-asr-survey} }"
# TODO
# * Analysis of datasets utility for the purpose of ASR evaluation (see the **Dataset Utility Index** tab) <br>\
############################################################################################################
INFO_CATALOG = "This dashboard complements the *Polish ASR Speech Datasets Catalog* available on [GitHub](https://github.com/goodmike31/pl-asr-speech-data-survey) and [Google Sheets](https://docs.google.com/spreadsheets/d/181EDfwZNtHgHFOMaKNtgKssrYDX4tXTJ9POMzBsCRlI/edit#gid=0) by providing:<br> \
* More convenient browsing of the catalog content (*see the **How to use?** section below*) <br>\
* Up-to-date analysis of the state of Polish ASR speech data (*see the **Polish ASR Speech Data Survey** tab*) <br><br> \
IMPORTANT - If you want to share your feedback regarding the catalog, please use this [FORM](https://forms.gle/EWJ6YfbJJTyEzQs66). For each response, 50 PLN is donated to a charity of your choice. <br>\
Your feedback will help to assess the state of Polish ASR speech data from the community perspective.<br><br> \
If you want to report a missing dataset or request a correction of descriptions, please follow the steps described on [GitHub](https://github.com/goodmike31/pl-asr-speech-data-survey?tab=readme-ov-file#how-to-contribute-to-the-polish-asr-speech-datasets-catalog). <br> \
You can also contact the author via [email](mailto:michal.junczyk@amu.edu.pl) or [LinkedIn](https://www.linkedin.com/in/michaljunczyk/).<br>"
CITATION_CATALOG = "@article{Junczyk+2024+27+52, <br>\
url = {https://doi.org/10.1515/psicl-2023-0019},<br>\
title = {A survey of Polish ASR speech datasets},<br>\
author = {Michał Junczyk},<br>\
pages = {27--52},<br>\
volume = {60},<br>\
number = {1},<br>\
journal = {Poznan Studies in Contemporary Linguistics},<br>\
doi = {doi:10.1515/psicl-2023-0019},<br>\
year = {2024},<br>\
lastchecked = {2024-03-10}<br>\
}"
HOWTO_CATALOG = "To browse the catalog content using filters, you must enable them first. <br> \
You can also sort the columns by clicking on the column header. <br> \
Depending on the column type, you can use the search box to filter the content. <br> \
Please refer to the **ASR Speech Data Taxonomy** tab for the explanation of the columns. <br> \
If you are looking for insights derived from the data collected in the catalog, please go to the **Polish ASR Speech Data Survey** tab. <br>"
HOWTO_TAXONOMY_CAT = "This table presents the descriptors (columns) used in the *Polish ASR Speech Datasets Catalog*. <br> \
The taxonomy is also available on [GitHub as a TSV file](https://github.com/goodmike31/pl-asr-speech-data-survey/blob/main/snapshots/pl-asr-speech-datasets-catalog-latest.tsv) and [Google Sheets](https://docs.google.com/spreadsheets/d/181EDfwZNtHgHFOMaKNtgKssrYDX4tXTJ9POMzBsCRlI/edit#gid=2015613057)."
############################################################################################################
INFO_BENCHMARK = "TODO"
CITATION_BENCHMARK = "@misc{junczyk-2023-pl-asr-speech-data-catalog, <br> \
title = {Polish ASR Speech Datasets Catalog}, <br> \
author = {Michał Junczyk}, <br> \
year = {2023}, <br> \
publisher = {GitHub}, <br> \
url = {https://github.com/goodmike31/pl-asr-speech-data-survey} }"
HOWTO_BENCHMARK = "You can use the filters to browse the catalog content. <br> \
You can also sort the columns by clicking on the column header. <br> \
Depending on the column type, you can use the search box to filter the content. <br> \
Please refer to the **ASR Benchmarks Taxonomy** tab for the explanation of the columns. <br>"
############################################################################################################ |