Christopher Schröder

cschroeder

AI & ML interests

NLP, Active Learning, Text Representations, PyTorch

Organizations

Webis Group, Webis Hugging Face Workshop, small-text, German LLM Tokenizers, Social Post Explorers, GERTuraX, Hugging Face Discord Community, ScaDS.AI German LLM

cschroeder's activity

posted an update 3 months ago
๐Ÿ”ฅ ๐…๐ข๐ง๐š๐ฅ ๐‚๐š๐ฅ๐ฅ ๐š๐ง๐ ๐ƒ๐ž๐š๐๐ฅ๐ข๐ง๐ž ๐„๐ฑ๐ญ๐ž๐ง๐ฌ๐ข๐จ๐ง: Survey on Data Annotation and Active Learning

Short summary: We need your support for a web survey in which we investigate how recent advancements in natural language processing, particularly LLMs, have influenced the need for labeled data in supervised machine learning, with a focus on, but not limited to, active learning. See the original post for details.

โžก๏ธ Extended Deadline: January 26th, 2025.
Please consider participating or sharing our survey! (If you have any experience with supervised learning in natural language processing, you are eligible to participate in our survey.)

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
replied to their post 3 months ago

Just a quick note: I will not enter into any ideological debates here again.

First off, I think this is a non-issue regardless of which license we use. This is first and foremost a scientific study, and the dataset we’re producing is more of a byproduct; its main purpose is to help other researchers verify our findings. It seems like there might be some misconceptions about this dataset: Think of it as a table of answer codes. It is not a text dataset and therefore not interesting or useful for LLM training (or similar).

Second, we made this decision because the survey doesn’t have any funding and relies on people generously sharing their opinions (without compensation). Given the growing skepticism around data collection, we wanted to be especially careful not to discourage users from participating. Our primary goal is to conduct a study with a population as diverse as possible, and we did not want to lose potential participants who might be less inclined to give away their data without compensation.

posted an update 3 months ago
Here’s just one of the many exciting questions from our survey. If these topics resonate with you and you have experience working on supervised learning with text (i.e., supervised learning in Natural Language Processing), we warmly invite you to participate!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

❤️ We’re seeking responses from across the globe! If you know 1–3 people who might qualify for this survey, particularly those in different regions, please share it with them. We’d really appreciate it!

#NLProc #ActiveLearning #ML
posted an update 3 months ago
๐Ÿ’ก๐—Ÿ๐—ผ๐—ผ๐—ธ๐—ถ๐—ป๐—ด ๐—ณ๐—ผ๐—ฟ ๐˜€๐˜‚๐—ฝ๐—ฝ๐—ผ๐—ฟ๐˜: ๐—›๐—ฎ๐˜ƒ๐—ฒ ๐˜†๐—ผ๐˜‚ ๐—ฒ๐˜ƒ๐—ฒ๐—ฟ ๐—ต๐—ฎ๐—ฑ ๐˜๐—ผ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ๐—ฐ๐—ผ๐—บ๐—ฒ ๐—ฎ ๐—น๐—ฎ๐—ฐ๐—ธ ๐—ผ๐—ณ ๐—น๐—ฎ๐—ฏ๐—ฒ๐—น๐—ฒ๐—ฑ ๐—ฑ๐—ฎ๐˜๐—ฎ ๐˜๐—ผ ๐—ฑ๐—ฒ๐—ฎ๐—น ๐˜„๐—ถ๐˜๐—ต ๐—ฎ๐—ป ๐—ก๐—Ÿ๐—ฃ ๐˜๐—ฎ๐˜€๐—ธ?

Are you working on Natural Language Processing tasks, and have you faced the challenge of a lack of labeled data before? We are currently conducting a survey to explore the strategies used to address this bottleneck, especially in the context of recent advancements, including but not limited to large language models.

The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community.

👉 With only 5–15 minutes of your time, you would greatly help us investigate which strategies the #NLP community uses to overcome a lack of labeled data.

โค๏ธHow you can help even more: If you know others working on supervised learning and NLP, please share this survey with themโ€”weโ€™d really appreciate it!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

#NLP #ML