Post: The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here's all of them on a plot.
Post: The Lichess database of games, puzzles, and engine evaluations is now on the Hub: https://huggingface.co/Lichess
Billions of chess data points to download, query, and stream, and we're excited to see what you'll build with it!
- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 136
Post: TIL: EleutherAI/pile is on Wikipedia: https://en.wikipedia.org/wiki/The_Pile_(dataset)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration Paper • 2306.01481 • Published Jun 2, 2023 • 1
Stable Bias: Analyzing Societal Representations in Diffusion Models Paper • 2303.11408 • Published Mar 20, 2023
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model Paper • 2212.04960 • Published Dec 9, 2022 • 1
Towards Openness Beyond Open Access: User Journeys through 3 Open AI Collaboratives Paper • 2301.08488 • Published Jan 20, 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face Paper • 2302.14534 • Published Feb 28, 2023