Spacerini

non-profit

https://github.com/castorini/hf-spacerini

AI & ML interests

Information Retrieval

Organization Card

We present Spacerini, a modular framework for seamlessly building and deploying interactive search applications, designed to facilitate the qualitative analysis of large-scale research datasets.

In the current AI research landscape, billion-token textual corpora are widely used to pre-train large language models and conversational agents, which are then applied in a variety of downstream tasks. However, as both immediate community feedback and more principled research make clear, the factuality and fairness of such models’ generations remain elusive, as models tend to hallucinate facts and memorize rather than abstract knowledge. To understand these failure modes, researchers often turn to the training data in search of the source of questionable model predictions.

Spacerini enables such qualitative analysis by integrating features from both the Pyserini toolkit and the Hugging Face ecosystem. Users can easily index their collections and deploy them as ad-hoc search engines, making the retrieval of relevant data points quick and efficient. The user-friendly interface lets users search through massive datasets in a no-code fashion, making Spacerini broadly accessible to anyone looking to qualitatively audit their text collections. Spacerini can also be used by IR researchers aiming to demonstrate the capabilities of their indices in a simple and interactive way.
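
To make the index-then-search workflow concrete, here is a minimal, self-contained sketch of ad-hoc retrieval over a tiny collection. It is a toy stand-in written in plain Python and does not use the Spacerini or Pyserini APIs; the function names (`tokenize`, `build_index`, `search`), the sample documents, and the BM25 parameter values are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer; real toolkits use proper analyzers.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_len


def search(query, postings, doc_len, k=3, k1=0.9, b=0.4):
    """Return the top-k (doc_id, score) pairs under a basic BM25 scorer."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        matches = postings.get(term, {})
        if not matches:
            continue  # Term does not occur in the collection.
        idf = math.log(1 + (n_docs - len(matches) + 0.5) / (len(matches) + 0.5))
        for doc_id, tf in matches.items():
            # Length normalization penalizes longer documents.
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical three-document collection.
docs = {
    "d1": "large language models hallucinate facts",
    "d2": "search engines retrieve relevant documents",
    "d3": "auditing training data for language models",
}
postings, doc_len = build_index(docs)
results = search("language models", postings, doc_len)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In practice, a production-grade index and searcher would take the place of this toy, with the search function exposed through an interactive web interface so that no code is needed at query time.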

The framework is open-sourced and available on GitHub: https://github.com/castorini/hf-spacerini.

spaces 14

Chat Noir Streamlit Wrapper

🐈

GÆA / gaia / gæa

🌏

Search large text corpora for information

Code search

Miracl Search - Arabic

Miracl Search - Chinese

Miracl Search - Bengali

models 1

spacerini/bpe-imdb-25k

datasets 1

spacerini/gpt2-outputs

Team members 7
