---
license: mit
inference:
  parameters:
    max_length: 60
tags:
- fine-tuned
- information-retrieval
---

# DSI Search on Toy Dataset

[![DOI](https://zenodo.org/badge/DOI/10.1145/3642970.3655837.svg)](https://doi.org/10.1145/3642970.3655837)

This is a simplified demonstration of the search engine presented in _De-DSI: Decentralised Differentiable Search Index_.

For this example, we fine-tuned the T5-small model on a [dataset](https://huggingface.co/tribler/dsi-search-on-toy-dataset/blob/main/dataset.csv) consisting of 526 distinct documents, including:
- URLs to YouTube videos featuring movie trailers
- Magnet links for accessing CC-licensed music
- Bitcoin wallet addresses belonging to various artists

The training data consisted solely of the documents' titles (i.e., the model never saw ambiguous or noisy queries during training),
so it performs well below the level we believe is generally achievable.

For demonstration purposes, however, the model can be tried with queries such as _"spider man"_, _"oceans 13"_, _"sister staarlightt"_, or _"xileno bitcoin address"_.
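The snippet below is a minimal sketch of how such queries might be sent to the model from Python. It assumes the model is hosted under the repository id `tribler/dsi-search-on-toy-dataset` (inferred from the dataset link above) and that the `transformers` library is installed; `search` and `load_generator` are hypothetical helper names introduced for illustration. `max_length=60` mirrors the inference parameter in this card's front matter.

```python
from typing import Callable, List

# Assumption: repository id inferred from the dataset URL in this model card.
MODEL_ID = "tribler/dsi-search-on-toy-dataset"


def load_generator(model_id: str = MODEL_ID):
    """Hypothetical helper: build a Hugging Face text2text-generation pipeline.

    Imported lazily so that `search` can be used (and tested) with any
    callable that follows the same calling convention.
    """
    from transformers import pipeline

    return pipeline("text2text-generation", model=model_id)


def search(query: str, generate: Callable[..., List[dict]], max_length: int = 60) -> str:
    """Hypothetical helper: return the document the model generates for `query`.

    `generate` follows the text2text-generation pipeline convention: it takes
    the input text plus generation kwargs and returns a list of dicts, each
    with a "generated_text" key. Only the first (top) result is returned.
    """
    outputs = generate(query, max_length=max_length)
    return outputs[0]["generated_text"]
```

Usage: `search("spider man", load_generator())` downloads the model on first call and returns the generated document string (for example, a YouTube URL, magnet link, or wallet address, depending on the query). Keeping the pipeline construction separate from `search` lets the pipeline be loaded once and reused across many queries.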