# de-dsi / app.py
import gradio as gr
from transformers import pipeline

# Load the generative retrieval model once at startup.
model_pipeline = pipeline("text2text-generation", model="tribler/dsi-search-on-toy-dataset")


def process_query(query):
    """Generate an identifier for the query and render it as HTML."""
    results = model_pipeline(query, max_length=60)
    result_text = results[0]['generated_text'].strip()

    if result_text.startswith("http"):
        # YouTube URL: embed the video player.
        youtube_id = result_text.split('watch?v=')[-1]
        iframe = f'<iframe width="560" height="315" src="https://www.youtube.com/embed/{youtube_id}" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>'
        return gr.HTML(iframe)
    elif result_text.startswith("magnet"):
        # Magnet link: render it as a clickable link.
        return gr.HTML(f'<a href="{result_text}" target="_blank">{result_text}</a>')
    else:
        # Anything else is treated as a Bitcoin wallet address.
        bitcoin_logo_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Bitcoin.svg/800px-Bitcoin.svg.png"
        # Return gr.HTML (not gr.Textbox) so the "html" output renders the markup.
        return gr.HTML(f'<div style="display:flex;align-items:center;"><img src="{bitcoin_logo_url}" alt="Bitcoin Logo" style="width:20px;height:20px;margin-right:5px;"><span>{result_text}</span></div>')


interface = gr.Interface(
    fn=process_query,
    inputs=gr.Textbox(label="Query"),
    outputs="html",
    title="Search Interface",
    submit_btn="Find",
    description="""
### Search for movie trailers, music torrents, and bitcoin wallet addresses!
This toy example knows about 500 URLs after merely a few hours of training on a single GPU.
(View the [dataset](https://huggingface.co/tribler/dsi-search-on-toy-dataset/blob/main/dataset.csv), read the [scientific article](https://arxiv.org/pdf/2404.12237.pdf) from EuroMLSys, and see the [model](https://huggingface.co/tribler/dsi-search-on-toy-dataset) and [all code](https://github.com/Tribler/De-DSI).)
""",
    article="""
## De-DSI
De-DSI is a proof-of-principle of fully decentralised search engines.
We show that, in principle, it is possible to connect millions or even billions of devices to form a decentralised search engine. This represents a step towards a "[global brain](https://dl.acm.org/doi/pdf/10.1145/2160718.2160731)" for humanity.
Generative AI is increasingly influencing fields such as content discovery, relevance ranking, and financial transactions, showcasing its potential to disrupt various industries.
Novel end-to-end generative architectures could pave the way for fully decentralized alternatives in social media, the movie industry, search engines, and the financial sector, mirroring the decentralization levels of Bitcoin and BitTorrent.
This shift could significantly empower ordinary Internet users.
Explore the scientific foundation of this transformation in our paper presented at EuroMLSys 2024.
The paper is available [here](https://huggingface.co/papers/2404.12237).
We invite you to contribute to and engage with our community at the International Workshop on [Distributed Infrastructure for Common Good](https://dicg-workshop.github.io/) (DICG).
### Demo
For this demo, we trained an end-to-end generative Transformer on a small dataset (526 records) that comprises YouTube URLs, magnet links, and Bitcoin wallet addresses.
Each identifier is annotated with a title and links to a movie trailer, CC-licensed music, or the BTC address of an independent artist.
This serves as a proof of concept for the DSI's capability to retrieve arbitrary identifiers (URLs/hashes) in response to natural-language user queries.
The model is available under a permissive license and can be accessed [here](https://huggingface.co/tribler/dsi-search-on-toy-dataset).
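
The three identifier types can be told apart by their prefix, which is how this demo decides what to render. The snippet below is a minimal sketch of that check, not the project's actual API: the function name `classify_identifier` and the sample strings (`VIDEO_ID`, `HASH`, `WALLET_ADDRESS`) are placeholders introduced here for illustration.

```python
def classify_identifier(text):
    # Hypothetical helper, not part of the De-DSI code base.
    # It mirrors the prefix checks used by this demo's app.py.
    text = text.strip()
    if text.startswith("http"):
        return "youtube"
    if text.startswith("magnet"):
        return "magnet"
    # Anything else is treated as a Bitcoin wallet address.
    return "bitcoin"


# Placeholder strings standing in for model output.
samples = [
    "https://www.youtube.com/watch?v=VIDEO_ID",
    "magnet:?xt=urn:btih:HASH",
    "WALLET_ADDRESS",
]
for sample in samples:
    print(sample, "->", classify_identifier(sample))
```

Because the fallback branch accepts any string, a malformed generation is labelled as a wallet address; a stricter version would validate each format explicitly.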
### Please Note
Disclaimer. This project represents both a groundbreaking advance and a preliminary exploration into decentralized systems.
As a preliminary model, the project showcases a toy example rather than the full potential of its ultimate capabilities.
It serves as a proof of concept that invites further development and imagination.
### Decentralisation background
Why is decentralisation of AI a milestone? The Internet itself was born from a report that investigated ["is decentralized communication possible?"](https://doi.org/10.7249/RM2632). A fully decentralised form of money called Bitcoin disrupted the highly regulated financial industry. BitTorrent disrupted the monopolies around broadcasting by making it fully decentralised.
The elements that have enabled humanity to shape the world are not strength, not speed, but intelligence, money, and collaboration.
Our Tribler lab is focused on advancing these topics and ensuring they benefit ordinary citizens.
Our [entire research portfolio](https://scholar.google.com/citations?hl=en&user=pprQKjUAAAAJ&view_op=list_works&sortby=pubdate) is driven by idealism. We aim to remove power from companies, governments, and AI in order to shift all this power to self-sovereign citizens.
For instance, our "[unstoppable DAO](https://dl.acm.org/doi/pdf/10.1145/3565383.3566112)" technology creates a limited form of collective money with democratic control. We pioneered [decentralised trust](https://arxiv.org/pdf/2207.09950) with [deployment](https://research.tudelft.nl/files/89353583/1_s2.0_S1389128621001705_main.pdf). Our educational master's program teaches students to engineer [collective decision](https://github.com/Tribler/tribler/issues/7691) mechanisms. The [goal of the Tribler lab](https://github.com/Tribler/tribler/issues/7064) is to prototype the first global brain by 2040.
Before 2000 we worked with [visionary collaborators](http://web.archive.org/web/20020618081554/http://www.freeamp.org/pipermail/mm/2000-December/000003.html) on our first [deployments](http://www.usenix.org/publications/library/proceedings/usenix2000/freenix/full_papers/pouwelse/pouwelse.pdf) and communities with democratic control of information (pre-Wikipedia era).
""",
    examples=[["spider man"], ["oceans 13"], ["sister starlight"], ["bitcoin address of xileno"]],
    concurrency_limit=50,
)


if __name__ == "__main__":
    interface.launch()