Spaces: AdityaKhalkar/Dataset-finder

AdityaKhalkar committed on Apr 10

Commit fc4e82c (1 parent: d1d080a)

added files

Files changed (4)

README.md +5 -12
datasets.csv +0 -0
requirements.txt +5 -0
streamlit_app.py +61 -0

README.md CHANGED

@@ -1,13 +1,6 @@
----
-title: Dataset Finder
-emoji: 💻
-colorFrom: pink
-colorTo: indigo
-sdk: streamlit
-sdk_version: 1.33.0
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Welcome to Streamlit!
+Edit `/streamlit_app.py` to customize this app to your heart's desire. :heart:
+If you have any questions, check out our [documentation](https://docs.streamlit.io) and [community
+forums](https://discuss.streamlit.io).

datasets.csv ADDED

The diff for this file is too large to render.

requirements.txt ADDED

@@ -0,0 +1,5 @@

+altair
+pandas
+streamlit
+tensorflow
+transformers

streamlit_app.py ADDED

@@ -0,0 +1,61 @@

+import streamlit as st
+import pandas as pd
+from transformers import pipeline
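+# Load the zero-shot classification model once; st.cache_resource reuses it
+# across Streamlit reruns instead of reloading it on every interaction
+@st.cache_resource
+def load_classifier():
+    return pipeline("zero-shot-classification",
+                    model="facebook/bart-large-mnli")
+classifier = load_classifier()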
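+# Load the dataset catalogue (datasets.csv, added in this commit). The path is
+# resolved from the directory the app is launched from, replacing the
+# Colab-specific /content/ prefix that does not exist on a deployed Space
+@st.cache_data
+def load_datasets():
+    return pd.read_csv('datasets.csv')
+df = load_datasets()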
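+def tag_finder(user_input):
+    # Candidate labels are the distinct keywords in the catalogue
+    keywords = df['Keyword'].unique().tolist()
+    # The zero-shot pipeline returns labels sorted by descending score
+    result = classifier(user_input, keywords)
+    # Start the cut-off at the top score and walk down the ranked scores:
+    # the cut-off moves down to the next score while each drop is at least
+    # 10% of the current cut-off, and stops at the first smaller drop
+    threshold = result['scores'][0]
+    for score in result['scores']:
+        if score == threshold:
+            continue
+        if (threshold - score) >= threshold / 10:
+            threshold = score
+        else:
+            break
+    # Keep every label scoring at or above the final cut-off
+    useful_tags = [result['labels'][idx] for idx, score in enumerate(result['scores']) if score >= threshold]
+    # Collect the datasets filed under each selected keyword
+    relevant_datasets = []
+    for tag in useful_tags:
+        relevant_datasets.extend(df[df['Keyword'] == tag]['Datasets'].tolist())
+    return useful_tags, relevant_datasets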
+# Define the Streamlit app
+def main():
+    # Set title and description
+    st.title("Dataset Tagging System")
+    st.write("Enter your text below and get relevant tags for your dataset.")
+    # Get user input
+    user_input = st.text_input("Enter your text:")
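+    if st.button("Submit"):
+        # Skip classification when the input box is empty
+        if not user_input.strip():
+            st.warning("Please enter some text first.")
+            return
+        # Find relevant tags and datasets
+        relevant_tags, relevant_datasets = tag_finder(user_input)
+        # Display the matching datasets, each with its keyword tag
+        if relevant_tags:
+            st.subheader("Datasets:")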
+            for dataset in relevant_datasets:
+                # Label each dataset with its keyword (the first one if it is filed under several)
+                tag = df[df['Datasets'] == dataset]['Keyword'].iloc[0]
+                st.markdown(f'''
+                    <div style="border: 2px solid #555; border-radius: 10px; padding: 10px; margin-bottom: 10px; background-color: #333; color: white; display: flex; justify-content: space-between; align-items: center;">
+                       <div>{dataset}</div>
+                       <div style="padding: 5px 10px; border: #fff 2px solid; border-radius: 5px;transition: background-color 0.3s;"><a href="https://datasetsearch.research.google.com/search?search&src=0&query={dataset}" style = "text-decoration: none; color: white;">link</a></div>
+                      <div style="border: 1px solid #666; padding: 5px; background-color: #444; border-radius: 12px;">
+                          <img width="20" height="20" style="margin: 5px;" src="https://img.icons8.com/ios/50/ffffff/price-tag--v2.png" alt="price-tag--v2"/>{tag}
+                      </div>
+                    </div>
+                ''', unsafe_allow_html=True)
+        else:
+            st.warning("No relevant tags found.")
+# Run the app
+if __name__ == "__main__":
+    main()
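To see which tags `tag_finder` keeps for a given set of scores, the cut-off loop can be lifted out and run on made-up numbers without loading the model. The sketch below assumes the scores arrive sorted in descending order (as the zero-shot pipeline returns them); the helper name `select_labels` and the example scores are hypothetical and illustrative, not model output.

```python
from typing import List, Sequence


def select_labels(labels: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Hypothetical helper reproducing the cut-off loop from tag_finder.

    Assumes `labels` and `scores` are aligned and sorted by descending score.
    """
    threshold = scores[0]
    for score in scores:
        if score == threshold:
            continue  # skips the top score and ties with the current cut-off
        if (threshold - score) >= threshold / 10:
            threshold = score  # large drop: lower the cut-off to this score
        else:
            break  # drop smaller than 10% of the cut-off: stop lowering it
    # Keep every label at or above the final cut-off
    return [label for label, score in zip(labels, scores) if score >= threshold]


if __name__ == "__main__":
    labels = ["vision", "audio", "text", "tabular"]
    # 0.50 -> 0.30 is a large drop, 0.30 -> 0.29 is under 10% of 0.30
    print(select_labels(labels, [0.50, 0.30, 0.29, 0.10]))  # prints ['vision', 'audio']
```

Read this way, the committed loop keeps descending while scores are well separated and stops at the first near-tie: scores of `[0.9, 0.85, 0.05]` keep only the top label, while `[0.6, 0.3, 0.1]` keep all three. If the intent was instead to keep labels whose scores sit close to the top one, the comparison in the loop would need to be inverted; this is a reading of the code as committed, not of any stated design.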