sgawtho committed on
Commit aa12e5a • 1 Parent(s): 64a291a

Added preliminary files
Dockerfile ADDED
@@ -0,0 +1,13 @@
+ FROM python:3.9
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+ WORKDIR $HOME/app
+ COPY --chown=user . $HOME/app
+ COPY ./requirements.txt $HOME/app/requirements.txt
+ RUN pip install -r requirements.txt
+ RUN pip install --upgrade cleanlab-studio
+ RUN pip install typing_extensions==4.7.1
+ COPY . .
+ CMD ["chainlit", "run", "app.py", "--port", "7860"]
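Note that `~` is not expanded in a Dockerfile `COPY` destination, so the requirements path above uses `$HOME/app` explicitly. For local testing outside the Space, a build-and-run roughly matching the `CMD` above might look like `docker build -t transparency-tracker .` followed by `docker run -p 7860:7860 -e OPENAI_API_KEY=... -e PINECONE_API_KEY=... -e PINECONE_ENV=... -e CLEANLAB_API_KEY=... transparency-tracker` (the `transparency-tracker` image tag is illustrative; the environment variables are the ones app.py reads, plus `OPENAI_API_KEY` for the LangChain OpenAI classes).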
README.md CHANGED
@@ -1,10 +1,10 @@
  ---
  title: TransparencyTracker
- emoji: 🏢
- colorFrom: purple
- colorTo: pink
+ emoji: 🔦
+ colorFrom: yellow
+ colorTo: purple
  sdk: docker
- pinned: false
+ pinned: true
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,96 @@
+ import os
+ from typing import List
+ from langchain.embeddings.openai import OpenAIEmbeddings
+ from langchain.vectorstores.pinecone import Pinecone
+ from langchain.chains import ConversationalRetrievalChain
+ from langchain.chat_models import ChatOpenAI
+ from langchain.memory import ChatMessageHistory, ConversationBufferMemory
+ from langchain_core.prompts import PromptTemplate
+ from langchain.docstore.document import Document
+ import pinecone
+ import chainlit as cl
+ from cleanlab_studio import Studio
+
+ pinecone.init(
+     api_key=os.environ.get("PINECONE_API_KEY"),
+     environment=os.environ.get("PINECONE_ENV"),
+ )
+
+ studio = Studio(os.getenv("CLEANLAB_API_KEY"))
+ tlm = studio.TLM(quality_preset='high')
+
+ index_name = "tracker"
+ embeddings = OpenAIEmbeddings()
+
+ welcome_message = "Welcome to the Transparency Tracker! Ask me any question related to Anti-Corruption."
+
+ @cl.on_chat_start
+ async def start():
+     await cl.Message(content=welcome_message, disable_human_feedback=True).send()
+     docsearch = Pinecone.from_existing_index(
+         index_name=index_name, embedding=embeddings
+     )
+
+     message_history = ChatMessageHistory()
+
+     memory = ConversationBufferMemory(
+         memory_key="chat_history",
+         output_key="answer",
+         chat_memory=message_history,
+         return_messages=True,
+     )
+
+     with open('./prompt.txt', 'r') as f:
+         template = f.read()
+     prompt = PromptTemplate(input_variables=["context", "question"], template=template)
+
+     chain = ConversationalRetrievalChain.from_llm(
+         llm=ChatOpenAI(
+             model_name="gpt-3.5-turbo",
+             temperature=0,
+             streaming=True),
+         chain_type="stuff",
+         retriever=docsearch.as_retriever(search_kwargs={'k': 3}),  # return at most three documents with the highest similarity scores
+         memory=memory,
+         return_source_documents=True,
+         combine_docs_chain_kwargs={"prompt": prompt}
+     )
+     cl.user_session.set("chain", chain)
+
+ @cl.action_callback("eval_button")
+ async def evaluate_response(action):
+     await action.remove()
+     arr = action.value.split('|||')  # action value packs "<query>|||<answer>"
+     confidence_score = tlm.get_confidence_score(arr[0], response=arr[1])
+     await cl.Message(content=f"Confidence Score: {confidence_score}", disable_human_feedback=True).send()
+
+ @cl.on_message
+ async def main(message: cl.Message):
+     chain = cl.user_session.get("chain")
+
+     cb = cl.AsyncLangchainCallbackHandler()
+
+     res = await chain.acall(message.content, callbacks=[cb])
+     answer = res["answer"]
+     source_documents = res["source_documents"]
+
+     text_elements = []
+
+     if source_documents:
+         for source_idx, source_doc in enumerate(source_documents):
+             source_name = f"source_{source_idx}"
+             text_elements.append(
+                 cl.Text(content=source_doc.page_content, name=source_name)
+             )
+         source_names = [text_el.name for text_el in text_elements]
+
+         if source_names:
+             answer += f"\nSources: {', '.join(source_names)}"
+         else:
+             answer += "\nNo sources found"
+
+     actions = [
+         cl.Action(name="eval_button", value=f"{message.content}|||{answer}", label='Evaluate with CleanLab', description="Evaluate with CleanLab TLM (*may take a moment*)")
+     ]
+
+     await cl.Message(content=answer, elements=text_elements, actions=actions).send()
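app.py only reads from the existing Pinecone index `tracker`; nothing in this commit populates it. A minimal ingestion sketch, assuming a local `corpus/` directory of source documents and the LangChain 0.0.350 APIs pinned in requirements.txt (the directory name, loader choice, and chunking parameters are illustrative, not part of this commit):

```python
# Hypothetical one-off ingestion script -- not part of this commit.
import os

import pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.pinecone import Pinecone

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENV"],
)

# Load the research corpus (directory name is an assumption).
docs = DirectoryLoader("./corpus").load()

# Split into passages small enough to embed and "stuff" into the prompt.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed with the same model app.py uses and upsert into the same index,
# so Pinecone.from_existing_index(...) in app.py finds compatible vectors.
Pinecone.from_documents(chunks, OpenAIEmbeddings(), index_name="tracker")
```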
chainlit.md ADDED
@@ -0,0 +1,5 @@
+ ### Transparency Tracker 📖
+
+ TransparencyTracker harnesses a Retrieval Augmented Generation pipeline built on an extensive corruption-research corpus curated by leading international experts. The app lets users search the corpus and discover effective remediation efforts for combating corruption globally.
+
+ ![Flow Diagram](./public/TransparencyTracker_logo.png)
prompt.txt ADDED
@@ -0,0 +1,41 @@
+ You are an Artificial Intelligence expert specialized in the domain of corruption, trained exclusively on a designated corpus of publications on corruption. Your assignment is to interpret and answer user queries related to this subject based solely on the provided context.
+
+ ### Instructions:
+
+ Please adhere to the following guidelines when answering user queries:
+
+ 1. Utilize the provided context to answer the user query.
+ 2. Only rely on the provided context and do not use any external information.
+ 3. If you are unable to answer the query based on the context, respond with "I don't know."
+
+ If accurate insights or answers aren't derivable from the given context, respond with "I don't know."
+
+ ### Prompt Format:
+
+ Use the given context to answer the user's query:
+
+ CONTEXT:
+ {context}
+
+ QUERY:
+ {question}
+
+ ### Context:
+
+ Provide comprehensive and detailed context related to corruption from the designated corpus of corruption publications. Include relevant information, such as key facts, historical background, specific cases, or important statistics that would help answer user queries.
+
+ ### Query:
+
+ Include a user query related to the problem of corruption within the designated corpus of corruption publications. The question should be clear, concise, and specific.
+
+ Remember to use the provided context to answer the user query accurately.
+
+ Example Prompt:
+
+ CONTEXT:
+ In the designated corpus of corruption publications, we have analyzed various cases of political corruption around the world. These publications include reports, articles, and investigations focused on exposing corrupt practices among politicians, governments, and public figures. The corpus covers incidents from the past 20 years, highlighting instances of bribery, embezzlement, money laundering, and abuse of power. The data has been compiled from reputable sources such as international organizations, investigative journalism outlets, and court documents.
+
+ QUERY:
+ What were the main findings of the corruption investigation involving the former president's administration?
+
+ Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know."
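app.py loads this file verbatim into a `PromptTemplate` with exactly the two variables `{context}` and `{question}`, so the literal braces above matter. A quick sanity check that the template parses and formats (the filler strings are placeholders):

```python
# Sanity check that prompt.txt round-trips through PromptTemplate.
from langchain_core.prompts import PromptTemplate

with open("./prompt.txt", "r") as f:
    template = f.read()

prompt = PromptTemplate(input_variables=["context", "question"], template=template)
print(prompt.format(context="<retrieved passages>", question="<user query>"))
```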
public/TransparencyTracker_logo.png ADDED
public/image1.png ADDED
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ chainlit==0.7.700
+ cohere==4.37
+ openai==1.3.5
+ tiktoken==0.5.1
+ python-dotenv==1.0.0
+ langchain==0.0.350
+ pinecone-client==2.2.4
+ cleanlab-studio
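`cleanlab-studio` is left unpinned here and is upgraded again in the Dockerfile, so the installed version can drift between builds. A small smoke test for the one TLM call app.py depends on, reusing the same calls the app itself makes (the query and response strings are placeholders):

```python
# Hypothetical smoke test for the TLM call used by the eval_button callback.
import os

from cleanlab_studio import Studio

studio = Studio(os.environ["CLEANLAB_API_KEY"])
tlm = studio.TLM(quality_preset="high")

# Same signature as app.py: get_confidence_score(prompt, response=...)
score = tlm.get_confidence_score(
    "What is bribery?",
    response="Bribery is offering something of value to improperly influence an official.",
)
print(f"Confidence Score: {score}")
```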