sgawtho committed on
Commit aa12e5a • 1 Parent(s): 64a291a

Added preliminary files
Dockerfile ADDED
@@ -0,0 +1,13 @@
+ FROM python:3.9
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+ WORKDIR $HOME/app
+ COPY --chown=user . $HOME/app
+ COPY ./requirements.txt $HOME/app/requirements.txt
+ RUN pip install -r requirements.txt
+ RUN pip install --upgrade cleanlab-studio
+ RUN pip install typing_extensions==4.7.1
+ COPY . .
+ CMD ["chainlit", "run", "app.py", "--port", "7860"]
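Note that `~` is not expanded in a Dockerfile `COPY` destination, so the requirements path above uses `$HOME/app` explicitly. For local testing outside the Space, a build-and-run roughly matching the `CMD` above might look like `docker build -t transparency-tracker .` followed by `docker run -p 7860:7860 -e OPENAI_API_KEY=... -e PINECONE_API_KEY=... -e PINECONE_ENV=... -e CLEANLAB_API_KEY=... transparency-tracker` (the `transparency-tracker` image tag is illustrative; the environment variables are the ones app.py reads, plus `OPENAI_API_KEY` for the LangChain OpenAI classes).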
README.md CHANGED
@@ -1,10 +1,10 @@
  ---
  title: TransparencyTracker
- emoji: 🏢
- colorFrom: purple
- colorTo: pink
+ emoji: 🔦
+ colorFrom: yellow
+ colorTo: purple
  sdk: docker
- pinned: false
+ pinned: true
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,96 @@
+ import os
+ from typing import List
+ from langchain.embeddings.openai import OpenAIEmbeddings
+ from langchain.vectorstores.pinecone import Pinecone
+ from langchain.chains import ConversationalRetrievalChain
+ from langchain.chat_models import ChatOpenAI
+ from langchain.memory import ChatMessageHistory, ConversationBufferMemory
+ from langchain_core.prompts import PromptTemplate
+ from langchain.docstore.document import Document
+ import pinecone
+ import chainlit as cl
+ from cleanlab_studio import Studio
+
+ pinecone.init(
+     api_key=os.environ.get("PINECONE_API_KEY"),
+     environment=os.environ.get("PINECONE_ENV"),
+ )
+
+ studio = Studio(os.getenv("CLEANLAB_API_KEY"))
+ tlm = studio.TLM(quality_preset='high')
+
+ index_name = "tracker"
+ embeddings = OpenAIEmbeddings()
+
+ welcome_message = "Welcome to the Transparency Tracker! Ask me any question related to Anti-Corruption."
+
+ @cl.on_chat_start
+ async def start():
+     await cl.Message(content=welcome_message, disable_human_feedback=True).send()
+     docsearch = Pinecone.from_existing_index(
+         index_name=index_name, embedding=embeddings
+     )
+
+     message_history = ChatMessageHistory()
+
+     memory = ConversationBufferMemory(
+         memory_key="chat_history",
+         output_key="answer",
+         chat_memory=message_history,
+         return_messages=True,
+     )
+
+     with open('./prompt.txt', 'r') as f:
+         template = f.read()
+     prompt = PromptTemplate(input_variables=["context", "question"], template=template)
+
+     chain = ConversationalRetrievalChain.from_llm(
+         llm=ChatOpenAI(
+             model_name="gpt-3.5-turbo",
+             temperature=0,
+             streaming=True),
+         chain_type="stuff",
+         retriever=docsearch.as_retriever(search_kwargs={'k': 3}),  # return at most three documents with the highest similarity scores
+         memory=memory,
+         return_source_documents=True,
+         combine_docs_chain_kwargs={"prompt": prompt}
+     )
+     cl.user_session.set("chain", chain)
+
+ @cl.action_callback("eval_button")
+ async def evaluate_response(action):
+     await action.remove()
+     arr = action.value.split('|||')  # action value packs "<query>|||<answer>"
+     confidence_score = tlm.get_confidence_score(arr[0], response=arr[1])
+     await cl.Message(content=f"Confidence Score: {confidence_score}", disable_human_feedback=True).send()
+
+ @cl.on_message
+ async def main(message: cl.Message):
+     chain = cl.user_session.get("chain")
+
+     cb = cl.AsyncLangchainCallbackHandler()
+
+     res = await chain.acall(message.content, callbacks=[cb])
+     answer = res["answer"]
+     source_documents = res["source_documents"]
+
+     text_elements = []
+
+     if source_documents:
+         for source_idx, source_doc in enumerate(source_documents):
+             source_name = f"source_{source_idx}"
+             text_elements.append(
+                 cl.Text(content=source_doc.page_content, name=source_name)
+             )
+         source_names = [text_el.name for text_el in text_elements]
+
+         if source_names:
+             answer += f"\nSources: {', '.join(source_names)}"
+         else:
+             answer += "\nNo sources found"
+
+     actions = [
+         cl.Action(name="eval_button", value=f"{message.content}|||{answer}", label='Evaluate with CleanLab', description="Evaluate with CleanLab TLM (*may take a moment*)")
+     ]
+
+     await cl.Message(content=answer, elements=text_elements, actions=actions).send()
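app.py only reads from the existing Pinecone index `tracker`; nothing in this commit populates it. A minimal ingestion sketch, assuming a local `corpus/` directory of source documents and the LangChain 0.0.350 APIs pinned in requirements.txt (the directory name, loader choice, and chunking parameters are illustrative, not part of this commit):

```python
# Hypothetical one-off ingestion script -- not part of this commit.
import os

import pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.pinecone import Pinecone

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENV"],
)

# Load the research corpus (directory name is an assumption).
docs = DirectoryLoader("./corpus").load()

# Split into passages small enough to embed and "stuff" into the prompt.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed with the same model app.py uses and upsert into the same index,
# so Pinecone.from_existing_index(...) in app.py finds compatible vectors.
Pinecone.from_documents(chunks, OpenAIEmbeddings(), index_name="tracker")
```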
chainlit.md ADDED
@@ -0,0 +1,5 @@
+ ### Transparency Tracker 📖
+
+ TransparencyTracker harnesses a Retrieval Augmented Generation pipeline built on an extensive corruption-research corpus curated by leading international experts. The app lets users search the corpus and discover effective remediation efforts for combating corruption globally.
+
+ ![Flow Diagram](./public/TransparencyTracker_logo.png)
prompt.txt ADDED
@@ -0,0 +1,41 @@
+ You are an Artificial Intelligence expert specialized in the domain of corruption, trained exclusively on a designated corpus of publications on corruption. Your assignment is to interpret and answer user queries related to this subject based solely on the provided context.
+
+ ### Instructions:
+
+ Please adhere to the following guidelines when answering user queries:
+
+ 1. Utilize the provided context to answer the user query.
+ 2. Only rely on the provided context and do not use any external information.
+ 3. If you are unable to answer the query based on the context, respond with "I don't know."
+
+ If accurate insights or answers aren't derivable from the given context, respond with "I don't know."
+
+ ### Prompt Format:
+
+ Use the given context to answer the user's query:
+
+ CONTEXT:
+ {context}
+
+ QUERY:
+ {question}
+
+ ### Context:
+
+ Provide comprehensive and detailed context related to corruption from the designated corpus of corruption publications. Include relevant information, such as key facts, historical background, specific cases, or important statistics that would help answer user queries.
+
+ ### Query:
+
+ Include a user query related to the problem of corruption within the designated corpus of corruption publications. The question should be clear, concise, and specific.
+
+ Remember to use the provided context to answer the user query accurately.
+
+ Example Prompt:
+
+ CONTEXT:
+ In the designated corpus of corruption publications, we have analyzed various cases of political corruption around the world. These publications include reports, articles, and investigations focused on exposing corrupt practices among politicians, governments, and public figures. The corpus covers incidents from the past 20 years, highlighting instances of bribery, embezzlement, money laundering, and abuse of power. The data has been compiled from reputable sources such as international organizations, investigative journalism outlets, and court documents.
+
+ QUERY:
+ What were the main findings of the corruption investigation involving the former president's administration?
+
+ Use the provided context to answer the provided user query. Only use the provided context to answer the query. If you do not know the answer, respond with "I don't know."
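app.py loads this file verbatim into a `PromptTemplate` with exactly the two variables `{context}` and `{question}`, so the literal braces above matter. A quick sanity check that the template parses and formats (the filler strings are placeholders):

```python
# Sanity check that prompt.txt round-trips through PromptTemplate.
from langchain_core.prompts import PromptTemplate

with open("./prompt.txt", "r") as f:
    template = f.read()

prompt = PromptTemplate(input_variables=["context", "question"], template=template)
print(prompt.format(context="<retrieved passages>", question="<user query>"))
```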
public/TransparencyTracker_logo.png ADDED
public/image1.png ADDED
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ chainlit==0.7.700
+ cohere==4.37
+ openai==1.3.5
+ tiktoken==0.5.1
+ python-dotenv==1.0.0
+ langchain==0.0.350
+ pinecone-client==2.2.4
+ cleanlab-studio
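`cleanlab-studio` is left unpinned here and is upgraded again in the Dockerfile, so the installed version can drift between builds. A small smoke test for the one TLM call app.py depends on, reusing the same calls the app itself makes (the query and response strings are placeholders):

```python
# Hypothetical smoke test for the TLM call used by the eval_button callback.
import os

from cleanlab_studio import Studio

studio = Studio(os.environ["CLEANLAB_API_KEY"])
tlm = studio.TLM(quality_preset="high")

# Same signature as app.py: get_confidence_score(prompt, response=...)
score = tlm.get_confidence_score(
    "What is bribery?",
    response="Bribery is offering something of value to improperly influence an official.",
)
print(f"Confidence Score: {score}")
```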