Spaces: LLaMaWhisperer / LegalLLaMa

LLaMaWhisperer committed on Jul 28, 2023

Commit 102dc72

1 Parent(s): 8ffce3e

MVP of the project: added the base functionality for it to become a simple chatbot

Files changed (9)

.gitignore +1 -0
README.md +50 -2
app.py +5 -0
legal_llama/__init__.py +0 -0
legal_llama/bill_retrieval.py +145 -0
legal_llama/chat_bot_interface.py +83 -0
legal_llama/dialog_management.py +83 -0
legal_llama/summarizer.py +54 -0
requirements.txt +4 -0

.gitignore ADDED

	@@ -0,0 +1 @@


+/.streamlit/

README.md CHANGED

@@ -1,4 +1,52 @@
-# LegalLLaMa (*WORK IN PROGRESS*)
 LegalLLaMa: Your friendly neighborhood lawyer llama, turning legal jargon into a piece of cake!
-LegalLLaMA is a chatbot powered by a fine-tuned LLaMa model, providing summaries and insights from U.S. Congressional bills. Bridging the gap between law and AI, one conversation at a time.

+# LegalLLaMa 🦙 (*WORK IN PROGRESS*)
 LegalLLaMa: Your friendly neighborhood lawyer llama, turning legal jargon into a piece of cake!
+Legal LLaMa is a chatbot developed to provide summaries of U.S. legislative bills based on user queries. It's built using Hugging Face's Transformers library and is hosted with Streamlit on Hugging Face Spaces.
+You can interact with the live demo of Legal LLaMa on Hugging Face Spaces [here](https://huggingface.co/spaces/LLaMaWhisperer/legalLLaMa).
+The chatbot uses a frame-based dialog management system to handle conversations and uses the ProPublica and Congress APIs to fetch information about legislative bills. Bill summaries are generated with the `nsi319/legal-led-base-16384` summarization model.
+## Features 🎁
+- Frame-based dialog management
+- Intent recognition and slot filling
+- Real-time interaction with users
+- Bill retrieval using ProPublica and Congress APIs
+- Bill summarization using Transformer models
+## Future Work 💡
+Legal LLaMa is still a work in progress, and there are plans to make it even more useful and user-friendly. Here are some of the planned improvements:
+- Enhance intent recognition and slot filling using Natural Language Understanding (NLU) models
+- Expand the chatbot's capabilities to handle more tasks, such as providing summaries of recent bills by a particular congressman
+- Train a custom summarization model specifically for legislative texts
+## Getting Started 🚀
+To get the project running on your local machine, follow these steps:
+1. Clone the repository:
+```commandline
+git clone https://github.com/YuvrajSharma9981/LegalLLaMa.git
+```
+2. Install the required packages:
+```commandline
+pip install -r requirements.txt
+```
+3. Run the Streamlit app:
+```commandline
+streamlit run app.py
+```
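+Please note that you will need API keys for the ProPublica Congress API and the Congress API. The app reads them from Streamlit secrets as `PRO_PUBLICA_API_KEY` and `CONGRESS_API_KEY` (for local runs, in `.streamlit/secrets.toml`).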
+## Contributing 🤝
+Contributions to improve Legal LLaMa are welcome. Feel free to submit a pull request or create an issue for any bugs, feature requests, or questions about the project.
+## License 📄
+This project is licensed under the GPL-3.0 License - see the [LICENSE](LICENSE) file for details.

app.py ADDED

	@@ -0,0 +1,5 @@

+from legal_llama.chat_bot_interface import ChatBotInterface
+if __name__ == '__main__':
+    chat_bot = ChatBotInterface()
+    chat_bot.continue_conversation()

legal_llama/__init__.py ADDED

File without changes

legal_llama/bill_retrieval.py ADDED

	@@ -0,0 +1,145 @@

+import requests
+import streamlit as st
+import xml.etree.ElementTree as ET
+class BillRetriever:
+    """
+    A class used to retrieve bills using the ProPublica Congress API & United States Congress API.
+    """
+    PROPUBLICA_URL = "https://api.propublica.org/congress/v1/bills/search.json"
+    CONGRESS_URL_BASE = "https://api.congress.gov/v3/bill/{congress}/{billType}/{billNumber}/text"
+    def __init__(self, api_key=None):
+        """
+        Initialize the BillRetriever with API keys read from Streamlit secrets.
+        Parameters:
+            api_key (str, optional): Currently unused; both keys are loaded from st.secrets. Default is None.
+        """
+        self.pro_publica_api_key = st.secrets["PRO_PUBLICA_API_KEY"]
+        self.congress_api_key = st.secrets["CONGRESS_API_KEY"]
+    def make_api_call(self, api_url, api_key, params=None):
+        """
+        Make an API call to the specified URL with optional parameters and API key.
+        Parameters:
+            api_url (str): The URL of the API endpoint.
+            api_key (str): The API Key for the API
+            params (dict, optional): Optional parameters to pass with the API call. Default is None.
+        Returns:
+            dict: JSON response data if the request is successful, None otherwise.
+        """
+        headers = {"X-API-Key": api_key} if api_key else {}
+        try:
+            response = requests.get(api_url, params=params, headers=headers, timeout=30)  # Avoid hanging forever
+            response.raise_for_status()  # Raise an exception for non-2xx status codes
+            return response.json()
+        except requests.exceptions.RequestException as e:
+            print(f"Error occurred: {e}")
+            return None
+        except ValueError as e:
+            print(f"Invalid response received: {e}")
+            return None
+    def search_bill_propublica(self, query):
+        """
+        Search for a bill using the ProPublica Congress API.
+        Parameters:
+            query (str): The query string to search for.
+        Returns:
+            dict: JSON response data if the request is successful, None otherwise.
+        """
+        params = {"query": query, "sort": "date", "dir": "desc"}
+        return self.make_api_call(self.PROPUBLICA_URL, params=params, api_key=self.pro_publica_api_key)
+    def get_bill_text_congress(self, congress, bill_type, bill_number):
+        """
+        Retrieve the text of a bill using the Congress API.
+        Parameters:
+            congress (str): The number of the congress.
+            bill_type (str): The type of the bill.
+            bill_number (str): The number of the bill.
+        Returns:
+            dict: JSON response data if the request is successful, None otherwise.
+        """
+        url = self.CONGRESS_URL_BASE.format(congress=congress, billType=bill_type, billNumber=bill_number)
+        return self.make_api_call(url, api_key=self.congress_api_key)
+    def get_bill_by_query(self, query):
+        """
+        Search for a bill by query and retrieve its text.
+        Parameters:
+            query (str): The query string to search for.
+        Returns:
+            str: The text of the bill if the request is successful, None otherwise.
+        """
+        # First search for the bill using the ProPublica API
+        propublica_data = self.search_bill_propublica(query)
+        if propublica_data and propublica_data.get('results'):
+            # Iterate over the list of bills until we find one whose text is available on the Congress website
+            for bill_data in propublica_data['results'][0].get('bills', []):
+                congress = bill_data['bill_id'].split('-')[1]
+                bill_type = bill_data['bill_type']
+                bill_number = bill_data['number'].split('.')[-1]
+                # Then get the text of the bill using the Congress API
+                congress_data = self.get_bill_text_congress(congress, bill_type, bill_number)
+                if congress_data and 'textVersions' in congress_data and congress_data['textVersions']:
+                    # Check if textVersions list is not empty
+                    xml_url = congress_data['textVersions'][0]['formats'][2]['url']
+                    return self.extract_bill_text(xml_url)
+        return None
+    def extract_bill_text(self, url):
+        """
+        Extract the text content from a bill's XML data.
+        Parameters:
+            url (str): The URL of the bill's XML data.
+        Returns:
+            str: The text content of the bill.
+        """
+        # Get the XML data from the URL
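+        try:
+            response = requests.get(url, timeout=30)
+            response.raise_for_status()  # Treat HTTP error responses as failures instead of parsing them
+            xml_data = response.content
+        except requests.exceptions.RequestException as e:
+            print(f"Error occurred: {e}")
+            return None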
+        # Decode bytes to string and parse XML
+        try:
+            root = ET.fromstring(xml_data.decode('utf-8'))
+        except ET.ParseError as e:
+            print(f"Error parsing XML: {e}")
+            return None
+        return self.get_all_text(root)
+    @staticmethod
+    def get_all_text(element):
+        """
+        Recursively extract text from an XML element and its children.
+        Parameters:
+            element (xml.etree.ElementTree.Element): An XML element.
+        Returns:
+            str: The concatenated text from the element and its children.
+        """
+        text = element.text or ''  # Get the text of the current element, if it exists
+        for child in element:
+            text += BillRetriever.get_all_text(child)  # Recursively get the text of all child elements
+            if child.tail:
+                text += child.tail  # Add any trailing text of the child element
+        return text
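
Three details of `bill_retrieval.py` can be made explicit with a short sketch: how the ProPublica identifiers are split into the parts the Congress API URL needs, how the XML link might be chosen by its `type` label rather than by position `[2]`, and how the recursive `get_all_text` compares with the standard library's `Element.itertext()`. The sample values (`hr1234-117`, `H.R.1234`), the `"Formatted XML"` label, and the example URLs are assumptions about the API responses, not something this commit confirms; `split_bill_identifiers` and `pick_xml_url` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def split_bill_identifiers(bill_id, number):
    """Mirror the string handling in get_bill_by_query (sample formats are assumed)."""
    congress = bill_id.split('-')[1]      # "hr1234-117" -> "117"
    bill_number = number.split('.')[-1]   # "H.R.1234"   -> "1234"
    return congress, bill_number


def pick_xml_url(formats, wanted_type="Formatted XML"):
    """Return the URL of the first format whose 'type' matches, or None.

    Hypothetical alternative to formats[2]['url']; the 'type' key and its
    value are assumptions about the Congress API payload.
    """
    for fmt in formats:
        if fmt.get("type") == wanted_type:
            return fmt.get("url")
    return None


def get_all_text(element):
    """Same recursion as BillRetriever.get_all_text, copied here for comparison."""
    text = element.text or ''
    for child in element:
        text += get_all_text(child)
        if child.tail:
            text += child.tail
    return text


# Made-up sample data for illustration only
sample_formats = [
    {"type": "Formatted Text", "url": "https://example.invalid/bill.htm"},
    {"type": "Formatted XML", "url": "https://example.invalid/bill.xml"},
]
root = ET.fromstring("<bill><title>An Act</title> to do <b>things</b>.</bill>")

print(split_bill_identifiers("hr1234-117", "H.R.1234"))
print(pick_xml_url(sample_formats))
# itertext() walks text and tails in document order, like the recursion above
print(get_all_text(root) == "".join(root.itertext()))
```

Selecting by label would keep working if the order of the formats changed, at the cost of depending on the label text staying the same.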

legal_llama/chat_bot_interface.py ADDED

	@@ -0,0 +1,83 @@

+from legal_llama.dialog_management import DialogManager
+import streamlit as st
+class ChatBotInterface:
+    def __init__(self):
+        """Initializes the chatbot interface, sets the page title, and initializes the DialogManager."""
+        # Set up Streamlit page configuration
+        st.set_page_config(page_title="Legal LLaMa 🦙")
+        st.title("Legal LLaMa 🦙")
+        # Define roles
+        self.user = "user"
+        self.llama = "Assistant"
+        # Initialize the DialogManager for managing conversations
+        self.dialog_manager = DialogManager()
+        # Initialize chat history in the session state if it doesn't exist
+        if "messages" not in st.session_state:
+            st.session_state.messages = []
+        # Start the conversation with a greeting message
+        first_message = ("Hello there! I'm Legal LLaMa, your friendly guide to the complex world of U.S. legislation."
+                         "\n\nThink of me as a law student who is always eager to learn and share knowledge. Right now, "
+                         "my skills are a bit limited, but I can certainly help you understand the gist of the latest "
+                         "bills proposed in the U.S. Congress. You just have to provide me with a topic - could be "
+                         "climate change, prison reform, healthcare, you name it! I'll then fetch the latest related "
+                         "bill and serve you up a digestible summary.\n\nRemember, being a law student (and a LLaMa, no "
+                         "less!) is tough, so if I miss a step, bear with me. I promise to get better with every "
+                         "interaction. So, what topic intrigues you today?")
+        self.display_message(self.llama, first_message)
+    @staticmethod
+    def display_chat_history():
+        """Displays the chat history stored in the session state."""
+        for message in st.session_state.messages:
+            with st.chat_message(message["role"]):
+                st.markdown(message["content"])
+    @staticmethod
+    def add_message_to_history(role, chat):
+        """Adds a message to the chat history in the session state."""
+        st.session_state.messages.append({"role": role, "content": chat})
+    @staticmethod
+    def display_message(role, text):
+        """Displays a chat message in the chat interface."""
+        st.chat_message(role).markdown(text)
+    def handle_user_input(self, user_input):
+        """Handles user input by recognizing the intent and updating the dialog frame."""
+        # In future, use the IntentRecognizer to check for intent
+        intent = "bill_summarization"
+        # Update the dialog frame based on the recognized intent
+        self.dialog_manager.set_frame(intent, user_input)
+    def continue_conversation(self):
+        """Continues the conversation by displaying chat history, handling user input, and generating responses."""
+        # Display chat history
+        self.display_chat_history()
+        # Handle user input
+        if prompt := st.chat_input("Ask your questions here!"):
+            # Display user message
+            self.display_message(self.user, prompt)
+            # Add user message to chat history
+            self.add_message_to_history(self.user, prompt)
+            # Handle user input (recognize intent and update frame)
+            self.handle_user_input(prompt)
+            with st.spinner('Processing your request...'):
+                # Generate response based on the current dialog frame
+                response = self.dialog_manager.generate_response()
+            # Display assistant response
+            self.display_message(self.llama, response)
+            # Add assistant response to chat history
+            self.add_message_to_history(self.llama, response)
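
Streamlit reruns the script from the top on every interaction, which is why the history is kept in `st.session_state` and replayed by `display_chat_history` before new input is handled. The sketch below imitates that cycle with a plain dict standing in for the session state; `run_once` and `echo_bot` are hypothetical names, and no Streamlit calls are involved.

```python
def echo_bot(prompt):
    """Hypothetical stand-in for DialogManager.generate_response."""
    return f"(summary for '{prompt}')"


def run_once(session_state, user_input=None):
    """Simulate one script run and return what would be rendered, in order."""
    rendered = []
    messages = session_state.setdefault("messages", [])
    # Replay everything stored by earlier runs
    for message in messages:
        rendered.append((message["role"], message["content"]))
    # Handle new input, if any, and persist both sides of the exchange
    if user_input:
        reply = echo_bot(user_input)
        for role, content in (("user", user_input), ("Assistant", reply)):
            rendered.append((role, content))
            messages.append({"role": role, "content": content})
    return rendered


state = {}
first = run_once(state, "healthcare")
second = run_once(state, "prison reform")
print(len(first), len(second))
```

The second run renders the first exchange again before the new one, which is the behavior the stored `messages` list provides in the real app.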

legal_llama/dialog_management.py ADDED

	@@ -0,0 +1,83 @@

+from legal_llama.bill_retrieval import BillRetriever
+from legal_llama.summarizer import BillSummarizer
+class DialogManager:
+    """
+    A class for managing conversation frames.
+    """
+    def __init__(self):
+        """
+        Initialize the DialogManager with predefined frames.
+        """
+        self.frames = {
+            "bill_summarization": {
+                "intent": "bill_summarization",
+                "bill_query": None,
+            },
+            # Add more frames here as needed
+        }
+        self.current_frame = None
+    def set_frame(self, intent, slot):
+        """
+        Set the current frame based on the recognized intent and provided slot value.
+        Parameters:
+            intent (str): The recognized intent.
+            slot (str): The value of the slot provided by the user.
+        """
+        # Update this function in the future to check for intent.
+        frame = self.frames.get(intent)
+        if frame is not None:
+            # Copy the template so the predefined frame is never mutated
+            self.current_frame = frame.copy()
+            self.update_slot('bill_query', slot)
+        else:
+            self.current_frame = None
+            print(f"Unrecognized intent: {intent}")
+    def update_slot(self, slot_name, slot_value):
+        """
+        Update the value of a slot in the current frame.
+        Parameters:
+            slot_name (str): The name of the slot.
+            slot_value (str): The new value of the slot.
+        """
+        if self.current_frame is not None and slot_name in self.current_frame:
+            # If the current frame is set and the slot name exists in the frame, update the slot value
+            self.current_frame[slot_name] = slot_value
+        else:
+            print(f"Cannot update slot '{slot_name}' - no current frame or slot does not exist")
+    def generate_response(self):
+        """
+        Generate a response based on the current frame.
+        Returns:
+            str: The generated response.
+        """
+        # Check if a frame has been set; always return a string so the UI has something to display
+        if self.current_frame is None:
+            print("No frame has been set")
+            return "Sorry, I couldn't understand that request."
+        frame = self.current_frame
+        if frame.get('intent') == 'bill_summarization':
+            # Extract the bill's text
+            bill_retriever = BillRetriever()
+            bill_text = bill_retriever.get_bill_by_query(frame['bill_query'])
+            if bill_text is None:
+                print("Unable to retrieve bill text")
+                return "Sorry, I couldn't find a bill with available text for that topic."
+            # Summarize the bill's text
+            summarizer = BillSummarizer()
+            summary = summarizer.summarize(bill_text)
+            if summary is None:
+                print("Unable to summarize bill text")
+                return "Sorry, I couldn't summarize this bill. Please try again."
+            return summary
+        else:
+            print(f"Unrecognized frame intent: {frame.get('intent')}")
+            return "Sorry, I can't handle that kind of request yet."
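
The frame-and-slot idea behind `DialogManager` can be sketched without the retrieval and summarization steps. This is a minimal, self-contained illustration, not the project's implementation: `MiniDialogManager` and its reply strings are hypothetical, and unknown intents are rejected up front so that no empty frame is ever stored.

```python
import copy


class MiniDialogManager:
    """Hypothetical, stripped-down frame manager for illustration only."""

    # One template per intent; slots start out empty (None)
    FRAMES = {
        "bill_summarization": {"intent": "bill_summarization", "bill_query": None},
    }

    def __init__(self):
        self.current_frame = None

    def set_frame(self, intent, slot_value):
        """Activate a copy of the frame for `intent` and fill its query slot."""
        template = self.FRAMES.get(intent)
        if template is None:
            self.current_frame = None
            return False
        # Copy so the shared template keeps its empty slot
        self.current_frame = copy.deepcopy(template)
        self.current_frame["bill_query"] = slot_value
        return True

    def generate_response(self):
        """Always return a string, even when no frame is active."""
        if self.current_frame is None:
            return "Sorry, I can't handle that request yet."
        return f"Looking up the latest bill about: {self.current_frame['bill_query']}"


manager = MiniDialogManager()
manager.set_frame("bill_summarization", "climate change")
print(manager.generate_response())
manager.set_frame("weather_report", "Paris")
print(manager.generate_response())
```

Returning a boolean from `set_frame` is one possible way to let the caller react to an unrecognized intent; the commit itself only logs that case.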

legal_llama/summarizer.py ADDED

	@@ -0,0 +1,54 @@

+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import streamlit as st
+@st.cache_resource
+def load_model():
+    # Cached by Streamlit so the tokenizer and model are loaded only once per process
+    tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-led-base-16384")
+    model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-led-base-16384")
+    return tokenizer, model
+class BillSummarizer:
+    def __init__(self):
+        """
+        Initialize a BillSummarizer, which uses the Hugging Face transformers library to summarize bills.
+        """
+        try:
+            self.tokenizer, self.model = load_model()
+        except Exception as e:
+            print(f"Error initializing summarizer pipeline: {e}")
+    def summarize(self, bill_text):
+        """
+        Summarize a bill's text using the summarization pipeline.
+        Parameters:
+            bill_text (str): The text of the bill to be summarized.
+        Returns:
+            str: The summarized text.
+        """
+        try:
+            # padding="max_length" already pads to max_length; the deprecated pad_to_max_length flag is not needed
+            input_tokenized = self.tokenizer.encode(bill_text, return_tensors='pt',
+                                                    padding="max_length",
+                                                    max_length=6144,
+                                                    truncation=True)
+            summary_ids = self.model.generate(input_tokenized,
+                                              num_beams=4,
+                                              no_repeat_ngram_size=3,
+                                              length_penalty=2,
+                                              min_length=350,
+                                              max_length=500)
+            summary = [self.tokenizer.decode(g,
+                                             skip_special_tokens=True,
+                                             clean_up_tokenization_spaces=False)
+                       for g in summary_ids][0]
+            return summary
+        except Exception as e:
+            print(f"Error summarizing text: {e}")
+            return "Sorry, I couldn't summarize this bill. Please try again."

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+transformers~=4.31.0
+torch~=2.0.1
+streamlit~=1.24.1
+requests~=2.31.0