title: Lithuanian Law RAG QA ChatBot Streamlit
emoji: 📊
colorFrom: green
colorTo: yellow
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: false
license: apache-2.0
Chat with Lithuanian Law Documents
This Streamlit application lets users chat with a virtual assistant that answers questions from Lithuanian law documents. It runs locally on your device using a compact language model.
Important Disclaimer
This application uses a lightweight large language model (LLM), Qwen2-0.5B-Chat_SFT_DPO.Q8_gguf, so that it can run smoothly on your own device. The model is efficient, but it has limitations:
Potential for Hallucination: Because of its small size and its training data, the model may occasionally generate responses that are inconsistent with the provided documents or factually wrong.
Stray Characters: In rare instances, the model may insert nonsensical characters, including Chinese characters, into a response.
Keep these limitations in mind when using the application and interpreting its responses.
Features
Users can choose the retrieval type: similarity search or maximum marginal relevance (MMR) search.
Users can specify the number of documents to retrieve.
Users can ask questions about the provided documents.
The virtual assistant answers from the retrieved documents using a compact, resource-efficient large language model (LLM) that runs locally.
Technical Details
Sentence Similarity:
The application utilizes the Alibaba-NLP/gte-base-en-v1.5 model for efficient sentence embedding, allowing for semantic similarity comparisons between user queries and the legal documents.
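A minimal sketch of the similarity comparison described above, assuming cosine similarity is the metric (the README does not state which metric the application uses). The short vectors are toy stand-ins for real embeddings, and the function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (assumption: real embeddings are much longer
# and come from the embedding model, not from hand-written lists).
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

scores = {
    "doc_close": cosine_similarity(query, doc_close),
    "doc_far": cosine_similarity(query, doc_far),
}
```

A document whose embedding points in nearly the same direction as the query embedding scores close to 1, while an unrelated one scores close to 0.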
Local Vector Store:
Chroma acts as a local vector store, storing and managing the document embeddings for fast retrieval.
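The two retrieval types listed under Features differ in how they pick documents from the vector store. The sketch below illustrates the idea in plain Python, assuming cosine similarity as the metric. It is not the application's code: the function names, the toy vectors, and the lambda_mult trade-off parameter are all illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(query_vec, doc_vecs, k):
    """Indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

def mmr_search(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy maximum marginal relevance: balance relevance and diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to documents that were already picked.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy 2-dimensional "embeddings": documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

similarity_top2 = similarity_search(query, docs, k=2)
mmr_top2 = mmr_search(query, docs, k=2)
```

With these toy vectors, plain similarity search returns both near-duplicates, while MMR keeps the best match and then prefers the less redundant third document.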
RAG Chain with Quantized LLM:
A Retrieval-Augmented Generation (RAG) chain is implemented to process user queries. This chain integrates two key components:
Lightweight LLM:
To run locally, the application uses a compact LLM, JCHAVEROT_Qwen2-0.5B-Chat_SFT_DPO.Q8_gguf, with only 0.5 billion parameters. Its name suggests a chat variant of Qwen2 refined with supervised fine-tuning (SFT) and direct preference optimization (DPO), which makes it a reasonable fit for question answering.
Quantization:
The model is quantized: its weights are stored at reduced numeric precision (the Q8 in the file name suggests 8 bits per weight), which shrinks the model with little loss of accuracy. The smaller model is cheaper to run on local hardware, which also reduces its energy use.
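To make the idea of quantization concrete, here is a simplified sketch of symmetric 8-bit quantization with a single scale factor. It only illustrates the principle. The actual GGUF file format is more involved, and the function names and sample weights are invented for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q_values, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in q_values]

weights = [0.50, -1.27, 0.03, 1.00]  # made-up sample weights
q_values, scale = quantize_int8(weights)
restored = dequantize_int8(q_values, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Rough storage estimate for 0.5 billion parameters (ignores scale factors
# and file metadata): 2 bytes per weight at 16-bit, 1 byte at 8-bit.
params = 500_000_000
fp16_gb = params * 2 / 1e9
int8_gb = params * 1 / 1e9
```

Each weight is stored as a small integer, and the rounding error per weight stays within half of the scale factor, which is why accuracy usually degrades only slightly.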
CPU-based Processing:
The application currently runs entirely on the CPU. A GPU could significantly improve processing speed, but the CPU-only approach lets the application run on a wider range of devices.
Benefits of Compact Design
Local Processing:
The compact size of the LLM and of the application itself enables local processing on your device, reducing reliance on cloud-based resources and their associated environmental impact.
Mobile Potential:
Because of its small footprint, the application could be adapted for mobile devices, bringing access to legal information to a wider audience.
Adaptability of Qwen2 0.5B
Fine-tuning:
The Qwen2 0.5B model is capable for its size, and it can be improved further by fine-tuning on legal datasets, which could strengthen its grasp of Lithuanian legal terminology and nuances.
Conversation Style:
Depending on user needs and the desired conversation style, alternative pre-trained models could be explored, trading off model size against specific capabilities.
Requirements
streamlit
langchain
langchain-community
chromadb
transformers
Running the Application
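Install the required libraries.
Set the environment variable lang_api_key to your LangChain API key (if applicable).
Run streamlit run main.py (the Space configuration above lists app.py as app_file; use that name if it is the entry file in your copy).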
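Because the API key is optional, the application code has to cope with the variable being unset. The helper below is one hedged way to read it. The function name is hypothetical, and how the application actually reads lang_api_key is not shown in this README.

```python
import os

def get_optional_api_key(env=None, name="lang_api_key"):
    """Return the stripped key, or None when it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return value or None

# None when the variable is not set, so later code can skip features
# that depend on the key.
api_key = get_optional_api_key()
```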
Code Structure
create_retriever_from_chroma: Creates a document retriever using Chroma and the Alibaba-NLP/gte-base-en-v1.5 model for sentence similarity.
main: Defines the Streamlit application layout and functionality.
handle_userinput: Processes user input, retrieves relevant documents, and generates a response using the compressed LLM retriever within the RAG chain.
create_conversational_rag_chain: Creates a RAG chain for processing user questions with the compressed LLM retriever.
Additional Notes
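How the functions listed under Code Structure fit together can be sketched as a retrieve-then-generate flow. This is a dependency-free illustration, not the application's code. The word-overlap retriever, the prompt wording, the stub model, and every name below are assumptions, and the documents are placeholders rather than real legal text.

```python
def retrieve(question, documents, k=2):
    """Stand-in retriever: rank documents by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\nAnswer:")

def answer_question(question, documents, llm, k=2):
    """Retrieve, build the prompt, then call the model."""
    docs = retrieve(question, documents, k)
    prompt = build_prompt(question, docs)
    return llm(prompt), docs

def stub_llm(prompt):
    """Placeholder for the local model; a real model would generate text."""
    return "stub answer"

documents = [
    "placeholder text about employment contracts",
    "placeholder text about road traffic rules",
    "placeholder text about contracts of sale",
]
answer, used_docs = answer_question("What do contracts require?", documents, stub_llm)
```

In the application itself, the retriever is backed by Chroma and the model call goes to the quantized Qwen2 model. The sketch only shows the order of the steps.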
The Lithuanian law documents might not be the latest versions.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference