---
title: Llm Arch
emoji: π
colorFrom: green
colorTo: indigo
sdk: streamlit
sdk_version: 1.28.2
app_file: Home.py
pinned: false
license: cc-by-sa-4.0
---
# LLM Arch
This project is a demonstration playground for LLM-enabled architectures, built as a submission for the Online MSc in Artificial Intelligence at the University of Bath. The purpose of the project is to explore "LLM-enabled architectures", in which an LLM is used in conjunction with some store of private data. The goal is to provide decision-support information to technical managers on the _how_ of using LLMs with their organisational data, specifically by comparing technical architectures and assessing the organisational implications of those technical choices.
# File Structure
<pre>
llm-arch
├── config
│   └── architectures.json <i>(configuration for the architectures under test and displayed in the UI)</i>
├── data
│   ├── fine_tuning <i>(data and scripts related to fine-tuning LLMs)</i>
│   ├── json <i>(raw json files containing the synthetic private data built for the project)</i>
│   ├── sqlite
│   │   ├── 01_all_products_dataset.db <i>(sqlite db containing all products generated)</i>
│   │   ├── 02_baseline_dataset.db <i>(sqlite db containing the subset of data selected to be the baseline)</i>
│   │   └── test_records.db <i>(sqlite database containing the persisted test results)</i>
│   └── vector_stores <i>(chromadb files containing the document embeddings for the RAG architectures)</i>
├── img
├── pages
├── src
│   ├── data_synthesis <i>(python code related to generating, selecting and loading the private dataset used for the project)</i>
│   ├── training <i>(python code related to training the architectures - not used at runtime)</i>
│   ├── architectures.py <i>(the core architecture pipeline code, including components and trace)</i>
│   ├── common.py <i>(utilities for common functions, e.g. security token access, data type manipulations)</i>
│   ├── datatypes.py <i>(object-oriented representation of the test data and single point for runtime access of the product DB)</i>
│   ├── st_helpers.py <i>(helpers specific to streamlit)</i>
│   └── testing.py <i>(functionality relating to running, recording and reporting on batches of tests)</i>
├── Home.py <i>(main entry point for streamlit - first page in the streamlit app)</i>
├── local_env.yml <i>(conda environment for running the project locally)</i>
├── README.md <i>(readme - this file)</i>
└── requirements.txt <i>(requirements file for additional dependencies in the HF Spaces environment - do not use for local running of the project)</i>
</pre>
# Demonstration environment
The project is available as a demonstration running [here on Hugging Face Spaces](https://huggingface.co/spaces/alfraser/llm-arch). This is the preferred way to interact with the project.
# Building and running the project
The project is split into two separate elements - training and inference - both of which are within this repo. The training elements run offline: they generate the private data and train the models or vector stores for each of the architectures. These elements are hosted under `data_synthesis` and `training` respectively.
For training, it is recommended that you review the training scripts. Each architecture is trained by querying a product set database and converting that data appropriately into a trained architecture.
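As a purely illustrative sketch of that pattern, the conversion from product rows to fine-tuning examples might look like the following. The table and column names (`products`, `name`, `description`) and the prompt template are assumptions for illustration, not the project's real schema - check the training scripts and the sqlite databases for the actual details.

```python
import sqlite3


def products_to_examples(conn: sqlite3.Connection) -> list[dict]:
    """Turn product rows into prompt/completion pairs for fine-tuning.

    The table and column names here are illustrative assumptions,
    not the project's real schema.
    """
    cur = conn.execute("SELECT name, description FROM products")
    return [
        {
            "prompt": f"Tell me about the {name}.",
            "completion": description,
        }
        for name, description in cur
    ]
```

Each architecture would then consume this data differently: a fine-tuned model trains on the pairs, while a RAG architecture would instead embed the descriptions into its vector store.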
To build and run the demonstration project for inference, follow these steps:
## 1 - build the environment
* Check out the code repository locally.
* Build an environment to run the project. A conda environment export (`local_env.yml`) is provided, which also includes the required pip dependencies.
## 2 - set up hugging face endpoints
* As described in the config file `architectures.json`, different architectures can be configured. Each of these has a Llama 2 based LLM at its core. These LLMs are designed to be accessed as Hugging Face inference endpoints, so you should set up an inference endpoint for each of the models you wish to use. The default project uses both a vanilla llama2-7b-chat-hf model and the same model fine-tuned on product-related queries.
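A minimal sketch of calling such an endpoint is shown below. The payload and response shapes assume the standard Hugging Face text-generation endpoint API (`{"inputs": ...}` in, `[{"generated_text": ...}]` out); the project's own client code in `architectures.py` may differ.

```python
import json
from urllib import request


def build_request(token: str, prompt: str, max_new_tokens: int = 256):
    """Build the headers and JSON payload for a text-generation call."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return headers, payload


def query_endpoint(url: str, token: str, prompt: str, timeout: float = 30.0) -> str:
    """POST a prompt to a dedicated inference endpoint and return the text."""
    headers, payload = build_request(token, prompt)
    req = request.Request(url, data=json.dumps(payload).encode("utf-8"), headers=headers)
    with request.urlopen(req, timeout=timeout) as resp:
        # Text-generation endpoints respond with [{"generated_text": "..."}]
        return json.loads(resp.read())[0]["generated_text"]
```

The endpoint URL is the dedicated URL shown in the Hugging Face console for each endpoint, and the token is the same `hf_token` configured in the next step.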
## 3 - configure streamlit secrets
* Within streamlit secrets (either a `.streamlit/secrets.toml` file in the root directory of the local repository, or see the documentation for your chosen hosting solution), you need to configure the following settings (not all are required in every set-up):
1. `hf_token` - the Hugging Face token you will need to generate from your Hugging Face account to access inference.
2. `hf_user` - the user account under which the inference endpoints are running.
3. `endpoints` - a comma-separated list of endpoints to control from the System Status page of the application. This should mirror the names of the models, within your user account, which you plan to use. If it is left blank the System Status page will have no functionality.
4. `running_local` - no value is required; this is just a flag. Setting it disables user logins and allows all users access to all pages.
5. `app_password` - if security is being used, login is required to access all pages except the home page. Passwords are stored in this secrets entry in a simple comma-separated format such as '`password(username),password2(username2)`'. The username is only for display purposes and is not required to log in.
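For local running, the settings above might look like this in `.streamlit/secrets.toml` - all values here are illustrative placeholders, and you would typically set either `running_local` or `app_password` rather than both:

```toml
hf_token = "hf_xxxxxxxxxxxxxxxx"   # placeholder - use your own token
hf_user = "your-hf-username"
endpoints = "llama2-7b-chat-hf,llama2-7b-product-tuned"   # placeholder endpoint names
running_local = "true"   # presence of the key is what matters, not the value
```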
## 4 - run the demo project
* From the root of the local code repository, run `streamlit run Home.py`.