---
title: Llm Arch
emoji: 🚀
colorFrom: green
colorTo: indigo
sdk: streamlit
sdk_version: 1.28.2
app_file: Home.py
pinned: false
license: cc-by-sa-4.0
---

# LLM Arch

This project is a demonstration playground for the LLM-enabled architectures built as a submission for the Online MSc in Artificial Intelligence at the University of Bath. The purpose of the project is to explore "LLM-enabled architectures", in which an LLM is used in conjunction with some store of private data. The goal is to provide decision-support information to technical managers on how to use LLMs with their organisational data, specifically by comparing technical architectures and assessing the organisational implications of those technical choices.

## Demonstration environment

The project is available as a demonstration running here on Hugging Face Spaces. This should be the preferred method to interact with the project.

## Building and running the project

The project is split into two separate elements, training and inference, both of which are within this repo. The training elements run offline: they generate private data and train models or vector stores for each of the architectures. These elements are hosted under `data_synthesis` and `training` respectively.

For training, it is recommended that you review the training scripts. Each architecture is trained by querying a product set database and converting the results appropriately into a trained architecture.
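The real conversion logic is in the scripts under `training`; purely as a loose, hypothetical sketch of the idea (the database file, table and column names, and output format below are invented for illustration and are not the project's actual schema), turning product rows into fine-tuning examples could look something like this:

```python
# Hypothetical sketch: pull rows from a product database and emit
# prompt/response pairs for fine-tuning.  "products.db", the "products"
# table and its columns are assumptions for illustration only.
import json
import sqlite3

conn = sqlite3.connect("products.db")
rows = conn.execute("SELECT name, description, price FROM products").fetchall()
conn.close()

with open("fine_tune_examples.jsonl", "w") as out:
    for name, description, price in rows:
        example = {
            "prompt": f"Tell me about the {name}.",
            "response": f"{description} It is priced at {price}.",
        }
        out.write(json.dumps(example) + "\n")
```

For the architectures that use a vector store rather than a fine-tuned model, the same kind of query output would instead be turned into documents for indexing.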

To build and run the demonstration project for inference, follow these steps:

### 1 - Build the environment

- Build an environment to run the project. A conda environment export (`local_env.yml`) is provided, which also includes the required pip dependencies; it can be restored with `conda env create -f local_env.yml`.
- Check out the code repository locally.

### 2 - Set up Hugging Face endpoints

- As described in the config file `architectures.json`, different architectures can be configured. Each of these has a Llama 2 based LLM at its core, and these LLMs are designed to be accessed as Hugging Face inference endpoints. You should set up a Hugging Face inference endpoint for each of the models you wish to use. The default project uses both a vanilla llama2-7b-chat-hf model and the same model fine-tuned on product-related queries.
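The application contains its own code for calling these endpoints; purely as a hedged sketch of what accessing such an endpoint looks like with the `huggingface_hub` client (the endpoint URL and token below are placeholders, not project values), a direct call could look like this:

```python
# Minimal sketch of querying a Hugging Face inference endpoint directly.
# The endpoint URL and token are placeholders for illustration only.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint-id>.endpoints.huggingface.cloud",
    token="hf_xxx",  # personal Hugging Face access token
)

# Ask the hosted Llama 2 chat model a product-style question.
reply = client.text_generation(
    "What products do you sell for home networking?",
    max_new_tokens=256,
)
print(reply)
```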

### 3 - Configure Streamlit secrets

- Within Streamlit secrets (either a `.streamlit/secrets.toml` file in the root directory of the local repository, or see the documentation for your chosen hosting solution), you need to configure the relevant subset of the following settings; an example file is shown after this list:
  1. `hf_token` - the Hugging Face token you will need to generate from your Hugging Face account to access inference.
  2. `hf_user` - the user account under which the inference endpoints are running.
  3. `endpoints` - a comma-separated list of endpoints to control from the System Status page of the application. This should mirror the names of the endpoints under your user account that you plan to use. It can be left blank, in which case the System Status page will have no functionality.
  4. `running_local` - no value needs to be set; this is just a flag. Setting it disables user logins and allows all users access to all pages.
  5. `app_password` - if security is being used, login is required to access all pages except the home page. Passwords are stored in this secrets entry in a simple comma-separated format such as `password(username),password2(username2)`. The username is only for display purposes and is not required to log in.
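As an illustration, a minimal local `.streamlit/secrets.toml` might look like the following. All values below are placeholders; in particular, the endpoint names depend on what you created in step 2.

```toml
# .streamlit/secrets.toml -- example placeholder values only
hf_token = "hf_xxxxxxxxxxxxxxxx"                    # access token generated from your Hugging Face account
hf_user = "your-hf-username"                        # account under which the inference endpoints run
endpoints = "llama2-7b-chat,llama2-7b-chat-tuned"   # comma-separated endpoint names (may be left blank)
running_local = ""                                  # presence of this key disables user logins
# app_password = "password(alice),password2(bob)"   # only needed when logins are enabled
```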

### 4 - Run the demo project

- From the root of the local code repository, run `streamlit run Home.py`.