  1. The system's architecture is designed to mitigate online toxicity by transforming text inputs into less provocative forms using Large Language Models (LLMs), which analyse and rewrite the text.
  2. Several workers, or LLM interfaces, are defined, each suited to a specific operational environment.
  3. The HTTP server worker is optimised for development, supporting dynamic code updates without server restarts; it can work offline, with or without a GPU, using the llama-cpp-python library, provided a model has been downloaded.
  4. An in-memory worker is used by the serverless worker.
  5. For on-demand, scalable processing, the system includes a RunPod API worker that leverages serverless GPU functions.
  6. Additionally, the Mistral API worker offers a paid service alternative for text processing tasks.
  7. A set of environment variables is predefined to configure the LLM workers' functionality; a configuration sketch appears after this list.
  8. The LLM_WORKER environment variable sets the active LLM worker.
  9. The N_GPU_LAYERS environment variable specifies how many model layers are offloaded to the GPU, defaulting to the maximum available; it applies when the LLM worker is run with a GPU.
  10. CONTEXT_SIZE is an adjustable parameter that defines how much text, measured in tokens, the LLM can process at once.
  11. The LLM_MODEL_PATH environment variable indicates the LLM model's storage location, which can be either local or sourced from the HuggingFace Hub.
  12. The system enforces some rate limiting to maintain service integrity and equitable resource distribution.
  13. The LAST_REQUEST_TIME and REQUEST_INTERVAL global variables implement the Mistral API rate limiting; a sketch appears after this list.
  14. The system's worker architecture is somewhat modular, enabling easy integration or replacement of components such as LLM workers.
  15. The system is capable of streaming responses in some modes, allowing for real-time interaction with the LLM.
  16. The llm_streaming function handles communication with the LLM via HTTP streaming when the server worker is active; a sketch appears after this list.
  17. The llm_stream_sans_network function provides an alternative for local LLM inference without network dependency.
  18. For serverless deployment, the llm_stream_serverless function interfaces with the RunPod API.
  19. The llm_stream_mistral_api function facilitates interaction with the Mistral API for text processing.
  20. The system includes a utility function, replace_text, for template-based text replacement operations; a sketch appears after this list.
  21. A scoring function, calculate_overall_score, combines several metrics to evaluate the effectiveness of a text transformation.
  22. The query_ai_prompt function serves as a dispatcher, directing text processing requests to the chosen LLM worker; a sketch appears after this list.
  23. The inference_binary_check function within app.py ensures compatibility with the available hardware, particularly GPU presence.
  24. The system provides a user interface through Gradio, enabling end-users to interact with the text transformation service.
  25. The chill_out function in app.py is the entry point for processing user inputs through the Gradio interface.
  26. The improvement_loop function in chill.py controls the iterative process of text refinement using the LLM; a sketch appears after this list.
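
A minimal sketch of the environment-variable configuration described in items 7–11. The default values and the worker name used here are illustrative assumptions, not the project's actual defaults.

```python
import os

# Active LLM worker (item 8); the worker names are assumptions.
LLM_WORKER = os.environ.get("LLM_WORKER", "server")

# GPU layers to offload (item 9); -1 follows the llama-cpp-python
# convention for "all layers" and is used here as an illustrative default.
N_GPU_LAYERS = int(os.environ.get("N_GPU_LAYERS", "-1"))

# Context window in tokens (item 10); 2048 is an illustrative default.
CONTEXT_SIZE = int(os.environ.get("CONTEXT_SIZE", "2048"))

# Model location (item 11): a local path or a HuggingFace Hub reference.
LLM_MODEL_PATH = os.environ.get("LLM_MODEL_PATH", "")
```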
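
A minimal sketch of the interval-based rate limiting described in items 12–13. The interval value and the helper name wait_for_rate_limit are assumptions; only the two global variables are named in the map above.

```python
import time

LAST_REQUEST_TIME = None  # timestamp of the previous Mistral API call
REQUEST_INTERVAL = 5.0    # minimum seconds between calls; illustrative value

def wait_for_rate_limit():
    """Block until enough time has passed since the last request (hypothetical helper)."""
    global LAST_REQUEST_TIME
    if LAST_REQUEST_TIME is not None:
        elapsed = time.time() - LAST_REQUEST_TIME
        if elapsed < REQUEST_INTERVAL:
            time.sleep(REQUEST_INTERVAL - elapsed)
    LAST_REQUEST_TIME = time.time()
```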
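
A sketch of the HTTP streaming used by llm_streaming (item 16), assuming the llama-cpp-python server's OpenAI-compatible completions endpoint with server-sent-event framing; the URL and generation parameters are assumptions.

```python
import json
import requests

def llm_streaming(prompt, url="http://localhost:8000/v1/completions"):
    """Yield text chunks streamed from a local llama-cpp-python server."""
    payload = {"prompt": prompt, "stream": True, "max_tokens": 512}
    with requests.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for raw_line in response.iter_lines():
            if not raw_line:
                continue  # skip blank keep-alive lines between SSE events
            line = raw_line.decode("utf-8")
            if not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data.strip() == "[DONE]":
                break  # end-of-stream sentinel
            chunk = json.loads(data)
            yield chunk["choices"][0]["text"]
```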
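
A sketch of the template-based replacement performed by replace_text (item 20); the placeholder convention shown is an assumption.

```python
def replace_text(template: str, replacements: dict) -> str:
    """Substitute each placeholder in the template with its value."""
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template

# Example usage, with an assumed placeholder convention:
prompt = replace_text("Rewrite this text: {original_text}",
                      {"{original_text}": "You are wrong!"})
```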
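
A sketch of the dispatch performed by query_ai_prompt (item 22). The worker identifiers are assumptions; the four stream functions are the ones named in items 16–19 and are assumed to be in scope.

```python
import os

def query_ai_prompt(prompt):
    """Route a text processing request to the worker chosen by LLM_WORKER."""
    worker = os.environ.get("LLM_WORKER", "server")
    if worker == "server":
        return llm_streaming(prompt)            # HTTP server worker (item 16)
    if worker == "in_memory":
        return llm_stream_sans_network(prompt)  # local, no network (item 17)
    if worker == "runpod":
        return llm_stream_serverless(prompt)    # RunPod serverless (item 18)
    if worker == "mistral":
        return llm_stream_mistral_api(prompt)   # Mistral API (item 19)
    raise ValueError(f"Unknown LLM_WORKER: {worker!r}")
```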
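
A sketch of the iterative refinement controlled by improvement_loop in chill.py (item 26). The rewrite step, iteration limit, and score threshold are illustrative assumptions; calculate_overall_score is the scoring function from item 21.

```python
def improvement_loop(text, max_iterations=3, good_enough=0.9):
    """Refine the text until it scores well enough or attempts run out."""
    best_text, best_score = text, 0.0
    for _ in range(max_iterations):
        # In the real system the rewrite would go through query_ai_prompt
        # (item 22); rewrite_calmer is a hypothetical stand-in.
        candidate = rewrite_calmer(best_text)
        score = calculate_overall_score(candidate)  # combined metrics (item 21)
        if score > best_score:
            best_text, best_score = candidate, score
        if best_score >= good_enough:
            break
    return best_text
```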