How do I install this locally and use it offline? Any references or videos would be great.
Then you want llama-cpp-python, which provides Python bindings for llama.cpp. It lets you load GGML files exactly the same way llama.cpp does, but makes them easily accessible from code.
You can then use it either directly from your own Python code, or via its OpenAI-compatible API server, which you can point LangChain or any other client at.
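For what it's worth, here is a rough sketch of both routes, not a definitive setup. It assumes you have installed the package (typically `pip install llama-cpp-python`, or `pip install 'llama-cpp-python[server]'` for the API server) and that your installed version still reads GGML files. The model path and the helper function names are placeholders I made up for illustration. The server address assumes you started it with `python -m llama_cpp.server --model <path to model>` and left it on its default host and port.

```python
import json
import urllib.request

# Assumption: placeholder path; point this at your own GGML model file.
MODEL_PATH = "./models/model.ggml.bin"
# Assumption: the llama-cpp-python server is running locally on its default address.
API_BASE = "http://localhost:8000/v1"


def complete_direct(prompt, model_path=MODEL_PATH, max_tokens=64):
    """Option 1: load the model in-process and run a completion."""
    # Imported lazily so the rest of this file works without the package installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path)
    result = llm(prompt, max_tokens=max_tokens)
    # The result follows the OpenAI completion layout.
    return result["choices"][0]["text"]


def build_api_request(prompt, api_base=API_BASE, max_tokens=64):
    """Option 2: build an OpenAI-style completion request for the local server."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        api_base + "/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete_via_api(prompt, **kwargs):
    """Send the request to the local server and return the generated text."""
    with urllib.request.urlopen(build_api_request(prompt, **kwargs)) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Calling `complete_direct("Q: What is llama.cpp? A:")` runs everything inside your own process, while `complete_via_api(...)` talks to the local server over HTTP. Neither needs an internet connection once the model file is on disk. Because the server speaks the OpenAI request format, any OpenAI-style client (LangChain included) should work if you set its base URL to the local address instead of the hosted one.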
Thanks everyone, goal accomplished