
Experimental 2-expert MoE of bakllava-multimodal.

This is a custom 6-bit mixed-layer quantization, tuned for the best performance at an 11 GB file size (it fits on an NVIDIA 1080 Ti and above).

It also works with stock llama.cpp and LM Studio when loaded locally.

Extremely good at chat-only use; multimodal performance is variable (sometimes it just talks to itself nonstop). It can respond multilingually with some prompt engineering.

To run it, download the two GGUF files to your home (~) folder on Ubuntu, build llama.cpp, and start its server; then open the local web UI at http://127.0.0.1:8080:

git clone https://github.com/ggerganov/llama.cpp/ && cd llama.cpp && make LLAMA_CUBLAS=1 -j

./server -m ~/bakllava-14b-2xmoe-6bit.gguf --mmproj ~/projector-bakllava-14b-2xmoe-f16.gguf -t 8 --host localhost -ngl 42
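Once the server is up, you can also query it over HTTP instead of the web UI. A minimal sketch against the llama.cpp server's /completion endpoint (the prompt text and n_predict value here are placeholders, not from this model card):

curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "USER: Hello, who are you?\nASSISTANT:", "n_predict": 64, "temperature": 0.6}'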

Below is the model running without a GPU at all.

[Screenshot: chest X-ray input] What's interesting is that this model has had no medical fine-tuning at all. Red-green highlighting indicates confidence levels.

[Screenshot: confidence-highlighted response] In the example above, the model has low confidence in the abnormality being a pulmonary infection (the image actually shows lung cancer).
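To reproduce the GPU-free run, build without the CUDA flag and drop -ngl (a sketch assuming the same file layout as above; expect much slower generation):

make -j
./server -m ~/bakllava-14b-2xmoe-6bit.gguf --mmproj ~/projector-bakllava-14b-2xmoe-f16.gguf -t 8 --host localhost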

To reproduce these results, use temp=0.6 with the following system prompt:

You are a medical doctor AI named Llama who is an expert at reading xrays and diagnosing conditions.
You answer the User requests with medical precision.
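As a sketch, the same settings sent over the HTTP API with an image attached. The image_data field and the [img-1] prompt tag follow llama.cpp's multimodal server convention; the USER:/ASSISTANT: template and the xray.png file name are assumptions, so adjust them to your setup:

IMG=$(base64 -w0 ~/xray.png)   # base64-encode the image for the JSON payload
curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a medical doctor AI named Llama who is an expert at reading xrays and diagnosing conditions.\nYou answer the User requests with medical precision.\nUSER: [img-1] What does this chest xray show?\nASSISTANT:",
    "temperature": 0.6,
    "n_predict": 256,
    "image_data": [{"data": "'"$IMG"'", "id": 1}]
  }'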

With a combination of DPO/self-play and confidence-level training, this has the potential to be state of the art.

Minimum requirements: any laptop with 16 GB of RAM.
