---
base_model:
- meta-llama/Meta-Llama-3-8B
- nvidia/Llama3-ChatQA-1.5-8B
- winglian/llama-3-8b-256k-PoSE
library_name: transformers
tags:
- mergekit
- peft
- nvidia
- chatqa-1.5
- chatqa
- llama-3
- pytorch
license: llama3
language:
- en
pipeline_tag: text-generation
---

# Llama3-ChatQA-1.5-8B-256K

I built this model to enable a long-context RAG pipeline, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.

This model is an "amalgamation" of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`.

## Recipe

First I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora). After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` model. A reproduction sketch is included at the end of this card.

## Prompt Format

Since the base model wasn't finetuned for any specific format, we can use ChatQA's chat format:

```text
System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
```

A minimal inference example following this format is included at the end of this card.

Big thanks to the Meta team, the Nvidia team, and of course Wing Lian.

## Notes

This model has not been tested on any benchmarks due to compute limitations, and the base model wasn't evaluated with a `Needle in a Haystack` test either. There is a real possibility that this model performs worse than both of the original models.
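
## Reproduction Sketch

A minimal, untested sketch of the recipe above. The extraction step uses `mergekit`'s LoRA extraction script (the exact flags vary between `mergekit` versions), and the merge step uses `peft`. The output paths and the rank value are placeholders, not necessarily the settings actually used.

```python
# Step 1: extract a LoRA adapter from ChatQA against the Llama-3 base.
# Done with mergekit's extraction script; flags vary by mergekit version:
#   mergekit-extract-lora nvidia/Llama3-ChatQA-1.5-8B \
#       meta-llama/Meta-Llama-3-8B ./chatqa-lora --rank=128
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Step 2: apply the extracted adapter to the 256K PoSE base model
# and fold the LoRA weights into the base weights.
base = AutoModelForCausalLM.from_pretrained(
    "winglian/llama-3-8b-256k-PoSE",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "beratcmn/Llama3-ChatQA-1.5-8B-lora")
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama3-ChatQA-1.5-8B")
model.save_pretrained("./Llama3-ChatQA-1.5-8B-256K")
tokenizer.save_pretrained("./Llama3-ChatQA-1.5-8B-256K")
```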
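
## Example Usage

A minimal inference sketch following the prompt format above. The repo id, system message, context, and question are placeholder assumptions; substitute your own.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beratcmn/Llama3-ChatQA-1.5-8B-256K"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system = (
    "System: This is a chat between a user and an artificial intelligence "
    "assistant. The assistant gives helpful answers based on the context."
)
context = "PoSE extends a model's context window beyond its pretraining length."
question = "User: What does PoSE do?"

# Assemble the turns as in the format block above; the trailing
# "Assistant:" cues the model to generate the response.
prompt = f"{system}\n\n{context}\n\n{question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```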