---
base_model:
  - meta-llama/Meta-Llama-3-8B
  - nvidia/Llama3-ChatQA-1.5-8B
  - winglian/llama-3-8b-256k-PoSE
library_name: transformers
tags:
  - mergekit
  - peft
  - nvidia
  - chatqa-1.5
  - chatqa
  - llama-3
  - pytorch
license: llama3
language:
  - en
pipeline_tag: text-generation
---

# Llama3-ChatQA-1.5-8B-256K

I tried to build a long-context RAG pipeline with this model, but I had very limited resources to test the workflow. Keep in mind that this is an experiment.

This model is an 'amalgamation' of winglian/llama-3-8b-256k-PoSE and nvidia/Llama3-ChatQA-1.5-8B.

## Recipe

First, I extracted the LoRA adapter from nvidia/Llama3-ChatQA-1.5-8B using mergekit. You can find the adapter here.

After the extraction, I merged the adapter with the winglian/llama-3-8b-256k-PoSE model.
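A minimal sketch of this merge step with `transformers` and `peft` is shown below. This is not necessarily the exact command sequence used here; the adapter id is a placeholder for the extracted ChatQA LoRA adapter, and the dtype/output path are assumptions.

```python
# Sketch: apply the extracted ChatQA LoRA adapter to the 256K PoSE base and bake it in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "winglian/llama-3-8b-256k-PoSE"
adapter_id = "path/to/extracted-chatqa-lora"  # placeholder: adapter extracted with mergekit

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the LoRA adapter on top of the long-context base, then merge the weights.
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
tokenizer.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
```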

## Prompt Format

Since the base model wasn't fine-tuned for any specific format, we can use ChatQA's chat format.

```
System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
```
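Here is a rough generation sketch using that format. The repo id, system message, context, and question below are illustrative placeholders, not tested values.

```python
# Sketch: build a ChatQA-style prompt and generate with the merged model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beratcmn/Llama3-ChatQA-1.5-8B-256K"  # assumed repo id; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

system = "System: This is a chat between a user and an AI assistant. Answer based on the context."
context = "Retrieved documents for the RAG pipeline go here."  # placeholder context
question = "User: What does the context say?"

prompt = f"{system}\n\n{context}\n\n{question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```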

Big thanks to the Meta team, the NVIDIA team, and of course Wing Lian.

## Notes

This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with Needle in a Haystack either. There is a real possibility that this model performs worse than both of the original models.