---
base_model:
- meta-llama/Meta-Llama-3-8B
- nvidia/Llama3-ChatQA-1.5-8B
- winglian/llama-3-8b-256k-PoSE
library_name: transformers
tags:
- mergekit
- peft
- nvidia
- chatqa-1.5
- chatqa
- llama-3
- pytorch
license: llama3
language:
- en
pipeline_tag: text-generation
---

# Llama3-ChatQA-1.5-8B-256K

I tried to achieve a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.

This model is an 'amalgamation' of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`. 

## Recipe

First, I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora).

After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` model.
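
For reference, the merge step can be reproduced with `peft` roughly as follows. This is a minimal sketch, assuming the extracted adapter is a standard PEFT LoRA adapter; the dtype and output directory are illustrative choices, not the exact settings used.

```python
# Minimal sketch of the merge step, assuming the adapter extracted with mergekit
# is a standard PEFT LoRA adapter. The dtype and output path are illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "winglian/llama-3-8b-256k-PoSE"
adapter_id = "beratcmn/Llama3-ChatQA-1.5-8B-lora"

# Load the 256K-context base model.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Apply the extracted ChatQA LoRA adapter on top of it.
model = PeftModel.from_pretrained(base, adapter_id)

# Fold the adapter weights into the base weights and save the standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
AutoTokenizer.from_pretrained(base_id).save_pretrained("Llama3-ChatQA-1.5-8B-256K")
```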

## Prompt Format

Since the base model wasn't fine-tuned on any specific format, we can use ChatQA's chat format.

```text
System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
```
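
For illustration, here is a minimal sketch of running the model with this prompt format via `transformers`. The repository id, system message, context document, question, and generation settings below are placeholders, not tested values.

```python
# Minimal usage sketch. The repo id, system/context/question strings, and
# generation settings are placeholders and have not been validated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beratcmn/Llama3-ChatQA-1.5-8B-256K"  # assumed repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system = "System: You are a helpful assistant that answers based on the given context."
context = "<your retrieved document goes here>"
question = "User: <your question about the document goes here>"

# Assemble the ChatQA-style prompt shown above.
prompt = f"{system}\n\n{context}\n\n{question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```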

Big thanks to the Meta team, the Nvidia team, and of course Wing Lian.

## Notes

This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with a `Needle in a Haystack` test either. There is a real possibility that this model performs worse than both of the original models.