---
base_model:
- meta-llama/Meta-Llama-3-8B
- nvidia/Llama3-ChatQA-1.5-8B
- winglian/llama-3-8b-256k-PoSE
library_name: transformers
tags:
- mergekit
- peft
- nvidia
- chatqa-1.5
- chatqa
- llama-3
- pytorch
license: llama3
language:
- en
pipeline_tag: text-generation
---
# Llama3-ChatQA-1.5-8B-256K
I tried to achieve a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.
This model is an 'amalgamation' of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`.
## Recipe
First, I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora).
After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` base model.
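A minimal sketch of how the merge step could be reproduced with `peft` and `transformers`; the adapter repo is the one linked above, and the output path is illustrative, not necessarily how the final checkpoint was produced:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the long-context base model in bfloat16 to keep memory manageable.
base = AutoModelForCausalLM.from_pretrained(
    "winglian/llama-3-8b-256k-PoSE", torch_dtype=torch.bfloat16
)

# Apply the extracted ChatQA LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "beratcmn/Llama3-ChatQA-1.5-8B-lora")

# Fold the adapter into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")  # output path is illustrative
```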
## Prompt Format
Since the base model wasn't fine-tuned on any specific format, we can use ChatQA's chat format.
```text
System: {System}
{Context}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
```
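As a usage sketch, here is how the format above could be assembled for inference with `transformers`. The repo id, system prompt, context passage, and question are placeholder assumptions, not tested values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this merge; substitute your own path if you rebuilt it.
model_id = "beratcmn/Llama3-ChatQA-1.5-8B-256K"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical RAG inputs: the context would come from your retriever.
system = "System: This is a chat between a user and an AI assistant. The assistant answers the user's questions based on the context."
context = "<retrieved passages go here>"
question = "User: <your question here>"

# Assemble the ChatQA-style prompt shown above.
prompt = f"{system}\n\n{context}\n\n{question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```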
Big thanks to the Meta team, the NVIDIA team, and of course Wing Lian.
## Notes
This model has not been tested on any benchmarks due to compute limitations, and the base model wasn't evaluated with a needle-in-a-haystack test either. There is a real possibility that this model performs worse than both of the original models.