|
--- |
|
base_model: |
|
- meta-llama/Meta-Llama-3-8B |
|
- nvidia/Llama3-ChatQA-1.5-8B |
|
- winglian/llama-3-8b-256k-PoSE |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- peft |
|
- nvidia |
|
- chatqa-1.5 |
|
- chatqa |
|
- llama-3 |
|
- pytorch |
|
license: llama3 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Llama3-ChatQA-1.5-8B-256K |
|
|
|
I tried to achieve a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.
|
|
|
This model is an 'amalgamation' of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`. |
|
|
|
## Recipe |
|
|
|
First, I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora).
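The extraction step can be sketched with mergekit's `mergekit-extract-lora` CLI. The exact flags vary between mergekit versions, and `--rank` below is an illustrative choice, not necessarily the value used for the published adapter:

```shell
# Extract a LoRA adapter approximating the weight delta between the
# ChatQA fine-tune and the Llama-3 base it was trained from.
# (Argument order and flags may differ across mergekit versions.)
mergekit-extract-lora \
    nvidia/Llama3-ChatQA-1.5-8B \
    meta-llama/Meta-Llama-3-8B \
    ./chatqa-lora \
    --rank=128
```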
|
|
|
After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` model.
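The merge step can be sketched with `peft` as follows. This is my own sketch of the workflow, not a verbatim record of the commands used; loading the full 8B model requires substantial memory:

```python
# Load the 256K-context base model, attach the extracted ChatQA LoRA
# adapter, then bake the adapter weights into the base model.
# Requires `transformers` and `peft`.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "winglian/llama-3-8b-256k-PoSE",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "beratcmn/Llama3-ChatQA-1.5-8B-lora")

# merge_and_unload() folds the LoRA deltas into the base weights and
# returns a plain causal LM with no adapter layers remaining.
merged = model.merge_and_unload()
merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")

# Keep ChatQA's tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama3-ChatQA-1.5-8B")
tokenizer.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
```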
|
|
|
## Prompt Format |
|
|
|
Since the base model wasn't fine-tuned for any specific format, we can use ChatQA's chat format.
|
|
|
```text |
|
System: {System} |
|
|
|
{Context} |
|
|
|
User: {Question} |
|
|
|
Assistant: {Response} |
|
|
|
User: {Question} |
|
|
|
Assistant: |
|
``` |
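To assemble that template programmatically, a small helper like the following can build the prompt from a system message, a context block, and a running list of turns. This is my own sketch, not an official ChatQA utility:

```python
def build_chatqa_prompt(system, context, turns):
    """Build a ChatQA-style prompt string.

    `turns` is a list of (question, response) pairs; pass None as the
    response of the final turn to leave the prompt open for generation.
    """
    parts = [f"System: {system}", "", context, ""]
    for question, response in turns:
        parts.append(f"User: {question}")
        parts.append("")
        if response is None:
            # Trailing "Assistant:" cues the model to generate the answer.
            parts.append("Assistant:")
        else:
            parts.append(f"Assistant: {response}")
            parts.append("")
    return "\n".join(parts)


prompt = build_chatqa_prompt(
    "You are a helpful assistant that answers using the given context.",
    "The Eiffel Tower is 330 metres tall.",
    [("How tall is the Eiffel Tower?", None)],
)
print(prompt)
```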
|
|
|
Big thanks to the Meta team, the NVIDIA team, and of course Wing Lian.
|
|
|
## Notes |
|
|
|
This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with `Needle in a Haystack` either. There is a real possibility that this model performs worse than both of the original models.