---
base_model:
- meta-llama/Meta-Llama-3-8B
- nvidia/Llama3-ChatQA-1.5-8B
- winglian/llama-3-8b-256k-PoSE
library_name: transformers
tags:
- mergekit
- peft
- nvidia
- chatqa-1.5
- chatqa
- llama-3
- pytorch
license: llama3
language:
- en
pipeline_tag: text-generation
---

# Llama3-ChatQA-1.5-8B-256K

I tried to achieve a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.

This model is an 'amalgamation' of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`. 

## Recipe

First, I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora).

After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` model.
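
For reference, the merge step can be reproduced with `peft` roughly as follows. This is a minimal sketch, assuming the extracted adapter is a standard PEFT LoRA adapter; the dtype and output directory are illustrative choices, not the exact settings used.

```python
# Minimal sketch of the merge step, assuming the adapter extracted with mergekit
# is a standard PEFT LoRA adapter. The dtype and output path are illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "winglian/llama-3-8b-256k-PoSE"
adapter_id = "beratcmn/Llama3-ChatQA-1.5-8B-lora"

# Load the 256K-context base model.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Apply the extracted ChatQA LoRA adapter on top of it.
model = PeftModel.from_pretrained(base, adapter_id)

# Fold the adapter weights into the base weights and save the standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
AutoTokenizer.from_pretrained(base_id).save_pretrained("Llama3-ChatQA-1.5-8B-256K")
```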

## Prompt Format

Since the base model wasn't fine-tuned on any specific format, we can use ChatQA's chat format.

```text
System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
```
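
For illustration, here is a minimal sketch of running the model with this prompt format via `transformers`. The repository id, system message, context document, question, and generation settings below are placeholders, not tested values.

```python
# Minimal usage sketch. The repo id, system/context/question strings, and
# generation settings are placeholders and have not been validated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beratcmn/Llama3-ChatQA-1.5-8B-256K"  # assumed repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system = "System: You are a helpful assistant that answers based on the given context."
context = "<your retrieved document goes here>"
question = "User: <your question about the document goes here>"

# Assemble the ChatQA-style prompt shown above.
prompt = f"{system}\n\n{context}\n\n{question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```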

Big thanks to the Meta team, the Nvidia team, and of course Wing Lian.

## Notes

This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with a `Needle in a Haystack` test either. There is a real possibility that this model performs worse than both of the original models.