Llama3-ChatQA-1.5-8B-256K

I tried to build a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.

This model is an 'amalgamation' of winglian/llama-3-8b-256k-PoSE and nvidia/Llama3-ChatQA-1.5-8B.

Recipe

First, I extracted the LoRA adapter from nvidia/Llama3-ChatQA-1.5-8B using mergekit. You can find the adapter here.

After the extraction I merged the adapter with the winglian/llama-3-8b-256k-PoSE model.
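As a rough illustration of the merge step only, here is a minimal sketch that assumes the extracted adapter is saved in PEFT format and that the merge is done with the `peft` and `transformers` libraries. The tooling actually used for this model may differ, and `adapter_path` and `out_dir` are hypothetical placeholders rather than the real adapter location.

```python
def merge_adapter(base_id: str, adapter_path: str, out_dir: str) -> None:
    """Fold a LoRA adapter into a base model and save the merged weights.

    Assumption: the adapter is stored in PEFT format. Heavy libraries are
    imported lazily so that defining this function has no side effects.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the long-context base model in half precision.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

    # Attach the extracted adapter on top of the base weights.
    model = PeftModel.from_pretrained(base, adapter_path)

    # Fold the low-rank updates into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Save the merged model together with the base model's tokenizer.
    merged.save_pretrained(out_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call would look like `merge_adapter("winglian/llama-3-8b-256k-PoSE", "path/to/adapter", "merged-model")`, where both paths are placeholders. Merging an 8B model this way needs enough memory to hold the full FP16 weights.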

Prompt Format

Since the base model wasn't fine-tuned for any specific format, we can use ChatQA's chat format.

System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
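The template above can be assembled programmatically. The helper below is a minimal sketch: the function name `build_chatqa_prompt` and its arguments are illustrative, and the blank-line separators are taken from the template as shown here, so check the exact whitespace against the nvidia/Llama3-ChatQA-1.5-8B model card before relying on it.

```python
from typing import List, Optional, Tuple


def build_chatqa_prompt(
    system: str,
    context: str,
    turns: List[Tuple[str, Optional[str]]],
) -> str:
    """Build a ChatQA-style prompt string.

    `turns` is a list of (question, response) pairs. Use None as the response
    of the final turn so the prompt ends with an open "Assistant:" line.
    """
    parts = [f"System: {system}", context]
    for question, response in turns:
        parts.append(f"User: {question}")
        # An unanswered turn leaves the assistant line open for generation.
        parts.append("Assistant:" if response is None else f"Assistant: {response}")
    # Blocks are separated by a blank line, mirroring the template above.
    return "\n\n".join(parts)


example = build_chatqa_prompt(
    system="Answer using only the provided context.",
    context="(retrieved documents go here)",
    turns=[
        ("What is PoSE?", "A method for extending the context window."),
        ("Which base model uses it here?", None),
    ],
)
print(example)
```

The resulting string ends with `Assistant:`, so the model continues from there with its answer to the last question.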

Big thanks to the Meta team, the NVIDIA team, and of course Wing Lian.

Notes

This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with Needle in a Haystack either. There is a real possibility that this model performs worse than both of the original models.

Model size: 8.03B params (Safetensors), tensor type FP16