|
--- |
|
base_model: |
|
- meta-llama/Meta-Llama-3-8B |
|
- nvidia/Llama3-ChatQA-1.5-8B |
|
- winglian/llama-3-8b-256k-PoSE |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- peft |
|
- nvidia |
|
- chatqa-1.5 |
|
- chatqa |
|
- llama-3 |
|
- pytorch |
|
license: llama3 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Llama3-ChatQA-1.5-8B-256K |
|
|
|
I tried to achieve a long-context RAG pipeline with this model, but I have very limited resources to test the workflow. Keep in mind that this is an experiment.
|
|
|
This model is an 'amalgamation' of `winglian/llama-3-8b-256k-PoSE` and `nvidia/Llama3-ChatQA-1.5-8B`. |
|
|
|
## Recipe |
|
|
|
First, I extracted the LoRA adapter from `nvidia/Llama3-ChatQA-1.5-8B` using `mergekit`. You can find the adapter [here](https://huggingface.co/beratcmn/Llama3-ChatQA-1.5-8B-lora).
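The extraction step can be sketched with mergekit's `mergekit-extract-lora` CLI. The exact flags vary between mergekit versions, and `--rank` below is an illustrative choice, not necessarily the value used for the published adapter:

```shell
# Extract a LoRA adapter approximating the weight delta between the
# ChatQA fine-tune and the Llama-3 base it was trained from.
# (Argument order and flags may differ across mergekit versions.)
mergekit-extract-lora \
    nvidia/Llama3-ChatQA-1.5-8B \
    meta-llama/Meta-Llama-3-8B \
    ./chatqa-lora \
    --rank=128
```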
|
|
|
After the extraction, I merged the adapter into the `winglian/llama-3-8b-256k-PoSE` model.
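The merge step can be sketched with `peft` as follows. This is my own sketch of the workflow, not a verbatim record of the commands used; loading the full 8B model requires substantial memory:

```python
# Load the 256K-context base model, attach the extracted ChatQA LoRA
# adapter, then bake the adapter weights into the base model.
# Requires `transformers` and `peft`.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "winglian/llama-3-8b-256k-PoSE",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "beratcmn/Llama3-ChatQA-1.5-8B-lora")

# merge_and_unload() folds the LoRA deltas into the base weights and
# returns a plain causal LM with no adapter layers remaining.
merged = model.merge_and_unload()
merged.save_pretrained("Llama3-ChatQA-1.5-8B-256K")

# Keep ChatQA's tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama3-ChatQA-1.5-8B")
tokenizer.save_pretrained("Llama3-ChatQA-1.5-8B-256K")
```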
|
|
|
## Prompt Format |
|
|
|
Since the base model wasn't fine-tuned for any specific format, we can use ChatQA's chat format.
|
|
|
```text |
|
System: {System} |
|
|
|
{Context} |
|
|
|
User: {Question} |
|
|
|
Assistant: {Response} |
|
|
|
User: {Question} |
|
|
|
Assistant: |
|
``` |
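To assemble that template programmatically, a small helper like the following can build the prompt from a system message, a context block, and a running list of turns. This is my own sketch, not an official ChatQA utility:

```python
def build_chatqa_prompt(system, context, turns):
    """Build a ChatQA-style prompt string.

    `turns` is a list of (question, response) pairs; pass None as the
    response of the final turn to leave the prompt open for generation.
    """
    parts = [f"System: {system}", "", context, ""]
    for question, response in turns:
        parts.append(f"User: {question}")
        parts.append("")
        if response is None:
            # Trailing "Assistant:" cues the model to generate the answer.
            parts.append("Assistant:")
        else:
            parts.append(f"Assistant: {response}")
            parts.append("")
    return "\n".join(parts)


prompt = build_chatqa_prompt(
    "You are a helpful assistant that answers using the given context.",
    "The Eiffel Tower is 330 metres tall.",
    [("How tall is the Eiffel Tower?", None)],
)
print(prompt)
```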
|
|
|
Big thanks to the Meta team, the NVIDIA team, and of course Wing Lian.
|
|
|
## Notes |
|
|
|
This model has not been tested on any benchmarks due to compute limitations. The base model wasn't evaluated with `Needle in a Haystack` either. There is a real possibility that this model performs worse than both of the original models.