---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
This is [Llama2-22b](https://huggingface.co/chargoddard/llama2-22b) by [chargoddard](https://huggingface.co/chargoddard) in a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or at all, that's likely on me and not the models themselves.<br>
A second model merge has been [released](https://huggingface.co/chargoddard/llama2-22b-blocktriangular), and the GGML conversions for that can be found [here](https://huggingface.co/IHaveNoClueAndIMustPost/llama2-22b-blocktriangular-GGML).
While I haven't had any issues so far, do note that the original repo states <i>"Not intended for use as-is - this model is meant to serve as a base for further tuning"</i>.
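If you want to try one of these files programmatically, here's a minimal sketch using llama-cpp-python. It assumes an older llama-cpp-python release that still loads GGML files (recent versions expect GGUF), and the filename is just a placeholder for whichever quant you download.

```python
# Minimal sketch, not a tested recipe: loading a GGML quant with llama-cpp-python.
# Assumes an older llama-cpp-python build that still reads GGML files
# (recent releases only load GGUF). The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-22b.ggmlv3.q4_K_M.bin",  # placeholder: use the file you downloaded
    n_ctx=4096,         # 4K context, matching the VRAM table below
    n_gpu_layers=100,   # more layers than the model has, i.e. offload everything
)

out = llm("The llama is", max_tokens=64)
print(out["choices"][0]["text"])
```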
Approximate VRAM requirements at 4K context:

| MODEL | SIZE | VRAM |
|:-----:|:------:|:------:|
| q5_1 | 16.4GB | 21.5GB |
| q4_K_M | 13.2GB | 18.3GB |
| q3_K_M | 10.6GB | 16.1GB |
| q2_K | 9.2GB | 14.5GB |
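If a quant doesn't quite fit on your card, llama.cpp-based backends can keep part of the model in system RAM instead; in the sketch above that means lowering `n_gpu_layers`. The value below is an assumption to tune per GPU, not a recommendation.

```python
# Sketch: partial GPU offload when a quant exceeds available VRAM.
# Layers not offloaded run on the CPU from system RAM, trading speed for fit.
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-22b.ggmlv3.q5_1.bin",  # placeholder filename
    n_ctx=4096,
    n_gpu_layers=30,  # assumption: tune downward until loading no longer runs out of memory
)
```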