|
--- |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- facebook |
|
- meta |
|
- llama |
|
- llama-3 |
|
- coreml |
|
- text-generation |
|
license: llama3 |
|
--- |
|
|
|
|
|
# Meta Llama 3 8B – Core ML |
|
|
|
This repository contains a Core ML conversion of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) |
|
|
|
|
|
This conversion does not include a KV cache. Inputs are int32 token IDs and outputs are float16 logits.
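Without a KV cache, every decoding step has to feed the full token sequence back through the model. Below is a minimal greedy-decoding sketch of that pattern; the `predict` callable stands in for the Core ML model's prediction call (the real input/output tensor names and shapes are assumptions, since I haven't verified them), so a dummy model is used here to keep the example self-contained.

```python
import numpy as np

def greedy_decode(predict, prompt_ids, max_new_tokens, eos_id=None):
    # No KV cache: each step re-runs the FULL sequence of int32 token IDs.
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = predict(np.array(ids, dtype=np.int32))  # float16 logits, shape (seq, vocab)
        next_id = int(np.argmax(logits[-1]))  # greedy pick from the last position
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids

# Dummy stand-in for the Core ML model; with the real model you would wrap
# something like `mlmodel.predict({...})` here (names are hypothetical).
def dummy_predict(ids):
    vocab = 8
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(ids), vocab)).astype(np.float16)

out = greedy_decode(dummy_predict, [1, 2, 3], max_new_tokens=4)
print(out)
```

The cost of this loop grows quadratically with sequence length, which is the practical downside of a cache-free export.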
|
|
|
I haven't been able to test this myself, so please leave a note in the Community tab describing how you tested it and how it worked.
|
|
|
I called `model.half()` before scripting/converting, thinking it would reduce memory usage (I have since read online that it doesn't). I am unsure whether it affected the conversion process.
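For context, a conversion along these lines would typically look something like the sketch below. This is a hypothetical reconstruction, not the exact script used: the sequence length, tensor name `input_ids`, and deployment target are all assumptions, and running it requires the original PyTorch weights.

```python
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM

# Load the source model (requires access to the gated Llama 3 weights).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torchscript=True
)
model.eval()
model.half()  # halving weights before tracing -- reportedly does not cut conversion memory

seq_len = 64  # assumed fixed sequence length for the traced graph
example_ids = torch.zeros((1, seq_len), dtype=torch.int32)
traced = torch.jit.trace(model, example_ids)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=(1, seq_len), dtype=np.int32)],
    compute_precision=ct.precision.FLOAT16,  # matches the float16 outputs noted above
    minimum_deployment_target=ct.target.macOS14,
)
mlmodel.save("Llama-3-8B.mlpackage")
```

Note that `compute_precision=ct.precision.FLOAT16` is what actually controls the precision of the converted Core ML weights, independent of whether `model.half()` was called beforehand.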