|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- HuggingFaceTB/cosmopedia |
|
- databricks/databricks-dolly-15k |
|
- Open-Orca/OpenOrca |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# WikiChat-v0.2 |
|
A work-in-progress model being trained to hold conversations.
|
|
|
The uploaded GGUFs are full FP32 precision.
|
|
|
Trained on OpenOrca GPT-4 data, plus Cosmopedia for additional data and dolly-15k for instruction following.
|
|
|
## Model Details: |
|
- 83.59M parameters (83591800) |
|
- 8 attention heads |
|
- 40 layers |
|
- 384 embedding size
|
- 4096/8192/16384 context (use 2x/4x RoPE scaling for the longer contexts; a 16k fine-tuned version may be trained later; see the loading sketch after this list)
|
- Batch size 16 |
|
- Trained with llama.cpp (train-text-from-scratch)
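
For the longer contexts, linear RoPE scaling has to be set when loading the GGUF. Below is a minimal loading sketch using llama-cpp-python; the filename is a placeholder, and `rope_freq_scale` is 1 divided by the context multiplier (0.5 for 8192, 0.25 for 16384).

```python
from llama_cpp import Llama

# Minimal loading sketch (llama-cpp-python). The GGUF filename is a
# placeholder; point it at the file you actually downloaded.
llm = Llama(
    model_path="wikichat-v0.2-f32.gguf",  # placeholder filename
    n_ctx=8192,            # 2x the native 4096-token context
    rope_freq_scale=0.5,   # linear RoPE scaling: 1 / context multiplier
    n_gpu_layers=-1,       # offload all layers if VRAM allows
)
```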
|
|
|
## Prompt Format (Alpaca): |
|
``` |
|
Instruction: {system} |
|
Input: {prompt} |
|
Response: {response} |
|
``` |
|
|
|
Please structure your prompts in this instruct format for best performance.
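
As a sketch, here is a small hypothetical helper (not part of this repo) for filling that template in Python; the exact whitespace between fields is an assumption.

```python
# Hypothetical helper that fills the Alpaca-style template shown above.
# The exact newlines/whitespace between fields is an assumption; adjust if
# generations look off.
def build_prompt(system: str, prompt: str) -> str:
    return (
        f"Instruction: {system}\n"
        f"Input: {prompt}\n"
        "Response:"  # left open for the model to complete
    )

# Example:
# build_prompt("Answer the question concisely.", "What is the square root of 4?")
```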
|
|
|
## Training Details: |
|
- 1x RTX 3070 8 GB (inference speed: 80 tok/s with full GPU offload)
|
- 1x AMD Ryzen 7 3700X
|
- 96 GB RAM
|
- 10 iterations |
|
- Loss Target = 2.5 to 3.0 |
|
- Approx. 480 samples / 1M training tokens (>0.0001 epochs; see the quick check after this list)
|
- Training data: see the OpenOrca dataset page
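
As a rough sanity check of that epoch figure, assuming OpenOrca contains roughly 4.2M examples (an approximation taken from its dataset card):

```python
# Back-of-the-envelope check of the epoch fraction quoted above.
# The OpenOrca size is an approximation, not an exact figure.
openorca_examples = 4_200_000  # approx. total examples in OpenOrca
samples_seen = 480             # samples used in this training run

epoch_fraction = samples_seen / openorca_examples
print(f"{epoch_fraction:.6f} epochs")  # ~0.000114, i.e. just over 0.0001
```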
|
|
|
## Notes: |
|
|
|
The model isn't ready yet; this release is a test of OpenOrca tokenization and of the balance between training speed and model size.
|
|
|
## Example output: |
|
```

User: What is the square root of 4?

Assistant: The square root of 4 is 2.

```