---
license: apache-2.0
datasets:
  - HuggingFaceTB/cosmopedia
  - databricks/databricks-dolly-15k
  - Open-Orca/OpenOrca
language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: text-generation
---

# WikiChat-v0.2

A work-in-progress model being trained to hold conversations.

The uploaded GGUFs are in full FP32 precision.

Training data: OpenOrca GPT-4 data, plus Cosmopedia for additional coverage and dolly-15k for instruction following.

## Model Details

- 83.59M parameters (83,591,800)
- 8 attention heads
- 40 layers
- 384 embedding size
- 4096/8192/16384 context (use 2x/4x RoPE scaling; a 16k fine-tuned version may be trained later; see the loading sketch after this list)
- Batch size 16
- Trained with llama.cpp (train-text-from-scratch)

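As a loading illustration (not an official part of this repo), here is a minimal sketch using llama-cpp-python; the GGUF filename, context size, and RoPE scale are assumptions to adjust for the file you actually download.

```python
# Minimal loading sketch. Assumptions: llama-cpp-python is installed and the GGUF
# filename below matches the file downloaded from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="wikichat-v0.2-f32.gguf",  # hypothetical filename; use the real GGUF name
    n_ctx=8192,            # 2x the native 4096-token context
    rope_freq_scale=0.5,   # linear RoPE scaling factor of 1/2 for 2x context
)
```

For the 16384-token context, the same idea applies with `n_ctx=16384` and `rope_freq_scale=0.25`.
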
## Prompt Format (Alpaca)

```
Instruction: {system}
Input: {prompt}
Response: {response}
```

For best results, structure your prompts in this instruct format.

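As an illustration only, the format above can be assembled into a single string and passed to the model; `llm` is the llama-cpp-python object from the loading sketch earlier, and `build_prompt` is a hypothetical helper, not part of this repo.

```python
# Sketch of building an Alpaca-style prompt for WikiChat-v0.2.
# `llm` is assumed to be the llama-cpp-python Llama object loaded above.
def build_prompt(system: str, prompt: str) -> str:
    # Match the card's prompt format; the Response section is left for the model to complete.
    return f"Instruction: {system}\nInput: {prompt}\nResponse:"

text = build_prompt(
    system="You are a helpful assistant.",
    prompt="What is the square root of 4?",
)
out = llm(text, max_tokens=64, stop=["Instruction:", "Input:"])
print(out["choices"][0]["text"].strip())
```
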
## Training Details

- 1x RTX 3070 8GB (inference speed: 80 tok/s with full GPU offload)
- 1x AMD Ryzen 7 3700X
- 96 GB RAM
- 10 iterations
- Loss target = 2.5 to 3.0
- Approx. 480 samples / 1M training tokens (>0.0001 epochs; see the arithmetic sketch after this list)
- Training data: refer to the OpenOrca page

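For scale, a quick arithmetic sketch using only the figures listed above (no new measurements):

```python
# Arithmetic on the training figures from this card: ~480 samples, ~1M training
# tokens, stated to be >0.0001 of an epoch over the combined corpus.
samples = 480
train_tokens = 1_000_000
epoch_fraction_lower_bound = 0.0001

avg_tokens_per_sample = train_tokens / samples          # ~2083 tokens per sample
# If 1M tokens is more than 0.0001 of an epoch, the full corpus must be smaller than:
corpus_token_upper_bound = train_tokens / epoch_fraction_lower_bound  # 10B tokens

print(f"~{avg_tokens_per_sample:.0f} tokens per sample")
print(f"corpus < {corpus_token_upper_bound:,.0f} tokens")
```
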
## Notes

The model isn't ready yet; this release is meant to test OpenOrca tokenization and to find a balance between training speed and model size.

## Example Output

```
User: What is the square root of 4?
Assistant: The square root of 4 is 2.
```