---
language:
  - en
library_name: transformers
pipeline_tag: text-generation
datasets:
  - jondurbin/airoboros-2.2
  - Open-Orca/OpenOrca
  - garage-bAInd/Open-Platypus
  - WizardLM/WizardLM_evol_instruct_V2_196k
  - TokenBender/python_eval_instruct_51k
tags:
  - llama-2
  - code
license: llama2
model-index:
  - name: SpeechlessCoder
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 54.27
            verified: false
---

# speechless-coding-7b-16k-tora

Use the following datasets to fine-tune llm_agents/tora-code-7b-v0.1 in order to improve the model's reasoning and planning abilities.

prompt_type = "alpaca"
max_tokens > 128 && < 16384
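For reference, below is a minimal inference sketch assuming the standard single-turn Alpaca instruction template and the Hugging Face `transformers` API. The hub model id is inferred from this card, and the prompt and generation settings are illustrative placeholders, not part of the original card.

```python
# Minimal inference sketch; model id and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uukuguy/speechless-coding-7b-16k-tora"  # assumed hub id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

instruction = "Write a Python function that returns the n-th Fibonacci number."
# Standard Alpaca single-turn template (prompt_type = "alpaca").
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```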

Total: 177,333 samples, 316 MB. A token-length filtering sketch follows the dataset list below.

  • jondurbin/airoboros-2.2: Filter categories related to coding, reasoning and planning. 21,923 samples.
  • Open-Orca/OpenOrca: Filter the 'cot' category in the 1M GPT4 dataset. 62,973 samples.
  • garage-bAInd/Open-Platypus: 100%, 22,760 samples.
  • WizardLM/WizardLM_evol_instruct_V2_196k: Coding conversation part. 30,081 samples.
  • TokenBender/python_eval_instruct_51k: “python” in output. 39,596 samples.
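The sketch below illustrates the token-length filter on one of the sources using the `datasets` library. The column names and the single source shown are assumptions for illustration; the per-source category filters described above are omitted.

```python
# Illustrative token-length filter: keep samples with > 128 and < 16384 tokens (per the card).
# Column names ("instruction", "output") are assumed for illustration.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm_agents/tora-code-7b-v0.1")

def within_length(example):
    text = example["instruction"] + "\n" + example["output"]
    n_tokens = len(tokenizer(text).input_ids)
    return 128 < n_tokens < 16384

platypus = load_dataset("garage-bAInd/Open-Platypus", split="train")
filtered = platypus.filter(within_length)
print(f"{len(filtered)} / {len(platypus)} samples kept")
```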

## HumanEval

| Metric           | Value |
| ---------------- | ----- |
| humaneval-python | 54.27 |

### Big Code Models Leaderboard

| Model                  | humaneval-python |
| ---------------------- | ---------------- |
| CodeLlama-34B-Python   | 53.29            |
| CodeLlama-34B-Instruct | 50.79            |
| CodeLlama-13B-Instruct | 50.6             |
| CodeLlama-34B          | 45.11            |
| CodeLlama-13B-Python   | 42.89            |
| CodeLlama-13B          | 35.07            |
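pass@1 on HumanEval is the fraction of its 164 problems for which a generated completion passes the unit tests. For background, here is the standard unbiased pass@k estimator from the HumanEval paper; it is included for clarity, not as the exact harness used to produce the numbers above.

```python
# Unbiased pass@k estimator (Chen et al., 2021). n = completions generated per problem,
# c = completions that pass the unit tests; pass@k is averaged over all problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single greedy completion per problem (n = 1, k = 1), pass@1 is simply the pass rate:
per_problem = [pass_at_k(n=1, c=c, k=1) for c in (1, 0, 1, 1)]
print(sum(per_problem) / len(per_problem))  # 0.75
```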

## MultiPL-E

| Metric     | Value |
| ---------- | ----- |
| python     | 59.63 |
| java       | 32.28 |
| javascript | 46.58 |
| cpp        | 37.83 |
| rust       | 28.21 |
| go         | 27.27 |
| sh         | 13.29 |
| julia      | 34.59 |
| typescript | 47.80 |

## LMEval

Open LLM Leaderboard

| Metric     | Value |
| ---------- | ----- |
| ARC        |       |
| HellaSwag  |       |
| MMLU       |       |
| TruthfulQA |       |
| Average    |       |

## Parameters

| Parameter                   | Value            |
| --------------------------- | ---------------- |
| lr                          | 2e-4             |
| lr_scheduler_type           | cosine           |
| weight_decay                | 0.0              |
| optim                       | paged_adamw_8bit |
| flash_attention             | True             |
| rerope                      | False            |
| max_new_tokens              | 16384            |
| num_train_epochs            | 2                |
| bits                        | 4                |
| lora_r                      | 64               |
| lora_alpha                  | 256              |
| lora_dropout                | 0.05             |
| double_quant                | True             |
| quant_type                  | nf4              |
| dataset_format              | sharegpt         |
| mini_batch_size             | 2                |
| gradient_accumulation_steps | 32               |
| bf16                        | True             |

Trained on 4 × A100-40G GPUs.
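These settings describe a 4-bit QLoRA fine-tune. As a hedged illustration only, the sketch below maps the table onto `transformers` and `peft` configuration objects; it approximates the setup implied by the hyperparameters and is not the author's exact training script.

```python
# Hedged sketch: mapping the parameter table onto QLoRA configuration objects.
# Trainer wiring, dataset code, and sequence-length handling (max_new_tokens, rerope) are omitted.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # bits: 4
    bnb_4bit_quant_type="nf4",              # quant_type: nf4
    bnb_4bit_use_double_quant=True,         # double_quant: True
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: True
)

model = AutoModelForCausalLM.from_pretrained(
    "llm_agents/tora-code-7b-v0.1",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # flash_attention: True (requires flash-attn)
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                # lora_r
    lora_alpha=256,      # lora_alpha
    lora_dropout=0.05,   # lora_dropout
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.0,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    per_device_train_batch_size=2,   # mini_batch_size
    gradient_accumulation_steps=32,
    bf16=True,
)
```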