---
language:
  - en
library_name: transformers
pipeline_tag: text-generation
datasets:
  - jondurbin/airoboros-2.2
  - Open-Orca/OpenOrca
  - garage-bAInd/Open-Platypus
  - WizardLM/WizardLM_evol_instruct_V2_196k
  - TokenBender/python_eval_instruct_51k
tags:
  - llama-2
  - code
license: llama2
model-index:
  - name: SpeechlessCoder
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 54.27
            verified: false
---

# speechless-coding-7b-16k-tora

Use the following datasets to fine-tune llm_agents/tora-code-7b-v0.1 in order to improve the model's reasoning and planning abilities.

prompt_type = "alpaca"
max_tokens > 128 && < 16384
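For reference, below is a minimal inference sketch assuming the standard single-turn Alpaca instruction template and the Hugging Face `transformers` API. The hub model id is inferred from this card, and the prompt and generation settings are illustrative placeholders, not part of the original card.

```python
# Minimal inference sketch; model id and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uukuguy/speechless-coding-7b-16k-tora"  # assumed hub id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

instruction = "Write a Python function that returns the n-th Fibonacci number."
# Standard Alpaca single-turn template (prompt_type = "alpaca").
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```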

Total: 177,333 samples, 316 MB. A token-length filtering sketch follows the dataset list below.

  • jondurbin/airoboros-2.2: Filter categories related to coding, reasoning and planning. 21,923 samples.
  • Open-Orca/OpenOrca: Filter the 'cot' category in the 1M GPT4 dataset. 62,973 samples.
  • garage-bAInd/Open-Platypus: 100%, 22,760 samples.
  • WizardLM/WizardLM_evol_instruct_V2_196k: Coding conversation part. 30,081 samples.
  • TokenBender/python_eval_instruct_51k: “python” in output. 39,596 samples.
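The sketch below illustrates the token-length filter on one of the sources using the `datasets` library. The column names and the single source shown are assumptions for illustration; the per-source category filters described above are omitted.

```python
# Illustrative token-length filter: keep samples with > 128 and < 16384 tokens (per the card).
# Column names ("instruction", "output") are assumed for illustration.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm_agents/tora-code-7b-v0.1")

def within_length(example):
    text = example["instruction"] + "\n" + example["output"]
    n_tokens = len(tokenizer(text).input_ids)
    return 128 < n_tokens < 16384

platypus = load_dataset("garage-bAInd/Open-Platypus", split="train")
filtered = platypus.filter(within_length)
print(f"{len(filtered)} / {len(platypus)} samples kept")
```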

## HumanEval

| Metric           | Value |
| ---------------- | ----- |
| humaneval-python | 54.27 |

### Big Code Models Leaderboard

| Model                  | humaneval-python |
| ---------------------- | ---------------- |
| CodeLlama-34B-Python   | 53.29            |
| CodeLlama-34B-Instruct | 50.79            |
| CodeLlama-13B-Instruct | 50.6             |
| CodeLlama-34B          | 45.11            |
| CodeLlama-13B-Python   | 42.89            |
| CodeLlama-13B          | 35.07            |
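pass@1 on HumanEval is the fraction of its 164 problems for which a generated completion passes the unit tests. For background, here is the standard unbiased pass@k estimator from the HumanEval paper; it is included for clarity, not as the exact harness used to produce the numbers above.

```python
# Unbiased pass@k estimator (Chen et al., 2021). n = completions generated per problem,
# c = completions that pass the unit tests; pass@k is averaged over all problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single greedy completion per problem (n = 1, k = 1), pass@1 is simply the pass rate:
per_problem = [pass_at_k(n=1, c=c, k=1) for c in (1, 0, 1, 1)]
print(sum(per_problem) / len(per_problem))  # 0.75
```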

## MultiPL-E

| Metric     | Value |
| ---------- | ----- |
| python     | 59.63 |
| java       | 32.28 |
| javascript | 46.58 |
| cpp        | 37.83 |
| rust       | 28.21 |
| go         | 27.27 |
| sh         | 13.29 |
| julia      | 34.59 |
| typescript | 47.80 |

## LMEval

Open LLM Leaderboard

| Metric     | Value |
| ---------- | ----- |
| ARC        |       |
| HellaSwag  |       |
| MMLU       |       |
| TruthfulQA |       |
| Average    |       |

## Parameters

| Parameter                   | Value            |
| --------------------------- | ---------------- |
| lr                          | 2e-4             |
| lr_scheduler_type           | cosine           |
| weight_decay                | 0.0              |
| optim                       | paged_adamw_8bit |
| flash_attention             | True             |
| rerope                      | False            |
| max_new_tokens              | 16384            |
| num_train_epochs            | 2                |
| bits                        | 4                |
| lora_r                      | 64               |
| lora_alpha                  | 256              |
| lora_dropout                | 0.05             |
| double_quant                | True             |
| quant_type                  | nf4              |
| dataset_format              | sharegpt         |
| mini_batch_size             | 2                |
| gradient_accumulation_steps | 32               |
| bf16                        | True             |

Trained on 4 × A100-40G GPUs.
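These settings describe a 4-bit QLoRA fine-tune. As a hedged illustration only, the sketch below maps the table onto `transformers` and `peft` configuration objects; it approximates the setup implied by the hyperparameters and is not the author's exact training script.

```python
# Hedged sketch: mapping the parameter table onto QLoRA configuration objects.
# Trainer wiring, dataset code, and sequence-length handling (max_new_tokens, rerope) are omitted.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # bits: 4
    bnb_4bit_quant_type="nf4",              # quant_type: nf4
    bnb_4bit_use_double_quant=True,         # double_quant: True
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: True
)

model = AutoModelForCausalLM.from_pretrained(
    "llm_agents/tora-code-7b-v0.1",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # flash_attention: True (requires flash-attn)
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                # lora_r
    lora_alpha=256,      # lora_alpha
    lora_dropout=0.05,   # lora_dropout
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.0,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    per_device_train_batch_size=2,   # mini_batch_size
    gradient_accumulation_steps=32,
    bf16=True,
)
```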