Performance issues when compared to other codellama_34b finetunes

#11
by rombodawg - opened

Although this model claims the highest Python scores on HumanEval, I've had better generation results (specifically in Python) from Phind's codellama_python_34b finetune than from this one. My tests specifically involved writing code over 1k tokens long. One example was coding an entire game of Snake in Python, which the wizardcoder-34b model not only failed to do but also couldn't fix the errors in. The Phind model, meanwhile, achieved correct results on the first try, and others testing the airoboros-34b model reportedly had even better results.

All I'm asking for is a simple explanation of how your model can have such high benchmark scores yet still lag in code generation quality, specifically compared to its competition.

And does this issue only apply to longer pieces of code generation, which the previous wizardcoder-15b model also struggles with?
@WizardLM

WizardLM Team org

We disagree with you.

  • You didn't provide any examples, such as prompts or screenshots. Without these, we cannot help you fix your problems.
  • We have tested WizardCoder-34B with this prompt: "write a snake game with python". It can write an entire snake game, which can also be played.
  • Someone on Twitter also got a snake game from our model.
  • This is the generated code.
import pygame
import random

# initialize pygame
pygame.init()

# set the window size
WINDOW_SIZE = [800, 600]

# create the window
screen = pygame.display.set_mode(WINDOW_SIZE)

# set the title of the window
pygame.display.set_caption("Snake Game")

# set the game clock
clock = pygame.time.Clock()

# set the game variables
snake_pos = [100, 50]
snake_body = [[100, 50], [100-10, 50], [100-(2*10), 50]]
food_pos = [random.randint(0, WINDOW_SIZE[0]//10)*10, random.randint(0, WINDOW_SIZE[1]//10)*10]
food_spawned = True
direction = "RIGHT"
change_to = direction
score = 0

# set the game loop
running = True
while running:
    # handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        
        # handle input
        if event.type == pygame.KEYDOWN:
            if event.key == pygame.K_LEFT:
                change_to = "LEFT"
            if event.key == pygame.K_RIGHT:
                change_to = "RIGHT"
            if event.key == pygame.K_UP:
                change_to = "UP"
            if event.key == pygame.K_DOWN:
                change_to = "DOWN"

    # handle direction changes
    if change_to == "LEFT" and direction != "RIGHT":
        direction = "LEFT"
    if change_to == "RIGHT" and direction != "LEFT":
        direction = "RIGHT"
    if change_to == "UP" and direction != "DOWN":
        direction = "UP"
    if change_to == "DOWN" and direction != "UP":
        direction = "DOWN"

    # move the snake
    if direction == "LEFT":
        snake_pos[0] -= 10
    if direction == "RIGHT":
        snake_pos[0] += 10
    if direction == "UP":
        snake_pos[1] -= 10
    if direction == "DOWN":
        snake_pos[1] += 10

    # add a new segment to the snake
    snake_body.insert(0, list(snake_pos))

    # check if the snake has eaten the food
    if snake_pos[0] == food_pos[0] and snake_pos[1] == food_pos[1]:
        score += 1
        food_spawned = False
    else:
        snake_body.pop()

    # spawn a new food if it has been eaten
    if not food_spawned:
        food_pos = [random.randint(0, WINDOW_SIZE[0]//10)*10, random.randint(0, WINDOW_SIZE[1]//10)*10]
        food_spawned = True

    # draw the game objects
    screen.fill((0, 0, 0))
    for pos in snake_body:
        pygame.draw.rect(screen, (255, 255, 255), pygame.Rect(pos[0], pos[1], 10, 10))
    pygame.draw.rect(screen, (255, 0, 0), pygame.Rect(food_pos[0], food_pos[1], 10, 10))

    # draw the score
    font = pygame.font.Font(None, 36)
    text = font.render(f"Score: {score}", True, (255, 255, 255))
    screen.blit(text, (WINDOW_SIZE[0]//2 - text.get_width()//2, 10))

    # update the screen
    pygame.display.update()

    # set the game speed
    clock.tick(10)

# quit the game
pygame.quit()
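
A side note on the food placement in the listing above: Python's random.randint includes both endpoints, so random.randint(0, WINDOW_SIZE[0]//10)*10 can return 800 (or 600 for the y coordinate), which is one cell outside the visible window. Below is a minimal sketch of an in-bounds variant. The helper name random_food_pos and the CELL constant are hypothetical and not part of the generated code.

```python
import random

CELL = 10  # grid cell size in pixels, matching the 10-pixel steps in the listing


def random_food_pos(window_size, cell=CELL):
    """Return a grid-aligned [x, y] that always lies inside the window."""
    max_col = window_size[0] // cell - 1  # last column that is still fully visible
    max_row = window_size[1] // cell - 1  # last row that is still fully visible
    return [random.randint(0, max_col) * cell, random.randint(0, max_row) * cell]


print(random_food_pos([800, 600]))
```

With this helper, both assignments to food_pos could call random_food_pos(WINDOW_SIZE) instead of repeating the expression.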

@Ziyang this is great to hear! Can you share the generation settings when getting these results? I may just have had bad luck

WizardLM Team org

You can simply use the prompt "write a snake game with python" on our demo.
Or you can run our demo code, wizardcoder_demo.py, locally.
Set the temperature to 0 so that generation uses greedy decoding, and you should get the same code.
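
To illustrate why temperature 0 gives reproducible output, here is a toy sketch in plain Python. It is not the code from wizardcoder_demo.py; the function names and the example scores are made up for illustration. Greedy decoding always takes the highest-scoring token, while any temperature above 0 draws the token at random from the softmax distribution.

```python
import math
import random


def softmax(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Return a token index: argmax when temperature is 0, otherwise a random draw."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate next tokens.
toy_logits = [2.0, 1.5, 0.3, -1.0]

greedy_picks = [pick_token(toy_logits, 0, random.Random(seed)) for seed in range(5)]
sampled_picks = [pick_token(toy_logits, 1.0, random.Random(seed)) for seed in range(5)]

print(greedy_picks)   # the same index regardless of the seed
print(sampled_picks)  # may vary from seed to seed
```

A real model repeats this choice once per generated token, so even small differences in early draws can lead to a very different program.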

Oh gotcha. I use text-generation-webui, so I have access to a lot more settings, like top_p, top_k, temperature, typical_p, etc.
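
For readers unfamiliar with these settings, the following is a rough sketch of what top_k and top_p usually mean, written in plain Python. It shows the general idea only and is not how text-generation-webui implements them; the function names and the example probabilities are assumptions for illustration, and typical_p is not covered.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; the rest get probability 0."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


# Hypothetical next-token probabilities.
toy_probs = [0.5, 0.3, 0.15, 0.05]

print(top_k_filter(toy_probs, 2))    # only the two likeliest tokens survive
print(top_p_filter(toy_probs, 0.9))  # tokens are kept until 90% of the mass is covered
```

Setting top_k to 1 leaves a single candidate, which has the same effect as greedy decoding.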

WizardLM Team org

Without greedy decoding, you will get different code on each try, so luck matters.

Ok thanks for all the information

WizardLM changed discussion status to closed
