Evaluating In-Context Learning Ability

#6 opened by mustafaa

Hello,

First of all, I would like to thank you for this amazing project. I am evaluating Chameleon's in-context learning ability, but I think I am missing something about the inference process. In the zero-shot setting, the model's outputs are normal. In the few-shot setting, however, the responses are awkward: the model sometimes avoids answering and occasionally outputs irrelevant characters. Below is the code I used.

def load_model(self, args) -> None:
    """
    Load the Chameleon model and processor.

    Parameters:
    - args: The arguments to load the model.

    Returns:
    None
    """
    
    import torch
    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    print('Loading Chameleon!!!')

    # device_map already places the model on the GPU, so a separate .to() call is not needed.
    self.model = ChameleonForConditionalGeneration.from_pretrained(
        args.hf_path,
        device_map="cuda:0",
        torch_dtype=torch.bfloat16,
    ).eval()

    self.processor = ChameleonProcessor.from_pretrained(args.hf_path)
    self.generation_cfg = {
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.9,
        'repetition_penalty': 1.2,
    }

    # Allow longer generations when chain-of-thought prompting is active.
    if args.is_zero_cot_active or args.is_few_cot_active:
        self.generation_cfg['max_new_tokens'] = 512
    else:
        self.generation_cfg['max_new_tokens'] = 50

    print('Chameleon loaded!!!')

def calculate_generated_text(self, prompt, vision_x):
    """
    Calculate generated text given a prompt and vision data.

    Parameters:
    - prompt (str): The input prompt.
    - vision_x (list[PIL Images]): List of PIL Images containing vision data.

    Returns:
    Tuple[str, str]: Tuple containing the raw and salt answer text.

    Example Prompt:
    In zero-shot: "<image> <Question> <Options> Answer: "
    In few-shot: "<image> <Question> <Options> Answer: <Answer> <image> <Question> <Options> Answer: "
    """

    if self.model is None or self.processor is None:
        raise AttributeError('Model or processor is not initialized. Call load_model first!')

    inputs = self.processor(prompt, images=vision_x, padding=True, return_tensors="pt").to(device=self.model.device, dtype=torch.bfloat16)

    out = self.model.generate(**inputs,  **self.generation_cfg)
    
    generated_text = self.processor.decode(out[0], skip_special_tokens=True)
    
    # Drop the "<image>" placeholders from the prompt, then strip the echoed prompt prefix
    # so that only the newly generated answer remains.
    salt_prompt = prompt.replace("<image>", "")

    salt_answer = generated_text[len(salt_prompt):]

    return generated_text, salt_answer
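
For context, here is roughly how I assemble the few-shot prompt and call the method above. This is only a simplified sketch: build_few_shot_prompt, support_examples, query, and evaluator are placeholder names for illustration, not part of my actual pipeline.

def build_few_shot_prompt(support_examples, query):
    # One "<image>" placeholder per in-context example, plus one for the query image.
    parts = []
    for ex in support_examples:
        parts.append(f"<image> {ex['question']} {ex['options']} Answer: {ex['answer']}")
    parts.append(f"<image> {query['question']} {query['options']} Answer: ")
    return " ".join(parts)

prompt = build_few_shot_prompt(support_examples, query)
vision_x = [ex['image'] for ex in support_examples] + [query['image']]
generated_text, salt_answer = evaluator.calculate_generated_text(prompt, vision_x)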

Hi @mustafaa. I'm highly interested in trying this model, but there are no clear instructions yet, so I tried your code. I'm wondering how you deal with prompt length. I got a ValueError when executing inputs = processor(...), and the error persists even after I set generation_cfg.max_length and generation_cfg.max_new_tokens = 2048. My prompt is "<image> Briefly describe the image. ". The image alone accounts for more than 1000 tokens, so I can't really reduce the input length.

ValueError: Input length of input_ids is 1029, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

Hi @eve1234. I think I made a mistake in the generate function. It should be:

out = self.model.generate(**inputs, **self.generation_cfg)

If that doesn't work, you can try passing each generation argument to generate separately, as in this post.
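
For reference, a minimal sketch of passing the generation arguments explicitly instead of unpacking the dict, using the values from my generation_cfg above:

# Sketch: pass the generation arguments explicitly instead of unpacking self.generation_cfg.
# Making sure max_new_tokens actually reaches generate() avoids falling back to the default max_length of 20.
out = self.model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    max_new_tokens=512,
)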
