Edit model card

Introduction

MetaAligner-UltraFeedback-1.1B is part of the MetaAligner project, the first policy-agnostic and generalizable method for multi-objective preference alignment of large language models. This model is finetuned based on the TinyLLaMA-1.1B foundation model and the dynamic multi-objective dataset built from the openbmb/UltraFeedback dataset. UltraFeedback-MetaAligner is trained to align responses of another general AI assistant considering a single-turn query, but the queries include professional questions such as programming language and history, and the aligned responses are usually more complicated. The model is expected to perform multi-objective alignment efficiently, without tuning the policy models or accessing their parameters. MetaAligner also exerts zero-shot preference alignment for unseen objectives. To our knowledge, this work marks the first attempt at generalizable multi- objective preference alignment. Experimental results show that MetaAligner can simultaneously perform effective alignment for multiple unseen objectives while maintaining performance on aligned objectives.

Dataset

This model is trained based on the following released dataset:

Usage

With the Hugging Face Transformers library, you can use the MetaAligner-UltraFeedback-1.1B model in your Python project. Here is a simple example of how to load the model:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained('MetaAligner/MetaAligner-UltraFeedback-1.1B', padding_side='left')
model = LlamaForCausalLM.from_pretrained('MetaAligner/MetaAligner-UltraFeedback-1.1B', device_map='auto', torch_dtype=torch.bfloat16)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In this example, LlamaTokenizer is used to load the tokenizer, and LlamaForCausalLM is used to load the model. The device_map='auto' argument is used to automatically use the GPU if it's available.

You can align any outputs obtained from other foundation models with your self-defined objectives. For example, we have the following query from UltraFeedback, unaligned output examples from LLaMA2-7B, and target objectives:

all_aspects = {'instruction_following': 'Instruction following: the response should follow the instructions of the query',
               'honesty': 'Honesty: the response should not tell lies',
               'truthfulness': 'Truthfulness: the response should actively making known all the full truth of a matter',
               'helpfulness': 'Helpfulness: the response should provide useful resources and suggestions to the user',
               'speci': 'Specificity: the response should refer to facts and details and avoid vague arguments.',
               'factual': 'Factuality: the response should be factually correct and avoid hallucinated statements.',
               'read': 'Readability: the response should be easy to read and understand, not too technical for laymen.',
               'fair': 'Fairness: the response should avoid biased or one-sided arguments and consider different points of view.',
               'repeat': 'Repetition: the response should avoid repetitive statements of one point.',
               'len': 'Length: the response should be concise and avoid redundant content.'}
aspects = [instruction_following', 'honesty', 'truthfulness', 'helpfulness']
question = "What is the square root of banana?"
answer = "The square root of banana is not a well-defined mathematical concept, as bananas are not objects that can be squared. Bananas are a type of fruit that grow on plants and have a specific shape and size. They do not have any mathematical properties or values that can be used in mathematical operations. Therefore, the square root of banana is not a valid or meaningful mathematical concept. It is important to be aware of the limitations of mathematical language and symbols, and to use them correctly and consistently in order to avoid confusion or misinterpretation."

To ensure the best performance, use the following template to prompt MetaAligner:

query_prompt = 'You are an assistant to human. You will be provided with a query and an answer. Consider the query, ' \
               'then edit the answer to improve it considering these aspects: {aspects} | ' \
             'Query: {question} | Answer: {answer} | Edit: '
aspects = [all_aspects[i] for i in aspects]
aligner_queries = [query_prompt.format(aspects='; '.join(aspects), question=question, answer=str(answer))]

You can obtain an aligned response using the following codes:

inputs = tokenizer(aligner_queries, return_tensors="pt", padding=True)
input_ids = inputs.input_ids.to(device)
generate_ids = model.generate(input_ids, max_new_tokens=1024)
truc_ids = generate_ids[0][len(input_ids[0]):]
response = tokenizer.decode(truc_ids, skip_special_tokens=True, spaces_between_special_tokens=False)
print(response)

One inference of MetaAligner-UltraFeedback-1.1B on the above codes has the following response:

The square root of a number is the reciprocal of that number. In this case, the square root of a banana is not a valid mathematical concept. Bananas are not a mathematical quantity, and therefore, there is no square root of a banana.

License

MetaAligner-UltraFeedback-1.1B is licensed under MIT. For more details, please see the MIT file.

Downloads last month
2
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train MetaAligner/MetaAligner-UltraFeedback-1.1B