metadata

license: llama3.1
language:
  - zh
pipeline_tag: text-generation
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-3
  - ContaLLM
  - ContaAI
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
library_name: transformers

ContaLLM-Beauty-8B-Instruct

ContaLLM-Beauty-8B-Instruct is a large language model for Chinese marketing in the beauty vertical. It generates marketing text tailored to a user's specific marketing needs and to optional inputs such as keywords, topics, hashtags, marketing seasons, character settings, relevant materials, and content length. Building on the base LLM's capabilities and trained on existing high-quality marketing materials, it is intended to help enterprises generate diverse, high-quality marketing content and increase marketing conversion rates.

Model description

Model type: A model trained on a mix of publicly available, synthetic and human-annotated datasets.
Language(s) (NLP): Primarily Chinese
Industry: Beauty and makeup marketing
License: Llama 3.1 Community License Agreement
Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

Model Stage

Industry	Version	Llama 3.1 8B
Beauty	bf16	ContaAI/ContaLLM-Beauty-8B-Instruct
Beauty	8bit	ContaAI/ContaLLM-Beauty-8B-Instruct-8bit
Beauty	4bit	ContaAI/ContaLLM-Beauty-8B-Instruct-4bit

Using the model

Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ContaAI/ContaLLM-Beauty-8B-Instruct-8bit")

System Prompt

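The model targets Chinese beauty marketing, so the default system prompt is the following (in English: "Based on the marketing requirements and other information provided by the user, write a marketing post for the beauty and skincare industry."):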

system_prompt = '请根据用户提供的营销需求和其他信息写一篇美妆护肤行业的营销推文。'

User Prompt

The marketing requirement (营销需求) is the only required field. The optional fields are keywords, topics, hashtags, marketing season, character settings, related materials, and content length. Content length takes one of three values: shorter, medium, or longer. The details are as follows:

Parameter name	Required	Meaning and optional range
营销需求	required	Fill in your marketing requirements, cannot be blank
关键词	optional	Fill in your marketing keywords, or remove this row from the prompt
话题	optional	Fill in your marketing topic, or remove this row from the prompt
标签	optional	Fill in the hashtag, or remove this row from the prompt
营销节点	optional	Fill in the marketing season, such as Valentine's Day, Christmas, or remove this row from the prompt
人设	optional	Fill in your character settings, or remove this row from the prompt
相关素材	optional	Fill in the relevant materials for your marketing needs, or remove this row from the prompt
内容长度	optional	One of '较长' (longer), '中等' (medium), or '较短' (shorter), or remove this row from the prompt

Example:

user_prompt = '营销需求：美白水乳推荐，推广HBN原白水乳。\n关键词：HBN原白水乳\n话题： 分享护肤 提亮肤色\n标签：爱情、浪漫\n人设：美白水乳推荐，推广HBN原白水乳。\n相关素材：美白水乳推荐，推广HBN原白水乳。\n内容长度：较长\n'
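A prompt in this format can also be assembled programmatically. The following is a minimal sketch, assuming the field labels from the table above and the full-width colon separator used in the example. The helper name `build_user_prompt` and the constants `FIELD_ORDER` and `LENGTH_CHOICES` are hypothetical and are not part of the model's API.

```python
# Field labels, in the order given in the parameter table.
FIELD_ORDER = ["营销需求", "关键词", "话题", "标签", "营销节点", "人设", "相关素材", "内容长度"]
LENGTH_CHOICES = {"较长", "中等", "较短"}


def build_user_prompt(fields: dict) -> str:
    """Hypothetical helper: join the provided fields into a user prompt.

    Optional fields that are missing or empty are dropped, following the
    "remove this row from the prompt" advice in the table.
    """
    requirement = (fields.get("营销需求") or "").strip()
    if not requirement:
        raise ValueError("营销需求 (marketing requirement) cannot be blank")

    length = fields.get("内容长度")
    if length and length not in LENGTH_CHOICES:
        raise ValueError("内容长度 must be one of 较长, 中等, 较短")

    # One "label：value" row per non-empty field, each ending in a newline.
    rows = [f"{name}：{fields[name]}" for name in FIELD_ORDER if fields.get(name)]
    return "\n".join(rows) + "\n"
```

Calling `build_user_prompt({"营销需求": "美白水乳推荐", "内容长度": "较长"})` yields a two-row prompt containing only the fields that were supplied.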

Use example (with template)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 8-bit checkpoint; device_map="auto" places the weights on the available device(s).
model_name = "ContaAI/ContaLLM-Beauty-8B-Instruct-8bit"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = '请根据用户提供的营销需求和其他信息写一篇美妆护肤行业的营销推文。'

user_prompt = '营销需求：美白水乳推荐，推广HBN原白水乳。\n关键词：HBN原白水乳\n话题： 分享护肤 提亮肤色\n标签：爱情、浪漫\n人设：美白水乳推荐，推广HBN原白水乳。\n相关素材：美白水乳推荐，推广HBN原白水乳。\n内容长度：较长\n'

prompt_template = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{}<|eot_id|><|start_header_id|>user<|end_header_id|>
{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>'''

prompt = prompt_template.format(system_prompt, user_prompt)

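# The template already starts with <|begin_of_text|>, so no special tokens are added here.
# Move the inputs to the model's device before generating.
tokenized_message = tokenizer(
  prompt,
  max_length=2048,
  return_tensors="pt",
  add_special_tokens=False
).to(model.device)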

response_token_ids = model.generate(
  **tokenized_message,
  max_new_tokens=1024,
  do_sample=True,
  top_p=1.0,
  temperature=0.5,
  min_length=None,
  use_cache=True,
  top_k=50,
  repetition_penalty=1.2,
  length_penalty=1,
)

generated_tokens = response_token_ids[0, tokenized_message['input_ids'].shape[-1]:]
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
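For repeated use, the prompt assembly step from the example above can be wrapped in a small function. This is only a sketch: `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the template string is copied unchanged from the example. Because the template already begins with `<|begin_of_text|>`, the result should be tokenized with `add_special_tokens=False`, as in the example.

```python
# Template copied from the usage example: system turn, user turn,
# then an open assistant header for the model to continue from.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "{}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
    "{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
)


def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical helper: fill the template with the two prompt strings."""
    return PROMPT_TEMPLATE.format(system_prompt, user_prompt)
```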

Bias, Risks, and Limitations

The ContaLLM models applied safety techniques during data generation and training, but, unlike ChatGPT, they are not deployed with in-the-loop filtering of responses at inference time, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; it likely included a mix of web data and technical sources such as books and code. Use of the models is at your own risk. You may need to monitor the model's outputs and take appropriate action, such as content filtering, if necessary.
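As one illustration of output monitoring, the sketch below applies a simple blocklist check to generated text. It is not a safety mechanism shipped with the model. The function names and the entries in `BLOCKED_TERMS` are hypothetical placeholders, and a real deployment would likely need a more capable moderation step.

```python
# Hypothetical placeholder terms; replace with a list that fits your own policy.
BLOCKED_TERMS = ["违禁词示例", "banned-example"]


def find_blocked_terms(text, blocked_terms=BLOCKED_TERMS):
    """Return the blocked terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in blocked_terms if term.lower() in lowered]


def filter_output(text, blocked_terms=BLOCKED_TERMS):
    """Return the text if no blocked term is found, otherwise None.

    A None result signals the caller to regenerate or send the text for review.
    """
    return None if find_blocked_terms(text, blocked_terms) else text
```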

License and use

All ContaAI models based on Llama 3.1 are released under Meta's Llama 3.1 Community License Agreement.