
Personalized Sticker Retrieval with Vision-Language Model (PerSRV)

Given a sticker image and a text prompt, PerSRV generates search keywords that describe what the sticker expresses. For more information, please see the paper cited at the end.

Usage

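import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# The processor is loaded from the base LLaVA-1.5 checkpoint.
PROCESSOR_ID = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(PROCESSOR_ID)
processor.tokenizer.padding_side = "left"  # left-pad so generation continues from the prompt

# Assumption: quantization_config was not defined in the original snippet;
# 4-bit loading via bitsandbytes is one plausible choice, not a confirmed setting.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
# Assumption: MAX_LENGTH was not defined in the original snippet; 128 is a placeholder.
MAX_LENGTH = 128

MODEL_ID = "metchee/persrv"
tuned_model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

image_path = ""  # set this to the path of your sticker image
# The prompt is kept in Chinese, as in the original. In English: "You are a sticker expert.
# Carefully observe and understand the feeling the image is trying to express,
# and turn that feeling into keywords."
prompt = "USER: <image>\n你是个表情包专家，仔细观察、理解图片中的想表达的感觉，把这个感觉转换成关键词。\nASSISTANT:"
image = Image.open(image_path).convert("RGB")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")
generated_ids = tuned_model.generate(**inputs, max_new_tokens=MAX_LENGTH)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)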
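
Because `generate` returns the prompt tokens followed by the new tokens, each decoded string contains the prompt as well as the model's reply. The sketch below shows one way to keep only the reply and split it into a keyword list. The helper name `extract_keywords` and the separator set are assumptions for illustration, since the model card does not specify the output format, and the example string is made up rather than real model output.

```python
import re


def extract_keywords(decoded_text, marker="ASSISTANT:"):
    """Return the keywords that follow the last `marker` in a decoded generation."""
    # Keep only the model's reply; if the marker is absent, fall back to the whole text.
    reply = decoded_text.rsplit(marker, 1)[-1]
    # Assumed separators: ASCII and full-width commas, the enumeration comma,
    # ASCII and full-width semicolons, and newlines.
    parts = re.split(r"[,，、;；\n]+", reply)
    return [part.strip() for part in parts if part.strip()]


# Illustrative input only, not actual PerSRV output.
example = "USER: \nDescribe the sticker.\nASSISTANT: happy, surprised，cute"
print(extract_keywords(example))
```

With the usage snippet above, this could be applied as `extract_keywords(generated_texts[0])`.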

Citation

If you find PerSRV helpful to your research, please cite the following paper :)

@misc{chee2024persrvpersonalizedstickerretrieval,
  title={PerSRV: Personalized Sticker Retrieval with Vision-Language Model},
  author={Heng Er Metilda Chee and Jiayin Wang and Zhiqiang Guo and Weizhi Ma and Min Zhang},
  year={2024},
  eprint={2410.21801},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2410.21801},
}

Framework versions

PEFT 0.12.0
Transformers 4.41.2
bitsandbytes 0.43.3
