metadata

language:
  - ja
tags:
  - instructblip
  - vision
  - image-captioning
  - japanese-stablelm
pipeline_tag: image-to-text
license:
  - other
extra_gated_heading: Access Japanese StableLM Instruct Alpha
extra_gated_description: >-
  This repository is publicly accessible, but you have to accept the conditions
  to access its files and content.
extra_gated_button_content: Access repository
extra_gated_fields:
  Name: text
  Email: text
  Organization: text
  I agree to accept the conditions and share above info with Stability AI: checkbox
extra_gated_prompt: >
  ### JAPANESE STABLELM RESEARCH LICENSE AGREEMENT

  Dated: August 7, 2023


  "Agreement" means the terms and conditions for use, reproduction, distribution
  and modification of the Software Products set forth herein.


  “Documentation” means any specifications, manuals, documentation, and other
  written information provided by Stability AI related to the Software.


  "Licensee" or "you" means you, or your employer or any other person or entity
  (if you are entering into this Agreement on such person’s or entity's behalf),
  of the age required under applicable laws, rules or regulations to provide
  legal consent and that has legal authority to bind your employer or such other
  person or entity if you are entering in this Agreement on their behalf.


  "Stability AI" or "we" means Stability AI Ltd.


  "Software" means, collectively, Stability AI’s proprietary Japanese StableLM
  made available under this Agreement.


  “Software Products” means Software and Documentation.


  By using or distributing any portion or element of the Software Products, you
  agree to be bound by this Agreement.

  - License Rights and Redistribution.
      - Subject to your compliance with this Agreement and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create derivative works of the Software Products for purposes other than commercial or production use.
      - You will not, and will not permit, assist or cause any third party to use, modify, copy, reproduce, create derivative works of, or distribute the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), in whole or in part, for any commercial or production purposes.
      - If you distribute or make the Software Products, or any derivative works thereof, available to a third party, you shall (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "Japanese StableLM is licensed under the Japanese StableLM Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.”
      - The licenses granted to you under this Agreement are conditioned upon your compliance with the Documentation and this Agreement, including the Acceptable Use Policy below and as may be updated from time to time in the future on stability.ai, which is hereby incorporated by reference into this Agreement.
  - Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE
  PRODUCTS  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS"
  BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING,
  WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
  MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY
  RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE
  SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE
  SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS.

  - Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE
  LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE,
  PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST
  PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR
  PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF
  THE POSSIBILITY OF ANY OF THE FOREGOING.

  - Intellectual Property.
      - No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products.
      - Subject to Stability AI’s ownership of the Software Products and derivatives made by or for Stability AI, with respect to any derivative works and modifications of the Software Products that are made by you, as between you and Stability AI, you are and will be the owner of such derivative works and modifications.
      - If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement.
  - Term and Termination. The term of this Agreement will commence upon your
  acceptance of this Agreement or access to the Software Products and will
  continue in full force and effect until terminated in accordance with the
  terms and conditions herein. Stability AI may terminate this Agreement if you
  are in breach of any term or condition of this Agreement. Upon termination of
  this Agreement, you shall delete and cease use of the Software Products.
  Sections 2-4 shall survive the termination of this Agreement.

  ----------

  ### Japanese StableLM Acceptable Use Policy

  If you access, use, or distribute any Stability AI models, software, or other
  materials (“Stability Technology”) you agree to this Acceptable Use Policy
  (“Policy”).

  We want everyone to use Stability Technology safely and responsibly. You agree
  you will not use, or allow others to use, Stability Technology to:

  - To violate the law or others’ rights (including intellectual property rights
  and the rights of data privacy and protection), nor will you promote,
  contribute to, encourage, facilitate, plan, incite, or further anyone else’s
  violation of the law or others’ rights;

  - To commit, promote, contribute to, facilitate, encourage, plan, incite, or
  further any of the following:
          - Violence or terrorism;
          - Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content;
          - Human trafficking, exploitation, and sexual violence;
          - Harassment, abuse, threatening, stalking, or bullying of individuals or groups of individuals;
          - Discrimination in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services on the basis of race, color, caste, religion, sex (including pregnancy, sexual orientation, or gender identity), national origin, age, disability, or genetic information (including family medical history) except as may be required by applicable law (such as the provision of social security benefits solely to people who meet certain age requirements under the law);
          - Creation of malicious code, malware, computer viruses or any activity that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system;
  - For purposes of or for the performance of:
          - Fully automated decision-making, including profiling, with respect to an individual or group of individuals which produces legal effects concerning such individual(s) or similarly significantly affects such individual(s);
          - Systematic or automated scraping, mining, extraction, or harvesting of personally identifiable data, or similar activity, from the output of any Stability Technology except with respect to data that you have provided as input to the Stability Technology and which you are legally entitled to process, for so long as you retain such entitlement;
          - Development, improvement, or manufacture of any weapons of mass destruction (such as nuclear, chemical, or biologic weapons), weapons of war (such as missiles or landmines), or any gain of function-related activities with respect to any pathogens;
          - Mission critical applications or systems where best industry practices require fail-safe controls or performance, including operation of nuclear facilities, aircraft navigation, electrical grids, communication systems, water treatment facilities, air traffic control, life support, weapons systems, or emergency locator or other emergency services;
  - To intentionally deceive or mislead others, including use of Japanese
  StableLM related to the following:
      - Generating, promoting, or furthering fraud or the creation or promotion of disinformation;
      - Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content;
      - Generating, promoting, or further distributing spam;
      - Impersonating another individual without consent, authorization, or legal right;
      - Representing or misleading people into believing that the use of Japanese StableLM or outputs are human-generated;
      - Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement;
      - Generating or facilitating large-scale political advertisements, propaganda, or influence campaigns;
  - Fail to appropriately disclose to end users any known dangers of your AI
  system or misrepresent or mislead with respect to its abilities.

  Nothing in this AUP is intended to prevent or impede any good faith research,
  testing, or evaluation of Japanese StableLM, or publication related to any of
  the foregoing. If you discover any flaws in Japanese StableLM that may be
  harmful to people in any way, we encourage you to notify us and give us a
  chance to remedy such flaws before others can exploit them. If you have
  questions about this AUP, contact us at legal@stability.ai.

Japanese InstructBLIP Alpha

Model Details

Japanese InstructBLIP Alpha is a vision-language instruction-following model that generates Japanese descriptions for input images and, optionally, for accompanying input text such as questions.

Usage

First, install the additional dependencies listed in requirements.txt:

pip install sentencepiece einops

import torch
from transformers import LlamaTokenizer, AutoModelForVision2Seq, BlipImageProcessor
from PIL import Image
import requests

# helper function to format input prompts
def build_prompt(prompt="", sep="\n\n### "):
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p

# load model
model = AutoModelForVision2Seq.from_pretrained("stabilityai/japanese-instructblip-alpha", trust_remote_code=True)
processor = BlipImageProcessor.from_pretrained("stabilityai/japanese-instructblip-alpha")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# prepare inputs
url = "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = ""  # leave empty for image captioning, or pass a question about the image
prompt = build_prompt(prompt)
inputs = processor(images=image, return_tensors="pt")
text_encoding = tokenizer(prompt, add_special_tokens=False, return_tensors="pt")
text_encoding["qformer_input_ids"] = text_encoding["input_ids"].clone()
text_encoding["qformer_attention_mask"] = text_encoding["attention_mask"].clone()
inputs.update(text_encoding)

# generate
outputs = model.generate(
    **inputs.to(device, dtype=model.dtype),
    num_beams=5,
    max_new_tokens=32,
    min_length=1,
)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)
# 桜と東京スカイツリー
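
The prompt layout produced by `build_prompt` in the snippet above can be inspected without loading the model. The following is a minimal sketch that copies the helper unchanged and prints the captioning and question-answering variants. The question string is an illustrative assumption and does not come from the model card.

```python
# Copy of the build_prompt helper from the usage snippet, so this block runs on its own.
def build_prompt(prompt="", sep="\n\n### "):
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        # A non-empty prompt adds an "入力" (input) section between instruction and response.
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p

# Captioning: no user text, so only the instruction and response sections appear.
caption_prompt = build_prompt()

# Question answering: the question below is a hypothetical example.
question = "この画像はどこで撮影されましたか?"
vqa_prompt = build_prompt(question)

print(caption_prompt)
print("-" * 20)
print(vqa_prompt)
```

With an empty prompt the text contains two `### ` sections (指示 and 応答). Passing a question inserts a 入力 section between them. In both cases the string ends with `### 応答: `, so generation continues from the response slot.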

Model Details

Developed by: Stability AI
Model type: InstructBLIP
Language(s): Japanese
License: JAPANESE STABLELM RESEARCH LICENSE AGREEMENT.

Training

Japanese InstructBLIP Alpha uses the InstructBLIP architecture, which consists of three components: a frozen image encoder, a Q-Former, and a frozen LLM. The image encoder and the Q-Former were initialized from Salesforce/instructblip-vicuna-7b, and Japanese-StableLM-Instruct-Alpha-7B serves as the frozen LLM. Only the Q-Former was updated during training.
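
The freezing scheme described above can be sketched as a filter over named parameters. This is a minimal illustration, not the training code used for this model. The `Param` class is a stand-in for a framework parameter object, and the parameter names and the `qformer.` prefix are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a framework parameter; only the requires_grad flag matters here."""
    requires_grad: bool = True


def freeze_all_but(named_params, trainable_prefixes):
    """Enable gradients only for parameters whose name starts with a given prefix."""
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in named_params:
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Hypothetical parameter names, one per component of the architecture.
params = {
    "vision_model.encoder.weight": Param(),
    "qformer.layer0.weight": Param(),
    "language_model.embed.weight": Param(),
}

trainable = freeze_all_but(params.items(), trainable_prefixes=["qformer."])
print(trainable)  # ['qformer.layer0.weight']
```

Because the helper only reads names and sets `requires_grad`, the same function should also accept the `(name, parameter)` pairs yielded by a PyTorch module's `named_parameters()`, provided the real module names are substituted for the assumed prefixes.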

Training Dataset

The training dataset includes the following public datasets:

CC12M with captions translated into Japanese
MS-COCO with STAIR Captions
Japanese Visual Genome VQA dataset

Use and Limitations

Intended Use

This model is intended for use by the open-source community in chat-like applications, in accordance with the research license.

Limitations and bias

Although the aforementioned datasets help to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly.

How to cite

@misc{JapaneseInstructBLIPAlpha, 
    url    = {https://huggingface.co/stabilityai/japanese-instructblip-alpha},
    title  = {Japanese InstructBLIP Alpha}, 
    author = {Shing, Makoto and Akiba, Takuya}
}

Citations

@misc{dai2023instructblip,
    title         = {InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning}, 
    author        = {Wenliang Dai and Junnan Li and Dongxu Li and Anthony Meng Huat Tiong and Junqi Zhao and Weisheng Wang and Boyang Li and Pascale Fung and Steven Hoi},
    year          = {2023},
    eprint        = {2305.06500},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}