
DolphinVision 72b 🐬

Curated and trained by Quan Nguyen (qnguyen3/stablequan), Eric Hartford, and Cognitive Computations

Discord: https://discord.gg/cognitivecomputations

Our appreciation for the sponsors of DolphinVision:

  • Crusoe Cloud - provided the excellent on-demand 8xH100 node used for training
  • TensorWave - provided the 8x MI300X node used for evaluations and inference

DolphinVision is a multimodal model. It is uncensored, and it can reason about and comment on images that other popular models would object to.

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings

# disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# set device
torch.set_default_device('cuda')  # or 'cpu'

model_name = 'cognitivecomputations/dolphin-vision-72b'

# create model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True)

# text prompt
prompt = 'Describe this image in detail'

messages = [
    {"role": "user", "content": f'<image>\n{prompt}'}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print(text)

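# split the prompt at the <image> marker and tokenize each side separately
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
# -200 is the placeholder id that marks where the image goes in the token sequence
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)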

# image, sample images can be found in images folder
image = Image.open('/path/to/image.png')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)

# generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=2048,
    use_cache=True)[0]

print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
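
The snippet above splits the prompt at a single `<image>` marker and joins the two tokenized halves with the placeholder id -200. As a minimal sketch of that step in isolation, the helper below generalizes it to any number of markers. The names `build_input_ids`, `toy_encode`, and `IMAGE_TOKEN_INDEX` are hypothetical and not part of the model's code, and `toy_encode` is only a stand-in for `tokenizer(chunk).input_ids`, so the example runs without downloading the model.

```python
from typing import Callable, List

# Placeholder id taken from the snippet above; the constant name is an assumption.
IMAGE_TOKEN_INDEX = -200


def build_input_ids(text: str,
                    encode: Callable[[str], List[int]],
                    image_token: str = "<image>",
                    image_token_index: int = IMAGE_TOKEN_INDEX) -> List[int]:
    """Tokenize the text around each image marker and join the pieces
    with one placeholder id per marker."""
    chunks = [encode(chunk) for chunk in text.split(image_token)]
    ids: List[int] = []
    for i, chunk_ids in enumerate(chunks):
        if i > 0:
            # A marker sat between the previous chunk and this one.
            ids.append(image_token_index)
        ids.extend(chunk_ids)
    return ids


def toy_encode(chunk: str) -> List[int]:
    # Hypothetical stand-in for `tokenizer(chunk).input_ids`: one id per character.
    return [ord(c) for c in chunk]


ids = build_input_ids("ab<image>\ncd", toy_encode)
print(ids)  # [97, 98, -200, 10, 99, 100]
```

With the real tokenizer you would pass `lambda chunk: tokenizer(chunk).input_ids` as `encode`. The sketch assumes the tokenizer does not prepend a special token to every chunk. If it did, the extra tokens on later chunks would need to be stripped before joining.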

Examples

image/png

<|im_start|>user
: do OCR on all the text
<|im_start|>assistant
: The text in the image reads as follows:

- STOP USING DOCKER
- Applications were not meant to be trapped in virtual jars like digital fireflies
- Years of software development, yet no one questioned why we were putting programs in imaginary boats
- Let's containerize our monolith. I'll just quickly spin up 17 microservices. Statements dreamt up by the utterly deranged
- Observe the whale-sized deception they fooled us with all this time
- These are real images used to explain Docker
- "Yay Matey, I'd like to run my website in a virtual sea"
- They have played us for absolute fools

This text is a satirical commentary on the use of Docker in software development, using humor and exaggeration to question the practice.

Model size: 73.2B params (Safetensors)
Tensor type: BF16