Janus-1.3B-ONNX / README.md
Xenova's picture
Xenova HF staff
Update README.md
6e8e5ab verified
metadata
license: other
base_model:
  - deepseek-ai/Janus-1.3B
pipeline_tag: any-to-any
library_name: transformers.js
tags:
  - text-to-image
  - image-to-text
  - image-text-to-text

https://huggingface.co/deepseek-ai/Janus-1.3B with ONNX weights to be compatible with Transformers.js.

Usage (Transformers.js)

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @huggingface/transformers

Example: Image+text to text

import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-1.3B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "User",
    content: "<image_placeholder>\nConvert the formula into latex code.",
    images: ["https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/quadratic_formula.png"],
  },
];
const inputs = await processor(conversation);

// Generate response
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 150,
  do_sample: false,
});

// Decode output
const new_tokens = outputs.slice(null, [inputs.input_ids.dims.at(-1), null]);
const decoded = processor.batch_decode(new_tokens, { skip_special_tokens: true });
console.log(decoded[0]);

Sample output:

Sure, here is the LaTeX code for the given formula:

```
x = \frac{-b \pm \sqrt{b^2 - 4a c}}{2a}
```

This code represents the mathematical expression for the variable \( x \).

Example: Text to image

import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/Janus-1.3B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);

// Prepare inputs
const conversation = [
  {
    role: "User",
    content: "A cute and adorable baby fox with big brown eyes, autumn leaves in the background enchanting,immortal,fluffy, shiny mane,Petals,fairyism,unreal engine 5 and Octane Render,highly detailed, photorealistic, cinematic, natural colors.",
  },
];
const inputs = await processor(conversation, { chat_template: "text_to_image" });

// Generate response
const num_image_tokens = processor.num_image_tokens;
const outputs = await model.generate_images({
  ...inputs,
  min_new_tokens: num_image_tokens,
  max_new_tokens: num_image_tokens,
  do_sample: true,
});

// Save the generated image
await outputs[0].save("test.png");

Sample outputs:

image/png image/png image/png image/png
image/png image/png image/png image/png

Want to play around with the model? Check out the online WebGPU demo.