aria-dev committed
Commit
e2e1cb9
1 Parent(s): 9c1d2b0

update readme

README.md CHANGED
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- multimodal
- aria
---
<!-- <p align="center">
<br>Aria</br>
</p> -->

This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The primary modification is the replacement of [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP, in which each expert is a `torch.nn.Linear` layer executed sequentially. This change makes quantization easier with current open-source libraries, which are optimized for quantizing `nn.Linear` layers.
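
The idea is sketched below for illustration only; the class and argument names are hypothetical and this is not the model's actual module code. The point is that every expert weight matrix lives in an ordinary `nn.Linear` rather than a packed grouped-GEMM buffer:

```python
import torch
import torch.nn as nn


class SequentialMoEMLP(nn.Module):
    """Illustrative MoE feed-forward block: each expert is a plain stack of
    nn.Linear layers, evaluated one after another instead of through a fused
    grouped-GEMM kernel."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(hidden_size, intermediate_size),
                    nn.GELU(),
                    nn.Linear(intermediate_size, hidden_size),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, hidden_states: torch.Tensor, expert_weights: torch.Tensor) -> torch.Tensor:
        # hidden_states: (tokens, hidden_size); expert_weights: (tokens, num_experts)
        # routing weights, zero for experts a token was not routed to.
        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            output += expert_weights[:, i : i + 1] * expert(hidden_states)
        return output
```

Because every expert weight ends up in a standard `nn.Linear`, post-training quantization tools that target `nn.Linear` modules can handle the experts like any other projection layer.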

## Quick Start
### Installation
```
pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
pip install flash-attn --no-build-isolation
```

### Inference

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id_or_path = "rhymes-ai/Aria-sequential_mlp"

model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)

processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"

image = Image.open(requests.get(image_path, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=500,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
        do_sample=True,
        temperature=0.9,
    )
    output_ids = output[0][inputs["input_ids"].shape[1]:]
    result = processor.decode(output_ids, skip_special_tokens=True)

print(result)
```
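
Since the motivation for this fork is easier quantization of the `nn.Linear` expert layers, here is a hedged illustration of how that might be wired up with bitsandbytes 4-bit loading via `BitsAndBytesConfig`. This is not part of the original card and is untested here; additional options (for example, skipping the vision tower) may be needed in practice, and `pip install bitsandbytes` is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

model_id_or_path = "rhymes-ai/Aria-sequential_mlp"

# Hypothetical example: 4-bit NF4 quantization of the nn.Linear layers
# with bitsandbytes (requires `pip install bitsandbytes`).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
```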