Yingxu He committed
Commit a2391de · verified · 1 Parent(s): 128578f

Update README.md

Files changed (1):
  1. README.md +65 -1
README.md CHANGED
@@ -26,7 +26,7 @@ MERaLiON stands for **M**ultimodal **E**mpathetic **R**easoning **a**nd **L**ear
  - **Language(s) (NLP):** English, Chinese, Vietnamese, Indonesian, Thai, Filipino, Tamil, Malay, Khmer, Lao, Burmese, Javanese, Sundanese
  - **License:** MIT
  
- For more details, please refer to our [report]().
+ We support model inference using the [Hugging Face](#inference) and [vLLM](#vllm-inference) frameworks. For more technical details, please refer to our [report]().
  
  ## Model Description
  
@@ -468,6 +468,70 @@ generated_ids = outputs[:, inputs['input_ids'].size(1):]
  response = processor.batch_decode(generated_ids, skip_special_tokens=True)
  ```
  
+ ### vLLM Inference
+ 
+ MERaLiON-AudioLLM requires vLLM version `0.6.4.post1`.
+ 
+ ```
+ pip install vllm==0.6.4.post1
+ ```
+ 
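+ Since the release is pinned exactly, it may be worth confirming the installed version before running the example below; a minimal check:
+ 
+ ```python
+ import vllm
+ 
+ # expect "0.6.4.post1" if the pinned install succeeded
+ print(vllm.__version__)
+ ```
+ 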
+ Here is an example of offline inference using our custom vLLM class.
+ 
+ ```python
+ import torch
+ from vllm import ModelRegistry, LLM, SamplingParams
+ from vllm.assets.audio import AudioAsset
+ 
+ # register the custom MERaLiON-AudioLLM class
+ # (vllm_meralion.py must be importable, e.g. placed in the working directory)
+ from vllm_meralion import MERaLiONForConditionalGeneration
+ ModelRegistry.register_model("MERaLiONForConditionalGeneration", MERaLiONForConditionalGeneration)
+ 
+ def run_meralion(question: str):
+     model_name = "MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION"
+ 
+     llm = LLM(model=model_name,
+               tokenizer=model_name,
+               tokenizer_mode="slow",
+               max_model_len=4096,
+               max_num_seqs=5,
+               limit_mm_per_prompt={"audio": 1},
+               trust_remote_code=True,
+               dtype=torch.bfloat16)
+ 
+     audio_in_prompt = "Given the following audio context: <SpeechHere>\n\n"
+ 
+     prompt = ("<start_of_turn>user\n"
+               f"{audio_in_prompt}Text instruction: {question}<end_of_turn>\n"
+               "<start_of_turn>model\n")
+     stop_token_ids = None
+     return llm, prompt, stop_token_ids
+ 
+ audio_asset = AudioAsset("mary_had_lamb")
+ question = "Please transcribe this speech."
+ 
+ llm, prompt, stop_token_ids = run_meralion(question)
+ 
+ # Temperature is set to 0.2 so that batched outputs can differ
+ # even when all prompts are identical.
+ sampling_params = SamplingParams(temperature=0.2,
+                                  max_tokens=64,
+                                  stop_token_ids=stop_token_ids)
+ 
+ mm_data = {"audio": [audio_asset.audio_and_sample_rate]}
+ inputs = {"prompt": prompt, "multi_modal_data": mm_data}
+ 
+ # duplicate the request to demonstrate batch inference
+ inputs = [inputs] * 2
+ 
+ outputs = llm.generate(inputs, sampling_params=sampling_params)
+ 
+ for o in outputs:
+     generated_text = o.outputs[0].text
+     print(generated_text)
+ ```
+ 
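+ The example above reads vLLM's bundled `mary_had_lamb` audio asset. To run the same pipeline on your own recording, pass the audio as a `(waveform, sample_rate)` tuple instead; a minimal sketch, assuming a local file `speech.wav` and the `librosa` package:
+ 
+ ```python
+ import librosa
+ 
+ # "speech.wav" is a hypothetical local file; resample to 16 kHz
+ # to match the Whisper-based speech encoder
+ waveform, sample_rate = librosa.load("speech.wav", sr=16000)
+ 
+ inputs = {"prompt": prompt,
+           "multi_modal_data": {"audio": [(waveform, sample_rate)]}}
+ outputs = llm.generate(inputs, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```
+ 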
  ## Bias, Risks, and Limitations
  
  The current MERaLiON-AudioLLM has not been aligned for safety. Developers and users should perform their own safety fine-tuning and related security measures. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights and code.