Yingxu He committed
Commit a2391de · verified · 1 Parent(s): 128578f

Update README.md

Files changed (1):
  1. README.md +65 -1
README.md CHANGED
@@ -26,7 +26,7 @@ MERaLiON stands for **M**ultimodal **E**mpathetic **R**easoning **a**nd **L**ear
  - **Language(s) (NLP):** English, Chinese, Vietnamese, Indonesian, Thai, Filipino, Tamil, Malay, Khmer, Lao, Burmese, Javanese, Sundanese
  - **License:** MIT
  
- For more details, please refer to our [report]().
+ We support model inference using the [Hugging Face](#inference) and [vLLM](#vllm-inference) frameworks. For more technical details, please refer to our [report]().
  
  ## Model Description
  
@@ -468,6 +468,70 @@ generated_ids = outputs[:, inputs['input_ids'].size(1):]
  response = processor.batch_decode(generated_ids, skip_special_tokens=True)
  ```
  
+ ### vLLM Inference
+ 
+ MERaLiON-AudioLLM requires vLLM version `0.6.4.post1`.
+ 
+ ```
+ pip install vllm==0.6.4.post1
+ ```
+ 
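+ Since the release is pinned exactly, it may be worth confirming the installed version before running the example below; a minimal check:
+ 
+ ```python
+ import vllm
+ 
+ # expect "0.6.4.post1" if the pinned install succeeded
+ print(vllm.__version__)
+ ```
+ 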
+ Here is an example of offline inference using our custom vLLM class.
+ 
+ ```python
+ import torch
+ from vllm import ModelRegistry, LLM, SamplingParams
+ from vllm.assets.audio import AudioAsset
+ 
+ # register the custom MERaLiON-AudioLLM class
+ # (vllm_meralion.py must be importable, e.g. placed in the working directory)
+ from vllm_meralion import MERaLiONForConditionalGeneration
+ ModelRegistry.register_model("MERaLiONForConditionalGeneration", MERaLiONForConditionalGeneration)
+ 
+ def run_meralion(question: str):
+     model_name = "MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION"
+ 
+     llm = LLM(model=model_name,
+               tokenizer=model_name,
+               tokenizer_mode="slow",
+               max_model_len=4096,
+               max_num_seqs=5,
+               limit_mm_per_prompt={"audio": 1},
+               trust_remote_code=True,
+               dtype=torch.bfloat16)
+ 
+     audio_in_prompt = "Given the following audio context: <SpeechHere>\n\n"
+ 
+     prompt = ("<start_of_turn>user\n"
+               f"{audio_in_prompt}Text instruction: {question}<end_of_turn>\n"
+               "<start_of_turn>model\n")
+     stop_token_ids = None
+     return llm, prompt, stop_token_ids
+ 
+ audio_asset = AudioAsset("mary_had_lamb")
+ question = "Please transcribe this speech."
+ 
+ llm, prompt, stop_token_ids = run_meralion(question)
+ 
+ # Temperature is set to 0.2 so that batched outputs can differ
+ # even when all prompts are identical.
+ sampling_params = SamplingParams(temperature=0.2,
+                                  max_tokens=64,
+                                  stop_token_ids=stop_token_ids)
+ 
+ mm_data = {"audio": [audio_asset.audio_and_sample_rate]}
+ inputs = {"prompt": prompt, "multi_modal_data": mm_data}
+ 
+ # duplicate the request to demonstrate batch inference
+ inputs = [inputs] * 2
+ 
+ outputs = llm.generate(inputs, sampling_params=sampling_params)
+ 
+ for o in outputs:
+     generated_text = o.outputs[0].text
+     print(generated_text)
+ ```
+ 
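+ The example above reads vLLM's bundled `mary_had_lamb` audio asset. To run the same pipeline on your own recording, pass the audio as a `(waveform, sample_rate)` tuple instead; a minimal sketch, assuming a local file `speech.wav` and the `librosa` package:
+ 
+ ```python
+ import librosa
+ 
+ # "speech.wav" is a hypothetical local file; resample to 16 kHz
+ # to match the Whisper-based speech encoder
+ waveform, sample_rate = librosa.load("speech.wav", sr=16000)
+ 
+ inputs = {"prompt": prompt,
+           "multi_modal_data": {"audio": [(waveform, sample_rate)]}}
+ outputs = llm.generate(inputs, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```
+ 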
  ## Bias, Risks, and Limitations
  
  The current MERaLiON-AudioLLM has not been aligned for safety. Developers and users should perform their own safety fine-tuning and related security measures. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights and code.