zhoukz committed on
Commit c2eff08 · 1 Parent(s): 0da09ee

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
---
license: apache-2.0
language:
- en
- zh
# TODO: specify the supported languages
tags:
- multimodal
- audio-language-model
- audio
# - audio-captioning
# - audio-classification
# - audio-generation
# - audio-question-answering
# - audio-understanding
# - chat
# - speech-recognition
# - text-to-speech
# TODO: list the model's capabilities
base_model:
- mispeech/dasheng-0.6B
- Qwen/Qwen2.5-Omni-3B
# TODO: check whether this is correct
---

# MiDashengLM

## Usage

Dependencies:

* `transformers`
* `torchaudio`

TODO: The following packages are dependencies of Qwen2.5-Omni-3B. The import path that pulls them in is unknown, and they should be removed:

* `pillow`
* `torchvision`

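As a quick pre-flight check, the sketch below reports which of the required packages cannot be found in the current environment. The helper name `missing_packages` and the `REQUIRED` tuple are illustrative and not part of this repository. It relies only on the standard-library `importlib.util.find_spec`, and it assumes each package's import name matches its distribution name, which holds for `transformers` and `torchaudio`.

```python
from importlib.util import find_spec

# Import names of the required packages (assumed equal to their pip names).
REQUIRED = ("transformers", "torchaudio")


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```
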
### Inference

```python
>>> from transformers import AutoModelForCausalLM, AutoProcessor
>>> # trust_remote_code=True is needed because the model and processor classes live in the model repository.
>>> model = AutoModelForCausalLM.from_pretrained("zhoukz/MiDashengLM-HF-dev", trust_remote_code=True)
>>> processor = AutoProcessor.from_pretrained("zhoukz/MiDashengLM-HF-dev", trust_remote_code=True)

>>> import torchaudio
>>> audio, sr = torchaudio.load("path/to/audio.wav")
>>> # The audio must be sampled at 16 kHz.
>>> assert sr == 16000
>>> # The prompt uses real newlines between turns; the audio placeholder follows the user instruction.
>>> text = ["<|im_start|>system\nYou are a helpful language and speech assistant.<|im_end|>\n<|im_start|>user\nCaption the audio<|audio_bos|><|AUDIO|><|audio_eos|><|im_end|>\n<|im_start|>assistant\n"]

>>> model_inputs = processor(text=text, audio=audio)
>>> output = model.generate(**model_inputs)
>>> print(output)
["An engine is idling."]
```
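
The prompt in the example wraps each turn in `<|im_start|>`/`<|im_end|>` markers and places an audio placeholder after the user instruction. A small helper can assemble that string so the special tokens do not have to be retyped by hand. This is only a sketch: `build_prompt`, `DEFAULT_SYSTEM`, and `AUDIO_PLACEHOLDER` are hypothetical names, not an API of this repository, and the code assumes the token layout is exactly the one in the example, with real newline characters between turns.

```python
# System message copied from the inference example.
DEFAULT_SYSTEM = "You are a helpful language and speech assistant."
# Placeholder marking where the audio is inserted (copied from the example).
AUDIO_PLACEHOLDER = "<|audio_bos|><|AUDIO|><|audio_eos|>"


def build_prompt(instruction, system=DEFAULT_SYSTEM):
    """Build a single-turn prompt: system turn, user turn with audio, open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}{AUDIO_PLACEHOLDER}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# One prompt per audio clip, wrapped in a list as in the example.
text = [build_prompt("Caption the audio")]
```

The resulting `text` list can be passed to `processor(text=text, audio=audio)` in place of the hand-written string.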

## Citation

```bibtex
TODO
```