no attribute tokenizer?
we get this error on this model:
```
Traceback (most recent call last):
  File "/home/silvacarl/Dropbox/Developer-Tools/STT-Cleanup-Project/callmydoc-speech-samples/audio-cleanup/ultravox-test.py", line 37, in <module>
    result = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=256)
  File "/home/silvacarl/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1302, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/silvacarl/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1308, in run_single
    model_inputs = self.preprocess(inputs, **preprocess_params)
  File "/home/silvacarl/.cache/huggingface/modules/transformers_modules/fixie-ai/ultravox-v0_4_1-mistral-nemo/3ddeb17298b14bce562d9541eb28ee1ac9df01cd/ultravox_pipeline.py", line 72, in preprocess
    text = self.processor.tokenizer.apply_chat_template(
AttributeError: 'NoneType' object has no attribute 'tokenizer'
```
from this:
```python
import torch
import librosa
import transformers

# Specify the device explicitly using torch.device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize the pipeline with the updated device parameter
pipe = transformers.pipeline(
    # Other models we tried (same error):
    # model='fixie-ai/ultravox-v0_4_1-llama-3_1-8b',
    # model='fixie-ai/ultravox-v0_4',
    model='fixie-ai/ultravox-v0_4_1-mistral-nemo',
    trust_remote_code=True,
    device=device
)

path = "Maricica-Moraru-21636092.mp3"  # TODO: pass the audio here
audio, sr = librosa.load(path, sr=16000)

turns = [
    {
        "role": "system",
        "content": (
            "Your job is to only transcribe calls. "
            "The first part of the recording to transcribe contains a proper name. "
            "The rest of the recording to transcribe is a question for the office. "
            "Ensure and check to make sure your transcriptions are as perfect as possible. "
            "Do not interpret the transcriptions only display it."
        )
    },
]

# Get the result from the pipeline
result = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=256)

# Print the result
print(result)
```
The same thing occurs on some of the other models.
Code used:
```python
# Install necessary packages if not already installed:
#   pip install transformers peft librosa torch

import torch
from transformers import pipeline
import numpy as np
import librosa
import sys

# Check for GPU availability
device = 0 if torch.cuda.is_available() else -1

pipe = pipeline(
    # model='fixie-ai/ultravox-v0_4_1-llama-3_1-70b',
    model='fixie-ai/ultravox-v0_4_1-llama-3_1-8b',
    trust_remote_code=True,
    device=device  # Automatically use GPU if available
)

# Check for command-line argument
if len(sys.argv) < 2:
    print("Usage: python ultravox-test.py <audio_file>")
    sys.exit(1)  # Exit with error code 1 to indicate abnormal termination

# Get the audio file path from the command line
path = sys.argv[1]
audio, sr = librosa.load(path, sr=16000)

turns = [
    {
        "role": "system",
        "content": (
            "Your job is to only transcribe calls from patients and doctors into text. "
            "Do not provide explanations, just the transcription."
        ),
    },
]

results = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=30)
print(results)
```
Thanks for reporting!

Yeah, the issue is that in the new version of `transformers`, the `pipeline` object overrides our `processor`. It has a one-line fix.

I tried going around to all of our public models and applied the following fix: https://huggingface.co/fixie-ai/ultravox-v0_4_1-mistral-nemo/commit/dfa1fa4e7a5f28f649fa168580106ed31b4809f9

Let me know if it still blows up.
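For anyone hitting this on a checkpoint that hasn't been patched yet, the gist of the change (a minimal sketch with hypothetical class and argument names, not the exact commit) is to re-assign the custom processor after the base `Pipeline.__init__` runs, since newer `transformers` releases manage a `processor` attribute on pipelines themselves and leave it as `None` for custom ones:

```python
from transformers import Pipeline


class PatchedCustomPipeline(Pipeline):
    """Minimal sketch only; the real class is UltravoxPipeline in ultravox_pipeline.py."""

    def __init__(self, model, processor=None, **kwargs):
        super().__init__(model=model, **kwargs)
        # Newer transformers versions set self.processor inside Pipeline.__init__
        # (to None when nothing is passed through), clobbering any value set before.
        # Re-assigning it afterwards keeps preprocess() working:
        self.processor = processor

    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, inputs):
        # self.processor.tokenizer is available again, so this no longer raises
        # "'NoneType' object has no attribute 'tokenizer'"
        return self.processor.tokenizer.apply_chat_template(
            inputs["turns"], add_generation_prompt=True, tokenize=False
        )

    def _forward(self, model_inputs):
        return model_inputs

    def postprocess(self, model_outputs):
        return model_outputs
```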
Hang on, will try it now.
Tested all of them, so far so good. Just testing the 70B now.
got this error:
```
python ultravox-test-multi-gpu.py Maricica-Moraru-21636092.mp3
GPU 0: NVIDIA A40, Memory: 47.32 GB
GPU 1: NVIDIA A40, Memory: 47.32 GB
GPU 2: NVIDIA A40, Memory: 47.32 GB
GPU 3: NVIDIA A40, Memory: 47.32 GB
Loading model fixie-ai/ultravox-v0_4_1-llama-3_1-70b across 4 GPUs...
An error occurred: Unrecognized configuration class <class 'transformers_modules.fixie-ai.ultravox-v0_4_1-llama-3_1-70b.1a087b379fdbc4b1178357134e88c7df7622751b.ultravox_config.UltravoxConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig
```
with this test code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import numpy as np
import librosa
import sys
import os
from accelerate import infer_auto_device_map, init_empty_weights
from transformers.utils import check_min_version
from accelerate import dispatch_model


def setup_model_parallel():
    """Setup model parallel deployment across available GPUs."""
    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        if num_gpus < 2:
            print(f"Warning: Only {num_gpus} GPU detected. Multi-GPU optimization disabled.")
            return 0
        # Print GPU memory info
        for i in range(num_gpus):
            gpu_properties = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {gpu_properties.name}, Memory: {gpu_properties.total_memory / 1024**3:.2f} GB")
        return num_gpus
    else:
        print("No GPUs detected. Running on CPU.")
        return 0


def load_model_on_multi_gpu(model_name):
    """Load model across multiple GPUs using device map."""
    num_gpus = setup_model_parallel()
    if num_gpus < 2:
        # Fall back to single GPU or CPU
        return pipeline(
            model=model_name,
            trust_remote_code=True,
            device=0 if torch.cuda.is_available() else -1
        )
    print(f"Loading model {model_name} across {num_gpus} GPUs...")
    # Calculate optimal device map
    with init_empty_weights():
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=True
        )
    # Get max memory for each GPU
    max_memory = {i: f"{int(torch.cuda.get_device_properties(i).total_memory * 0.85 / 1024**3)}GiB" for i in range(num_gpus)}
    max_memory["cpu"] = "96GiB"  # Adjust based on available RAM
    # Load the model with device map
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        trust_remote_code=True,
        max_memory=max_memory,
        torch_dtype=torch.float16  # Use FP16 for memory efficiency
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Create pipeline with loaded model
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        trust_remote_code=True
    )
    return pipe


def process_audio(audio_path, pipe):
    """Process audio file using the pipeline."""
    try:
        print(f"Loading audio file: {audio_path}")
        audio, sr = librosa.load(audio_path, sr=None)
        duration = librosa.get_duration(y=audio, sr=sr)
        print(f"Audio file loaded. Duration: {duration:.2f} seconds.")
        # Run inference
        print("Running inference...")
        with torch.cuda.amp.autocast():  # Use automatic mixed precision
            result = pipe(audio)
        return result
    except Exception as e:
        print(f"Error processing audio: {str(e)}")
        return None


def main():
    # Check command line arguments
    if len(sys.argv) < 2:
        print("Usage: python ultravox-test-multi-gpu.py <audio_file>")
        sys.exit(1)
    audio_path = sys.argv[1]
    if not os.path.exists(audio_path):
        print(f"Error: Audio file '{audio_path}' not found.")
        sys.exit(1)
    # Model name
    model_name = 'fixie-ai/ultravox-v0_4_1-llama-3_1-70b'
    try:
        # Load model with multi-GPU support
        pipe = load_model_on_multi_gpu(model_name)
        # Process audio
        result = process_audio(audio_path, pipe)
        # Output results
        if result is not None:
            print("\nInference result:")
            print(result)
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
I don't think we register `AutoModelForCausalLM`. Does `AutoModel` work?
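In the multi-GPU loader above, that swap would look roughly like this (an untested sketch; whether `AutoModel` can resolve the custom `UltravoxConfig` still depends on the remote code registering it, so it may fail the same way):

```python
import torch
from transformers import AutoModel

# Untested sketch: load through AutoModel instead of AutoModelForCausalLM,
# keeping the same device_map-based sharding; the rest of the script is unchanged.
model = AutoModel.from_pretrained(
    'fixie-ai/ultravox-v0_4_1-llama-3_1-70b',
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
```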
btw, you can use three backticks (`` ``` ``) to encapsulate code in Markdown so that your code gets formatted better. If you use `` ```python `` then it'll even do syntax highlighting:

```python
if __name__ == "__main__":
    main()
```
Thx, will try `AutoModel`.
nope:
```
python ultravox-test-multi-gpu.py Maricica-Moraru-21636092.mp3
GPU 0: NVIDIA A40, Memory: 47.32 GB
GPU 1: NVIDIA A40, Memory: 47.32 GB
GPU 2: NVIDIA A40, Memory: 47.32 GB
GPU 3: NVIDIA A40, Memory: 47.32 GB
Loading model fixie-ai/ultravox-v0_4_1-llama-3_1-70b across 4 GPUs...
An error occurred: Unrecognized configuration class <class 'transformers_modules.fixie-ai.ultravox-v0_4_1-llama-3_1-70b.1a087b379fdbc4b1178357134e88c7df7622751b.ultravox_config.UltravoxConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig.
```
LlamaConfig?