no attribute tokenizer?
we get this error on this model:
```
Traceback (most recent call last):
  File "/home/silvacarl/Dropbox/Developer-Tools/STT-Cleanup-Project/callmydoc-speech-samples/audio-cleanup/ultravox-test.py", line 37, in <module>
    result = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=256)
  File "/home/silvacarl/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1302, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/silvacarl/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1308, in run_single
    model_inputs = self.preprocess(inputs, **preprocess_params)
  File "/home/silvacarl/.cache/huggingface/modules/transformers_modules/fixie-ai/ultravox-v0_4_1-mistral-nemo/3ddeb17298b14bce562d9541eb28ee1ac9df01cd/ultravox_pipeline.py", line 72, in preprocess
    text = self.processor.tokenizer.apply_chat_template(
AttributeError: 'NoneType' object has no attribute 'tokenizer'
```
from this:
```python
import torch
import librosa
import transformers

# Specify the device explicitly using torch.device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize the pipeline with the updated device parameter
pipe = transformers.pipeline(
    # Other models we tried (same error):
    # model='fixie-ai/ultravox-v0_4_1-llama-3_1-8b',
    # model='fixie-ai/ultravox-v0_4',
    model='fixie-ai/ultravox-v0_4_1-mistral-nemo',
    trust_remote_code=True,
    device=device
)

path = "Maricica-Moraru-21636092.mp3"  # TODO: pass the audio here
audio, sr = librosa.load(path, sr=16000)

turns = [
    {
        "role": "system",
        "content": (
            "Your job is to only transcribe calls. "
            "The first part of the recording to transcribe contains a proper name. "
            "The rest of the recording to transcribe is a question for the office. "
            "Ensure and check to make sure your transcriptions are as perfect as possible. "
            "Do not interpret the transcriptions only display it."
        )
    },
]

# Get the result from the pipeline
result = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=256)

# Print the result
print(result)
```
The same thing occurs on some of the other models.
Code used:
```python
# Install necessary packages if not already installed:
#   pip install transformers peft librosa torch

import torch
from transformers import pipeline
import numpy as np
import librosa
import sys

# Check for GPU availability
device = 0 if torch.cuda.is_available() else -1

pipe = pipeline(
    # model='fixie-ai/ultravox-v0_4_1-llama-3_1-70b',
    model='fixie-ai/ultravox-v0_4_1-llama-3_1-8b',
    trust_remote_code=True,
    device=device  # Automatically use GPU if available
)

# Check for command-line argument
if len(sys.argv) < 2:
    print("Usage: python ultravox-test.py <audio_file>")
    sys.exit(1)  # Exit with error code 1 to indicate abnormal termination

# Get the audio file path from the command line
path = sys.argv[1]
audio, sr = librosa.load(path, sr=16000)

turns = [
    {
        "role": "system",
        "content": (
            "Your job is to only transcribe calls from patients and doctors into text. "
            "Do not provide explanations, just the transcription."
        ),
    },
]

results = pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=30)
print(results)
```
Thanks for reporting!

Yeah, the issue is that in the new version of `transformers`, the `pipeline` object overrides our `processor`. It has a one-line fix.

I tried going around to all of our public models and applied the following fix: https://huggingface.co/fixie-ai/ultravox-v0_4_1-mistral-nemo/commit/dfa1fa4e7a5f28f649fa168580106ed31b4809f9

Let me know if it still blows up.
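For anyone hitting this on a checkpoint that hasn't been patched yet, the gist of the change (a minimal sketch with hypothetical class and argument names, not the exact commit) is to re-assign the custom processor after the base `Pipeline.__init__` runs, since newer `transformers` releases manage a `processor` attribute on pipelines themselves and leave it as `None` for custom ones:

```python
from transformers import Pipeline


class PatchedCustomPipeline(Pipeline):
    """Minimal sketch only; the real class is UltravoxPipeline in ultravox_pipeline.py."""

    def __init__(self, model, processor=None, **kwargs):
        super().__init__(model=model, **kwargs)
        # Newer transformers versions set self.processor inside Pipeline.__init__
        # (to None when nothing is passed through), clobbering any value set before.
        # Re-assigning it afterwards keeps preprocess() working:
        self.processor = processor

    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, inputs):
        # self.processor.tokenizer is available again, so this no longer raises
        # "'NoneType' object has no attribute 'tokenizer'"
        return self.processor.tokenizer.apply_chat_template(
            inputs["turns"], add_generation_prompt=True, tokenize=False
        )

    def _forward(self, model_inputs):
        return model_inputs

    def postprocess(self, model_outputs):
        return model_outputs
```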
Hang on, will try it now.
Tested all of them, so far so good. Just testing the 70B now.
got this error:
```
python ultravox-test-multi-gpu.py Maricica-Moraru-21636092.mp3
GPU 0: NVIDIA A40, Memory: 47.32 GB
GPU 1: NVIDIA A40, Memory: 47.32 GB
GPU 2: NVIDIA A40, Memory: 47.32 GB
GPU 3: NVIDIA A40, Memory: 47.32 GB
Loading model fixie-ai/ultravox-v0_4_1-llama-3_1-70b across 4 GPUs...
An error occurred: Unrecognized configuration class <class 'transformers_modules.fixie-ai.ultravox-v0_4_1-llama-3_1-70b.1a087b379fdbc4b1178357134e88c7df7622751b.ultravox_config.UltravoxConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig
```
with this test code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import numpy as np
import librosa
import sys
import os
from accelerate import infer_auto_device_map, init_empty_weights
from transformers.utils import check_min_version
from accelerate import dispatch_model


def setup_model_parallel():
    """Setup model parallel deployment across available GPUs."""
    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        if num_gpus < 2:
            print(f"Warning: Only {num_gpus} GPU detected. Multi-GPU optimization disabled.")
            return 0
        # Print GPU memory info
        for i in range(num_gpus):
            gpu_properties = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {gpu_properties.name}, Memory: {gpu_properties.total_memory / 1024**3:.2f} GB")
        return num_gpus
    else:
        print("No GPUs detected. Running on CPU.")
        return 0


def load_model_on_multi_gpu(model_name):
    """Load model across multiple GPUs using device map."""
    num_gpus = setup_model_parallel()
    if num_gpus < 2:
        # Fall back to single GPU or CPU
        return pipeline(
            model=model_name,
            trust_remote_code=True,
            device=0 if torch.cuda.is_available() else -1
        )
    print(f"Loading model {model_name} across {num_gpus} GPUs...")
    # Calculate optimal device map
    with init_empty_weights():
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=True
        )
    # Get max memory for each GPU
    max_memory = {i: f"{int(torch.cuda.get_device_properties(i).total_memory * 0.85 / 1024**3)}GiB" for i in range(num_gpus)}
    max_memory["cpu"] = "96GiB"  # Adjust based on available RAM
    # Load the model with device map
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        trust_remote_code=True,
        max_memory=max_memory,
        torch_dtype=torch.float16  # Use FP16 for memory efficiency
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Create pipeline with loaded model
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        trust_remote_code=True
    )
    return pipe


def process_audio(audio_path, pipe):
    """Process audio file using the pipeline."""
    try:
        print(f"Loading audio file: {audio_path}")
        audio, sr = librosa.load(audio_path, sr=None)
        duration = librosa.get_duration(y=audio, sr=sr)
        print(f"Audio file loaded. Duration: {duration:.2f} seconds.")
        # Run inference
        print("Running inference...")
        with torch.cuda.amp.autocast():  # Use automatic mixed precision
            result = pipe(audio)
        return result
    except Exception as e:
        print(f"Error processing audio: {str(e)}")
        return None


def main():
    # Check command line arguments
    if len(sys.argv) < 2:
        print("Usage: python ultravox-test-multi-gpu.py <audio_file>")
        sys.exit(1)
    audio_path = sys.argv[1]
    if not os.path.exists(audio_path):
        print(f"Error: Audio file '{audio_path}' not found.")
        sys.exit(1)
    # Model name
    model_name = 'fixie-ai/ultravox-v0_4_1-llama-3_1-70b'
    try:
        # Load model with multi-GPU support
        pipe = load_model_on_multi_gpu(model_name)
        # Process audio
        result = process_audio(audio_path, pipe)
        # Output results
        if result is not None:
            print("\nInference result:")
            print(result)
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
I don't think we register `AutoModelForCausalLM`. Does `AutoModel` work?
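In the multi-GPU loader above, that swap would look roughly like this (an untested sketch; whether `AutoModel` can resolve the custom `UltravoxConfig` still depends on the remote code registering it, so it may fail the same way):

```python
import torch
from transformers import AutoModel

# Untested sketch: load through AutoModel instead of AutoModelForCausalLM,
# keeping the same device_map-based sharding; the rest of the script is unchanged.
model = AutoModel.from_pretrained(
    'fixie-ai/ultravox-v0_4_1-llama-3_1-70b',
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
```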
btw, you can use three backticks (`` ``` ``) to encapsulate code in Markdown so that your code gets formatted better. If you use `` ```python `` then it'll even do syntax highlighting:

```python
if __name__ == "__main__":
    main()
```
Thx, will try `AutoModel`.
nope:
```
python ultravox-test-multi-gpu.py Maricica-Moraru-21636092.mp3
GPU 0: NVIDIA A40, Memory: 47.32 GB
GPU 1: NVIDIA A40, Memory: 47.32 GB
GPU 2: NVIDIA A40, Memory: 47.32 GB
GPU 3: NVIDIA A40, Memory: 47.32 GB
Loading model fixie-ai/ultravox-v0_4_1-llama-3_1-70b across 4 GPUs...
An error occurred: Unrecognized configuration class <class 'transformers_modules.fixie-ai.ultravox-v0_4_1-llama-3_1-70b.1a087b379fdbc4b1178357134e88c7df7622751b.ultravox_config.UltravoxConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig.
```
LlamaConfig?