genevera committed
Commit 71ef4af
Parent: b561bb5

send model to device


I was running into this error:
```
Traceback (most recent call last):
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/gradio/blocks.py", line 1346, in process_api
result = await self.call_function(
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/gradio/blocks.py", line 1074, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/genevera/src/AudioToken/./app.py", line 117, in greet
aud_features = model.aud_encoder.extract_features(audio_values)[1]
File "/home/genevera/src/AudioToken/modules/beats/BEATs.py", line 145, in extract_features
features = self.patch_embedding(fbank)
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/envs/audiotoken/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```

until I made the change in this PR.
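
This is the usual CUDA/CPU device mismatch: per the error message, the input reaching `patch_embedding` is a `torch.cuda.FloatTensor`, while the conv's weights are still plain `torch.FloatTensor`s on the CPU, so `F.conv2d` refuses to mix them. Calling `.to(device)` on the wrapper moves every registered submodule's parameters onto the same device as the input. A minimal standalone sketch of the failure mode and the fix (illustrative only, not AudioToken's actual code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

conv = nn.Conv2d(1, 8, kernel_size=3)      # weights are created on the CPU
x = torch.randn(1, 1, 64, 64).to(device)   # input is moved to the GPU

# On a CUDA machine, calling conv(x) here raises:
#   RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
#   (torch.FloatTensor) should be the same
conv = conv.to(device)  # move the weights to the same device as the input
y = conv(x)             # now input and weights match
```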

Files changed (1)
  1. app.py +1 -1
app.py CHANGED

```diff
@@ -136,7 +136,7 @@ if __name__ == "__main__":
     lora = False
     device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
     model = AudioTokenWrapper(lora, device)
-
+    model = model.to(device)
     description = """<p>
     This is a demo of <a href='https://pages.cs.huji.ac.il/adiyoss-lab/AudioToken' target='_blank'>AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation</a>.<br><br>
     A novel method utilizing latent diffusion models trained for text-to-image-generation to generate images conditioned on audio recordings. Using a pre-trained audio encoding model, the proposed method encodes audio into a new token, which can be considered as an adaptation layer between the audio and text representations.<br><br>
```