mmcv error with Python 3.10, cu118 and torch 2.1.0 for custom characters
Hi @hysts, I'm almost at the final stage of setting up this project, but I've hit an mmcv error. Are there any example projects that run mmcv successfully on ZeroGPU?
My project drives 2D human video from speech audio, and mmcv is critical for letting users upload their own videos. I've confirmed the current scripts work well on my local machine with Python 3.10, cu122 and torch 2.1.0. Video examples are below.
Building the current project, including mmcv, takes around 45 minutes, which makes debugging time-consuming. Thanks!
logs below:
  File "/home/user/app/SMPLer-X/app.py", line 133, in <module>
    infer(os.path.join(video_folder, video_input), 0.5, False, False, inferer, OUT_FOLDER)
  File "/home/user/app/SMPLer-X/app.py", line 103, in infer
    _, _, _ = inferer.infer(original_img, in_threshold, frame, multi_person, not(render_mesh))
  File "/home/user/app/SMPLer-X/main/inference.py", line 55, in infer
    mmdet_results = inference_detector(self.model, original_img)
  File "/usr/local/lib/python3.10/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data_)[0]
  File "/usr/local/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/detectors/two_stage.py", line 231, in predict
    rpn_results_list = self.rpn_head.predict(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 197, in predict
    predictions = self.predict_by_feat(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 279, in predict_by_feat
    results = self._predict_by_feat_single(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/rpn_head.py", line 233, in _predict_by_feat_single
    return self._bbox_post_process(
  File "/usr/local/lib/python3.10/site-packages/mmdet/models/dense_heads/rpn_head.py", line 284, in _bbox_post_process
    det_bboxes, keep_idxs = batched_nms(bboxes, results.scores,
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 303, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/usr/local/lib/python3.10/site-packages/mmengine/utils/misc.py", line 395, in new_func
    output = old_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 127, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/mmcv/ops/nms.py", line 27, in forward
    inds = ext_module.nms(
RuntimeError: nms_impl: implementation for device cuda:0 not found.
@H-Liu1997
Ah, I don't think mmcv works on ZeroGPU. I'll assign a normal GPU grant then.
@H-Liu1997 It looks like the GitHub repository disappeared.
I am getting a 404 on https://github.com/CyberAgentAILab/TANGO
@hlevring Hi, the GitHub repo is under review by the engineering team, and I don't know when they will finish. But the scripts on GitHub and here (Hugging Face) are the same, so you can just git clone this repo.
If you hit the error for custom characters, that is due to the mmcv install error here; I recommend cloning this repo and setting up the environment from requirements.txt, with Python 3.10 and torch 2.1.0.
@hysts Hi, please ignore the previous messages; I'll summarize here. I want to set up mmcv correctly in the Gradio-based environment.
A possible solution: could we choose which image to pull, e.g. cu121 or cu122, instead of cu123? I've tested that my code works on Google Colab (T4, cu122), so I suspect cu123 is too new here.
For example, may I ask for a Dockerfile like gradio cu121-cudnn xxx?
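As a side note, one way to confirm which CUDA build is actually in play at runtime is a minimal diagnostic sketch like the following (assuming torch is importable in the Space):

```python
import torch

# torch.version.cuda is the CUDA version torch was compiled against (None for
# CPU-only builds). A prebuilt mmcv wheel must match both this and the torch
# version, or its compiled ops (e.g. NMS) are missing at runtime.
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("CUDA device visible:", torch.cuda.is_available())
```

Running this inside the Space (rather than locally or on Colab) shows exactly what mmcv needs to match.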
@H-Liu1997
I'm confused. Looks like the requirements.txt in your Space has mmcv, and your Space is up. So I guess at least it was successfully installed, no? Or is it still broken for some reason?
What is the error, then? Is it really about the CUDA minor version? Or is it the same error as https://huggingface.co/spaces/H-Liu1997/TANGO/discussions/2#670c09fca317a660f0e3843c ?
Anyway, if the error has something to do with CUDA, maybe that's because you installed mmcv at build time. On Spaces infra, CUDA is not available at build time, so if mmcv requires CUDA to build its CUDA kernels or the like, it just doesn't work, even if you used Docker as the Space SDK.
A typical solution for this kind of issue is to install the package, mmcv in your case, at startup time instead of build time, by running something like the following in your app.py:

import shlex
import subprocess

# install mmcv at startup, when CUDA is available on the Space
subprocess.run(shlex.split("pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html"))
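Expanding on that, a small guard can make the startup install idempotent, so local runs and app reloads that already have the package skip the pip call. A sketch, where `ensure_package` is a hypothetical helper, not part of any library:

```python
import importlib.util
import shlex
import subprocess


def ensure_package(module_name: str, pip_args: str) -> None:
    """Run pip at startup only if the module is not already importable."""
    if importlib.util.find_spec(module_name) is None:
        subprocess.run(shlex.split(f"pip install {pip_args}"), check=True)


# Hypothetical usage for this Space; the index URL must match the CUDA/torch
# versions actually present at runtime on the Space hardware:
# ensure_package(
#     "mmcv",
#     "mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html",
# )
```

`check=True` makes a failed install raise immediately at startup instead of surfacing later as a confusing import error.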
As for the Docker SDK, I think you can just search on the Hub, but the following Dockerfile is the one I used in my old Spaces, so maybe it can be helpful. (Some of the packages installed in it are old, so you might want to update the versions, though.)
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    git \
    git-lfs \
    wget \
    curl \
    # python build dependencies \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libxml2-dev \
    libxmlsec1-dev \
    libffi-dev \
    liblzma-dev \
    # gradio dependencies \
    ffmpeg && \
    rm -rf /var/lib/apt/lists/*

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:${PATH}
WORKDIR ${HOME}/app

RUN curl https://pyenv.run | bash
ENV PATH=${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${PATH}
ARG PYTHON_VERSION=3.10.13
RUN pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION} && \
    pyenv rehash && \
    pip install --no-cache-dir -U pip setuptools wheel && \
    pip install "huggingface-hub==0.19.3" "hf-transfer==0.1.4"

COPY --chown=1000 . ${HOME}/app
RUN pip install -r ${HOME}/app/requirements.txt
ENV PYTHONPATH=${HOME}/app \
    PYTHONUNBUFFERED=1 \
    HF_HUB_ENABLE_HF_TRANSFER=1 \
    GRADIO_ALLOW_FLAGGING=never \
    GRADIO_NUM_PORTS=1 \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_THEME=huggingface \
    TQDM_POSITION=-1 \
    TQDM_MININTERVAL=1 \
    SYSTEM=spaces

CMD ["python", "app.py"]