kevinwang676 committed on
Commit
7bbebbe
1 Parent(s): 7062004

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +23 -0
  2. .gitignore +174 -0
  3. .ipynb_checkpoints/Untitled-checkpoint.ipynb +6 -0
  4. .ipynb_checkpoints/app-checkpoint.py +608 -0
  5. .ipynb_checkpoints/app_sadtalker-checkpoint.py +111 -0
  6. ChatGLM2-SadTalker-VC-1/.flake8 +21 -0
  7. ChatGLM2-SadTalker-VC-1/.gitattributes +52 -0
  8. ChatGLM2-SadTalker-VC-1/.gitignore +159 -0
  9. ChatGLM2-SadTalker-VC-1/.ipynb_checkpoints/requirements-checkpoint.txt +12 -0
  10. ChatGLM2-SadTalker-VC-1/Dockerfile +59 -0
  11. ChatGLM2-SadTalker-VC-1/LICENSE +21 -0
  12. ChatGLM2-SadTalker-VC-1/README.md +15 -0
  13. ChatGLM2-SadTalker-VC-1/packages.txt +2 -0
  14. ChatGLM2-SadTalker-VC-1/requirements.txt +12 -0
  15. LICENSE +21 -0
  16. README.md +268 -13
  17. Untitled.ipynb +0 -0
  18. __pycache__/commons.cpython-310.pyc +0 -0
  19. __pycache__/mel_processing.cpython-310.pyc +0 -0
  20. __pycache__/models.cpython-310.pyc +0 -0
  21. __pycache__/modules.cpython-310.pyc +0 -0
  22. __pycache__/tts_voice.cpython-310.pyc +0 -0
  23. __pycache__/utils.cpython-310.pyc +0 -0
  24. app.py +608 -0
  25. app_sadtalker.py +111 -0
  26. checkpoint/__init__.py +0 -0
  27. checkpoint/freevc-24.pth +3 -0
  28. checkpoints/SadTalker_V0.0.2_256.safetensors +3 -0
  29. checkpoints/SadTalker_V0.0.2_512.safetensors +3 -0
  30. checkpoints/mapping_00109-model.pth.tar +3 -0
  31. checkpoints/mapping_00229-model.pth.tar +3 -0
  32. cog.yaml +35 -0
  33. commons.py +171 -0
  34. configs/.ipynb_checkpoints/freevc-24-checkpoint.json +54 -0
  35. configs/freevc-24.json +54 -0
  36. docs/FAQ.md +46 -0
  37. docs/best_practice.md +94 -0
  38. docs/changlelog.md +29 -0
  39. docs/example_crop.gif +3 -0
  40. docs/example_crop_still.gif +3 -0
  41. docs/example_full.gif +3 -0
  42. docs/example_full_crop.gif +0 -0
  43. docs/example_full_enhanced.gif +3 -0
  44. docs/face3d.md +48 -0
  45. docs/free_view_result.gif +3 -0
  46. docs/install.md +47 -0
  47. docs/resize_good.gif +3 -0
  48. docs/resize_no.gif +3 -0
  49. docs/sadtalker_logo.png +0 -0
  50. docs/using_ref_video.gif +3 -0
.gitattributes CHANGED
@@ -33,3 +33,26 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ docs/example_crop.gif filter=lfs diff=lfs merge=lfs -text
37
+ docs/example_crop_still.gif filter=lfs diff=lfs merge=lfs -text
38
+ docs/example_full.gif filter=lfs diff=lfs merge=lfs -text
39
+ docs/example_full_enhanced.gif filter=lfs diff=lfs merge=lfs -text
40
+ docs/free_view_result.gif filter=lfs diff=lfs merge=lfs -text
41
+ docs/resize_good.gif filter=lfs diff=lfs merge=lfs -text
42
+ docs/resize_no.gif filter=lfs diff=lfs merge=lfs -text
43
+ docs/using_ref_video.gif filter=lfs diff=lfs merge=lfs -text
44
+ examples/driven_audio/chinese_news.wav filter=lfs diff=lfs merge=lfs -text
45
+ examples/driven_audio/deyu.wav filter=lfs diff=lfs merge=lfs -text
46
+ examples/driven_audio/eluosi.wav filter=lfs diff=lfs merge=lfs -text
47
+ examples/driven_audio/fayu.wav filter=lfs diff=lfs merge=lfs -text
48
+ examples/driven_audio/imagine.wav filter=lfs diff=lfs merge=lfs -text
49
+ examples/driven_audio/japanese.wav filter=lfs diff=lfs merge=lfs -text
50
+ examples/ref_video/WDA_AlexandriaOcasioCortez_000.mp4 filter=lfs diff=lfs merge=lfs -text
51
+ examples/ref_video/WDA_KatieHill_000.mp4 filter=lfs diff=lfs merge=lfs -text
52
+ examples/source_image/art_16.png filter=lfs diff=lfs merge=lfs -text
53
+ examples/source_image/art_17.png filter=lfs diff=lfs merge=lfs -text
54
+ examples/source_image/art_3.png filter=lfs diff=lfs merge=lfs -text
55
+ examples/source_image/art_4.png filter=lfs diff=lfs merge=lfs -text
56
+ examples/source_image/art_5.png filter=lfs diff=lfs merge=lfs -text
57
+ examples/source_image/art_8.png filter=lfs diff=lfs merge=lfs -text
58
+ examples/source_image/art_9.png filter=lfs diff=lfs merge=lfs -text
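For context, `filter=lfs` lines like the ones added above are exactly what `git lfs track` writes into `.gitattributes`; a minimal sketch of registering one more pattern locally (the `*.mp4` pattern below is purely illustrative and not part of this commit) would be:

```bash
# Illustrative only: register an extra pattern with Git LFS.
git lfs install            # one-time setup of the LFS hooks
git lfs track "*.mp4"      # appends a filter=lfs line to .gitattributes
git add .gitattributes     # stage the updated attributes alongside the tracked files
```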
.gitignore ADDED
@@ -0,0 +1,174 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py,cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ #Pipfile.lock
96
+
97
+ # poetry
98
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102
+ #poetry.lock
103
+
104
+ # pdm
105
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106
+ #pdm.lock
107
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108
+ # in version control.
109
+ # https://pdm.fming.dev/#use-with-ide
110
+ .pdm.toml
111
+
112
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
113
+ __pypackages__/
114
+
115
+ # Celery stuff
116
+ celerybeat-schedule
117
+ celerybeat.pid
118
+
119
+ # SageMath parsed files
120
+ *.sage.py
121
+
122
+ # Environments
123
+ .env
124
+ .venv
125
+ env/
126
+ venv/
127
+ ENV/
128
+ env.bak/
129
+ venv.bak/
130
+
131
+ # Spyder project settings
132
+ .spyderproject
133
+ .spyproject
134
+
135
+ # Rope project settings
136
+ .ropeproject
137
+
138
+ # mkdocs documentation
139
+ /site
140
+
141
+ # mypy
142
+ .mypy_cache/
143
+ .dmypy.json
144
+ dmypy.json
145
+
146
+ # Pyre type checker
147
+ .pyre/
148
+
149
+ # pytype static type analyzer
150
+ .pytype/
151
+
152
+ # Cython debug symbols
153
+ cython_debug/
154
+
155
+ # PyCharm
156
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
157
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
158
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
159
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
160
+ .idea/
161
+
162
+ examples/results/*
163
+ gfpgan/*
164
+ checkpoints/*
165
+ assets/*
166
+ results/*
167
+ Dockerfile
168
+ start_docker.sh
169
+ start.sh
170
+
171
+ checkpoints
172
+
173
+ # Mac
174
+ .DS_Store
.ipynb_checkpoints/Untitled-checkpoint.ipynb ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "cells": [],
3
+ "metadata": {},
4
+ "nbformat": 4,
5
+ "nbformat_minor": 5
6
+ }
.ipynb_checkpoints/app-checkpoint.py ADDED
@@ -0,0 +1,608 @@
1
+ import os, sys
2
+ import tempfile
3
+ import gradio as gr
4
+ from src.gradio_demo import SadTalker
5
+ # from src.utils.text2speech import TTSTalker
6
+ from huggingface_hub import snapshot_download
7
+
8
+ import torch
9
+ import librosa
10
+ from scipy.io.wavfile import write
11
+ from transformers import WavLMModel
12
+
13
+ import utils
14
+ from models import SynthesizerTrn
15
+ from mel_processing import mel_spectrogram_torch
16
+ from speaker_encoder.voice_encoder import SpeakerEncoder
17
+
18
+ import time
19
+ from textwrap import dedent
20
+
21
+ import mdtex2html
22
+ from loguru import logger
23
+ from transformers import AutoModel, AutoTokenizer
24
+
25
+ from tts_voice import tts_order_voice
26
+ import edge_tts
27
+ import tempfile
28
+ import anyio
29
+
30
+
31
+ def get_source_image(image):
32
+ return image
33
+
34
+ try:
35
+ import webui # in webui
36
+ in_webui = True
37
+ except:
38
+ in_webui = False
39
+
40
+
41
+ def toggle_audio_file(choice):
42
+ if choice == False:
43
+ return gr.update(visible=True), gr.update(visible=False)
44
+ else:
45
+ return gr.update(visible=False), gr.update(visible=True)
46
+
47
+ def ref_video_fn(path_of_ref_video):
48
+ if path_of_ref_video is not None:
49
+ return gr.update(value=True)
50
+ else:
51
+ return gr.update(value=False)
52
+
53
+ def download_model():
54
+ REPO_ID = 'vinthony/SadTalker-V002rc'
55
+ snapshot_download(repo_id=REPO_ID, local_dir='./checkpoints', local_dir_use_symlinks=True)
56
+
57
+ def sadtalker_demo():
58
+
59
+ download_model()
60
+
61
+ sad_talker = SadTalker(lazy_load=True)
62
+ # tts_talker = TTSTalker()
63
+
64
+ download_model()
65
+ sad_talker = SadTalker(lazy_load=True)
66
+
67
+
68
+ # ChatGLM2 & FreeVC
69
+
70
+ '''
71
+ def get_wavlm():
72
+ os.system('gdown https://drive.google.com/uc?id=12-cB34qCTvByWT-QtOcZaqwwO21FLSqU')
73
+ shutil.move('WavLM-Large.pt', 'wavlm')
74
+ '''
75
+
76
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
77
+
78
+ smodel = SpeakerEncoder('speaker_encoder/ckpt/pretrained_bak_5805000.pt')
79
+
80
+ print("Loading FreeVC(24k)...")
81
+ hps = utils.get_hparams_from_file("configs/freevc-24.json")
82
+ freevc_24 = SynthesizerTrn(
83
+ hps.data.filter_length // 2 + 1,
84
+ hps.train.segment_size // hps.data.hop_length,
85
+ **hps.model).to(device)
86
+ _ = freevc_24.eval()
87
+ _ = utils.load_checkpoint("checkpoint/freevc-24.pth", freevc_24, None)
88
+
89
+ print("Loading WavLM for content...")
90
+ cmodel = WavLMModel.from_pretrained("microsoft/wavlm-large").to(device)
91
+
92
+ def convert(model, src, tgt):
93
+ with torch.no_grad():
94
+ # tgt
95
+ wav_tgt, _ = librosa.load(tgt, sr=hps.data.sampling_rate)
96
+ wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)
97
+ if model == "FreeVC" or model == "FreeVC (24kHz)":
98
+ g_tgt = smodel.embed_utterance(wav_tgt)
99
+ g_tgt = torch.from_numpy(g_tgt).unsqueeze(0).to(device)
100
+ else:
101
+ wav_tgt = torch.from_numpy(wav_tgt).unsqueeze(0).to(device)
102
+ mel_tgt = mel_spectrogram_torch(
103
+ wav_tgt,
104
+ hps.data.filter_length,
105
+ hps.data.n_mel_channels,
106
+ hps.data.sampling_rate,
107
+ hps.data.hop_length,
108
+ hps.data.win_length,
109
+ hps.data.mel_fmin,
110
+ hps.data.mel_fmax
111
+ )
112
+ # src
113
+ wav_src, _ = librosa.load(src, sr=hps.data.sampling_rate)
114
+ wav_src = torch.from_numpy(wav_src).unsqueeze(0).to(device)
115
+ c = cmodel(wav_src).last_hidden_state.transpose(1, 2).to(device)
116
+ # infer
117
+ if model == "FreeVC":
118
+ audio = freevc.infer(c, g=g_tgt)
119
+ elif model == "FreeVC-s":
120
+ audio = freevc_s.infer(c, mel=mel_tgt)
121
+ else:
122
+ audio = freevc_24.infer(c, g=g_tgt)
123
+ audio = audio[0][0].data.cpu().float().numpy()
124
+ if model == "FreeVC" or model == "FreeVC-s":
125
+ write("out.wav", hps.data.sampling_rate, audio)
126
+ else:
127
+ write("out.wav", 24000, audio)
128
+ out = "out.wav"
129
+ return out
130
+
131
+ # GLM2
132
+
133
+ language_dict = tts_order_voice
134
+
135
+ # fix timezone in Linux
136
+ os.environ["TZ"] = "Asia/Shanghai"
137
+ try:
138
+ time.tzset() # type: ignore # pylint: disable=no-member
139
+ except Exception:
140
+ # Windows
141
+ logger.warning("Windows, cant run time.tzset()")
142
+
143
+ # model_name = "THUDM/chatglm2-6b"
144
+ model_name = "THUDM/chatglm2-6b-int4"
145
+
146
+ RETRY_FLAG = False
147
+
148
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
149
+
150
+ # model = AutoModel.from_pretrained(model_name, trust_remote_code=True).cuda()
151
+
152
+ # 4/8 bit
153
+ # model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
154
+
155
+ has_cuda = torch.cuda.is_available()
156
+
157
+ # has_cuda = False # force cpu
158
+
159
+ if has_cuda:
160
+ model_glm = (
161
+ AutoModel.from_pretrained(model_name, trust_remote_code=True).cuda().half()
162
+ ) # 3.92G
163
+ else:
164
+ model_glm = AutoModel.from_pretrained(
165
+ model_name, trust_remote_code=True
166
+ ).float() # .float() .half().float()
167
+
168
+ model_glm = model_glm.eval()
169
+
170
+ _ = """Override Chatbot.postprocess"""
171
+
172
+
173
+ def postprocess(self, y):
174
+ if y is None:
175
+ return []
176
+ for i, (message, response) in enumerate(y):
177
+ y[i] = (
178
+ None if message is None else mdtex2html.convert((message)),
179
+ None if response is None else mdtex2html.convert(response),
180
+ )
181
+ return y
182
+
183
+
184
+ gr.Chatbot.postprocess = postprocess
185
+
186
+
187
+ def parse_text(text):
188
+ """copy from https://github.com/GaiZhenbiao/ChuanhuChatGPT/"""
189
+ lines = text.split("\n")
190
+ lines = [line for line in lines if line != ""]
191
+ count = 0
192
+ for i, line in enumerate(lines):
193
+ if "```" in line:
194
+ count += 1
195
+ items = line.split("`")
196
+ if count % 2 == 1:
197
+ lines[i] = f'<pre><code class="language-{items[-1]}">'
198
+ else:
199
+ lines[i] = "<br></code></pre>"
200
+ else:
201
+ if i > 0:
202
+ if count % 2 == 1:
203
+ line = line.replace("`", r"\`")
204
+ line = line.replace("<", "&lt;")
205
+ line = line.replace(">", "&gt;")
206
+ line = line.replace(" ", "&nbsp;")
207
+ line = line.replace("*", "&ast;")
208
+ line = line.replace("_", "&lowbar;")
209
+ line = line.replace("-", "&#45;")
210
+ line = line.replace(".", "&#46;")
211
+ line = line.replace("!", "&#33;")
212
+ line = line.replace("(", "&#40;")
213
+ line = line.replace(")", "&#41;")
214
+ line = line.replace("$", "&#36;")
215
+ lines[i] = "<br>" + line
216
+ text = "".join(lines)
217
+ return text
218
+
219
+
220
+ def predict(
221
+ RETRY_FLAG, input, chatbot, max_length, top_p, temperature, history, past_key_values
222
+ ):
223
+ try:
224
+ chatbot.append((parse_text(input), ""))
225
+ except Exception as exc:
226
+ logger.error(exc)
227
+ logger.debug(f"{chatbot=}")
228
+ _ = """
229
+ if chatbot:
230
+ chatbot[-1] = (parse_text(input), str(exc))
231
+ yield chatbot, history, past_key_values
232
+ # """
233
+ yield chatbot, history, past_key_values
234
+
235
+ for response, history, past_key_values in model_glm.stream_chat(
236
+ tokenizer,
237
+ input,
238
+ history,
239
+ past_key_values=past_key_values,
240
+ return_past_key_values=True,
241
+ max_length=max_length,
242
+ top_p=top_p,
243
+ temperature=temperature,
244
+ ):
245
+ chatbot[-1] = (parse_text(input), parse_text(response))
246
+ # chatbot[-1][-1] = parse_text(response)
247
+
248
+ yield chatbot, history, past_key_values, parse_text(response)
249
+
250
+
251
+ def trans_api(input, max_length=4096, top_p=0.8, temperature=0.2):
252
+ if max_length < 10:
253
+ max_length = 4096
254
+ if top_p < 0.1 or top_p > 1:
255
+ top_p = 0.85
256
+ if temperature <= 0 or temperature > 1:
257
+ temperature = 0.01
258
+ try:
259
+ res, _ = model_glm.chat(
260
+ tokenizer,
261
+ input,
262
+ history=[],
263
+ past_key_values=None,
264
+ max_length=max_length,
265
+ top_p=top_p,
266
+ temperature=temperature,
267
+ )
268
+ # logger.debug(f"{res=} \n{_=}")
269
+ except Exception as exc:
270
+ logger.error(f"{exc=}")
271
+ res = str(exc)
272
+
273
+ return res
274
+
275
+
276
+ def reset_user_input():
277
+ return gr.update(value="")
278
+
279
+
280
+ def reset_state():
281
+ return [], [], None, ""
282
+
283
+
284
+ # Delete last turn
285
+ def delete_last_turn(chat, history):
286
+ if chat and history:
287
+ chat.pop(-1)
288
+ history.pop(-1)
289
+ return chat, history
290
+
291
+
292
+ # Regenerate response
293
+ def retry_last_answer(
294
+ user_input, chatbot, max_length, top_p, temperature, history, past_key_values
295
+ ):
296
+ if chatbot and history:
297
+ # Removing the previous conversation from chat
298
+ chatbot.pop(-1)
299
+ # Setting up a flag to capture a retry
300
+ RETRY_FLAG = True
301
+ # Getting last message from user
302
+ user_input = history[-1][0]
303
+ # Removing bot response from the history
304
+ history.pop(-1)
305
+
306
+ yield from predict(
307
+ RETRY_FLAG, # type: ignore
308
+ user_input,
309
+ chatbot,
310
+ max_length,
311
+ top_p,
312
+ temperature,
313
+ history,
314
+ past_key_values,
315
+ )
316
+
317
+ # print
318
+
319
+ def print(text):
320
+ return text
321
+
322
+ # TTS
323
+
324
+ async def text_to_speech_edge(text, language_code):
325
+ voice = language_dict[language_code]
326
+ communicate = edge_tts.Communicate(text, voice)
327
+ with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp_file:
328
+ tmp_path = tmp_file.name
329
+
330
+ await communicate.save(tmp_path)
331
+
332
+ return tmp_path
333
+
334
+
335
+ with gr.Blocks(title="ChatGLM2-6B-int4", theme=gr.themes.Soft(text_size="sm"), analytics_enabled=False) as demo:
336
+ gr.HTML("<center>"
337
+ "<h1>📺💕🎶 - ChatGLM2+声音克隆+视频对话:和喜欢的角色畅所欲言吧!</h1>"
338
+ "</center>")
339
+ gr.Markdown("## <center>🥳 - ChatGLM2+FreeVC+SadTalker,为您打造沉浸式的视频对话体验,支持中英双语</center>")
340
+ gr.Markdown("## <center>🌊 - 更多精彩应用,尽在[滔滔AI](http://www.talktalkai.com);滔滔AI,为爱滔滔!💕</center>")
341
+ gr.Markdown("### <center>⭐ - 如果您喜欢这个程序,欢迎给我的[GitHub项目](https://github.com/KevinWang676/ChatGLM2-Voice-Cloning)点赞支持!</center>")
342
+
343
+ with gr.Tab("🍻 - ChatGLM2聊天区"):
344
+ with gr.Accordion("📒 相关信息", open=False):
345
+ _ = f""" ChatGLM2的可选参数信息:
346
+ * Low temperature: responses will be more deterministic and focused; High temperature: responses more creative.
347
+ * Suggested temperatures -- translation: up to 0.3; chatting: > 0.4
348
+ * Top P controls dynamic vocabulary selection based on context.\n
349
+ 如果您想让ChatGLM2进行角色扮演并与之对话,请先输入恰当的提示词,如“请你扮演成动漫角色蜡笔小新并和我进行对话”;您也可以为ChatGLM2提供自定义的角色设定\n
350
+ 当您使用声音克隆功能时,请先在此程序的对应位置上传一段您喜欢的音频
351
+ """
352
+ gr.Markdown(dedent(_))
353
+ chatbot = gr.Chatbot(height=300)
354
+ with gr.Row():
355
+ with gr.Column(scale=4):
356
+ with gr.Column(scale=12):
357
+ user_input = gr.Textbox(
358
+ label="请在此处和GLM2聊天 (按回车键即可发送)",
359
+ placeholder="聊点什么吧",
360
+ )
361
+ RETRY_FLAG = gr.Checkbox(value=False, visible=False)
362
+ with gr.Column(min_width=32, scale=1):
363
+ with gr.Row():
364
+ submitBtn = gr.Button("开始和GLM2交流吧", variant="primary")
365
+ deleteBtn = gr.Button("删除最新一轮对话", variant="secondary")
366
+ retryBtn = gr.Button("重新生成最新一轮对话", variant="secondary")
367
+
368
+ with gr.Accordion("🔧 更多设置", open=False):
369
+ with gr.Row():
370
+ emptyBtn = gr.Button("清空所有聊天记录")
371
+ max_length = gr.Slider(
372
+ 0,
373
+ 32768,
374
+ value=8192,
375
+ step=1.0,
376
+ label="Maximum length",
377
+ interactive=True,
378
+ )
379
+ top_p = gr.Slider(
380
+ 0, 1, value=0.85, step=0.01, label="Top P", interactive=True
381
+ )
382
+ temperature = gr.Slider(
383
+ 0.01, 1, value=0.95, step=0.01, label="Temperature", interactive=True
384
+ )
385
+
386
+
387
+ with gr.Row():
388
+ test1 = gr.Textbox(label="GLM2的最新回答 (可编辑)", lines = 3)
389
+ with gr.Column():
390
+ language = gr.Dropdown(choices=list(language_dict.keys()), value="普通话 (中国大陆)-Xiaoxiao-女", label="请选择文本对应的语言及您喜欢的说话人")
391
+ tts_btn = gr.Button("生成对应的音频吧", variant="primary")
392
+ output_audio = gr.Audio(type="filepath", label="为您生成的音频", interactive=False)
393
+
394
+ tts_btn.click(text_to_speech_edge, inputs=[test1, language], outputs=[output_audio])
395
+
396
+ with gr.Row():
397
+ model_choice = gr.Dropdown(choices=["FreeVC", "FreeVC-s", "FreeVC (24kHz)"], value="FreeVC (24kHz)", label="Model", visible=False)
398
+ audio1 = output_audio
399
+ audio2 = gr.Audio(label="请上传您喜欢的声音进行声音克隆", type='filepath')
400
+ clone_btn = gr.Button("开始AI声音克隆吧", variant="primary")
401
+ audio_cloned = gr.Audio(label="为您生成的专属声音克隆音频", type='filepath')
402
+
403
+ clone_btn.click(convert, inputs=[model_choice, audio1, audio2], outputs=[audio_cloned])
404
+
405
+ history = gr.State([])
406
+ past_key_values = gr.State(None)
407
+
408
+ user_input.submit(
409
+ predict,
410
+ [
411
+ RETRY_FLAG,
412
+ user_input,
413
+ chatbot,
414
+ max_length,
415
+ top_p,
416
+ temperature,
417
+ history,
418
+ past_key_values,
419
+ ],
420
+ [chatbot, history, past_key_values, test1],
421
+ show_progress="full",
422
+ )
423
+ submitBtn.click(
424
+ predict,
425
+ [
426
+ RETRY_FLAG,
427
+ user_input,
428
+ chatbot,
429
+ max_length,
430
+ top_p,
431
+ temperature,
432
+ history,
433
+ past_key_values,
434
+ ],
435
+ [chatbot, history, past_key_values, test1],
436
+ show_progress="full",
437
+ api_name="predict",
438
+ )
439
+ submitBtn.click(reset_user_input, [], [user_input])
440
+
441
+ emptyBtn.click(
442
+ reset_state, outputs=[chatbot, history, past_key_values, test1], show_progress="full"
443
+ )
444
+
445
+ retryBtn.click(
446
+ retry_last_answer,
447
+ inputs=[
448
+ user_input,
449
+ chatbot,
450
+ max_length,
451
+ top_p,
452
+ temperature,
453
+ history,
454
+ past_key_values,
455
+ ],
456
+ # outputs = [chatbot, history, last_user_message, user_message]
457
+ outputs=[chatbot, history, past_key_values, test1],
458
+ )
459
+ deleteBtn.click(delete_last_turn, [chatbot, history], [chatbot, history])
460
+
461
+ with gr.Accordion("📔 提示词示例", open=False):
462
+ etext = """In America, where cars are an important part of the national psyche, a decade ago people had suddenly started to drive less, which had not happened since the oil shocks of the 1970s. """
463
+ examples = gr.Examples(
464
+ examples=[
465
+ ["Explain the plot of Cinderella in a sentence."],
466
+ [
467
+ "How long does it take to become proficient in French, and what are the best methods for retaining information?"
468
+ ],
469
+ ["What are some common mistakes to avoid when writing code?"],
470
+ ["Build a prompt to generate a beautiful portrait of a horse"],
471
+ ["Suggest four metaphors to describe the benefits of AI"],
472
+ ["Write a pop song about leaving home for the sandy beaches."],
473
+ ["Write a summary demonstrating my ability to tame lions"],
474
+ ["鲁迅和周树人什么关系"],
475
+ ["从前有一头牛,这头牛后面有什么?"],
476
+ ["正无穷大加一大于正无穷大吗?"],
477
+ ["正无穷大加正无穷大大于正无穷大吗?"],
478
+ ["-2的平方根等于什么"],
479
+ ["树上有5只鸟,猎人开枪打死了一只。树上还有几只鸟?"],
480
+ ["树上有11只鸟,猎人开枪打死了一只。树上还有几只鸟?提示:需考虑鸟可能受惊吓飞走。"],
481
+ ["鲁迅和周树人什么关系 用英文回答"],
482
+ ["以红楼梦的行文风格写一张委婉的请假条。不少于320字。"],
483
+ [f"{etext} 翻成中文,列出3个版本"],
484
+ [f"{etext} \n 翻成中文,保留原意,但使用文学性的语言。不要写解释。列出3个版本"],
485
+ ["js 判断一个数是不是质数"],
486
+ ["js 实现python 的 range(10)"],
487
+ ["js 实现python 的 [*(range(10)]"],
488
+ ["假定 1 + 2 = 4, 试求 7 + 8"],
489
+ ["Erkläre die Handlung von Cinderella in einem Satz."],
490
+ ["Erkläre die Handlung von Cinderella in einem Satz. Auf Deutsch"],
491
+ ],
492
+ inputs=[user_input],
493
+ examples_per_page=30,
494
+ )
495
+
496
+ with gr.Accordion("For Chat/Translation API", open=False, visible=False):
497
+ input_text = gr.Text()
498
+ tr_btn = gr.Button("Go", variant="primary")
499
+ out_text = gr.Text()
500
+ tr_btn.click(
501
+ trans_api,
502
+ [input_text, max_length, top_p, temperature],
503
+ out_text,
504
+ # show_progress="full",
505
+ api_name="tr",
506
+ )
507
+ _ = """
508
+ input_text.submit(
509
+ trans_api,
510
+ [input_text, max_length, top_p, temperature],
511
+ out_text,
512
+ show_progress="full",
513
+ api_name="tr1",
514
+ )
515
+ # """
516
+ with gr.Tab("📺 - 视频聊天区"):
517
+ with gr.Row().style(equal_height=False):
518
+ with gr.Column(variant='panel'):
519
+ with gr.Tabs(elem_id="sadtalker_source_image"):
520
+ with gr.TabItem('图片上传'):
521
+ with gr.Row():
522
+ source_image = gr.Image(label="请上传一张您喜欢角色的图片", source="upload", type="filepath", elem_id="img2img_image").style(width=512)
523
+
524
+
525
+ with gr.Tabs(elem_id="sadtalker_driven_audio"):
526
+ with gr.TabItem('💡您还可以将视频下载到本地'):
527
+
528
+ with gr.Row():
529
+ driven_audio = audio_cloned
530
+ driven_audio_no = gr.Audio(label="Use IDLE mode, no audio is required", source="upload", type="filepath", visible=False)
531
+
532
+ with gr.Column():
533
+ use_idle_mode = gr.Checkbox(label="Use Idle Animation", visible=False)
534
+ length_of_audio = gr.Number(value=5, label="The length(seconds) of the generated video.", visible=False)
535
+ use_idle_mode.change(toggle_audio_file, inputs=use_idle_mode, outputs=[driven_audio, driven_audio_no]) # todo
536
+
537
+ with gr.Row():
538
+ ref_video = gr.Video(label="Reference Video", source="upload", type="filepath", elem_id="vidref", visible=False).style(width=512)
539
+
540
+ with gr.Column():
541
+ use_ref_video = gr.Checkbox(label="Use Reference Video", visible=False)
542
+ ref_info = gr.Radio(['pose', 'blink','pose+blink', 'all'], value='pose', label='Reference Video',info="How to borrow from reference Video?((fully transfer, aka, video driving mode))", visible=False)
543
+
544
+ ref_video.change(ref_video_fn, inputs=ref_video, outputs=[use_ref_video]) # todo
545
+
546
+
547
+ with gr.Column(variant='panel'):
548
+ with gr.Tabs(elem_id="sadtalker_checkbox"):
549
+ with gr.TabItem('视频设置'):
550
+ with gr.Column(variant='panel'):
551
+ # width = gr.Slider(minimum=64, elem_id="img2img_width", maximum=2048, step=8, label="Manually Crop Width", value=512) # img2img_width
552
+ # height = gr.Slider(minimum=64, elem_id="img2img_height", maximum=2048, step=8, label="Manually Crop Height", value=512) # img2img_width
553
+ with gr.Row():
554
+ pose_style = gr.Slider(minimum=0, maximum=45, step=1, label="Pose style", value=0, visible=False) #
555
+ exp_weight = gr.Slider(minimum=0, maximum=3, step=0.1, label="expression scale", value=1, visible=False) #
556
+ blink_every = gr.Checkbox(label="use eye blink", value=True, visible=False)
557
+
558
+ with gr.Row():
559
+ size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?", visible=False) #
560
+ preprocess_type = gr.Radio(['crop', 'full'], value='crop', label='是否聚焦角色面部', info="crop:视频会聚焦角色面部;full:视频会显示图片全貌")
561
+
562
+ with gr.Row():
563
+ is_still_mode = gr.Checkbox(label="静态模式 (开启静态模式,角色的面部动作会减少;默认开启)", value=True)
564
+ facerender = gr.Radio(['facevid2vid','pirender'], value='facevid2vid', label='facerender', info="which face render?", visible=False)
565
+
566
+ with gr.Row():
567
+ batch_size = gr.Slider(label="Batch size (数值越大,生成速度越快;若显卡性能好,可增大数值)", step=1, maximum=32, value=2)
568
+ enhancer = gr.Checkbox(label="GFPGAN as Face enhancer", value=True, visible=False)
569
+
570
+ submit = gr.Button('开始视频聊天吧', elem_id="sadtalker_generate", variant='primary')
571
+
572
+ with gr.Tabs(elem_id="sadtalker_genearted"):
573
+ gen_video = gr.Video(label="为您生成的专属视频", format="mp4").style(width=256)
574
+
575
+
576
+
577
+ submit.click(
578
+ fn=sad_talker.test,
579
+ inputs=[source_image,
580
+ driven_audio,
581
+ preprocess_type,
582
+ is_still_mode,
583
+ enhancer,
584
+ batch_size,
585
+ size_of_image,
586
+ pose_style,
587
+ facerender,
588
+ exp_weight,
589
+ use_ref_video,
590
+ ref_video,
591
+ ref_info,
592
+ use_idle_mode,
593
+ length_of_audio,
594
+ blink_every
595
+ ],
596
+ outputs=[gen_video]
597
+ )
598
+ gr.Markdown("### <center>注意❗:请不要生成会对个人以及组织造成侵害的内容,此程序仅供科研、学习及个人娱乐使用。</center>")
599
+ gr.Markdown("<center>💡- 如何使用此程序:输入您对ChatGLM的提问后,依次点击“开始和GLM2交流吧”、“生成对应的音频吧”、“开始AI声音克隆吧”、“开始视频聊天吧”三个按键即可;使用声音克隆功能时,请先上传一段您喜欢的音频</center>")
600
+ gr.HTML('''
601
+ <div class="footer">
602
+ <p>🌊🏞️🎶 - 江水东流急,滔滔无尽声。 明·顾璘
603
+ </p>
604
+ </div>
605
+ ''')
606
+
607
+
608
+ demo.queue().launch(show_error=True, debug=True)
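Taken together, the checkpoint above is a module-level Gradio app: importing it downloads the SadTalker checkpoints, loads FreeVC, WavLM, and ChatGLM2-6B-int4, and then launches the Blocks UI. A minimal local launch, sketched here under the assumption that the Python requirements and the system packages from packages.txt (ffmpeg, libsndfile1) are installed and that a CUDA GPU is available, is simply:

```bash
# Sketch: install dependencies, then start the combined ChatGLM2 + FreeVC + SadTalker demo.
pip install -r requirements.txt
python app.py    # app.py is the Space's entry file; this checkpoint mirrors it
```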
.ipynb_checkpoints/app_sadtalker-checkpoint.py ADDED
@@ -0,0 +1,111 @@
1
+ import os, sys
2
+ import gradio as gr
3
+ from src.gradio_demo import SadTalker
4
+
5
+
6
+ try:
7
+ import webui # in webui
8
+ in_webui = True
9
+ except:
10
+ in_webui = False
11
+
12
+
13
+ def toggle_audio_file(choice):
14
+ if choice == False:
15
+ return gr.update(visible=True), gr.update(visible=False)
16
+ else:
17
+ return gr.update(visible=False), gr.update(visible=True)
18
+
19
+ def ref_video_fn(path_of_ref_video):
20
+ if path_of_ref_video is not None:
21
+ return gr.update(value=True)
22
+ else:
23
+ return gr.update(value=False)
24
+
25
+ def sadtalker_demo(checkpoint_path='checkpoints', config_path='src/config', warpfn=None):
26
+
27
+ sad_talker = SadTalker(checkpoint_path, config_path, lazy_load=True)
28
+
29
+ with gr.Blocks(analytics_enabled=False) as sadtalker_interface:
30
+ gr.Markdown("<div align='center'> <h2> 😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023) </span> </h2> \
31
+ <a style='font-size:18px;color: #efefef' href='https://arxiv.org/abs/2211.12194'>Arxiv</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
32
+ <a style='font-size:18px;color: #efefef' href='https://sadtalker.github.io'>Homepage</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
33
+ <a style='font-size:18px;color: #efefef' href='https://github.com/Winfredy/SadTalker'> Github </div>")
34
+
35
+ with gr.Row().style(equal_height=False):
36
+ with gr.Column(variant='panel'):
37
+ with gr.Tabs(elem_id="sadtalker_source_image"):
38
+ with gr.TabItem('Upload image'):
39
+ with gr.Row():
40
+ source_image = gr.Image(label="Source image", source="upload", type="filepath", elem_id="img2img_image").style(width=512)
41
+
42
+ with gr.Tabs(elem_id="sadtalker_driven_audio"):
43
+ with gr.TabItem('Upload OR TTS'):
44
+ with gr.Column(variant='panel'):
45
+ driven_audio = gr.Audio(label="Input audio", source="upload", type="filepath")
46
+
47
+ if sys.platform != 'win32' and not in_webui:
48
+ from src.utils.text2speech import TTSTalker
49
+ tts_talker = TTSTalker()
50
+ with gr.Column(variant='panel'):
51
+ input_text = gr.Textbox(label="Generating audio from text", lines=5, placeholder="please enter some text here, we generate the audio from text using @Coqui.ai TTS.")
52
+ tts = gr.Button('Generate audio',elem_id="sadtalker_audio_generate", variant='primary')
53
+ tts.click(fn=tts_talker.test, inputs=[input_text], outputs=[driven_audio])
54
+
55
+ with gr.Column(variant='panel'):
56
+ with gr.Tabs(elem_id="sadtalker_checkbox"):
57
+ with gr.TabItem('Settings'):
58
+ gr.Markdown("need help? please visit our [best practice page](https://github.com/OpenTalker/SadTalker/blob/main/docs/best_practice.md) for more detials")
59
+ with gr.Column(variant='panel'):
60
+ # width = gr.Slider(minimum=64, elem_id="img2img_width", maximum=2048, step=8, label="Manually Crop Width", value=512) # img2img_width
61
+ # height = gr.Slider(minimum=64, elem_id="img2img_height", maximum=2048, step=8, label="Manually Crop Height", value=512) # img2img_width
62
+ pose_style = gr.Slider(minimum=0, maximum=46, step=1, label="Pose style", value=0) #
63
+ size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?") #
64
+ preprocess_type = gr.Radio(['crop', 'resize','full', 'extcrop', 'extfull'], value='crop', label='preprocess', info="How to handle input image?")
65
+ is_still_mode = gr.Checkbox(label="Still Mode (fewer hand motion, works with preprocess `full`)")
66
+ batch_size = gr.Slider(label="batch size in generation", step=1, maximum=10, value=2)
67
+ enhancer = gr.Checkbox(label="GFPGAN as Face enhancer")
68
+ submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')
69
+
70
+ with gr.Tabs(elem_id="sadtalker_genearted"):
71
+ gen_video = gr.Video(label="Generated video", format="mp4").style(width=256)
72
+
73
+ if warpfn:
74
+ submit.click(
75
+ fn=warpfn(sad_talker.test),
76
+ inputs=[source_image,
77
+ driven_audio,
78
+ preprocess_type,
79
+ is_still_mode,
80
+ enhancer,
81
+ batch_size,
82
+ size_of_image,
83
+ pose_style
84
+ ],
85
+ outputs=[gen_video]
86
+ )
87
+ else:
88
+ submit.click(
89
+ fn=sad_talker.test,
90
+ inputs=[source_image,
91
+ driven_audio,
92
+ preprocess_type,
93
+ is_still_mode,
94
+ enhancer,
95
+ batch_size,
96
+ size_of_image,
97
+ pose_style
98
+ ],
99
+ outputs=[gen_video]
100
+ )
101
+
102
+ return sadtalker_interface
103
+
104
+
105
+ if __name__ == "__main__":
106
+
107
+ demo = sadtalker_demo()
108
+ demo.queue()
109
+ demo.launch(share=True)
110
+
111
+
ChatGLM2-SadTalker-VC-1/.flake8 ADDED
@@ -0,0 +1,21 @@
1
+ [flake8]
2
+ ignore =
3
+ # E203 whitespace before ':'
4
+ E203
5
+ D203,
6
+ # line too long
7
+ E501
8
+ per-file-ignores =
9
+ # imported but unused
10
+ # __init__.py: F401
11
+ test_*.py: F401
12
+ exclude =
13
+ .git,
14
+ __pycache__,
15
+ docs/source/conf.py,
16
+ old,
17
+ build,
18
+ dist,
19
+ .venv
20
+ pad*.py
21
+ max-complexity = 25
ChatGLM2-SadTalker-VC-1/.gitattributes ADDED
@@ -0,0 +1,52 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tflite filter=lfs diff=lfs merge=lfs -text
29
+ *.tgz filter=lfs diff=lfs merge=lfs -text
30
+ *.wasm filter=lfs diff=lfs merge=lfs -text
31
+ *.xz filter=lfs diff=lfs merge=lfs -text
32
+ *.zip filter=lfs diff=lfs merge=lfs -text
33
+ *.zst filter=lfs diff=lfs merge=lfs -text
34
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
35
+ checkpoints/BFM_Fitting/01_MorphableModel.mat filter=lfs diff=lfs merge=lfs -text
36
+ checkpoints/BFM_Fitting/BFM09_model_info.mat filter=lfs diff=lfs merge=lfs -text
37
+ checkpoints/facevid2vid_00189-model.pth.tar filter=lfs diff=lfs merge=lfs -text
38
+ checkpoints/mapping_00229-model.pth.tar filter=lfs diff=lfs merge=lfs -text
39
+ checkpoints/shape_predictor_68_face_landmarks.dat filter=lfs diff=lfs merge=lfs -text
40
+ examples/driven_audio/chinese_news.wav filter=lfs diff=lfs merge=lfs -text
41
+ examples/driven_audio/deyu.wav filter=lfs diff=lfs merge=lfs -text
42
+ examples/driven_audio/eluosi.wav filter=lfs diff=lfs merge=lfs -text
43
+ examples/driven_audio/fayu.wav filter=lfs diff=lfs merge=lfs -text
44
+ examples/driven_audio/imagine.wav filter=lfs diff=lfs merge=lfs -text
45
+ examples/driven_audio/japanese.wav filter=lfs diff=lfs merge=lfs -text
46
+ examples/source_image/art_16.png filter=lfs diff=lfs merge=lfs -text
47
+ examples/source_image/art_17.png filter=lfs diff=lfs merge=lfs -text
48
+ examples/source_image/art_3.png filter=lfs diff=lfs merge=lfs -text
49
+ examples/source_image/art_4.png filter=lfs diff=lfs merge=lfs -text
50
+ examples/source_image/art_5.png filter=lfs diff=lfs merge=lfs -text
51
+ examples/source_image/art_8.png filter=lfs diff=lfs merge=lfs -text
52
+ examples/source_image/art_9.png filter=lfs diff=lfs merge=lfs -text
ChatGLM2-SadTalker-VC-1/.gitignore ADDED
@@ -0,0 +1,159 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py,cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ #Pipfile.lock
96
+
97
+ # poetry
98
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102
+ #poetry.lock
103
+
104
+ # pdm
105
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106
+ #pdm.lock
107
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108
+ # in version control.
109
+ # https://pdm.fming.dev/#use-with-ide
110
+ .pdm.toml
111
+
112
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
113
+ __pypackages__/
114
+
115
+ # Celery stuff
116
+ celerybeat-schedule
117
+ celerybeat.pid
118
+
119
+ # SageMath parsed files
120
+ *.sage.py
121
+
122
+ # Environments
123
+ .env
124
+ .venv
125
+ env/
126
+ venv/
127
+ ENV/
128
+ env.bak/
129
+ venv.bak/
130
+
131
+ # Spyder project settings
132
+ .spyderproject
133
+ .spyproject
134
+
135
+ # Rope project settings
136
+ .ropeproject
137
+
138
+ # mkdocs documentation
139
+ /site
140
+
141
+ # mypy
142
+ .mypy_cache/
143
+ .dmypy.json
144
+ dmypy.json
145
+
146
+ # Pyre type checker
147
+ .pyre/
148
+
149
+ # pytype static type analyzer
150
+ .pytype/
151
+
152
+ # Cython debug symbols
153
+ cython_debug/
154
+
155
+ results/
156
+ checkpoints/
157
+ gradio_cached_examples/
158
+ gfpgan/
159
+ start.sh
ChatGLM2-SadTalker-VC-1/.ipynb_checkpoints/requirements-checkpoint.txt ADDED
@@ -0,0 +1,12 @@
1
+ scipy
2
+ transformers
3
+ librosa==0.8.1
4
+ webrtcvad==2.0.10
5
+ protobuf
6
+ cpm_kernels
7
+ mdtex2html
8
+ sentencepiece
9
+ accelerate
10
+ loguru
11
+ edge_tts
12
+ altair
ChatGLM2-SadTalker-VC-1/Dockerfile ADDED
@@ -0,0 +1,59 @@
1
+ FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
2
+ ENV DEBIAN_FRONTEND=noninteractive
3
+ RUN apt-get update && \
4
+ apt-get upgrade -y && \
5
+ apt-get install -y --no-install-recommends \
6
+ git \
7
+ zip \
8
+ unzip \
9
+ git-lfs \
10
+ wget \
11
+ curl \
12
+ # ffmpeg \
13
+ ffmpeg \
14
+ x264 \
15
+ # python build dependencies \
16
+ build-essential \
17
+ libssl-dev \
18
+ zlib1g-dev \
19
+ libbz2-dev \
20
+ libreadline-dev \
21
+ libsqlite3-dev \
22
+ libncursesw5-dev \
23
+ xz-utils \
24
+ tk-dev \
25
+ libxml2-dev \
26
+ libxmlsec1-dev \
27
+ libffi-dev \
28
+ liblzma-dev && \
29
+ apt-get clean && \
30
+ rm -rf /var/lib/apt/lists/*
31
+
32
+ RUN useradd -m -u 1000 user
33
+ USER user
34
+ ENV HOME=/home/user \
35
+ PATH=/home/user/.local/bin:${PATH}
36
+ WORKDIR ${HOME}/app
37
+
38
+ RUN curl https://pyenv.run | bash
39
+ ENV PATH=${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${PATH}
40
+ ENV PYTHON_VERSION=3.10.9
41
+ RUN pyenv install ${PYTHON_VERSION} && \
42
+ pyenv global ${PYTHON_VERSION} && \
43
+ pyenv rehash && \
44
+ pip install --no-cache-dir -U pip setuptools wheel
45
+
46
+ RUN pip install --no-cache-dir -U torch==1.12.1 torchvision==0.13.1
47
+ COPY --chown=1000 requirements.txt /tmp/requirements.txt
48
+ RUN pip install --no-cache-dir -U -r /tmp/requirements.txt
49
+
50
+ COPY --chown=1000 . ${HOME}/app
51
+ RUN ls -a
52
+ ENV PYTHONPATH=${HOME}/app \
53
+ PYTHONUNBUFFERED=1 \
54
+ GRADIO_ALLOW_FLAGGING=never \
55
+ GRADIO_NUM_PORTS=1 \
56
+ GRADIO_SERVER_NAME=0.0.0.0 \
57
+ GRADIO_THEME=huggingface \
58
+ SYSTEM=spaces
59
+ CMD ["python", "app.py"]
ChatGLM2-SadTalker-VC-1/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2023 Tencent AI Lab
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
ChatGLM2-SadTalker-VC-1/README.md ADDED
@@ -0,0 +1,15 @@
1
+ ---
2
+ title: ChatGLM2-SadTalker
3
+ emoji: 📺
4
+ colorFrom: purple
5
+ colorTo: green
6
+ sdk: gradio
7
+ sdk_version: 3.23.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ duplicated_from: kevinwang676/ChatGLM2-SadTalker-VC
12
+ ---
13
+
14
+
15
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
ChatGLM2-SadTalker-VC-1/packages.txt ADDED
@@ -0,0 +1,2 @@
1
+ ffmpeg
2
+ libsndfile1
ChatGLM2-SadTalker-VC-1/requirements.txt ADDED
@@ -0,0 +1,12 @@
1
+ scipy
2
+ transformers
3
+ librosa==0.8.1
4
+ webrtcvad==2.0.10
5
+ protobuf
6
+ cpm_kernels
7
+ mdtex2html
8
+ sentencepiece
9
+ accelerate
10
+ loguru
11
+ edge_tts
12
+ altair
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2023 Tencent AI Lab
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,13 +1,268 @@
1
- ---
2
- title: ChatGLM2 SadTalker
3
- emoji: 📚
4
- colorFrom: blue
5
- colorTo: blue
6
- sdk: gradio
7
- sdk_version: 3.37.0
8
- app_file: app.py
9
- pinned: false
10
- license: mit
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ <div align="center">
2
+
3
+ <img src='https://user-images.githubusercontent.com/4397546/229094115-862c747e-7397-4b54-ba4a-bd368bfe2e0f.png' width='500px'/>
4
+
5
+
6
+ <!--<h2> 😭 SadTalker: <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->
7
+
8
+ <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp; <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp; [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker)
9
+
10
+ <div>
11
+ <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp;
12
+ <a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</a>&emsp;
13
+ <a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp;
14
+ <a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
15
+ <a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp; </br>
16
+ <a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo<sup>1</sup> </a>&emsp;
17
+ <a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup> </a>&emsp;
18
+ <a target='_blank'>Fei Wang <sup>1</sup> </a>&emsp;
19
+ </div>
20
+ <br>
21
+ <div>
22
+ <sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp;
23
+ </div>
24
+ <br>
25
+ <i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
26
+ <br>
27
+ <br>
28
+
29
+
30
+ ![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)
31
+
32
+ <b>TL;DR: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; single portrait image 🙎‍♂️ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; audio 🎤 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; talking head video 🎞.</b>
33
+
34
+ <br>
35
+
36
+ </div>
37
+
38
+
39
+
40
+ ## 🔥 Highlight
41
+
42
+ - 🔥 The extension of the [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Check out more details [here](docs/webui_extension.md).
43
+
44
+ https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4
45
+
46
+ - 🔥 `full image mode` is online! Check out [here](https://github.com/Winfredy/SadTalker#full-bodyimage-generation) for more details.
47
+
48
+ | still+enhancer in v0.0.1 | still + enhancer in v0.0.2 | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
49
+ |:--------------------: |:--------------------: | :----: |
50
+ | <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'>
51
+
52
+ - 🔥 Several new modes, e.g. `still mode`, `reference mode`, and `resize mode`, are online for better and more customized applications.
53
+
54
+ - 🔥 Happy to see more community demos at [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3
55
+ ), [Youtube](https://www.youtube.com/results?search_query=sadtalker&sp=CAM%253D) and [twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).
56
+
57
+ ## 📋 Changelog (the previous changelog can be found [here](docs/changlelog.md))
58
+
59
+ - __[2023.06.12]__: added more new features to the WebUI extension; see the discussion [here](https://github.com/OpenTalker/SadTalker/discussions/386).
60
+
61
+ - __[2023.06.05]__: released a new 512 beta face model, fixed some bugs, and improved performance.
62
+
63
+ - __[2023.04.15]__: Adding automatic1111 colab by @camenduru, thanks for this awesome colab: [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb).
64
+
65
+ - __[2023.04.12]__: added a more detailed sd-webui installation document and fixed a reinstallation problem.
66
+
67
+ - __[2023.04.12]__: fixed the sd-webui safety issues caused by third-party packages and optimized the output path in `sd-webui-extension`.
68
+
69
+ - __[2023.04.08]__: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.
70
+
71
+ - __[2023.04.08]__: v0.0.2: full image animation, added a Baidu Netdisk link for downloading checkpoints, and optimized the enhancer logic.
72
+
73
+
74
+ ## 🚧 TODO: See the Discussion https://github.com/OpenTalker/SadTalker/issues/280
75
+
76
+ ## If you have any problems, please check our [FAQ](docs/FAQ.md) before opening an issue.
77
+
78
+
79
+
80
+ ## ⚙️ 1. Installation.
81
+
82
+ Tutorials from communities: [中文windows教程](https://www.bilibili.com/video/BV1Dc411W7V6/) | [日本語コース](https://br-d.fanbox.cc/posts/5685086?utm_campaign=manage_post_page&utm_medium=share&utm_source=twitter)
83
+
84
+ ### Linux:
85
+
86
+ 1. Install [anaconda](https://www.anaconda.com/), Python, and Git.
87
+
88
+ 2. Create the env and install the requirements.
89
+ ```bash
90
+ git clone https://github.com/Winfredy/SadTalker.git
91
+
92
+ cd SadTalker
93
+
94
+ conda create -n sadtalker python=3.8
95
+
96
+ conda activate sadtalker
97
+
98
+ pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
99
+
100
+ conda install ffmpeg
101
+
102
+ pip install -r requirements.txt
103
+
104
+ ### tts is optional for gradio demo.
105
+ ### pip install TTS
106
+
107
+ ```
108
+ ### Windows ([中文windows教程](https://www.bilibili.com/video/BV1Dc411W7V6/)):
109
+
110
+ 1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH".
111
+ 2. Install [git](https://git-scm.com/download/win) manually (OR `scoop install git` via [scoop](https://scoop.sh/)).
112
+ 3. Install `ffmpeg`, following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows) (OR using `scoop install ffmpeg` via [scoop](https://scoop.sh/)).
113
+ 4. Download our SadTalker repository, for example by running `git clone https://github.com/Winfredy/SadTalker.git`.
114
+ 5. Download the `checkpoint` and `gfpgan` [below↓](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
115
+ 6. Run `start.bat` from Windows Explorer as a normal, non-administrator user; a Gradio WebUI demo will start.
116
+
117
+ ### Macbook:
118
+
119
+ More tips about installation on macOS and the Docker file can be found [here](docs/install.md)
120
+
121
+ ## 📥 2. Download Trained Models.
122
+
123
+ You can run the following script to put all the models in the right place.
124
+
125
+ ```bash
126
+ bash scripts/download_models.sh
127
+ ```
128
+
129
+ Other alternatives:
130
+ > we also provide an offline patch (`gfpgan/`), so no model will be downloaded when generating.
131
+
132
+ **Google Drive**: download our pre-trained model from [this link (main checkpoints)](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing) and [gfpgan (offline patch)](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing)
133
+
134
+ **GitHub Release Page**: download all the files from the [latest GitHub release page](https://github.com/Winfredy/SadTalker/releases), and then put them in `./checkpoints`.
135
+
136
+ **Baidu Netdisk (百度云盘)**: we provide the downloaded models in [checkpoints, extraction code: sadt](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) and [gfpgan, extraction code: sadt](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt).
137
+
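+ **Hugging Face Hub**: the bundled `app.py` fetches the packaged checkpoints with `huggingface_hub`; a minimal sketch of the same approach (assuming the `vinthony/SadTalker-V002rc` repository stays available) is:
+
+ ```python
+ # Download the packaged SadTalker checkpoints into ./checkpoints,
+ # mirroring what download_model() in app.py does.
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(repo_id='vinthony/SadTalker-V002rc',
+                   local_dir='./checkpoints',
+                   local_dir_use_symlinks=True)
+ ```
+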
138
+
139
+
140
+ <details><summary>Model Details</summary>
141
+
142
+
143
+ Model explanations:
144
+
145
+ ##### New version
146
+ | Model | Description
147
+ | :--- | :----------
148
+ |checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in Sadtalker.
149
+ |checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in Sadtalker.
150
+ |checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render).
151
+ |checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render).
152
+ |gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
153
+
154
+
155
+ ##### Old version
156
+ | Model | Description
157
+ | :--- | :----------
158
+ |checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in Sadtalker.
159
+ |checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in Sadtalker.
160
+ |checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in Sadtalker.
161
+ |checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in Sadtalker.
162
+ |checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [this reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis).
163
+ |checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction).
164
+ |checkpoints/wav2lip.pth | Highly accurate lip-sync model in [Wav2lip](https://github.com/Rudrabha/Wav2Lip).
165
+ |checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/).
166
+ |checkpoints/BFM | 3DMM library file.
167
+ |checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment).
168
+ |gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
169
+
170
+ The final folder structure will look as follows:
171
+
172
+ <img width="331" alt="image" src="https://user-images.githubusercontent.com/4397546/232511411-4ca75cbf-a434-48c5-9ae0-9009e8316484.png">
173
+
174
+
175
+ </details>
176
+
177
+ ## 🔮 3. Quick Start ([Best Practice](docs/best_practice.md)).
178
+
179
+ ### WebUI Demos:
180
+
181
+ **Online**: [Huggingface](https://huggingface.co/spaces/vinthony/SadTalker) | [SDWebUI-Colab](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) | [Colab](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)
182
+
183
+ **Local Automatic1111 stable-diffusion webui extension**: please refer to the [Automatic1111 stable-diffusion webui docs](docs/webui_extension.md).
184
+
185
+ **Local Gradio demo (highly recommended!)**: a demo similar to our [Hugging Face demo](https://huggingface.co/spaces/vinthony/SadTalker) can be run by:
186
+
187
+ ```bash
188
+ ## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
189
+ python app.py
190
+ ```
191
+
192
+ **Local Gradio demo (highly recommended!)**:
193
+
194
+ - Windows: just double-click `webui.bat`; the requirements will be installed automatically.
195
+ - Linux/macOS: run `bash webui.sh` to start the WebUI.
196
+
197
+
198
+ ### Manual usage:
199
+
200
+ ##### Animating a portrait image with the default config:
201
+ ```bash
202
+ python inference.py --driven_audio <audio.wav> \
203
+ --source_image <video.mp4 or picture.png> \
204
+ --enhancer gfpgan
205
+ ```
206
+ The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
207
+
208
+ ##### Full body/image Generation:
209
+
210
+ Use `--still` to generate a natural full-body video. You can add `--enhancer` to improve the quality of the generated video.
211
+
212
+ ```bash
213
+ python inference.py --driven_audio <audio.wav> \
214
+ --source_image <video.mp4 or picture.png> \
215
+ --result_dir <a file to store results> \
216
+ --still \
217
+ --preprocess full \
218
+ --enhancer gfpgan
219
+ ```
220
+
221
+ More examples, configurations, and tips can be found in the [ >>> best practice documents <<<](docs/best_practice.md).
222
+
223
+ ## 🛎 Citation
224
+
225
+ If you find our work useful in your research, please consider citing:
226
+
227
+ ```bibtex
228
+ @article{zhang2022sadtalker,
229
+ title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
230
+ author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
231
+ journal={arXiv preprint arXiv:2211.12194},
232
+ year={2022}
233
+ }
234
+ ```
235
+
236
+
237
+
238
+ ## 💗 Acknowledgements
239
+
240
+ The Facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. In the training process, we also use models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip), and we thank them for their wonderful work.
241
+
242
+ See also these wonderful third-party libraries we use:
243
+
244
+ - **Face Utils**: https://github.com/xinntao/facexlib
245
+ - **Face Enhancement**: https://github.com/TencentARC/GFPGAN
246
+ - **Image/Video Enhancement**: https://github.com/xinntao/Real-ESRGAN
247
+
248
+ ## 🥂 Extensions:
249
+
250
+ - [SadTalker-Video-Lip-Sync](https://github.com/Zz-ww/SadTalker-Video-Lip-Sync) from [@Zz-ww](https://github.com/Zz-ww): SadTalker for Video Lip Editing
251
+
252
+ ## 🥂 Related Works
253
+ - [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
254
+ - [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
255
+ - [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
256
+ - [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
257
+ - [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
258
+ - [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)
259
+
260
+ ## 📢 Disclaimer
261
+
262
+ This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
263
+
264
+ LOGO: color and font suggestion: [ChatGPT](ai.com), logo font: [Montserrat Alternates](https://fonts.google.com/specimen/Montserrat+Alternates?preview.text=SadTalker&preview.text_type=custom&query=mont).
266
+
267
+ All the copyright of the demo images and audio belongs to community users or comes from generation by Stable Diffusion. Feel free to contact us if you feel uncomfortable.
268
+
Untitled.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
__pycache__/commons.cpython-310.pyc ADDED
Binary file (5.96 kB). View file
 
__pycache__/mel_processing.cpython-310.pyc ADDED
Binary file (3.41 kB). View file
 
__pycache__/models.cpython-310.pyc ADDED
Binary file (11.2 kB). View file
 
__pycache__/modules.cpython-310.pyc ADDED
Binary file (10 kB). View file
 
__pycache__/tts_voice.cpython-310.pyc ADDED
Binary file (1.8 kB). View file
 
__pycache__/utils.cpython-310.pyc ADDED
Binary file (9.98 kB). View file
 
app.py ADDED
@@ -0,0 +1,608 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os, sys
2
+ import tempfile
3
+ import gradio as gr
4
+ from src.gradio_demo import SadTalker
5
+ # from src.utils.text2speech import TTSTalker
6
+ from huggingface_hub import snapshot_download
7
+
8
+ import torch
9
+ import librosa
10
+ from scipy.io.wavfile import write
11
+ from transformers import WavLMModel
12
+
13
+ import utils
14
+ from models import SynthesizerTrn
15
+ from mel_processing import mel_spectrogram_torch
16
+ from speaker_encoder.voice_encoder import SpeakerEncoder
17
+
18
+ import time
19
+ from textwrap import dedent
20
+
21
+ import mdtex2html
22
+ from loguru import logger
23
+ from transformers import AutoModel, AutoTokenizer
24
+
25
+ from tts_voice import tts_order_voice
26
+ import edge_tts
27
+ import tempfile
28
+ import anyio
29
+
30
+
31
+ def get_source_image(image):
32
+ return image
33
+
34
+ try:
35
+ import webui # in webui
36
+ in_webui = True
37
+ except:
38
+ in_webui = False
39
+
40
+
41
+ def toggle_audio_file(choice):
42
+ if choice == False:
43
+ return gr.update(visible=True), gr.update(visible=False)
44
+ else:
45
+ return gr.update(visible=False), gr.update(visible=True)
46
+
47
+ def ref_video_fn(path_of_ref_video):
48
+ if path_of_ref_video is not None:
49
+ return gr.update(value=True)
50
+ else:
51
+ return gr.update(value=False)
52
+
53
+ def download_model():
54
+ REPO_ID = 'vinthony/SadTalker-V002rc'
55
+ snapshot_download(repo_id=REPO_ID, local_dir='./checkpoints', local_dir_use_symlinks=True)
56
+
57
+ def sadtalker_demo():
58
+
59
+ download_model()
60
+
61
+ sad_talker = SadTalker(lazy_load=True)
62
+ # tts_talker = TTSTalker()
63
+
64
+ download_model()
65
+ sad_talker = SadTalker(lazy_load=True)
66
+
67
+
68
+ # ChatGLM2 & FreeVC
69
+
70
+ '''
71
+ def get_wavlm():
72
+ os.system('gdown https://drive.google.com/uc?id=12-cB34qCTvByWT-QtOcZaqwwO21FLSqU')
73
+ shutil.move('WavLM-Large.pt', 'wavlm')
74
+ '''
75
+
76
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
77
+
78
+ smodel = SpeakerEncoder('speaker_encoder/ckpt/pretrained_bak_5805000.pt')
79
+
80
+ print("Loading FreeVC(24k)...")
81
+ hps = utils.get_hparams_from_file("configs/freevc-24.json")
82
+ freevc_24 = SynthesizerTrn(
83
+ hps.data.filter_length // 2 + 1,
84
+ hps.train.segment_size // hps.data.hop_length,
85
+ **hps.model).to(device)
86
+ _ = freevc_24.eval()
87
+ _ = utils.load_checkpoint("checkpoint/freevc-24.pth", freevc_24, None)
88
+
89
+ print("Loading WavLM for content...")
90
+ cmodel = WavLMModel.from_pretrained("microsoft/wavlm-large").to(device)
91
+
92
+ def convert(model, src, tgt):
93
+ with torch.no_grad():
94
+ # tgt
95
+ wav_tgt, _ = librosa.load(tgt, sr=hps.data.sampling_rate)
96
+ wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)
97
+ if model == "FreeVC" or model == "FreeVC (24kHz)":
98
+ g_tgt = smodel.embed_utterance(wav_tgt)
99
+ g_tgt = torch.from_numpy(g_tgt).unsqueeze(0).to(device)
100
+ else:
101
+ wav_tgt = torch.from_numpy(wav_tgt).unsqueeze(0).to(device)
102
+ mel_tgt = mel_spectrogram_torch(
103
+ wav_tgt,
104
+ hps.data.filter_length,
105
+ hps.data.n_mel_channels,
106
+ hps.data.sampling_rate,
107
+ hps.data.hop_length,
108
+ hps.data.win_length,
109
+ hps.data.mel_fmin,
110
+ hps.data.mel_fmax
111
+ )
112
+ # src
113
+ wav_src, _ = librosa.load(src, sr=hps.data.sampling_rate)
114
+ wav_src = torch.from_numpy(wav_src).unsqueeze(0).to(device)
115
+ c = cmodel(wav_src).last_hidden_state.transpose(1, 2).to(device)
116
+ # infer
117
+ if model == "FreeVC":
118
+ audio = freevc.infer(c, g=g_tgt)
119
+ elif model == "FreeVC-s":
120
+ audio = freevc_s.infer(c, mel=mel_tgt)
121
+ else:
122
+ audio = freevc_24.infer(c, g=g_tgt)
123
+ audio = audio[0][0].data.cpu().float().numpy()
124
+ if model == "FreeVC" or model == "FreeVC-s":
125
+ write("out.wav", hps.data.sampling_rate, audio)
126
+ else:
127
+ write("out.wav", 24000, audio)
128
+ out = "out.wav"
129
+ return out
130
+
131
+ # GLM2
132
+
133
+ language_dict = tts_order_voice
134
+
135
+ # fix timezone in Linux
136
+ os.environ["TZ"] = "Asia/Shanghai"
137
+ try:
138
+ time.tzset() # type: ignore # pylint: disable=no-member
139
+ except Exception:
140
+ # Windows
141
+ logger.warning("Windows, cant run time.tzset()")
142
+
143
+ # model_name = "THUDM/chatglm2-6b"
144
+ model_name = "THUDM/chatglm2-6b-int4"
145
+
146
+ RETRY_FLAG = False
147
+
148
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
149
+
150
+ # model = AutoModel.from_pretrained(model_name, trust_remote_code=True).cuda()
151
+
152
+ # 4/8 bit
153
+ # model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
154
+
155
+ has_cuda = torch.cuda.is_available()
156
+
157
+ # has_cuda = False # force cpu
158
+
159
+ if has_cuda:
160
+ model_glm = (
161
+ AutoModel.from_pretrained(model_name, trust_remote_code=True).cuda().half()
162
+ ) # 3.92G
163
+ else:
164
+ model_glm = AutoModel.from_pretrained(
165
+ model_name, trust_remote_code=True
166
+ ).float() # .float() .half().float()
167
+
168
+ model_glm = model_glm.eval()
169
+
170
+ _ = """Override Chatbot.postprocess"""
171
+
172
+
173
+ def postprocess(self, y):
174
+ if y is None:
175
+ return []
176
+ for i, (message, response) in enumerate(y):
177
+ y[i] = (
178
+ None if message is None else mdtex2html.convert((message)),
179
+ None if response is None else mdtex2html.convert(response),
180
+ )
181
+ return y
182
+
183
+
184
+ gr.Chatbot.postprocess = postprocess
185
+
186
+
187
+ def parse_text(text):
188
+ """copy from https://github.com/GaiZhenbiao/ChuanhuChatGPT/"""
189
+ lines = text.split("\n")
190
+ lines = [line for line in lines if line != ""]
191
+ count = 0
192
+ for i, line in enumerate(lines):
193
+ if "```" in line:
194
+ count += 1
195
+ items = line.split("`")
196
+ if count % 2 == 1:
197
+ lines[i] = f'<pre><code class="language-{items[-1]}">'
198
+ else:
199
+ lines[i] = "<br></code></pre>"
200
+ else:
201
+ if i > 0:
202
+ if count % 2 == 1:
203
+ line = line.replace("`", r"\`")
204
+ line = line.replace("<", "&lt;")
205
+ line = line.replace(">", "&gt;")
206
+ line = line.replace(" ", "&nbsp;")
207
+ line = line.replace("*", "&ast;")
208
+ line = line.replace("_", "&lowbar;")
209
+ line = line.replace("-", "&#45;")
210
+ line = line.replace(".", "&#46;")
211
+ line = line.replace("!", "&#33;")
212
+ line = line.replace("(", "&#40;")
213
+ line = line.replace(")", "&#41;")
214
+ line = line.replace("$", "&#36;")
215
+ lines[i] = "<br>" + line
216
+ text = "".join(lines)
217
+ return text
218
+
219
+
220
+ def predict(
221
+ RETRY_FLAG, input, chatbot, max_length, top_p, temperature, history, past_key_values
222
+ ):
223
+ try:
224
+ chatbot.append((parse_text(input), ""))
225
+ except Exception as exc:
226
+ logger.error(exc)
227
+ logger.debug(f"{chatbot=}")
228
+ _ = """
229
+ if chatbot:
230
+ chatbot[-1] = (parse_text(input), str(exc))
231
+ yield chatbot, history, past_key_values
232
+ # """
233
+ yield chatbot, history, past_key_values
234
+
235
+ for response, history, past_key_values in model_glm.stream_chat(
236
+ tokenizer,
237
+ input,
238
+ history,
239
+ past_key_values=past_key_values,
240
+ return_past_key_values=True,
241
+ max_length=max_length,
242
+ top_p=top_p,
243
+ temperature=temperature,
244
+ ):
245
+ chatbot[-1] = (parse_text(input), parse_text(response))
246
+ # chatbot[-1][-1] = parse_text(response)
247
+
248
+ yield chatbot, history, past_key_values, parse_text(response)
249
+
250
+
251
+ def trans_api(input, max_length=4096, top_p=0.8, temperature=0.2):
252
+ if max_length < 10:
253
+ max_length = 4096
254
+ if top_p < 0.1 or top_p > 1:
255
+ top_p = 0.85
256
+ if temperature <= 0 or temperature > 1:
257
+ temperature = 0.01
258
+ try:
259
+ res, _ = model_glm.chat(
260
+ tokenizer,
261
+ input,
262
+ history=[],
263
+ past_key_values=None,
264
+ max_length=max_length,
265
+ top_p=top_p,
266
+ temperature=temperature,
267
+ )
268
+ # logger.debug(f"{res=} \n{_=}")
269
+ except Exception as exc:
270
+ logger.error(f"{exc=}")
271
+ res = str(exc)
272
+
273
+ return res
274
+
275
+
276
+ def reset_user_input():
277
+ return gr.update(value="")
278
+
279
+
280
+ def reset_state():
281
+ return [], [], None, ""
282
+
283
+
284
+ # Delete last turn
285
+ def delete_last_turn(chat, history):
286
+ if chat and history:
287
+ chat.pop(-1)
288
+ history.pop(-1)
289
+ return chat, history
290
+
291
+
292
+ # Regenerate response
293
+ def retry_last_answer(
294
+ user_input, chatbot, max_length, top_p, temperature, history, past_key_values
295
+ ):
296
+ if chatbot and history:
297
+ # Removing the previous conversation from chat
298
+ chatbot.pop(-1)
299
+ # Setting up a flag to capture a retry
300
+ RETRY_FLAG = True
301
+ # Getting last message from user
302
+ user_input = history[-1][0]
303
+ # Removing bot response from the history
304
+ history.pop(-1)
305
+
306
+ yield from predict(
307
+ RETRY_FLAG, # type: ignore
308
+ user_input,
309
+ chatbot,
310
+ max_length,
311
+ top_p,
312
+ temperature,
313
+ history,
314
+ past_key_values,
315
+ )
316
+
317
+ # print
318
+
319
+ def print(text):
320
+ return text
321
+
322
+ # TTS
323
+
324
+ async def text_to_speech_edge(text, language_code):
325
+ voice = language_dict[language_code]
326
+ communicate = edge_tts.Communicate(text, voice)
327
+ with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp_file:
328
+ tmp_path = tmp_file.name
329
+
330
+ await communicate.save(tmp_path)
331
+
332
+ return tmp_path
333
+
334
+
335
+ with gr.Blocks(title="ChatGLM2-6B-int4", theme=gr.themes.Soft(text_size="sm"), analytics_enabled=False) as demo:
336
+ gr.HTML("<center>"
337
+ "<h1>📺💕🎶 - ChatGLM2+声音克隆+视频对话:和喜欢的角色畅所欲言吧!</h1>"
338
+ "</center>")
339
+ gr.Markdown("## <center>🥳 - ChatGLM2+FreeVC+SadTalker,为您打造沉浸式的视频对话体验,支持中英双语</center>")
340
+ gr.Markdown("## <center>🌊 - 更多精彩应用,尽在[滔滔AI](http://www.talktalkai.com);滔滔AI,为爱滔滔!💕</center>")
341
+ gr.Markdown("### <center>⭐ - 如果您喜欢这个程序,欢迎给我的[GitHub项目](https://github.com/KevinWang676/ChatGLM2-Voice-Cloning)点赞支持!</center>")
342
+
343
+ with gr.Tab("🍻 - ChatGLM2聊天区"):
344
+ with gr.Accordion("📒 相关信息", open=False):
345
+ _ = f""" ChatGLM2的可选参数信息:
346
+ * Low temperature: responses will be more deterministic and focused; High temperature: responses more creative.
347
+ * Suggested temperatures -- translation: up to 0.3; chatting: > 0.4
348
+ * Top P controls dynamic vocabulary selection based on context.\n
349
+ 如果您想让ChatGLM2进行角色扮演并与之对话,请先输入恰当的提示词,如“请你扮演成动漫角色蜡笔小新并和我进行对话”;您也可以为ChatGLM2提供自定义的角色设定\n
350
+ 当您使用声音克隆功能时,请先在此程序的对应位置上传一段您喜欢的音频
351
+ """
352
+ gr.Markdown(dedent(_))
353
+ chatbot = gr.Chatbot(height=300)
354
+ with gr.Row():
355
+ with gr.Column(scale=4):
356
+ with gr.Column(scale=12):
357
+ user_input = gr.Textbox(
358
+ label="请在此处和GLM2聊天 (按回车键即可发送)",
359
+ placeholder="聊点什么吧",
360
+ )
361
+ RETRY_FLAG = gr.Checkbox(value=False, visible=False)
362
+ with gr.Column(min_width=32, scale=1):
363
+ with gr.Row():
364
+ submitBtn = gr.Button("开始和GLM2交流吧", variant="primary")
365
+ deleteBtn = gr.Button("删除最新一轮对话", variant="secondary")
366
+ retryBtn = gr.Button("重新生成最新一轮对话", variant="secondary")
367
+
368
+ with gr.Accordion("🔧 更多设置", open=False):
369
+ with gr.Row():
370
+ emptyBtn = gr.Button("清空所有聊天记录")
371
+ max_length = gr.Slider(
372
+ 0,
373
+ 32768,
374
+ value=8192,
375
+ step=1.0,
376
+ label="Maximum length",
377
+ interactive=True,
378
+ )
379
+ top_p = gr.Slider(
380
+ 0, 1, value=0.85, step=0.01, label="Top P", interactive=True
381
+ )
382
+ temperature = gr.Slider(
383
+ 0.01, 1, value=0.95, step=0.01, label="Temperature", interactive=True
384
+ )
385
+
386
+
387
+ with gr.Row():
388
+ test1 = gr.Textbox(label="GLM2的最新回答 (可编辑)", lines = 3)
389
+ with gr.Column():
390
+ language = gr.Dropdown(choices=list(language_dict.keys()), value="普通话 (中国大陆)-Xiaoxiao-女", label="请选择文本对应的语言及您喜欢的说话人")
391
+ tts_btn = gr.Button("生成对应的音频吧", variant="primary")
392
+ output_audio = gr.Audio(type="filepath", label="为您生成的音频", interactive=False)
393
+
394
+ tts_btn.click(text_to_speech_edge, inputs=[test1, language], outputs=[output_audio])
395
+
396
+ with gr.Row():
397
+ model_choice = gr.Dropdown(choices=["FreeVC", "FreeVC-s", "FreeVC (24kHz)"], value="FreeVC (24kHz)", label="Model", visible=False)
398
+ audio1 = output_audio
399
+ audio2 = gr.Audio(label="请上传您喜欢的声音进行声音克隆", type='filepath')
400
+ clone_btn = gr.Button("开始AI声音克隆吧", variant="primary")
401
+ audio_cloned = gr.Audio(label="为您生成的专属声音克隆音频", type='filepath')
402
+
403
+ clone_btn.click(convert, inputs=[model_choice, audio1, audio2], outputs=[audio_cloned])
404
+
405
+ history = gr.State([])
406
+ past_key_values = gr.State(None)
407
+
408
+ user_input.submit(
409
+ predict,
410
+ [
411
+ RETRY_FLAG,
412
+ user_input,
413
+ chatbot,
414
+ max_length,
415
+ top_p,
416
+ temperature,
417
+ history,
418
+ past_key_values,
419
+ ],
420
+ [chatbot, history, past_key_values, test1],
421
+ show_progress="full",
422
+ )
423
+ submitBtn.click(
424
+ predict,
425
+ [
426
+ RETRY_FLAG,
427
+ user_input,
428
+ chatbot,
429
+ max_length,
430
+ top_p,
431
+ temperature,
432
+ history,
433
+ past_key_values,
434
+ ],
435
+ [chatbot, history, past_key_values, test1],
436
+ show_progress="full",
437
+ api_name="predict",
438
+ )
439
+ submitBtn.click(reset_user_input, [], [user_input])
440
+
441
+ emptyBtn.click(
442
+ reset_state, outputs=[chatbot, history, past_key_values, test1], show_progress="full"
443
+ )
444
+
445
+ retryBtn.click(
446
+ retry_last_answer,
447
+ inputs=[
448
+ user_input,
449
+ chatbot,
450
+ max_length,
451
+ top_p,
452
+ temperature,
453
+ history,
454
+ past_key_values,
455
+ ],
456
+ # outputs = [chatbot, history, last_user_message, user_message]
457
+ outputs=[chatbot, history, past_key_values, test1],
458
+ )
459
+ deleteBtn.click(delete_last_turn, [chatbot, history], [chatbot, history])
460
+
461
+ with gr.Accordion("📔 提示词示例", open=False):
462
+ etext = """In America, where cars are an important part of the national psyche, a decade ago people had suddenly started to drive less, which had not happened since the oil shocks of the 1970s. """
463
+ examples = gr.Examples(
464
+ examples=[
465
+ ["Explain the plot of Cinderella in a sentence."],
466
+ [
467
+ "How long does it take to become proficient in French, and what are the best methods for retaining information?"
468
+ ],
469
+ ["What are some common mistakes to avoid when writing code?"],
470
+ ["Build a prompt to generate a beautiful portrait of a horse"],
471
+ ["Suggest four metaphors to describe the benefits of AI"],
472
+ ["Write a pop song about leaving home for the sandy beaches."],
473
+ ["Write a summary demonstrating my ability to tame lions"],
474
+ ["鲁迅和周树人什么关系"],
475
+ ["从前有一头牛,这头牛后面有什么?"],
476
+ ["正无穷大加一大于正无穷大吗?"],
477
+ ["正无穷大加正无穷大大于正无穷大吗?"],
478
+ ["-2的平方根等于什么"],
479
+ ["树上有5只鸟,猎人开枪打死了一只。树上还有几只鸟?"],
480
+ ["树上有11只鸟,猎人开枪打死了一只。树上还有几只鸟?提示:需考虑鸟可能受惊吓飞走。"],
481
+ ["鲁迅和周树人什么关系 用英文回答"],
482
+ ["以红楼梦的行文风格写一张委婉的请假条。不少于320字。"],
483
+ [f"{etext} 翻成中文,列出3个版本"],
484
+ [f"{etext} \n 翻成中文,保留原意,但使用文学性的语言。不要写解释。列出3个版本"],
485
+ ["js 判断一个数是不是质数"],
486
+ ["js 实现python 的 range(10)"],
487
+ ["js 实现python 的 [*(range(10)]"],
488
+ ["假定 1 + 2 = 4, 试求 7 + 8"],
489
+ ["Erkläre die Handlung von Cinderella in einem Satz."],
490
+ ["Erkläre die Handlung von Cinderella in einem Satz. Auf Deutsch"],
491
+ ],
492
+ inputs=[user_input],
493
+ examples_per_page=30,
494
+ )
495
+
496
+ with gr.Accordion("For Chat/Translation API", open=False, visible=False):
497
+ input_text = gr.Text()
498
+ tr_btn = gr.Button("Go", variant="primary")
499
+ out_text = gr.Text()
500
+ tr_btn.click(
501
+ trans_api,
502
+ [input_text, max_length, top_p, temperature],
503
+ out_text,
504
+ # show_progress="full",
505
+ api_name="tr",
506
+ )
507
+ _ = """
508
+ input_text.submit(
509
+ trans_api,
510
+ [input_text, max_length, top_p, temperature],
511
+ out_text,
512
+ show_progress="full",
513
+ api_name="tr1",
514
+ )
515
+ # """
516
+ with gr.Tab("📺 - 视频聊天区"):
517
+ with gr.Row().style(equal_height=False):
518
+ with gr.Column(variant='panel'):
519
+ with gr.Tabs(elem_id="sadtalker_source_image"):
520
+ with gr.TabItem('图片上传'):
521
+ with gr.Row():
522
+ source_image = gr.Image(label="请上传一张您喜欢角色的图片", source="upload", type="filepath", elem_id="img2img_image").style(width=512)
523
+
524
+
525
+ with gr.Tabs(elem_id="sadtalker_driven_audio"):
526
+ with gr.TabItem('💡您还可以将视频下载到本地'):
527
+
528
+ with gr.Row():
529
+ driven_audio = audio_cloned
530
+ driven_audio_no = gr.Audio(label="Use IDLE mode, no audio is required", source="upload", type="filepath", visible=False)
531
+
532
+ with gr.Column():
533
+ use_idle_mode = gr.Checkbox(label="Use Idle Animation", visible=False)
534
+ length_of_audio = gr.Number(value=5, label="The length(seconds) of the generated video.", visible=False)
535
+ use_idle_mode.change(toggle_audio_file, inputs=use_idle_mode, outputs=[driven_audio, driven_audio_no]) # todo
536
+
537
+ with gr.Row():
538
+ ref_video = gr.Video(label="Reference Video", source="upload", type="filepath", elem_id="vidref", visible=False).style(width=512)
539
+
540
+ with gr.Column():
541
+ use_ref_video = gr.Checkbox(label="Use Reference Video", visible=False)
542
+ ref_info = gr.Radio(['pose', 'blink','pose+blink', 'all'], value='pose', label='Reference Video',info="How to borrow from reference Video?((fully transfer, aka, video driving mode))", visible=False)
543
+
544
+ ref_video.change(ref_video_fn, inputs=ref_video, outputs=[use_ref_video]) # todo
545
+
546
+
547
+ with gr.Column(variant='panel'):
548
+ with gr.Tabs(elem_id="sadtalker_checkbox"):
549
+ with gr.TabItem('视频设置'):
550
+ with gr.Column(variant='panel'):
551
+ # width = gr.Slider(minimum=64, elem_id="img2img_width", maximum=2048, step=8, label="Manually Crop Width", value=512) # img2img_width
552
+ # height = gr.Slider(minimum=64, elem_id="img2img_height", maximum=2048, step=8, label="Manually Crop Height", value=512) # img2img_width
553
+ with gr.Row():
554
+ pose_style = gr.Slider(minimum=0, maximum=45, step=1, label="Pose style", value=0, visible=False) #
555
+ exp_weight = gr.Slider(minimum=0, maximum=3, step=0.1, label="expression scale", value=1, visible=False) #
556
+ blink_every = gr.Checkbox(label="use eye blink", value=True, visible=False)
557
+
558
+ with gr.Row():
559
+ size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?", visible=False) #
560
+ preprocess_type = gr.Radio(['crop', 'full'], value='crop', label='是否聚焦角色面部', info="crop:视频会聚焦角色面部;full:视频会显示图片全貌")
561
+
562
+ with gr.Row():
563
+ is_still_mode = gr.Checkbox(label="静态模式 (开启静态模式,角色的面部动作会减少;默认开启)", value=True)
564
+ facerender = gr.Radio(['facevid2vid','pirender'], value='facevid2vid', label='facerender', info="which face render?", visible=False)
565
+
566
+ with gr.Row():
567
+ batch_size = gr.Slider(label="Batch size (数值越大,生成速度越快;若显卡性能好,可增大数值)", step=1, maximum=32, value=2)
568
+ enhancer = gr.Checkbox(label="GFPGAN as Face enhancer", value=True, visible=False)
569
+
570
+ submit = gr.Button('开始视频聊天吧', elem_id="sadtalker_generate", variant='primary')
571
+
572
+ with gr.Tabs(elem_id="sadtalker_genearted"):
573
+ gen_video = gr.Video(label="为您生成的专属视频", format="mp4").style(width=256)
574
+
575
+
576
+
577
+ submit.click(
578
+ fn=sad_talker.test,
579
+ inputs=[source_image,
580
+ driven_audio,
581
+ preprocess_type,
582
+ is_still_mode,
583
+ enhancer,
584
+ batch_size,
585
+ size_of_image,
586
+ pose_style,
587
+ facerender,
588
+ exp_weight,
589
+ use_ref_video,
590
+ ref_video,
591
+ ref_info,
592
+ use_idle_mode,
593
+ length_of_audio,
594
+ blink_every
595
+ ],
596
+ outputs=[gen_video]
597
+ )
598
+ gr.Markdown("### <center>注意❗:请不要生成会对个人以及组织造成侵害的内容,此程序仅供科研、学习及个人娱乐使用。</center>")
599
+ gr.Markdown("<center>💡- 如何使用此程序:输入您对ChatGLM的提问后,依次点击“开始和GLM2交流吧”、“生成对应的音频吧”、“开始AI声音克隆吧”、“开始视频聊天吧”三个按键即可;使用声音克隆功能时,请先上传一段您喜欢的音频</center>")
600
+ gr.HTML('''
601
+ <div class="footer">
602
+ <p>🌊🏞️🎶 - 江水东流急,滔滔无尽声。 明·顾璘
603
+ </p>
604
+ </div>
605
+ ''')
606
+
607
+
608
+ demo.queue().launch(show_error=True, debug=True)
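+
+ # Sketch (an assumption, not part of the original app): once the demo is running,
+ # the hidden translation/chat endpoint registered above with api_name="tr" could be
+ # called programmatically via gradio_client, e.g.:
+ #
+ #   from gradio_client import Client
+ #   client = Client("http://127.0.0.1:7860/")  # default local Gradio address (assumed)
+ #   reply = client.predict("你好", 8192, 0.85, 0.95, api_name="/tr")
+ #   print(reply)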
app_sadtalker.py ADDED
@@ -0,0 +1,111 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os, sys
2
+ import gradio as gr
3
+ from src.gradio_demo import SadTalker
4
+
5
+
6
+ try:
7
+ import webui # in webui
8
+ in_webui = True
9
+ except:
10
+ in_webui = False
11
+
12
+
13
+ def toggle_audio_file(choice):
14
+ if choice == False:
15
+ return gr.update(visible=True), gr.update(visible=False)
16
+ else:
17
+ return gr.update(visible=False), gr.update(visible=True)
18
+
19
+ def ref_video_fn(path_of_ref_video):
20
+ if path_of_ref_video is not None:
21
+ return gr.update(value=True)
22
+ else:
23
+ return gr.update(value=False)
24
+
25
+ def sadtalker_demo(checkpoint_path='checkpoints', config_path='src/config', warpfn=None):
26
+
27
+ sad_talker = SadTalker(checkpoint_path, config_path, lazy_load=True)
28
+
29
+ with gr.Blocks(analytics_enabled=False) as sadtalker_interface:
30
+ gr.Markdown("<div align='center'> <h2> 😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023) </span> </h2> \
31
+ <a style='font-size:18px;color: #efefef' href='https://arxiv.org/abs/2211.12194'>Arxiv</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
32
+ <a style='font-size:18px;color: #efefef' href='https://sadtalker.github.io'>Homepage</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
33
+ <a style='font-size:18px;color: #efefef' href='https://github.com/Winfredy/SadTalker'> Github </div>")
34
+
35
+ with gr.Row().style(equal_height=False):
36
+ with gr.Column(variant='panel'):
37
+ with gr.Tabs(elem_id="sadtalker_source_image"):
38
+ with gr.TabItem('Upload image'):
39
+ with gr.Row():
40
+ source_image = gr.Image(label="Source image", source="upload", type="filepath", elem_id="img2img_image").style(width=512)
41
+
42
+ with gr.Tabs(elem_id="sadtalker_driven_audio"):
43
+ with gr.TabItem('Upload OR TTS'):
44
+ with gr.Column(variant='panel'):
45
+ driven_audio = gr.Audio(label="Input audio", source="upload", type="filepath")
46
+
47
+ if sys.platform != 'win32' and not in_webui:
48
+ from src.utils.text2speech import TTSTalker
49
+ tts_talker = TTSTalker()
50
+ with gr.Column(variant='panel'):
51
+ input_text = gr.Textbox(label="Generating audio from text", lines=5, placeholder="please enter some text here, we genreate the audio from text using @Coqui.ai TTS.")
52
+ tts = gr.Button('Generate audio',elem_id="sadtalker_audio_generate", variant='primary')
53
+ tts.click(fn=tts_talker.test, inputs=[input_text], outputs=[driven_audio])
54
+
55
+ with gr.Column(variant='panel'):
56
+ with gr.Tabs(elem_id="sadtalker_checkbox"):
57
+ with gr.TabItem('Settings'):
58
+ gr.Markdown("need help? please visit our [best practice page](https://github.com/OpenTalker/SadTalker/blob/main/docs/best_practice.md) for more detials")
59
+ with gr.Column(variant='panel'):
60
+ # width = gr.Slider(minimum=64, elem_id="img2img_width", maximum=2048, step=8, label="Manually Crop Width", value=512) # img2img_width
61
+ # height = gr.Slider(minimum=64, elem_id="img2img_height", maximum=2048, step=8, label="Manually Crop Height", value=512) # img2img_width
62
+ pose_style = gr.Slider(minimum=0, maximum=46, step=1, label="Pose style", value=0) #
63
+ size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?") #
64
+ preprocess_type = gr.Radio(['crop', 'resize','full', 'extcrop', 'extfull'], value='crop', label='preprocess', info="How to handle input image?")
65
+ is_still_mode = gr.Checkbox(label="Still Mode (fewer hand motion, works with preprocess `full`)")
66
+ batch_size = gr.Slider(label="batch size in generation", step=1, maximum=10, value=2)
67
+ enhancer = gr.Checkbox(label="GFPGAN as Face enhancer")
68
+ submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')
69
+
70
+ with gr.Tabs(elem_id="sadtalker_genearted"):
71
+ gen_video = gr.Video(label="Generated video", format="mp4").style(width=256)
72
+
73
+ if warpfn:
74
+ submit.click(
75
+ fn=warpfn(sad_talker.test),
76
+ inputs=[source_image,
77
+ driven_audio,
78
+ preprocess_type,
79
+ is_still_mode,
80
+ enhancer,
81
+ batch_size,
82
+ size_of_image,
83
+ pose_style
84
+ ],
85
+ outputs=[gen_video]
86
+ )
87
+ else:
88
+ submit.click(
89
+ fn=sad_talker.test,
90
+ inputs=[source_image,
91
+ driven_audio,
92
+ preprocess_type,
93
+ is_still_mode,
94
+ enhancer,
95
+ batch_size,
96
+ size_of_image,
97
+ pose_style
98
+ ],
99
+ outputs=[gen_video]
100
+ )
101
+
102
+ return sadtalker_interface
103
+
104
+
105
+ if __name__ == "__main__":
106
+
107
+ demo = sadtalker_demo()
108
+ demo.queue()
109
+ demo.launch(share=True)
110
+
111
+
checkpoint/__init__.py ADDED
File without changes
checkpoint/freevc-24.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7b39a86fefbc9ec6e30be8d26ee2a6aa5ffe6d235f6ab15773d01cdf348e5b20
3
+ size 472644351
checkpoints/SadTalker_V0.0.2_256.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c211f5d6de003516bf1bbda9f47049a4c9c99133b1ab565c6961e5af16477bff
3
+ size 725066984
checkpoints/SadTalker_V0.0.2_512.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e063f7ff5258240bdb0f7690783a7b1374e6a4a81ce8fa33456f4cd49694340
3
+ size 725066984
checkpoints/mapping_00109-model.pth.tar ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:84a8642468a3fcfdd9ab6be955267043116c2bec2284686a5262f1eaf017f64c
3
+ size 155779231
checkpoints/mapping_00229-model.pth.tar ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:62a1e06006cc963220f6477438518ed86e9788226c62ae382ddc42fbcefb83f1
3
+ size 155521183
cog.yaml ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ build:
2
+ gpu: true
3
+ cuda: "11.3"
4
+ python_version: "3.8"
5
+ system_packages:
6
+ - "ffmpeg"
7
+ - "libgl1-mesa-glx"
8
+ - "libglib2.0-0"
9
+ python_packages:
10
+ - "torch==1.12.1"
11
+ - "torchvision==0.13.1"
12
+ - "torchaudio==0.12.1"
13
+ - "joblib==1.1.0"
14
+ - "scikit-image==0.19.3"
15
+ - "basicsr==1.4.2"
16
+ - "facexlib==0.3.0"
17
+ - "resampy==0.3.1"
18
+ - "pydub==0.25.1"
19
+ - "scipy==1.10.1"
20
+ - "kornia==0.6.8"
21
+ - "face_alignment==1.3.5"
22
+ - "imageio==2.19.3"
23
+ - "imageio-ffmpeg==0.4.7"
24
+ - "librosa==0.9.2" #
25
+ - "tqdm==4.65.0"
26
+ - "yacs==0.1.8"
27
+ - "gfpgan==1.3.8"
28
+ - "dlib-bin==19.24.1"
29
+ - "av==10.0.0"
30
+ - "trimesh==3.9.20"
31
+ run:
32
+ - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth" "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
33
+ - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip" "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip"
34
+
35
+ predict: "predict.py:Predictor"
commons.py ADDED
@@ -0,0 +1,171 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import math
2
+ import numpy as np
3
+ import torch
4
+ from torch import nn
5
+ from torch.nn import functional as F
6
+
7
+
8
+ def init_weights(m, mean=0.0, std=0.01):
9
+ classname = m.__class__.__name__
10
+ if classname.find("Conv") != -1:
11
+ m.weight.data.normal_(mean, std)
12
+
13
+
14
+ def get_padding(kernel_size, dilation=1):
15
+ return int((kernel_size*dilation - dilation)/2)
16
+
17
+
18
+ def convert_pad_shape(pad_shape):
19
+ l = pad_shape[::-1]
20
+ pad_shape = [item for sublist in l for item in sublist]
21
+ return pad_shape
22
+
23
+
24
+ def intersperse(lst, item):
25
+ result = [item] * (len(lst) * 2 + 1)
26
+ result[1::2] = lst
27
+ return result
28
+
29
+
30
+ def kl_divergence(m_p, logs_p, m_q, logs_q):
31
+ """KL(P||Q)"""
32
+ kl = (logs_q - logs_p) - 0.5
33
+ kl += 0.5 * (torch.exp(2. * logs_p) + ((m_p - m_q)**2)) * torch.exp(-2. * logs_q)
34
+ return kl
35
+
36
+
37
+ def rand_gumbel(shape):
38
+ """Sample from the Gumbel distribution, protect from overflows."""
39
+ uniform_samples = torch.rand(shape) * 0.99998 + 0.00001
40
+ return -torch.log(-torch.log(uniform_samples))
41
+
42
+
43
+ def rand_gumbel_like(x):
44
+ g = rand_gumbel(x.size()).to(dtype=x.dtype, device=x.device)
45
+ return g
46
+
47
+
48
+ def slice_segments(x, ids_str, segment_size=4):
49
+ ret = torch.zeros_like(x[:, :, :segment_size])
50
+ for i in range(x.size(0)):
51
+ idx_str = ids_str[i]
52
+ idx_end = idx_str + segment_size
53
+ ret[i] = x[i, :, idx_str:idx_end]
54
+ return ret
55
+
56
+
57
+ def rand_slice_segments(x, x_lengths=None, segment_size=4):
58
+ b, d, t = x.size()
59
+ if x_lengths is None:
60
+ x_lengths = t
61
+ ids_str_max = x_lengths - segment_size + 1
62
+ ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
63
+ ret = slice_segments(x, ids_str, segment_size)
64
+ return ret, ids_str
65
+
66
+
67
+ def rand_spec_segments(x, x_lengths=None, segment_size=4):
68
+ b, d, t = x.size()
69
+ if x_lengths is None:
70
+ x_lengths = t
71
+ ids_str_max = x_lengths - segment_size
72
+ ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
73
+ ret = slice_segments(x, ids_str, segment_size)
74
+ return ret, ids_str
75
+
76
+
77
+ def get_timing_signal_1d(
78
+ length, channels, min_timescale=1.0, max_timescale=1.0e4):
79
+ position = torch.arange(length, dtype=torch.float)
80
+ num_timescales = channels // 2
81
+ log_timescale_increment = (
82
+ math.log(float(max_timescale) / float(min_timescale)) /
83
+ (num_timescales - 1))
84
+ inv_timescales = min_timescale * torch.exp(
85
+ torch.arange(num_timescales, dtype=torch.float) * -log_timescale_increment)
86
+ scaled_time = position.unsqueeze(0) * inv_timescales.unsqueeze(1)
87
+ signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], 0)
88
+ signal = F.pad(signal, [0, 0, 0, channels % 2])
89
+ signal = signal.view(1, channels, length)
90
+ return signal
91
+
92
+
93
+ def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
94
+ b, channels, length = x.size()
95
+ signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
96
+ return x + signal.to(dtype=x.dtype, device=x.device)
97
+
98
+
99
+ def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis=1):
100
+ b, channels, length = x.size()
101
+ signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
102
+ return torch.cat([x, signal.to(dtype=x.dtype, device=x.device)], axis)
103
+
104
+
105
+ def subsequent_mask(length):
106
+ mask = torch.tril(torch.ones(length, length)).unsqueeze(0).unsqueeze(0)
107
+ return mask
108
+
109
+
110
+ @torch.jit.script
111
+ def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
112
+ n_channels_int = n_channels[0]
113
+ in_act = input_a + input_b
114
+ t_act = torch.tanh(in_act[:, :n_channels_int, :])
115
+ s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
116
+ acts = t_act * s_act
117
+ return acts
118
+
119
+
120
+ def convert_pad_shape(pad_shape):
121
+ l = pad_shape[::-1]
122
+ pad_shape = [item for sublist in l for item in sublist]
123
+ return pad_shape
124
+
125
+
126
+ def shift_1d(x):
127
+ x = F.pad(x, convert_pad_shape([[0, 0], [0, 0], [1, 0]]))[:, :, :-1]
128
+ return x
129
+
130
+
131
+ def sequence_mask(length, max_length=None):
132
+ if max_length is None:
133
+ max_length = length.max()
134
+ x = torch.arange(max_length, dtype=length.dtype, device=length.device)
135
+ return x.unsqueeze(0) < length.unsqueeze(1)
136
+
137
+
138
+ def generate_path(duration, mask):
139
+ """
140
+ duration: [b, 1, t_x]
141
+ mask: [b, 1, t_y, t_x]
142
+ """
143
+ device = duration.device
144
+
145
+ b, _, t_y, t_x = mask.shape
146
+ cum_duration = torch.cumsum(duration, -1)
147
+
148
+ cum_duration_flat = cum_duration.view(b * t_x)
149
+ path = sequence_mask(cum_duration_flat, t_y).to(mask.dtype)
150
+ path = path.view(b, t_x, t_y)
151
+ path = path - F.pad(path, convert_pad_shape([[0, 0], [1, 0], [0, 0]]))[:, :-1]
152
+ path = path.unsqueeze(1).transpose(2,3) * mask
153
+ return path
154
+
155
+
156
+ def clip_grad_value_(parameters, clip_value, norm_type=2):
157
+ if isinstance(parameters, torch.Tensor):
158
+ parameters = [parameters]
159
+ parameters = list(filter(lambda p: p.grad is not None, parameters))
160
+ norm_type = float(norm_type)
161
+ if clip_value is not None:
162
+ clip_value = float(clip_value)
163
+
164
+ total_norm = 0
165
+ for p in parameters:
166
+ param_norm = p.grad.data.norm(norm_type)
167
+ total_norm += param_norm.item() ** norm_type
168
+ if clip_value is not None:
169
+ p.grad.data.clamp_(min=-clip_value, max=clip_value)
170
+ total_norm = total_norm ** (1. / norm_type)
171
+ return total_norm
configs/.ipynb_checkpoints/freevc-24-checkpoint.json ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "train": {
3
+ "log_interval": 200,
4
+ "eval_interval": 10000,
5
+ "seed": 1234,
6
+ "epochs": 10000,
7
+ "learning_rate": 2e-4,
8
+ "betas": [0.8, 0.99],
9
+ "eps": 1e-9,
10
+ "batch_size": 64,
11
+ "fp16_run": false,
12
+ "lr_decay": 0.999875,
13
+ "segment_size": 8640,
14
+ "init_lr_ratio": 1,
15
+ "warmup_epochs": 0,
16
+ "c_mel": 45,
17
+ "c_kl": 1.0,
18
+ "use_sr": true,
19
+ "max_speclen": 128,
20
+ "port": "8008"
21
+ },
22
+ "data": {
23
+ "training_files":"filelists/train.txt",
24
+ "validation_files":"filelists/val.txt",
25
+ "max_wav_value": 32768.0,
26
+ "sampling_rate": 16000,
27
+ "filter_length": 1280,
28
+ "hop_length": 320,
29
+ "win_length": 1280,
30
+ "n_mel_channels": 80,
31
+ "mel_fmin": 0.0,
32
+ "mel_fmax": null
33
+ },
34
+ "model": {
35
+ "inter_channels": 192,
36
+ "hidden_channels": 192,
37
+ "filter_channels": 768,
38
+ "n_heads": 2,
39
+ "n_layers": 6,
40
+ "kernel_size": 3,
41
+ "p_dropout": 0.1,
42
+ "resblock": "1",
43
+ "resblock_kernel_sizes": [3,7,11],
44
+ "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
45
+ "upsample_rates": [10,6,4,2],
46
+ "upsample_initial_channel": 512,
47
+ "upsample_kernel_sizes": [16,16,4,4],
48
+ "n_layers_q": 3,
49
+ "use_spectral_norm": false,
50
+ "gin_channels": 256,
51
+ "ssl_dim": 1024,
52
+ "use_spk": true
53
+ }
54
+ }
configs/freevc-24.json ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "train": {
3
+ "log_interval": 200,
4
+ "eval_interval": 10000,
5
+ "seed": 1234,
6
+ "epochs": 10000,
7
+ "learning_rate": 2e-4,
8
+ "betas": [0.8, 0.99],
9
+ "eps": 1e-9,
10
+ "batch_size": 64,
11
+ "fp16_run": false,
12
+ "lr_decay": 0.999875,
13
+ "segment_size": 8640,
14
+ "init_lr_ratio": 1,
15
+ "warmup_epochs": 0,
16
+ "c_mel": 45,
17
+ "c_kl": 1.0,
18
+ "use_sr": true,
19
+ "max_speclen": 128,
20
+ "port": "8008"
21
+ },
22
+ "data": {
23
+ "training_files":"filelists/train.txt",
24
+ "validation_files":"filelists/val.txt",
25
+ "max_wav_value": 32768.0,
26
+ "sampling_rate": 16000,
27
+ "filter_length": 1280,
28
+ "hop_length": 320,
29
+ "win_length": 1280,
30
+ "n_mel_channels": 80,
31
+ "mel_fmin": 0.0,
32
+ "mel_fmax": null
33
+ },
34
+ "model": {
35
+ "inter_channels": 192,
36
+ "hidden_channels": 192,
37
+ "filter_channels": 768,
38
+ "n_heads": 2,
39
+ "n_layers": 6,
40
+ "kernel_size": 3,
41
+ "p_dropout": 0.1,
42
+ "resblock": "1",
43
+ "resblock_kernel_sizes": [3,7,11],
44
+ "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
45
+ "upsample_rates": [10,6,4,2],
46
+ "upsample_initial_channel": 512,
47
+ "upsample_kernel_sizes": [16,16,4,4],
48
+ "n_layers_q": 3,
49
+ "use_spectral_norm": false,
50
+ "gin_channels": 256,
51
+ "ssl_dim": 1024,
52
+ "use_spk": true
53
+ }
54
+ }
docs/FAQ.md ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ ## Frequently Asked Questions
3
+
4
+ **Q: `ffmpeg` is not recognized as an internal or external command**
5
+
6
+ On Linux, you can install `ffmpeg` via `conda install ffmpeg`. On macOS, try installing `ffmpeg` via `brew install ffmpeg`. On Windows, make sure you have `ffmpeg` in the `%PATH%` as suggested in [#54](https://github.com/Winfredy/SadTalker/issues/54), then follow [this](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) guide to install `ffmpeg`.
7
+
8
+ **Q: Running requirements.**
9
+
10
+ Please refer to the discussion here: https://github.com/Winfredy/SadTalker/issues/124#issuecomment-1508113989
11
+
12
+
13
+ **Q: ModuleNotFoundError: No module named 'ai'**
14
+
15
+ Please check the size of the `epoch_20.pth` checkpoint. (https://github.com/Winfredy/SadTalker/issues/167, https://github.com/Winfredy/SadTalker/issues/113)
16
+
17
+ **Q: Illegal Hardware Error: Mac M1**
18
+
19
+ Please reinstall `dlib` via `pip install dlib` individually. (https://github.com/Winfredy/SadTalker/issues/129, https://github.com/Winfredy/SadTalker/issues/109)
20
+
21
+
22
+ **Q: FileNotFoundError: [Errno 2] No such file or directory: checkpoints\BFM_Fitting\similarity_Lm3D_all.mat**
23
+
24
+ Make sure you have downloaded the checkpoints and gfpgan models as described [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models) and placed them in the right locations.
25
+
26
+ **Q: RuntimeError: unexpected EOF, expected 237192 more bytes. The file might be corrupted.**
27
+
28
+ The files are not downloaded automatically. Please update the code and download the gfpgan folder as described [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
29
+
30
+ **Q: CUDA out of memory error**
31
+
32
+ Please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-out-of-memory-how-setting-max-split-size-mb
33
+
34
+ ```
35
+ # windows
36
+ set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
37
+ python inference.py ...
38
+
39
+ # linux
40
+ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
41
+ python inference.py ...
42
+ ```
43
+
44
+ **Q: Error while decoding stream #0:0: Invalid data found when processing input [mp3float @ 0000015037628c00] Header missing**
45
+
46
+ Our method only supports WAV or MP3 files as input; please make sure the provided audio is in one of these formats.
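+
+ A quick way to convert other audio formats is `ffmpeg` (a sketch; the file names below are placeholders):
+
+ ```bash
+ # convert an arbitrary audio file to WAV before feeding it to SadTalker
+ ffmpeg -i input.m4a output.wav
+ ```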
docs/best_practice.md ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Best Practices and Tips for Configuration
2
+
3
+ > Our model only works on photos of REAL people or portrait images similar to a REAL person. The anime talking head generation method will be released in the future.
4
+
5
+ Advanced configurations for `inference.py` (a combined example command is sketched after the table):
6
+
7
+ | Name | Configuration | Default | Explanation |
8
+ |:------------- |:------------- |:----- | :------------- |
9
+ | Enhance Mode | `--enhancer` | None | Using `gfpgan` or `RestoreFormer` to enhance the generated face via face restoration network
10
+ | Background Enhancer | `--background_enhancer` | None | Using `realesrgan` to enhance the full video.
11
+ | Still Mode | `--still` | False | Use the same pose parameters as the original image, with less head motion.
12
+ | Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger.
13
+ | Save path | `--result_dir` | `./results` | The results will be saved in this location.
14
+ | Preprocess | `--preprocess` | `crop` | Run and produce the results on the cropped input image. Other choices: `resize`, where the image is resized to a specific resolution, and `full`, which runs the full-image animation; use it with `--still` to get better results.
15
+ | ref Mode (eye) | `--ref_eyeblink` | None | A video path, where we borrow the eyeblink from this reference video to provide more natural eyebrow movement.
16
+ | ref Mode (pose) | `--ref_pose` | None | A video path, where we borrow the pose from the head reference video.
17
+ | 3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found [here](docs/face3d.md).
18
+ | Free-view Mode | `--input_yaw`,<br> `--input_pitch`,<br> `--input_roll` | None | Generating a novel-view or free-view 4D talking head from a single image. More details can be found [here](https://github.com/Winfredy/SadTalker#generating-4d-free-view-talking-examples-from-audio-and-a-single-image).
19
+
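+ As a combined example, the flags above can be mixed in a single call; a sketch with placeholder paths:
+
+ ```bash
+ python inference.py --driven_audio <audio.wav> \
+                     --source_image <picture.png> \
+                     --still \
+                     --preprocess full \
+                     --expression_scale 1.2 \
+                     --enhancer gfpgan \
+                     --result_dir ./results
+ ```
+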
20
+
21
+ ### About `--preprocess`
22
+
23
+ Our method automatically handles the input image via `crop`, `resize` and `full`.
24
+
25
+ In `crop` mode, we only generate the cropped image via the facial keypoints and produce the facial animation avatar. The animation of both expression and head pose is realistic.
26
+
27
+ > Still mode will stop the eye blinking and head pose movement.
28
+
29
+ | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) | crop | crop w/still |
30
+ |:--------------------: |:--------------------: | :----: |
31
+ | <img src='../examples/source_image/full_body_2.png' width='380'> | ![full_body_2](example_crop.gif) | ![full_body_2](example_crop_still.gif) |
32
+
33
+
34
+ In `resize` mode, we resize the whole image to generate the full talking head video. Thus, images similar to ID photos work well. ⚠️ It will produce bad results for full-body images.
35
+
36
+
37
+
38
+
39
+ | <img src='../examples/source_image/full_body_2.png' width='380'> | <img src='../examples/source_image/full4.jpeg' width='380'> |
40
+ |:--------------------: |:--------------------: |
41
+ | ❌ not suitable for resize mode | ✅ good for resize mode |
42
+ | <img src='resize_no.gif'> | <img src='resize_good.gif' width='380'> |
43
+
44
+ In `full` mode, our model automatically processes the cropped region and pastes it back into the original image. Remember to use `--still` to keep the original head pose.
45
+
46
+ | input | `--still` | `--still` & `enhancer` |
47
+ |:--------------------: |:--------------------: | :--:|
48
+ | <img src='../examples/source_image/full_body_2.png' width='380'> | <img src='./example_full.gif' width='380'> | <img src='./example_full_enhanced.gif' width='380'>
49
+
50
+
51
+ ### About `--enhancer`
52
+
53
+ For better facial quality, we integrate [gfpgan](https://github.com/TencentARC/GFPGAN) and [real-esrgan](https://github.com/xinntao/Real-ESRGAN) for different purposes. Just add `--enhancer <gfpgan or RestoreFormer>` or `--background_enhancer <realesrgan>` to enhance the face and the full image.
54
+
55
+ ```bash
56
+ # make sure above packages are available:
57
+ pip install gfpgan
58
+ pip install realesrgan
59
+ ```
60
+
61
+ ### About `--face3dvis`
62
+
63
+ This flag indicates that we can generate the 3D-rendered face and its 3D facial landmarks. More details can be found [here](face3d.md).
64
+
65
+ | Input | Animated 3d face |
66
+ |:-------------: | :-------------: |
67
+ | <img src='../examples/source_image/art_0.png' width='200px'> | <video src="https://user-images.githubusercontent.com/4397546/226856847-5a6a0a4d-a5ec-49e2-9b05-3206db65e8e3.mp4"></video> |
68
+
69
+ > Kindly turn on the audio, as audio does not play by default on GitHub.
70
+
71
+
72
+
73
+ #### Reference eye-blink mode.
74
+
75
+ | Input, w/ reference video , reference video |
76
+ |:-------------: |
77
+ | ![free_view](using_ref_video.gif)|
78
+ | If the reference video is shorter than the input audio, we will loop the reference video. |
79
+
80
+
81
+
82
+ #### Generating 4D free-view talking examples from audio and a single image
83
+
84
+ We use `input_yaw`, `input_pitch`, `input_roll` to control head pose. For example, `--input_yaw -20 30 10` means the input head yaw degree changes from -20 to 30 and then changes from 30 to 10.
85
+ ```bash
86
+ python inference.py --driven_audio <audio.wav> \
87
+ --source_image <video.mp4 or picture.png> \
88
+ --result_dir <a file to store results> \
89
+ --input_yaw -20 30 10
90
+ ```
91
+
92
+ | Results, Free-view results, Novel view results |
93
+ |:-------------: |
94
+ | ![free_view](free_view_result.gif)|
docs/changlelog.md ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## Changelog
2
+
3
+
4
+ - __[2023.04.06]__: The stable-diffusion webui extension is released.
5
+
6
+ - __[2023.04.03]__: Enable TTS in huggingface and gradio local demo.
7
+
8
+ - __[2023.03.30]__: Launch beta version of the full body mode.
9
+
10
+ - __[2023.03.30]__: Launch new feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
11
+
12
+ - __[2023.03.29]__: `resize mode` is online via `python inference.py --preprocess resize`! We can now produce a larger crop of the image, as discussed in https://github.com/Winfredy/SadTalker/issues/35.
13
+
14
+ - __[2023.03.29]__: Local gradio demo is online! Run `python app.py` to start the demo. A new `requirements.txt` is used to avoid the bugs in `librosa`.
15
+
16
+ - __[2023.03.28]__: Online demo is launched in [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker), thanks AK!
17
+
18
+ - __[2023.03.22]__: Launch new feature: generating the 3d face animation from a single image. New applications about it will be updated.
19
+
20
+ - __[2023.03.22]__: Launch new feature: `still mode`, where only a small head pose will be produced via `python inference.py --still`.
21
+
22
+ - __[2023.03.18]__: Support `expression intensity`, now you can change the intensity of the generated motion: `python inference.py --expression_scale 1.3 (some value > 1)`.
23
+
24
+ - __[2023.03.18]__: Reconfigured the data folders; now you can download the checkpoints automatically using `bash scripts/download_models.sh`.
25
+ - __[2023.03.18]__: We have officially integrated [GFPGAN](https://github.com/TencentARC/GFPGAN) for face enhancement; use `python inference.py --enhancer gfpgan` for better visual quality.
26
+ - __[2023.03.14]__: Specify the version of package `joblib` to remove the errors in using `librosa`, [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) is online!
27
+ - __[2023.03.06]__: Solved some bugs in the code and errors in installation.
28
+ - __[2023.03.03]__: Released the test code for audio-driven single image animation!
29
+ - __[2023.02.28]__: SadTalker has been accepted by CVPR 2023!
docs/example_crop.gif ADDED

Git LFS Details

  • SHA256: da08306e3e6355928887e74057ee4221f9d877d8536341d907e29fe35e078b45
  • Pointer size: 132 Bytes
  • Size of remote file: 1.55 MB
docs/example_crop_still.gif ADDED

Git LFS Details

  • SHA256: 667c7531ed0a4d97a3ca9b15f79eea655b93dc40eda94498aa43b9e6a48c49aa
  • Pointer size: 132 Bytes
  • Size of remote file: 1.25 MB
docs/example_full.gif ADDED

Git LFS Details

  • SHA256: 2d1a2b8f5ed7b942a8625a5767828c1bc47568165a187079fbbb8492ed57301b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.46 MB
docs/example_full_crop.gif ADDED
docs/example_full_enhanced.gif ADDED

Git LFS Details

  • SHA256: 906ca893e72854021c7715f784dc3fe219bbe67b73ff461e6ba8374f0d3b4712
  • Pointer size: 132 Bytes
  • Size of remote file: 5.78 MB
docs/face3d.md ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## 3D Face visualization
2
+
3
+ We use pytorch3d to visualize the 3D face produced from a single image.
4
+
5
+ Since it is not easy to install, we provide a new installation guide here:
6
+
7
+ ```bash
8
+ git clone https://github.com/Winfredy/SadTalker.git
9
+ cd SadTalker
10
+ conda create -n sadtalker3d python=3.8
11
+ source activate sadtalker3d
12
+
13
+ conda install ffmpeg
14
+ conda install -c fvcore -c iopath -c conda-forge fvcore iopath
15
+ conda install libgcc gmp
16
+
17
+ pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
18
+
19
+ # install pytorch3d
20
+ pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
21
+
22
+ pip install -r requirements3d.txt
23
+
24
+ ### install GFPGAN for the enhancer
25
+ pip install git+https://github.com/TencentARC/GFPGAN
26
+
27
+
28
+ ### if a gcc version problem occurs when importing `_C` from pytorch3d, add the anaconda lib path to LD_LIBRARY_PATH
29
+ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/$YOUR_ANACONDA_PATH/lib/
30
+
31
+ ```
32
+
33
+
34
+ Then, generate the result via:
35
+
36
+ ```bash
37
+
38
+
39
+ python inference.py --driven_audio <audio.wav> \
40
+ --source_image <video.mp4 or picture.png> \
41
+ --result_dir <a folder to store results> \
42
+ --face3dvis
43
+
44
+ ```
45
+
46
+ The result will be saved in the results folder with the file name `face3d.mp4`.
47
+
48
+ More applications of the 3D face will be released.
docs/free_view_result.gif ADDED

Git LFS Details

  • SHA256: 035a7fba6800964254728f82fec47fe5c91458183e19a7506dd54d89940af40f
  • Pointer size: 132 Bytes
  • Size of remote file: 5.61 MB
docs/install.md ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ ### Mac (Tested on M1 Mac OS 13.3)
3
+
4
+ ```
5
+ git clone https://github.com/Winfredy/SadTalker.git
6
+
7
+ cd SadTalker
8
+
9
+ conda create -n sadtalker python=3.8
10
+
11
+ conda activate sadtalker
12
+
13
+ # install pytorch 2.0
14
+ pip install torch torchvision torchaudio
15
+
16
+ conda install ffmpeg
17
+
18
+ pip install -r requirements.txt
19
+
20
+ pip install dlib  # macOS needs to install the original dlib.
21
+
22
+ ```
23
+
24
+
25
+
26
+ ### Windows Native
27
+
28
+ - Make sure you have `ffmpeg` in the `%PATH%`, as suggested in [#54](https://github.com/Winfredy/SadTalker/issues/54); follow [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) to install `ffmpeg`. A quick check is shown below.
29
+
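+ A minimal sanity check (run in cmd or PowerShell) to confirm `ffmpeg` is visible on the `%PATH%`; these are standard Windows commands, not part of the original guide:
+
+ ```bash
+ where ffmpeg     # prints the location of the ffmpeg executable found on the PATH
+ ffmpeg -version  # prints version info if ffmpeg is correctly installed
+ ```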
30
+
31
+ ### Windows WSL
32
+ - Make sure the environment variable is set: `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`
33
+
34
+
35
+ ### Docker installation
36
+
37
+ A Dockerfile is also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) on [Docker Hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker); it can be used directly as follows:
38
+
39
+ ```bash
40
+ docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
41
+ --driven_audio /host_dir/deyu.wav \
42
+ --source_image /host_dir/image.jpg \
43
+ --expression_scale 1.0 \
44
+ --still \
45
+ --result_dir /host_dir
46
+ ```
47
+
docs/resize_good.gif ADDED

Git LFS Details

  • SHA256: ada6f2ea847e71c2a963882fd83f6b54193f4fe7c402f9f20698632b15bbdc0c
  • Pointer size: 132 Bytes
  • Size of remote file: 1.73 MB
docs/resize_no.gif ADDED

Git LFS Details

  • SHA256: c7702f0be5c87c8977bf3c4a73ea4d27e90d0a5a3015816abb880cfd8f75c6ac
  • Pointer size: 132 Bytes
  • Size of remote file: 2.14 MB
docs/sadtalker_logo.png ADDED
docs/using_ref_video.gif ADDED

Git LFS Details

  • SHA256: 9bb68ae077a6c009e7d30a36d34c30bf1310a073ab3c7d9cc1b5c9abe285e888
  • Pointer size: 132 Bytes
  • Size of remote file: 8.11 MB