Dmitry Ryumin
DmitryRyumin
AI & ML interests
Machine Learning and Applications, Multi-Modal Understanding
Recent Activity
liked
a Space
4 days ago
alibabasglab/ClearVoice
updated
a Space
25 days ago
DmitryRyumin/NewEraAI-Papers
updated
a collection
about 1 month ago
Avatars
Organizations
DmitryRyumin's activity
reacted to
nyuuzyou's
post with 🤯
about 2 months ago
reacted to
TuringsSolutions's
post with 🔥
2 months ago
Post
3121
Sentence Transformers received huge updates today! Do you like giving your model access to web search and document search? That's Sentence Transformers. Hugging Face makes it beyond easy to add this functionality to any model. You can be up and running with Sentence Transformers in seconds. Check out this video for a deeper explanation and sample code: https://youtu.be/2hR3D8_kqZE
reacted to
tomaarsen's
post with 🔥
2 months ago
Post
5689
I just released Sentence Transformers v3.3.0 & it's huge! 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free perf. boost, PEFT integration, evaluation on NanoBEIR, and more! Details:
1. We integrate Post-Training Static Quantization using OpenVINO, a very efficient solution for CPUs that processes 4.78x as many texts per second on average, while only hurting performance by 0.36% on average. There's a new export_static_quantized_openvino_model method to quantize a model.
2. We add the option to train with prompts, e.g. strings like "query: ", "search_document: " or "Represent this sentence for searching relevant passages: ". It's as simple as using the prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.
3. Sentence Transformers now supports training PEFT adapters via 7 new methods for adding new adapters or loading pre-trained ones. You can also directly load a trained adapter with SentenceTransformer as if it's a normal model. Very useful for e.g. 1) training multiple adapters on 1 base model, 2) training bigger models than otherwise possible, or 3) cheaply hosting multiple models by switching multiple adapters on 1 base model.
4. We added easy evaluation on NanoBEIR, a subset of BEIR a.k.a. the MTEB Retrieval benchmark. It contains 13 datasets with 50 queries and up to 10k documents each. Evaluation is fast, and can easily be done during training to track your model's performance on general-purpose information retrieval tasks.
Additionally, we also deprecate Python 3.8, add better compatibility with Transformers v4.46.0, and more. Read the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0
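To make "post-training static quantization" in item 1 a little more concrete, here is a minimal pure-Python sketch of the general idea: a scale is fixed ahead of time from calibration data, and values are then stored as int8 codes. This is not OpenVINO's or Sentence Transformers' implementation; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def calibrate_scale(calibration_values):
    """Fix a scale from calibration data ahead of time (the 'static' part)."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0


def quantize(values, scale):
    """Map floats to int8 codes, clamping anything outside the calibrated range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical calibration data and activations, for illustration only.
calibration = [-2.0, -0.5, 0.1, 1.2, 1.9]
scale = calibrate_scale(calibration)

activations = [0.73, -1.41, 0.05, 1.88]
codes = quantize(activations, scale)
restored = dequantize(codes, scale)

# For values inside the calibrated range the error is at most half a step.
max_error = max(abs(a - r) for a, r in zip(activations, restored))
print(codes, max_error)
```

The small rounding error is the kind of trade-off the post describes: cheaper integer arithmetic in exchange for a slight loss in accuracy.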
reacted to
singhsidhukuldeep's
post with 🔥
3 months ago
Post
1850
Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!
Here is what they are proposing:
1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.
2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.
3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.
4. Change the softmax scaling factor in the attention mechanism from 1/sqrt of d_k to sqrt of d_k.
5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.
6. Rescale the intermediate states of the MLP block using learnable scaling factors.
7. Implement rescaling of the output logits using learnable scaling factors.
8. Remove weight decay and learning rate warmup from the optimization process.
9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.
10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.
11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.
12. After each forward pass, normalize all parameter matrices to ensure they remain on the unit hypersphere.
13. Use the Adam optimizer without weight decay for training the model.
14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.
15. During inference, follow the same normalization and scaling procedures as in training.
Excited to see how it scales to larger models and datasets!
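A rough sketch of the hypersphere idea from the list above, in plain Python. It assumes the normalized update takes the form h <- Norm(h + alpha * (h_block - h)), which is my reading of the paper rather than something stated in the post; the vectors and the alpha values are hypothetical stand-ins for a hidden state, a block output, and the learnable eigen learning rates.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def normalized_update(h, h_block, alpha):
    """Step h toward the block output with per-dimension rates, then renormalize."""
    stepped = [hi + a * (bi - hi) for hi, bi, a in zip(h, h_block, alpha)]
    return normalize(stepped)


# Hypothetical values, for illustration only.
h = normalize([1.0, 2.0, 2.0])        # current hidden state
h_attn = normalize([0.0, 1.0, 0.0])   # stand-in for a normalized attention output
alpha = [0.1, 0.1, 0.1]               # stand-in eigen learning rates

h_next = normalized_update(h, h_attn, alpha)
cosine_before = dot(h, h_attn)
cosine_after = dot(h_next, h_attn)
print(cosine_before, cosine_after)
```

After the update the hidden state still has unit length and sits closer (in cosine terms) to the block output, which is the behaviour items 3, 10, and 11 describe.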
reacted to
merve's
post with 🔥
3 months ago
Post
1668
Tencent released a new depth model that generates temporally consistent depth maps over videos ⏯️
Model: tencent/DepthCrafter
Demo: tencent/DepthCrafter
Paper: DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos (2409.02095)
You don't need to input anything other than the video itself: no need for optical flow or camera poses! 🤩
reacted to
albertvillanova's
post
3 months ago
Post
1965
We've just released a new tool to compare the performance of models in the 🤗 Open LLM Leaderboard: the Comparator
open-llm-leaderboard/comparator
Want to see how two different versions of LLaMA stack up? Let's walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2.
1/ Load the Models' Results
- Go to the 🤗 Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!
2/ Compare Metric Results in the Results Tab
- Head over to the Results tab.
- Here, you'll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better!
- Want to focus on a specific task? Use the Task filter to hone in on comparisons for tasks like BBH or MMLU-Pro.
3/ Check Config Alignment in the Configs Tab
- To ensure you're comparing apples to apples, head to the Configs tab.
- Review both models' evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it's good to know before drawing conclusions!
4/ Compare Predictions by Sample in the Details Tab
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR) and then a Subtask (e.g., Murder Mystery) and then press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model's outputs.
5/ With this tool, it's never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you're a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.
Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
reacted to
m-ric's
post
3 months ago
Post
1702
By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing Open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any models.
Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B.
Try it out here: open-llm-leaderboard/comparator
reacted to
TuringsSolutions's
post
4 months ago
Post
1650
Microsoft released a method that allows you to vectorize word vectors themselves! It is called VPTQ. You can check out their full paper including the method and all of the math for the algorithm, or you can watch this video where I did all of that for you, then reconstructed their entire method within Python!
https://youtu.be/YwlKzV1y62s
reacted to
nyuuzyou's
post
4 months ago
Post
1973
Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web
Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use
The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
reacted to
tomaarsen's
post with 🔥
4 months ago
Post
6996
Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.
1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.
Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later.
Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:
1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
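The "bag of token embeddings that are summed together" idea can be sketched in a few lines of plain Python. This is only a toy illustration, not the Sentence Transformers StaticEmbedding module: the lookup table, its 3-dimensional vectors, and the whitespace tokenizer are all hypothetical.

```python
import math

# Hypothetical toy lookup table; a real model has one vector per vocabulary token.
TOKEN_EMBEDDINGS = {
    "fast": [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.0],
    "embeddings": [0.1, 0.9, 0.2],
    "vectors": [0.2, 0.8, 0.3],
    "banana": [0.0, 0.1, 0.9],
}


def embed(text):
    """Sum the vectors of known tokens; no neural network forward pass is involved."""
    total = [0.0, 0.0, 0.0]
    for token in text.lower().split():
        vector = TOKEN_EMBEDDINGS.get(token)
        if vector is None:
            continue  # unknown tokens are skipped in this toy version
        total = [t + v for t, v in zip(total, vector)]
    return total


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


related = cosine(embed("fast embeddings"), embed("quick vectors"))
unrelated = cosine(embed("fast embeddings"), embed("banana"))
print(related, unrelated)
```

Because embedding a text is just a table lookup plus a sum, the cost is tiny compared with running a transformer, which is where the large CPU speedups in the post come from.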
reacted to
merve's
post with 🔥
4 months ago
Post
3783
Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV
Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more; all models have open weights, demos, and even torchscript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art consistent 3D generation model from images
Model: facebook/vfusion3d
Demo: facebook/VFusion3D
- CoTracker is the state-of-the-art point (pixel) tracking model
Demo: facebook/cotracker
Model: facebook/cotracker
reacted to
m-ric's
post
4 months ago
Post
3061
Old-school RNNs can actually rival fancy transformers!
Researchers from Mila and Borealis AI have just shown that simplified versions of good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.
They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
❶ Removed dependencies on previous hidden states in the gates
❷ Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
❸ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)
➡️ As a result, you can use a "parallel scan" algorithm to train these new, minimal RNNs in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences
The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba.
And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance!
Why does this matter?
By showing there are simpler models with similar performance to transformers, this challenges the narrative that we need advanced architectures for better performance!
François Chollet wrote in a tweet about this paper:
"The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)"
"Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape."
It's the Bitter Lesson by Rich Sutton striking again: you don't need fancy thinking architectures, just scale up your model and data!
Read the paper: Were RNNs All We Needed? (2410.01201)
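To see why removing the hidden-state dependency from the gates enables a parallel scan, here is a scalar sketch in plain Python. It assumes the minGRU recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h~_t with z_t and h~_t computed from x_t alone, which is my reading of the paper and not spelled out in the post; the weights and inputs are hypothetical, and the paper itself uses a log-space scan rather than this simple variant.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def min_gru_coeffs(xs, w_z, b_z, w_h, b_h):
    """Per-step (a_t, b_t) such that h_t = a_t * h_{t-1} + b_t."""
    coeffs = []
    for x in xs:
        z = sigmoid(w_z * x + b_z)   # gate depends on x_t only, not on h_{t-1}
        h_tilde = w_h * x + b_h      # candidate state, with no tanh
        coeffs.append((1.0 - z, z * h_tilde))
    return coeffs


def run_sequential(coeffs, h0=0.0):
    """Reference implementation: step through time one element at a time."""
    hs, h = [], h0
    for a, b in coeffs:
        h = a * h + b
        hs.append(h)
    return hs


def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying left first, then right."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def run_scan(coeffs, h0=0.0):
    """Inclusive Hillis-Steele scan: updates within a round are independent."""
    prefix = list(coeffs)
    step = 1
    while step < len(prefix):
        prefix = [
            combine(prefix[i - step], prefix[i]) if i >= step else prefix[i]
            for i in range(len(prefix))
        ]
        step *= 2
    return [a * h0 + b for a, b in prefix]


# Hypothetical inputs and weights, for illustration only.
xs = [0.5, -1.0, 2.0, 0.3, -0.7]
coeffs = min_gru_coeffs(xs, w_z=0.8, b_z=0.1, w_h=1.5, b_h=-0.2)
sequential = run_sequential(coeffs, h0=0.25)
scanned = run_scan(coeffs, h0=0.25)
print(sequential)
print(scanned)
```

Because every (a_t, b_t) pair is known up front, the recurrence becomes a chain of affine maps whose composition is associative, so it can be evaluated in a logarithmic number of parallel rounds instead of one step per token.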
reacted to
merve's
post with 🔥
4 months ago
Post
2731
NVIDIA just dropped a gigantic multimodal model called NVLM 72B
nvidia/NVLM-D-72B
Paper page NVLM: Open Frontier-Class Multimodal LLMs (2409.11402)
The paper contains many ablation studies on various ways to use the LLM backbone:
- Flamingo-like cross-attention (NVLM-X)
- Llava-like concatenation of image and text embeddings to a decoder-only model (NVLM-D)
- a hybrid architecture (NVLM-H)
Checking evaluations, NVLM-D and NVLM-H are best or second best compared to other models
The released model is NVLM-D based on Qwen-2 Instruct, aligned with InternViT-6B using a huge mixture of different datasets
You can easily use this model by loading it through transformers' AutoModel
reacted to
merve's
post with 🔥
4 months ago
Post
4032
If you feel like you missed out on ECCV 2024, there's an app to browse the papers, rank them by popularity, and filter for open models, datasets and demos
Get started at ECCV/ECCV2024-papers
posted
an
update
4 months ago
Post
2642
New Research Alert - HeadGAP (Avatars Collection)!
Title: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors
Description: HeadGAP introduces a novel method for generating high-fidelity, animatable 3D head avatars from few-shot data, using Gaussian priors and dynamic part-based modelling for personalized and generalizable results.
Authors: @zxz267 , @walsvid , @zhaohu2 , Weiyi Zhang, @hellozhuo , Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, @yongjie-zhang-mail , Guidong Wang, and Lan Xu
Paper: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors (2408.06019)
Github Page: https://headgap.github.io
CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers
WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers
ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin
Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36
Keywords: #HeadGAP #3DAvatar #FewShotLearning #GaussianPriors #AvatarCreation #3DModeling #MachineLearning #ComputerVision #ComputerGraphics #GenerativeAI #DeepLearning #AI
reacted to
abidlabs's
post with ❤️
4 months ago
Post
5020
๐ Hi Gradio community,
I'm excited to share that Gradio 5 will launch in October with improvements across security, performance, SEO, design (see the screenshot for Gradio 4 vs. Gradio 5), and user experience, making Gradio a mature framework for web-based ML applications.
Gradio 5 is currently in beta, so if you'd like to try it out early, please refer to the instructions below:
---------- Installation -------------
Gradio 5 depends on Python 3.10 or higher, so if you are running Gradio locally, please ensure that you have Python 3.10 or higher, or download it here: https://www.python.org/downloads/
* Locally: If you are running gradio locally, simply install the release candidate with
* Spaces: If you would like to update an existing gradio Space to use Gradio 5, you can simply update the
In most cases, thatโs all you have to do to run Gradio 5.0. If you start your Gradio application, you should see your Gradio app running, with a fresh new UI.
-----------------------------
Fore more information, please see: https://github.com/gradio-app/gradio/issues/9463
I'm excited to share that Gradio 5 will launch in October with improvements across security, performance, SEO, design (see the screenshot for Gradio 4 vs. Gradio 5), and user experience, making Gradio a mature framework for web-based ML applications.
Gradio 5 is currently in beta, so if you'd like to try it out early, please refer to the instructions below:
---------- Installation -------------
Gradio 5 depends on Python 3.10 or higher, so if you are running Gradio locally, please ensure that you have Python 3.10 or higher, or download it here: https://www.python.org/downloads/
* Locally: If you are running gradio locally, simply install the release candidate with pip install gradio --pre
* Spaces: If you would like to update an existing gradio Space to use Gradio 5, you can simply update the sdk_version to be 5.0.0b3 in the README.md file on Spaces.
In most cases, that's all you have to do to run Gradio 5.0. If you start your Gradio application, you should see your Gradio app running, with a fresh new UI.
-----------------------------
For more information, please see: https://github.com/gradio-app/gradio/issues/9463
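The two installation steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the helper names python_ok and set_sdk_version are hypothetical, and the sample README text is made up for the demo. The sketch assumes the Space's README.md keeps sdk_version on its own line in the front matter.

```python
import re
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the post


def python_ok(version=None):
    """Return True if the given (or current) Python version is at least 3.10."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def set_sdk_version(readme_text, new_version="5.0.0b3"):
    """Return README text with the value of the sdk_version line replaced."""
    return re.sub(r"(?m)^sdk_version:.*$", f"sdk_version: {new_version}", readme_text)


if __name__ == "__main__":
    print("Python OK for Gradio 5:", python_ok())
    # Hypothetical front matter of a Space's README.md, for illustration only.
    sample = "---\ntitle: Demo\nsdk: gradio\nsdk_version: 4.44.0\n---\n"
    print(set_sdk_version(sample))
```

For a local install, the version check is the only part that applies; the README helper is relevant only when updating a Space.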
reacted to
fdaudens's
post
4 months ago
Post
3361
1,000,000 public models milestone achieved on Hugging Face! 🤯
This chart by @cfahlgren1 shows the explosive growth of open-source AI. It's not just about numbers - it's a thriving community combining cutting-edge ML with real-world applications. cfahlgren1/hub-stats
Can't wait to see what's next!
Post
2080
New Research Alert - ECCV 2024 (Avatars Collection)!
Title: Expressive Whole-Body 3D Gaussian Avatar
Description: ExAvatar is a model that generates animatable 3D human avatars with facial expressions and hand movements from short monocular videos using a hybrid mesh and 3D Gaussian representation.
Authors: Gyeongsik Moon, Takaaki Shiratori, and @psyth
Conference: ECCV, 29 Sep – 4 Oct, 2024 | Milano, Italy 🇮🇹
Paper: MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos (2407.08414)
Paper: Expressive Whole-Body 3D Gaussian Avatar (2407.21686)
Github Page: https://mks0601.github.io/ExAvatar
Repository: https://github.com/mks0601/ExAvatar_RELEASE
CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers
WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers
ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin
Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36
Keywords: #ExAvatar #3DAvatar #FacialExpressions #HandMotions #MonocularVideo #3DModeling #GaussianSplatting #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #ECCV2024