@victor I think the community is eagerly awaiting the next big month-long event, where the community can come together to build something, like we used to do in the past.
Abid Ali Awan
kingabzpro
AI & ML interests
LLMs, MLOps, ASR, & RL
Organizations
kingabzpro's activity
replied to their post 2 months ago
posted an update 2 months ago
Post
1204
I never imagined that Jenkins could be as powerful and easy to implement as GitHub Actions. Loving it. 🥰
replied to their post 2 months ago
I'm having some issues with the RAG pipeline. It generally takes 0.2-2 seconds for it to respond, and most of the time the embedding model takes even longer. I can implement prompt caching, but I was considering a more hardware-related solution. What do you think about using Ray for distributed serving? Also, what do you think about GraphQL?
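Before reaching for distributed serving, one software-side option is to cache query embeddings so that repeated or near-identical questions skip the embedding model entirely. The sketch below is only an illustration of that idea: `embed` is a hypothetical stand-in for the real (slow) embedding call, and the normalization rule and cache size are assumptions, not part of the original pipeline.

```python
import hashlib
from functools import lru_cache


def embed(text: str) -> tuple[float, ...]:
    """Hypothetical stand-in for a slow embedding model call."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Return a small fixed-size tuple so the result is hashable and cacheable.
    return tuple(b / 255.0 for b in digest[:8])


@lru_cache(maxsize=1024)  # assumed cache size; tune to the query volume
def cached_embed(normalized_text: str) -> tuple[float, ...]:
    # Only reached on a cache miss; hits return the stored vector directly.
    return embed(normalized_text)


def embed_query(query: str) -> tuple[float, ...]:
    # Lowercase and collapse whitespace so equivalent queries share one entry.
    normalized = " ".join(query.lower().split())
    return cached_embed(normalized)
```

Calling `embed_query("What is RAG?")` and then `embed_query("  what is   rag? ")` would hit the model once and the cache once. This only helps when queries repeat; it does nothing for first-time questions, where hardware or batching still matters.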
Post
1838
How can I make my RAG application generate real-time responses? Up until now, I have been using Groq for fast LLM generation and the Gradio Live function. I am looking for a better solution that can help me build a real-time application without any delay.
@abidlabs
kingabzpro/Real-Time-RAG
reacted to merve's post with 🔥 5 months ago
Post
4208
I love Depth Anything V2
It's Depth Anything, but scaled with both a larger teacher model and a gigantic dataset!
Here's a small TLDR of the paper, with a lot of findings, experiments and more.
I have also created a collection that has the models, the dataset, the demo and the CoreML converted model: merve/depth-anything-v2-release-6671902e798cd404513ffbf5
The authors have analyzed Marigold, a diffusion-based model, against Depth Anything and found out what's up with using synthetic images vs real images for MDE:
- Real data has a lot of label noise and inaccurate depth maps (caused by depth sensors missing transparent objects, etc.), and many details are overlooked
- Synthetic data has more precise and detailed depth labels that are truly ground truth, but there's a distribution shift between real and synthetic images, and scene coverage is restricted
The authors train different image encoders only on synthetic images and find that, unless the encoder is very large, the model can't generalize well (but large models generalize inherently anyway)
But even those still fail when they encounter real images with a wide distribution in labels (e.g. diverse instances of objects) 🥲
The Depth Anything V2 framework is to:
- Train a teacher model based on DINOv2-G on 595K synthetic images
- Label 62M real images using the teacher model
- Train a student model using the real images labelled by the teacher
Result: 10x faster and more accurate than Marigold!
The authors also construct a new benchmark called DA-2K that is less noisy, highly detailed and more diverse!
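The three-step teacher/student recipe above can be sketched on toy data. This is not the paper's code: it swaps depth models for a 1-D least-squares line fit, the "synthetic" and "real" sets are made-up numbers, and teacher and student share the same model class here, whereas the post describes a large DINOv2-G teacher. It only shows the flow of labels.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(model, xs):
    a, b = model
    return [a * x + b for x in xs]


# Step 1: train the teacher on a small, precisely labelled "synthetic" set.
synthetic_x = [0.0, 1.0, 2.0, 3.0]
synthetic_y = [1.0, 3.0, 5.0, 7.0]  # exact labels from y = 2x + 1 (toy data)
teacher = fit_line(synthetic_x, synthetic_y)

# Step 2: pseudo-label a larger unlabelled "real" set with the teacher.
real_x = [0.5 * i for i in range(20)]
pseudo_y = predict(teacher, real_x)

# Step 3: train the student only on the pseudo-labelled real data.
student = fit_line(real_x, pseudo_y)
```

The point of the design is that the student never sees the noisy real labels at all: it learns on real inputs, but its targets come from a teacher trained on clean synthetic labels.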
reacted to DmitryRyumin's post with 🔥 8 months ago
Post
New Research Alert - CVPR 2024!
Title: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Description: Animatable Gaussians - a novel method for creating lifelike human avatars from RGB videos, utilizing 2D CNNs and 3D Gaussian splatting to capture pose-dependent garment details and dynamic appearances with high fidelity.
Authors: Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu
Conference: CVPR, Jun 17-21, 2024 | Seattle WA, USA 🇺🇸
Paper: Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling (2311.16096)
Github Page: https://animatable-gaussians.github.io
Repository: https://github.com/lizhe00/AnimatableGaussians
Video: https://www.youtube.com/watch?v=kOmZxD0HxZI
More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin
Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36
Keywords: #AnimatableGaussians #HumanAvatars #3DGaussianSplatting #CVPR2024 #DeepLearning #Animation #Innovation
reacted to merve's post 10 months ago
Post
Posting about a very underrated model that tops paperswithcode across different segmentation benchmarks: OneFormer
OneFormer is a "truly universal" model for semantic, instance and panoptic segmentation tasks.
What makes it truly universal is that it's a single model that is trained only once and can be used across all tasks.
The enabler here is the text conditioning, i.e. the model is given a text query that states the task type along with the appropriate input, and using a contrastive loss, the model learns the difference between the task types (see the image below)
It's also super easy to use with transformers.
```python
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# `image` is assumed to be a PIL.Image loaded beforehand
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_large")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_large")

# swap the postprocessing and task_inputs for different types of segmentation
semantic_inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
semantic_outputs = model(**semantic_inputs)
# resize the prediction back to the input image's (height, width)
predicted_semantic_map = processor.post_process_semantic_segmentation(
    semantic_outputs, target_sizes=[image.size[::-1]]
)[0]
```
I have drafted a notebook for you to try right away ✨ https://colab.research.google.com/drive/1wfJhoTFqUqcTAYAOUc6TXUubBTmOYaVa?usp=sharing
You can also check out the Space without checking out the code itself: shi-labs/OneFormer