A recent paper titled "ShortGPT: Layers in Large Language Models are More Redundant Than You Expect" proposes a simple and effective approach to pruning Large Language Models (LLMs) by removing redundant layers.
Key points:

* Discovers significant redundancy across layers in LLMs, with some layers contributing almost nothing to final performance.
* Defines a new metric, Block Influence (BI), to quantify the importance of each layer in an LLM (sketched below).
* Removes the layers with the lowest BI scores, achieving up to a 25% reduction in parameters and computation while maintaining 92% of the LLM's performance.
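BI itself is simple: one minus the average cosine similarity between a layer's input and output hidden states, so a layer that barely transforms its input scores near zero. Here is a minimal sketch of that idea (the helper names are mine, and the per-layer activations are assumed to come from something like `output_hidden_states=True` in Hugging Face Transformers):

```python
import torch
import torch.nn.functional as F

def block_influence(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """BI of one layer: 1 - mean cosine similarity between the layer's
    input and output hidden states, each shaped (num_tokens, hidden_dim)."""
    cos = F.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return (1.0 - cos).mean().item()

def lowest_bi_layers(hidden_states: list[torch.Tensor], num_to_prune: int) -> list[int]:
    """Given activations before/after each layer (length num_layers + 1),
    return the indices of the `num_to_prune` layers with the lowest BI,
    i.e. the candidates for removal."""
    scores = [
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i])[:num_to_prune]
```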
YOLO-World was designed to address a key limitation of existing zero-shot object detection models: speed. Whereas other state-of-the-art models rely on Transformers, a powerful but typically slower architecture, YOLO-World uses the faster CNN-based YOLO architecture.
YOLO-World comes in three sizes: small with 13M parameters (re-parametrized: 77M), medium with 29M (re-parametrized: 92M), and large with 48M (re-parametrized: 110M).
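If you want to try it yourself, here is a rough zero-shot inference sketch using the `YOLOWorld` wrapper from the `ultralytics` package (the checkpoint filename and image path are assumptions; check the ultralytics model zoo for the exact weights available):

```python
from ultralytics import YOLOWorld

# Load a small YOLO-World checkpoint (filename is an assumption).
model = YOLOWorld("yolov8s-worldv2.pt")

# Zero-shot detection: the vocabulary is defined by text prompts at
# inference time, with no retraining needed.
model.set_classes(["person", "bicycle", "traffic light"])

results = model.predict("street.jpg")  # placeholder image path
results[0].show()
```

The key step is `set_classes`, which is what makes the detector zero-shot: you swap the detection vocabulary by changing the prompts rather than fine-tuning the model.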
The YOLO-World team benchmarked the model on the LVIS dataset, measuring its performance on a V100 GPU without any inference-acceleration techniques such as quantization or TensorRT.
According to the paper, YOLO-World reaches 35.4 AP at 52.0 FPS for the L version and 26.2 AP at 74.1 FPS for the S version. The V100 is a powerful GPU, but frame rates this high are impressive for a zero-shot detector on any hardware.
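The paper does not publish its timing harness, but an FPS measurement in plain PyTorch, matching the no-TensorRT, no-quantization setup, typically looks roughly like this (`model` and `images` are placeholders for any detector and input batch):

```python
import time
import torch

def measure_fps(model, images, warmup: int = 10, iters: int = 100) -> float:
    """Rough end-to-end FPS on GPU: warm up, then time `iters` forward passes."""
    with torch.inference_mode():
        for _ in range(warmup):       # warm up CUDA kernels and allocator
            model(images)
        torch.cuda.synchronize()       # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(images)
        torch.cuda.synchronize()       # wait for the last batch to finish
        elapsed = time.perf_counter() - start
    return iters / elapsed
```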