Muhammad Imran Zaman PRO

ImranzamanML

AI & ML interests

Results-driven Machine Learning Engineer with 7+ years of experience leading teams and delivering advanced AI solutions that increased revenue by up to 40%. Proven track record of enhancing business performance through consultancy and expertise in NLP, computer vision, LLMs, and end-to-end ML pipelines. Skilled in managing critical situations and collaborating with cross-functional teams to implement scalable, impactful solutions. Kaggle Grandmaster and top performer in global competitions, dedicated to staying at the forefront of AI advancements.

Organizations

MISATO-dataset, Masakhane NLP, GEM benchmark, BigScience Biomedical Datasets, LangChainDatasets, DeepGHS, Blog-explorers, MLX Community, Cognitive Computations

ImranzamanML's activity

reacted to dyyyyyyyy's post with 🔥 about 1 month ago
📊 We present ScaleQuest-Math-1M, a mathematical reasoning dataset of 1 million high-quality question-answer pairs.
🔥 We propose ScaleQuest, a scalable and novel data synthesis method that utilizes small-size open-source models to generate questions from scratch.

Project Page: https://scalequest.github.io/
Dataset: dyyyyyyyy/ScaleQuest-Math
Paper: Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch (2410.18693)
HF Collection: dyyyyyyyy/scalequest-670a7dc2623c91990f28913b
posted an update about 1 month ago
Easy steps for an effective RAG pipeline with LLMs!
1. Document Embedding & Indexing
Start by vectorizing documents with an embedding model and storing the vectors in a vector database (Elasticsearch, Pinecone, Weaviate) for efficient retrieval.

2. Smart Querying
Next, generate a query embedding, retrieve the top-K relevant chunks, and apply hybrid search (keyword plus vector) if better precision is needed.

3. Context Management
Concatenate the retrieved chunks, optimize their order, and stay within the token limit to preserve response coherence.

4. Prompt Engineering
Instruct the LLM to use the retrieved context, with clear instructions to prioritize the provided information.

5. Post-Processing
Finally, implement response verification and fact-checking, and integrate feedback loops to refine the responses.
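
The first four steps can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, whitespace word counts stand in for a real tokenizer, and all function names are hypothetical. Post-processing (step 5) is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Step 1: embed every document once and keep (text, vector) pairs."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=2):
    """Step 2: embed the query and return the top-k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_context(chunks, max_tokens=50):
    """Step 3: concatenate chunks in rank order while under a word budget."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())
        if used + n > max_tokens:
            break
        kept.append(chunk)
        used += n
    return "\n".join(kept)


def build_prompt(context, question):
    """Step 4: tell the model to prioritize the retrieved context."""
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "LoRA adds low-rank matrices to specific layers for efficient fine-tuning.",
    "FP16 stores each parameter in 2 bytes, FP32 in 4 bytes.",
    "Logging helps track program execution and debug issues.",
]
index = build_index(docs)
question = "How many bytes does FP16 use per parameter?"
top_chunks = retrieve(index, question, k=2)
prompt = build_prompt(build_context(top_chunks), question)
print(prompt)
```

In a real pipeline, the resulting prompt would be sent to the LLM, and `embed` and `build_index` would be replaced by calls to an embedding model and a vector store.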

Happy to connect :)
reacted to MonsterMMORPG's post with ❤️ about 1 month ago
Stability AI published their most powerful new model, Stable Diffusion 3.5 Large. Unlike FLUX, this is a full model, not a distilled one, and it has huge potential. I have done extensive research and am publishing all of it in this video on how to use SD 3.5 Large with the best settings. I also share how to use FLUX DEV with the best possible configuration, and I make a large comparison between SD 3.5 and FLUX so you can see which one wins.

https://youtu.be/-zOKhoO9a5s

62 Prompts tested on all experiments to find best Sampler + Scheduler for Stable Diffusion 3.5 Large and SD 3.5 Large vs FLUX DEV > https://youtu.be/-zOKhoO9a5s

FLUX Dev vs SD 3.5 Large fully compared.

SD 3.5 Large FP16 vs Scaled FP8 fully compared.

T5 XXL FP8 vs Scaled FP8 vs FP16 fully compared.

FLUX FP16 vs Scaled FP8 fully compared.

The tutorial also shows how to install SwarmUI on Windows, Massed Compute and RunPod.

I have shown how to use FLUX and SD 3.5 Large in detail as well.
reacted to AlexBodner's post with 👍 about 1 month ago
💾🧠How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you convert:
Pytorch/tensorflow summary -> required VRAM
or
Parameter count -> required VRAM

Use it in: http://howmuchvram.com

And everything is open source! Ask for new functionalities or contribute in:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you, leave a star 🌟 and share it with someone who will find the tool useful!
reacted to lippytm's post with 🚀 about 1 month ago
Hello Universes of Time Machine Builders. Financing Time Machines Traveling Throughout Eternal Time Rewriting Historical History Retroactively. Robotics Robots for no manual labor so the Human race can leave the planet retroactively. The Old Testament “Hitchhikers Guide Throughout the Galaxy”, and the New Testament being “Hitchhikers Guides Throughout the Universes of Time Machine Builders”. Teaching & Training everyone & the Robotics Robots to become better programmers & blockchain developers. Smart Contracts Earn while you Learn to become better programmers & Blockchain developers. And making a lot of money Financing leaving the planet retroactively.
posted an update about 1 month ago
Are you a professional Python developer? Here is why logging is important for debugging, tracking and monitoring your code.

Logging
Logging is a very important part of any project you start. It helps you track the execution of a program, debug issues, monitor system performance and keep an audit trail of events.

Basic Logging Setup
The basic way to add logging to Python code is the logging.basicConfig() function. This function sets up a basic configuration that sends log messages either to the console or to a file.

Here is how we can use basic console logging:
# Import the built-in logging module
import logging

# Configure the root logger; a format string can also be passed via format=
logging.basicConfig(level=logging.DEBUG)

# These messages go to the console because no filename was given
logging.debug('Here we go for debug message')
logging.info('Here we go for info message')
logging.warning('Here we go for warning message')
logging.error('Here we go for error message')
logging.critical('Here we go for critical message')

# Note:
# To include values in a log message, pass them as extra arguments
records = 100
logging.debug('There are %s records in total.', records)

# Multiple values work the same way as %-style string formatting
lost = 20
logging.debug('There are %s records in total, of which %s are lost.', records, lost)



Logging to a File
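We can also save the log to a file instead of the console. For this, we can add the filename parameter to logging.basicConfig().

import logging
# Save the log to a file; messages are written to app.log.
# basicConfig() has no effect if the root logger is already configured,
# so run this in a fresh process or pass force=True (Python 3.8+).
logging.basicConfig(filename='app.log', level=logging.DEBUG)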

logging.debug('Here we go for debug message')
logging.info('Here we go for info message')
logging.warning('Here we go for warning message')
logging.error('Here we go for error message')
logging.critical('Here we go for critical message')
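
As a complement to basicConfig(), here is a minimal sketch of a named logger with its own handler and format string. The helper name `build_logger`, the logger name "demo" and the format string are assumptions chosen for illustration; an in-memory stream is used so the output can be inspected, and a `logging.FileHandler('app.log')` could be attached in the same way.

```python
import io
import logging


def build_logger(name, stream, level=logging.DEBUG):
    """Create a named logger that writes formatted records to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # do not also send records to the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    return logger


buffer = io.StringIO()
log = build_logger("demo", buffer)

records, lost = 100, 20
log.debug("There are %s records, of which %s are lost", records, lost)
log.info("Finished")

print(buffer.getvalue())
```

Using a named logger per module keeps the configuration in one place and makes it clear where each message came from.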

You can read more on my Medium blog: https://medium.com/@imranzaman-5202/are-you-a-professional-python-developer-8596e2b2edaa
reacted to daniel-de-leon's post with 🔥 about 1 month ago
As the rapid adoption of chatbots and Q&A models continues, so do the concerns about their reliability and safety. In response, many state-of-the-art models are being tuned to act as safety guardrails that protect against malicious usage and avoid undesired, harmful output. I published a Hugging Face blog introducing a simple, proof-of-concept, RoBERTa-based LLM that my team and I fine-tuned to detect toxic prompt inputs to chat-style LLMs. The article explores some of the tradeoffs of fine-tuning larger decoder vs. smaller encoder models and asks whether "simpler is better" in the arena of toxic prompt detection.

🔗 to blog: https://huggingface.co/blog/daniel-de-leon/toxic-prompt-roberta
🔗 to model: Intel/toxic-prompt-roberta
🔗 to OPEA microservice: https://github.com/opea-project/GenAIComps/tree/main/comps/guardrails/toxicity_detection

A huge thank you to my colleagues that helped contribute: @qgao007 , @mitalipo , @ashahba and Fahim Mohammad
posted an update about 1 month ago
LoRA with code 🚀 using PEFT (parameter efficient fine-tuning)

LoRA (Low-Rank Adaptation)
LoRA adds trainable low-rank matrices to specific layers while the original weights stay frozen, which reduces the number of trainable parameters and makes fine-tuning more efficient.

Code:
Please install these libraries first:
pip install peft
pip install datasets
pip install transformers

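from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model
from datasets import load_dataset

# Loading the pre-trained BERT model and its tokenizer
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Configuring the LoRA parameters
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification; keeps the classifier head trainable
    r=8,                         # rank of the low-rank matrices
    lora_alpha=16,               # scaling factor for the LoRA update
    lora_dropout=0.1,            # dropout on the LoRA branch
    bias="none",                 # bias terms stay frozen
)

# Applying LoRA to the model and reporting how many parameters are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Loading dataset for classification
dataset = load_dataset("glue", "sst2")

# Tokenizing the sentences so the Trainer receives model inputs
def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

train_dataset = dataset["train"].map(tokenize, batched=True)

# Setting the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    logging_dir="./logs",
)

# Creating a Trainer instance for fine-tuning
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

# Finally we can fine-tune the model
trainer.train()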


Because only the low-rank matrices are trained, LoRA fine-tunes a small portion of the model and reduces training overhead.
We can perform efficient fine-tuning with minimal impact on accuracy, and it's suitable for large models where full-precision training is still feasible.
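
To make the parameter savings concrete, here is a small arithmetic sketch. It assumes a single square weight matrix with the hidden size of bert-base-uncased (768) and the r=8, lora_alpha=16 settings used above; the helper names are hypothetical, and the count covers one adapted matrix, not the whole model.

```python
def full_params(d_out, d_in):
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out, d_in, r):
    """Parameters in the LoRA factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


d = 768                # hidden size of bert-base-uncased
r, lora_alpha = 8, 16  # same values as in the LoraConfig above

full = full_params(d, d)
lora = lora_params(d, d, r)
scaling = lora_alpha / r  # factor applied to the low-rank update B @ A

print(f"full matrix: {full} parameters")
print(f"LoRA factors: {lora} parameters ({100 * lora / full:.2f}% of full)")
print(f"update scaling: {scaling}")
```

For each adapted matrix, only the two small factors are trained, which is where the reduction in trainable parameters comes from.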
replied to their post about 1 month ago
posted an update about 1 month ago
Today let's discuss 32-bit (FP32) and 16-bit (FP16) floating-point numbers!

Floating-point numbers are used to represent real numbers (like decimals) and they consist of three parts:

Sign bit: 
Indicates whether the number is positive (0) or negative (1).
Exponent:
Determines the scale of the number (i.e., how large or small it is, by shifting the binary point).
Mantissa (or fraction): 
Represents the actual digits of the number.

32-bit Floating Point (FP32)
Total bits: 32 bits
Sign bit: 1 bit
Exponent: 8 bits
Mantissa: 23 bits
For example:
A number like -15.375 (binary -1111.011, i.e. -1.111011 × 2^3) would be represented as:
Sign bit: 1 (negative number)
Exponent: 3, stored after being adjusted by a bias (127 in FP32) as 130 = 10000010.
Mantissa: 11101100000000000000000, the significant digits after the leading 1 once the number is converted to binary.

16-bit Floating Point (FP16)
Total bits: 16 bits
Sign bit: 1 bit
Exponent: 5 bits
Mantissa: 10 bits
Example:
A number like -15.375 would be stored similarly:
Sign bit: 1 (negative number)
Exponent: 3, stored with a bias of 15 as 18 = 10010. Only 5 bits are available, which limits the range compared to FP32.
Mantissa: 1110110000. Only 10 bits are available for precision.

Precision and Range
FP32: Higher precision and larger range (up to about 3.4 × 10^38), with about 7 significant decimal digits.
FP16: Less precision (around 3-4 significant decimal digits) and a smaller range (largest finite value 65504), but faster computations and less memory use.
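
The bit layout can be inspected with Python's standard struct module, which supports both single precision ('f') and half precision ('e'). This is a small sketch; the helper name `float_bits` is hypothetical.

```python
import struct


def float_bits(value, fmt, exp_bits, man_bits):
    """Return (sign, exponent, mantissa) bit strings for `value` packed with `fmt`."""
    raw = int.from_bytes(struct.pack(fmt, value), "big")
    total = 1 + exp_bits + man_bits
    bits = format(raw, f"0{total}b")
    return bits[0], bits[1:1 + exp_bits], bits[1 + exp_bits:]


# '>f' is big-endian FP32 (1 + 8 + 23 bits), '>e' is big-endian FP16 (1 + 5 + 10 bits)
fp32 = float_bits(-15.375, ">f", 8, 23)
fp16 = float_bits(-15.375, ">e", 5, 10)
print("FP32:", fp32)
print("FP16:", fp16)

# Precision loss: round-trip 0.1 through each format and compare
as_fp32 = struct.unpack(">f", struct.pack(">f", 0.1))[0]
as_fp16 = struct.unpack(">e", struct.pack(">e", 0.1))[0]
print("0.1 as FP32:", as_fp32)
print("0.1 as FP16:", as_fp16)
```

The value -15.375 fits exactly in both formats, while 0.1 does not, and its FP16 version lands further from 0.1 than its FP32 version.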
posted an update about 2 months ago
Last Thursday at KaggleX, organized by Google, I presented a workshop on "Unlocking the Power of Large Language Models (LLMs) for Business Applications", where I explained how we can reduce the size of LLMs to make them more suitable for business use and to address common resource limitations.
https://drive.google.com/file/d/1p5sT4_DeyBuwCqmYt4dCJKZOgLMpESzR/view
posted an update about 2 months ago
Here is how we can calculate the size of any LLM:

Each parameter in an LLM is typically stored as a floating-point number. The size of each parameter in bytes depends on the precision.

32-bit precision: Each parameter takes 4 bytes.
16-bit precision: Each parameter takes 2 bytes.

To calculate the total memory usage of the model:
Memory usage (in bytes) = No. of Parameters × Size of Each Parameter

For example:
32-bit Precision (FP32)
In 32-bit floating-point precision, each parameter takes 4 bytes.
Memory usage in bytes = 1 billion parameters × 4 bytes
1,000,000,000 × 4 = 4,000,000,000 bytes
In gigabytes: 4 GB (decimal), or ≈ 3.73 GiB when dividing by 1024³

16-bit Precision (FP16)
In 16-bit floating-point precision, each parameter takes 2 bytes.
Memory usage in bytes = 1 billion parameters × 2 bytes
1,000,000,000 × 2 = 2,000,000,000 bytes
In gigabytes: 2 GB (decimal), or ≈ 1.86 GiB when dividing by 1024³

So, depending on whether you use 32-bit or 16-bit precision, a model with 1 billion parameters needs approximately 3.73 GiB or 1.86 GiB of memory, respectively, for its weights alone.
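
The same calculation as a short Python sketch. The function and dictionary names are hypothetical, and the result covers only the stored weights, not activations, gradients or optimizer state.

```python
# Bytes used to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def model_size_bytes(num_params, precision):
    """Memory for the weights: number of parameters x bytes per parameter."""
    return num_params * BYTES_PER_PARAM[precision]


def to_gib(num_bytes):
    """Convert bytes to binary gigabytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


num_params = 1_000_000_000  # the 1-billion-parameter example from the text
for precision in ("fp32", "fp16"):
    size = model_size_bytes(num_params, precision)
    print(f"{precision}: {size:,} bytes = {to_gib(size):.2f} GiB")
```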
published an article about 2 months ago

Unlocking the Power of Large Language Models (LLMs) for Business Applications
