You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.


image

Methodology

This model represents a fine-tuned version of the Sarvam-30B baseline, developed for the Resilient AI Challenge., developed for the Resilient AI Challenge.

The compression strategy utilized "Post-Training Quantization (PTQ) formatted via compressed-tensors to achieve a W4A16 precision balance.

The primary objective was to maximize energy efficiency while ensuring the model maintains at least 80% of the baseline Sarvam-30b performance.


Model Details

  • Base Model: sarvam-30b
  • Compression Precision: W4A16
  • License: Apache 2.0

Inference Configuration

The model is optimized to run using the vLLM inference engine.

vllm_config.yaml

model: ./models/sarvam-30b-compressed-w4a16
quantization: compressed-tensors
kv-cache-dtype: auto
max-model-len: 8192
trust-remote-code: true

Evaluation Metrics

The model has been evaluated against the challenge benchmarks:

  • Technical Reasoning: Advanced Science and Mathematics problem-solving.
  • Domain-Specific Expertise: Medical knowledge synthesis.
  • Linguistic Creativity: Narrative generation in English and Indian languages.
  • Analytical Logic: Complex logical reasoning and deductive tasks.
  • Energy Monitoring: Power consumption was tracked using the NVIDIA Management Library (NVML) for GPU draw and TDP-relative estimation for CPU load.

Usage

To serve this model for evaluation, use the following command:

vllm serve --config vllm_config.yaml
Downloads last month
57
Safetensors
Model size
6B params
Tensor type
I64
I32
F16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for Girikannan/sarvam-30b-compressed-model

Finetuned
(10)
this model