Model Card for Abhijith71/Abhijith

Model Summary

This model is a Parameter-Efficient Fine-Tuning (PEFT) adapter for deepseek-ai/deepseek-coder-6.7b-instruct. It is tuned for text generation, in particular code generation and instruction-based responses, and is intended to help produce code snippets, explanations, and other programming-related natural language output.

Model Details

  • Developed by: Abhijith71
  • Funded by: Self-funded
  • Shared by: Abhijith71
  • Model type: Causal Language Model (CLM)
  • Language(s): English
  • License: Apache 2.0
  • Fine-tuned from: deepseek-ai/deepseek-coder-6.7b-instruct

Uses

Direct Use

  • Code generation
  • Instruction-based responses
  • General text completion

Downstream Use

  • AI-assisted programming
  • Documentation generation
  • Code debugging assistance

Out-of-Scope Use

  • Not optimized for open-ended conversation beyond code-related tasks
  • Not suitable for real-time chatbot applications without further tuning

Bias, Risks, and Limitations

  • The model may generate biased or inaccurate responses, reflecting limitations and biases in its training data.
  • Outputs should be verified before use in production.
  • Knowledge is limited to the scope of the training data (no real-time updating capability).

Recommendations

  • Users should review generated code for accuracy and security.
  • Additional fine-tuning may be required for specialized use cases.

How to Use

Loading the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_name = "Abhijith71/Abhijith"                      # this PEFT adapter
base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"   # base model the adapter extends

# Load the tokenizer and base model, then attach the adapter weights.
# For GPU inference, consider torch_dtype=torch.bfloat16 and device_map="auto"
# (the latter requires accelerate).
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_name)

inputs = tokenizer("Generate a Python function for factorial", return_tensors="pt")
# max_new_tokens avoids truncating the answer at the default generation length.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
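
Instruction-Style Prompting

The base model ships with a chat template for instruction-style prompts (supported in Transformers 4.34+). Whether this adapter was trained on that exact format is not stated above, so treat the following as one reasonable way to prompt it rather than the required format.

messages = [{"role": "user", "content": "Write a Python function that computes the factorial of n."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Merging the Adapter (Optional)

If the adapter is a LoRA-style adapter (assumed here), PEFT can fold its weights into the base model so deployment does not require a runtime PEFT dependency. The output directory below is only an example.

merged = model.merge_and_unload()  # returns a plain Transformers model with the adapter folded in
merged.save_pretrained("deepseek-coder-6.7b-abhijith-merged")
tokenizer.save_pretrained("deepseek-coder-6.7b-abhijith-merged")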

Training Details

Training Data

  • Fine-tuned on a dataset containing programming-related instructions and code snippets.

Training Procedure

  • Preprocessing: Tokenization and formatting of instruction-based prompts.
  • Training Regime: Mixed-precision training (bf16) using PEFT for efficient fine-tuning (see the configuration sketch below).
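
The exact fine-tuning configuration is not published with this card. The sketch below illustrates a representative PEFT setup consistent with the bullets above; LoRA is assumed as the PEFT method, and the dataset file, column name, LoRA rank/alpha, target modules, and optimizer hyperparameters are illustrative placeholders rather than the values actually used.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works during collation
model = AutoModelForCausalLM.from_pretrained(base_model)

# Illustrative LoRA settings; the rank, alpha, and target modules actually used
# for this adapter are not documented in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# "instructions.jsonl" and the "text" column are placeholders for the
# instruction/code corpus described under Training Data.
dataset = load_dataset("json", data_files="instructions.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="outputs",
        bf16=True,  # mixed-precision regime noted above
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("outputs/adapter")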

Evaluation

Testing Data, Factors & Metrics

  • Testing Data: Sampled from programming-related sources
  • Metrics:
    • Perplexity (PPL); a computation sketch follows this list
    • Code quality assessment
    • Instruction-following accuracy
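
Perplexity here is the exponential of the mean token-level cross-entropy on held-out text. A minimal sketch, where eval_texts stands in for the held-out programming samples (not distributed with this card), and model and tokenizer come from the loading example above:

import math
import torch

def perplexity(model, tokenizer, eval_texts, device="cpu"):
    # Unweighted average of per-sequence losses; a token-weighted mean is
    # slightly more exact, but this is sufficient for a quick check.
    model.eval()
    losses = []
    for text in eval_texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

print(perplexity(model, tokenizer, ["def add(a, b):\n    return a + b"]))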

Results

  • The model generates coherent and useful code snippets across a range of prompts.
  • Limitations remain in edge cases and complex multi-step reasoning.

Environmental Impact

  • Hardware: NVIDIA A100 GPUs
  • Training Duration: Approx. 10-20 hours
  • Cloud Provider: AWS
  • Carbon Emissions: Estimated using the Machine Learning Impact calculator (figure not reported)

Technical Specifications

Model Architecture

  • Based on deepseek-ai/deepseek-coder-6.7b-instruct
  • Uses PEFT for fine-tuning; the published adapter configuration can be inspected as shown below
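
The adapter's published configuration (base model reference, PEFT method, and method-specific settings) can be read directly from the repository's adapter config; a short sketch:

from peft import PeftConfig

config = PeftConfig.from_pretrained("Abhijith71/Abhijith")
print(config.base_model_name_or_path)  # should point at the DeepSeek Coder base model
print(config.peft_type)                # which PEFT method the adapter uses
print(config)                          # full adapter settings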

Compute Infrastructure

  • Hardware: 4x A100 GPUs
  • Software: PEFT 0.14.0, Transformers 4.34+

Citation

@misc{abhijith2024,
  author = {Abhijith71},
  title = {Fine-tuned DeepSeek Coder Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  url = {https://huggingface.co/Abhijith71/Abhijith}
}

Framework Versions

  • PEFT: 0.14.0
  • Transformers: 4.34+
  • Torch: 2.0+

This model card provides an overview of the fine-tuned model, detailing its purpose, usage, and limitations. Users should review generated outputs and consider fine-tuning for specific applications.
