Model Card for Abhijith71/Abhijith

Model Summary

This model is a Parameter-Efficient Fine-Tuning (PEFT) adapter for deepseek-ai/deepseek-coder-6.7b-instruct. It is tuned for text generation, in particular code generation and instruction-based responses, and is intended to help produce code snippets, explanations, and other programming-related natural language output.

Model Details

  • Developed by: Abhijith71
  • Funded by: Self-funded
  • Shared by: Abhijith71
  • Model type: Causal Language Model (CLM)
  • Language(s): English
  • License: Apache 2.0
  • Fine-tuned from: deepseek-ai/deepseek-coder-6.7b-instruct

Uses

Direct Use

  • Code generation
  • Instruction-based responses
  • General text completion

Downstream Use

  • AI-assisted programming
  • Documentation generation
  • Code debugging assistance

Out-of-Scope Use

  • Not optimized for open-ended conversation beyond code-related tasks
  • Not suitable for real-time chatbot applications without further tuning

Bias, Risks, and Limitations

  • The model may generate biased or inaccurate responses, reflecting limitations and biases in its training data.
  • Outputs should be verified before use in production.
  • Knowledge is limited to the scope of the training data (no real-time updating capability).

Recommendations

  • Users should review generated code for accuracy and security.
  • Additional fine-tuning may be required for specialized use cases.

How to Use

Loading the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_name = "Abhijith71/Abhijith"                      # this PEFT adapter
base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"   # base model the adapter extends

# Load the tokenizer and base model, then attach the adapter weights.
# For GPU inference, consider torch_dtype=torch.bfloat16 and device_map="auto"
# (the latter requires accelerate).
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_name)

inputs = tokenizer("Generate a Python function for factorial", return_tensors="pt")
# max_new_tokens avoids truncating the answer at the default generation length.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
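
Instruction-Style Prompting

The base model ships with a chat template for instruction-style prompts (supported in Transformers 4.34+). Whether this adapter was trained on that exact format is not stated above, so treat the following as one reasonable way to prompt it rather than the required format.

messages = [{"role": "user", "content": "Write a Python function that computes the factorial of n."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Merging the Adapter (Optional)

If the adapter is a LoRA-style adapter (assumed here), PEFT can fold its weights into the base model so deployment does not require a runtime PEFT dependency. The output directory below is only an example.

merged = model.merge_and_unload()  # returns a plain Transformers model with the adapter folded in
merged.save_pretrained("deepseek-coder-6.7b-abhijith-merged")
tokenizer.save_pretrained("deepseek-coder-6.7b-abhijith-merged")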

Training Details

Training Data

  • Fine-tuned on a dataset containing programming-related instructions and code snippets.

Training Procedure

  • Preprocessing: Tokenization and formatting of instruction-based prompts.
  • Training Regime: Mixed-precision training (bf16) using PEFT for efficient fine-tuning (see the configuration sketch below).
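
The exact fine-tuning configuration is not published with this card. The sketch below illustrates a representative PEFT setup consistent with the bullets above; LoRA is assumed as the PEFT method, and the dataset file, column name, LoRA rank/alpha, target modules, and optimizer hyperparameters are illustrative placeholders rather than the values actually used.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works during collation
model = AutoModelForCausalLM.from_pretrained(base_model)

# Illustrative LoRA settings; the rank, alpha, and target modules actually used
# for this adapter are not documented in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# "instructions.jsonl" and the "text" column are placeholders for the
# instruction/code corpus described under Training Data.
dataset = load_dataset("json", data_files="instructions.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="outputs",
        bf16=True,  # mixed-precision regime noted above
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("outputs/adapter")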

Evaluation

Testing Data, Factors & Metrics

  • Testing Data: Sampled from programming-related sources
  • Metrics:
    • Perplexity (PPL); a computation sketch follows this list
    • Code quality assessment
    • Instruction-following accuracy
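
Perplexity here is the exponential of the mean token-level cross-entropy on held-out text. A minimal sketch, where eval_texts stands in for the held-out programming samples (not distributed with this card), and model and tokenizer come from the loading example above:

import math
import torch

def perplexity(model, tokenizer, eval_texts, device="cpu"):
    # Unweighted average of per-sequence losses; a token-weighted mean is
    # slightly more exact, but this is sufficient for a quick check.
    model.eval()
    losses = []
    for text in eval_texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

print(perplexity(model, tokenizer, ["def add(a, b):\n    return a + b"]))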

Results

  • The model generates coherent and useful code snippets across a range of prompts.
  • Limitations remain in edge cases and complex multi-step reasoning.

Environmental Impact

  • Hardware: NVIDIA A100 GPUs
  • Training Duration: Approx. 10-20 hours
  • Cloud Provider: AWS
  • Carbon Emissions: Estimated using the Machine Learning Impact calculator (figure not reported)

Technical Specifications

Model Architecture

  • Based on deepseek-ai/deepseek-coder-6.7b-instruct
  • Uses PEFT for fine-tuning; the published adapter configuration can be inspected as shown below
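
The adapter's published configuration (base model reference, PEFT method, and method-specific settings) can be read directly from the repository's adapter config; a short sketch:

from peft import PeftConfig

config = PeftConfig.from_pretrained("Abhijith71/Abhijith")
print(config.base_model_name_or_path)  # should point at the DeepSeek Coder base model
print(config.peft_type)                # which PEFT method the adapter uses
print(config)                          # full adapter settings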

Compute Infrastructure

  • Hardware: 4x A100 GPUs
  • Software: PEFT 0.14.0, Transformers 4.34+

Citation

@misc{abhijith2024,
  author = {Abhijith71},
  title = {Fine-tuned DeepSeek Coder Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  url = {https://huggingface.co/Abhijith71/Abhijith}
}

Framework Versions

  • PEFT: 0.14.0
  • Transformers: 4.34+
  • Torch: 2.0+

This model card provides an overview of the fine-tuned model, detailing its purpose, usage, and limitations. Users should review generated outputs and consider fine-tuning for specific applications.
