Instructions to use dz237/AwareAILabs-v0.11-3-8B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use dz237/AwareAILabs-v0.11-3-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dz237/AwareAILabs-v0.11-3-8B", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dz237/AwareAILabs-v0.11-3-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("dz237/AwareAILabs-v0.11-3-8B", trust_remote_code=True)


vLLM

How to use dz237/AwareAILabs-v0.11-3-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dz237/AwareAILabs-v0.11-3-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dz237/AwareAILabs-v0.11-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
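
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical and not part of vLLM. It assumes the server follows the OpenAI completions response shape, where the generated text sits at `choices[0].text`.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Return the generated text of the first choice in a completions response."""
    return response_body["choices"][0]["text"]


def complete(base_url, model, prompt, timeout=60):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

With the server running, `complete("http://localhost:8000", "dz237/AwareAILabs-v0.11-3-8B", "Once upon a time,")` mirrors the curl call above. Changing the base URL to `http://localhost:30000` should target an SGLang server started on port 30000 in the same way.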

Use Docker

docker model run hf.co/dz237/AwareAILabs-v0.11-3-8B

SGLang

How to use dz237/AwareAILabs-v0.11-3-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dz237/AwareAILabs-v0.11-3-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dz237/AwareAILabs-v0.11-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dz237/AwareAILabs-v0.11-3-8B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use dz237/AwareAILabs-v0.11-3-8B with Docker Model Runner:
```
docker model run hf.co/dz237/AwareAILabs-v0.11-3-8B
```

Model Card for dz237/AwareAILabs-v0.11-3-8B

This model is a fine-tuned variant of Meta-Llama-3-8B, adapted using LoRA (Low-Rank Adaptation) techniques. It is designed for text generation tasks and can serve as a backbone for conversational AI, creative writing, or other NLP applications.

Model Details

Model Description

This model is built on Meta-Llama-3-8B and refined with LoRA adapter weights. It loads through the Transformers library for text generation. LoRA trains a small set of low-rank adapter matrices instead of updating every base-model weight, which lowers the cost of fine-tuning. The approach is useful when resource constraints rule out full fine-tuning of a large language model.
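
To give a rough sense of why LoRA is cheaper to train than full fine-tuning, the sketch below compares trainable-parameter counts for a single weight matrix. The matrix shape and the rank are illustrative assumptions; this card does not state the rank or target modules used for this model, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates all d_out * d_in entries of W. LoRA freezes W and
    trains two factors, B (d_out x rank) and A (rank x d_in), whose product is
    added to W.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative values only: a 4096 x 4096 projection with a hypothetical rank of 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumed values the adapter trains well under 1% of the parameters in that matrix. The saving applies to training; inference cost still depends on the full base model.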

Developed by: dz237
Funded by [optional]: N/A
Shared by [optional]: dz237 / AwareAILabs Community
Model type: Llama-based Causal Language Model with LoRA fine-tuning
Language(s) (NLP): Primarily English (additional languages may be supported depending on fine-tuning data)
License: [More Information Needed]
Finetuned from model [optional]: meta-llama/Meta-Llama-3-8B

Model Sources [optional]

Repository: dz237/AwareAILabs-v0.11-3-8B
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Uses

Direct Use

This model can be used directly to generate text for chatbots, story generation, and other creative language tasks. It suits developers who want a Llama-3-8B variant that has already been adapted with LoRA and loads with standard Transformers tooling.

Downstream Use [optional]

The model’s architecture allows it to be further fine-tuned for specific tasks such as summarization, translation, or question-answering. Developers can integrate it into larger systems or tailor it to domain-specific applications.

Out-of-Scope Use

Critical decision making: Due to potential biases and the possibility of generating inaccurate or misleading content, this model should not be used in high-stakes applications without human oversight.
Sensitive content generation: The model has not been extensively tested for generating content in sensitive domains and may produce inappropriate or biased outputs.

Bias, Risks, and Limitations

The model may inherit biases present in the training data or the base model.
Outputs are generated from statistical patterns and may occasionally be incorrect or nonsensical.
Use in critical applications should be approached with caution and appropriate safeguards.

Recommendations

Users should:

Evaluate outputs carefully in sensitive or high-stakes applications.
Consider additional fine-tuning or bias-mitigation strategies before deployment in production environments.
Provide clear usage guidelines and monitor the model's outputs regularly.

How to Get Started with the Model

Install the Transformers library and load the model as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("dz237/AwareAILabs-v0.11-3-8B")
model = AutoModelForCausalLM.from_pretrained("dz237/AwareAILabs-v0.11-3-8B")

# Example usage: tokenize the prompt (input_ids plus attention_mask)
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 50 new tokens beyond the prompt, then decode to text
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
