---
language: en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- ruslanmv
- llama
- trl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- ruslanmv/ai-medical-chatbot
model-index:
- name: llama3-8B-medical
  results: []
widget:
- example_title: llama3-8B-medical
  messages:
  - role: system
    content: >-
      You are an AI Medical Chatbot Assistant, providing comprehensive and informative responses to your inquiries.
      If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrect.
  - role: user
    content: I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?
  output:
    text: >-
      Yes, it is possible. Hypothyroidism can present symptoms like increased sensitivity to cold, dry skin, and fatigue. These symptoms are characteristic of hypothyroidism.
      I recommend consulting with a healthcare provider.
---
# Medical-Llama3-8B-4bit: Fine-Tuned Llama3 for Medical Q&A

A medical fine-tuned version of Llama-3-8B, quantized to 4 bits. It was trained on common open-source datasets and shows improvements on multilingual tasks. A standard 4-bit quantization technique was applied after fine-tuning, which reduces both the compute time and the memory required to run the model. The architecture is entirely Llama-3 based.

This repository provides a fine-tuned version of the Llama3 8B model, specifically designed to answer medical questions in an informative way. It draws on the AI Medical Chatbot dataset ([ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot)).
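
As a rough illustration of the memory saving that 4-bit quantization brings, the sketch below estimates weight storage at different precisions. It assumes a nominal 8 billion parameters and counts only the weights; quantization constants, activations, and the KV cache are ignored, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical and not part of any library.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


NUM_PARAMS = 8e9  # assumption: nominal parameter count of an "8B" model

# Compare full precision, half precision, and 4-bit weights
for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):5.1f} GiB")
```

Under these assumptions, 4-bit weights occupy about a quarter of the bfloat16 footprint, roughly 3.7 GiB versus 14.9 GiB.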

**Model & Development**

- **Developed by:** ruslanmv
- **License:** Apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

**Key Features**

- **Medical Focus:** Optimized to address health-related inquiries.
- **Knowledge Base:** Trained on a comprehensive medical chatbot dataset.
- **Text Generation:** Generates informative and potentially helpful responses.

**Installation**

This model is accessible through the Hugging Face Transformers library. Install it, together with Accelerate and bitsandbytes, using pip:

```bash
pip install git+https://github.com/huggingface/accelerate.git
pip install git+https://github.com/huggingface/transformers.git
pip install bitsandbytes
```
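
After installing, it can help to confirm that the packages are visible to the interpreter before loading an 8B model. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and `torch` is included on the assumption that it is installed separately, since the usage example imports it.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return versions


# Packages used by the usage example (torch is an assumption, see above)
required = ["transformers", "accelerate", "bitsandbytes", "torch"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```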

**Usage Example**

Here's a Python code snippet demonstrating how to interact with the `llama3-8B-medical` model and generate answers to your medical questions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load tokenizer and model
model_id = "ruslanmv/llama3-8B-medical"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 4-bit settings go in `quantization_config` (not `config`);
# device_map="auto" places the weights on the available GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)


def create_prompt(user_query):
    B_INST, E_INST = "<s>[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    DEFAULT_SYSTEM_PROMPT = """\
You are an AI Medical Chatbot Assistant, provide comprehensive and informative responses to your inquiries.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
    SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
    instruction = f"User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()


def generate_text(model, tokenizer, user_query,
                  max_length=200,
                  temperature=0.8,
                  num_return_sequences=1):
    prompt = create_prompt(user_query)
    # Tokenize the prompt and move the tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Generate text (note: max_length counts the prompt tokens as well)
    output = model.generate(
        **inputs,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # Set pad token to end of sequence token
        do_sample=True
    )
    # Decode only the newly generated tokens, i.e. everything after the prompt
    prompt_length = inputs["input_ids"].shape[1]
    generated_text = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
    return generated_text.strip()


# Example usage
# - Context: First describe your problem.
# - Question: Then ask the question.
user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"
generated_text = generate_text(model, tokenizer, user_query)
print(generated_text)
```

A typical answer looks like this:

```
Yes, it is possible. Hypothyroidism can present symptoms like increased sensitivity to cold, dry skin, and fatigue. These symptoms are characteristic of hypothyroidism. I recommend consulting with a healthcare provider. 2. Hypothyroidism can present symptoms like fever, increased sensitivity to cold, dry skin, and fatigue. These symptoms are characteristic of hypothyroidism.
```
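
To see what the model actually receives, the prompt template from the usage example can be exercised on its own, without loading any weights. The sketch below mirrors the `[INST]`/`<<SYS>>` layout of `create_prompt`; the name `build_prompt` and the shortened system prompt are illustrative assumptions, not part of the repository.

```python
# Shortened system prompt, for illustration only
SYSTEM_PROMPT = "You are an AI Medical Chatbot Assistant."


def build_prompt(user_query: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble the [INST]/<<SYS>> prompt string used in the usage example."""
    prompt = (
        "<s>[INST]"
        + "<<SYS>>\n" + system_prompt + "\n<</SYS>>\n\n"
        + f"User asks: {user_query}\n"
        + "[/INST]"
    )
    return prompt.strip()


prompt = build_prompt("Could fatigue and dry skin indicate hypothyroidism?")
print(prompt)
```

Printing the prompt this way makes it easy to check that the system block, the user question, and the closing `[/INST]` marker appear in the expected order before spending time on generation.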

**Important Note**

This model is intended for informational purposes only and should not be used as a substitute for professional medical advice. Always consult with a qualified healthcare provider for any medical concerns.

**License**

This model is distributed under the Apache License 2.0 (see LICENSE file for details).

**Contributing**

We welcome contributions to this repository! If you have improvements or suggestions, feel free to create a pull request.

**Disclaimer**

While we strive to provide informative responses, the accuracy of the model's outputs cannot be guaranteed. It is crucial to consult a doctor or other healthcare professional for definitive medical advice.