from unsloth import FastLanguageModel
import torch


class EndpointHandler:
    def __init__(self, path=""):
        # Load the model and tokenizer from the repository path, quantized to 4-bit.
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=path,
            max_seq_length=2048,
            dtype=torch.float16,
            load_in_4bit=True,
        )
        # Switch the model to Unsloth's inference mode.
        FastLanguageModel.for_inference(self.model)

    def __call__(self, data: dict):
        inputs_text = data.pop("inputs", "")
        parameters = data.pop("parameters", {})

        # Format the request with the LLaMA 3 chat template.
        messages = [{"role": "user", "content": inputs_text}]
        formatted = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )

        inputs = self.tokenizer(
            formatted,
            return_tensors="pt",
        ).to("cuda")

        outputs = self.model.generate(
            **inputs,
            max_new_tokens=parameters.get("max_new_tokens", 512),
            temperature=parameters.get("temperature", 0.7),
            do_sample=True,
            pad_token_id=self.tokenizer.eos_token_id,
        )

        # Return only the response, without the prompt: generate() returns the
        # prompt tokens followed by the new tokens, so drop the prompt first.
        prompt_length = inputs["input_ids"].shape[1]
        decoded = self.tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        )
        return {"generated_text": decoded}