Uploaded finetuned model
- Developed by: Techno-1
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3.5-2B
Training was completed on a free T4 GPU Google Colab instance, using the Unsloth Studio notebook template linked here: https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb
Chat Template:
{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '### Instruction:\n' + message['content'] + '\n\n' }}
{% elif message['role'] == 'assistant' %}
{{ '### Response:\n' + message['content'] + eos_token + '\n' }}
{% endif %}
{% endfor %}
{% if add_generation_prompt %}
{{ '### Response:\n' }}
{% endif %}
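To make the template's behaviour easier to inspect, here is a minimal Python sketch that mirrors it by hand. It is not the code Unsloth or Transformers runs: `render_prompt` is a hypothetical helper, the `<EOS>` default is a placeholder rather than the model's real end token, and the whitespace Jinja would emit between tags is ignored. One thing the mirror makes visible is that the template, as written, has no branch for a system role, so a system message is skipped.

```python
def render_prompt(messages, add_generation_prompt=True, eos_token="<EOS>"):
    """Hand-written mirror of the chat template (hypothetical helper).

    eos_token is a placeholder; the real value comes from the tokenizer.
    """
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append("### Instruction:\n" + message["content"] + "\n\n")
        elif message["role"] == "assistant":
            parts.append("### Response:\n" + message["content"] + eos_token + "\n")
        # Any other role (for example "system") has no branch and is skipped.
    if add_generation_prompt:
        parts.append("### Response:\n")
    return "".join(parts)


example = render_prompt(
    [
        {"role": "system", "content": "You are an expert systems programmer."},
        {"role": "user", "content": "Optimize the vector addition function."},
    ]
)
print(example)
```

Printing `example` shows only the instruction block followed by the open `### Response:` header.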
System Prompt:
You are an expert systems programmer and compiler engineer. Your goal is to translate C code into high-performance, hardware-specific x86-64 AVX2 assembly. You value register efficiency, branchless execution, and correct usage of SIMD instructions like VMAXPS and VADDPS.
Prompt:
### Instruction:
Optimize the vector addition function using AVX2. Assume n is a multiple of 8.
### Input:
void vec_add(float* a, float* b, float* c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
### Response:
Max tokens:
256
The rest of the settings were left at their defaults.
Analysis:
This question was a control test to see how the fine-tuning affected the model in general. The code is logically very similar to an example in the training data, though the syntax and layout are slightly different.
The fine-tuned model output code that matched the kind the training data showed as the 'correct' response. The base model, by comparison, had confident hallucinations in its response, saying that VMAXPS (maximum) is preferred over VADDPS (addition) for performance. According to Gemini Flash Lite Extended, this is incorrect because MAX and ADD are different mathematical operations.
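The difference between the two operations can be shown with a small lane-by-lane model in Python. This is only an illustration: plain Python lists stand in for the eight float lanes of a 256-bit register, the helper names are hypothetical, and NaN handling (where the real instructions have their own rules) is ignored.

```python
def lanes_add(a, b):
    """Lane-wise addition, the arithmetic a packed add such as VADDPS performs."""
    return [x + y for x, y in zip(a, b)]


def lanes_max(a, b):
    """Lane-wise maximum, the arithmetic a packed max such as VMAXPS performs."""
    return [max(x, y) for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

added = lanes_add(a, b)  # every lane is 9.0
maxed = lanes_max(a, b)  # [8.0, 7.0, 6.0, 5.0, 5.0, 6.0, 7.0, 8.0]

print(added)
print(maxed)
```

Because the two results differ on ordinary inputs, one instruction cannot be substituted for the other as a performance choice; doing so changes what the function computes.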
An issue both models had was repeating themselves. I'm not certain whether they weren't outputting end tokens at all and were rambling, or whether the chat template code just wasn't stopping them properly after parsing it.
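Whichever cause applies, one possible workaround is to cut the generated text at the first marker that signals a new turn. The sketch below is an assumption-laden illustration, not part of the original experiment: `truncate_at_stop` and `STOPS` are hypothetical names, `<EOS>` is a placeholder for the real end token, and the sample string is made up.

```python
def truncate_at_stop(text, stop_sequences):
    """Return text cut at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Placeholder end token plus the two section headers used by the chat template.
STOPS = ["<EOS>", "### Instruction:", "### Response:"]

raw = "first answer\n### Instruction:\nsecond round of rambling"
clean = truncate_at_stop(raw, STOPS)
emitted_eos = "<EOS>" in raw  # False here: the sample never produced an end token

print(repr(clean))
print(emitted_eos)
```

Checking whether the end token ever appears in the raw output, as `emitted_eos` does, is also a cheap way to tell the two suspected causes apart.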
This indicates that fine-tuning can improve the model's ability to correctly apply looked-up optimisations given a prompt in C.
However, this result is well within the scope of the data seen during fine-tuning, and so lies quite firmly on the training data manifold.
Alternatively, instead of taking C code and outputting optimised assembly directly, the model could be trained on C code paired with its already compiled assembly, with a further optimised version of that assembly as the target. This would take advantage of the LLM's context of the original C logic, and of what will or won't be run together, to optimise further than the compiler through contextual understanding of the code. The more advanced optimisations, such as the contextual ones, and the synthetic data generation as a whole would be done by a more advanced model. If this could be done reliably and scaled up to look up a huge number of optimisation patterns that leverage the LLM's contextual understanding, then even pattern-matching optimisations applied to existing C codebases could be useful, as some optimisations simply don't occur because compilers are sometimes too cautious and less experienced human programmers are unsure of how to override them.
The models were given this:
Prompt:
### Instruction:
Optimize the vector subtraction and multiplication function using AVX2. Assume n is a multiple of 8.
### Input:
void vec_sub_mul(float* a, float* b, float* c, float* d, int n) {
    for (int i = 0; i < n; i++) {
        d[i] = (a[i] - b[i]) * c[i];
    }
}
The rest of the configuration remained the same.
This prompt was out of distribution, but answering it should only have required combining ideas from two pieces of code that were firmly within the distribution.
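As a reference for what a correct answer has to compute, the following Python sketch processes the arrays eight elements at a time, the way a 256-bit AVX2 loop over 32-bit floats would. It is a semantic model only, assuming a packed subtract followed by a packed multiply per chunk (for example VSUBPS then VMULPS, which is my assumption rather than something stated in the training data). Python floats are doubles, so results can differ from float32 in the last bits.

```python
LANES = 8  # a 256-bit register holds eight 32-bit floats


def vec_sub_mul(a, b, c, n):
    """Reference for d[i] = (a[i] - b[i]) * c[i], eight lanes per step."""
    assert n % LANES == 0, "the prompt assumes n is a multiple of 8"
    d = [0.0] * n
    for i in range(0, n, LANES):
        # Step 1: lane-wise subtract, the in-distribution "subtraction" pattern.
        diff = [x - y for x, y in zip(a[i:i + LANES], b[i:i + LANES])]
        # Step 2: lane-wise multiply, the in-distribution "multiplication" pattern.
        d[i:i + LANES] = [x * y for x, y in zip(diff, c[i:i + LANES])]
    return d


a = [float(i) for i in range(16)]
b = [1.0] * 16
c = [2.0] * 16
print(vec_sub_mul(a, b, c, 16))
```

A reference like this could be used to check generated assembly against known inputs once it has been assembled and run.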
Unfortunately, in these results both the original and fine-tuned models hallucinated, indicating that although the models have learnt some of the general syntax of assembly, they haven't yet learnt the underlying logic and valid patterns.
This is somewhat to be expected for non-thinking models with only 2 billion parameters that are fine-tuned only with LoRA adapters. While getting new logic from them with LoRA in the setup used may not be impossible, it would be quite difficult. This is because, without extended thinking, these models gain very few new layers relative to their pretrained bulk through which to pass and process information. The number of logical steps they can complete is therefore severely limited: it maxes out at whatever the new adapter layers can add beyond the logic the original model was already able to perform before its weights were frozen. In future, an experiment with longer-thinking models, generally larger models, deeper tuning, or LoRA applied to more layers may improve the logic and generalisation performance of this sort of fine-tuning.
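One way to put a number on how small the adapters are relative to the frozen weights is the standard LoRA parameter count: a rank-r adapter on a weight matrix of shape d_out x d_in adds r * (d_in + d_out) trainable parameters. The sketch below uses hypothetical dimensions and rank chosen only for illustration; they are not the values used for this model.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the frozen weight matrix itself."""
    return d_in * d_out


# Hypothetical sizes for illustration only, not this model's real configuration.
D, R = 2048, 16

added = lora_params(D, D, R)   # 65536
frozen = full_params(D, D)     # 4194304
fraction = added / frozen      # 0.015625, i.e. about 1.6% of that matrix

print(added, frozen, fraction)
```

Under these assumed sizes the adapter trains roughly one parameter for every sixty-four frozen ones in the matrix it attaches to, which is consistent with the limited capacity for new logic described above.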
As it is, though, reinforcement learning on the current model configuration would probably be quite difficult due to the lack of basic logical generalisation in the problem domain.
A useful model in this configuration could still potentially be tuned to be a more categorical identifier of inefficiencies and optimisations, instead of writing the whole thing itself. It could be fine-tuned to memorise patterns of inefficiency in either binary or C code, which after detection could be fixed by a human or a smarter model.
Overall, though the experiment wasn't a full success, useful information was gained and possible future directions were identified.
This qwen3_5 model was trained 2x faster with Unsloth and Huggingface's TRL library.


