Instructions for using Techno-1/C2OptimisedAssembly with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Techno-1/C2OptimisedAssembly with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Techno-1/C2OptimisedAssembly")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Techno-1/C2OptimisedAssembly")
model = AutoModelForMultimodalLM.from_pretrained("Techno-1/C2OptimisedAssembly")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Techno-1/C2OptimisedAssembly with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Techno-1/C2OptimisedAssembly"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Techno-1/C2OptimisedAssembly",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Techno-1/C2OptimisedAssembly

SGLang

How to use Techno-1/C2OptimisedAssembly with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Techno-1/C2OptimisedAssembly" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Techno-1/C2OptimisedAssembly",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Techno-1/C2OptimisedAssembly" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Techno-1/C2OptimisedAssembly",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Techno-1/C2OptimisedAssembly with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Techno-1/C2OptimisedAssembly to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Techno-1/C2OptimisedAssembly to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Techno-1/C2OptimisedAssembly to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell):
#   pip install unsloth
from unsloth import FastModel

# Load the fine-tuned model and its tokenizer
model, tokenizer = FastModel.from_pretrained(
    model_name="Techno-1/C2OptimisedAssembly",
    max_seq_length=2048,
)

Docker Model Runner
How to use Techno-1/C2OptimisedAssembly with Docker Model Runner:
```
docker model run hf.co/Techno-1/C2OptimisedAssembly
```

Uploaded finetuned model

Developed by: Techno-1
License: apache-2.0
Finetuned from model: unsloth/Qwen3.5-2B

Training was completed on a free T4 GPU Google Colab instance, using the Unsloth Studio Colab template: https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb

Chat Template:

{% for message in messages %}
    {% if message['role'] == 'user' %}
        {{ '### Instruction:\n' + message['content'] + '\n\n' }}
    {% elif message['role'] == 'assistant' %}
        {{ '### Response:\n' + message['content'] + eos_token + '\n' }}
    {% endif %}
{% endfor %}
{% if add_generation_prompt %}
    {{ '### Response:\n' }}
{% endif %}
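
To make the branches of this template easier to follow, here is a minimal plain-Python sketch of the same logic. It is an approximation, not the code the tokenizer runs: the function name `render_prompt` and the `<eos>` placeholder are assumptions for illustration, and the sketch ignores the extra whitespace Jinja emits between tags. It shows that only `user` and `assistant` messages produce output, so a message with any other role (such as `system`) contributes nothing to the prompt.

```python
def render_prompt(messages, eos_token="<eos>", add_generation_prompt=True):
    """Mirror the chat template's branches in plain Python (illustrative only)."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append("### Instruction:\n" + message["content"] + "\n\n")
        elif message["role"] == "assistant":
            parts.append("### Response:\n" + message["content"] + eos_token + "\n")
        # Any other role matches no branch in the template and is skipped.
    if add_generation_prompt:
        # Leave an open response header for the model to complete.
        parts.append("### Response:\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are an expert systems programmer."},
    {"role": "user", "content": "Optimize the vector addition function using AVX2."},
]
prompt = render_prompt(messages)
print(prompt)
```

The real `eos_token` comes from the tokenizer; `<eos>` here is only a stand-in.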

System Prompt:

You are an expert systems programmer and compiler engineer. Your goal is to translate C code into high-performance, hardware-specific x86-64 AVX2 assembly. You value register efficiency, branchless execution, and correct usage of SIMD instructions like VMAXPS and VADDPS.

Prompt:

\### Instruction:
Optimize the vector addition function using AVX2. Assume n is a multiple of 8.

\### Input:
void vec_add(float* a, float* b, float* c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

\### Response:

Max tokens:

The rest of the settings were left at their defaults.

Analysis:

This question was a control test to see how the fine-tuning affected the model in general. The code is logically very similar to an example in the training data, though the syntax and layout are slightly different.

The fine-tuned model output code of the kind the training data marked as the 'correct' response. The base model, by comparison, hallucinated confidently in its response, claiming that VMAXPS (maximum) is preferred over VADDPS (addition) for performance. According to Gemini Flash Lite Extended this is incorrect, because MAX and ADD are different mathematical operations.
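
The difference between the two instructions can be illustrated with a small lane-wise model in Python. This is a sketch of the arithmetic only, with hypothetical helper names `vaddps` and `vmaxps`; it uses Python floats rather than 32-bit floats and ignores NaN handling.

```python
def vaddps(x, y):
    """Lane-wise sum, modelling what VADDPS computes on packed floats."""
    return [xi + yi for xi, yi in zip(x, y)]


def vmaxps(x, y):
    """Lane-wise maximum, modelling VMAXPS for ordinary (non-NaN) inputs."""
    return [max(xi, yi) for xi, yi in zip(x, y)]


# Eight lanes, matching one 256-bit register of 32-bit floats.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

sums = vaddps(a, b)
maxima = vmaxps(a, b)
print(sums)
print(maxima)
```

The two results differ in every lane, so one instruction cannot stand in for the other in a vector addition, whatever their relative speed.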

Both models also repeated themselves. I am not certain whether they were failing to output end tokens and rambling, or whether the chat template code simply wasn't stopping them properly after parsing the output.
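
One way to work around the repetition, and to help tell the two causes apart, is to cut the decoded text at the first stop marker after generation. The sketch below is an assumption about how that could be done, not part of the original experiment: `truncate_at_stop` is a hypothetical helper and `<eos>` is a placeholder for the tokenizer's real end token.

```python
def truncate_at_stop(text, stop_markers):
    """Cut text at the earliest stop marker; also report whether one was found."""
    cut = len(text)
    for marker in stop_markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip(), cut < len(text)


# Hypothetical decoded output that runs on into a new instruction block.
raw = "vaddps ymm0, ymm0, ymm1\nret\n### Instruction:\nOptimize the next function."
cleaned, stopped = truncate_at_stop(raw, ["<eos>", "### Instruction:"])
print(cleaned)
print(stopped)
```

If the end token string appears in the raw decoded output, the model did emit it and generation was not halted. If only a new `### Instruction:` header appears, the model likely continued without ever producing an end token.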

This indicates that fine-tuning can improve the model's ability to correctly apply looked-up optimisations given a prompt in C.

However, this result is well within the scope of the data seen during fine-tuning, and so lies quite firmly on the training data manifold.

Instead of taking C code and outputting optimised assembly directly, the model could be trained on C code paired with its already compiled assembly, with a further optimised version of that assembly as the target. This would take advantage of the LLM's context of the original C logic, and of what will or won't be run together, allowing it to optimise further than the compiler through contextual understanding of the code. The more advanced optimisations, such as the contextual ones, and the synthetic data generation as a whole would be done by a more advanced model. If this could be done reliably and scaled up to look up a huge number of optimisation patterns that leverage the LLM's contextual understanding, then even pattern-matching optimisations onto existing C codebases could be useful, as some optimisations simply don't occur because compilers are sometimes too cautious and less experienced human programmers are unsure how to override them.
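
As a rough illustration of what one training record for this proposal might look like, the sketch below pairs the C source and compiler output as the input and uses the further optimised assembly as the target. Everything here is hypothetical: the class name, field names, instruction wording, and `<eos>` placeholder are assumptions, and the assembly strings are placeholders rather than real compiler output.

```python
from dataclasses import dataclass


@dataclass
class OptimisationExample:
    """One hypothetical training record for the proposed setup."""
    c_source: str        # original C function
    compiled_asm: str    # assembly produced by an ordinary compiler
    optimised_asm: str   # further optimised assembly used as the target

    def to_prompt(self):
        # Reuse the Instruction/Input/Response layout shown in this card.
        return (
            "### Instruction:\n"
            "Improve the compiled assembly using the context of the C source.\n\n"
            "### Input:\n"
            f"{self.c_source}\n\n{self.compiled_asm}\n\n"
            "### Response:\n"
        )

    def to_target(self, eos_token="<eos>"):
        # The end token marks where the response should stop.
        return self.optimised_asm + eos_token


example = OptimisationExample(
    c_source="void f(float* a, int n) { /* ... */ }",
    compiled_asm="; placeholder for compiler output",
    optimised_asm="; placeholder for optimised assembly",
)
print(example.to_prompt() + example.to_target())
```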

For a second test, the models were given this:

Prompt:

\### Instruction:
Optimize the vector subtraction and multiplication function using AVX2. Assume n is a multiple of 8.

\### Input:
void vec_sub_mul(float* a, float* b, float* c, float* d, int n) {
    for (int i = 0; i < n; i++) {
        d[i] = (a[i] - b[i]) * c[i];
    }
}

The rest of the configurations remained the same.

This prompt was out of distribution, but answering it should only have required combining ideas from two pieces of code that were firmly within the distribution.
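
The combination this prompt asks for can be modelled in Python as a subtraction step followed by a multiplication step over blocks of eight elements. This is a reference sketch for checking what a correct answer must compute, not the model's output and not real assembly; the function names are hypothetical, and Python floats stand in for 32-bit floats.

```python
LANES = 8  # one 256-bit AVX2 register holds eight 32-bit floats


def vec_sub_mul_scalar(a, b, c):
    """Element-by-element reference, matching the C loop in the prompt."""
    return [(ai - bi) * ci for ai, bi, ci in zip(a, b, c)]


def vec_sub_mul_blocked(a, b, c):
    """Same computation in blocks of eight, mirroring a packed subtract then multiply."""
    n = len(a)
    if n % LANES != 0:
        raise ValueError("n must be a multiple of 8")
    d = [0.0] * n
    for start in range(0, n, LANES):
        block = slice(start, start + LANES)
        diff = [x - y for x, y in zip(a[block], b[block])]   # models VSUBPS
        d[block] = [x * y for x, y in zip(diff, c[block])]   # models VMULPS
    return d


a = [float(i) for i in range(16)]
b = [1.0] * 16
c = [2.0] * 16
result = vec_sub_mul_blocked(a, b, c)
print(result == vec_sub_mul_scalar(a, b, c))
```

A generated routine can be judged against the scalar reference in the same way: any output that differs for ordinary inputs is wrong regardless of how plausible the assembly looks.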

Unfortunately, both the original and fine-tuned models hallucinated in these results, indicating that although the models have learnt some of the general syntax of assembly, they haven't yet learnt the underlying logic and valid patterns.

This is somewhat to be expected for non-thinking models with only 2 billion parameters that are fine-tuned only with LoRA adapters. Getting new logic from them with LoRA in the setup used may not be impossible, but it would be quite difficult. Without additional thinking, these models gain very little new capacity relative to their pretrained bulk to pass and process information through. The number of logical steps they can complete is therefore severely limited, maxing out at whatever the new adapter layers can add on top of the logic the original model could already perform before its weights were frozen. In future, an experiment with longer-thinking models, generally larger models, deeper tuning, or LoRA applied to more layers may improve the logic and generalisation performance of this sort of fine-tuning.
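
To give a sense of how small an adapter is next to the frozen weights, the sketch below counts parameters under the standard LoRA formulation, where a frozen `d_out x d_in` weight matrix gets a low-rank update built from a `d_out x rank` matrix and a `rank x d_in` matrix. The dimensions and rank are illustrative assumptions, not the actual configuration of this model or of the training run.

```python
def lora_added_params(d_out, d_in, rank):
    """Trainable parameters in the two low-rank factors of one adapted matrix."""
    return rank * (d_out + d_in)


def lora_fraction(d_out, d_in, rank):
    """Adapter parameters as a fraction of the frozen matrix they sit beside."""
    return lora_added_params(d_out, d_in, rank) / (d_out * d_in)


d_model = 2048  # hypothetical hidden size, chosen only for the arithmetic
rank = 16       # hypothetical LoRA rank

added = lora_added_params(d_model, d_model, rank)
fraction = lora_fraction(d_model, d_model, rank)
print(added, f"{fraction:.4%}")
```

Under these assumed numbers the adapter trains about 1.6% as many parameters as the single matrix it modifies, which is consistent with the point that the new capacity is small relative to the pretrained bulk.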

As it stands, though, reinforcement learning on the current model configuration would probably be quite difficult, due to the lack of basic logical generalisation in the problem domain.

A model in this configuration could still potentially be useful if tuned to be a more categorical identifier of inefficiencies and optimisations, instead of writing the whole output itself. It could be fine-tuned to memorise patterns of inefficiency in either binary or C code, which, after detection, could be fixed by a human or a smarter model.

Overall, though the experiment wasn't a full success, useful information was gained and possible future directions were identified.

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Downloads last month: 108

Safetensors
Model size: 2B params
Tensor type: F32, BF16

Model tree for Techno-1/C2OptimisedAssembly
Base model: Qwen/Qwen3.5-2B-Base
Finetuned: Qwen/Qwen3.5-2B
Finetuned: unsloth/Qwen3.5-2B
Finetuned (127): this model
