Emin Temiz PRO

etemiz

AI & ML interests

Alignment

Recent Activity

Organizations

None yet

etemiz's activity

upvoted an article 2 days ago
view article
Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

β€’ 116
posted an update 7 days ago
reacted to danielhanchen's post with ❀️ 7 days ago
published an article 9 days ago
replied to their post 10 days ago
reacted to luigi12345's post with πŸ‘ 11 days ago
view post
Post
3411
🧠 PROMPT FOR CONVERTING ANY MODEL IN REASONING "THINKING" MODELπŸ”₯πŸ€–
Convert any model to Deepseek R1 like "thinking" model. πŸ’­

You're now a thinking-first LLM. For all inputs:

1. Start with <thinking>
   - Break down problems step-by-step
   - Consider multiple approaches
   - Calculate carefully
   - Identify errors
   - Evaluate critically
   - Explore edge cases
   - Check knowledge accuracy
   - Cite sources when possible

2. End with </thinking>

3. Then respond clearly based on your thinking.

The <thinking> section is invisible to users and helps you produce better answers.

For math: show all work and verify
For coding: reason through logic and test edge cases
For facts: verify information and consider reliability
For creative tasks: explore options before deciding
For analysis: examine multiple interpretations

Example:
<thinking>
[Step-by-step analysis]
[Multiple perspectives]
[Self-critique]
[Final conclusion]
</thinking>

[Clear, concise response to user]

  • 3 replies
Β·
posted an update 13 days ago
posted an update 14 days ago
view post
Post
483
Mistral Small 3.1 numbers are in. It is interesting Mistral always lands in the middle.
https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A1

I started to do the comparison with 2 models now. In the past Llama 3.1 70B Q4 was the one doing the comparison of answers. Now I am using Gemma 3 27B Q8 as well to have a second opinion on it. Gemma 3 produces very similar measurement to Llama 3.1. So the end result is not going to shake much.
  • 1 reply
Β·
replied to their post 18 days ago
view reply

Looks like we need more mature tools for Gemma 3, it is failing to fine tune like half of the time. Unsloth and transformers are getting ready. And I am trying lower learning rates and rank stabilized LoRa, and different r, lora_alpha.

reacted to their post with πŸš€ 18 days ago
view post
Post
1695
Started fine tuning Gemma 3 using evolutionary approach. It is not the worst model according to AHA leaderboard and it is one of the smart according to lmarena.ai. My objective is to make it based, anti woke, wise, beneficial and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and using QLoRA and the successful ones are merged later. Compared to LoRa this allows faster training and also reduced overfitting because the merge operation heals overfitting. The problem with this could be the 4 bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRa and merge before?

I also automated the dataset selection and benchmarking and converging to objectives (the fit function, the reward). It is basically trying to get higher score in AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
  • 3 replies
Β·
posted an update 19 days ago
view post
Post
1695
Started fine tuning Gemma 3 using evolutionary approach. It is not the worst model according to AHA leaderboard and it is one of the smart according to lmarena.ai. My objective is to make it based, anti woke, wise, beneficial and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and using QLoRA and the successful ones are merged later. Compared to LoRa this allows faster training and also reduced overfitting because the merge operation heals overfitting. The problem with this could be the 4 bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRa and merge before?

I also automated the dataset selection and benchmarking and converging to objectives (the fit function, the reward). It is basically trying to get higher score in AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
  • 3 replies
Β·
posted an update 21 days ago
upvoted an article 23 days ago
published an article 23 days ago
posted an update 26 days ago
view post
Post
1324
Benchmarked Gemma 3 today. It has better knowledge compared to 2 but still in the median area in the leaderboard.
  • 1 reply
Β·
posted an update about 1 month ago
view post
Post
1689
Benchmarked QwQ for the AHA Leaderboard. Compared to Qwen 2.5 knows nutrition and fasting better but lacks faith.

  • 1 reply
Β·
posted an update about 1 month ago
published an article about 1 month ago