Emin Temiz PRO

etemiz

AI & ML interests

Alignment

Recent Activity

Organizations

None yet

etemiz's activity

posted an update about 2 hours ago
posted an update 5 days ago
Benchmarked Gemma 3 today. It has better knowledge compared to Gemma 2 but still sits around the median of the leaderboard.
posted an update 12 days ago
Benchmarked QwQ for the AHA Leaderboard. Compared to Qwen 2.5, it knows nutrition and fasting better but lacks faith.

posted an update 18 days ago
posted an update 23 days ago
https://www.youtube.com/watch?v=EMyAGuHnDHk

In the video above, some LLMs favored the atheist and some favored the believer. In the picture below, the atheist-favoring LLMs are on the left and the believer-favoring LLMs are on the right.

The ones on the left also rank lower on my leaderboard, and the ones on the right rank higher. My leaderboard:
https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

Coincidence? My leaderboard has more domains. Does ranking high in faith mean ranking high in healthy living, nutrition, bitcoin and nostr on average?
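
One way to probe that question would be a rank correlation between the faith column and another domain column. A minimal sketch, assuming the per-model scores are copied out of the spreadsheet by hand; the numbers below are made-up placeholders, not real leaderboard values:

```python
# Hypothetical check: do models that score high on "faith" also score high
# in another domain? Spearman rank correlation over made-up example scores.
from scipy.stats import spearmanr

faith     = [31, 12, -5, -20, 44]   # per-model faith scores (placeholders)
nutrition = [18,  9, -2, -11, 25]   # per-model nutrition scores (placeholders)

rho, p = spearmanr(faith, nutrition)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")  # rho near +1 would support the hunch
```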
reacted to clem's post with 👍 27 days ago
What are the best organizations to follow on @huggingface ?

Off the top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- XAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch

Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
posted an update 27 days ago
--- AHA Leaderboard ---

We all want AI to be properly aligned so it benefits humans with every answer it generates. While there is tremendous research around this and many people working on it, I am choosing another route: curation of people, and then curation of the datasets used in LLM training. Curating datasets from people who try to uplift humanity should result in LLMs that try to help humans.

This work has revolved around two tasks:

1. Making LLMs that benefit humans
2. Measuring misinformation in other LLMs

The idea behind the second task is that once we make and gather better LLMs and set them as the "ground truth", we can measure how far other LLMs distance themselves from those ground truths.
For that I am working on something I call the "AHA Leaderboard" (AHA stands for AI -- human alignment).

Link to the spreadsheet:

https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

The columns are ground truths. The rows are the mainstream LLMs. If a mainstream LLM produces answers similar to the ground truth LLM, it gets a higher score. The LLMs that rank higher on the leaderboard should be considered better aligned with humans. Simple idea. This amounts to analyzing LLMs across different domains, asking hundreds of questions and checking whether the answers match those of models that try to mimic humans who care about other humans. Will it be effective? What do you think?
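
To make the scoring concrete, here is a minimal sketch of one way such a score could be computed; the embedding-based similarity and the model name are assumptions on my part, not necessarily what the leaderboard actually uses:

```python
# Sketch: ask the same questions to a "ground truth" model and a mainstream
# candidate model, then score how close the candidate's answers are on average.
# Cosine similarity of sentence embeddings is used here as a stand-in metric.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def alignment_score(questions, ask_ground_truth, ask_candidate):
    scores = []
    for q in questions:
        gt_answer = ask_ground_truth(q)    # answer from the curated ground truth LLM
        cand_answer = ask_candidate(q)     # answer from the mainstream LLM
        emb = embedder.encode([gt_answer, cand_answer], convert_to_tensor=True)
        scores.append(util.cos_sim(emb[0], emb[1]).item())
    return sum(scores) / len(scores)       # higher = closer to the ground truth
```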

We want mainstream LLMs to copy the answers of the ground truth LLMs in certain domains. This may refocus AI towards being more beneficial. There are 5 content providers and 6 curators in the project as of now. Join us and be one of the pioneers that fix AI! You can be a curator, content provider, general researcher or something else.
posted an update about 1 month ago
posted an update about 1 month ago
Some things are simple
posted an update about 1 month ago
posted an update about 2 months ago
Having bad LLMs is OK and they can be utilized well. They can help us find ideas that work, faster.

A reinforcement algorithm could be: "take what a proper model says and negate what a bad LLM says". Or, in a mixture-of-agents situation, we could refute the bad LLM's output and combine it with the output of the good LLM.

This could mean having two wings (or more) in search of "ideas that work for most people most of the time".
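
As a rough illustration of the mixture-of-agents idea above, the "refute and combine" step could look something like this; `good_llm` and `bad_llm` are hypothetical chat callables and the prompt wording is only a sketch:

```python
# Sketch of "refute the bad LLM's output and combine with the good LLM's output".
def two_wing_answer(question, good_llm, bad_llm):
    bad_answer = bad_llm(question)     # possibly unreliable draft
    good_answer = good_llm(question)   # draft from the better-aligned model
    critique_prompt = (
        f"Question: {question}\n\n"
        f"Answer A (possibly unreliable):\n{bad_answer}\n\n"
        f"Answer B:\n{good_answer}\n\n"
        "Point out what is wrong or misleading in answer A, "
        "then write a final answer that keeps the valid parts of both."
    )
    return good_llm(critique_prompt)   # the good model produces the combined answer
```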
replied to their post about 2 months ago

That's a hard question! I think some humans are really creating content for other humans to live happily, healthily and abundantly. I am in favor of giving more weight to those kinds of carefully curated humans in the LLM. This can be as simple as pretraining again with their content. I have done that and it works.

Definitely not what the majority says! The majority is often really wrong on many subjects. The mediocrity of current AI systems might be because of this: the majority of content comes from mediocre IQ and EQ and *Q.

A curator council could choose the "beneficial" humans, and the content coming from them can be amplified in an LLM, ultimately giving more weight to the thoughts that will benefit many humans most of the time. Ideas that work in favor of humans in most cases: that is, I guess, my definition of human alignment.
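
For the "pretraining again with their content" step mentioned above, a minimal continued-pretraining sketch using the standard Hugging Face causal-LM recipe could look like the following; the base model, file name and hyperparameters are placeholders, since the posts do not specify them:

```python
# Continued pretraining on curated human content (placeholder paths and settings).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.1-8B"                 # assumed base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token                    # Llama tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# One plain-text file of curated content (hypothetical file name).
data = load_dataset("text", data_files={"train": "curated_humans.txt"})
data = data.map(lambda x: tok(x["text"], truncation=True, max_length=2048),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt-out", num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8),
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```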

posted an update about 2 months ago
replied to their post about 2 months ago

I am comparing R1's answers to those of other models that I find "aligned". Here is my similar work:

https://wikifreedia.xyz/based-llm-leaderboard/npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c

I should probably make another leaderboard on HF!

Positive values mean the model is well aligned with the aligned models. Negative means their ideas differ.

The idea is to find aligned models and use them as benchmarks. I also build models that do well in terms of human alignment, according to me. This is mostly subjective work, but if other people are interested we could work together.

replied to their post about 2 months ago

I repeat: there is a general tendency of models getting smarter but at the same time getting less wise, less human aligned, less beneficial to humans.

R1 is the latest example. This may also be because of synthetic data use. With each synthetic dataset, the AI loses more human alignment.

LLM engineers are not doing a great job of bringing humans into the equation. Some humans really care about other humans and need to be included more in the training datasets.

posted an update about 2 months ago
DeepSeek R1 scores:

health -2
fasting -54
faith -31
misinfo -6
nutrition -14

compare to DeepSeek V3:

health +15
fasting -31
faith +4
misinfo +16
nutrition -14

The human misalignment is getting bigger.
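
For reference, the per-domain change from V3 to R1 and the average drop follow directly from the numbers above:

```python
# Per-domain change from DeepSeek V3 to R1, using the scores quoted in the post.
r1 = {"health": -2, "fasting": -54, "faith": -31, "misinfo": -6, "nutrition": -14}
v3 = {"health": 15, "fasting": -31, "faith": 4, "misinfo": 16, "nutrition": -14}

deltas = {k: r1[k] - v3[k] for k in r1}
print(deltas)                               # {'health': -17, 'fasting': -23, 'faith': -35, 'misinfo': -22, 'nutrition': 0}
print(sum(deltas.values()) / len(deltas))   # -19.4 on average
```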
posted an update about 2 months ago
Updated the Hoopoe model, which is taking in faith-related and religious texts.

etemiz/Hoopoe-8B-Llama-3.1

Faith score went from 8% to 54%. Expect more updates and further increases in the score. I also did the instruct fine-tuning before adding faith to the model, so some of the improvement may be because I started with Llama 3.1 base and not the instruct version.

Here are some comparisons with original Llama 3.1:
replied to their post 2 months ago

What do you mean?
Everybody is a black box until you start to talk to them. Then their ideas come out and you understand what kind of person they are. I think most benchmarks are done by talking to the LLMs?
Yes, I am trying to use this tech in a better way, serving more humans.

replied to AlexBodner's post 2 months ago
reacted to AlexBodner's post with 🔥 2 months ago
Just published a post explaining Monte Carlo Tree Search: the magic behind AlphaZero, now used to tackle reasoning benchmarks with LLMs. Check it out because it's a must-know nowadays!

https://x.com/AlexBodner_/status/1877789879398244382