Here is my latest study on OpenAI o1: A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773).
I wrote an easy-to-read blog post to explain the findings: https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models
INSTRUCTION FOLLOWING is the key. 100% instruction following + Reasoning = new SOTA. But if the model misses or misunderstands even one instruction, it can perform far worse than non-reasoning models.
Announcing WebApp1K-Duo: onekq-ai/WebApp1K-Duo-React
This is to keep up the challenge after the OpenAI o1 models saturated the WebApp1K benchmark. The new benchmark brings SOTA to 67%. Let the hill climbing commence! onekq-ai/WebApp1K-models-leaderboard
PS: I will publish more findings soon.