All HF Hub posts

openfree 
posted an update 2 days ago
Korean Exam Leaderboard: LLMs vs Civil Service and Professional Qualification Exams 📝

openfree/Korean-Exam-Leaderboard

## 📊 What is this leaderboard?
This leaderboard evaluates the performance of various AI models on 22 Korean civil service and professional qualification exams. All scores are converted to a 100-point scale to show how well different LLMs can solve actual Korean civil service and professional qualification tests!
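
The post does not say how the conversion to a 100-point scale is done. A minimal sketch, assuming a simple linear rescaling by each exam's maximum score (the function name and the numbers in the usage line are hypothetical, not taken from the leaderboard):

```python
def to_100_point_scale(raw_score: float, max_score: float) -> float:
    """Linearly rescale a raw exam score to a 0-100 scale.

    Assumption: the leaderboard may weight sections differently;
    this only illustrates plain proportional scaling.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    return round(100.0 * raw_score / max_score, 2)


# Hypothetical example: 84 points out of a 150-point exam.
print(to_100_point_scale(84, 150))  # 56.0
```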

## 🏆 Current Top Performers
- **OpenAI/GPT-o1**: Bar Exam 52.5 points 🥇
- **OpenAI/GPT-4.5**: Bar Exam 49.33 points 🥈
- **OpenAI/GPT-4o**: Bar Exam 49.11 points 🥉
- **deepseek-ai/DeepSeek-R1**: Bar Exam 47.33 points

## 📋 Exams Being Evaluated
The leaderboard includes various Korean civil service and professional qualification exams:
- Korean Bar Exam
- Senior Civil Service Grade 5
- Judicial Service Grade 5
- National Assembly Grade 5
- Judicial Scrivener
- Police Executive Candidate
- And more exams!

## 🤖 Models Being Evaluated
We are testing a variety of models:
- OpenAI: GPT-o1, GPT-o3-mini, GPT-4.5, GPT-4o
- Anthropic: Claude 3.7 Sonnet
- Google: Gemini 2.0 Flash/Pro/Flash Thinking
- Meta: Llama 3.3 70B Instruct, Llama 3.2 90B Vision
- DeepSeek: DeepSeek-R1
- Qwen: QwQ-32B, Qwen2.5 Coder
- Mistral: Mistral-Small-3.1-24B
- NVIDIA models: NVIDIA Nemotron variant models
- And many more!

## 🔍 Why This Matters
Korean civil service exams are known for their high difficulty and comprehensive knowledge assessment. These exams test deep knowledge across legal, administrative, and public service domains. Success in these exams demonstrates not just language understanding but also domain expertise and reasoning ability.

## 🧪 Evaluation Methodology

## 🔜 Future Plans
We are continuously expanding our test coverage across all 22 exam categories. We will keep updating the scores marked "TBD" so please stay tuned!
hanzla 
posted an update 1 day ago
👋 Hi all!

For any AI agent, internet search 🔎 is an important tool. However, with APIs like Tavily and Exa, it becomes really difficult to keep up with the cost. In some cases, these search APIs cost more than the LLM.

To solve this, I am building a Playwright wrapper API on top of publicly available SearXNG instances. This will let agent applications fetch internet search results for free.
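
For comparison, here is a minimal sketch of querying a SearXNG instance directly over its JSON output rather than through Playwright. This is not the Free-Search implementation. It assumes the instance has the `json` format enabled, which many public instances do not, and the instance URL and helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(instance_url: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{instance_url.rstrip('/')}/search?{params}"


def extract_results(payload: dict, limit: int = 5) -> list[dict]:
    """Keep only the title, URL and snippet of the first `limit` results."""
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),
        }
        for item in payload.get("results", [])[:limit]
    ]


def search(instance_url: str, query: str, limit: int = 5,
           timeout: float = 10.0) -> list[dict]:
    """Run one search; fails if the instance does not serve JSON."""
    with urlopen(build_search_url(instance_url, query), timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_results(payload, limit)


# Hypothetical instance URL, shown only to illustrate the URL shape.
print(build_search_url("https://searx.example.org/", "rotary position embedding"))
```

When an instance only serves HTML, driving a browser, as the Playwright wrapper does, is the fallback.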

Currently, I have set up a basic GitHub repo, and I will continue developing advanced search features, such as image search 🖼️

GitHub: https://github.com/HanzlaJavaid/Free-Search/tree/main

🚀 Try the deployed version: https://freesearch.replit.app/docs

If you find this useful, consider starring ⭐️ the GitHub repository to support further development!
Kseniase 
posted an update 1 day ago
8 types of RoPE

Since we use Transformers all the time, it helps to understand RoPE (Rotary Position Embedding). Because token order matters, RoPE encodes it by rotating token embeddings according to their position, so the model can tell which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
Another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
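
To make the original RoPE (item 1) concrete, here is a minimal pure-Python sketch. It assumes the adjacent-pair formulation from the RoFormer paper with base 10000, and the helper names are illustrative. Library implementations often pair dimensions differently and work on batched tensors.

```python
import math


def rope_rotate(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each adjacent pair (vec[i], vec[i+1]) by position * theta_i.

    theta_i = base ** (-i / d) for even i, so early pairs rotate quickly
    and later pairs rotate slowly.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    rotated = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return rotated


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The attention score depends only on the offset between positions:
# (3, 1) and (12, 10) both have offset 2, so the two scores match.
score_near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_far = dot(rope_rotate(q, 12), rope_rotate(k, 10))
print(math.isclose(score_near, score_far, abs_tol=1e-9))  # True
```

Because each pair is only rotated, vector norms are unchanged. Position enters the query-key dot product purely through the angle difference, which is where the relative positional information comes from.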
merve 
posted an update 3 days ago
So many open releases at Hugging Face this past week 🤯 recapping them all here: merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎀 Audio
> Sesame released CSM-1B new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
onekq 
posted an update about 23 hours ago
OFT 
posted an update 3 days ago
Today I decided to cancel my PRO subscription for Hugging Face. I had a lot of fun with it, but with the current changes to the API and allowed limits I don't think it's worth it anymore. So I just turned everything off and cancelled my subscription. It feels like one of those movie scenes where you see an old computer lab and someone putting big white sheets over it and closing the door behind him. I am not going, I am not gone, but watching through the glass window of the door that I just closed.
MikeDoes 
posted an update about 18 hours ago
csabakecskemeti 
posted an update 1 day ago
I'm collecting llama-bench results for inference with Llama 3.1 8B q4 and q8 reference models on various GPUs. Each result is the average of 5 executions.
The host system varies (different motherboard, CPU, and so on), but that probably has little effect on inference performance.
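
As a rough illustration of the averaging step, here is a minimal sketch that summarizes tokens-per-second readings from repeated runs. The function name and the readings are made up for illustration, not taken from the linked results:

```python
from statistics import mean, stdev


def summarize_runs(tokens_per_second: list[float]) -> dict:
    """Return the run count, mean and sample standard deviation."""
    if len(tokens_per_second) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return {
        "runs": len(tokens_per_second),
        "mean": mean(tokens_per_second),
        "stdev": stdev(tokens_per_second),
    }


# Made-up readings from five executions on one GPU.
summary = summarize_runs([100.0, 102.0, 98.0, 101.0, 99.0])
print(summary["runs"], summary["mean"])  # 5 100.0
```

Reporting the spread next to the mean helps show whether a gap between two GPUs is larger than the run-to-run noise.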

https://devquasar.com/gpu-gguf-inference-comparison/
The exact models used are listed on the page.

I'd welcome results from other GPUs if you have access to anything else; everything you need is in the post. Hopefully this is useful information for everyone.
Jaward 
posted an update 1 day ago
onekq 
posted an update 3 days ago
Folks, let's get ready. 🥳 We will be busy soon. 😅🤗 https://github.com/huggingface/transformers/pull/36878