"Am I going to be replaced by AI?" It's a crucial question, but maybe we're asking the wrong one.
📈 One statistic from my reading this week has stuck with me: Tomer Cohen, LinkedIn's CPO, told Jeremy Kahn that 70% of the skills used in most jobs will change by 2030. Not jobs disappearing, but jobs transforming. And he puts the responsibility squarely on leadership: "If in one year's time, you are disappointed that your workforce is not 'AI native,' it is your fault."
🔄 According to a piece in Fast Company, the Great Recalibration has begun: we're heading into an era where AI is fundamentally redefining the nature of work by forcing a complete reassessment of human value in the workplace. But the shift may be driven less by AI itself than by "the need for humans to change the way they work."
⚡ The Washington Post draws a crucial parallel: we're facing an "AI shock" similar to manufacturing's "China shock," but this time it's hitting knowledge workers. Entry-level white-collar work in particular could be automated. The key difference? "Winning the AI tech competition with other countries won't be enough. It's equally vital to win the battle to re-skill workers."
Did we just drop personalized AI evaluation?! This tool automatically generates custom benchmarks from your own documents to test which models perform best on them.
Most benchmarks test general capabilities, but what matters is how models handle your data and your tasks. YourBench helps answer critical questions like:
- Do you really need a sledgehammer model with hundreds of billions of parameters to crack a nut?
- Could a smaller, fine-tuned model work better?
- How well do different models understand your domain?
Some cool features:
📚 Generates custom benchmarks from your own documents (PDFs, Word, HTML)
🎯 Tests models on real tasks, not just general capabilities
🔄 Supports multiple models for different pipeline stages
🧠 Generates both single-hop and multi-hop questions
🔍 Evaluates top models and deploys leaderboards instantly
💰 Provides a full cost analysis to optimize for your budget
🛠️ Is fully configurable via a single YAML file
26 SOTA models were tested for question generation. An interesting finding: Qwen2.5 32B leads in question diversity, while smaller Qwen models and Gemini 2.0 Flash offer great value for their cost.
You can also run it locally with any model you want.
Want to vibecode with DeepSeek? I just spent 10 minutes with this space and built a full world-indicators dashboard, literally just by describing what I wanted!
Anyone can now prototype and deploy projects instantly.