Taylor658 (atayloraerospace)

posted an update about 5 hours ago

Cohere for AI, Argilla, and Hugging Face are collaborating on an Open Science Project to improve multilingual model evaluations. The project focuses on the widely used MMLU dataset, which spans 57 subjects such as mathematics, computer science, and law. However, existing translations often miss linguistic and cultural nuances, which embeds biases in the evaluations. 🤔

To address this, they have annotated a subset of the MMLU test set and are inviting reviewers from around the world to go through the prompts and flag cultural specifics and required knowledge. They say these insights will help shape future multilingual model evaluations and make them more inclusive and accurate. 🗺️ 📝 🙌

▶️ To get started, go to: CohereForAI/MMLU-evaluation

🌍 They also have an Aya Discord server for collaboration with other participants: https://discord.gg/9gVhdfnQMN

posted an update 1 day ago

The Hugging Face Computer Vision community will hold the first in a series of online hangouts/study groups this Saturday, June 1st, at 10:00 am EDT. 🚀

Join us on the Hugging Face Discord channel for the Hangout!

https://discord.gg/hugging-face-879548962464493619?event=1243129304863215656 🤗

posted an update 5 days ago

Researchers from Anthropic managed to extract millions of interpretable features from their Claude 3 Sonnet model, making it easier to identify and understand specific behaviors and patterns within the model.

This advance in understanding closed-source AI models could make them safer by showing how specific features relate to concepts and affect the model's behavior.

Read the Article: https://www.anthropic.com/research/mapping-mind-language-model?utm_source=substack&utm_medium=email

Read The Paper: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

replied to their post 5 days ago

You are welcome, Omar.

posted an update 6 days ago

The Google DeepMind team just released a new technical report on Gemini 1.5 Pro and Gemini 1.5 Flash.

In addition to architecture, benchmark, and evaluation details, the report also provides a few real-world use cases for the models, such as professional task optimization and translation of lesser-known languages.

You can check out the full report here: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf?utm_source=substack&utm_medium=email

posted an update 8 days ago

A new paper, "Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning," was just published. The approach improves VLMs' decision-making abilities in goal-directed tasks.

This is accomplished with chain-of-thought (CoT) reasoning, which substantially improves performance. Removing CoT reasoning drops effectiveness, highlighting its crucial role.

Check out the paper here: https://arxiv.org/abs/2405.10292

replied to codelion's post 14 days ago

Yes, some potential multi-input or multi-source code-related tasks could be executing shell commands directly from a script, deserializing untrusted data, or parsing XML data, for example. CWE-611 and CWE-502 might already cover a couple of these coding scenarios, though...
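To make those three scenarios concrete, here is a rough Python sketch of the pattern a static analyzer would typically flag in each case (shown only in comments) next to a safer standard-library alternative. The function names are hypothetical, the CWE-78 label for shell command injection is my addition, and the DOCTYPE check is a deliberately crude guard rather than a full XXE defense. None of this is taken from the "static-analysis-eval" scripts.

```python
import json
import subprocess
import sys
import xml.etree.ElementTree as ET


def echo_without_shell(user_text: str) -> str:
    """Shell execution from a script (CWE-78, my labeling)."""
    # Flagged pattern: subprocess.run(f"echo {user_text}", shell=True)
    # Safer: pass an argument list so no shell ever parses the input.
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, user_text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\r\n")


def load_untrusted_config(payload: bytes) -> dict:
    """Deserialization of untrusted data (CWE-502)."""
    # Flagged pattern: pickle.loads(payload)
    # Safer: a data-only format such as JSON, plus a shape check.
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def parse_xml_rejecting_doctype(xml_text: str) -> ET.Element:
    """XML parsing with external entities in mind (CWE-611)."""
    # Crude guard (an assumption for illustration): refuse any document
    # that declares a DOCTYPE, since that is where entities are defined.
    if "<!DOCTYPE" in xml_text:
        raise ValueError("DOCTYPE declarations are not allowed")
    return ET.fromstring(xml_text)
```

In each case the input crosses a trust boundary before reaching a powerful API, which is roughly what I meant by multi-source.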

posted an update 14 days ago

Another excellent course has launched on Hugging Face Learn https://huggingface.co/learn

HF Developer Advocate Dylan Ebert has officially launched the ML for 3D Course! 🤗

Check it out @ https://huggingface.co/learn/ml-for-3d-course/unit0/introduction

YT Channel: https://www.youtube.com/@IndividualKex
GitHub: https://github.com/huggingface/ml-for-3d-course

replied to codelion's post 15 days ago

Thanks for posting results for gpt-4o so fast!

You will have to post the latest Gemini model results tomorrow after I/O announcements. :-)

Since we are squarely in the age of multimodal models, I am curious whether any of the 76 standard scripts run for vulnerability remediation in "static-analysis-eval" demonstrate multimodal vulnerabilities.

posted an update 17 days ago

Red Hat and IBM have announced InstructLab, an open-source project for LLM contributions. InstructLab offers a model-agnostic approach for the community to contribute "skills" and/or "knowledge" to LLMs via a CLI and tuning backend.

This community-driven approach to GenAI model development is novel, to say the least. It will be interesting to see how effective it is in the long run, especially on models beyond the initial Granite and Merlinite families.

Check out the GitHub here: https://github.com/instructlab
Read the LAB Paper: https://arxiv.org/abs/2403.01081
View Model Builds: https://huggingface.co/instructlab

replied to mattmdjaga's post 19 days ago

Thanks for posting about the course!

posted an update 19 days ago

🤗 The first submissions from the Community Hugging Face Computer Vision Course (https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome) are being posted on HF Spaces! 🤗

OmAlve/Swin-Transformer-Foods101
Rageshhf/medi-classifier

It is amazing that the first group of students has completed the course, and in record time!

I look forward to seeing more submissions from the course soon.

A nice swag item that students get when they complete the course and make their submission is a Hugging Face Certificate of Completion. (It's suitable for framing.) 🤗

posted an update 26 days ago

The Open Medical-LLM Leaderboard is now up on HF Spaces. 🤗

openlifescienceai/open_medical_llm_leaderboard

It will be interesting to add the results of the just-announced Med-Gemini model to the Leaderboard to see how it compares and whether its stated 91.1% MedQA benchmark score is accurate.

Capabilities of Gemini Models in Medicine (2404.18416)
