atayloraerospace PRO

Taylor658

AI & ML interests

Computer Vision 🔭 | Multimodal Gen AI 🤖| AI in Healthcare 🩺 | AI in Aerospace 🚀

Taylor658's activity

posted an update 8 days ago
🌍 Cohere for AI has announced Expedition Aya, a global initiative running this July and August that invites researchers from around the world to launch projects using multilingual tools like Aya 23 and Aya 101. 🌐

Participants can start by joining the Aya server, where all coordination will take place. They can share ideas and connect with other participants on Discord and via the signup sheet. Various events will be hosted to help people find potential team members. 🤝

To support the projects, Cohere API credits will be issued. 💰

Over the course of six weeks, weekly check-in calls are also planned to help teams stay on track and receive support with using Aya. 🖥️

The expedition will wrap up at the end of August with a closing event to showcase everyone’s work and plan next steps. Participants who complete the expedition will also receive some Expedition Aya swag. 🎉

Links:
Join the Aya Discord: https://discord.com/invite/q9QRYkjpwk
Visit the Expedition Aya Minisite: https://sites.google.com/cohere.com/expedition-aya/home
replied to davanstrien's post 12 days ago

This is great... thanks for posting these Gradio apps in one collection.

posted an update 13 days ago
🔍 A recently published technical report introduces MINT-1T, a dataset that will considerably expand open-source multimodal data. It features one trillion text tokens and three billion images and is scheduled for release in July 2024.

Researcher Affiliation:

University of Washington
Salesforce Research
Stanford University
University of Texas at Austin
University of California, Berkeley

Paper:
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
https://arxiv.org/pdf/2406.11271v1.pdf

GitHub:
https://github.com/mlfoundations/MINT-1T

Highlights:

MINT-1T Dataset: Largest open-source multimodal interleaved dataset with 1 trillion text tokens & 3 billion images. 📊🖼️
Diverse Sources: Incorporates data from HTML, PDFs, and ArXiv documents. 📄📚
Open Source: Dataset and code will be released at https://github.com/mlfoundations/MINT-1T. 🌐🔓
Broader Domain Representation: Uses diverse data sources for balanced domain representation. 🌍📚
Performance in Multimodal Tasks: The dataset’s scale and diversity should enhance multimodal task performance. 🤖💡

Datasheet Information:

Motivation: Addresses the gap in large-scale open-source multimodal datasets. 🌐📊
Composition: 927.6 million documents, including HTML, PDF, and ArXiv sources. 📄📚
Collection Process: Gathered from CommonCrawl WARC and WAT dumps, with rigorous filtering. 🗂️🔍
Preprocessing/Cleaning: Removal of low-quality text and duplicates, plus anonymization of sensitive information. 🧹🔒
Ethical Considerations: Measures to ensure privacy and avoid bias. ⚖️🔏
Uses: Training multimodal models, generating interleaved image-text sequences, and building retrieval systems. 🤖📖
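
For anyone planning to experiment once the release lands, here is a minimal sketch of streaming the dataset with the 🤗 datasets library. The repo id below is an assumption until the July release is final; streaming avoids downloading the full trillion-token corpus locally.

```python
# A minimal sketch, not an official loader: the repo id is a guess based on
# the announced mlfoundations org and may differ once the dataset ships.
from datasets import load_dataset

ds = load_dataset(
    "mlfoundations/MINT-1T-HTML",  # hypothetical repo id
    split="train",
    streaming=True,  # stream records instead of downloading ~1T tokens
)

# Peek at the structure of the first few interleaved documents
for i, doc in enumerate(ds):
    print(doc.keys())
    if i >= 2:
        break
```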
posted an update 15 days ago
With the CVPR conference (https://cvpr.thecvf.com) in full swing this week in Seattle 🏙️, the competition details for NeurIPS 2024 have just been released. 🚀

Some of the competitions this year include:

🦾 MyoChallenge 2024: Physiological dexterity in bionic humans.
🌌 FAIR Universe: Handling uncertainties in fundamental science.
🧪 BELKA: Chemical assessment through big encoded libraries.
🏆 HAC: Hacker-Cup AI competition.
💰 Large-Scale Auction Challenge: Decision-making in competitive games.
📶 URGENT Challenge: Signal reconstruction and enhancement.
🛡️ LASC 2024: Safety of LLMs and AI agents.

For more details, check out: https://blog.neurips.cc/2024/06/04/neurips-2024-competitions-announced
replied to dvilasuero's post 19 days ago

"We'll keep you posted in both channels but I think it will make sense to shift our efforts towards the HF Discord very soon"

Ok, sounds good... see you on Discord!

replied to dvilasuero's post 19 days ago

One more quick question 😁... for those of us in the Hugging Face Hub and Discord communities who want to engage more with the Argilla community, do you recommend joining the Argilla Slack community, or will there be new Argilla channels on the Hugging Face Discord in the near future?

replied to dvilasuero's post 20 days ago

That's great news on the upcoming 2.0 release and the possibility of a separate monitoring and metrics package. I would love to help beta test and/or brainstorm multimodal metrics!

replied to dvilasuero's post 20 days ago

Congratulations to the Argilla Team! 🥳

With the fast adoption of multimodal models, will you add multimodal monitoring to the model-monitoring suite soon?

In particular: cross-modal consistency, multimodal data drift, and fairness and bias monitoring...

posted an update 20 days ago
Luma AI has just launched Dream Machine, a Sora- and Kling-like tool that generates videos from simple text and image prompts. 🎥
Dream Machine is out of beta and offers a free tier for testing it out.

I tried the extremely simple prompt below with the pic shown, and thought its rendering of my prompt as a drone-camera-style video was decent:

You are a drone operator. Create a 30-second video from a drone heading eastbound over the western suburbs of Bismarck, North Dakota, looking east towards the city on an overcast summer evening during the golden hour from an altitude of 200 ft.


Dream Machine also has a paid tier. However, like its paid-tier text-to-image brethren from 2023 (which fared EXTREMELY badly once good text-to-image capabilities became the norm in open- and closed-source models), time will tell whether the paid tier will work for text- and image-to-video. ⏳

This will become evident in 3 to 5 months, once GPT-5, Gemini 2, Mistral-9, Llama 4, et al., all with enhanced multimodal capabilities, are released. 🚀
posted an update 22 days ago
Researchers at Carnegie Mellon University have introduced Sotopia, a platform designed to evaluate and enhance AI’s social capabilities. Sotopia focuses on assessing AI’s performance in goal-oriented social interactions, like collaboration, negotiation, and competition.

🔍 Key Findings:
Performance Evaluation: The platform enables testing and comparison of different AI systems, with a specific emphasis on refining Mistral-7B. 🛠️
Benchmarking: Sotopia uses GPT-4 as an automated evaluator to benchmark other AI systems' capabilities (a rough sketch of this judge pattern follows the links below). 📏

🔧 Technical Points:
Foundation: Sotopia builds upon Mistral-7B, focusing on behavior cloning and self-reinforcement. 🏗️
Multi-Dimensional Assessment: Sotopia evaluates AI performance across 7 social dimensions, including believability, adherence to social norms, and successful goal completion. 🌐
Data Collection: The platform gathers data from human-human, human-AI, and AI-AI interactions. 📂

Sotopia Project Page: https://www.sotopia.world/
Check out the HF space here: cmu-lti/sotopia-space
Additional details are in the HF Collection: cmu-lti/sotopia-65f312c1bd04a8c4a9225e5b
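
As a rough illustration of the GPT-4-as-evaluator pattern the platform uses, here is a minimal sketch. It assumes the OpenAI Python client (v1) and uses a hypothetical subset of the dimension names from the post; the real Sotopia package defines its own evaluator classes and prompts.

```python
# Minimal LLM-as-judge sketch, not Sotopia's actual implementation.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

# Illustrative subset; the platform scores 7 social dimensions in total.
DIMENSIONS = ["believability", "social norm adherence", "goal completion"]

def score_episode(transcript: str) -> dict[str, int]:
    """Ask GPT-4 to rate a social-interaction transcript (0-10) per dimension."""
    scores = {}
    for dim in DIMENSIONS:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an evaluator of social interactions."},
                {"role": "user", "content": f"Rate this interaction 0-10 on {dim}. "
                                            f"Reply with a single integer.\n\n{transcript}"},
            ],
        )
        scores[dim] = int(resp.choices[0].message.content.strip())
    return scores
```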

posted an update 27 days ago
🔬 This paper introduces Fusion Intelligence (FI), a novel approach integrating the adaptive behaviors of natural organisms 🐝(Bees!)🐝 with AI's computational power.

Paper:
Fusion Intelligence: Confluence of Natural and Artificial Intelligence for Enhanced Problem-Solving Efficiency (2405.09763)
https://arxiv.org/pdf/2405.09763

Key Takeaways:
* Fusion Intelligence (FI): Combines natural organism efficiency with AI's power. 🌟
* Hybrid Approach: Integrates natural abilities with AI for better problem-solving. 🧠🤖
* Agricultural Applications: Shows a 50% improvement in pollination efficiency. 🐝🌼
* Energy Efficiency: Consumes only 29.5-50.2 mW per bee, much lower than traditional methods. ⚡
* Scalability: Applicable to fields like environmental monitoring and search and rescue. 🌍🔍
* Non-Invasive: Eliminates the need for invasive modifications to biological entities. 🌱

This research offers a new approach for those interested in sustainable AI solutions. By merging biology with AI, FI aims to create solutions for a variety of challenges.
replied to not-lain's post about 1 month ago

@not-lain This is great! Thanks for offering this intro session for those new to the HF community.

posted an update about 1 month ago
Cohere for AI, Argilla, and Hugging Face are collaborating on an Open Science Project to enhance multilingual model evaluations. The project focuses on the widely used MMLU dataset, which spans 57 subjects like mathematics, computer science, and law. However, existing translations often miss linguistic and cultural nuances, thus embedding biases. 🤔

To address this, they have annotated a subset of the MMLU test set and are inviting global perspectives to review prompts, highlighting cultural specifics and required knowledge. They have mentioned that insights will help shape future multilingual model evaluations, ensuring they are more inclusive and accurate. 🗺️ 📝 🙌

▶️ To get started, go to: CohereForAI/MMLU-evaluation

🌍 They also have an Aya Discord server for collaboration with other participants: https://discord.gg/9gVhdfnQMN
posted an update about 1 month ago
Researchers from Anthropic managed to extract millions of interpretable features from their Claude 3 Sonnet model, making it easier to identify and understand specific behaviors and patterns within the model.

This advance in understanding closed source AI models could make them safer by showing how specific features relate to concepts and affect the model’s behavior.

Read the Article: https://www.anthropic.com/research/mapping-mind-language-model

Read The Paper: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
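
The underlying technique is dictionary learning with sparse autoencoders over the model's internal activations. As a rough illustration (dimensions, coefficients, and data here are stand-ins, not Anthropic's setup), a minimal sparse autoencoder looks like this:

```python
# A toy sparse-autoencoder sketch of the technique the paper scales up:
# learn an overcomplete feature dictionary over activations, with an L1
# penalty so individual latents become sparse, interpretable features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f

sae = SparseAutoencoder()
acts = torch.randn(64, 512)               # stand-in for model activations
recon, feats = sae(acts)
# Reconstruction loss plus L1 sparsity penalty (coefficient is illustrative)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().sum(dim=-1).mean()
```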
posted an update about 1 month ago
The Google DeepMind team just released a new technical report on Gemini 1.5 Pro and Gemini 1.5 Flash.

In addition to architecture, benchmark, and evaluation details, the report also provides a few real-world use cases for the models, such as professional task optimization and translation of lesser-known languages.

You can check out the full report here: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf

posted an update about 1 month ago
A new paper, "Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning," was just published. The approach improves VLMs' decision-making abilities in goal-directed tasks.

This is accomplished with chain-of-thought (CoT) reasoning, which substantially enhances performance. Removing CoT reasoning, however, sharply reduces effectiveness, highlighting its crucial role.

Check out the paper here: https://arxiv.org/abs/2405.10292
replied to codelion's post about 2 months ago

Yes, some potential multi-input or multi-source code-related tasks could be: executing shell commands directly from a script, deserializing untrusted data, or parsing XML data, for example. CWE-611 and CWE-502 might already cover a couple of these coding scenarios though... A quick sketch of the latter two patterns is below.
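
For illustration, here is a minimal sketch of the CWE-502 and CWE-611 shapes mentioned above. The function names are hypothetical, and the safe variants show one common remediation each (defusedxml is a third-party package).

```python
# Hypothetical examples of the two vulnerability shapes, plus one common
# remediation each. Not drawn from static-analysis-eval itself.
import json
import pickle
import defusedxml.ElementTree as SafeET  # pip install defusedxml

def load_config_unsafe(blob: bytes):
    # CWE-502: deserializing untrusted data can execute arbitrary code
    return pickle.loads(blob)

def load_config_safe(blob: bytes):
    # JSON carries data only; nothing executes on load
    return json.loads(blob)

def parse_feed_safe(xml_text: str):
    # defusedxml rejects entity expansion / external entities (CWE-611)
    return SafeET.fromstring(xml_text)
```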

replied to codelion's post about 2 months ago

Thanks for posting results for gpt-4o so fast!

You will have to post the latest Gemini model results tomorrow after I/O announcements. :-)

Since we are squarely in the age of multimodal models, I am curious: do any of the 76 standard scripts run for vulnerability remediation in "static-analysis-eval" demonstrate multimodal vulnerabilities?

posted an update about 2 months ago
Red Hat and IBM have announced InstructLab, an open-source project for community LLM contributions. InstructLab offers a model-agnostic approach for the community to contribute "skills" and/or "knowledge" to LLMs via a CLI and tuning backend.

This community-driven approach to GenAI model development is novel, to say the least. It will be interesting to see how effective it is in the long run, especially on models beyond the initial Granite and Merlinite families.

Check out the GitHub here: https://github.com/instructlab
Read the LAB Paper: https://arxiv.org/abs/2403.01081
View Model Builds: https://huggingface.co/instructlab
posted an update about 2 months ago
🤗The first submissions from the Community Hugging Face Computer Vision Course (https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome) are being posted up on HF Spaces!🤗

OmAlve/Swin-Transformer-Foods101
Rageshhf/medi-classifier

It is amazing that the first group of students has completed the course, and in record time!

Will look forward to seeing more submissions from the course soon.

A nice swag item that students get when they complete the course and make their submission is this cool Hugging Face Certificate of Completion. (It's suitable for framing.) 🤗