SafeLMM

community

AI & ML interests

Synthetic Augmented data, Fair and Extreme-scaled Large Multimodal Model (SafeLMM)

* Multilingual, Multimodal, Multidomain data
* Synthetic data
* Safety-by-design

SafeLMM's activity

huu-ontocord

authored 3 papers about 1 month ago

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 56

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Paper • 2412.15035 • Published Dec 19, 2024 • 4

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs

Paper • 2502.19413 • Published Feb 26, 2025 • 19

huu-ontocord

authored a paper 12 months ago

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Paper • 2404.08676 • Published Apr 6, 2024 • 3

tanmaylaud

authored 3 papers about 1 year ago

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 43

ClimaBench: A Benchmark Dataset For Climate Change Text Understanding in English

Paper • 2301.04253 • Published Jan 11, 2023 • 1

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 31

huu-ontocord

posted an update about 1 year ago

We would like to announce our Aurora-M multilingual models, which are based on StarCoderPlus.
Twitter: https://twitter.com/ontocord/status/1772778544051155029
LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7178521998845759488/
Blog post: https://huggingface.co/blog/mayank-mishra/aurora
arXiv: Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order (2404.00399)

Current LLMs are highly susceptible to generating toxic, harmful, and even dangerous content, and they can also produce outputs with gender or racial biases. The Biden-Harris Executive Order (https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence) sets forth guidelines on what is considered a safe AI system.
Following up on these guidelines, we present Aurora-M, the world's first open-source multilingual language model red-teamed according to the Biden-Harris Executive Order. Inspired by BigScience, the model is trained on five languages: English, Hindi, Japanese, Vietnamese, and Finnish.

* Red-teamed model: aurora-m/aurora-m-biden-harris-redteamed (safety tuned according to the order mentioned above)
* Base model: aurora-m/aurora-m-base (not safety tuned)
* Instruct model: aurora-m/aurora-m-instruct (not safety tuned)

@mayank-mishra @cabbage972 @sted97 @Xa9aX @Taishi-N324 @Muennighoff @vumichien @prateeky2806 @felfri @spyysalo and many many others!
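
For readers who want to try the three checkpoints listed in the post, the following is a minimal sketch rather than an official recipe. The repository IDs are taken from the list above. The short labels (`red_teamed`, `base`, `instruct`) and the helper names `resolve_checkpoint` and `load_checkpoint` are hypothetical, introduced only for illustration. The sketch also assumes that the `transformers` and `torch` packages are installed and that the checkpoints load through the generic causal-LM auto classes.

```python
# Repository IDs copied from the post; the dictionary keys are
# hypothetical labels chosen for this sketch.
AURORA_M_CHECKPOINTS = {
    "red_teamed": "aurora-m/aurora-m-biden-harris-redteamed",
    "base": "aurora-m/aurora-m-base",
    "instruct": "aurora-m/aurora-m-instruct",
}


def resolve_checkpoint(variant: str) -> str:
    """Map a short variant label to its Hugging Face repository ID."""
    try:
        return AURORA_M_CHECKPOINTS[variant]
    except KeyError:
        expected = sorted(AURORA_M_CHECKPOINTS)
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {expected}"
        ) from None


def load_checkpoint(variant: str = "red_teamed"):
    """Download and load the tokenizer and model for one variant.

    Assumption: the checkpoint is compatible with the standard
    AutoTokenizer / AutoModelForCausalLM loading path.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = resolve_checkpoint(variant)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_checkpoint()` downloads the full model weights, so it needs network access and enough disk space and memory for a large model. Only the first variant is described in the post as safety tuned; the base and instruct variants are not.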

huu-ontocord

authored 3 papers about 1 year ago

OpenAssistant Conversations -- Democratizing Large Language Model Alignment

Paper • 2304.07327 • Published Apr 14, 2023 • 6

Data Governance in the Age of Large-Scale Data-Driven Language Technology

Paper • 2206.03216 • Published May 4, 2022

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 43

Team members 2
