Acknowledge the license to access the repository. Our team may take 2-3 days to process your request.

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

UNBIAS Pas de Pitié Pour Le Croissant LLM LICENSE AGREEMENT
UNBIAS Pas de Pitié Pour Le Croissant Version Release Date: June 17, 2024
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the UnBias Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Pas de Pitié Pour Le Croissant LLM distributed by UnBias at https://huggingface.co/unbias/PasDePitiePourLeCroissantLLMBase, https://www.unbias.fr or by any other means.
“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
“Pas de Pitié Pour Le Croissant LLM” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by UnBias at https://huggingface.co/unbias/PasDePitiePourLeCroissantLLMBase, https://www.unbias.fr or by any other means.
“UnBias Materials” means, collectively, UnBias’ proprietary “Pas de Pitié Pour Le Croissant LLM” and Documentation (and any portion thereof) made available under this Agreement.
“UnBias” or “we” means UnBias SAS, a French company with a capital of three thousand and one euros, registered at Grasse B greffe under SIREN 897 757 993 and its eventual successor.
Our current business address as of this release date (June 11, 2024) is Sophia-Antipolis, 1047 route des Dolines, 06560 Valbonne, France.
By clicking “I Accept” below or by using or distributing any portion or element of the UnBias Materials, you agree to be bound by this Agreement.

  1. License Rights and Redistribution.

a. Intellectual Property Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under UnBias intellectual property or other rights owned by UnBias embodied in the UnBias Materials to use, and create derivative works of, and make modifications to the UnBias Materials.

b. Redistribution and Use. You shall NOT use publicly including through a public or private website, distribute and you shall NOT make available the UnBias Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model. During your own use, you shall prominently display “Built with UnBias Pas de Pitié Pour Le Croissant LLM” on the user interface or product documentation. If you use the UnBias Materials to create, train, fine tune, or otherwise improve an AI model, you shall also include “UnBias” at the beginning of any such AI model name. If you receive UnBias Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you. You must retain in all copies of the UnBias Materials that you keep the following attribution notice within a “Notice” text file distributed as a part of such copies: “UnBias Pas de Pitié Pour Le Croissant LLM is licensed under the UNBIAS Pas de Pitié Pour Le Croissant LLM LICENSE AGREEMENT, Copyright © UnBias SAS All Rights Reserved.” Your use of the UnBias Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the UnBias Materials (available at XXXX), which is hereby incorporated by reference into this Agreement. You will not use the UnBias Materials or any output or results of the UnBias Materials to improve any other large language model (excluding UnBias Pas de Pitié Pour Le Croissant LLM or derivative works thereof).

  2. Additional Commercial Terms. To commercially use UnBias Pas de Pitié Pour Le Croissant, you must request a license from UnBias, which UnBias may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until UnBias otherwise expressly grants you such rights.
  3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE UNBIAS MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND UNBIAS DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE UNBIAS MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE UNBIAS MATERIALS AND ANY OUTPUT AND RESULTS.
  4. Limitation of Liability. IN NO EVENT WILL UNBIAS OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF UNBIAS OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
  5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the UnBias Materials, neither UnBias nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the UnBias Materials or as set forth in this Section 5(a). UnBias hereby grants you a license to use “UnBias” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with UnBias’s brand guidelines (currently accessible at XXX). All goodwill arising out of your use of the Mark will inure to the benefit of UnBias.

b. Subject to UnBias’s ownership of UnBias Materials and derivatives made by or for UnBias, with respect to any derivative works and modifications of the UnBias Materials that are made by you, as between you and UnBias, you shall use but NOT claim ownership or copyright any such derivative works and modifications.

c. If you institute litigation or other proceedings against UnBias or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the UnBias Materials or UnBias Pas de Pitié Pour Le Croissant LLM outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless UnBias from and against any claim by any third party arising out of or related to your use or distribution of the UnBias Materials.

  6. Issues, Evaluations, Benchmarking, Assessment, Quality Assurance, Commenting. You shall immediately notify UnBias of any issue, in particular zero-day vulnerabilities and legal issues, in particular with respect to the General Data Protection Regulation (GDPR), the Data Protection Law Enforcement Directive and other rules concerning the protection of personal data, and Regulation (EU) 2024 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act). You also commit to send any evaluation, benchmarking, assessment, quality assurance or comment to UnBias for its consideration, including granting UnBias the right to re-use your work for advertisement purposes. You shall refrain from commenting on, advertising or publicising any appreciation of Pas de Pitié Pour Le Croissant LLM without explicit UnBias approval.
  7. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the UnBias Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. UnBias may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the UnBias Materials. Sections 3, 4 and 8 shall survive the termination of this Agreement. If any provision of this Agreement is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
  8. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of France without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of Paris, France shall have exclusive jurisdiction of any dispute arising out of this Agreement.


PasDePitiéPourLeCroissantLLM - Base (demo v20240519-7h18)

PasDePitiéPourLeCroissantLLM was created by UnBias Harpax from the CroissantLLM initiative's croissantllm/CroissantLLMBase model, reducing it from 24 layers and 1.345 (a.k.a. "1.5") billion parameters (5.3 GB of storage) to 3 layers and 282 million parameters (709 MB of storage).

That's a whopping 80% reduction in size, compute time, energy use and carbon footprint. With a 100,000-euro compute budget, you could either afford 5x the compute or save 80,000 euros.

This version of the model is, however, NOT ENABLED with UnBias Harpax CrystALS: it has been reverted to a vanilla architecture by UnBias Harpax UnCrystALS. The CrystALS version has 182 million parameters and fits in 200 MB of storage.

An UnBias Harpax CrystALS version would yield an 87% reduction compared to the baseline in size, compute time, energy use and carbon footprint. With a 100,000-euro compute budget, you could either afford 7.7x the compute or save 87,000 euros.
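
The budget figures above follow directly from the quoted reduction percentages. As a quick sanity check, the sketch below recomputes them; it assumes, as this card does, that compute cost scales in proportion to model size, and the function names are illustrative, not part of any UnBias tooling. From the parameter counts alone, the reductions come out at about 79% and 86.5%, which the card rounds to 80% and 87%.

```python
from typing import Tuple

def reduction(base_params: float, small_params: float) -> float:
    """Fraction of parameters removed relative to the base model."""
    return 1.0 - small_params / base_params

def budget_effect(budget_eur: float, reduction_frac: float) -> Tuple[float, float]:
    """Return (compute multiplier, euros saved).

    Assumption: cost is proportional to model size, so a model that is
    `reduction_frac` smaller costs (1 - reduction_frac) times as much to run.
    """
    multiplier = 1.0 / (1.0 - reduction_frac)
    saved = budget_eur * reduction_frac
    return multiplier, saved

BASE = 1.345e9      # CroissantLLMBase parameters, as quoted on this card
VANILLA = 282e6     # this model
CRYSTALS = 182e6    # CrystALS version, as quoted on this card

# Reductions derived from the parameter counts.
print(f"vanilla:  {reduction(BASE, VANILLA):.1%}")
print(f"CrystALS: {reduction(BASE, CRYSTALS):.1%}")

# Budget effect using the card's rounded percentages.
for pct in (0.80, 0.87):
    mult, saved = budget_effect(100_000, pct)
    print(f"{pct:.0%} reduction: {mult:.1f}x compute or {saved:,.0f} EUR saved")
```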

Last but not least, the larger the model, the larger the compression factor: about 50x for 70-billion-parameter-class models.

To harness the benefits of UnBias Harpax CrystALS, please contact our sales department.

Pas de Pitié pour les Croissants ("Spare no Croissants") is a pun on CroissantLLM, based on a 1980s French children's television programme (© 1987, AB Productions, TF1).

Abstract

Who are we?

UnBias is a French DeepTech start-up based in Sophia-Antipolis, focusing on high-precision, portable (MB, not TB) signal understanding and bias control.

At UnBias, we strive for sustainability and efficiency in furthering artificial intelligence and machine learning. Our frugal R&D regularly yields discoveries that accelerate model training while using minimal resources.

What do we do?

UnBias Harpax comprises 4 components to elevate your efficiency and win the AI race:

  1. Trainer, the acceleration component
  2. CrystALS, the compression architecture
  3. Vampire, which converts third-party, non-native models to the CrystALS architecture
  4. UnCrystALS, the module that exports back to non-CrystALS architectures for sharing and publication (at the cost of some performance loss and increased size)

UnBias Harpax was designed for large-scale pre-training from scratch; we later extended the solution to vampirize existing models.

What is this?

PasDePitiéPourLeCroissantLLM was created by UnBias Harpax from the CroissantLLM initiative's croissantllm/CroissantLLMBase model, reducing it from 24 layers and 1.345 (a.k.a. "1.5") billion parameters (5.3 GB of storage) to 3 layers and 282 million parameters (709 MB of storage).

Benefits

Our layer supercompression technique shrinks the model footprint several times over, without quantization or pruning.

With fewer GPUs required, our solution offers significant compute savings for pre-training from scratch, further training, fine-tuning and inference alike.

This saves considerable expenditure on compute, development lead time and deployment.

Smaller models also fit on smaller, typically 30-40% cheaper GPUs, resulting in considerable potential cost savings.

Last but not least, smaller models may fit on CPUs, enabling edge and remote serving as well as IoT deployments.
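
As a rough illustration of why a smaller model may fit on cheaper GPUs or on CPUs, the sketch below estimates the memory needed just to hold the weights at a given precision. It is a back-of-the-envelope calculation (parameter count times bytes per parameter) that ignores activations, the KV cache and framework overhead, so real requirements will be higher. The function name and dtype table are illustrative assumptions.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Approximate memory (GB, with 1 GB = 1e9 bytes) needed to store the weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Parameter counts as quoted on this card.
for name, n_params in [("CroissantLLMBase", 1.345e9), ("this model", 282e6)]:
    print(f"{name}: {weight_memory_gb(n_params, 'float16'):.2f} GB in float16")
```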

By reducing the model footprint many times, UnBias Harpax CrystALS helps reduce carbon costs and makes it more affordable for businesses to deploy solutions.

Contact us to learn more about how CrystALS can help your research and development proceed in a responsible and sustainable manner.

CrystALS is a pun on Crystal, continuing the Sesame Street tradition initiated by BERT (© 1984, The Sesame Workshop).

Citation

Our work can be cited as:

@misc{dalferro2024pasdepitiépourlecroissantllm,
      title={PasDePitiéPourLeCroissantLLM: An UnBias Harpax Vampire demo compression of a Truly Bilingual French-English Language Model},
      author={{UnBias SAS} and Benoit Dal Ferro and Daphné Marnat},
      year={2024},
}

The CroissantLLM initiative can be cited as:

@misc{faysse2024croissantllm,
      title={CroissantLLM: A Truly Bilingual French-English Language Model}, 
      author={Manuel Faysse and Patrick Fernandes and Nuno M. Guerreiro and António Loison and Duarte M. Alves and Caio Corro and Nicolas Boizard and João Alves and Ricardo Rei and Pedro H. Martins and Antoni Bigata Casademunt and François Yvon and André F. T. Martins and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2402.00786},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Usage

This model is a pre-trained foundation model: it is not fine-tuned for chat and works best with few-shot prompting strategies.


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository is gated: accept the license on the model page and authenticate
# (for example with `huggingface-cli login`) before downloading.
model_name = "unbias/PasDePitiePourLeCroissantLLMBaseTest"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16).eval()

# Few-shot translation prompt: two solved English -> French pairs, then the sentence to translate.
inputs = tokenizer("I am so tired I could sleep right now. -> Je suis si fatigué que je pourrais m'endormir maintenant.\nHe is heading to the market. -> Il va au marché.\nWe are running on the beach. ->", return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_length=100, do_sample=True, top_p=0.95, top_k=60, temperature=0.3)
print(tokenizer.decode(tokens[0]))

# Few-shot completion prompt in French. add_special_tokens=True keeps any special tokens
# the tokenizer adds (such as BOS); pass add_special_tokens=False to drop them.
inputs = tokenizer("Capitales: France -> Paris, Italie -> Rome, Allemagne -> Berlin, Espagne ->", return_tensors="pt", add_special_tokens=True).to(model.device)
tokens = model.generate(**inputs, max_length=100, do_sample=True, top_p=0.95, top_k=60)
print(tokenizer.decode(tokens[0]))
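
Both prompts above follow the same few-shot pattern: a handful of solved `source -> target` pairs followed by an unsolved query ending in `->`. A small helper can assemble such prompts; the function below is a hypothetical convenience sketch, not part of the model's or the library's API.

```python
from typing import Iterable, Tuple

def build_few_shot_prompt(examples: Iterable[Tuple[str, str]], query: str,
                          sep: str = "\n", arrow: str = " -> ") -> str:
    """Join solved (source, target) pairs and leave the final query open for the model."""
    shots = [f"{src}{arrow}{tgt}" for src, tgt in examples]
    # The query ends with the arrow but no trailing space, mirroring the prompts above.
    shots.append(f"{query}{arrow.rstrip()}")
    return sep.join(shots)

prompt = build_few_shot_prompt(
    [
        ("I am so tired I could sleep right now.",
         "Je suis si fatigué que je pourrais m'endormir maintenant."),
        ("He is heading to the market.", "Il va au marché."),
    ],
    "We are running on the beach.",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` exactly as in the first example; use `sep=", "` for single-line prompts such as the capitals example.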