---
license: apache-2.0
language:
- multilingual
- en
- ru
- es
- fr
- de
- it
- pt
- pl
- nl
- vi
- tr
- sv
- id
- ro
- cs
- zh
- hu
- ja
- th
- fi
- fa
- uk
- da
- el
- 'no'
- bg
- sk
- ko
- ar
- lt
- ca
- sl
- he
- et
- lv
- hi
- sq
- ms
- az
- sr
- ta
- hr
- kk
- is
- ml
- mr
- te
- af
- gl
- fil
- be
- mk
- eu
- bn
- ka
- mn
- bs
- uz
- ur
- sw
- yue
- ne
- kn
- kaa
- gu
- si
- cy
- eo
- la
- hy
- ky
- tg
- ga
- mt
- my
- km
- tt
- so
- ku
- ps
- pa
- rw
- lo
- ha
- dv
- fy
- lb
- ckb
- mg
- gd
- am
- ug
- ht
- grc
- hmn
- sd
- jv
- mi
- tk
- ceb
- yi
- ba
- fo
- or
- xh
- su
- kl
- ny
- sm
- sn
- co
- zu
- ig
- yo
- pap
- st
- haw
- as
- oc
- cv
- lus
- tet
- gsw
- sah
- br
- rm
- sa
- bo
- om
- se
- ce
- cnh
- ilo
- hil
- udm
- os
- lg
- ti
- vec
- ts
- tyv
- kbd
- ee
- iba
- av
- kha
- to
- tn
- nso
- fj
- zza
- ak
- ada
- otq
- dz
- bua
- cfm
- ln
- chm
- gn
- krc
- wa
- hif
- yua
- srn
- war
- rom
- bik
- pam
- sg
- lu
- ady
- kbp
- syr
- ltg
- myv
- iso
- kac
- bho
- ay
- kum
- qu
- za
- pag
- ngu
- ve
- pck
- zap
- tyz
- hui
- bbc
- tzo
- tiv
- ksd
- gom
- min
- ang
- nhe
- bgp
- nzi
- nnb
- nv
- zxx
- bci
- kv
- new
- mps
- alt
- meu
- bew
- fon
- iu
- abt
- mgh
- mnw
- tvl
- dov
- tlh
- ho
- kw
- mrj
- meo
- crh
- mbt
- emp
- ace
- ium
- mam
- gym
- mai
- crs
- pon
- ubu
- fip
- quc
- gv
- kj
- btx
- ape
- chk
- rcf
- shn
- tzh
- mdf
- ppk
- ss
- gag
- cab
- kri
- seh
- ibb
- tbz
- bru
- enq
- ach
- cuk
- kmb
- wo
- kek
- qub
- tab
- bts
- kos
- rwo
- cak
- tuc
- bum
- cjk
- gil
- stq
- tsg
- quh
- mak
- arn
- ban
- jiv
- sja
- yap
- tcy
- toj
- twu
- xal
- amu
- rmc
- hus
- nia
- kjh
- bm
- guh
- mas
- acf
- dtp
- ksw
- bzj
- din
- zne
- mad
- msi
- mag
- mkn
- kg
- lhu
- ch
- qvi
- mh
- djk
- sus
- mfe
- srm
- dyu
- ctu
- gui
- pau
- inb
- bi
- mni
- guc
- jam
- wal
- jac
- bas
- gor
- skr
- nyu
- noa
- sda
- gub
- nog
- cni
- teo
- tdx
- sxn
- rki
- nr
- frp
- alz
- taj
- lrc
- cce
- rn
- jvn
- hvn
- nij
- dwr
- izz
- msm
- bus
- ktu
- chr
- maz
- tzj
- suz
- knj
- bim
- gvl
- bqc
- tca
- pis
- prk
- laj
- mel
- qxr
- niq
- ahk
- shp
- hne
- spp
- koi
- krj
- quf
- luz
- agr
- tsc
- mqy
- gof
- gbm
- miq
- dje
- awa
- bjj
- qvz
- sjp
- tll
- raj
- kjg
- bgz
- quy
- cbk
- akb
- oj
- ify
- mey
- ks
- cac
- brx
- qup
- syl
- jax
- ff
- ber
- tks
- trp
- mrw
- adh
- smt
- srr
- ffm
- qvc
- mtr
- ann
- kaa
- aa
- noe
- nut
- gyn
- kwi
- xmm
- msb
tags:
- ctranslate2
- quantization
- int8
- float16
- madlad400
---

# madlad400-7b-mt-bt model for CTranslate2

**This model is a quantized version of [jbochi/madlad400-7b-mt-bt](https://huggingface.co/jbochi/madlad400-7b-mt-bt), converted with int8_float16 quantization for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).**

**madlad400 is a multilingual machine translation model based on the T5 architecture, introduced by Google DeepMind and Google Research in September 2023. It was trained on 250 billion tokens covering over 450 languages using publicly available data. The paper is titled "MADLAD-400: A Multilingual And Document-Level Large Audited Dataset" ([arXiv:2309.04662](https://arxiv.org/abs/2309.04662)).**

**madlad400-7b-mt-bt is a version of the 7.2B-parameter model fine-tuned on back-translated data. The authors state in the [paper](https://arxiv.org/pdf/2309.04662.pdf):**

> While this setup is very likely sub-optimal, we see that back-translation
> greatly improves en2xx translation (by 3.0 chrf, in the case of Flores-200) in most cases.

## Conversion details

The original model was converted in December 2023 with the following command:

```
ct2-transformers-converter --model jbochi/madlad400-7b-mt-bt --quantization int8_float16 --output_dir madlad400-7b-mt-bt-ct2-int8_float16 \
--copy_files added_tokens.json generation_config.json model.safetensors.index.json shared_vocabulary.json special_tokens_map.json spiece.model tokenizer.json tokenizer_config.json
```

## Example

This example code is adapted from [CTranslate2_transformers](https://opennmt.net/CTranslate2/guides/transformers.html#t5).
More detailed information about the `translate_batch` method can be found at [CTranslate2_Translator.translate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Translator.html#ctranslate2.Translator.translate_batch).

```python
import ctranslate2
import transformers

# CTranslate2 loads the model from a local directory, so this path must
# point to a local copy of the converted model.
translator = ctranslate2.Translator("avans06/madlad400-7b-mt-bt-ct2-int8_float16", compute_type="auto")
tokenizer = transformers.AutoTokenizer.from_pretrained("jbochi/madlad400-7b-mt-bt")

# The "<2zh>" prefix selects the target language (here: Chinese).
prefix = "<2zh> "
input_text = "Who is Alan Turing?"
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prefix + input_text))

results = translator.translate_batch([input_tokens])

# Take the best hypothesis and convert its tokens back to text.
output_tokens = results[0].hypotheses[0]
output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
print(output_text)
```
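
To translate several sentences at once, the steps in the example can be wrapped in a small helper. The following is a minimal sketch, not part of CTranslate2 or the original model card: the function names `add_target_prefix` and `translate_texts` are hypothetical, and it assumes that other target languages follow the same `<2xx>` prefix pattern as the `<2zh>` prefix used in the example.

```python
from typing import List


def add_target_prefix(text: str, target_lang: str) -> str:
    """Prepend a target-language tag such as '<2zh> ' (assumed '<2xx>' pattern)."""
    return f"<2{target_lang}> {text}"


def translate_texts(translator, tokenizer, texts: List[str], target_lang: str) -> List[str]:
    """Translate several texts into one target language in a single batch.

    `translator` is expected to behave like ctranslate2.Translator and
    `tokenizer` like the Hugging Face tokenizer from the example.
    Both helper names are hypothetical and used for illustration only.
    """
    # Tokenize every prefixed input into the string tokens CTranslate2 expects.
    batch = [
        tokenizer.convert_ids_to_tokens(
            tokenizer.encode(add_target_prefix(text, target_lang))
        )
        for text in texts
    ]
    results = translator.translate_batch(batch)
    # Keep the best hypothesis of each result and decode it back to text.
    return [
        tokenizer.decode(tokenizer.convert_tokens_to_ids(result.hypotheses[0]))
        for result in results
    ]
```

With the `translator` and `tokenizer` objects from the example, calling `translate_texts(translator, tokenizer, ["Who is Alan Turing?"], "zh")` would return a list containing one translated string.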