Space: flax-community / dalle-mini
PR #7 by Lacomtessezouboff, opened Jun 26, 2022
base: refs/heads/main ← from: refs/pr/7
Discussion · Files changed: +9471 −7823
initial commit
01d6e17e
Create README.md
68b5b511
Create README.md
b0b99201
pip and conda cpu install
bdaeebae
* Update requirements for TPU support.
d86c32f5
Updated the README based on our current strategy
bcac6952
Merge pull request #3 from khalidsaifullaah/patch-1
182f15a7
* Simplify requirements.txt, do not pin packages.
ccd00b66
Added CC3M data downloader script
75b01a0a
Merge pull request #4 from khalidsaifullaah/main
11c8be9b
* Update README with `-f` instructions for pip.
df7b7be1
Merge pull request #2 from pcuenca/main
8b9d1f58
CC3M downloader script updated
a8e4fc0d
CC12M downloader script added
3df3a47d
Merge pull request #7 from khalidsaifullaah/main
1055c3d4
* Initial encoding tests.
eb912a16
* JIT outside the loop.
4b8c3a87
* Ignore __pycache__
550b4727
* dalle_mini package with models and utilities:
150ed18d
* Notebook that processes CC12M and creates a version with encodings.
16f038a5
* Data preprocessing pipeline proof of concept.
95d2faf0
* Prepend [bos] to image encodings, rename to "labels".
86ba7743
feat: add run_seq2seq_flax
46cb01fa
feat: add seq2seq requirements
fad333f6
feat: adjust seq2seq script for dalle
3f0364c5
Merge pull request #8 from pcuenca/main
9c0e5c96
fix typos and update requirements
de74f116
use pylogging to refer to logging.
d9f5a351
val_max_target_length set to OUTPUT_LENGTH
6c27b0d8
accept tsv files as inputs.
a104edb2
Decoder: set eos to an unreachable value, set min_length=max_length to
a841a4ce
Preprocessing: return "labels", "decoder_input_ids" and
df3c7bd4
* Only perform validation if requested
32dc2d8e
* Make padding mask optional.
ecafe5e7
feat: log metrics more frequently
498559f0
feat: add adafactor
600ad79d
feat: default to 1000 warmup steps
b66b9510
fix: typo
833a2d58
Merge pull request #9 from borisdayma/feat--wandb-search
61f888fa
feat: padding mask not required
d61405b5
feat: simplify loss function
9db361a4
Merge pull request #10 from borisdayma/feat-loss
cbeacb9b
feat: gradient accumulation
c9e95757
fix: typos
5960e871
Merge pull request #11 from borisdayma/feat-cumul
ba73e00b
feat: lower default source length
48c07ca8
feat: add sweep for parameter search
dad6d938
Merge pull request #12 from borisdayma/feat-sweeps
06f1345c
fix: wandb logging with sync_tensorboard
8ba598c1
feat: update script
650ecb14
feat: update sweep parameters
2f69241b
text-heneration-notebook
1c2552a2
notebook example for model.generate
67221fce
Updated with train_file flag to resolve the error
8f058ae5
Merge pull request #13 from borisdayma/model-generate-notebook
894a546e
Merge pull request #14 from khalidsaifullaah/main
66bfb994
Move generate nb by @ghosh-r to demo
dcbf0919
fix: accumulation vs lr
4d55db6a
feat: update lr range
dbbd01a7
feat: shared cache folder
42ce7dd2
feat: requirements for tb logging
8884d407
Merge pull request #15 from borisdayma/feat-fix-lr
dc5ae57f
[WIP] Encoding YFC100M dataset.
b4dfea0a
fix: missing arg
bc01f788
feat: output_length considers bos and eos
8bb22368
feat: update default parameters
dbe8c41e
doc: fix comment
3073ff4f
feat: log model
1c44a7db
feat: fix typo
ec8d66b8
fix: typo
47bc7b92
feat: don't log model by default
5b79afd5
Merge pull request #16 from borisdayma/feat-log_model
bf4da913
fix: correct decoder_input_ids and labels
19946bea
fix: model config
0be49425
Merge pull request #17 from borisdayma/fix-model
357779ab
fix: typo
678a62f2
fix: labels array
6c1f112a
fix: should be converted to array
945d86c0
Shift tokens in numpy because the built in shift function stalls.
835ea55c
change bart-large-cnn to bart-large in demo folder
5801f139
Notebook to encode splitted YFCC100M files.
82fad8cd
Add eval_interval to evaluate and log every so often.
566d5f28
Merge pull request #18 from borisdayma/change-bart-large-demo
395641ff
Merge pull request #19 from pcuenca/main
f8b0895e
feat: hardcoded datasets
e8709a6e
feat: change default for quick tests
71c757b2
feat: use common wandb shared folder
7aa2f4ba
feat: no decay option
5a3211fc
Merge pull request #20 from borisdayma/eval-interval
635402df
feat: eval less often for faster training
f0a53acd
Merge pull request #21 from borisdayma/feat-no_decay
b29bab7d
feat: log everything through wandb
19070abb
feat: set default x-axis
97a008ec
Merge branch 'main'
3ddf1c50
feat: eval_steps already exists in TrainingArguments
0a0080bc
fix: eval_steps belongs to training_args
900136f3
feat: hardcode eval_steps
4c5e5a71
fix: log correct metrics
3fef9c16
fix: use correct key
b20769d3
Merge pull request #22 from borisdayma/feat-axis
a1c047bd
YFCC metadata cleaning and encoding script
2c2f5706
feat: use bart large
bb3bfa6f
feat: bye bye tensorboard
533b4948
feat: update test script
3cccb013
feat: split script for small and big runs
5e244d0c
feat: save model frequently
754f876d
fix: correct arg
283adc6e
fix: define function before it is used
d449092a
fix: log metadata
99a1ff5b
feat: use bart-large-cnn
19d68bb1
Merge pull request #24 from borisdayma/feat--log-model-frequently
648e404c
Merge pull request #23 from khalidsaifullaah/main
eb591ffc
fix: model config
5aaf9df6
demo for generation, including during training from wandb artifact
c48da338
Merge pull request #26 from tmabraham/generation-training-demo
fc8c2308
fix forced bos token, also applying BART model to 8 samples now
8d4e13c1
Merge pull request #27 from tmabraham/fix-forced-bos-token-on-demo
8f484d95
add tpu demo notebook
4b5a542d
remove .ipynb
c879290d
Merge pull request #28 from patil-suraj/tpu-demo
e31a84f6
feat: update model config + save optim
a30dbd39
feat: update scriptst
63249ac6
fix: output directory must exist
6e89e9e8
fix: import json
90320ea5
feat: use str mode for json
3e6ab1ff
feat: bigger warnings
62dad481
Merge pull request #25 from borisdayma/fix-config
d8111363
feat: hardcode our full dataset
499ddb28
feat: allow loading a model checkpoint
3d61350e
fix: typo
ca83cca7
fix: custom reference
e803feb4
fix: config used in preprocess
6d252e95
Score results using pre-trained CLIP.
5a9a1b6d
Merge pull request #29 from borisdayma/load_checkpoint
862924a4
Merge pull request #30 from pcuenca/clip-score
6567fd7c
add tokenizer save to wandb:
aecf3a76
Add missing import for CLIP.
7cc9b381
Use jax.device_count(), don't assume 8.
a11eff57
feat: model config not hardcoded
ad6ad646
Merge branch 'add-tokenizer-save' into feat-model
28f08be4
feat: restore state from checkpoint
4aced93a
fix: typo
a173dadc
fix: missing arg
f65ccb36
fix: typo and missing tokenizer files
09362db6
Merge pull request #32 from borisdayma/feat-model
699e1d97
Script to log predictions grid to wandb.
830d7a2f
Barebones demo app for local testing.
cb2ac60a
Add dalle_mini directory module.
adfe05ea
Update requirements
7158e2e3
Add a couple of sliders and prevent generating without a prompt
0e8338dc
Script that predicts using all saved versions of a model.
d7be08ce
doc: update README
5542365e
doc: reference vqgan-jax repo
cc3e85c2
chore: file not needed
231a81a2
chore: move requirements to correct section
469b2c2b
Merge pull request #36 from borisdayma/chore-cleanup
21a35bcd
add gradio demo
6b0d541f
Merge pull request #37 from tmabraham/add-gradio-demo
7e6e1fee
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
b7651853
Upgrade to model 4oh3u7ca for predictions.
810d65be
Gradio UI skeleton for experimentation.
f68e37a1
Attempts to tweak the UI.
8944cc51
Integrate current UI in demo app.
85eab146
Configure README to use gradio in the hf Space.
1b9d7ecb
Merge pull request #38 from borisdayma/app-ui
0497ad3c
doc: update README
00ed1abe
Merge branch 'main' into chore-cleanup2
fcac23ae
fix: report link
55068620
chore: move files around
31da1e5e
chore: remove duplicate files
b88a5231
feat: add symbolic link
1f05876e
feat: separate model definition
7f962d64
feat: less requirements are necessary
5afc82fb
Update demo to use Suraj's backend server.
704ee93a
Fix the number of candidates reported.
a61d80f8
Do not create share link.
eb6780a5
Merge pull request #39 from borisdayma/chore-cleanup2
5bf185b8
Get backend url from environment variable.
6be31592
Text modifications, make links unconditionally blue.
3584703f
Use network defaults.
50b9a440
Merge pull request #40 from borisdayma/app-ui
b49f529e
Get predictions from backend.
6e79248c
Merge pull request #41 from borisdayma/predictions
7f9514db
refactor: move `captioned_strip` to library.
f62b0451
Simple skeleton for a streamlit app
ffed1380
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
bc78bfd2
feat: remove hardcoded values
93c5ac82
doc: note about model definition
6c5fc6a6
feat: remove unused metrics
0d94b71f
feat: add logo
62e13ba3
Merge pull request #42 from borisdayma/chore-clean
7851774a
feat: allow display in multi lines
0dd7d807
feat(app): improve display
482963e0
Slight change on README
ec26f365
docs: update README
527f0c86
app sidebar + improvements
c1cfda43
Transparent logo
a710a16b
Transparent logo added in README
dfab06b4
Previous logo (non-transparent) removed
4a394f83
the prev. logo name reused
64003e8b
comments removed
2b5c84e0
Merge pull request #44 from khalidsaifullaah/main
6e12ba68
Merge pull request #45 from borisdayma/sidebar-improvement
720a5df1
Reuse text field for the various messages.
9f85e8c8
Use HTML to center the bottom of the sidebar.
32f68a8b
Use one column in the sidebar, center logo.
c286f153
Text changes.
89a2c739
Commit to using DALL·E instead of DALL-E
43255764
Add Again button after first generation.
0b4b9a3d
Merge pull request #47 from borisdayma/demo-improvements
5d39a05a
"Project Report" instead of "Report"
b5619d25
doc: update README
6fa11069
feat: purple is prettier
d4159c9e
Merge pull request #48 from borisdayma/fix-title
47f6891c
chore: reduce size of notebooks
0aab987d
fix: prevent empty search when Again is tapped and the field is empty.
588c97c1
Merge pull request #49 from borisdayma/demo-improvements
bbc3a60a
feat: update requirements
f23b8bf6
feat: update requirements
ffe0e3d0
streamlit session_state hack for v0.79
3f588191
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into main
c7357219
action: max file size
59f5b2a7
Break before sidebar links.
f95204e7
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
6a9e187a
fix: action max size
ebcaf781
Center sidebar description.
20495457
Merge pull request #50 from borisdayma/sidebar-center
d0b5d567
feat: name action
5b3faf15
fix: correct action dependency
27ef1ce5
feat: test action
32f50fe1
feat: push to hub
5d72414c
feat: correct access
71aa2912
Merge pull request #51 from borisdayma/fix-action
d8b128b9
fix(action): clone full repo
6b5023f3
feat: split actions between PR and main
bc7d169f
feat: rename action
ff46e1c2
feat: update action
197d48db
fix: action file size
b486ad2c
feat: simplify action
fe319354
doc: explain the logo
d8a21922
Update requirements.txt
9cb966be
Update requirements.txt
9f254057
Update app_gradio.py
adcb0639
Update app_gradio.py
139d8016
Ignore .streamlit, notebook checkpoints.
01fd46dd
Example JAX VQGAN notebook to show image encoding/decoding.
cb008a45
Add encoding example.
99ed4c60
Refactor: use VQGAN model from github, remove local copy
9851f428
Refactor: use VQGAN model from github, remove local copy
11ae5954
Merge pull request #59 from pcuenca/refactor/vqgan-jax
127e2309
Merge pull request #60 from borisdayma/refactor/vqgan-jax
cfb03b35
Merge pull request #58 from pcuenca/main
35406cd8
moved gradio files into app/gradio
bb2758c5
Reorganization: move JAX VQGAN notebook to dev
df5e2f06
Created using Colaboratory
1d8a7996
feat(wandb-examples): use model file
c7776fb7
Merge pull request #61 from pcuenca/main
d9f0f391
Merge branch 'main' into chore-mv
0d975e77
chore: reorganize files
ab5769aa
fix: action file size
910f8ebe
open external link in new tab prevening breaking of demo
851cea00
doc: add reference to inference pipeline in README
5da4af07
Merge pull request #65 from borisdayma/external-links-new-tab
440f9666
Merge pull request #63 from borisdayma/chore-mv
b5991361
feat: add tqdm
a8b4257a
feat: remove warnings
30878396
feat: link colab to correct branch
ac97ed48
fix: restore deleted paragraph
3d56f19a
fix: use correct branch
a6b4265a
Merge pull request #66 from borisdayma/feat-tqdm
2c857082
doc: simplify installation instructions
e994676e
Merge pull request #67 from borisdayma/doc-install
488c07c1
feat: add license
1286649e
Merge pull request #68 from borisdayma/fix-license
ba8d9f52
fix(colab): don't preprocess images twice in CLIP
ae3f013b
feat: remove warning
e627a218
fix: use correct branch
2dfcce84
Merge pull request #69 from borisdayma/fix-colab
2e2cd5d5
doc(README): fix typo
18f5a295
Merge branch 'main' into abidlabs/main
dccd804f
feat: update gradio app
a0b5dc71
fix: add symlink
6783773a
Merge pull request #55 from abidlabs/main
686197af
docs: add link to requirements
b7b2e315
feat(action): use exact HF spaces file size limit
f1d5a2e0
Merge pull request #70 from borisdayma/feat-filesize
1c9c679e
Fix requirements.txt to install libtpu from google's page.
65f52829
fix(seq2seq): opt_state from ckpt + limit cache
0c9ff657
feat: 🥑 theme
460f43af
doc(README): add link to models
753c4f01
feat: make dalle_mini installable
1df4fcb9
Merge pull request #73 from borisdayma/doc-models
da0ffc83
Merge pull request #72 from borisdayma/feat-theme
c794bb2c
fix: issue url
710c65b5
fix: actually replace state
1d04ab39
Normal install of transformers & datasets
a9ea330e
Merge pull request #75 from borisdayma/fix-dev-requirements
08b0ce1d
Add citation file
78abdf74
citations added
4844e749
Merge pull request #76 from borisdayma/citation
c91f919f
Merge branch 'main' of https://github.com/khalidsaifullaah/dalle-mini into readme-references
d7c3d1cb
DOI added
87a67a73
CITATION.cff update
c1d5af30
updated project's BibTex in README
2d777957
readme and CITATION doi synced
d93a40d7
doc: demo released
1ba7fc24
Merge pull request #79 from borisdayma/doc-demo
635b1bf8
Merge branch 'borisdayma:main' into readme-references
19a3f53b
Add progress indicator and warning.
bb7c400a
Merge pull request #80 from borisdayma/app-progress
6c108692
fix: typo
2bd938f0
Progress indicator follows current theme.
a7f2bba0
Remove session handling hack.
1985070c
Merge branch 'app-progress' into main
8be786e4
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
c9959cbd
Fix typo again.
813a4002
Revert "Remove session handling hack."
8e37fb8d
Merge pull request #78 from khalidsaifullaah/readme-references
97c79259
feat: action to debug app
47723e56
fix(action): typo
7fb274e1
Specify src refefence in sync_to_hub_debug action
325d2ee6
docs(README): update acknowledgements
ecf5f294
feat: remove symlinks
6e844036
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into feat-setup
b78c972a
fix: unused import
e05a13d1
fix(app): install dalle-mini
dbf86c96
fix: typo
7067c27d
Notebooks that demonstrate streaming encoding
6047b498
Replace notebooks with the correct versions.
3b508e3d
Merge pull request #85 from borisdayma/encoding-streaming
783de86a
feat: requirements not needed
8d594fd7
doc: update README
a8c579f6
Merge pull request #74 from borisdayma/feat-setup
27a34357
feat(inference_notebook): dalle-mini is installable
1c83da92
feat: add text utilities
1212a74e
Merge pull request #87 from borisdayma/feat-text
df2dbc7b
feat: add ftfy
a09ea254
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into fix-opt_state
39caefb2
feat: handle streaming
a96f44df
fix: remove breakpoint
b75e0e98
fix(seq2seq): use streaming arg
0c992bdb
fix(seq2seq): normalize text
061c06b4
feat: update defaults
9ed63789
feat: log epoch + check params
074c5e12
feat: limit artifacts size
7253e563
feat: no need for default values
a37cd751
Merge pull request #71 from borisdayma/fix-opt_state
77657e63
feat: use optax for gradient accumulation
69cf636e
feat: update scripts
5ca30e67
feat(gitignore): ignore compiled files and wandb
0b21fa51
add standalone modeling file
6197b2f6
Merge pull request #88 from borisdayma/feat-cumul
272552a0
fix(seq2seq): memory issue
708a42c5
feat: get rid of global_step + log more metrics
4a4820f6
Merge branch 'main' of https://github.com/borisdayma/dalle-mini
5faf0fdf
fix: state.step type
47e006f6
feat: log to backend
378a628b
feat: add functions
b8bbe685
feat: remove cache before creating artifacts
5f6b691c
feat: add scoring
353365f0
feat: create a table
38705a9e
refactor: loop over runs
bf3640df
feat: allow latest version only
ff051c95
feat: cleanup
2d169e35
fix: pmap clip32
9a553a44
fix: typo
c85fbb61
feat: add sample
1d51d0b8
feat: cleanup
91d8a296
Merge pull request #90 from borisdayma/feat-new
fdbe19f5
feat: more samples
2ef2966f
Merge pull request #91 from borisdayma/feat-inf
335110d0
feat: update wandb inference
dc792788
feat: add more samples
046ae753
feat: reorganize samples
0588e94c
remove bias and minor fixes
180ed1ef
add partition helpers
28563564
fix layernorm
77744836
add gradient checkpointing
95a8ed24
handle dtype for embeddings
29db3274
make checkpointing optional
f6c4cb2b
don't tie embeddings
a265819e
add property to get num params
e5a52b94
feat: add customization parameters
23b88701
feat: more samples
e3f152ea
Improving DALL·E mini.
cd4f7e9f
Improving DALL·E mini.
8c580762
Merge pull request #95 from Gertie01/patch-1
ce65c797
Merge pull request #96 from Gertie01/patch-2
a253eea9
feat: cleanup training script
3cd6d417
fix: comment
36cb7372
fix: OOM with checkpoints
e2400cc1
fix: OOM
86c6c90f
fix: duplicate samples
7b8c2cb4
Merge pull request #98 from borisdayma/feat-seq2seq
0cc04f20
feat: simplify fix_html
d054d1b9
feat: reorganize samples
b1aaa0f2
fix: typo
41b680bf
feat: rename variables
c7fe3801
feat: update samples
8444c1b1
feat: update samples
39d3a154
Merge pull request #93 from borisdayma/inference
e226ca6e
feat(text): use hf_hub for wiki word count
a96c3477
feat(text): few improvements
849c5f39
Merge pull request #104 from borisdayma/feat-hf_hub
9e08dc55
feat: keep %
229cdc00
feat(text): handle dates & prices
04656057
feat(text): improvements on pre-processing
7b58e88d
feat(text): more char
bf25d32a
doc: update README
931b52fa
fix(inference): update flax version (temporary fix)
6aa30f54
feat: use model definition
803c7df2
fix: correct use of dtype
b7d8724b
feat: simplify parameters
87fac28e
feat: add metrics + cleanup
6523a6d5
fix: log train_metric only if defined
9bf93976
fix: fixes training script
0df810d1
fix: comments
bab75aa5
feat: use custom TrainingArguments
85748ef4
feat: use_auth_token + seed for dataset and model
eac6890e
feat: update samples
321a5c2c
feat: update required libraries
c55ecf8a
Merge pull request #107 from borisdayma/feat-seq2seq
2816f986
feat: add cool samples
3a3bee8b
refactor: move to tools
4e4a30fe
feat: use pretrained weights
0a77f724
feat: avoid OOM
80b41d1d
feat(log_inference_samples): cleanup
cb127c45
fix: correct clip params
5b16588f
feat: don't ignore mismatched
2be9847f
Merge pull request #109 from borisdayma/feat-model_pretrained
6016fc0b
feat: simplify app
74974be2
Merge pull request #108 from borisdayma/feat-inf
24d30c9e
feat: add more samples
b3b48e39
feat: add samples
e6c2573e
feat: handle data in separate file
85c1b8ec
fix(data): minor bugs
0fe3e72e
Merge pull request #111 from borisdayma/feat-data
db7d5210
feat: reorganize app
d4e833e7
refactor: captioned_strip used only in gradio
0ca65141
feat(model): set default config for legacy models
92ccf4c8
chore: remove unused files
07eca3cc
chore: move files
46f7469e
feat: add dev dependencies
2a3b8df4
doc(README): update links
5f44d347
feat(train): merge logged dict
baa52db6
feat: add black action
5750492d
feat: update black action
d54a7bed
style: reformat per black
8db9ed43
feat: add isort action
8b80a793
doc: setup instructions
fb1fbcab
style: use isort
d2095476
feat: add Makefile
6f19a1fc
feat: install black and isort
d2ec1eae
feat: cleanup encode_dataset
ae754a31
style: reformat
741bf32e
Merge pull request #112 from borisdayma/’cleanup’
cbfa520d
feat: fix import error
3666d7b2
Merge pull request #114 from borisdayma/fix-110
caf7f44a
feat(sweep): update config
de21250d
feat(requirements): add pillow
26651ddd
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into add-custom-model
f234ccfe
fix(model): use correct params
a11892f9
fix: adjust training script + dataloader
a96f4dc5
style
6f1f2d98
refactor(model): inherit from HF Flax & simplify
972bc8d2
fix: causal_mask based on image tokens
8654dc99
feat: log num_params
1f57ad7c
fix(train): update model name
b257ca85
fix: update model name
61c93f23
fix(config): set min/max for generation
eb24dbcf
feat: minor improvements
53dade7f
feat(data): accept braceexpand notation
5ee6e608
style
a6252c9c
feat: split shards by host
ed93c8ab
feat(train): handle multi-hosts
5b533b5b
fix(data): type
c6ebb144
feat(setup): require braceexpand for dev
98f1db7a
feat: add config
3f13951a
fix: check local TPU instances only
87fed1b1
feat: display local TPU's
15993e35
feat: load data first
fdf7698e
feat: shard by host is optional
901ff720
feat: log more metrics
1b757dcb
fix: typo
5c849780
feat: allow abstract_init
772415c9
feat: create config files
dc5c024b
feat: update sweep
e1555d42
feat: add shampoo optimizer
0b874529
feat: update params
604a65d5
fix: shampoo -> distributed shampoo
edae62dd
feat: add micro config
e501f71d
fix: weight decay Adam + speed logging
71435938
feat: update distributed_shampoo
b90198c2
doc: add reference to Distributed Shampoo
db882b83
feat: add best_effort_memory_usage_reduction
4d518c7e
style: apply to distributed_shampoo
e669c1bd
style: isort
531cd787
feat: update inference pipeline
af807f72
Merge pull request #115 from borisdayma/feat-shampoo
3a3d3755
fix(inference): use float32 + flatten logits
71c4de38
Merge pull request #117 from borisdayma/fix-inference
ef985bed
doc: update contributions
e3b1b56f
feat: support pypi
f5dba1e3
fix: push_to_hub deprecated
23389f69
feat: refactor TrainingArguments
adbdff97
fix(train): handle seed_dataset
8b72ed8f
feat(train): refactor learning rate params
e2781bc2
fix(data): no shuffling of validation data
ddcbc6a3
feat: add more config of distributed_shampoo
89cf9ea5
fix: style
25862e8c
Merge pull request #118 from borisdayma/feat-optim
193c88c9
Override from_pretrained to support wandb artifacts.
1023afa4
Use model configuration unless a specific one is supplied.
5ec61ccc
Store resolved path after loading model.
55a631d3
Load tokenizer associated to the model checkpoint, if possible.
a77c0d42
Never consider local dirs as remote wandb references.
08dd0987
Update `resume_from_checkpoint` to use `from_pretrained`.
bb3f53ec
Update help string for `model_name_or_path`.
290e4435
Accept changes suggested by linter.
9f522b86
Fix import order to make isort happy.
64d99b29
Change import order again.
2b2be9be
feat(train): use MultiSteps for gradient accumulation
4fa53a55
fix: style
df01fa80
feat: custom gradient accumulation
2d075595
refactor(train): cleanup
274ba731
feat(data): support accumulation in non-streaming
88c8e062
Merge pull request #122 from borisdayma/feat-acccum
c91ceb7b
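The accumulation commits above move between `optax.MultiSteps` and a custom scheme. As a dependency-free sketch of the underlying idea (illustrative only; the function name and hyperparameters are made up, not from the repo), an optimizer step is applied once every k micro-batches on the mean of the accumulated gradients:

```python
def accumulate_and_step(param, grads_stream, every_k, lr):
    """Apply an SGD update only every `every_k` micro-batches, using the
    mean of the accumulated gradients (the behaviour optax.MultiSteps
    emulates). Plain-Python sketch on a single scalar parameter."""
    acc = 0.0
    for i, g in enumerate(grads_stream, start=1):
        acc += g
        if i % every_k == 0:
            param -= lr * (acc / every_k)  # one "real" step on the mean grad
            acc = 0.0
    return param

# Eight micro-batches of gradient 1.0, accumulated over 4 → two real steps.
w = accumulate_and_step(0.0, [1.0] * 8, every_k=4, lr=0.1)
# w == -0.2
```

This keeps the effective batch size large without holding all micro-batches in memory at once.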
feat(train): cleanup args
a2bf605c
fix(train): variable not defined
4c87adf9
Tokenizer, config, model can be loaded from wandb.
7e48337a
Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7f
feat(train): update sweep config
bbbf7c8c
Style (isort).
f9d51f77
Load from wandb artifact (#121)
f69b21b3
feat: use_artifact if run existing
a5ed1127
feat(train): start pjit support
0081723a
feat(train): progress on pjit
49597a20
feat(train): no batch dimension with pjit
df1fe19c
feat(train): different rng per node
2d212d88
feat(train): load model on CPU
3d435916
feat(model): clean way to load on cpu
12f323d7
feat(train): restore opt_state efficiently
1bfc1b5c
fix style
f044cb87
style: unsused import
7a176b9f
feat(train): use pjit (#125)
f5239e1f
feat(train): distributed_shampoo with pjit
cc34d07f
feat: update distributed_shampoo + fix None spec
8a9e367d
feat(train): handle distributed_shampoo in pjit
032f623d
feat(train): custom start_preconditioning_step
81499245
fix(train): consider correct batch size
b7c74586
feat(train): improve pjit speed
f2540583
fix(train): grads spec
00710bca
feat(pjit): follow t5x style
7b5868f5
feat(train): overhead from 70% to 1% 🥳
2b7f5f1d
Merge pull request #127 from borisdayma/pjit-t5x
e4401dde
feat(train): another 25% faster
14abe8c8
feat: use fast tokenizer
767d78ae
style(tokenizer): remove unused variables
605df32c
feat(train): split artifact into model/state
fa5b058b
fix(train): opt_state_shape for distributed_shampoo
225b6ff1
fix: style
386f839c
feat(train): split artifact into model/state (#128)
7c4c2870
feat(train): more custom x-axis
5f28cd25
feat: handle model parallel
1bb3269c
feat(train) - handle multiple nodes (#130)
0952927a
feat(modeling): simplify abstract_init
fa72aa72
feat: update distributed_shampoo
59966808
fix: distributed shampoo class
696422e4
feat: log num_parameters early
7cfe5766
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into main
0a691de6
fix: style
d4832944
fix: load from checkpoint
44b7c3e3
fix: typo
68cc185d
feat(train): use compilation cache
da9367c8
fix: position embedding for generate method
ebac3799
feat: improve inference demo
35fe5781
feat(demo): uncomment pip install
094e1787
feat: wandb required for checkpoints
38c2c4e2
feat: cleanup notebook
5a390e83
doc: update README
db5a22a2
feat(train): simplify tokenizer loading
4cb21dde
feat: restore weights on CPU
5f954fca
feat(demo): update reference
e5580003
feat: reduce artifact space + offset step
34cf91cb
feat(train): save to bucket
50498e68
feat: load from bucket
1c4e8392
feat: handle gradient checkpointing
5173ec7e
style: lint
d5d442a4
feat: add bucket reference to artifact
d368fb6f
feat(train): local jax cache
9f5e8794
fix(train): consider schedule offset
bc4734ff
feat(dev): require datasets
3d64598c
feat: update configs
79557f99
feat: no gradient checkpointing for params init
b798ed39
fix: no gradient checkpointing for new model
2e026834
feat: support pod (#139)
803ccbf4
feat(data): super conditioning (#141)
79398748
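"Super conditioning" in the commit above is, to the best of my understanding, a classifier-free-guidance-style blend of unconditioned and text-conditioned logits at generation time. A hedged sketch of that blending rule (illustrative names; not the repo's actual code):

```python
def super_condition(uncond_logits, cond_logits, scale):
    """Blend unconditioned and text-conditioned logits; scale > 1 pushes
    generation harder toward the prompt (classifier-free-guidance style).
    scale == 1 recovers the plain conditioned logits."""
    return [u + scale * (c - u) for u, c in zip(uncond_logits, cond_logits)]

blended = super_condition([0.0, 1.0], [1.0, 0.0], scale=2.0)
# blended == [2.0, -1.0]
```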
feat(train): log norm and histograms (#143)
b7b619a2
feat: implement transformer variants (#144)
542378c3
fix(textnormalizer): consider utf8 on windows (#148)
3b8d8cb0
feat: add cogview
472c4cc4
feat: update mini config
d9a16f22
feat: remove unecessary LN
02824a72
fix: DeepNet doesn't scale weights of embedding/output layers (#150)
503d6b48
feat: allow more configurations
5bd4c202
feat: force final ln in encoder
32f4ba55
feat: add mini_glu config
a7e50506
feat: placeholders for more config
69bcbeb8
feat: add sinkformer + custom final ln + pre-ln (#151)
f139b0be
feat(train): rename logged config
955dc20b
feat(train): google-cloud-storage is optional
02b2308f
feat(model): allow bias (#152)
361a994d
fix: sinkformer gradient
eed4896b
feat(demo): update model
b9a1a7dd
feat: sinkhorn in lse mode (#155)
00d46610
feat: allow relative position (#156)
769d20ac
fix: einops is required
179282e7
fix: support smelu
a2dcee42
fix: sinkformer
2c583b3c
feat: update shampoo
9ecdd3fb
feat(text): support emojis (#154)
7ef7bd92
Retire demo code, replace with links to new space.
5d07e741
feat: update reference
4a1f007d
Lacomtessezouboff · Jun 26, 2022: No description provided.
Create new file
6da2e11a
Cannot merge
This branch has merge conflicts in the following files:
README.md