Space: flax-community / dalle-mini
PR #7 by Lacomtessezouboff, opened Jun 26, 2022
base: refs/heads/main ← from: refs/pr/7
Discussion · Files changed: +9471 −7823
initial commit
01d6e17e
Create README.md
68b5b511
Create README.md
b0b99201
pip and conda cpu install
bdaeebae
* Update requirements for TPU support.
d86c32f5
Updated the README based on our current strategy
bcac6952
Merge pull request #3 from khalidsaifullaah/patch-1
182f15a7
* Simplify requirements.txt, do not pin packages.
ccd00b66
Added CC3M data downloader script
75b01a0a
Merge pull request #4 from khalidsaifullaah/main
11c8be9b
* Update README with `-f` instructions for pip.
df7b7be1
Merge pull request #2 from pcuenca/main
8b9d1f58
CC3M downloader script updated
a8e4fc0d
CC12M downloader script added
3df3a47d
Merge pull request #7 from khalidsaifullaah/main
1055c3d4
* Initial encoding tests.
eb912a16
* JIT outside the loop.
4b8c3a87
* Ignore __pycache__
550b4727
* dalle_mini package with models and utilities:
150ed18d
* Notebook that processes CC12M and creates a version with encodings.
16f038a5
* Data preprocessing pipeline proof of concept.
95d2faf0
* Prepend [bos] to image encodings, rename to "labels".
86ba7743
feat: add run_seq2seq_flax
46cb01fa
feat: add seq2seq requirements
fad333f6
feat: adjust seq2seq script for dalle
3f0364c5
Merge pull request #8 from pcuenca/main
9c0e5c96
fix typos and update requirements
de74f116
use pylogging to refer to logging.
d9f5a351
val_max_target_length set to OUTPUT_LENGTH
6c27b0d8
accept tsv files as inputs.
a104edb2
Decoder: set eos to an unreachable value, set min_length=max_length to
a841a4ce
Preprocessing: return "labels", "decoder_input_ids" and
df3c7bd4
* Only perform validation if requested
32dc2d8e
* Make padding mask optional.
ecafe5e7
feat: log metrics more frequently
498559f0
feat: add adafactor
600ad79d
feat: default to 1000 warmup steps
b66b9510
fix: typo
833a2d58
Merge pull request #9 from borisdayma/feat--wandb-search
61f888fa
feat: padding mask not required
d61405b5
feat: simplify loss function
9db361a4
Merge pull request #10 from borisdayma/feat-loss
cbeacb9b
feat: gradient accumulation
c9e95757
fix: typos
5960e871
Merge pull request #11 from borisdayma/feat-cumul
ba73e00b
feat: lower default source length
48c07ca8
feat: add sweep for parameter search
dad6d938
Merge pull request #12 from borisdayma/feat-sweeps
06f1345c
fix: wandb logging with sync_tensorboard
8ba598c1
feat: update script
650ecb14
feat: update sweep parameters
2f69241b
text-heneration-notebook
1c2552a2
notebook example for model.generate
67221fce
Updated with train_file flag to resolve the error
8f058ae5
Merge pull request #13 from borisdayma/model-generate-notebook
894a546e
Merge pull request #14 from khalidsaifullaah/main
66bfb994
Move generate nb by @ghosh-r to demo
dcbf0919
fix: accumulation vs lr
4d55db6a
feat: update lr range
dbbd01a7
feat: shared cache folder
42ce7dd2
feat: requirements for tb logging
8884d407
Merge pull request #15 from borisdayma/feat-fix-lr
dc5ae57f
[WIP] Encoding YFC100M dataset.
b4dfea0a
fix: missing arg
bc01f788
feat: output_length considers bos and eos
8bb22368
feat: update default parameters
dbe8c41e
doc: fix comment
3073ff4f
feat: log model
1c44a7db
feat: fix typo
ec8d66b8
fix: typo
47bc7b92
feat: don't log model by default
5b79afd5
Merge pull request #16 from borisdayma/feat-log_model
bf4da913
fix: correct decoder_input_ids and labels
19946bea
fix: model config
0be49425
Merge pull request #17 from borisdayma/fix-model
357779ab
fix: typo
678a62f2
fix: labels array
6c1f112a
fix: should be converted to array
945d86c0
Shift tokens in numpy because the built in shift function stalls.
835ea55c
change bart-large-cnn to bart-large in demo folder
5801f139
Notebook to encode splitted YFCC100M files.
82fad8cd
Add eval_interval to evaluate and log every so often.
566d5f28
Merge pull request #18 from borisdayma/change-bart-large-demo
395641ff
Merge pull request #19 from pcuenca/main
f8b0895e
feat: hardcoded datasets
e8709a6e
feat: change default for quick tests
71c757b2
feat: use common wandb shared folder
7aa2f4ba
feat: no decay option
5a3211fc
Merge pull request #20 from borisdayma/eval-interval
635402df
feat: eval less often for faster training
f0a53acd
Merge pull request #21 from borisdayma/feat-no_decay
b29bab7d
feat: log everything through wandb
19070abb
feat: set default x-axis
97a008ec
Merge branch 'main'
3ddf1c50
feat: eval_steps already exists in TrainingArguments
0a0080bc
fix: eval_steps belongs to training_args
900136f3
feat: hardcode eval_steps
4c5e5a71
fix: log correct metrics
3fef9c16
fix: use correct key
b20769d3
Merge pull request #22 from borisdayma/feat-axis
a1c047bd
YFCC metadata cleaning and encoding script
2c2f5706
feat: use bart large
bb3bfa6f
feat: bye bye tensorboard
533b4948
feat: update test script
3cccb013
feat: split script for small and big runs
5e244d0c
feat: save model frequently
754f876d
fix: correct arg
283adc6e
fix: define function before it is used
d449092a
fix: log metadata
99a1ff5b
feat: use bart-large-cnn
19d68bb1
Merge pull request #24 from borisdayma/feat--log-model-frequently
648e404c
Merge pull request #23 from khalidsaifullaah/main
eb591ffc
fix: model config
5aaf9df6
demo for generation, including during training from wandb artifact
c48da338
Merge pull request #26 from tmabraham/generation-training-demo
fc8c2308
fix forced bos token, also applying BART model to 8 samples now
8d4e13c1
Merge pull request #27 from tmabraham/fix-forced-bos-token-on-demo
8f484d95
add tpu demo notebook
4b5a542d
remove .ipynb
c879290d
Merge pull request #28 from patil-suraj/tpu-demo
e31a84f6
feat: update model config + save optim
a30dbd39
feat: update scriptst
63249ac6
fix: output directory must exist
6e89e9e8
fix: import json
90320ea5
feat: use str mode for json
3e6ab1ff
feat: bigger warnings
62dad481
Merge pull request #25 from borisdayma/fix-config
d8111363
feat: hardcode our full dataset
499ddb28
feat: allow loading a model checkpoint
3d61350e
fix: typo
ca83cca7
fix: custom reference
e803feb4
fix: config used in preprocess
6d252e95
Score results using pre-trained CLIP.
5a9a1b6d
Merge pull request #29 from borisdayma/load_checkpoint
862924a4
Merge pull request #30 from pcuenca/clip-score
6567fd7c
add tokenizer save to wandb:
aecf3a76
Add missing import for CLIP.
7cc9b381
Use jax.device_count(), don't assume 8.
a11eff57
feat: model config not hardcoded
ad6ad646
Merge branch 'add-tokenizer-save' into feat-model
28f08be4
feat: restore state from checkpoint
4aced93a
fix: typo
a173dadc
fix: missing arg
f65ccb36
fix: typo and missing tokenizer files
09362db6
Merge pull request #32 from borisdayma/feat-model
699e1d97
Script to log predictions grid to wandb.
830d7a2f
Barebones demo app for local testing.
cb2ac60a
Add dalle_mini directory module.
adfe05ea
Update requirements
7158e2e3
Add a couple of sliders and prevent generating without a prompt
0e8338dc
Script that predicts using all saved versions of a model.
d7be08ce
doc: update README
5542365e
doc: reference vqgan-jax repo
cc3e85c2
chore: file not needed
231a81a2
chore: move requirements to correct section
469b2c2b
Merge pull request #36 from borisdayma/chore-cleanup
21a35bcd
add gradio demo
6b0d541f
Merge pull request #37 from tmabraham/add-gradio-demo
7e6e1fee
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
b7651853
Upgrade to model 4oh3u7ca for predictions.
810d65be
Gradio UI skeleton for experimentation.
f68e37a1
Attempts to tweak the UI.
8944cc51
Integrate current UI in demo app.
85eab146
Configure README to use gradio in the hf Space.
1b9d7ecb
Merge pull request #38 from borisdayma/app-ui
0497ad3c
doc: update README
00ed1abe
Merge branch 'main' into chore-cleanup2
fcac23ae
fix: report link
55068620
chore: move files around
31da1e5e
chore: remove duplicate files
b88a5231
feat: add symbolic link
1f05876e
feat: separate model definition
7f962d64
feat: less requirements are necessary
5afc82fb
Update demo to use Suraj's backend server.
704ee93a
Fix the number of candidates reported.
a61d80f8
Do not create share link.
eb6780a5
Merge pull request #39 from borisdayma/chore-cleanup2
5bf185b8
Get backend url from environment variable.
6be31592
Text modifications, make links unconditionally blue.
3584703f
Use network defaults.
50b9a440
Merge pull request #40 from borisdayma/app-ui
b49f529e
Get predictions from backend.
6e79248c
Merge pull request #41 from borisdayma/predictions
7f9514db
refactor: move `captioned_strip` to library.
f62b0451
Simple skeleton for a streamlit app
ffed1380
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
bc78bfd2
feat: remove hardcoded values
93c5ac82
doc: note about model definition
6c5fc6a6
feat: remove unused metrics
0d94b71f
feat: add logo
62e13ba3
Merge pull request #42 from borisdayma/chore-clean
7851774a
feat: allow display in multi lines
0dd7d807
feat(app): improve display
482963e0
Slight change on README
ec26f365
docs: update README
527f0c86
app sidebar + improvements
c1cfda43
Transparent logo
a710a16b
Transparent logo added in README
dfab06b4
Previous logo (non-transparent) removed
4a394f83
the prev. logo name reused
64003e8b
comments removed
2b5c84e0
Merge pull request #44 from khalidsaifullaah/main
6e12ba68
Merge pull request #45 from borisdayma/sidebar-improvement
720a5df1
Reuse text field for the various messages.
9f85e8c8
Use HTML to center the bottom of the sidebar.
32f68a8b
Use one column in the sidebar, center logo.
c286f153
Text changes.
89a2c739
Commit to using DALL·E instead of DALL-E
43255764
Add Again button after first generation.
0b4b9a3d
Merge pull request #47 from borisdayma/demo-improvements
5d39a05a
"Project Report" instead of "Report"
b5619d25
doc: update README
6fa11069
feat: purple is prettier
d4159c9e
Merge pull request #48 from borisdayma/fix-title
47f6891c
chore: reduce size of notebooks
0aab987d
fix: prevent empty search when Again is tapped and the field is empty.
588c97c1
Merge pull request #49 from borisdayma/demo-improvements
bbc3a60a
feat: update requirements
f23b8bf6
feat: update requirements
ffe0e3d0
streamlit session_state hack for v0.79
3f588191
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into main
c7357219
action: max file size
59f5b2a7
Break before sidebar links.
f95204e7
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
6a9e187a
fix: action max size
ebcaf781
Center sidebar description.
20495457
Merge pull request #50 from borisdayma/sidebar-center
d0b5d567
feat: name action
5b3faf15
fix: correct action dependency
27ef1ce5
feat: test action
32f50fe1
feat: push to hub
5d72414c
feat: correct access
71aa2912
Merge pull request #51 from borisdayma/fix-action
d8b128b9
fix(action): clone full repo
6b5023f3
feat: split actions between PR and main
bc7d169f
feat: rename action
ff46e1c2
feat: update action
197d48db
fix: action file size
b486ad2c
feat: simplify action
fe319354
doc: explain the logo
d8a21922
Update requirements.txt
9cb966be
Update requirements.txt
9f254057
Update app_gradio.py
adcb0639
Update app_gradio.py
139d8016
Ignore .streamlit, notebook checkpoints.
01fd46dd
Example JAX VQGAN notebook to show image encoding/decoding.
cb008a45
Add encoding example.
99ed4c60
Refactor: use VQGAN model from github, remove local copy
9851f428
Refactor: use VQGAN model from github, remove local copy
11ae5954
Merge pull request #59 from pcuenca/refactor/vqgan-jax
127e2309
Merge pull request #60 from borisdayma/refactor/vqgan-jax
cfb03b35
Merge pull request #58 from pcuenca/main
35406cd8
moved gradio files into app/gradio
bb2758c5
Reorganization: move JAX VQGAN notebook to dev
df5e2f06
Created using Colaboratory
1d8a7996
feat(wandb-examples): use model file
c7776fb7
Merge pull request #61 from pcuenca/main
d9f0f391
Merge branch 'main' into chore-mv
0d975e77
chore: reorganize files
ab5769aa
fix: action file size
910f8ebe
open external link in new tab prevening breaking of demo
851cea00
doc: add reference to inference pipeline in README
5da4af07
Merge pull request #65 from borisdayma/external-links-new-tab
440f9666
Merge pull request #63 from borisdayma/chore-mv
b5991361
feat: add tqdm
a8b4257a
feat: remove warnings
30878396
feat: link colab to correct branch
ac97ed48
fix: restore deleted paragraph
3d56f19a
fix: use correct branch
a6b4265a
Merge pull request #66 from borisdayma/feat-tqdm
2c857082
doc: simplify installation instructions
e994676e
Merge pull request #67 from borisdayma/doc-install
488c07c1
feat: add license
1286649e
Merge pull request #68 from borisdayma/fix-license
ba8d9f52
fix(colab): don't preprocess images twice in CLIP
ae3f013b
feat: remove warning
e627a218
fix: use correct branch
2dfcce84
Merge pull request #69 from borisdayma/fix-colab
2e2cd5d5
doc(README): fix typo
18f5a295
Merge branch 'main' into abidlabs/main
dccd804f
feat: update gradio app
a0b5dc71
fix: add symlink
6783773a
Merge pull request #55 from abidlabs/main
686197af
docs: add link to requirements
b7b2e315
feat(action): use exact HF spaces file size limit
f1d5a2e0
Merge pull request #70 from borisdayma/feat-filesize
1c9c679e
Fix requirements.txt to install libtpu from google's page.
65f52829
fix(seq2seq): opt_state from ckpt + limit cache
0c9ff657
feat: 🥑 theme
460f43af
doc(README): add link to models
753c4f01
feat: make dalle_mini installable
1df4fcb9
Merge pull request #73 from borisdayma/doc-models
da0ffc83
Merge pull request #72 from borisdayma/feat-theme
c794bb2c
fix: issue url
710c65b5
fix: actually replace state
1d04ab39
Normal install of transformers & datasets
a9ea330e
Merge pull request #75 from borisdayma/fix-dev-requirements
08b0ce1d
Add citation file
78abdf74
citations added
4844e749
Merge pull request #76 from borisdayma/citation
c91f919f
Merge branch 'main' of https://github.com/khalidsaifullaah/dalle-mini into readme-references
d7c3d1cb
DOI added
87a67a73
CITATION.cff update
c1d5af30
updated project's BibTex in README
2d777957
readme and CITATION doi synced
d93a40d7
doc: demo released
1ba7fc24
Merge pull request #79 from borisdayma/doc-demo
635b1bf8
Merge branch 'borisdayma:main' into readme-references
19a3f53b
Add progress indicator and warning.
bb7c400a
Merge pull request #80 from borisdayma/app-progress
6c108692
fix: typo
2bd938f0
Progress indicator follows current theme.
a7f2bba0
Remove session handling hack.
1985070c
Merge branch 'app-progress' into main
8be786e4
Merge branch 'main' of github.com:borisdayma/dalle-mini into main
c9959cbd
Fix typo again.
813a4002
Revert "Remove session handling hack."
8e37fb8d
Merge pull request #78 from khalidsaifullaah/readme-references
97c79259
feat: action to debug app
47723e56
fix(action): typo
7fb274e1
Specify src refefence in sync_to_hub_debug action
325d2ee6
docs(README): update acknowledgements
ecf5f294
feat: remove symlinks
6e844036
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into feat-setup
b78c972a
fix: unused import
e05a13d1
fix(app): install dalle-mini
dbf86c96
fix: typo
7067c27d
Notebooks that demonstrate streaming encoding
6047b498
Replace notebooks with the correct versions.
3b508e3d
Merge pull request #85 from borisdayma/encoding-streaming
783de86a
feat: requirements not needed
8d594fd7
doc: update README
a8c579f6
Merge pull request #74 from borisdayma/feat-setup
27a34357
feat(inference_notebook): dalle-mini is installable
1c83da92
feat: add text utilities
1212a74e
Merge pull request #87 from borisdayma/feat-text
df2dbc7b
feat: add ftfy
a09ea254
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into fix-opt_state
39caefb2
feat: handle streaming
a96f44df
fix: remove breakpoint
b75e0e98
fix(seq2seq): use streaming arg
0c992bdb
fix(seq2seq): normalize text
061c06b4
feat: update defaults
9ed63789
feat: log epoch + check params
074c5e12
feat: limit artifacts size
7253e563
feat: no need for default values
a37cd751
Merge pull request #71 from borisdayma/fix-opt_state
77657e63
feat: use optax for gradient accumulation
69cf636e
feat: update scripts
5ca30e67
feat(gitignore): ignore compiled files and wandb
0b21fa51
add standalone modeling file
6197b2f6
Merge pull request #88 from borisdayma/feat-cumul
272552a0
fix(seq2seq): memory issue
708a42c5
feat: get rid of global_step + log more metrics
4a4820f6
Merge branch 'main' of https://github.com/borisdayma/dalle-mini
5faf0fdf
fix: state.step type
47e006f6
feat: log to backend
378a628b
feat: add functions
b8bbe685
feat: remove cache before creating artifacts
5f6b691c
feat: add scoring
353365f0
feat: create a table
38705a9e
refactor: loop over runs
bf3640df
feat: allow latest version only
ff051c95
feat: cleanup
2d169e35
fix: pmap clip32
9a553a44
fix: typo
c85fbb61
feat: add sample
1d51d0b8
feat: cleanup
91d8a296
Merge pull request #90 from borisdayma/feat-new
fdbe19f5
feat: more samples
2ef2966f
Merge pull request #91 from borisdayma/feat-inf
335110d0
feat: update wandb inference
dc792788
feat: add more samples
046ae753
feat: reorganize samples
0588e94c
remove bias and minor fixes
180ed1ef
add partition helpers
28563564
fix layernorm
77744836
add gradient checkpointing
95a8ed24
handle dtype for embeddings
29db3274
make checkpointing optional
f6c4cb2b
don't tie embeddings
a265819e
add property to get num params
e5a52b94
feat: add customization parameters
23b88701
feat: more samples
e3f152ea
Improving DALL·E mini.
cd4f7e9f
Improving DALL·E mini.
8c580762
Merge pull request #95 from Gertie01/patch-1
ce65c797
Merge pull request #96 from Gertie01/patch-2
a253eea9
feat: cleanup training script
3cd6d417
fix: comment
36cb7372
fix: OOM with checkpoints
e2400cc1
fix: OOM
86c6c90f
fix: duplicate samples
7b8c2cb4
Merge pull request #98 from borisdayma/feat-seq2seq
0cc04f20
feat: simplify fix_html
d054d1b9
feat: reorganize samples
b1aaa0f2
fix: typo
41b680bf
feat: rename variables
c7fe3801
feat: update samples
8444c1b1
feat: update samples
39d3a154
Merge pull request #93 from borisdayma/inference
e226ca6e
feat(text): use hf_hub for wiki word count
a96c3477
feat(text): few improvements
849c5f39
Merge pull request #104 from borisdayma/feat-hf_hub
9e08dc55
feat: keep %
229cdc00
feat(text): handle dates & prices
04656057
feat(text): improvements on pre-processing
7b58e88d
feat(text): more char
bf25d32a
doc: update README
931b52fa
fix(inference): update flax version (temporary fix)
6aa30f54
feat: use model definition
803c7df2
fix: correct use of dtype
b7d8724b
feat: simplify parameters
87fac28e
feat: add metrics + cleanup
6523a6d5
fix: log train_metric only if defined
9bf93976
fix: fixes training script
0df810d1
fix: comments
bab75aa5
feat: use custom TrainingArguments
85748ef4
feat: use_auth_token + seed for dataset and model
eac6890e
feat: update samples
321a5c2c
feat: update required libraries
c55ecf8a
Merge pull request #107 from borisdayma/feat-seq2seq
2816f986
feat: add cool samples
3a3bee8b
refactor: move to tools
4e4a30fe
feat: use pretrained weights
0a77f724
feat: avoid OOM
80b41d1d
feat(log_inference_samples): cleanup
cb127c45
fix: correct clip params
5b16588f
feat: don't ignore mismatched
2be9847f
Merge pull request #109 from borisdayma/feat-model_pretrained
6016fc0b
feat: simplify app
74974be2
Merge pull request #108 from borisdayma/feat-inf
24d30c9e
feat: add more samples
b3b48e39
feat: add samples
e6c2573e
feat: handle data in separate file
85c1b8ec
fix(data): minor bugs
0fe3e72e
Merge pull request #111 from borisdayma/feat-data
db7d5210
feat: reorganize app
d4e833e7
refactor: captioned_strip used only in gradio
0ca65141
feat(model): set default config for legacy models
92ccf4c8
chore: remove unused files
07eca3cc
chore: move files
46f7469e
feat: add dev dependencies
2a3b8df4
doc(README): update links
5f44d347
feat(train): merge logged dict
baa52db6
feat: add black action
5750492d
feat: update black action
d54a7bed
style: reformat per black
8db9ed43
feat: add isort action
8b80a793
doc: setup instructions
fb1fbcab
style: use isort
d2095476
feat: add Makefile
6f19a1fc
feat: install black and isort
d2ec1eae
feat: cleanup encode_dataset
ae754a31
style: reformat
741bf32e
Merge pull request #112 from borisdayma/’cleanup’
cbfa520d
feat: fix import error
3666d7b2
Merge pull request #114 from borisdayma/fix-110
caf7f44a
feat(sweep): update config
de21250d
feat(requirements): add pillow
26651ddd
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into add-custom-model
f234ccfe
fix(model): use correct params
a11892f9
fix: adjust training script + dataloader
a96f4dc5
style
6f1f2d98
refactor(model): inherit from HF Flax & simplify
972bc8d2
fix: causal_mask based on image tokens
8654dc99
feat: log num_params
1f57ad7c
fix(train): update model name
b257ca85
fix: update model name
61c93f23
fix(config): set min/max for generation
eb24dbcf
feat: minor improvements
53dade7f
feat(data): accept braceexpand notation
5ee6e608
style
a6252c9c
feat: split shards by host
ed93c8ab
feat(train): handle multi-hosts
5b533b5b
fix(data): type
c6ebb144
feat(setup): require braceexpand for dev
98f1db7a
feat: add config
3f13951a
fix: check local TPU instances only
87fed1b1
feat: display local TPU's
15993e35
feat: load data first
fdf7698e
feat: shard by host is optional
901ff720
feat: log more metrics
1b757dcb
fix: typo
5c849780
feat: allow abstract_init
772415c9
feat: create config files
dc5c024b
feat: update sweep
e1555d42
feat: add shampoo optimizer
0b874529
feat: update params
604a65d5
fix: shampoo -> distributed shampoo
edae62dd
feat: add micro config
e501f71d
fix: weight decay Adam + speed logging
71435938
feat: update distributed_shampoo
b90198c2
doc: add reference to Distributed Shampoo
db882b83
feat: add best_effort_memory_usage_reduction
4d518c7e
style: apply to distributed_shampoo
e669c1bd
style: isort
531cd787
feat: update inference pipeline
af807f72
Merge pull request #115 from borisdayma/feat-shampoo
3a3d3755
fix(inference): use float32 + flatten logits
71c4de38
Merge pull request #117 from borisdayma/fix-inference
ef985bed
doc: update contributions
e3b1b56f
feat: support pypi
f5dba1e3
fix: push_to_hub deprecated
23389f69
feat: refactor TrainingArguments
adbdff97
fix(train): handle seed_dataset
8b72ed8f
feat(train): refactor learning rate params
e2781bc2
fix(data): no shuffling of validation data
ddcbc6a3
feat: add more config of distributed_shampoo
89cf9ea5
fix: style
25862e8c
Merge pull request #118 from borisdayma/feat-optim
193c88c9
Override from_pretrained to support wandb artifacts.
1023afa4
Use model configuration unless a specific one is supplied.
5ec61ccc
Store resolved path after loading model.
55a631d3
Load tokenizer associated to the model checkpoint, if possible.
a77c0d42
Never consider local dirs as remote wandb references.
08dd0987
Update `resume_from_checkpoint` to use `from_pretrained`.
bb3f53ec
Update help string for `model_name_or_path`.
290e4435
Accept changes suggested by linter.
9f522b86
Fix import order to make isort happy.
64d99b29
Change import order again.
2b2be9be
feat(train): use MultiSteps for gradient accumulation
4fa53a55
fix: style
df01fa80
feat: custom gradient accumulation
2d075595
refactor(train): cleanup
274ba731
feat(data): support accumulation in non-streaming
88c8e062
Merge pull request #122 from borisdayma/feat-acccum
c91ceb7b
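The accumulation commits above move between `optax.MultiSteps` and a custom scheme. As a dependency-free sketch of the underlying idea (illustrative only; the function name and hyperparameters are made up, not from the repo), an optimizer step is applied once every k micro-batches on the mean of the accumulated gradients:

```python
def accumulate_and_step(param, grads_stream, every_k, lr):
    """Apply an SGD update only every `every_k` micro-batches, using the
    mean of the accumulated gradients (the behaviour optax.MultiSteps
    emulates). Plain-Python sketch on a single scalar parameter."""
    acc = 0.0
    for i, g in enumerate(grads_stream, start=1):
        acc += g
        if i % every_k == 0:
            param -= lr * (acc / every_k)  # one "real" step on the mean grad
            acc = 0.0
    return param

# Eight micro-batches of gradient 1.0, accumulated over 4 → two real steps.
w = accumulate_and_step(0.0, [1.0] * 8, every_k=4, lr=0.1)
# w == -0.2
```

This keeps the effective batch size large without holding all micro-batches in memory at once.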
feat(train): cleanup args
a2bf605c
fix(train): variable not defined
4c87adf9
Tokenizer, config, model can be loaded from wandb.
7e48337a
Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7f
feat(train): update sweep config
bbbf7c8c
Style (isort).
f9d51f77
Load from wandb artifact (#121)
f69b21b3
feat: use_artifact if run existing
a5ed1127
feat(train): start pjit support
0081723a
feat(train): progress on pjit
49597a20
feat(train): no batch dimension with pjit
df1fe19c
feat(train): different rng per node
2d212d88
feat(train): load model on CPU
3d435916
feat(model): clean way to load on cpu
12f323d7
feat(train): restore opt_state efficiently
1bfc1b5c
fix style
f044cb87
style: unsused import
7a176b9f
feat(train): use pjit (#125)
f5239e1f
feat(train): distributed_shampoo with pjit
cc34d07f
feat: update distributed_shampoo + fix None spec
8a9e367d
feat(train): handle distributed_shampoo in pjit
032f623d
feat(train): custom start_preconditioning_step
81499245
fix(train): consider correct batch size
b7c74586
feat(train): improve pjit speed
f2540583
fix(train): grads spec
00710bca
feat(pjit): follow t5x style
7b5868f5
feat(train): overhead from 70% to 1% 🥳
2b7f5f1d
Merge pull request #127 from borisdayma/pjit-t5x
e4401dde
feat(train): another 25% faster
14abe8c8
feat: use fast tokenizer
767d78ae
style(tokenizer): remove unused variables
605df32c
feat(train): split artifact into model/state
fa5b058b
fix(train): opt_state_shape for distributed_shampoo
225b6ff1
fix: style
386f839c
feat(train): split artifact into model/state (#128)
7c4c2870
feat(train): more custom x-axis
5f28cd25
feat: handle model parallel
1bb3269c
feat(train) - handle multiple nodes (#130)
0952927a
feat(modeling): simplify abstract_init
fa72aa72
feat: update distributed_shampoo
59966808
fix: distributed shampoo class
696422e4
feat: log num_parameters early
7cfe5766
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into main
0a691de6
fix: style
d4832944
fix: load from checkpoint
44b7c3e3
fix: typo
68cc185d
feat(train): use compilation cache
da9367c8
fix: position embedding for generate method
ebac3799
feat: improve inference demo
35fe5781
feat(demo): uncomment pip install
094e1787
feat: wandb required for checkpoints
38c2c4e2
feat: cleanup notebook
5a390e83
doc: update README
db5a22a2
feat(train): simplify tokenizer loading
4cb21dde
feat: restore weights on CPU
5f954fca
feat(demo): update reference
e5580003
feat: reduce artifact space + offset step
34cf91cb
feat(train): save to bucket
50498e68
feat: load from bucket
1c4e8392
feat: handle gradient checkpointing
5173ec7e
style: lint
d5d442a4
feat: add bucket reference to artifact
d368fb6f
feat(train): local jax cache
9f5e8794
fix(train): consider schedule offset
bc4734ff
feat(dev): require datasets
3d64598c
feat: update configs
79557f99
feat: no gradient checkpointing for params init
b798ed39
fix: no gradient checkpointing for new model
2e026834
feat: support pod (#139)
803ccbf4
feat(data): super conditioning (#141)
79398748
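"Super conditioning" in the commit above is, to the best of my understanding, a classifier-free-guidance-style blend of unconditioned and text-conditioned logits at generation time. A hedged sketch of that blending rule (illustrative names; not the repo's actual code):

```python
def super_condition(uncond_logits, cond_logits, scale):
    """Blend unconditioned and text-conditioned logits; scale > 1 pushes
    generation harder toward the prompt (classifier-free-guidance style).
    scale == 1 recovers the plain conditioned logits."""
    return [u + scale * (c - u) for u, c in zip(uncond_logits, cond_logits)]

blended = super_condition([0.0, 1.0], [1.0, 0.0], scale=2.0)
# blended == [2.0, -1.0]
```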
feat(train): log norm and histograms (#143)
b7b619a2
feat: implement transformer variants (#144)
542378c3
fix(textnormalizer): consider utf8 on windows (#148)
3b8d8cb0
feat: add cogview
472c4cc4
feat: update mini config
d9a16f22
feat: remove unecessary LN
02824a72
fix: DeepNet doesn't scale weights of embedding/output layers (#150)
503d6b48
feat: allow more configurations
5bd4c202
feat: force final ln in encoder
32f4ba55
feat: add mini_glu config
a7e50506
feat: placeholders for more config
69bcbeb8
feat: add sinkformer + custom final ln + pre-ln (#151)
f139b0be
feat(train): rename logged config
955dc20b
feat(train): google-cloud-storage is optional
02b2308f
feat(model): allow bias (#152)
361a994d
fix: sinkformer gradient
eed4896b
feat(demo): update model
b9a1a7dd
feat: sinkhorn in lse mode (#155)
00d46610
feat: allow relative position (#156)
769d20ac
fix: einops is required
179282e7
fix: support smelu
a2dcee42
fix: sinkformer
2c583b3c
feat: update shampoo
9ecdd3fb
feat(text): support emojis (#154)
7ef7bd92
Retire demo code, replace with links to new space.
5d07e741
feat: update reference
4a1f007d
Lacomtessezouboff · Jun 26, 2022: No description provided.
Create new file
6da2e11a
Cannot merge
This branch has merge conflicts in the following files:
README.md