CHANGELOG
---------
v0.3.8:
- multiprocessing support (get_vocab and apply_bpe)
- progress bar for learn_bpe
- seed parameter for deterministic BPE dropout
- ignore some unicode line separators which would crash subword-nmt
v0.3.7:
- BPE dropout (Provilkov et al., 2019)
- more efficient glossaries (https://github.com/rsennrich/subword-nmt/pull/69)
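For readers unfamiliar with BPE dropout, the idea can be sketched in a few lines: while a word is being segmented, each applicable merge is skipped with probability `dropout`, so the same word may be split differently on different passes. The function below is a toy illustration and not the code in apply_bpe; the names `segment`, `merges`, and `rng` are hypothetical, end-of-word markers are left out, and only one pair is merged per step. Passing a seeded `random.Random` makes a run repeatable.

```python
import random


def segment(word, merges, dropout=0.0, rng=None):
    """Toy BPE segmentation with dropout (illustrative, not subword-nmt's code).

    `merges` is an ordered list of symbol pairs, highest priority first.
    """
    if rng is None:
        rng = random.Random()
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Keep adjacent pairs that are known merges and survive dropout.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks and not (dropout > 0 and rng.random() < dropout)
        ]
        if not candidates:
            break
        # Apply the highest-priority surviving merge (leftmost on ties).
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(segment("lower", merges))                # ['low', 'er']
print(segment("lower", merges, dropout=1.0))   # ['l', 'o', 'w', 'e', 'r']
# Same seed, same segmentation:
print(segment("lower", merges, 0.5, random.Random(7))
      == segment("lower", merges, 0.5, random.Random(7)))  # True
```

With `dropout=0.0` the function reduces to plain greedy BPE; with `dropout=1.0` every merge is dropped and the word falls back to characters.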
v0.3.6:
- fix to subword-bpe command encoding
v0.3.5:
- fix to subword-bpe command under Python 2
- wider support of --total-symbols argument
v0.3.4:
- segment_tokens method to improve library usability (https://github.com/rsennrich/subword-nmt/pull/52)
- support regex glossaries (https://github.com/rsennrich/subword-nmt/pull/56)
- allow unicode separators (https://github.com/rsennrich/subword-nmt/pull/57)
- new option --total-symbols in learn-bpe (commit 61ad8)
- fix documentation (best practices) (https://github.com/rsennrich/subword-nmt/pull/60)
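As a rough illustration of what a regex glossary means in practice, the sketch below splits a token around glossary matches and leaves the matched pieces untouched while everything else is handed to a segmenter. It is a sketch under assumptions, not the library's implementation: `isolate_glossaries` and `segment_with_glossaries` are hypothetical names, the patterns are assumed to contain no capturing groups, and `list` (one symbol per character) stands in for a real BPE segmenter.

```python
import re


def _combined(patterns):
    # One capturing group around all patterns, so re.split keeps the matches.
    return "(" + "|".join("(?:" + p + ")" for p in patterns) + ")"


def isolate_glossaries(token, patterns):
    """Split `token` so that every glossary match stands alone."""
    return [piece for piece in re.split(_combined(patterns), token) if piece]


def segment_with_glossaries(token, patterns, segment):
    """Segment `token`, passing glossary matches through unchanged."""
    matcher = re.compile(_combined(patterns))
    out = []
    for piece in isolate_glossaries(token, patterns):
        if matcher.fullmatch(piece):
            out.append(piece)            # protected glossary entry
        else:
            out.extend(segment(piece))   # everything else is segmented
    return out


print(isolate_glossaries("USA-based", ["USA"]))            # ['USA', '-based']
print(segment_with_glossaries("abc123", [r"\d+"], list))   # ['a', 'b', 'c', '123']
```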
v0.3:
- library is now installable via pip
- fix occasional problems with UTF-8 whitespace and newlines in learn_bpe and apply_bpe:
  - do not silently convert UTF-8 newline characters into "\n"
  - do not silently convert UTF-8 whitespace characters into " "
  - UTF-8 whitespace and newline characters are now considered part of a word, and segmented by BPE
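The kind of silent conversion described above is easy to reproduce with Python's built-in string methods. The snippet below is only an illustration of the general pitfall (it does not show the library's code): Unicode-aware splitting drops a no-break space and a line separator, while splitting on the ASCII space alone keeps them inside their words.

```python
# A line containing a no-break space (U+00A0) and a line separator (U+2028).
line = "foo\u00a0bar baz\u2028qux\n"

# str.split() with no argument treats both characters as separators,
# so they silently disappear from the token stream.
loose = line.split()

# Splitting on the ASCII space only keeps them inside their words,
# where a subword segmenter can still see them.
strict = line.rstrip("\r\n").split(" ")

# str.splitlines() also breaks on U+2028, turning one line into two.
as_lines = line.splitlines()

print(loose == ["foo", "bar", "baz", "qux"])          # True
print(strict == ["foo\u00a0bar", "baz\u2028qux"])     # True
print(as_lines == ["foo\u00a0bar baz", "qux"])        # True
```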
v0.2:
- different, more consistent handling of end-of-word token (commit a749a7) (https://github.com/rsennrich/subword-nmt/issues/19)
- allow passing of vocabulary and frequency threshold to apply_bpe.py, preventing the production of OOV (or rare) subword units (commit a00db)
- made learn_bpe.py deterministic (commit 4c54e)
- various changes to make handling of UTF more consistent between Python versions
- new command line arguments for apply_bpe.py:
  - '--glossaries' to prevent given strings from being affected by BPE
  - '--merges' to apply a subset of learned BPE operations
- new command line arguments for learn_bpe.py:
  - '--dict-input': rather than raw text file, interpret input as a frequency dictionary (as created by get_vocab.py).
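One common source of non-determinism when learning BPE is the choice between symbol pairs that are equally frequent. The sketch below shows one way to make that choice reproducible by breaking ties on the pair itself; it is an assumption-laden illustration, not a description of what commit 4c54e changed. The names `pair_counts` and `most_frequent_pair` are hypothetical, and `vocab` is a stand-in frequency dictionary mapping space-separated symbol sequences to counts.

```python
from collections import Counter


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for word, freq in vocab.items():
        symbols = word.split(" ")
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def most_frequent_pair(counts):
    # Compare (frequency, pair) tuples so that ties on frequency are
    # resolved by the pair's own ordering, not by dict iteration order.
    return max(counts, key=lambda pair: (counts[pair], pair))


vocab = {"a b": 2, "c d": 2, "e f": 1}
reordered = {"e f": 1, "c d": 2, "a b": 2}
print(most_frequent_pair(pair_counts(vocab)))      # ('c', 'd')
print(most_frequent_pair(pair_counts(reordered)))  # ('c', 'd')
```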
v0.1:
- consistent cross-version unicode handling
- all scripts are now deterministic