
	CHANGELOG
	---------

	v0.3.8:
	- multiprocessing support (get_vocab and apply_bpe)
	- progress bar for learn_bpe
	- seed parameter for deterministic BPE dropout
	- ignore some Unicode line separators that would otherwise crash subword-nmt

	v0.3.7:
	- BPE dropout (Provilkov et al., 2019)
	- more efficient glossaries (https://github.com/rsennrich/subword-nmt/pull/69)
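
	The BPE dropout entry (v0.3.7) and the seed parameter listed for v0.3.8 can be illustrated with a minimal, self-contained sketch. It is not subword-nmt's implementation or API: `segment_word` and the toy merge table are hypothetical, words are split into plain characters without an end-of-word marker, and one merge is applied per step. At each step, every applicable merge is skipped independently with probability `dropout`, and the best-ranked surviving merge is applied. Seeding the random generator makes the dropout segmentation reproducible.

```python
import random
from typing import Dict, List, Optional, Tuple


def segment_word(
    word: str,
    merges: Dict[Tuple[str, str], int],
    dropout: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Segment one word with BPE merges, optionally applying BPE dropout.

    Hypothetical helper: `merges` maps a symbol pair to its rank (lower rank =
    learned earlier = applied first). No end-of-word marker is used here.
    """
    rng = rng if rng is not None else random.Random()
    symbols = list(word)
    while len(symbols) > 1:
        # Candidate merges at every position; with dropout, each one is
        # skipped independently with probability `dropout`.
        candidates = [
            (merges[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in merges and (dropout == 0.0 or rng.random() >= dropout)
        ]
        if not candidates:
            break  # no merge applies (or all were dropped): stop
        # Apply the best-ranked surviving merge (leftmost on ties).
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table, made up for illustration.
MERGES = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}

print(segment_word("lower", MERGES))               # ['low', 'er']
print(segment_word("lower", MERGES, dropout=1.0))  # ['l', 'o', 'w', 'e', 'r']
# The same seed reproduces the same dropout segmentation.
print(segment_word("lower", MERGES, 0.5, random.Random(13)))
```

	With `dropout=0.0` the segmentation is deterministic; with `dropout=1.0` every merge is dropped and the word stays split into characters.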

	v0.3.6:
	- fix to subword-bpe command encoding

	v0.3.5:
	- fix to subword-bpe command under Python 2
	- wider support for the --total-symbols argument

	v0.3.4:
	- segment_tokens method to improve library usability (https://github.com/rsennrich/subword-nmt/pull/52)
	- support regex glossaries (https://github.com/rsennrich/subword-nmt/pull/56)
	- allow unicode separators (https://github.com/rsennrich/subword-nmt/pull/57)
	- new option --total-symbols in learn-bpe (commit 61ad8)
	- fix documentation (best practices) (https://github.com/rsennrich/subword-nmt/pull/60)

	v0.3:
	- library is now installable via pip
	- fix occasional problems with UTF-8 whitespace and newlines in learn_bpe and apply_bpe
	- do not silently convert UTF-8 newline characters into "\n"
	- do not silently convert UTF-8 whitespace characters into " "
	- UTF-8 whitespace and newline characters are now considered part of a word, and segmented by BPE

	v0.2:
	- different, more consistent handling of end-of-word token (commit a749a7) (https://github.com/rsennrich/subword-nmt/issues/19)
	- allow passing a vocabulary and frequency threshold to apply_bpe.py to prevent the production of OOV (or rare) subword units (commit a00db)
	- made learn_bpe.py deterministic (commit 4c54e)
	- various changes to make handling of UTF more consistent between Python versions
	- new command-line arguments for apply_bpe.py:
	  - '--glossaries' to prevent given strings from being affected by BPE
	  - '--merges' to apply a subset of learned BPE operations
	- new command-line arguments for learn_bpe.py:
	  - '--dict-input': rather than a raw text file, interpret the input as a frequency dictionary (as created by get_vocab.py)
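
	The glossary options ('--glossaries', and regex glossaries since v0.3.4) keep matching strings intact while the rest of a token is still segmented. The sketch below illustrates only that idea: `segment_with_glossary` is a hypothetical helper, not subword-nmt's API, `segment` stands in for any BPE segmenter (here simply `list`, which splits into characters), and glossary patterns are assumed to contain no capturing groups.

```python
import re
from typing import Callable, List


def segment_with_glossary(
    token: str,
    glossary: List[str],
    segment: Callable[[str], List[str]],
) -> List[str]:
    """Keep substrings matching any glossary regex whole; segment the rest.

    Hypothetical helper for illustration. Glossary patterns must not contain
    capturing groups, because re.split returns every captured group as an item.
    Escape literal strings with re.escape if they contain regex metacharacters.
    """
    if not glossary:
        return segment(token)
    # One outer capturing group, so re.split also returns the matched text.
    pattern = re.compile("(" + "|".join(f"(?:{g})" for g in glossary) + ")")
    pieces: List[str] = []
    for i, piece in enumerate(pattern.split(token)):
        if not piece:
            continue  # re.split yields empty strings at edges and between adjacent matches
        if i % 2 == 1:
            pieces.append(piece)  # odd positions are glossary matches: keep intact
        else:
            pieces.extend(segment(piece))  # even positions are ordinary text: segment
    return pieces


print(segment_with_glossary("myUSAtrip", ["USA"], list))
# ['m', 'y', 'USA', 't', 'r', 'i', 'p']
print(segment_with_glossary("id123x", [r"\d+"], list))
# ['i', 'd', '123', 'x']
```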


	v0.1:
	- consistent cross-version unicode handling
	- all scripts are now deterministic