phamson02
add word segmentation before tokenization
c1d85a2