* Update GetNumWords to use utf-8 encoding
This change fixes the following error in `prepare_int_data.py`:
```
Traceback (most recent call last):
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prepare_int_data.py", line 168, in <module>
    num_words = GetNumWords(args.vocab)
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prepare_int_data.py", line 75, in GetNumWords
    universal_newlines=True)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
# exited with return code 1 after 0.3 seconds
```
* Update GetNumWords to use utf-8 encoding
This change fixes the following error in `prune_lm_dir.py`:
```
Traceback (most recent call last):
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prune_lm_dir.py", line 613, in <module>
    num_words = GetNumWords(args.lm_dir_in)
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prune_lm_dir.py", line 220, in GetNumWords
    universal_newlines=True)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
```
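Both tracebacks fail in the same way. With `universal_newlines=True` and no explicit encoding, `subprocess.check_output` decodes the child's output using the locale's preferred encoding, which was ASCII here. Byte `0xe0` is the lead byte of a three-byte UTF-8 sequence, which is how Tamil and Telugu characters are encoded, so the ASCII decoder rejects it. The following is a minimal sketch of the kind of fix described, not the actual patch: `get_num_words` is a hypothetical stand-in for `GetNumWords`, and it assumes the vocabulary file holds `<word> <integer-id>` pairs with the highest id on the last line, and that a POSIX `tail` is available.

```python
import subprocess


def get_num_words(vocab_path):
    """Return the integer id on the last line of a '<word> <id>' vocab file.

    Hypothetical stand-in for GetNumWords; the file layout is an assumption.
    """
    # Passing encoding="utf-8" (available since Python 3.6) puts the pipe in
    # text mode and decodes it as UTF-8, instead of using the locale default
    # that universal_newlines=True alone would pick up.
    line = subprocess.check_output(
        ["tail", "-n", "1", vocab_path],
        encoding="utf-8",
    )
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("unexpected last line in {0}: {1!r}".format(vocab_path, line))
    return int(fields[1])
```

An equivalent approach is to drop `universal_newlines=True`, read the output as bytes, and call `.decode("utf-8")` on the result.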