This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
(not a bug) question about bert create_pretraining_data.tokenize_lines()
#1592
Labels
bug
Description
In the function `scripts.pretraining.bert.create_pretraining_data.tokenize_lines()`, the code snippet:
suggests that empty or null lines (e.g. `""` or `None`) break the for-loop, returning only the lines that have been processed so far, whereas lines that are empty after stripping (e.g. `" "`) are used as document delimiters.

Could someone shed light on what the (empty line + break-from-loop) is meant to accomplish? Are empty/null lines used as delimiters?
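To make the question concrete, here is a minimal sketch of the behavior as I understand it. The function name `tokenize_lines_sketch` and the whitespace `split()` tokenizer are hypothetical stand-ins of mine, not the actual GluonNLP code:

```python
def tokenize_lines_sketch(lines):
    """Hypothetical stand-in for the loop described above; not the GluonNLP source."""
    results = []
    for line in lines:
        # Empty string or None: stop and return what has been processed so far.
        if not line:
            break
        line = line.strip()
        if not line:
            # Whitespace-only line: record an empty entry as a document delimiter.
            results.append([])
        else:
            # Assumption: plain whitespace splitting stands in for the real tokenizer.
            results.append(line.split())
    return results


docs = tokenize_lines_sketch(["first doc\n", "\n", "second doc\n", "", "never reached\n"])
print(docs)  # [['first', 'doc'], [], ['second', 'doc']]
```

Note that lines read from a file in Python keep their trailing newline, so a blank line in the input file arrives as `"\n"` and takes the delimiter branch, while `""` is what `readline()` returns at end of file. That may be the reason for the two separate checks, but I am not sure it is the intent here.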