Commit

Make Windows-1252 detection stricter
Wilfred committed Jan 11, 2025
1 parent fadd0f2 commit 09355c6
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/files.rs
@@ -227,7 +227,14 @@ pub(crate) fn guess_content(bytes: &[u8]) -> ProbableFileKind {
     // ISO-8859-1 aka Latin 1), treat them as such.
     let (latin1_str, _encoding, saw_malformed) = encoding_rs::WINDOWS_1252.decode(bytes);
     if !saw_malformed {
-        return ProbableFileKind::Text(latin1_str.to_string());
+        let num_null = utf16_string
+            .chars()
+            .take(5000)
+            .filter(|c| *c == '\0')
+            .count();
+        if num_null <= 1 {
+            return ProbableFileKind::Text(latin1_str.to_string());
+        }
     }

     ProbableFileKind::Binary
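
The check added in this commit can be read as a small heuristic: look at no more than the first 5000 decoded characters and accept the content as text only if at most one of them is NUL. The following is a minimal standalone sketch of that idea, not the code from src/files.rs. The names has_few_nulls and decode_latin1 are hypothetical, and decode_latin1 is a plain ISO-8859-1 stand-in (each byte becomes the code point with the same value) used here only to avoid depending on encoding_rs.

```rust
/// Hypothetical stand-in for a real decoder: maps each byte to the
/// Unicode code point with the same value (ISO-8859-1). This is not
/// Windows-1252, but byte 0x00 becomes '\0' either way.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Hypothetical helper: returns true when the first `limit` characters
/// of `decoded` contain at most `max_nulls` NUL characters.
fn has_few_nulls(decoded: &str, limit: usize, max_nulls: usize) -> bool {
    let num_null = decoded
        .chars()
        .take(limit) // only inspect a bounded prefix
        .filter(|c| *c == '\0')
        .count();
    num_null <= max_nulls
}
```

With the thresholds from the diff, has_few_nulls(&decode_latin1(b"caf\xe9 au lait\n"), 5000, 1) accepts the input as text-like, while a byte sequence with several NULs near the start, such as an executable header, is rejected. Bounding the scan with take keeps the cost fixed for large inputs, at the price of ignoring NULs that appear later in the file.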
