It's also a little weird that running "duplicut -l 24" on a wordlist and then reusing the result with -l 16 afterwards always produces a smaller file than running -l 16 on the original wordlist directly. Technically the same number of duplicate/filtered lines should be removed either way.
Example 2:
0...9999999.dict file (94.1mb) -> sort -u = same as original (94.1mb)
0...9999999.dict file (94.1mb) -> duplicut (successfully removed 0 duplicates and 0 filtered lines in 09 seconds) = 83.5mb (opened in Notepad++, the file seems to stop abruptly after "8874998", while the original goes up to 9999999)
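In case it helps reproduce this, here is a minimal standalone sketch (not part of duplicut; the filenames are placeholders for the original wordlist and the duplicut output) that counts lines and prints the last line of each file, to confirm where the processed file stops:

```python
# Minimal sketch (placeholder filenames): count lines and show the last line
# of the original wordlist vs. the duplicut output, to confirm where the
# processed file stops relative to the original.
def summarize(path):
    count = 0
    last = b""
    with open(path, "rb") as f:
        for line in f:
            count += 1
            last = line
    return count, last.rstrip(b"\r\n").decode(errors="replace")

for name in ("original.dict", "deduped.dict"):  # placeholder names
    count, last = summarize(name)
    print(f"{name}: {count} lines, last line: {last}")
```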
Do any of your files contain null bytes?
Duplicut makes an important assumption: the input file is a standard password wordlist with no binary content.
The first pass 'patches' lines to be removed by overwriting their first character with a null byte, so the second pass assumes that any line starting with a null byte must be ignored.
Also, duplicut makes virtual chunks of the file depending on currently available memory and starts each chunk after the next newline, so having null bytes in your files would explain such weird behavior. Please let me know whether your files do contain null bytes, so I can investigate further if there is a possible bug.
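If it helps, here is a quick way to check a wordlist for null bytes, as a minimal standalone sketch (not duplicut code; the wordlist path is passed on the command line):

```python
# Minimal sketch: scan a file in chunks and report the offset of the first
# null byte, if any. Pass the wordlist path as the only argument.
import sys

def find_first_nullbyte(path, chunk_size=1 << 20):
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached EOF without finding a null byte
            pos = chunk.find(b"\x00")
            if pos != -1:
                return offset + pos
            offset += len(chunk)

if __name__ == "__main__":
    where = find_first_nullbyte(sys.argv[1])
    print("no null bytes found" if where is None else f"first null byte at offset {where}")
```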
Running with the default command on larger files (over 1GB) leads to inconsistency across multiple runs
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 0 duplicates and 42 filtered lines in 05 seconds
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 384 duplicates and 0 filtered lines in 02 seconds
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 0 duplicates and 384 filtered lines in 02 seconds
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 0 duplicates and 385 filtered lines in 02 seconds
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 221 duplicates and 385 filtered lines in 02 seconds
$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'
duplicut successfully removed 378 duplicates and 385 filtered lines in 02 seconds
Any idea why this is occurring? I was expecting the same results every time.
On further testing, it seems the cleanup stats go wild when writing to an already existing output file, resulting in inconsistent file sizes and word counts.