utils.web.getEncoding() always returning 'None' in Web plugin #1362

Open
@Rodrigo-NH

Hi. While trying to find out why NBSP (non-breaking space) decodes incorrectly when a page's charset is iso8859-1, I discovered that in the Web plugin, at what is currently line 155, "text = text.decode(utils.web.getEncoding(text) or 'utf8', 'replace')", the call utils.web.getEncoding(text) always returns 'None', so the decode always falls back to 'utf8'.
I tried a couple of different pages with the same result: getEncoding() was not able to return the actual encoding for any of them.
Example of the problem: the title of the page 'https://www.freebsd.org/doc/handbook/usb-device-mode-terminals.html' contains an NBSP, correctly encoded according to iso8859-1. If I explicitly set the decoding to iso8859-1 in the code, the Web plugin returns the title correctly.
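
To illustrate the symptom outside the bot, here is a minimal standalone sketch. It does not use Limnoria at all: the `page` bytes, `sniff_charset()` and `decode_page()` are hypothetical names made up for this example, and the regex-based sniffing is only an assumption about one way an encoding could be detected, not what `utils.web.getEncoding()` actually does. It only shows why a `None` encoding combined with the 'utf8' fallback mangles an iso8859-1 NBSP.

```python
import re

# Hypothetical page bytes: an ISO-8859-1 document whose title contains
# a non-breaking space, which is the single byte 0xA0 in that charset.
page = (b'<html><head>'
        b'<meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1">'
        b'<title>USB\xa0Device</title></head></html>')


def sniff_charset(data):
    """Hypothetical helper: return the charset declared near the top of
    the document, or None if no declaration is found."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_\-]+)', data[:1024])
    return match.group(1).decode('ascii') if match else None


def decode_page(data, encoding=None):
    """Same shape as the Web plugin line: fall back to utf8 and replace
    undecodable bytes when no encoding is known."""
    return data.decode(encoding or 'utf8', 'replace')


# What happens when the detected encoding is None (the reported behaviour).
fallback_text = decode_page(page, None)

# What happens when the declared charset is passed through.
declared_text = decode_page(page, sniff_charset(page))

print(repr(fallback_text))
print(repr(declared_text))
```

With the fallback, the lone 0xA0 byte is not valid UTF-8 and is replaced by U+FFFD; with the declared charset, it decodes to U+00A0 as expected.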

I haven't looked at getEncoding() itself yet to try to find the cause (assuming it really is a getEncoding() issue).

The current (running) version of this Limnoria is installed on 2019-01-24T22-12-03, running on Python 3.6.8 (default, Jan 3 2019, 01:10:23) [GCC 4.2.1 Compatible FreeBSD Clang 6.0.0 (tags/RELEASE_600/final 326565)]. The newest versions available online are 2019.02.22 (in master), 2019.02.22 (in testing).

Thanks!
