JSONResultsReader exception when record contains invalid UTF-8 characters #540

Open
@ericatdropzone

Description

I'm using the botsv3 dataset and running code similar to this:

from splunklib.client import connect
from splunklib.results import JSONResultsReader
from time import sleep

spl_query = "search index=botsv3 sourcetype=stream:udp earliest=0"
connection = connect(host=host, port=port, username=user, password=password, autologin=True)
job = connection.jobs.create(spl_query)

# Wait for the job to complete
sleep(5)

reader = JSONResultsReader(job.results(output_mode="json", earliest_time=earliest_time, count=max_results))
for result in reader: # This throws an exception
    ...

I'm seeing this exception:

Traceback (most recent call last):
  File "/app/splunk_scanner/splunk_connection.py", line 54, in query
    for result in reader:
  File "/usr/local/lib/python3.11/site-packages/splunklib/results.py", line 352, in next
    return next(self._gen)
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/splunklib/results.py", line 361, in _parse_results
    parsed_line = json_loads(strip_line)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 89348: invalid start byte

This looks similar to this issue, but running version 1.7.4 didn't fix this instance of the problem. I also noticed that this pull request appears to fix the issue, but I'm not sure whether that's the approach you'd want to take.
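
To illustrate the failure and one possible workaround, here is a minimal sketch that does not use the SDK at all. It assumes the response body is line-delimited JSON in which each line carries either a "result" object or a "results" list, and that replacing undecodable bytes with U+FFFD is acceptable. The helper name iter_results_lenient is hypothetical, and this is not necessarily the approach taken in the pull request mentioned above.

```python
import io
import json
from typing import Any, BinaryIO, Iterator


def iter_results_lenient(stream: BinaryIO) -> Iterator[Any]:
    """Yield results from a line-delimited JSON byte stream, tolerating bad UTF-8.

    Hypothetical helper: the "result"/"results" layout is an assumption about
    the JSON output, not something confirmed by the SDK documentation here.
    """
    for raw_line in stream:
        raw_line = raw_line.strip()
        if not raw_line:
            continue
        # Decode explicitly so invalid bytes become U+FFFD instead of raising
        # UnicodeDecodeError inside json.loads.
        text = raw_line.decode("utf-8", errors="replace")
        parsed = json.loads(text)
        # Assumed layout: a single object under "result" or a list under "results".
        if "result" in parsed:
            yield parsed["result"]
        for result in parsed.get("results", []):
            yield result


# Stand-in for the response body; 0x8e is the byte reported in the traceback.
sample = io.BytesIO(b'{"preview": false, "results": [{"_raw": "abc\x8edef"}]}\n')
rows = list(iter_results_lenient(sample))
```

Note that errors="replace" is lossy: the original byte cannot be recovered from the decoded text, so this only suits cases where a readable approximation of the event is enough.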

Splunk:

  • Splunk version: 9.1.0.1
  • OS: macOS 13.5
  • Deployment: single, local instance

SDK:

  • Version: 1.7.4
  • Language Runtime Version: Python 3.11.4
  • OS: Linux (in a docker container) 5.15.49
