
UnicodeDecodeError: first time doing anything with Python, need some help #69

Open
Despirited opened this issue Jan 6, 2025 · 4 comments

Despirited commented Jan 6, 2025

I have exported both my movies and my shows from Trakt. I managed to import all the movies successfully, but when I try to import my shows I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7868: character maps to <undefined>

I have no idea what the issue is. This is the CSV I exported:
episodes_views.csv

Here is the entire message I got:

Options: Namespace(config='config.ini', input=<_io.TextIOWrapper name='episodes_views.csv' mode='r' encoding='cp1252'>, watched_at=True, rated_at=False, format='imdb', type='episodes', list='history', seen=False, clean=False, verbose=True)
Config file: config.ini
Config: <configparser.ConfigParser object at 0x00000239A28707D0>
Trakt, skipped access token refresh, token is less than 30 days, only 1:06:27.085743
Trakt: {'client_id': '02....05', 'client_secret': '64....33', 'access_token': 'fa....74', 'refresh_token': '1a....ee', 'baseurl': 'https://api.trakt.tv'}
Authorization header: Bearer fa....74
trakt-api-key header: 02....05
Traceback (most recent call last):
  File "C:\Python312\import_trakt.py", line 517, in <module>
    main()
  File "C:\Python312\import_trakt.py", line 449, in main
    read_ids = read_csv(options)
               ^^^^^^^^^^^^^^^^^
  File "C:\Python312\import_trakt.py", line 154, in read_csv
    return list(reader)
           ^^^^^^^^^^^^
  File "C:\Python312\Lib\csv.py", line 116, in __next__
    row = next(self.reader)
          ^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7868: character maps to <undefined>
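The Options line in the log shows that the input file was opened with encoding='cp1252', while the export is written as UTF-8. Some non-ASCII characters encode to UTF-8 byte sequences that contain 0x8d, and Python's cp1252 codec has no mapping for that byte. A minimal sketch reproducing this, assuming a hypothetical title containing "č" (U+010D, which is C4 8D in UTF-8):

```python
def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes with the given codec in strict mode."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical title: "č" (U+010D) is encoded as the two bytes C4 8D in UTF-8.
raw = "Dečko".encode("utf-8")
print(raw)                             # b'De\xc4\x8dko'
print(decodes_cleanly(raw, "cp1252"))  # False: 0x8d is undefined in cp1252
print(decodes_cleanly(raw, "utf-8"))   # True
```

This is also why switching Python versions only moves the failing position: the file is still being decoded with the same locale codec.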

xbgmsharp (Owner) commented Jan 6, 2025 via email

Despirited (Author) commented:

I just tried Python 3.11 and I still get the same error, just at a different position: 7975 instead of 7868.

NanoBitrin commented Jan 25, 2025

Using ChatGPT and Python 3.12.8, I found a way to bypass this error.

First, clean up the CSV file, because it is badly structured with too many double quotes. Create a .py file with the following code and run it:

import csv

input_file = "export_episodes_history.csv"
output_file = "export_episodes_history_cleaned.csv"

with open(input_file, mode='r', encoding='utf-8-sig') as infile, \
     open(output_file, mode='w', encoding='utf-8', newline='') as outfile:
    
    # Read the original CSV
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    
    for row in reader:
        # Fix the header
        if reader.line_num == 1:
            fixed_header = [col.replace('""', '').strip('"') for col in row]
            writer.writerow(fixed_header)
        else:
            # Fix duplicated double quotes in data rows
            fixed_row = [col.replace('""', '"').strip('"') for col in row]
            writer.writerow(fixed_row)

print(f"Cleaned file saved as: {output_file}")

Then run the import command again on the cleaned file:

python import_trakt.py -c config.ini -f tmdb -i export_episodes_history_cleaned.csv -l history -t episodes -w

If the error persists, change line 153 of import_trakt.py from this:

reader = csv.DictReader(options.input, delimiter=',')

to this:

reader = csv.DictReader(open(options.input.name, mode='r', encoding='utf-8-sig'), delimiter=',')

and try again. It worked for me.

xbgmsharp (Owner) commented:

Which operating system are you using? utf-8-sig is needed if the file starts with a BOM (byte order mark).
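To illustrate the difference: a UTF-8 BOM is the three bytes EF BB BF at the start of a file. Decoding with plain utf-8 keeps it as the invisible character U+FEFF, which then sticks to the first column name; utf-8-sig strips it, and reads files without a BOM unchanged. A small in-memory sketch, with a hypothetical header line:

```python
import codecs


def first_column(raw: bytes, encoding: str) -> str:
    """Decode a CSV header line and return its first column name."""
    return raw.decode(encoding).splitlines()[0].split(",")[0]


header = b"watched_at,title\n"       # hypothetical header
with_bom = codecs.BOM_UTF8 + header  # b'\xef\xbb\xbf' + header

print(repr(first_column(with_bom, "utf-8")))      # '\ufeffwatched_at'
print(repr(first_column(with_bom, "utf-8-sig")))  # 'watched_at'
print(repr(first_column(header, "utf-8-sig")))    # 'watched_at'
```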
The export output is UTF-8 encoded: https://github.com/xbgmsharp/trakt/blob/master/export_trakt.py#L158
The import could handle UTF-8 better; the input file is opened directly by the argparse library:
https://github.com/xbgmsharp/trakt/blob/master/import_trakt.py#L355
https://github.com/xbgmsharp/trakt/blob/master/import_trakt.py#L151

The default read encoding depends on the operating system.
https://docs.python.org/3/glossary.html#term-filesystem-encoding-and-error-handler
https://docs.python.org/3/glossary.html#term-locale-encoding
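A quick way to see which encoding open() falls back to on a given machine, as a sketch; on the reporter's system the Options line suggests it would print cp1252. Python's UTF-8 Mode (python -X utf8, or the environment variable PYTHONUTF8=1) makes UTF-8 the default instead.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Locale encoding that open() normally uses when no encoding= is passed."""
    return locale.getpreferredencoding(False)


print("default text encoding:", default_text_encoding())
# 1 when running in UTF-8 Mode (python -X utf8 or PYTHONUTF8=1), else 0.
print("UTF-8 mode:", sys.flags.utf8_mode)
```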

Also, the quotes are needed.
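Doubled double quotes inside a quoted field are the standard CSV way to escape a quote character, and Python's csv module already handles them when reading and writing, so they do not need to be removed by hand. A short sketch with a hypothetical row:

```python
import csv
import io


def parse_line(line: str) -> list:
    """Parse a single CSV line with the csv module's default dialect."""
    return next(csv.reader(io.StringIO(line)))


def format_row(row: list) -> str:
    """Write one row back out with the default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


line = '1,"He said ""hi"", then left"\r\n'  # hypothetical row
row = parse_line(line)
print(row)                      # ['1', 'He said "hi", then left']
print(format_row(row) == line)  # True: quotes are re-escaped on write
```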

To solve the issue, can you try replacing line 355 with the following code?
https://github.com/xbgmsharp/trakt/blob/master/import_trakt.py#L355

parser.add_argument('-i', '--input',
                    help='CSV file to import, default %(default)s',
                    nargs='?', type=lambda f: open(f, mode='r', encoding='utf-8'),
                    default=None, required=True)
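For reference, a self-contained sketch of the same idea; it is not the project's code. It swaps in encoding='utf-8-sig' (an assumption, so that a BOM-prefixed file also parses cleanly) and newline='', which the csv module documentation recommends for files handed to csv readers. The helper names open_csv, build_parser and read_rows are hypothetical.

```python
import argparse
import csv


def open_csv(path: str):
    """argparse 'type' callable: open the CSV as UTF-8, skipping a BOM if any."""
    return open(path, mode="r", encoding="utf-8-sig", newline="")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the -i/--input option")
    parser.add_argument("-i", "--input",
                        help="CSV file to import, default %(default)s",
                        type=open_csv, default=None, required=True)
    return parser


def read_rows(argv: list) -> list:
    """Parse the arguments and read every row of the input CSV as a dict."""
    options = build_parser().parse_args(argv)
    with options.input as infile:  # closes the file opened by open_csv
        return list(csv.DictReader(infile, delimiter=","))
```

Called as read_rows(["-i", "episodes_views.csv"]), it returns one dict per data row, keyed by the header names without a stray BOM.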
