ERROR: Cannot parse CAPTCHA image URL from the page. Changing Tor circuit. #172

tombenetin · 2023-08-04T15:37:00Z

Hi,

In the last few days I've been getting this error:

CAPTCHA protected download - CAPTCHA challenges will be displayed
[TOR] TOR started
[Link solve] TOR get new CAPTCHA (timeout 30)
[Link solve] ERROR: Cannot parse CAPTCHA image URL from the page. Changing Tor circuit.
[Link solve] TOR get new CAPTCHA (timeout 30)
[Link solve] ERROR: Cannot parse CAPTCHA image URL from the page. Changing Tor circuit.

I tried updating from 3.1.0 to 3.5.1, but I still get the same error. Until now it worked like a charm.
Debian Buster

pip3 -V
pip 18.1 from /usr/lib/python3/dist-packages/pip (python 3.7)

tor --version
Tor version 0.4.7.13.

Thanks for great tool :)


SonGokussj4 · 2023-08-05T18:29:07Z

Same problem with the newest version

Strat00s · 2023-08-09T07:43:45Z

same

Sentakuu · 2023-08-09T15:42:15Z

Same problem, bruh. I really hope they fix this first; Vzum is dead now, and now this. I hate Uloz.to, and I'm not going to pay for their stupid credits 🗡️

GlaDOSik · 2023-08-13T10:07:58Z

I'm working on a web app frontend for ulozto-downloader and ran into the same issue. At first, I thought something was wrong with my integration, but it seems Ulozto has implemented a new kind of protection/challenge. The page that should contain the CAPTCHA image now contains JavaScript code with some Cloudflare DDoS protection (though I'm only guessing that from the parameter names I saw in the code). The idea is that bots have trouble interpreting JS, while the browser of a user who opens the link runs it automatically. Since the downloader is using Tor, I guess it should interpret the JS just fine, so maybe it won't be that hard to fix.

On the other hand, it's very likely that Ulozto protects the website only when the user's IP is from a foreign country, and in that case I'm not sure how it could have worked in the past. Maybe the protection wasn't there before?
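The failure described above can be sketched as follows: when the response is a Cloudflare interstitial instead of the CAPTCHA page, there is simply no image URL to parse. This is a minimal sketch, not the downloader's code; the image regex is hypothetical (the real uloz.to markup is not shown in this thread), and the "Just a moment..." title is the one reported further down in this thread for the blocked Tor requests.

```python
import re
from typing import Optional

# HYPOTHETICAL pattern: the real uloz.to markup is not shown in the thread,
# so this just looks for an <img> whose src contains "captcha".
CAPTCHA_IMG_RE = re.compile(r'<img[^>]+src="([^"]*captcha[^"]*)"', re.IGNORECASE)
# Title of the Cloudflare interstitial reported in this thread.
CHALLENGE_TITLE_RE = re.compile(r"<title>\s*Just a moment\.\.\.\s*</title>", re.IGNORECASE)


def parse_captcha_url(html: str) -> Optional[str]:
    """Return the CAPTCHA image URL, or None if the page has none."""
    match = CAPTCHA_IMG_RE.search(html)
    return match.group(1) if match else None


def classify_page(html: str) -> str:
    """Tell a CAPTCHA page apart from a Cloudflare JS challenge page."""
    if parse_captcha_url(html) is not None:
        return "captcha"
    if CHALLENGE_TITLE_RE.search(html):
        # No CAPTCHA image on the page: the situation behind the reported error.
        return "cloudflare_challenge"
    return "unknown"


page = "<html><head><title>Just a moment...</title></head><body><script></script></body></html>"
print(classify_page(page))  # cloudflare_challenge
```

Distinguishing the two cases lets a client report "blocked by Cloudflare" instead of a generic parse error.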

kubikaugustyn · 2023-08-13T11:02:28Z

I have the same problem. Apparently there are two issues.

1) The enforce tor option doesn't work properly, I think.
In page.py:298 and page.py:301 there's this code:

r.get(..., proxies=self.tor.proxies if not self.enforce_tor else {})

According to its help text, the enforce tor option is supposed to "Perform all the connections via TOR. If not set, the initial connection to Ulozto is performed directly before TOR is launched." But when it is not set (the default, False), not self.enforce_tor is True, so Tor IS used.
2) When the request is sent through Tor, it fails with a 403 status code for both the normal request and the cloudscraper request. The response's title is "Just a moment..." and it contains noscript and script tags. I'll try to find out how to get around it and let you know. Also, since the cloudscraper request is only executed when the normal one fails with a 403 status code, I recommend at least logging something so the user knows that both requests failed.
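The inversion described in point 1 can be illustrated with a minimal sketch. TOR_PROXIES is a hypothetical stand-in for self.tor.proxies (it assumes Tor's default SOCKS port 9050, which the downloader may not use), and proxies_as_quoted / proxies_as_documented are illustrative names, not functions from page.py. The second function follows the commenter's reading of the help text for the initial connection only.

```python
from typing import Dict

# Hypothetical stand-in for self.tor.proxies, assuming Tor's default SOCKS
# port 9050; the downloader may start Tor on another port.
TOR_PROXIES: Dict[str, str] = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def proxies_as_quoted(enforce_tor: bool) -> Dict[str, str]:
    # Same expression as quoted from page.py: Tor is used when enforce_tor is False.
    return TOR_PROXIES if not enforce_tor else {}


def proxies_as_documented(enforce_tor: bool) -> Dict[str, str]:
    # Commenter's reading of the help text for the initial connection:
    # go through Tor only when enforce_tor is set, otherwise connect directly.
    return TOR_PROXIES if enforce_tor else {}


for flag in (False, True):
    print(flag, proxies_as_quoted(flag) == TOR_PROXIES, proxies_as_documented(flag) == TOR_PROXIES)
```

The two functions return opposite results for every flag value, which is the mismatch point 1 describes.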
That's all I know (for now)

@GlaDOSik I'm also working on web app frontend btw

kubikaugustyn · 2023-08-13T11:11:19Z

And apparently the cloudscraper module is supposed to fix it and prevent the 403 error.
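The fallback from point 2, together with the suggested logging, could look roughly like this. It is a minimal sketch, assuming each fetcher is a callable returning (status_code, body); in the downloader these would wrap a plain requests call and a cloudscraper session (for example one created with cloudscraper.create_scraper()), but fetch_with_fallback and the logger name are hypothetical.

```python
import logging
from typing import Callable, Optional, Tuple

# A fetcher takes a URL and returns (status_code, body). Hypothetical callables
# are used so the control flow can be shown without network access.
Fetcher = Callable[[str], Tuple[int, str]]

log = logging.getLogger("ulozto.sketch")


def fetch_with_fallback(url: str, plain: Fetcher, scraper: Fetcher) -> Optional[str]:
    """Try a plain request first; on 403, retry with the scraper and log if both fail."""
    status, body = plain(url)
    if status != 403:
        return body
    log.info("Plain request got 403, retrying with cloudscraper: %s", url)
    status, body = scraper(url)
    if status == 403:
        # The case reported in this thread: both requests are blocked.
        log.error("Both the plain and the cloudscraper request returned 403 for %s", url)
        return None
    return body


def blocked(url: str) -> Tuple[int, str]:
    # Simulates the Cloudflare interstitial returned through Tor.
    return 403, "<title>Just a moment...</title>"


logging.basicConfig(level=logging.INFO)
print(fetch_with_fallback("https://uloz.to/...", blocked, blocked))  # None
```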

kubikaugustyn · 2023-08-13T11:51:57Z

When I set enforce tor to True, the log looks like this:

...
Getting info (filename, filesize, …)
[TOR] TOR started
Cannot download file: expected string or bytes-like object

Sadly, I don't have anything else; that's everything my frontend shows (all the logs).
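One common way Python produces "expected string or bytes-like object" is passing None to a re function, for example when an earlier request failed and left the page content as None. This is an assumption about the cause, not something confirmed in the thread; find_filename and error_message_for are hypothetical helpers.

```python
import re
from typing import Optional


def find_filename(html: Optional[str]) -> str:
    # Hypothetical helper: if an earlier request failed and left html as None,
    # re.search raises TypeError before any matching happens.
    match = re.search(r"<title>(.*?)</title>", html)
    return match.group(1) if match else ""


def error_message_for(html: Optional[str]) -> Optional[str]:
    """Return the TypeError message find_filename raises, or None if it succeeds."""
    try:
        find_filename(html)
    except TypeError as exc:
        return str(exc)
    return None


# Starts with "expected string or bytes-like object"; newer Python versions
# append the offending type to the message.
print(error_message_for(None))
```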

The Cloudflare protection is very weird. I don't know why the Python package doesn't work. The code served by the endpoint changes on every load (the functionality stays the same, but the functions, their names, their order, etc. are shuffled, so it's not easy to just extract something with a regex). It POSTs a request to https://uloz.to/cdn-cgi/challenge-platform/h/b/beacon/ov1/<?>:<?>:<?>-<?>/<ray>/<cHash>/managed with the body v_<ray>=<?> and your uloz.to session cookie. I think you're then able to fetch the resource.
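The shape of the request described above can be sketched with the standard library. This only builds the POST and does not send it, and it does not solve the challenge: the opaque path segment and the body value are produced by the challenge script, so build_beacon_request and its parameter names are hypothetical placeholders for the <?> parts.

```python
import urllib.request

# URL template as described in the comment. The "<?>:<?>:<?>-<?>" segment is
# opaque and passed in unchanged as `opaque`.
BEACON_TEMPLATE = (
    "https://uloz.to/cdn-cgi/challenge-platform/h/b/beacon/ov1/"
    "{opaque}/{ray}/{c_hash}/managed"
)


def build_beacon_request(opaque: str, ray: str, c_hash: str,
                         value: str, session_cookie: str) -> urllib.request.Request:
    """Build (but do not send) the POST described in the thread.

    `value` stands for the "<?>" in the body and `session_cookie` is a raw
    Cookie header value; both are hypothetical names.
    """
    url = BEACON_TEMPLATE.format(opaque=opaque, ray=ray, c_hash=c_hash)
    body = f"v_{ray}={value}".encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Cookie": session_cookie}, method="POST"
    )


req = build_beacon_request("a:b:c-d", "RAY", "HASH", "VALUE", "session=placeholder")
print(req.get_method(), req.full_url)
```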

GlaDOSik · 2023-08-13T12:30:05Z

Ah, I see. So Cloudflare was already handled by the cloudscraper module, which uses a JS interpreter to run the Cloudflare JS challenge code. But as @kubikaugustyn pointed out, the JS makes a request to https://uloz.to/cdn-cgi/challenge-platform. Maybe cloudscraper's JS interpreter is not able to execute POST requests like that?

Just an idea: since the downloader already uses Tor features, wouldn't it be better to use it in headless mode to make these requests instead of faking a browser with requests/cloudscraper? Wouldn't all these problems with Cloudflare's anti-bot protection go away, or am I wrong?

Extended localhost log again (because of the update). Problems with Captcha - setnicka/ulozto-downloader#172: tried to work around it (hence the cloudflare-workaround directory; the 403.html files are some responses that need the "workaround"). Doing the import stuff in a package is a pain :-/ (13. 8. 2023, 6:59 PM)

vaniron · 2023-09-05T09:43:46Z

Any update on how to fix this error?

jesusdeveloper · 2023-09-07T02:16:12Z

I have the same problem...
Is there no solution?

Thank you!

kubikaugustyn · 2023-09-07T02:53:51Z

I think the issue isn't with ulozto-downloader, but rather with cloudscraper.

Does that mean the solution is to wait until cloudscraper is fixed?

zbyna · 2023-09-07T11:30:50Z

A temporary solution is the Docker image mentioned here: #173 (comment)

But it is WIP!

mrdevolver · 2023-10-24T18:50:31Z

Same problem here. Captchas were solved without any problems before the change that introduced solving captchas through Tor. Of course, adding another layer on top (Tor for captchas) will trigger anti-bot measures. Let the captchas be solved without Tor like before, and let us all move on instead of adding yet another layer (FlareSolverr and the like) to solve the problem created by the first extra layer (Tor for captchas).

This downloader was popular among users mostly for its simplicity: it did not require many dependencies or much effort to set up. Adding layer upon layer of new dependencies and extra steps for users, just to fix something that potentially didn't need fixing to begin with, is simply taking this project in the wrong direction.

filo891 · 2023-10-24T19:13:01Z

The functionality you are describing has since been reverted (version 3.5.2, 0c905f1), and the --enforce-tor option was introduced to control it instead.

This is however not the reason for the downloader being broken. It is Uloz.to and its Cloudflare bot protection, which has either been recently introduced or it has recently started to detect and block Tor exit IPs.

Even if the captchas are solved outside Tor, the download links need to be generated through Tor, and this only works if we have the Cloudflare clearance cookie. The cookie can be obtained either automatically (which the FlareSolverr branch does) or by letting the user provide it manually, which would quite likely be error-prone.
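Both ways of obtaining the clearance cookie can be sketched. The automatic path assumes FlareSolverr's JSON API, where a request.get command is POSTed to its /v1 endpoint and the solved cookies come back under solution.cookies; the manual path parses a Cookie header pasted by the user. cf_clearance is the name Cloudflare uses for this cookie; the helper names are hypothetical, and this is not the FlareSolverr branch's actual code. Cloudflare generally expects the cookie to be reused with the same User-Agent that obtained it (FlareSolverr reports it as solution.userAgent), which is one reason the manual route is error-prone.

```python
from http.cookies import SimpleCookie
from typing import Any, Dict, Optional

CLEARANCE_COOKIE = "cf_clearance"  # Cloudflare's clearance cookie


def flaresolverr_payload(url: str, max_timeout_ms: int = 60000) -> Dict[str, Any]:
    # JSON body for a POST to FlareSolverr's /v1 endpoint.
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def clearance_from_flaresolverr(response: Dict[str, Any]) -> Optional[str]:
    # Automatic path: pick cf_clearance out of the solved browser cookies.
    if response.get("status") != "ok":
        return None
    for cookie in response.get("solution", {}).get("cookies", []):
        if cookie.get("name") == CLEARANCE_COOKIE:
            return cookie.get("value")
    return None


def clearance_from_manual(cookie_header: str) -> Optional[str]:
    # Manual path: the user pastes a Cookie header copied from their browser.
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(CLEARANCE_COOKIE)
    return morsel.value if morsel is not None else None


print(clearance_from_manual("foo=1; cf_clearance=abc123"))  # abc123
```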

Better ideas (or even pull requests) are more than welcome :).

kubikaugustyn · 2023-10-24T19:41:26Z

The issue isn't really that new Cloudflare protection was added, but rather that Cloudflare changed something that broke the solver package (which WORKED by running the JavaScript code of the bot protection, but is now broken for some reason). So now we have to wait for the package to be fixed, find a different package or develop our own, or find a different way to work around Cloudflare.

remux · 2023-11-18T12:03:09Z

I have exactly the same problem... Any idea when this will be fixed?

SonGokussj4 · 2023-11-18T12:58:25Z

My final workaround is running it like this:

File structure:

PS (SonGokussj4@mypc) - (C:\uloztodownloader) $ tree -CL 2
.
├── README.md
└── downloads
    └── download.txt

In my README.md I've got a simple guide:

NOTE: the downloads folder has to exist in the current directory (where I run the docker command)

# 1) Edit downloads/download.txt
https://uloz.to/file/DTML8nHBBdoL/arthe...0ko2D5Mt
https://uloz.to/file/ijgZqBZ9uTVA/ar...gpGWDZJLlLt

# 2) Run docker container
docker run --name ulozto-downloader -v .\downloads:/downloads pkejval/uld-docker:main

Downloaded files are in the downloads folder.
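The same preparation steps can be scripted. A minimal sketch, assuming download.txt takes one URL per line as in the example above; prepare_downloads is a hypothetical helper that only builds the docker command (pass it to subprocess.run to execute it). It uses an absolute path for the bind mount instead of .\downloads, so it does not depend on how a given Docker version treats relative host paths.

```python
import tempfile
from pathlib import Path
from typing import List


def prepare_downloads(base_dir: Path, urls: List[str]) -> List[str]:
    """Create downloads/download.txt and return the docker command as a list.

    Assumes one URL per line in download.txt, as in the guide's example.
    """
    downloads = base_dir / "downloads"
    downloads.mkdir(parents=True, exist_ok=True)  # the folder has to exist
    (downloads / "download.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")
    return [
        "docker", "run", "--name", "ulozto-downloader",
        "-v", f"{downloads.resolve()}:/downloads",
        "pkejval/uld-docker:main",
    ]


if __name__ == "__main__":
    # Dry run in a temporary directory; nothing is executed.
    with tempfile.TemporaryDirectory() as tmp:
        print(prepare_downloads(Path(tmp), ["https://uloz.to/file/..."]))
```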

SonGokussj4 · 2023-11-28T22:17:10Z

You happily post a nice guide here on how beautifully it all works, and suddenly ulož.to shuts down :-D You couldn't make this up... 😿

Sentakuu · 2023-11-30T00:48:32Z

To be honest, ulož.to must have been dealing with GDPR and that kind of crap lately :D But then again, I think other sites similar to uloz.to will come along. Still, I have to say nothing beats the good old uloz.to :) Good old times.

kubikaugustyn · 2023-12-02T18:10:31Z

Does this even make sense anymore, now that ulož.to has (basically) shut down?

filo891 mentioned this issue Aug 13, 2023: CloudFlare challenge solver support #173 (Open)

Bob4716 mentioned this issue Aug 25, 2023: chyba "Unable to start TOR" #174 (Closed)
