selenium.common.exceptions.TimeoutException ImmoScout24 #272
Comments
Hi @flyingdodo11, how much RAM do you have available for your Docker containers? I think the Docker daemon is by default not very generous on Mac. You should have at least 1GB of memory to run the Immoscout crawler. |
I am running Flathunter in docker on Linux with no resource limits, and I am getting the same issue. More logging output:
|
@codders Already tried that; it doesn't work. |
Okay. I've made a PR #273 - you can try and see if that fixes your issue. Unfortunately it's not something I can reproduce locally, so it's a bit of guesswork. Let me know! |
@codders unfortunately didn't help. Immobilienscout crawling won't work even with increased timeout. |
Doesn't work for me either. |
Same error for me. Though it was running fine earlier today |
I had a look at this again today. What I can see is that even when I run from the command line (without Docker), I get the timeout / cannot find IS24 variable message. Debugging further, I can see that in these cases the bot detection has kicked in: if I disable the '--headless' argument (or unset FLATHUNTER_HEADLESS_BROWSER), the immoscout crawl works as normal. Somehow, the version I have running in the cloud (which uses the headless argument and Docker) is still succeeding. The undetected_chromedriver package is supposed to make it impossible to detect that we're driving the browser from a script, and that seemed to help us for a while, but I guess it's a cat and mouse game. If anyone has any hot tips on avoiding bot detection, those would be most welcome :) |
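For anyone who wants to flip between headless and non-headless runs while debugging this, here is a minimal sketch of using the environment variable as the single switch. It is not Flathunter's actual code: the helper name `build_driver_arguments` is hypothetical, and treating any non-empty value of FLATHUNTER_HEADLESS_BROWSER as "set" is an assumption.

```python
import os

HEADLESS_FLAG = "--headless"


def build_driver_arguments(base_arguments, environ=None):
    """Return Chrome arguments, adding --headless only when the env var is set.

    Hypothetical helper for illustration; Flathunter may handle this differently.
    """
    environ = os.environ if environ is None else environ
    # Drop any headless flag already present (including "--headless=new")
    # so that the environment variable alone decides the mode.
    arguments = [arg for arg in base_arguments if not arg.startswith(HEADLESS_FLAG)]
    # Assumption: any non-empty value counts as "set".
    if environ.get("FLATHUNTER_HEADLESS_BROWSER"):
        arguments.append(HEADLESS_FLAG)
    return arguments


if __name__ == "__main__":
    print(build_driver_arguments(["--no-sandbox", "--headless"]))
```

With the variable unset, the returned list contains no headless flag, which matches the case described above where the immoscout crawl works as normal. |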
I got this partially fixed: undetected_chromedriver provides a Docker image in which it is possible to run chromedriver without the |
That's exciting news - thanks for taking a look! Often when I've seen crashes it's been about memory usage, but I guess you've already tried that. If you make a draft PR I can also have a go at running it here and see what happens. |
Any updates on this? |
@flyingdodo11 I haven't heard anything. I don't know if this helps you, but if you're just searching in Berlin and you're okay with a pretty default setup, you can also just use the hosted version: https://flathunter.codders.io . That's running okay right now (and crawling immoscout still). |
I also ran into this issue. Any update or workaround would be great. |
I tried this method, basing my Docker image on the undetected_chromedriver image like this:
I also set the flags "--no-sandbox" and "--disable-setuid-sandbox". At first I get this message for a period of time:
Then in the end it shows a long error message and stops. |
@hruzgar Can you copy the long error message? |
I've also tried running a job on Google Cloud Run based on the
What am I missing here? |
|
@infctr You need to set the "--no-sandbox" and "--disable-setuid-sandbox" flags in your config.yaml file. Also, don't set the "--headless" flag. |
@hruzgar Did |
Yeah, it was working (and is still working) on my main PC. But I want to run the bot on my server to avoid a high energy bill (my PC is beefy). That's the reason I am trying to get it working inside Docker without any GUI. |
I've started the image with these driver flags, but unfortunately it didn't make a difference in the container.
|
I just tried running the bot locally on my PC again. The weird thing is that it works with the "--headless" argument for a certain amount of time before it fails again. But as soon as I comment out the "--headless" flag and run the bot again, it opens a Chrome tab that says I am a robot, so I don't get access to the site. |
@infctr The cloud_job script is expected to run once and then quit. It is designed to be installed as a cron job running on a timer. The flathunt script is configurable either to run in a loop, or as a one-time job. |
@hruzgar CaptchaUnsolvableError sometimes comes up if it just can't solve the captcha, but it should retry and that shouldn't be fatal. Usually a message like 'session deleted because of page crash' comes after the container runs out of memory - are you running with a memory limit on your docker container? |
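To check whether a memory limit applies inside the container, something like the following rough sketch can be run there. It assumes a Linux container exposing the standard cgroup v2 or v1 files; the function names are made up for illustration, and the 1 GB threshold is simply the figure suggested earlier in this thread, not a measured requirement.

```python
from pathlib import Path

# Standard cgroup v2 and v1 locations of the memory limit (Linux only).
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)
MIN_BYTES = 1024 ** 3  # the "at least 1GB" suggested earlier in this thread


def read_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        try:
            return int(raw)
        except ValueError:
            continue
    return None


def has_enough_memory(limit, minimum=MIN_BYTES):
    """Treat an unknown or unlimited value as sufficient."""
    return limit is None or limit >= minimum


if __name__ == "__main__":
    limit = read_memory_limit()
    print("memory limit (bytes):", limit)
    print("meets 1 GB suggestion:", has_enough_memory(limit))
```

If this reports a limit below 1 GB, raising the container's memory limit is worth trying before looking further into the 'session deleted because of page crash' message. |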
Hi guys,
I'm trying to set up Flathunter for ImmoScout24. I have already used it successfully with ebay-kleinanzeigen and immowelt.
I have already checked all other issues regarding this problem, like Issue214; none of the solutions worked for me.
I also tried it on macOS and Ubuntu 20.04, with both the normal version and the Docker version.
I always get the same errors.