failed to scrape Indeed: 'NoneType' object has no attribute 'contents' #37
Comments
Looks like the Indeed scraper needs updating. Will get on this ASAP.
OK, I need a bit more information. Can you show me your settings.yaml?
Thanks,
Yes indeed, Brad. So let's pick Dublin as the city. search_terms: This generates a URL containing +None. I believe this is due to the province being null/None (''). The URL works fine without +None. I think logic can be added that doesn't add query strings if the settings are empty.
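A minimal sketch of the suggestion above: build the location part of the query only from settings that are actually set, so an empty province never turns into `+None`. The function name `build_location_query` and the `city, province` ordering are assumptions for illustration, not JobFunnel's actual code.

```python
from urllib.parse import quote_plus


def build_location_query(city, province=None):
    """Return a URL-encoded location string, skipping unset parts.

    Hypothetical helper: None and empty/whitespace-only values are dropped
    instead of being formatted into the query as the text "None".
    """
    parts = [
        str(part).strip()
        for part in (city, province)
        if part is not None and str(part).strip()
    ]
    return quote_plus(", ".join(parts))


if __name__ == "__main__":
    print(build_location_query("Dublin", None))
    print(build_location_query("Waterloo", "ON"))
```

With a province set, both parts are joined and encoded; with the province missing or empty, only the city is emitted.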
Thanks for the investigation. Looks like we need to handle this as part of internationalization, for areas without provinces.
Hello there, I just hit the same issue. I worked around it (a dirty test) for Indeed in French only. It seems to me that the spaces on indeed.fr are not simple regular spaces, so using ' ' in the regular expressions for dates does not work; replacing it with '\s' does. The expressions are also in French (hour=heure, day=jour, month=mois, year=année ...).
Maybe use a bigger date_regex and an offset depending on the locale? Or internationalize the regex with more alternatives, like in
For now I have only worked around it with a bigger date_regex table and an offset, quick and dirty. Also, in indeed.py, line 133, the count of jobs is failing; I think this is the root cause of the 'NoneType' error. Maybe a better solution would be to use re.sub instead of replace? My 2 cents on this issue.
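To illustrate the two ideas in the comment above, here is a rough sketch under stated assumptions. It uses `\s` (which in Python 3 also matches non-breaking spaces) together with a table of localized unit words, and it extracts the job count with a regex plus `re.sub` rather than a chain of `replace` calls. The names `UNIT_WORDS`, `parse_relative_date`, and `parse_job_count`, as well as the sample page texts, are hypothetical and are not taken from indeed.py.

```python
import re

# Hypothetical unit table; the French words are the ones listed in the comment.
UNIT_WORDS = {
    "hour": ["hour", "heure"],
    "day": ["day", "jour"],
    "month": ["month", "mois"],
    "year": ["year", "année"],
}

# Reverse lookup: localized word -> canonical unit name.
_UNIT_LOOKUP = {word: unit for unit, words in UNIT_WORDS.items() for word in words}

# "\s*" tolerates regular and non-breaking spaces between number and unit.
_DATE_RE = re.compile(
    r"(\d+)\+?\s*(" + "|".join(re.escape(w) for w in _UNIT_LOOKUP) + r")s?",
    re.IGNORECASE,
)

# A number with optional thousands groups separated by space, dot or comma.
_COUNT_RE = re.compile(r"\d+(?:[\s.,]\d{3})*")


def parse_relative_date(text):
    """Return (amount, canonical_unit), or None if no relative date is found."""
    match = _DATE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), _UNIT_LOOKUP[match.group(2).lower()]


def parse_job_count(text):
    """Return the last number in the text as an int, or None if there is none."""
    numbers = _COUNT_RE.findall(text)
    if not numbers:
        return None
    # re.sub strips every separator (including non-breaking spaces) at once.
    return int(re.sub(r"\D", "", numbers[-1]))
```

Returning `None` explicitly when nothing matches lets the caller decide how to handle a page layout it does not recognize, instead of failing later on a missing attribute.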
Hello all, I just wanted to know what the status of this issue is. Is there a fix, or is something going to be done about this? Thank you in advance,
Short answer: no. Long answer: no, because job listing websites such as Glassdoor, Monster, etc. typically have slightly different sites depending on the country. These small differences break JobFunnel, since we scrape the job listings using tags which are language-dependent. The solution starts with writing an abstract base that developers can inherit from to write the web scraper for a particular country. Ideally this is done in a way that makes it accessible to the many developers whose country is not yet supported. We are working on this, but remember that it is a difficult issue, since it requires us to find common patterns across all countries.
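One possible shape for the abstract formulation described above, as a sketch only. The class names, the `date_units` attribute, and the `/jobs?q=...&l=...` URL layout are assumptions made for illustration; they do not describe JobFunnel's actual design.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class BaseIndeedScraper(ABC):
    """Hypothetical base class: shared logic here, locale details in subclasses."""

    @property
    @abstractmethod
    def domain(self):
        """Country-specific host name."""

    @property
    @abstractmethod
    def date_units(self):
        """Map of canonical unit name to the word used on the localized site."""

    def search_url(self, query, city):
        # Assumed URL layout; shared by every country-specific subclass.
        params = urlencode({"q": query, "l": city})
        return f"https://{self.domain}/jobs?{params}"


class IndeedFrance(BaseIndeedScraper):
    """A country-specific scraper only supplies the locale-dependent parts."""

    domain = "indeed.fr"
    date_units = {"hour": "heure", "day": "jour", "month": "mois", "year": "année"}


if __name__ == "__main__":
    print(IndeedFrance().search_url("python", "Paris"))
```

A contributor adding a new country would then only subclass and fill in the locale-dependent attributes, while the base class refuses to be instantiated until they are provided.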
Ran
$ funnel -s /home/danny/JobFunnel/jobfunnel/config/settings.yaml
and got