JobFunnel 3.0 with localization, ABC and improved scraping #90

Merged: 66 commits, Sep 12, 2020

@PaulMcInnis (Owner) commented Aug 29, 2020

Description

This is version 3.0 of JobFunnel with numerous improvements including:

  • support for localization
  • abstract base class implementation of JobFunnel and Scrapers
  • significantly improved preemption of scraping and filtering of results
  • implementation of OO features such as Job, JobField and JobFilter
  • a significantly easier path towards making new scrapers via a get() and set() style of API with configurable priority and delay
  • addition of Remote and Wage scraping
  • implementation of Cerberus for Schema and validation of YAML configuration files
  • capability of updating CSV job contents when encountering a newer duplicate
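
To give a feel for the abstract-base-class and get()/set() bullets above, here is a minimal sketch. It is not the code in this branch: `BaseScraper`, `ExampleScraper`, `fields_in_priority_order`, the `JobField` members and the raw-listing keys are all hypothetical names chosen for illustration, and the configurable delay is left out entirely.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Dict, List


class JobField(Enum):
    """Hypothetical subset of job attributes; the real enum may differ."""
    TITLE = 1
    COMPANY = 2
    REMOTE = 3


class BaseScraper(ABC):
    """Hypothetical base class: a subclass only says how to get each field."""

    # Fields to scrape, in priority order (earlier entries are scraped first).
    fields_in_priority_order: List[JobField] = []

    @abstractmethod
    def get(self, field: JobField, raw: Dict[str, str]) -> Any:
        """Extract the value of one field from a raw listing."""

    def set(self, field: JobField, job: Dict[JobField, Any], value: Any) -> None:
        """Store a scraped value on the job; override to post-process it."""
        job[field] = value

    def scrape(self, raw: Dict[str, str]) -> Dict[JobField, Any]:
        """Run get() then set() for every field, in priority order."""
        job: Dict[JobField, Any] = {}
        for field in self.fields_in_priority_order:
            self.set(field, job, self.get(field, raw))
        return job


class ExampleScraper(BaseScraper):
    """Made-up provider showing how little a new scraper has to define."""

    fields_in_priority_order = [JobField.TITLE, JobField.COMPANY, JobField.REMOTE]

    def get(self, field: JobField, raw: Dict[str, str]) -> Any:
        if field is JobField.TITLE:
            return raw["title"].strip()
        if field is JobField.COMPANY:
            return raw["company"].strip()
        if field is JobField.REMOTE:
            return "remote" in raw.get("location", "").lower()
        raise NotImplementedError(field)
```

In this sketch, adding a provider means subclassing `BaseScraper` and implementing `get()`; the base class cannot be instantiated on its own because `get()` is abstract.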

This will affect anyone currently developing off of the old branch, as rebasing will be untenable. I may need to squash this down a lot more.

If you are reading this, please give this branch a go. I find the easiest non-disruptive way is to clone this repo as ABCJobFunnel and simply run:

cd ABCJobFunnel
python3 -m jobfunnel

A good place to start is

Issues affected:

Context of change

  • Software (software that runs on the PC)
  • Library (library that runs on the PC)
  • Tool (tool that assists coding development) -- added a call-graph GraphViz generation script.

Type of change

I have updated all documentation.

Existing master CSV files can be ported by adding missing columns, but it is recommended just to start fresh.
Existing cache files and block lists are not compatible. Block lists could, however, be made compatible; this one might be worth pursuing.
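
For anyone who does want to port an existing master CSV rather than start fresh, adding the missing columns could look roughly like the sketch below. It uses only the standard library; `add_missing_columns` is a hypothetical helper, and the column names in the usage note are placeholders, not the actual 3.0 header.

```python
import csv
import io
from typing import Iterable, List


def add_missing_columns(src: Iterable[str], required: List[str],
                        default: str = "") -> str:
    """Return CSV text with any absent `required` columns appended.

    Existing columns keep their order and values; new columns are filled
    with `default` in every row.
    """
    reader = csv.DictReader(src)
    existing = list(reader.fieldnames or [])
    fieldnames = existing + [col for col in required if col not in existing]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=default,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # restval fills the columns the row lacks
    return out.getvalue()
```

Usage would be along the lines of `add_missing_columns(open("master_list.csv", newline=""), ["wage", "remote"])`, with the file name and column list replaced by whatever the new version actually expects.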

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

General monkey testing, but I need to increase test coverage to be truly confident that the code quality is there. I would appreciate it if anyone reading this would try running it and try to break it. Respond here with any bugs you find.

Checklist:

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.

Additional TBD:

  • Increase test coverage; unfortunately, existing tests are not well adapted to the new and improved codebase.
  • Add versioning to Cache
  • maybe fix glassdoor as a part of this PR as well (GlassDoor support (fix and re-enable) #87)
  • fix Travis

@PaulMcInnis (Owner, Author)

FYI I've put this up before I've re-upped the coverage / fixed the pyenv to make it accessible. Fixing the coverage will take some time, but I don't anticipate making any further large changes to the structure of the codebase.

@thebigG (Collaborator) left a comment

Really like the new percentage bars when scraping; it makes JobFunnel look really fancy 😎. Requested some changes. Will try to review further throughout the week as my schedule permits.

Really excited to see the new awesome changes get merged into master.

jobfunnel/config/cli.py (outdated, resolved)

@bunsenmurder (Collaborator)

I reviewed as much as I could but ended up stopping partway, as this update seems to break debugging in PyCharm. I was able to run JF normally, but debugging would cause it to stall indefinitely. The issue seems to stem from the use of properties within this new version; more details about the issue can be found within this thread on JetBrains' support forum.

@PaulMcInnis (Owner, Author) commented Aug 31, 2020

Thanks for taking a look, guys. I'll be fixing the CLI issues tomorrow. I might also need to add some functional testing to make sure I've smoke-tested this a bit better (in lieu of complete unit testing).

Additionally, it seems that pyenv sync doesn't work with the jobfunnel dependency; I'm not sure what's up with that yet.

@PaulMcInnis (Owner, Author)

It would also seem that the USA_ENGLISH locale is broken for the default settings.yaml; I need to look into this.

@PaulMcInnis PaulMcInnis requested a review from thebigG August 31, 2020 13:51
@PaulMcInnis PaulMcInnis force-pushed the ABCJobFunnel branch 2 times, most recently from ec4ac1e to 9b95f26 Compare August 31, 2020 22:16
@codecov-commenter commented Aug 31, 2020

Codecov Report

Merging #90 into master will decrease coverage by 21.50%.
The diff coverage is 36.94%.

@@             Coverage Diff             @@
##           master      #90       +/-   ##
===========================================
- Coverage   58.34%   36.83%   -21.51%     
===========================================
  Files          13       22        +9     
  Lines        1150     1341      +191     
===========================================
- Hits          671      494      -177     
- Misses        479      847      +368     

Impacted Files                             Coverage Δ
jobfunnel/__main__.py                      0.00% <0.00%> (-35.90%) ⬇️
jobfunnel/backend/jobfunnel.py             0.00% <0.00%> (ø)
jobfunnel/backend/tools/delay.py           21.15% <21.15%> (ø)
jobfunnel/backend/tools/filters.py         21.27% <21.27%> (ø)
jobfunnel/backend/job.py                   26.47% <26.47%> (ø)
jobfunnel/backend/scrapers/monster.py      28.35% <28.35%> (ø)
jobfunnel/config/manager.py                29.78% <29.78%> (ø)
jobfunnel/backend/tools/tools.py           29.87% <29.87%> (ø)
jobfunnel/backend/scrapers/glassdoor.py    30.14% <30.14%> (ø)
jobfunnel/backend/scrapers/indeed.py       30.90% <30.90%> (ø)
... and 33 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5275820...cbbd917.

@PaulMcInnis PaulMcInnis force-pushed the ABCJobFunnel branch 3 times, most recently from daf1979 to 8a7a6fb Compare August 31, 2020 23:15
@PaulMcInnis (Owner, Author)

OK, I'm just working on getting a few final things in, but separating the CLI out made things a lot easier. I'm finally moving past that mess and have added some simple tests to verify it actually works.

@PaulMcInnis PaulMcInnis dismissed thebigG’s stale review September 10, 2020 23:01

Resolved all comments; a force-push broke my ability to complete the review.

@PaulMcInnis (Owner, Author)

OK, I've tested this enough for now.

Master is pretty broken compared to this so I'm going to merge and fix bugs as they come in from now on.

Still TODO:
[ ] Inter-scrape duplicates by TFIDF
[ ] GlassDoor scraper (webdriven)
[ ] more testing
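
For context on the first TODO item, one possible shape for duplicate detection by TF-IDF is sketched below. This is only an illustration of the general technique in pure Python, not the planned implementation: the function names, the whitespace tokenizer, the smoothed IDF variant and the 0.75 threshold are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per document (naive whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF, one common variant; others would work too.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def find_duplicates(descriptions: List[str],
                    threshold: float = 0.75) -> List[Tuple[int, int]]:
    """Return index pairs whose descriptions are at least `threshold` similar."""
    vecs = tfidf_vectors(descriptions)
    return [
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]
```

The pairwise comparison is quadratic in the number of descriptions, which is fine for a sketch but would likely need a vectorized library for large scrapes.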
