Great job, Thamme! If I may, this is "focused language crawling" as opposed to, e.g., "focused multimedia crawling" or "web page crawling". We should update the issue title to reflect that. Great job filing the issue.
thammegowda changed the title from "Support for flexible focus crawling framework" to "Support for flexible focus language crawling framework" on Nov 15, 2017
Thanks for the suggestion. The title is now updated 👍
Focus crawling is needed by everybody, but no existing crawler seems to do it right.
We (Sparkler) now have our thinking caps on for this task, and we will propose a good solution for languages, multimedia, etc.
The first subtask is defining and expressing the focus crawling specification.
The second subtask will be implementing that specification in Sparkler.
Currently, we have support for URL-based focus/filters.
This has to be extended with content-based focus.
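To make the distinction concrete, here is a minimal sketch of how a URL-based filter and a content-based filter could be combined into a single focus. This is not Sparkler's API: `Page`, `url_filter`, `content_filter`, and `all_of` are hypothetical names invented for illustration, and Python is used only for brevity (Sparkler itself runs on the JVM). The keyword-matching rule stands in for whatever content-based test the specification ends up defining.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Page:
    """Hypothetical fetched page: its URL plus the extracted text."""
    url: str
    text: str


# A filter is any function that accepts or rejects a page.
Filter = Callable[[Page], bool]


def url_filter(pattern: str) -> Filter:
    """URL-based focus: accept pages whose URL matches the regex."""
    regex = re.compile(pattern)
    return lambda page: regex.search(page.url) is not None


def content_filter(keywords: Iterable[str], min_hits: int = 1) -> Filter:
    """Content-based focus (assumed rule): accept pages whose text
    contains at least `min_hits` distinct keywords."""
    wanted = {k.lower() for k in keywords}

    def accept(page: Page) -> bool:
        tokens = set(re.findall(r"\w+", page.text.lower()))
        return len(tokens & wanted) >= min_hits

    return accept


def all_of(*filters: Filter) -> Filter:
    """Combine filters: a page is in focus only if every filter accepts it."""
    return lambda page: all(f(page) for f in filters)


# Example focus: pages under example.org that mention crawling.
focus = all_of(
    url_filter(r"^https?://[^/]+\.example\.org/"),
    content_filter({"crawler", "crawling"}),
)
```

Calling `focus(Page(url, text))` then gives a single yes/no decision per page. Treating every filter as a plain predicate keeps URL-based and content-based rules interchangeable, which is one possible way to keep the specification flexible.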
Example task can be:
Sparkler should be able to express and accept this first 'focus' requirement, which is a combination of two filters: