Cached parsing
#2881
Replies: 1 comment
-
Hello, and thanks for your interest in Crawlee! This use case is not explicitly supported by the framework, but it should be fairly easy to achieve anyway. For caching websites, I'd recommend a standalone caching proxy such as https://www.npmjs.com/package/@loopback/http-caching-proxy. Then you can simply repeat steps 2 and 3, overwriting the results each time, and adjust your parsing until you're happy with the result. Does that work for you?
-
Hi, is it possible to split the whole crawling process into 3 steps?
This approach is very handy when pages are very unstructured: you can freely experiment with the parsing function (in step 2) without making real requests to the website.