New Components - scrapeless #16712
Conversation
Walkthrough

This update introduces a new Scrapeless component with actions to submit a web scraping job and retrieve its results. It implements a full API client, utility functions, and actor options, and updates the package configuration. The actions correspond to submitting jobs and fetching results via the Scrapeless API.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant SubmitAction as Submit Scrape Job Action
    participant GetResultAction as Get Scrape Result Action
    participant ScrapelessApp as Scrapeless App
    participant ScrapelessAPI as Scrapeless API
    User->>SubmitAction: Provide job details (actor, URL, etc.)
    SubmitAction->>ScrapelessApp: submitScrapeJob(params)
    ScrapelessApp->>ScrapelessAPI: POST /scraper/request
    ScrapelessAPI-->>ScrapelessApp: Return job ID
    ScrapelessApp-->>SubmitAction: Return job ID
    SubmitAction-->>User: Return job ID
    User->>GetResultAction: Provide scrapeJobId
    GetResultAction->>ScrapelessApp: getScrapeResult({ scrapeJobId })
    ScrapelessApp->>ScrapelessAPI: GET /scraper/result/{scrapeJobId}
    ScrapelessAPI-->>ScrapelessApp: Return scrape result
    ScrapelessApp-->>GetResultAction: Return result
    GetResultAction-->>User: Return result
```
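The two-step flow in the diagram can be sketched in plain JavaScript. This is an illustrative mock, not the component's actual code: an in-memory `Map` stands in for the Scrapeless API, and the method names (`submit`, `result`) mirror the endpoints from the diagram rather than any real client library.

```javascript
// Minimal mock of the submit-then-fetch flow from the sequence diagram.
// The real component calls the Scrapeless API over HTTP; here an
// in-memory Map stands in for it so the control flow is easy to follow.
const fakeApi = {
  jobs: new Map(),
  nextId: 1,
  // Stands in for POST /scraper/request
  submit({ actor, input }) {
    const taskId = `task-${this.nextId++}`;
    this.jobs.set(taskId, { actor, input, result: { html: "<html>...</html>" } });
    return { taskId };
  },
  // Stands in for GET /scraper/result/{scrapeJobId}
  result(scrapeJobId) {
    const job = this.jobs.get(scrapeJobId);
    return job ? job.result : null;
  },
};

// Step 1: submit a scrape job and receive a job ID.
const { taskId } = fakeApi.submit({
  actor: "scraper.google.search",
  input: { q: "pipedream" },
});

// Step 2: retrieve the result using that ID.
const result = fakeApi.result(taskId);
console.log(taskId, result); // → task-1 { html: '<html>...</html>' }
```

The point of the split is that submission returns immediately with an ID; fetching the result is a separate call, which matches the two separate actions in this PR.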
Assessment against linked issues
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.

🔧 ESLint

ESLint 8.57.1 failed with the same error on each of the following files: `Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs`

- components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs
- components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs
- components/scrapeless/common/constants.mjs

📜 Recent review details

Configuration used: CodeRabbit UI
📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
⏰ Context from checks skipped due to timeout of 90000ms (4)
Actions
- Submit Scrape Job
- Get Scrape Result
Actionable comments posted: 5
🧹 Nitpick comments (11)
components/scrapeless/common/utils.mjs (1)
1-24
: Utility function could benefit from JSDoc documentation

This utility function seems well-structured and handles various input types gracefully. It correctly returns the parsed JSON object or falls back to the original value when parsing fails. However, adding JSDoc documentation would improve clarity for other developers.
```diff
+/**
+ * Attempts to parse JSON strings into JavaScript objects
+ * @param {any} obj - The input to parse (string, array, or any other type)
+ * @returns {any} - Parsed object or original input if parsing fails
+ */
 export const parseObject = (obj) => {
   if (!obj) return undefined;
   if (Array.isArray(obj)) {
     return obj.map((item) => {
       if (typeof item === "string") {
         try {
           return JSON.parse(item);
         } catch (e) {
           return item;
         }
       }
       return item;
     });
   }
   if (typeof obj === "string") {
     try {
       return JSON.parse(obj);
     } catch (e) {
       return obj;
     }
   }
   return obj;
 };
```

components/scrapeless/common/constants.mjs (2)
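For reference, here is how `parseObject` behaves on the main input shapes. The function body is copied verbatim from `components/scrapeless/common/utils.mjs`; the sample inputs are illustrative.

```javascript
// parseObject from components/scrapeless/common/utils.mjs:
// JSON strings are parsed, arrays are parsed element-wise, and anything
// that fails to parse is returned unchanged.
const parseObject = (obj) => {
  if (!obj) return undefined;
  if (Array.isArray(obj)) {
    return obj.map((item) => {
      if (typeof item === "string") {
        try {
          return JSON.parse(item);
        } catch (e) {
          return item;
        }
      }
      return item;
    });
  }
  if (typeof obj === "string") {
    try {
      return JSON.parse(obj);
    } catch (e) {
      return obj;
    }
  }
  return obj;
};

console.log(parseObject('{"q":"pipedream"}')); // → { q: 'pipedream' }
console.log(parseObject("not json"));          // → 'not json' (fallback)
console.log(parseObject(undefined));           // → undefined
```

Note the fallback behavior: a non-JSON string is passed through unchanged rather than raising, which is why the action can call it directly on user-supplied prop values.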
67-73
: Fix typos in Google Flights labels

There are capitalization errors in "FLights" which should be "Flights".
```diff
 {
-  label: "Google FLights",
+  label: "Google Flights",
   value: "scraper.google.flights",
 },
 {
-  label: "Google FLights Chart",
+  label: "Google Flights Chart",
   value: "scraper.google.flights.chart",
 },
```
1-138
: Consider organizing ACTOR_OPTIONS by category for better maintainability

The current list of actor options appears to be organized somewhat randomly. Consider grouping related scrapers together (e.g., all Google services, all e-commerce platforms, etc.) to improve readability and maintainability.
You could organize the array by grouping similar scrapers together, for example:
- E-commerce (Shopee, Amazon, Temu)
- Brazilian sites
- Airlines
- Electronics distributors
- Google services (grouped by type)
This would make the list easier to navigate and maintain as new scrapers are added.
components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (2)
7-8
: Consider aligning component version with package version

The action version is set to "0.0.1" while the package.json was updated to "0.1.0". For consistency, consider aligning these versions.
```diff
   description: "Retrieve the result of a completed scraping job. [See the documentation](https://apidocs.scrapeless.com/api-11949853)",
-  version: "0.0.1",
+  version: "0.1.0",
   type: "action",
```
14-15
: Extra space at beginning of description text

There's an extra space at the beginning of the description text.
```diff
     type: "string",
     label: "Scrape Job ID",
-    description: " The ID of the scrape job you want to retrieve results for. This ID is provided when you submit a scrape job.",
+    description: "The ID of the scrape job you want to retrieve results for. This ID is provided when you submit a scrape job.",
   },
```

components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs (2)
41-41
: Check for existence of additionalInput before parsing

The `parseObject` function is called without first checking if `this.additionalInput` exists. While this might work if `parseObject` handles null/undefined values, it would be more robust to check for existence first.

```diff
-      input: parseObject(this.additionalInput),
+      input: this.additionalInput ? parseObject(this.additionalInput) : {},
```
42-42
: Add a comment explaining the `async` parameter

The `async: true` parameter might not be self-explanatory. Consider adding a comment to explain its purpose in the API request.

```diff
       actor: this.actor,
       input: parseObject(this.additionalInput),
-      async: true,
+      async: true, // Process the request asynchronously, so we get a task ID instead of waiting for results
```

components/scrapeless/scrapeless.app.mjs (4)
12-12
: Simplify the API token header value

Template literals are unnecessary when only using a single variable. You can directly reference the variable without string interpolation.

```diff
-        "x-api-token": `${this.$auth.api_key}`,
+        "x-api-token": this.$auth.api_key,
```
31-35
: Add method parameter documentation

The `getScrapeResult` method takes a `scrapeJobId` parameter but lacks documentation. Consider adding JSDoc comments to explain the parameter requirements.

```diff
+    /**
+     * Get the result of a scrape job
+     * @param {Object} opts - The request options
+     * @param {string} opts.scrapeJobId - The ID of the scrape job to retrieve
+     * @returns {Promise<Object>} The scrape job result
+     */
     getScrapeResult({ scrapeJobId }) {
       return this._makeRequest({
         path: `/scraper/result/${scrapeJobId}`,
       });
     },
```
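Because jobs are submitted with `async: true`, a consumer of this app typically polls `getScrapeResult` until the job finishes. The sketch below illustrates that pattern with a mock client; the `status: "pending"` shape, retry count, and delay are assumptions for illustration, not part of the documented Scrapeless API.

```javascript
// Illustrative polling loop for an async job API. The mock client
// reports "pending" twice before returning data; a real client would
// issue GET /scraper/result/{scrapeJobId} on each attempt instead.
async function pollForResult(client, scrapeJobId, { retries = 5, delayMs = 10 } = {}) {
  for (let i = 0; i < retries; i++) {
    const res = await client.getScrapeResult({ scrapeJobId });
    // Assumed shape: a still-running job returns a "pending" status marker.
    if (res && res.status !== "pending") return res;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Job ${scrapeJobId} did not finish after ${retries} attempts`);
}

// Mock client: pending twice, then done on the third call.
let calls = 0;
const mockClient = {
  async getScrapeResult() {
    calls++;
    return calls < 3
      ? { status: "pending" }
      : { status: "done", data: "result" };
  },
};

pollForResult(mockClient, "task-1").then((res) => console.log(res.data)); // prints "result"
```

A bounded retry count keeps the loop from hanging forever on a job that never completes, which matters inside a workflow step with an execution time limit.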
24-30
: Add method parameter documentation for submitScrapeJob

Similar to `getScrapeResult`, the `submitScrapeJob` method could benefit from JSDoc comments explaining the expected parameters and return value.

```diff
+    /**
+     * Submit a new scrape job
+     * @param {Object} opts - The request options
+     * @param {Object} opts.data - The scrape job parameters
+     * @param {string} opts.data.actor - The actor to use for the scrape job
+     * @param {Object} opts.data.input - Input parameters for the scrape job
+     * @param {boolean} opts.data.async - Whether to process the request asynchronously
+     * @returns {Promise<Object>} The created scrape job, including taskId
+     */
     submitScrapeJob(opts = {}) {
       return this._makeRequest({
         method: "POST",
         path: "/scraper/request",
         ...opts,
       });
     },
```
7-9
: Add environment variable support for API URL

Consider making the base URL configurable to support different environments (e.g., development, testing, production).

```diff
   _baseUrl() {
-    return "https://api.scrapeless.com/api/v1";
+    return this.$auth.base_url || "https://api.scrapeless.com/api/v1";
   },
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (6)
- components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (1 hunks)
- components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs (1 hunks)
- components/scrapeless/common/constants.mjs (1 hunks)
- components/scrapeless/common/utils.mjs (1 hunks)
- components/scrapeless/package.json (2 hunks)
- components/scrapeless/scrapeless.app.mjs (1 hunks)
🔇 Additional comments (3)
components/scrapeless/package.json (2)
3-4
: LGTM! Version update is appropriate for new component

The version update from 0.0.1 to 0.1.0 is appropriate for introducing new functionality.
14-17
: LGTM! Dependency addition and formatting corrections

The addition of the @pipedream/platform dependency and correction of the JSON structure are appropriate.
components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (1)
17-25
: LGTM! Well-implemented run method

The run method is well-implemented, correctly passing the execution context and job ID to the Scrapeless API client, then returning the response with a descriptive summary.
Looks good! Just one possible typo in `constants.mjs`. Moving to QA.
Resolves #16673.
Summary by CodeRabbit
New Features
Improvements
Other