
New Components - scrapeless #16712


Open · wants to merge 5 commits into master

Conversation

@luancazarine (Collaborator) commented May 19, 2025

Resolves #16673.

Summary by CodeRabbit

  • New Features

    • Added the ability to submit web scraping jobs using the Scrapeless platform, with options for target URL, proxy country, and advanced configurations.
    • Introduced an action to retrieve results of completed scraping jobs.
    • Provided a comprehensive list of scraper actor options for easier selection.
  • Improvements

    • Enhanced the Scrapeless integration with a fully implemented API client, streamlining job submission and result retrieval.
    • Added a utility to parse JSON strings into objects for flexible input handling.
  • Other

    • Updated internal dependencies and versioning for improved stability.

@luancazarine added the ai-assisted label (Content generated by AI, with human refinement and modification) on May 19, 2025
vercel bot commented May 19, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

3 Skipped Deployments

Name | Status | Updated (UTC)
docs-v2 | ⬜️ Ignored | May 20, 2025 1:22pm
pipedream-docs | ⬜️ Ignored | May 20, 2025 1:22pm
pipedream-docs-redirect-do-not-edit | ⬜️ Ignored | May 20, 2025 1:22pm

coderabbitai bot (Contributor) commented May 19, 2025

Walkthrough

This update introduces a new Scrapeless component with two actions: one to submit a web scraping job via the Scrapeless API and one to retrieve its results. It implements a full API client, utility functions, and a list of selectable actor options, and updates the package configuration.

Changes

File(s) | Change Summary
components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs | Added action to retrieve scraping job results by job ID from Scrapeless.
components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs | Added action to submit new scraping jobs with configurable parameters to Scrapeless.
components/scrapeless/common/constants.mjs | Introduced ACTOR_OPTIONS array for selectable scraper actors.
components/scrapeless/common/utils.mjs | Added parseObject utility for robust JSON/object parsing.
components/scrapeless/scrapeless.app.mjs | Implemented Scrapeless API client with methods for submitting jobs and retrieving results; refactored structure.
components/scrapeless/package.json | Bumped version to 0.1.0, added dependency on @pipedream/platform, fixed publishConfig brace.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SubmitAction as Submit Scrape Job Action
    participant GetResultAction as Get Scrape Result Action
    participant ScrapelessApp as Scrapeless App
    participant ScrapelessAPI as Scrapeless API

    User->>SubmitAction: Provide job details (actor, URL, etc.)
    SubmitAction->>ScrapelessApp: submitScrapeJob(params)
    ScrapelessApp->>ScrapelessAPI: POST /scraper/request
    ScrapelessAPI-->>ScrapelessApp: Return job ID
    ScrapelessApp-->>SubmitAction: Return job ID
    SubmitAction-->>User: Return job ID

    User->>GetResultAction: Provide scrapeJobId
    GetResultAction->>ScrapelessApp: getScrapeResult({ scrapeJobId })
    ScrapelessApp->>ScrapelessAPI: GET /scraper/result/{scrapeJobId}
    ScrapelessAPI-->>ScrapelessApp: Return scrape result
    ScrapelessApp-->>GetResultAction: Return result
    GetResultAction-->>User: Return result
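
For illustration, here is a minimal sketch of the same two-step flow against the HTTP API directly, outside of Pipedream. The endpoint paths, the x-api-token header, the async flag, and the taskId field all appear elsewhere in this PR; the actor's input shape is an assumption.

  // Sketch only: submit a job, then fetch its result by task ID.
  const BASE_URL = "https://api.scrapeless.com/api/v1";
  const headers = {
    "x-api-token": process.env.SCRAPELESS_API_KEY,
    "Content-Type": "application/json",
  };

  // 1. Submit the job asynchronously; the API returns a task ID
  //    instead of blocking until the scrape completes.
  const submitRes = await fetch(`${BASE_URL}/scraper/request`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      actor: "scraper.google.flights",
      input: { url: "https://www.google.com/travel/flights" }, // input shape is an assumption
      async: true,
    }),
  });
  const { taskId } = await submitRes.json();

  // 2. Retrieve the result once the job has completed.
  const resultRes = await fetch(`${BASE_URL}/scraper/result/${taskId}`, { headers });
  console.log(await resultRes.json());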

Assessment against linked issues

Objective | Addressed | Explanation
Implement submit-scrape-job action to submit new web scraping jobs (#16673) | ✅ |
Implement get-scrape-result action to retrieve completed job results (#16673) | ✅ |

Poem

A bunny with code in its paws,
Built scrapers without any flaws.
Submit a job, then wait and see—
Results retrieved, as quick as can be!
With actors and helpers, all neatly arrayed,
The Scrapeless component is now well displayed.
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

The same error was reported for all three files below:

  • components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs
  • components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs
  • components/scrapeless/common/constants.mjs

Oops! Something went wrong! :(

ESLint: 8.57.1

Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
at Object.getPackageJSONURL (node:internal/modules/package_json_reader:255:9)
at packageResolve (node:internal/modules/esm/resolve:767:81)
at moduleResolve (node:internal/modules/esm/resolve:853:18)
at defaultResolve (node:internal/modules/esm/resolve:983:11)
at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:799:12)
at #cachedDefaultResolve (node:internal/modules/esm/loader:723:25)
at ModuleLoader.resolve (node:internal/modules/esm/loader:706:38)
at ModuleLoader.getModuleJobForImport (node:internal/modules/esm/loader:307:38)
at #link (node:internal/modules/esm/module_job:163:49)
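
If the failure really is a missing dev dependency, as the warning above suggests, the usual remedy is to install jsonc-eslint-parser at the workspace root (for example, pnpm add -D -w jsonc-eslint-parser, since this repo tracks a pnpm-lock.yaml), which adds an entry like the following to the root package.json. The version shown is illustrative:

  {
    "devDependencies": {
      "jsonc-eslint-parser": "^2.4.0"
    }
  }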


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between bf9311e and b70c366.

📒 Files selected for processing (3)
  • components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (1 hunks)
  • components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs (1 hunks)
  • components/scrapeless/common/constants.mjs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs
  • components/scrapeless/common/constants.mjs
  • components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pnpm publish
  • GitHub Check: Publish TypeScript components
  • GitHub Check: Lint Code Base
  • GitHub Check: Verify TypeScript components


Actions
 - Submit Scrape Job
 - Get Scrape Result
@luancazarine luancazarine marked this pull request as ready for review May 19, 2025 20:09
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (11)
components/scrapeless/common/utils.mjs (1)

1-24: Utility function could benefit from JSDoc documentation

This utility function seems well-structured and handles various input types gracefully. It correctly returns the parsed JSON object or falls back to the original value when parsing fails. However, adding JSDoc documentation would improve clarity for other developers.

+/**
+ * Attempts to parse JSON strings into JavaScript objects
+ * @param {any} obj - The input to parse (string, array, or any other type)
+ * @returns {any} - Parsed object or original input if parsing fails
+ */
 export const parseObject = (obj) => {
   if (!obj) return undefined;

   if (Array.isArray(obj)) {
     return obj.map((item) => {
       if (typeof item === "string") {
         try {
           return JSON.parse(item);
         } catch (e) {
           return item;
         }
       }
       return item;
     });
   }
   if (typeof obj === "string") {
     try {
       return JSON.parse(obj);
     } catch (e) {
       return obj;
     }
   }
   return obj;
 };
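
For context, here are a few illustrative calls against the implementation shown above, with the values they would return:

  parseObject('{"country": "US"}');   // => { country: "US" }
  parseObject("not json");            // => "not json" (returned unchanged)
  parseObject(['{"a": 1}', "plain"]); // => [ { a: 1 }, "plain" ]
  parseObject(undefined);             // => undefined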
components/scrapeless/common/constants.mjs (2)

67-73: Fix typos in Google Flights labels

There are capitalization errors in "FLights" which should be "Flights".

   {
-    label: "Google FLights",
+    label: "Google Flights",
     value: "scraper.google.flights",
   },
   {
-    label: "Google FLights Chart",
+    label: "Google Flights Chart",
     value: "scraper.google.flights.chart",
   },

1-138: Consider organizing ACTOR_OPTIONS by category for better maintainability

The current list of actor options appears to be organized somewhat randomly. Consider grouping related scrapers together (e.g., all Google services, all e-commerce platforms, etc.) to improve readability and maintainability.

You could organize the array by grouping similar scrapers together, for example:

  1. E-commerce (Shopee, Amazon, Temu)
  2. Brazilian sites
  3. Airlines
  4. Electronics distributors
  5. Google services (grouped by type)

This would make the list easier to navigate and maintain as new scrapers are added.
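
A minimal sketch of one way to express that grouping in constants.mjs while keeping the flat ACTOR_OPTIONS export the actions already consume. The group names and layout are illustrative; only the two Google Flights entries come from this PR:

  // Group related actors, then flatten into the existing export.
  const GOOGLE_ACTORS = [
    { label: "Google Flights", value: "scraper.google.flights" },
    { label: "Google Flights Chart", value: "scraper.google.flights.chart" },
  ];

  const ECOMMERCE_ACTORS = [
    // Shopee, Amazon, Temu entries would live here
  ];

  export const ACTOR_OPTIONS = [
    ...ECOMMERCE_ACTORS,
    ...GOOGLE_ACTORS,
    // ...remaining categories
  ];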

components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (2)

7-8: Consider aligning component version with package version

The action version is set to "0.0.1" while the package.json was updated to "0.1.0". For consistency, consider aligning these versions.

   description: "Retrieve the result of a completed scraping job. [See the documentation](https://apidocs.scrapeless.com/api-11949853)",
-  version: "0.0.1",
+  version: "0.1.0",
   type: "action",

14-15: Extra space at beginning of description text

There's an extra space at the beginning of the description text.

     type: "string",
     label: "Scrape Job ID",
-    description: " The ID of the scrape job you want to retrieve results for. This ID is provided when you submit a scrape job.",
+    description: "The ID of the scrape job you want to retrieve results for. This ID is provided when you submit a scrape job.",
   },
components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs (2)

41-41: Check for existence of additionalInput before parsing

The parseObject function is called without first checking if this.additionalInput exists. While this might work if parseObject handles null/undefined values, it would be more robust to check for existence first.

-      input: parseObject(this.additionalInput),
+      input: this.additionalInput ? parseObject(this.additionalInput) : {},

42-42: Add a comment explaining the 'async' parameter

The async: true parameter might not be self-explanatory. Consider adding a comment to explain its purpose in the API request.

       actor: this.actor,
       input: parseObject(this.additionalInput),
-      async: true,
+      async: true, // Process the request asynchronously, so we get a task ID instead of waiting for results
components/scrapeless/scrapeless.app.mjs (4)

12-12: Simplify the API token header value

Template literals are unnecessary when only using a single variable. You can directly reference the variable without string interpolation.

-        "x-api-token": `${this.$auth.api_key}`,
+        "x-api-token": this.$auth.api_key,

31-35: Add method parameter documentation

The getScrapeResult method takes a scrapeJobId parameter but lacks documentation. Consider adding JSDoc comments to explain the parameter requirements.

+    /**
+     * Get the result of a scrape job
+     * @param {Object} opts - The request options
+     * @param {string} opts.scrapeJobId - The ID of the scrape job to retrieve
+     * @returns {Promise<Object>} The scrape job result
+     */
     getScrapeResult({ scrapeJobId }) {
       return this._makeRequest({
         path: `/scraper/result/${scrapeJobId}`,
       });
     },

24-30: Add method parameter documentation for submitScrapeJob

Similar to getScrapeResult, the submitScrapeJob method could benefit from JSDoc comments explaining the expected parameters and return value.

+    /**
+     * Submit a new scrape job
+     * @param {Object} opts - The request options
+     * @param {Object} opts.data - The scrape job parameters
+     * @param {string} opts.data.actor - The actor to use for the scrape job
+     * @param {Object} opts.data.input - Input parameters for the scrape job
+     * @param {boolean} opts.data.async - Whether to process the request asynchronously
+     * @returns {Promise<Object>} The created scrape job, including taskId
+     */
     submitScrapeJob(opts = {}) {
       return this._makeRequest({
         method: "POST",
         path: "/scraper/request",
         ...opts,
       });
     },

7-9: Add environment variable support for API URL

Consider making the base URL configurable to support different environments (e.g., development, testing, production).

     _baseUrl() {
-      return "https://api.scrapeless.com/api/v1";
+      return this.$auth.base_url || "https://api.scrapeless.com/api/v1";
     },
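
The comments above reference a _makeRequest helper whose body is not shown in this excerpt. A minimal sketch of how such a helper is conventionally composed in Pipedream app files, assuming the base URL and x-api-token header quoted above; treat the details as an assumption rather than the actual implementation:

  import { axios } from "@pipedream/platform";

  export default {
    type: "app",
    app: "scrapeless",
    methods: {
      _baseUrl() {
        return "https://api.scrapeless.com/api/v1";
      },
      // Shared request helper; "$" is Pipedream's execution context.
      _makeRequest({ $ = this, path, ...opts }) {
        return axios($, {
          url: `${this._baseUrl()}${path}`,
          headers: {
            "x-api-token": this.$auth.api_key,
          },
          ...opts,
        });
      },
    },
  };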
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between cb64910 and bf9311e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (6)
  • components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (1 hunks)
  • components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs (1 hunks)
  • components/scrapeless/common/constants.mjs (1 hunks)
  • components/scrapeless/common/utils.mjs (1 hunks)
  • components/scrapeless/package.json (2 hunks)
  • components/scrapeless/scrapeless.app.mjs (1 hunks)
🔇 Additional comments (3)
components/scrapeless/package.json (2)

3-4: LGTM! Version update is appropriate for new component

The version update from 0.0.1 to 0.1.0 is appropriate for introducing new functionality.


14-17: LGTM! Dependency addition and formatting corrections

The addition of the @pipedream/platform dependency and correction of the JSON structure are appropriate.

components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs (1)

17-25: LGTM! Well-implemented run method

The run method is well-implemented, correctly passing the execution context and job ID to the Scrapeless API client, then returning the response with a descriptive summary.
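
A minimal sketch of what such a run method typically looks like under standard Pipedream action conventions; the exact summary string and prop wiring in this PR are assumptions:

  // Sketch only: conventional Pipedream action run method shape.
  async run({ $ }) {
    const response = await this.scrapeless.getScrapeResult({
      scrapeJobId: this.scrapeJobId,
    });
    // Export a human-readable summary alongside the returned data.
    $.export("$summary", `Retrieved result for scrape job ${this.scrapeJobId}`);
    return response;
  }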

@michelle0927 michelle0927 (Collaborator) left a comment

Looks good! Just one possible typo in constants.mjs. Moving to QA.

Labels: ai-assisted (Content generated by AI, with human refinement and modification)
Projects: None yet
Development: Successfully merging this pull request may close [Components] scrapeless (#16673).
2 participants