Scrape Web URL
Extract data from web pages by scraping their content.
Using the Scrape Web URL Node
The Scrape Web URL node fetches the content of a web page at the provided URL. It accepts the following parameters:
- URL (required): The URL of the page you want to scrape. Example: https://docs.buildship.com
- Selector (optional): The HTML selector to extract text content from. Defaults to body.
- Steps (optional): A list of steps to run after the page at the given URL has loaded.
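As a rough sketch, the three parameters above can be described with TypeScript types. The names Step, ScrapeInput, and withDefaults are hypothetical and used only for illustration. Treating an omitted steps value as an empty list is also an assumption, not something the node documents.

```typescript
// Hypothetical type names; they mirror the parameters described above.
interface Step {
  action: string; // name of a Puppeteer page method, e.g. "click"
  params: unknown[]; // arguments for that method, in order
}

interface ScrapeInput {
  url: string; // required
  selector?: string; // optional; body is used when omitted
  steps?: Step[]; // optional; run after the page loads
}

// Fill in the documented default selector and (by assumption) an empty step list.
function withDefaults(input: ScrapeInput): Required<ScrapeInput> {
  return {
    url: input.url,
    selector: input.selector ?? "body",
    steps: input.steps ?? [],
  };
}

const minimal = withDefaults({ url: "https://docs.buildship.com" });
console.log(minimal.selector); // "body"
```

Only url has to be supplied; the other two fields fall back to their defaults.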
Usage example: Suppose you want to search Google for "buildship" and scrape the text of the #result-stats element from the results page.
So, the node input, including steps, would look something like this:

    {
      "url": "https://www.google.com/",
      "selector": "#result-stats",
      "steps": [
        // type "buildship" in google search input box
        {
          "action": "type",
          "params": ["#APjFqb", "buildship"]
        },
        // click on google search button
        {
          "action": "click",
          "params": [".gNO89b"]
        },
        // wait for searched query to load and if page doesn't load
        // within 3 seconds, move to next step
        {
          "action": "waitForNavigation",
          "params": [
            {
              "timeout": 3000,
              "waitUntil": "load"
            }
          ]
        }
      ]
    }
The root selector value is the selector from which you want to extract the text content after all steps are executed.

Each step object in the steps list consists of action and params. The action parameter is the name of any method from the Puppeteer page methods list, and params is the list of parameters required by the selected action (a Puppeteer method name).

For example, for the type action, the parameters of the Puppeteer type method are the selector, the text, and an optional third parameter. Hence, the step object for the type action would look like:

    {
      // puppeteer method name
      "action": "type",
      // "#APjFqb" is the "selector" (the selector to find <input>)
      // "buildship" is the "text" (the value to be typed in <input>)
      // As per parameters list of "type" method, the third parameter is optional,
      // hence we can either include or exclude it from "params" list
      "params": ["#APjFqb", "buildship"]
    }
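One way to picture the action and params contract is as a dynamic method call: each step calls the page method named in action and passes the entries of params as its arguments. The sketch below illustrates that reading only and is not BuildShip's actual implementation. Step, PageLike, runSteps, and fakePage are hypothetical names, and a stand-in object that records calls takes the place of a real Puppeteer page so the example runs without a browser.

```typescript
// Hypothetical shapes for illustration; not BuildShip's actual source.
interface Step {
  action: string;
  params: unknown[];
}

// Stand-in for the subset of page methods used here: async functions keyed by name.
type PageLike = Record<string, (...args: any[]) => Promise<unknown>>;

// Run each step in order by calling the page method named in `action`
// with the entries of `params` as its arguments.
async function runSteps(page: PageLike, steps: Step[]): Promise<void> {
  for (const step of steps) {
    const method = page[step.action];
    if (typeof method !== "function") {
      throw new Error(`Unknown page method: ${step.action}`);
    }
    await method.apply(page, step.params);
  }
}

// A fake page that records calls instead of driving a browser.
const calls: string[] = [];
const fakePage: PageLike = {
  type: async (selector: string, text: string) => {
    calls.push(`type ${selector} ${text}`);
  },
  click: async (selector: string) => {
    calls.push(`click ${selector}`);
  },
};

// Logs the recorded calls once both steps have run, in step order.
runSteps(fakePage, [
  { action: "type", params: ["#APjFqb", "buildship"] },
  { action: "click", params: [".gNO89b"] },
]).then(() => console.log(calls));
```

Under this reading, the step { "action": "type", "params": ["#APjFqb", "buildship"] } corresponds to calling the page's type method with "#APjFqb" as the selector and "buildship" as the text.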