ScrapingBee (Independent Publisher) (Preview)
ScrapingBee is a web scraping service. It handles headless browsers, proxies, and CAPTCHAs, extracts complex structured information from websites with CSS selectors, and runs JavaScript scenarios (click, scroll, form filling, etc.).
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions - US Department of Defense (DoD) |
Power Automate | Premium | All Power Automate regions except the following: - US Government (GCC) - US Government (GCC High) - China Cloud operated by 21Vianet - US Department of Defense (DoD) |
Power Apps | Premium | All Power Apps regions except the following: - US Government (GCC) - US Government (GCC High) - China Cloud operated by 21Vianet - US Department of Defense (DoD) |
Contact | |
---|---|
Name | Troy Taylor |
URL | https://www.hitachisolutions.com |
Email | ttaylor@hitachisolutions.com |
Connector Metadata | |
---|---|
Publisher | Troy Taylor |
Website | https://www.scrapingbee.com/ |
Privacy policy | https://www.scrapingbee.com/privacy-policy/ |
Categories | Website |
Creating a connection
The connector supports the following authentication types:
Default | Parameters for creating connection. | All regions | Not shareable |
Default
Applicable: All regions
Parameters for creating connection.
This connection is not shareable. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
API Key | securestring | The API key for this API. | True |
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
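The limit above (100 calls per 60 seconds per connection) can be respected on the client side. The following is a minimal sketch of a sliding-window guard, not part of the connector itself; the class name `SlidingWindowLimiter` and its methods are hypothetical and chosen only for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard for `max_calls` per `period` seconds."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock          # injectable so the logic can be tested
        self._stamps = deque()      # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return seconds to wait before another call fits (0.0 if it fits now)."""
        now = self.clock()
        # Drop calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self):
        """Register that a call was just made."""
        self._stamps.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, make the request, and then call `record()`.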
Actions
Get usage | Retrieve information about credit consumption and concurrency usage. |
Perform Google search | Retrieves a scrape of Google Search results pages. |
Scrap URL | Fetches the requested URL and renders JavaScript if requested. |
Get usage
Retrieve information about credit consumption and concurrency usage.
Returns
Name | Path | Type | Description |
---|---|---|---|
Max API Credit | max_api_credit | integer | The max API credit. |
Used API Credit | used_api_credit | integer | The used API credit. |
Max Concurrency | max_concurrency | integer | The max concurrency. |
Current Concurrency | current_concurrency | integer | The current concurrency. |
Renewal Subscription Date | renewal_subscription_date | string | The renewal subscription date. |
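As a rough illustration of how the returned fields relate, the sketch below derives the remaining credit and free concurrency slots from a usage response. The function name `summarize_usage` and the sample values are assumptions for illustration; only the field names come from the table above.

```python
def summarize_usage(usage):
    """Derive remaining credit and free concurrency from a Get usage response."""
    return {
        "remaining_credit": usage["max_api_credit"] - usage["used_api_credit"],
        "free_concurrency": usage["max_concurrency"] - usage["current_concurrency"],
        "renews_on": usage["renewal_subscription_date"],
    }


# Hypothetical sample payload; the values are made up.
sample_usage = {
    "max_api_credit": 1000,
    "used_api_credit": 250,
    "max_concurrency": 5,
    "current_concurrency": 1,
    "renewal_subscription_date": "2030-01-01T00:00:00",
}
summary = summarize_usage(sample_usage)
```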
Perform Google search
Retrieves a scrape of Google Search results pages.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Search | search | True | string | The text you would put in the Google search bar. |
Country Code | country_code | | string | The country you would like the request to come from. |
Results | nb_results | | integer | The number of results to return. |
Page | page | | integer | The page number to extract results from. |
Language | language | | string | The language to return the results in. |
Extra Params | extra_params | | string | Any additional URL parameters to submit. |
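The parameters above can be assembled into a query string as in the following sketch. It assumes the keys are sent as URL query parameters and that unset optional values are simply omitted; the helper name `build_search_params` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_params(search, country_code=None, nb_results=None,
                        page=None, language=None, extra_params=None):
    """Build the parameter dict, keeping only the optional keys that are set."""
    if not search:
        raise ValueError("search is required")
    optional = {
        "country_code": country_code,
        "nb_results": nb_results,
        "page": page,
        "language": language,
        "extra_params": extra_params,
    }
    params = {"search": search}
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


params = build_search_params("web scraping", country_code="us", page=2)
query = urlencode(params)  # URL-encodes values; spaces become "+"
```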
Returns
Name | Path | Type | Description |
---|---|---|---|
URL | meta_data.url | string | The URL address. |
Results | meta_data.number_of_results | integer | The number of results. |
Location | meta_data.location | string | The location. |
Organic Results | meta_data.number_of_organic_results | integer | The number of organic results. |
Ads | meta_data.number_of_ads | integer | The number of ads. |
Page | meta_data.number_of_page | integer | The page number. |
No Results Message | meta_data.no_results_message | string | The no results message. |
Organic Results | organic_results | array of object | |
URL | organic_results.url | string | The URL address. |
Displayed URL | organic_results.displayed_url | string | The displayed URL address. |
Description | organic_results.description | string | The description. |
Extra Info | organic_results.extra_info | string | The extra info. |
Position | organic_results.position | integer | The position. |
Title | organic_results.title | string | The title. |
Local Results | local_results | array of string | The local results. |
Top Ads | top_ads | string | The top ads. |
Bottom Ads | bottom_ads | string | The bottom ads. |
Related Queries | related_queries | array of object | |
Text | related_queries.text | string | The text. |
Position | related_queries.position | integer | The position. |
Questions | questions | array of string | The questions. |
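A response shaped like the table above could be reduced to its top organic results as sketched here. The helper `top_organic` and the sample payload are hypothetical; only the paths `organic_results.position`, `organic_results.title`, and `organic_results.url` are taken from the table.

```python
def top_organic(response, limit=3):
    """Return (position, title, url) tuples for the best-ranked organic results."""
    results = response.get("organic_results", [])
    ordered = sorted(results, key=lambda r: r["position"])
    return [(r["position"], r["title"], r["url"]) for r in ordered[:limit]]


# Hypothetical sample payload; the values are made up.
sample_response = {
    "meta_data": {"number_of_organic_results": 3},
    "organic_results": [
        {"position": 2, "title": "Second", "url": "https://example.com/b"},
        {"position": 1, "title": "First", "url": "https://example.com/a"},
        {"position": 3, "title": "Third", "url": "https://example.com/c"},
    ],
}
top_two = top_organic(sample_response, limit=2)
```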
Scrap URL
Fetches the requested URL and renders JavaScript if requested.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
URL | url | True | string | The URL you want to scrape. |
Render JS | render_js | True | boolean | Render the website in a headless browser. |
JS Scenario | js_scenario | | string | Execute JavaScript before rendering. |
Wait | wait | | integer | Time to wait before rendering. |
Wait For | wait_for | | string | Wait for a particular element to appear in the DOM. |
Block Ads | block_ads | | boolean | Whether to block ads. |
Block Resources | block_resources | | boolean | Whether to block all images and CSS. |
Window Width | window_width | | integer | The width of the window to use. |
Window Height | window_height | | integer | The height of the window to use. |
Premium Proxy | premium_proxy | | boolean | Whether to use a proxy to scrape the website. |
Country Code | country_code | | string | The proxy country to use to scrape the website. |
Stealth Proxy | stealth_proxy | | boolean | Whether to use a stealth proxy to scrape the website. |
Own Proxy | own_proxy | | string | Your own proxy to use. |
Extract Rules | extract_rules | | string | Extraction rules to parse the HTML before responding. |
Screenshot | screenshot | | boolean | Take a screenshot of the requested website. |
Screenshot Selector | screenshot_selector | | string | Take a screenshot of a particular CSS selector. |
Screenshot Full Page | screenshot_full_page | | boolean | Take a screenshot of the entire website. |
Return Page Source | return_page_source | | boolean | Return the page source as well. |
Session ID | session_id | | integer | All API requests using the same session_id will be routed through the same IP address for a duration of 5 minutes. |
Timeout | timeout | | integer | The timeout in milliseconds, between 1000 and 140000 (the default). |
Cookies | cookies | | string | Custom cookie to pass to the website. |
Device | device | | string | The kind of device sent to the server. |
Custom Google | custom_google | | boolean | Set to true if scraping a webpage on Google or a Google subdomain. |
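With this many optional keys, a small validation step before sending a request can catch misspelled keys and out-of-range timeouts. The sketch below is one possible approach and not part of the connector: the helper `build_scrape_params` is hypothetical, the key names are copied from the table above, and the timeout bounds are the documented 1000 to 140000 ms.

```python
MIN_TIMEOUT_MS = 1000
MAX_TIMEOUT_MS = 140000

# Optional keys as listed in the parameter table.
OPTIONAL_KEYS = {
    "js_scenario", "wait", "wait_for", "block_ads", "block_resources",
    "window_width", "window_height", "premium_proxy", "country_code",
    "stealth_proxy", "own_proxy", "extract_rules", "screenshot",
    "screenshot_selector", "screenshot_full_page", "return_page_source",
    "session_id", "timeout", "cookies", "device", "custom_google",
}


def build_scrape_params(url, render_js, **optional):
    """Validate and assemble Scrap URL parameters, dropping unset optionals."""
    if not url:
        raise ValueError("url is required")
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    timeout = optional.get("timeout")
    if timeout is not None and not MIN_TIMEOUT_MS <= timeout <= MAX_TIMEOUT_MS:
        raise ValueError("timeout must be between 1000 and 140000 ms")
    params = {"url": url, "render_js": bool(render_js)}
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


scrape_params = build_scrape_params(
    "https://example.com", True, timeout=5000, wait_for="#main", block_ads=None
)
```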
Returns
Name | Path | Type | Description |
---|---|---|---|
Body | body | string | The body. |
Cookies | cookies | array of object | |
Name | cookies.name | string | The name. |
Value | cookies.value | string | The value. |
Domain | cookies.domain | string | The domain. |
Path | cookies.path | string | The path. |
Expires | cookies.expires | float | When the cookie expires. |
Size | cookies.size | integer | The size. |
HTTP Only | cookies.httpOnly | boolean | Whether the cookie is HTTP only. |
Secure | cookies.secure | boolean | Whether the cookie is secure. |
Session | cookies.session | boolean | Whether the cookie is a session cookie. |
Same Party | cookies.sameParty | boolean | Whether the cookie is same party. |
Source Scheme | cookies.sourceScheme | string | The source scheme. |
Source Port | cookies.sourcePort | integer | The source port. |
Evaluated Results | evaluate_results | array of string | The evaluated results. |
Age | headers.age | string | The age. |
Cache Control | headers.cache-control | string | The cache control. |
Content Encoding | headers.content-encoding | string | The content encoding. |
Content Security Policy | headers.content-security-policy | string | The content security policy. |
Content Type | headers.content-type | string | The content type. |
Date | headers.date | string | The date. |
ETag | headers.etag | string | The eTag. |
Referrer Policy | headers.referrer-policy | string | The referrer policy. |
Server | headers.server | string | The server. |
Strict Transport Security | headers.strict-transport-security | string | The strict transport security. |
X Content Type Options | headers.x-content-type-options | string | The x content type options. |
X Frame Options | headers.x-frame-options | string | The x frame options. |
X Matched Path | headers.x-matched-path | string | The x matched path. |
X Powered By | headers.x-powered-by | string | The x powered by. |
X Vercel Cache | headers.x-vercel-cache | string | The x Vercel cache. |
X Vercel ID | headers.x-vercel-id | string | The x Vercel identifier. |
Type | type | string | The type. |
IFrames | iframes | array of string | The iFrames. |
XHR | xhr | array of object | |
URL | xhr.url | string | The URL address. |
Status Code | xhr.status_code | integer | The status code. |
Method | xhr.method | string | The method. |
Age | xhr.headers.age | string | The age. |
Cache Control | xhr.headers.cache-control | string | The cache control. |
Content Length | xhr.headers.content-length | string | The content length. |
Content Security Policy | xhr.headers.content-security-policy | string | The content security policy. |
Content Type | xhr.headers.content-type | string | The content type. |
Date | xhr.headers.date | string | The date. |
ETag | xhr.headers.etag | string | The eTag. |
Referrer Policy | xhr.headers.referrer-policy | string | The referrer policy. |
Server | xhr.headers.server | string | The server. |
Strict Transport Security | xhr.headers.strict-transport-security | string | The strict transport security. |
X Content Type Options | xhr.headers.x-content-type-options | string | The X content type options. |
X Frame Options | xhr.headers.x-frame-options | string | The X frame options. |
X Matched Path | xhr.headers.x-matched-path | string | The X matched path. |
X Vercel Cache | xhr.headers.x-vercel-cache | string | The X Vercel cache. |
X Vercel ID | xhr.headers.x-vercel-id | string | The X Vercel identifier. |
Access Control Allow Origin | xhr.headers.access-control-allow-origin | string | The access control allow origin. |
Access Control Expose Headers | xhr.headers.access-control-expose-headers | string | The access control expose headers. |
Alt SVC | xhr.headers.alt-svc | string | The alt SVC. |
Vary | xhr.headers.vary | string | The vary. |
Via | xhr.headers.via | string | The via. |
X Envoy Upstream Service Time | xhr.headers.x-envoy-upstream-service-time | string | The X envoy upstream service time. |
X Amazon Request ID | xhr.headers.x-amzn-requestid | string | The X Amazon request identifier. |
X Amazon Trace ID | xhr.headers.x-amzn-trace-id | string | The X Amazon trace identifier. |
Body | xhr.body | string | The body. |
Cost | cost | integer | The cost. |
Initial Status Code | initial-status-code | integer | The initial status code. |
Resolved URL | resolved-url | string | The resolved URL address. |
Microdata | metadata.microdata | array of string | The microdata. |
JSON LD | metadata.json-ld | array of object | |
Context | metadata.json-ld.@context | string | The context. |
Type | metadata.json-ld.@type | string | The type. |
Name | metadata.json-ld.name | string | The name. |
URL | metadata.json-ld.url | string | The URL address. |
Description | metadata.json-ld.description | string | The description. |
Type | metadata.json-ld.mainEntityOfPage.@type | string | The type. |
URL | metadata.json-ld.mainEntityOfPage.url | string | The URL address. |
Type | metadata.json-ld.image.@type | string | The type. |
URL | metadata.json-ld.image.url | string | The URL address. |
Type | metadata.json-ld.publisher.@type | string | The type. |
Name | metadata.json-ld.publisher.name | string | The name. |
URL | metadata.json-ld.publisher.url | string | The URL address. |
Same As | metadata.json-ld.sameAs | string | The same as. |
Open Graph | metadata.opengraph | array of object | |
Open Graph Title | metadata.opengraph.og:title | string | The Open Graph title. |
Open Graph Description | metadata.opengraph.og:description | string | The Open Graph description. |
Open Graph Site Name | metadata.opengraph.og:site_name | string | The Open Graph site name. |
Open Graph URL | metadata.opengraph.og:url | string | The Open Graph URL address. |
Open Graph Image | metadata.opengraph.og:image | string | The Open Graph image. |
Type | metadata.opengraph.@type | string | The type. |
OG | metadata.opengraph.@context.og | string | The Open Graph. |
Dublincore | metadata.dublincore | array of object | |
Elements | metadata.dublincore.elements | array of object | |
Name | metadata.dublincore.elements.name | string | The name. |
Content | metadata.dublincore.elements.content | string | The content. |
URI | metadata.dublincore.elements.URI | string | The URI. |
Terms | metadata.dublincore.terms | array of string | The terms. |
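To show how the nested return paths might be consumed, the sketch below indexes the `cookies` array by name and lists XHR calls whose `status_code` indicates an error. Both helper names and the sample payload are hypothetical; only the field paths come from the table above.

```python
def cookies_by_name(response):
    """Map each cookie name to its value."""
    return {c["name"]: c["value"] for c in response.get("cookies", [])}


def failed_xhr(response):
    """Return the URLs of XHR calls that ended with a 4xx or 5xx status code."""
    return [x["url"] for x in response.get("xhr", []) if x["status_code"] >= 400]


# Hypothetical sample payload; the values are made up.
sample_scrape = {
    "body": "<html></html>",
    "cookies": [
        {"name": "session", "value": "abc", "httpOnly": True, "secure": True},
        {"name": "theme", "value": "dark", "httpOnly": False, "secure": False},
    ],
    "xhr": [
        {"url": "https://example.com/api/a", "status_code": 200, "method": "GET"},
        {"url": "https://example.com/api/b", "status_code": 404, "method": "GET"},
    ],
    "cost": 5,
}
cookie_map = cookies_by_name(sample_scrape)
errors = failed_xhr(sample_scrape)
```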