

GSA Site Scanning (Independent Publisher) (Preview)

Get insights into the health and policy compliance of US federal websites. Using automated scans, this service generates detailed data on how well federal websites follow web policies and best practices, supporting better accessibility and management of government digital assets.

This connector is available in the following products and regions:

Service          Class      Regions
Logic Apps       Standard   All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Power Automate   Premium    All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Power Apps       Premium    All Power Apps regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Contact
Name Richard Wilson
URL https://www.richardawilson.com/
Email richard.a.wilson@microsoft.com
Connector Metadata
Publisher Richard Wilson
Website https://open.gsa.gov/api/site-scanning-api
Privacy policy https://www.gsa.gov/technology/government-it-initiatives/digital-strategy/terms-of-service-for-developer-resources
Categories IT Operations

Creating a connection

The connector supports the following authentication types:

Name      Description                             Applicable Regions   Shareable
Default   Parameters for creating a connection.   All regions          Not shareable

Default

Applicable: All regions

Parameters for creating a connection.

This is not a shareable connection. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name Type Description Required
GSA API Key securestring The GSA API key, which can be obtained from https://open.gsa.gov/api/site-scanning-api/ True
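The same key can also be sent directly to the underlying API outside the connector. The sketch below assumes two things not stated on this page: the base URL https://api.gsa.gov/technology/site-scanning/v1, and the api.data.gov convention of passing the key in an X-Api-Key header. Verify both against the GSA documentation. The helper build_request is hypothetical; it only builds the request and does not send it.

```python
import urllib.request

# Assumed base URL for the GSA Site Scanning API (verify against the GSA docs).
BASE_URL = "https://api.gsa.gov/technology/site-scanning/v1"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the API key.

    Assumes the key is accepted in an X-Api-Key header (api.data.gov convention).
    """
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Api-Key": api_key}, method="GET")


req = build_request("/websites", "DEMO_KEY")
print(req.full_url)  # https://api.gsa.gov/technology/site-scanning/v1/websites
```

Passing req to urllib.request.urlopen would send it; DEMO_KEY is only a placeholder for your own key.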

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
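When a flow or script calls the connector in a loop (for example, while paging through results), a client-side limiter can keep it under the 100-calls-per-60-seconds budget. This is a minimal sliding-window sketch, assuming the limit behaves like a rolling window; the connector may count in fixed windows instead. The class name SlidingWindowLimiter is hypothetical, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side limiter: allows at most max_calls within any window_seconds span."""

    def __init__(self, max_calls: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        """Record and allow a call if it fits in the window; otherwise return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


fake_now = [0.0]
limiter = SlidingWindowLimiter(clock=lambda: fake_now[0])
accepted = [limiter.try_acquire() for _ in range(101)]
print(accepted.count(True))  # 100
fake_now[0] = 60.0
print(limiter.try_acquire())  # True
```

A caller that gets False would wait (for example, with time.sleep) and retry rather than send the request.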

Actions

Perform Website Analysis

Performs a comprehensive analysis of websites based on various parameters such as target URL, final URL, and scan status.

Retrieve Website Information

Fetches details about websites, including the target and final URLs, ownership, scan status, and analytics detection.

Retrieve Website Information by URL

Fetches detailed information about a website based on the specified URL.

Perform Website Analysis

Performs a comprehensive analysis of websites based on various parameters such as target URL, final URL, and scan status.

Parameters

Name Key Required Type Description
Target URL Domain
target_url_domain string

The domain name plus top-level domain (TLD) of the Target URL. The Target URL is where the scanner starts, in contrast to the Final URL Domain, where the scan ends after following redirects.

Final URL Domain
final_url_domain string

The domain name plus top-level domain (TLD) of the Final URL. The Final URL is where the scanner ends up after following redirects, in contrast to the Target URL Domain.

Final URL Live
final_url_live boolean

Indicates whether the Final URL is live, that is, whether it returns an HTTP status code in the 2xx family.

Target URL Redirects
target_url_redirects boolean

A boolean value indicating whether the Target URL redirects, which occurs when a 3xx HTTP status code is returned. Note that scanners have caching disabled, so 304 HTTP status codes are not present.

Target URL Agency Owner
target_url_agency_owner string

Specifies the agency that owns or operates the website associated with the Target URL.

Target URL Bureau Owner
target_url_bureau_owner string

Specifies the bureau that owns or operates the website associated with the Target URL.

Scan Status
primary_scan_status string

Captures the status of the website scan and any known reasons for failure. The value unknown_error is reserved for errors not yet encoded in the system.

DAP Detected at Final URL
dap_detected_final_url boolean

A boolean value indicating whether the Digital Analytics Program (DAP) is detected at the Final URL.
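None of these filters is marked as required, so a caller typically sends only the ones it sets. The sketch below builds a query string from the parameter keys in the table, omitting unset filters. Two details are assumptions: that booleans are expected as lowercase true/false, and the helper name build_filter_query, which is hypothetical.

```python
from typing import Optional, Union
from urllib.parse import urlencode

Scalar = Union[str, bool, int]


def build_filter_query(**filters: Optional[Scalar]) -> str:
    """Turn optional filter parameters into a URL query string.

    Filters set to None are omitted; booleans become lowercase "true"/"false"
    (an assumption about what the API expects).
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        # Check bool before anything else, since bool is a subclass of int.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(params)


query = build_filter_query(
    target_url_agency_owner="General Services Administration",
    final_url_live=True,
    dap_detected_final_url=None,  # unset, so it is left out of the query
)
print(query)  # target_url_agency_owner=General+Services+Administration&final_url_live=true
```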

Returns

Retrieve Website Information

Fetches details about websites, including the target and final URLs, ownership, scan status, and analytics detection.

Parameters

Name Key Required Type Description
Target URL Domain
target_url_domain string

The domain name plus top-level domain (TLD) of the Target URL. The Target URL is where the scanner starts, in contrast to the Final URL, where the scanner ends after following redirects.

Final URL Domain
final_url_domain string

The domain name plus top-level domain (TLD) of the Final URL. The Final URL is where the scanner ends after following redirects, in contrast to the Target URL.

Final URL Live
final_url_live boolean

Indicates whether the Final URL is live, that is, whether it returns an HTTP status code in the 2xx family.

Target URL Redirects
target_url_redirects boolean

Records if the Target URL redirects (true if a 3xx HTTP status code is returned). Scanners have caching disabled, thus 304 status codes are absent.

Target URL Agency Owner
target_url_agency_owner string

The agency that owns or operates the website associated with the Target URL.

Target URL Bureau Owner
target_url_bureau_owner string

The bureau that owns or operates the website associated with the Target URL.

Scan Status
primary_scan_status string

Captures the status of the scan and any known reason for failure. The value unknown_error is reserved for errors not yet encoded in the system.

DAP Detected at Final URL
dap_detected_final_url boolean

Indicates if the Digital Analytics Program (DAP) is detected at the Final URL.

Limit
limit integer

Specifies the number of items to return in a single page of results.

Page
page integer

Specifies the page number of the results to retrieve.
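The limit and page parameters work together with the links and meta fields of PaginatedWebsiteResponseDto (see Definitions) to walk a full result set. A minimal sketch, assuming a caller-supplied fetch_page function (hypothetical) that performs one request and returns the decoded JSON body. It stops when links.next is an empty string, which the definitions describe as the last-page behavior, and also stops on an empty page as a safeguard.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_websites(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every website item across all pages.

    fetch_page(page, limit) is a hypothetical callable returning one
    PaginatedWebsiteResponseDto-shaped dict (items, links, meta).
    """
    page = 1
    while True:
        body = fetch_page(page, limit)
        items = body.get("items", [])
        yield from items
        # links.next is an empty string on the last page; also stop on an empty page.
        if not items or not body.get("links", {}).get("next"):
            break
        page += 1


# Hypothetical stand-in for a real request, serving 5 records in pages of `limit`.
def fake_fetch(page: int, limit: int) -> Page:
    data = [{"target_url": f"site{i}.gov"} for i in range(5)]
    chunk = data[(page - 1) * limit : page * limit]
    is_last = page * limit >= len(data)
    return {
        "items": chunk,
        "links": {"next": "" if is_last else f"?page={page + 1}&limit={limit}"},
        "meta": {"currentPage": page, "itemCount": len(chunk), "itemsPerPage": limit},
    }


print([w["target_url"] for w in iter_websites(fake_fetch, limit=2)])
# ['site0.gov', 'site1.gov', 'site2.gov', 'site3.gov', 'site4.gov']
```

Each page costs one call, so a long crawl should also respect the per-connection throttling limit.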

Returns

Retrieve Website Information by URL

Fetches detailed information about a website based on the specified URL.

Parameters

Name Key Required Type Description
Website URL
url True string

The URL of the website to retrieve information for. This should include the domain name and any relevant path components.
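Because the URL can contain path components, it has to be percent-encoded before it is placed in a request path, or its slashes would be read as extra path segments. The sketch below assumes the underlying route is /websites/{url} and that the service accepts an encoded slash (%2F); both are assumptions to verify. The helper name website_path is hypothetical.

```python
from urllib.parse import quote


def website_path(url: str) -> str:
    """Build a /websites/{url} path with the URL encoded as a single segment.

    The /websites/ route is an assumption about the underlying API.
    """
    # safe="" makes quote() encode "/" too, keeping the whole URL in one segment.
    return "/websites/" + quote(url, safe="")


print(website_path("www.gsa.gov/about"))  # /websites/www.gsa.gov%2Fabout
```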

Returns

Definitions

AnalysisDto

Name Path Type Description
Total Analyzed Items
total number

The total number of items analyzed.

Total Agencies Analyzed
totalAgencies number

The total number of agencies for which the websites were analyzed.

Total Final URL Base Domains
totalFinalUrlBaseDomains number

The total number of unique final URL base domains analyzed.
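A typed view of the response can make downstream code clearer about which JSON path maps to which count. A minimal sketch of AnalysisDto as a Python dataclass, assuming the three paths listed above are top-level keys in the response body and that the counts are whole numbers. The snake_case attribute names and the sample payload are invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisDto:
    """Typed view of the AnalysisDto fields listed above."""
    total: int
    total_agencies: int
    total_final_url_base_domains: int

    @classmethod
    def from_json(cls, raw: str) -> "AnalysisDto":
        data = json.loads(raw)
        # JSON "number" values are converted to int, assuming counts are whole.
        return cls(
            total=int(data["total"]),
            total_agencies=int(data["totalAgencies"]),
            total_final_url_base_domains=int(data["totalFinalUrlBaseDomains"]),
        )


# Invented sample payload, for illustration only.
sample = '{"total": 120, "totalAgencies": 4, "totalFinalUrlBaseDomains": 37}'
print(AnalysisDto.from_json(sample))
# AnalysisDto(total=120, total_agencies=4, total_final_url_base_domains=37)
```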

PaginatedWebsiteResponseDto

Name Path Type Description
Website Items
items array of WebsiteApiResultDto

An array of website results.

First Page Link
links.first string

A link to the first page of results.

Last Page Link
links.last string

A link to the last page of results.

Next Page Link
links.next string

A link to the next page of results. On the last page of results, this will be an empty string.

Previous Page Link
links.previous string

A link to the previous page of results. On the first page of results, this will be an empty string.

Current Page
meta.currentPage number

The current page number.

Item Count
meta.itemCount number

The number of items in the items array of the current page.

Items Per Page
meta.itemsPerPage number

The number of items per page. This should be the same as the limit query parameter.

Total Items
meta.totalItems number

The total number of items that match the query.

Total Pages
meta.totalPages number

The total number of pages, calculated as ceil(totalItems / itemsPerPage) so that a final partial page is counted.
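With 101 matching items and 100 items per page, floor(101 / 100) is 1, which would leave the 101st item on a page that is never reported. Counting that final partial page takes ceiling division. The sketch below shows the arithmetic; whether the service itself reports the floor or the ceiling is worth confirming against real responses, and the helper name total_pages is hypothetical.

```python
def total_pages(total_items: int, items_per_page: int) -> int:
    """Number of pages needed to hold total_items, counting a final partial page."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b.
    return -(-total_items // items_per_page)


print(total_pages(101, 100))  # 2 (floor division would give 1)
print(total_pages(100, 100))  # 1
```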

WebsiteApiResultDto

Name Path Type Description
Canonical Link
canonical_link string

Indicates the presence of a canonical link tag.

Cloud.gov Pages Hosting
cloud_dot_gov_pages boolean

Indicates that the final URL is hosted using Cloud.gov Pages.

Content Management System (CMS)
cms string

Indicates the content management system used to host the final URL.

DAP Detected at Final URL
dap_detected_final_url boolean

A boolean representing the presence of the Digital Analytics Program on the final URL.

DAP Parameters at Final URL
dap_parameters_final_url object

An object with Digital Analytics Program parameter keys and values at the final URL.

DNS Hostname
dns_hostname string

The domain of the underlying system, often suggesting the use of a cloud or CDN provider.

Final URL
final_url string

The URL after any redirects from the target URL.

Final URL MIME Type
final_url_MIMEType string

The MIME type of the final URL extracted from the Content-Type header.

Final URL Domain
final_url_domain string

The domain name + top-level domain of the final URL.

Final URL Live
final_url_live boolean

A boolean representing whether the final URL returned a 2xx family HTTP status code.

Final URL Same Domain
final_url_same_domain boolean

A boolean field representing whether the final URL is in the same domain as the target URL. If false, this implies a redirect.

Final URL Same Website
final_url_same_website boolean

Indicates whether the final URL has the same domain and path as the target URL; false if either the path or the domain differs.

Final URL Status Code
final_url_status_code number

The HTTP status code of the final URL.

Final URL Website
final_url_website string

The website of the final URL, including the subdomain, domain name, and top-level domain.

Main Element Presence at Final URL
main_element_present_final_url boolean

Indicates whether the <main> element is present at the final URL.

Open Graph Article Modified Date at Final URL
og_article_modified_final_url string

The Open Graph article modified tag if available on the final URL.

Open Graph Article Published Date at Final URL
og_article_published_final_url string

The Open Graph article published tag if available on the final URL.

Open Graph Description at Final URL
og_description_final_url string

The Open Graph description tag if found on the final URL.

Open Graph Title at Final URL
og_title_final_url string

The Open Graph title tag if found on the final URL.

Robots.txt Crawl Delay
robots_txt_crawl_delay integer

The crawl delay value in seconds, if present in the robots.txt file.

Robots.txt Detected
robots_txt_detected boolean

Indicates whether the robots.txt file is detected.

Robots.txt Final URL
robots_txt_final_url string

The final URL of the robots.txt file after any redirects.

Robots.txt Final URL MIME Type
robots_txt_final_url_MIMETYPE string

The MIME type of the robots.txt page extracted from the Content-Type header.

Robots.txt Final URL Live
robots_txt_final_url_live boolean

Indicates whether the robots.txt final URL HTTP status is in the 2xx family.

Robots.txt Final URL Size in Bytes
robots_txt_final_url_size_in_bytes number

The file size of the robots.txt file in bytes.

Robots.txt Final URL Status Code
robots_txt_final_url_status_code number

The HTTP status code of the robots.txt final URL.

Robots.txt Target URL Redirects
robots_txt_target_url_redirects boolean

Indicates whether the target robots.txt URL redirects. This targets the robots.txt file specifically.

Scan Date
scan_date string

The datetime when the scan was performed.

Scan Status
primary_scan_status string

The success status of the Core Scan.

Sitemap.xml URL Count
sitemap_xml_count integer

The number of <url> elements found in the sitemap.xml file.

Sitemap.xml Detected
sitemap_xml_detected boolean

Indicates whether the sitemap.xml file is found.

Sitemap.xml Final URL
sitemap_xml_final_url string

The final URL of the sitemap.xml page after any redirects.

Sitemap.xml Final URL MIME Type
sitemap_xml_final_url_MIMETYPE string

The MIME type of the sitemap.xml final URL extracted from the Content-Type header.

Sitemap.xml Final URL Filesize
sitemap_xml_final_url_filesize integer

The filesize of the sitemap.xml page in bytes.

Sitemap.xml Final URL Live
sitemap_xml_final_url_live boolean

Indicates whether the sitemap.xml final URL status code is in the 2xx family.

Sitemap.xml Final URL Status Code
sitemap_xml_final_url_status_code number

The HTTP status code of the sitemap.xml page.

Sitemap.xml PDF URL Count
sitemap_xml_pdf_count integer

The number of URLs that have the PDF extension in the sitemap.xml.

Sitemap.xml Target URL Redirects
sitemap_xml_target_url_redirects boolean

Indicates whether the sitemap.xml page redirects. This targets the sitemap.xml file specifically.

Sourced from DAP List
source_list_dap boolean

Indicates whether the Digital Analytics Program provided this URL for the Target URL List.

Sourced from Federal Domains List
source_list_federal_domains boolean

Indicates whether the List of Federal Domains provided this URL for the Target URL List.

Sourced from Other Lists
source_list_other boolean

Indicates whether a manually maintained list of additional websites provided this URL for the Target URL List.

Sourced from Pulse CIO List
source_list_pulse boolean

Indicates whether the pulse.cio.gov Snapshot provided this URL for the Target URL List.

Target URL
target_url string

The URL the scanner starts the scan with.

Target URL 404 Test
target_url_404_test boolean

Tests whether the target URL handles 404s properly by requesting a randomly generated, UUID-based pathname that should not exist.

Target URL Agency Owner
target_url_agency_owner string

The agency that owns the target URL.

Target URL Government Branch
target_url_branch string

The branch of government that the URL is associated with.

Target URL Bureau Owner
target_url_bureau_owner string

The bureau that owns the target URL.

Target URL Domain
target_url_domain string

The base domain (domain name + top-level domain) of the target URL.

Target URL Redirects
target_url_redirects boolean

Indicates whether the target URL redirects.

Third-party Service Count
third_party_service_count number

The number of third-party services found.

Third-party Service Domains
third_party_service_domains array of string

A list of third-party service domains that the final URL makes outbound calls to. A service counts as third-party when its hostname does not match the hostname of the final URL.

USWDS Count
uswds_count number

The sum of all USWDS likelihood heuristics.

USWDS Favicon
uswds_favicon number

The presence of the USWDS US Flag favicon in HTML source. Presence adds 20 points to the USWDS likelihood heuristic.

USWDS Favicon in CSS
uswds_favicon_in_css number

The presence of the USWDS US Flag favicon in CSS source. Presence adds 20 points to the USWDS likelihood heuristic.

USWDS Inline CSS
uswds_inline_css number

The number of occurrences of .usa- CSS classes in inline HTML source.

USWDS Public Sans Font
uswds_publicsans_font number

The presence of the Public Sans font in CSS source. Presence adds 20 points to the USWDS likelihood heuristic.

USWDS Semantic Version
uswds_semantic_version string

The semantic version string of USWDS.

USWDS Source Sans Font
uswds_source_sans_font number

The presence of the Source Sans font in CSS source. Presence adds 5 points to the USWDS likelihood heuristic.

USWDS String Occurrences
uswds_string number

The number of times the string uswds occurs in the HTML source.

USWDS String in CSS
uswds_string_in_css number

The number of occurrences of uswds in the CSS source.

USWDS Tables
uswds_tables number

A calculation of (number of HTML <table> elements) * -10. <table> elements are a negative heuristic indicator of the presence of USWDS.

USWDS USA Classes
uswds_usa_classes number

The number of CSS classes found that start with .usa-.

USWDS Version
uswds_version number

The presence of the USWDS version in CSS source. Presence adds 20 points to the USWDS likelihood heuristic.
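uswds_count is described as the sum of the individual USWDS likelihood heuristics. A minimal sketch of that sum over one WebsiteApiResultDto record, assuming the terms are exactly the numeric uswds_* fields listed above (excluding uswds_count itself and the string uswds_semantic_version), each already holding its weighted points. The exact set of terms the scanner uses is not spelled out on this page, the helper name uswds_score is hypothetical, and the sample record is invented.

```python
from typing import Any, Dict

# Fields assumed to feed uswds_count; each already holds its weighted points
# (for example 20 for the favicon, 5 for Source Sans, -10 per <table> element).
USWDS_HEURISTIC_FIELDS = (
    "uswds_favicon",
    "uswds_favicon_in_css",
    "uswds_inline_css",
    "uswds_publicsans_font",
    "uswds_source_sans_font",
    "uswds_string",
    "uswds_string_in_css",
    "uswds_tables",
    "uswds_usa_classes",
    "uswds_version",
)


def uswds_score(result: Dict[str, Any]) -> float:
    """Sum the heuristic fields of a WebsiteApiResultDto-shaped dict.

    Missing or null fields count as 0.
    """
    return sum(result.get(name) or 0 for name in USWDS_HEURISTIC_FIELDS)


# Invented record for illustration only.
example = {
    "uswds_favicon": 20,
    "uswds_publicsans_font": 20,
    "uswds_usa_classes": 12,
    "uswds_tables": -10,  # one <table> element
    "uswds_version": None,
}
print(uswds_score(example))  # 42
```

Comparing this sum with the reported uswds_count for a few real records is a quick way to check whether the assumed field list matches the scanner's.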