Site Scan

Site Scan is a Smarty that processes a web page and its sub-pages, analyzes the failures found on the website, and provides a detailed report.

What can Site Scan do?

  1. It extracts all the links from the web page and checks whether each link is broken.

  2. It navigates into every page and repeats the same check for every link extracted there.

  3. It checks for spelling mistakes across the whole web page.

  4. It checks the HTML page against the W3C standard and reports the failures to the developer, with HTML line numbers, for quick fixes.

  5. It provides the console errors of each web page scanned.

  6. It provides the list of excluded links, that is, the links that do not start with the base URL.
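
The link extraction and the base URL exclusion described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the QATTS implementation; the names `LinkCollector`, `split_links`, and `LINK_ATTRS`, and the example URLs, are assumptions made for this sketch. It does not send any requests, so the broken-link check itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical tag/attribute pairs treated as links in this sketch.
LINK_ATTRS = {("a", "href"), ("img", "src"), ("link", "href")}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the configured tag/attribute pairs."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in LINK_ATTRS:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.page_url, value))


def split_links(html, page_url, base_url):
    """Return (in_scope, excluded) links, in document order."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    in_scope = [u for u in collector.links if u.startswith(base_url)]
    excluded = [u for u in collector.links if not u.startswith(base_url)]
    return in_scope, excluded


html = (
    '<a href="/about">About</a>'
    '<img src="https://cdn.example.org/logo.png">'
    '<a href="https://example.com/contact">Contact</a>'
)
in_scope, excluded = split_links(html, "https://example.com/", "https://example.com")
print(in_scope)
print(excluded)
```

Each in-scope link would then be requested to see whether it is broken, and each in-scope page would be parsed again in the same way.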

Why should I use Site Scan for my website?

  1. It is easy to configure and execute a site scan, with a live reporting feature.

  2. Failures are easy to understand and filter, even when scanning a large or complex website.

  3. All features can be configured on a per-URL basis, and specific or custom actions are easy to set up.

  4. Download your report and analyze it anywhere.

  5. Rich analytics for the scanned data.

How to initiate a site scan?

Procedure:

  1. Configure your Smarty in QATTS.

  2. Use your Smarty in a functionality.

  3. Prepare your suite with the functionality.

  4. Execute a test run with your test suite.

QATTS takes care of the scan, and we can then view and validate the site scan report with its analytics.

Now let's go through each step of the procedure in detail.

Configuring the Smarty:

What is a Smarty?

A Smarty is a function that goes beyond plain automation. It does all the smart work requested by the user and returns a simple value.

Step 1

  1. Navigate to Studio -> Add Smarty.

  2. Select the Smarty type "Site Scan".

  3. Configure the scan. The following can be configured:

    1. Which page to launch.

    2. Which URLs to consider in the scan.

    3. How to identify and configure your "Image not Available" placeholder.

    4. How to clean content that contains dynamically generated URIs.

Understanding the configuration

| Property Name | Property Value | Description |
| --- | --- | --- |
| baseURL | | The provided URL is launched in the browser and treated as the parent page. |
| cdnURL | List of URLs | The provided URLs are replaced with an empty value in the HTML. Configure this property whenever a dynamic URL is loaded with a recognizable pattern. This helps in calculating the size of the document. |
| scanCheckType | FirstOccurance | Reduces the scan time and avoids multiple hits on the same URL. Ex: if https://qatts.com/contact-us appears on every page and should be hit only once per scan, configure this property. |
| props.page.exclude | contains:YOUR_VALUE,startsWith:YOUR_VALUE,endsWith:YOUR_VALUE,regex:YOUR_VALUE | Pages may use images or links that come from another domain or a public domain. To exclude them from the site scan, configure them in the given formats. |
| props.page.include | contains:YOUR_VALUE,startsWith:YOUR_VALUE,endsWith:YOUR_VALUE,regex:YOUR_VALUE | By default QATTS excludes all URLs that do not start with the baseURL. To include any such links, configure this property in the given formats. |
| inputSiteData | FILE_NAME.xlsx | When the scan runs on a regular basis, we need to check that the content and size of each source have not changed. To validate this, all scan details must be maintained in the given format for comparison. |
| props.page.textTags | Ex: h1,h2,h3 | For spelling checks, the content to be checked is held inside HTML tags. The user has to identify those tags and provide them here. |
| missingImages | Ex: imageNotAvaliable.png, noImage.jpg | Sometimes HTML pages do not load properly and show missing images. To detect them in the site scan, the user has to identify the placeholder images and configure them here. If any of the provided values is found anywhere in the document, the missing image section for that HTML page is displayed as YES. |
| props.page.checks | link:src,link:href,img:src,a:href,li:href | By default QATTS extracts links using the given tag:attribute pairs. Use this property to customize link extraction. If the value is "custom", the user can write their own locator for extracting elements and configure the following property: props.page.element : UI_ELEMENT_NAME |
| props.page.contentCleanExp | Ex: replace:[(?<=cfemail=).*?(?=>)]:[] | When HTML pages contain dynamic content that changes on every load, the user has to identify it and configure it here. This keeps the source size calculation and the comparison accurate. |
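
To make the props.page.exclude and props.page.include formats more concrete, here is a minimal Python sketch of how such rules could be evaluated against a URL. It is not the QATTS implementation. The function names, the comma splitting (which would break on regex values that contain commas), and the choice to let exclude rules win over include rules are all assumptions made for illustration.

```python
import re


def parse_rules(spec):
    """Parse 'kind:value,kind:value' into a list of (kind, value) pairs."""
    rules = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # Split on the first colon only, so values such as URLs keep theirs.
        kind, _, value = part.partition(":")
        rules.append((kind, value))
    return rules


def matches(url, rules):
    """Return True if any rule matches the URL."""
    for kind, value in rules:
        if kind == "contains" and value in url:
            return True
        if kind == "startsWith" and url.startswith(value):
            return True
        if kind == "endsWith" and url.endswith(value):
            return True
        if kind == "regex" and re.search(value, url):
            return True
    return False


def in_scope(url, base_url, include_rules, exclude_rules):
    """Assumed precedence: exclude rules first, then baseURL, then include rules."""
    if matches(url, exclude_rules):
        return False
    if url.startswith(base_url):
        return True
    return matches(url, include_rules)


base = "https://example.com"
exclude = parse_rules("contains:/logout,endsWith:.pdf")
include = parse_rules("startsWith:https://cdn.example.org/")

print(in_scope("https://example.com/home", base, include, exclude))
print(in_scope("https://example.com/files/guide.pdf", base, include, exclude))
print(in_scope("https://cdn.example.org/img.png", base, include, exclude))
```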

Using the Smarty in a Functionality:

After configuring the Smarty to your requirements, navigate to Functionality.

// doSmarty('SMARTY_NAME','LIST_OF_MODES')

Let's look at the modes of site scan and when to use each one.

| Mode Name | Mode Description | When should I use |
| --- | --- | --- |
| sample | Ex: sample:100. Only the given number of URLs is scanned and reported. | When the website has received changes or the Smarty configuration has been changed. |
| saveContent | Content is saved for future validations. | On the initial scan. The saved files are helpful when comparing multiple scans. |
| scanOnly | Only scans the website, without validating against the input data. | When the configuration has been changed and there is no input data. |
| PageValidations | Performs W3C checks on every web page and gives results that are helpful for developers. | Use this mode only for the first stabilized scan, because it is time-consuming. |
| SpellChecks | Performs spell checks on the tags provided. | Use this mode only when the website content has changed. Additional cost is charged for it. |
| ConsoleErrors | Extracts all the console errors and lists them by URL. | When there are redirection changes or source changes in the website. |

Best Practices for Site Scan

Site Scan Report

How to understand and analyze it?

The site scan report provides three types of validation:

  1. Time Taken Validation -> Gives the maximum time limit for the source and how long it took in the current scan.

Status: Slow -> When the request took longer than the maximum limit.

Result: Fail

  2. Content Checksum Validation -> QATTS generates a checksum value for the content of each source. The expected value also has to be provided in the input site scan.

Status: Changed -> When the generated checksum value does not match the input checksum value.

Result: Fail

  3. Content Size Validation -> QATTS generates the size of each source. The expected value also has to be provided in the input site scan.

Status: Fail -> When the generated size does not match the input size value.

Error: When no response could be obtained from the server, the error message provided by the server is given in the result message column.
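
The checksum and size validations only work if dynamic content is removed first, which is the job of props.page.contentCleanExp. The sketch below illustrates the idea in Python. It reuses the regular expression from the contentCleanExp example, but the SHA-256 checksum, the UTF-8 byte count, and the function name `fingerprint` are assumptions; the source does not say which checksum algorithm or size measure QATTS uses.

```python
import hashlib
import re

# Pattern taken from the contentCleanExp example; it matches the value that
# follows "cfemail=" up to the next ">".
CLEAN_PATTERN = r"(?<=cfemail=).*?(?=>)"


def fingerprint(html, clean_pattern=CLEAN_PATTERN):
    """Return (checksum, size) of the HTML after removing dynamic content."""
    cleaned = re.sub(clean_pattern, "", html)
    data = cleaned.encode("utf-8")
    # SHA-256 and byte length are assumptions made for this sketch.
    return hashlib.sha256(data).hexdigest(), len(data)


first_load = '<a data-cfemail="abc123">mail</a>'
second_load = '<a data-cfemail="zzz999">mail</a>'

# The dynamic attribute value differs, but the cleaned fingerprints agree.
print(fingerprint(first_load) == fingerprint(second_load))
```

Without the cleaning step, the two loads above would produce different checksums and the page would be reported as Changed on every scan.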

Result:

Pass: When all the above validations pass.

Fail: When any one of the above validations fails.

Error: When the server was not able to give a response. In this case the status code is -1.

New: When the scan is run in scanOnly mode, the result is NEW and no validations happen. The result is also NEW when new URLs were extracted that are not present in the input scan.

Missing: When URLs are provided in the input site scan but were not extracted during the site scan. (A complete analysis is required in this case.)
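
The result values above can be read as a small decision procedure. The following Python sketch shows one possible ordering of the checks. The dictionary keys, the function name `classify`, and the precedence (for example, Error being checked before New) are assumptions for illustration, not documented QATTS behaviour.

```python
def classify(scanned, expected, scan_only=False):
    """Return the Result value for one URL.

    scanned  -- dict for the current scan, or None if the URL was not extracted
    expected -- dict from the input site scan, or None if the URL is not in it
    """
    if scanned is None:
        return "Missing"  # present in the input, not extracted in this scan
    if scanned["status_code"] == -1:
        return "Error"  # no response from the server
    if scan_only or expected is None:
        return "New"  # no validations are performed
    slow = scanned["time_taken"] > expected["max_limit"]
    changed = scanned["checksum"] != expected["checksum"]
    size_differs = scanned["size"] != expected["size"]
    return "Fail" if (slow or changed or size_differs) else "Pass"


expected = {"max_limit": 2.0, "checksum": "abc", "size": 10}
scanned = {"status_code": 200, "time_taken": 1.0, "checksum": "abc", "size": 10}

print(classify(scanned, expected))
print(classify(dict(scanned, time_taken=3.0), expected))
print(classify(scanned, None))
```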

Status Code: The standard HTTP status code of the response, as described below.

| Status Code | Client / Server | Description |
| --- | --- | --- |
| 200 | Server | The user receives a successful response. |
| 40X | Client | Something was missing or wrong in the user's request. |
| 50X | Server | The server was unable to give a response, or it took too long to handle the request. |
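
When post-processing a report, the status codes in the table can be grouped with a small helper like the one below. This is only a sketch: it reads 40X and 50X as the full 4xx and 5xx ranges, treats -1 as the "no response" code mentioned under Result, and the function name and returned labels are assumptions.

```python
def describe_status(code):
    """Map a status code from the report to a coarse category."""
    if code == -1:
        return "error: no response from server"
    if code == 200:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 404, 503, -1):
    print(code, describe_status(code))
```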

Missing Images: Missing images are common when a website has hundreds of web pages. The QATTS site scan reports "YES" when an HTML page has missing images.

How to prepare the site scan INPUT data?

Prerequisite: Have your scanOnly report ready. Analyze it and make sure it contains no errors.

In the report, filter on status code 200 only, because feeding failure information into the input would produce the same failures again.

Copy the base URL, resource URL, Max Limit, Check Sum, and Content Size columns and paste them into the input site scan.
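
For larger reports, the filter-and-copy step above could be scripted. The sketch below assumes the report rows have already been loaded into a list of dictionaries (reading the .xlsx file itself needs a third-party library and is left out), and that the column headers are spelled exactly as shown here, including a "Status Code" column. Both are assumptions; adjust the names to match your report.

```python
# Columns to carry over into the input site scan, as named in the text above.
INPUT_COLUMNS = ["base URL", "resource URL", "Max Limit", "Check Sum", "Content Size"]


def build_input_rows(report_rows):
    """Keep only rows with status code 200 and only the input columns."""
    return [
        {column: row[column] for column in INPUT_COLUMNS}
        for row in report_rows
        if row["Status Code"] == 200
    ]


report_rows = [
    {"base URL": "https://example.com", "resource URL": "https://example.com/a",
     "Max Limit": 2000, "Check Sum": "abc", "Content Size": 1024,
     "Status Code": 200, "Result": "New"},
    {"base URL": "https://example.com", "resource URL": "https://example.com/b",
     "Max Limit": 2000, "Check Sum": "", "Content Size": 0,
     "Status Code": 404, "Result": "New"},
]

for row in build_input_rows(report_rows):
    print(row)
```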

Q: My site scan was stable and new pages were added. Do I need to run scanOnly again?

A: No. Filter your report by result (NEW) and copy only those rows into your input.

Q: I want to run W3C checks / console error checks / spelling checks only for specific URLs. Is it possible?

A: Yes, this can be configured in the ACTIONS column of the input site scan.

Q: How can I maintain my site scan?

A: Maintaining a site scan is easy. If new URLs are added, add them to the input scan. If the content has changed completely, run the base scan once with saveContent, then prepare your input data and execute.

Q: How can I get the report after a site scan?

A: Navigate to your test run -> click on download site scan. An .xlsx file is generated and provided along with the analytics.

Q: How do I understand and filter failures in the site scan report?

A: The report can be filtered in several ways:

Filter by status code -> To get the failure count easily.

Filter by Result (New) -> To see how many new URLs have been extracted.

Filter by content size -> To see how many sources have changed, which can then be sent to the developers for quick fixes.
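
The same filters can be reproduced outside the spreadsheet once the report rows are loaded. The sketch below counts rows per status code and per result with the Python standard library. The column names "Status Code" and "Result" and the function name `summarize` are assumptions about how the downloaded report is laid out.

```python
from collections import Counter


def summarize(rows):
    """Count report rows by status code and by result."""
    return {
        "by_status_code": Counter(row["Status Code"] for row in rows),
        "by_result": Counter(row["Result"] for row in rows),
    }


rows = [
    {"Status Code": 200, "Result": "Pass"},
    {"Status Code": 200, "Result": "New"},
    {"Status Code": 404, "Result": "Fail"},
    {"Status Code": -1, "Result": "Error"},
]

summary = summarize(rows)
print(summary["by_status_code"])
print(summary["by_result"])
```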
