Site Scan
What is Site Scan?
Site Scan is a Smarty that processes a web page and its sub-pages, analyzes the failures found on the website, and provides a detailed report.
What can Site Scan do?
  1. It extracts all the links from the web page and checks whether each link is broken.
  2. It navigates into every page and repeats the check for every link extracted there.
  3. It checks for spelling mistakes across the whole web page.
  4. It checks each HTML page against the W3C standard and reports failures with HTML line numbers, so developers can fix them quickly.
  5. It provides the console errors of each web page scanned.
  6. It provides the list of excluded links, that is, links that do not start with the base URL.
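To make the link extraction and the "excluded links" idea concrete, here is a minimal Python sketch. It is not how QATTS is implemented: the tag/attribute set, the `LinkCollector` class, and the `split_links` helper are all hypothetical names chosen for illustration, and the actual HTTP check for broken links is left out so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed tag/attribute pairs to read links from (illustrative only).
LINK_ATTRS = {("a", "href"), ("link", "href"), ("img", "src")}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the tag/attribute pairs in LINK_ATTRS."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in LINK_ATTRS:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.page_url, value))


def split_links(html, page_url, base_url):
    """Return (links to scan, excluded links) for one page."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    to_scan = [u for u in collector.links if u.startswith(base_url)]
    excluded = [u for u in collector.links if not u.startswith(base_url)]
    return to_scan, excluded


html = '<a href="/contact-us">Contact</a><img src="https://cdn.example.org/logo.png">'
to_scan, excluded = split_links(html, "https://qatts.com/", "https://qatts.com")
print(to_scan)   # links that start with the base URL
print(excluded)  # links that do not start with the base URL
```

A real scan would then request each URL in `to_scan` and record its status code, repeating the process for every page it reaches.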
Why should I use Site Scan for my website?
  1. It is easy to configure and execute a site scan, with live reporting.
  2. Failures are easy to understand and filter, even when scanning a big or complex website.
  3. All features can be configured per URL, and specific or custom actions are easy to configure.
  4. You can download your report and analyze it anywhere.
  5. It provides rich analytics for the scanned data.
How to initiate a site scan?
Procedure:
  1. Configure your smarty in QATTS.
  2. Use your smarty in a functionality.
  3. Prepare your suite with the functionality.
  4. Execute a test run with your test suite.
QATTS takes care of the scan, and we can view and validate the site scan report with analytics.
Now let's go through each step of the procedure in detail.
Configuring the smarty:
What is a smarty?
A smarty is a function that goes beyond plain automation. It does all the smart work requested by the user and returns a simple value.
Step 1
  1. Navigate to Studio -> Add Smarty.
  2. Select the smarty type as "Site Scan".
  3. Configuration begins! We can configure our scan as follows:
    1. Which page to launch.
    2. Which URLs to consider in the scan.
    3. How to identify and configure your "Image not Available" placeholder.
    4. How to clean content that contains dynamically generated URIs.
Better understanding of configuration
| Property Name | Property Value | Description |
| --- | --- | --- |
| baseURL | | The provided URL is launched in the browser and treated as the parent page. |
| cdnURL | List of URLs | The provided URLs are replaced with an empty (null) value in the HTML. Whenever a dynamic URL is loaded following a pattern, configure this property with those URLs. This helps in calculating the size of the document. |
| scanCheckType | FirstOccurance | Reduces the scan time and avoids multiple hits on the same URL. Ex: https://qatts.com/contact-us is a common URL on every page and I want to hit it only once per scan. In that case, configure this property. |
| props.page.exclude | contains:YOUR_VALUE,startsWith:YOUR_VALUE,endsWith:YOUR_VALUE,regex:YOUR_VALUE | Pages may use images or links coming from another domain or a public domain. To exclude them from the site scan, configure them in the given formats. |
| props.page.include | contains:YOUR_VALUE,startsWith:YOUR_VALUE,endsWith:YOUR_VALUE,regex:YOUR_VALUE | By default QATTS excludes all URLs that do not start with the baseURL. To include any such links, configure this property in the given formats. |
| inputSiteData | FILE_NAME.xlsx | When a scan runs on a regular basis, we need to check that the content and size of each source have not changed. To validate this, maintain all scan details in the given format for comparison. |
| props.page.textTags | Ex: h1,h2,h3 | For spelling checks, the content to check lives in HTML tags. The user has to identify those tags and provide them here. |
| missingImages | Ex: imageNotAvaliable.png, noImage.jpg | Sometimes HTML pages do not load properly and show missing images. To detect them in the site scan, the user has to identify the placeholder images and configure them here. If any of the provided values is found anywhere in the document, the Missing Images section for that HTML page is displayed as YES. |
| props.page.checks | link:src,link:href,img:src,a:href,li:href | By default QATTS extracts links using the given tag:attribute pairs. To customize link extraction, configure this property. If the value is "custom", the user can write their own locator for extracting elements and configure the property props.page.element : UI_ELEMENT_NAME. |
| props.page.contentCleanExp | Ex: replace:[(?<=cfemail=).*?(?=>)]:[] | When HTML pages contain dynamic content that changes on every load, the user has to identify it and configure it here. This keeps the source size calculation and the comparison accurate. |
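The include and exclude rule formats can be illustrated with a small Python sketch. This is only one plausible reading of the `contains:`, `startsWith:`, `endsWith:` and `regex:` formats, not QATTS's actual matching logic. The function names are hypothetical, and two behaviors are assumptions: exclude rules win over include rules, and rules are split on commas (so a regex containing a comma would not work here).

```python
import re


def parse_rules(spec):
    """Split 'contains:x,startsWith:y' into (kind, value) pairs."""
    rules = []
    for item in spec.split(","):
        # Split on the first ':' only, so values such as URLs keep their colons.
        kind, _, value = item.strip().partition(":")
        if kind and value:
            rules.append((kind, value))
    return rules


def matches(url, rules):
    """True if the URL satisfies at least one rule."""
    for kind, value in rules:
        if kind == "contains" and value in url:
            return True
        if kind == "startsWith" and url.startswith(value):
            return True
        if kind == "endsWith" and url.endswith(value):
            return True
        if kind == "regex" and re.search(value, url):
            return True
    return False


def should_scan(url, base_url, include_spec="", exclude_spec=""):
    """Decide whether a URL takes part in the scan (assumed precedence)."""
    if matches(url, parse_rules(exclude_spec)):
        return False  # assumption: an exclude rule always wins
    if url.startswith(base_url):
        return True   # documented default: URLs under baseURL are scanned
    return matches(url, parse_rules(include_spec))


base = "https://qatts.com"
print(should_scan("https://cdn.example.org/app.js", base))
print(should_scan("https://cdn.example.org/app.js", base,
                  include_spec="startsWith:https://cdn.example.org"))
```

The first call returns `False` because the URL is outside the base URL; the second returns `True` because an include rule matches it.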
Using Smarty in Functionality:
After configuring the smarty to our requirements, navigate to Functionality and call it as follows.
// doSmarty('SMARTY_NAME','LIST_OF_MODES')
How simple! 😄
Let's understand the modes of site scan and when to use them
| Mode Name | Mode Description | When should I use it? |
| --- | --- | --- |
| sample | Ex: sample:100. Only the given number of URLs are scanned and reported. | Whenever the website has received changes or the smarty configuration has changed. |
| saveContent | Content is saved for future validations. | On the initial scan run. The saved files help when comparing multiple scans. |
| scanOnly | Only scans the website, without validating against input data. | When the configuration has changed and there is no input data. |
| PageValidations | Performs W3C checks on every web page. The results are helpful for developers. | Only for the first stabilized scan, because this mode consumes time. |
| SpellChecks | Performs spell checks on the configured tags. | Only when there were content changes to the website. Additional cost is charged for this mode. |
| ConsoleErrors | Extracts all console errors and lists them by URL. | When there were redirection or source changes in the website. |
Best Practices for Site Scan
Site Scan Report
How to understand and analyze it?
The site scan report provides 3 types of validations:
  1. Time Taken Validation -> Gives the maximum time limit of the source and how long it took in the current scan.
    Status : Slow -> When the request has taken longer than the max limit
    Result : Fail
  2. Content Checksum Validation -> QATTS generates a checksum value for each source's content. The expected value also has to be provided in the input site scan.
    Status : Changed -> When the generated checksum does not match the input checksum
    Result : Fail
  3. Content Size Validation -> QATTS generates the size of each source. The expected value also has to be provided in the input site scan.
    Status : Fail -> When the generated size does not match the input size
Error : When no response is received from the server, the error message provided by the server is given in the result message column.
Result :
Pass : When all of the above validations pass
Fail : When any one of the above validations fails
Error : When the server was not able to give a response. In this case the status code is -1
New : When the scan is run in scanOnly mode (the result is NEW and no validations happen), and also when new URLs were extracted that were not provided in the input site scan
Missing : When URLs were provided in the input site scan but were not extracted during the site scan (a complete analysis is required in this case)
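The result rules above can be summarized in a short Python sketch. It is a reading of the documented rules, not QATTS's implementation: the `classify` function, the dictionary keys, and the millisecond unit are hypothetical, and the order in which Missing, Error and New are checked is an assumption.

```python
def classify(expected, actual):
    """Return (result, statuses) for one URL.

    expected: dict with max_time_ms, checksum, size; None if the URL
              is not in the input site scan (or the mode is scanOnly).
    actual:   dict with status_code, time_ms, checksum, size; None if
              the URL was not extracted during the scan.
    """
    if actual is None:
        return "Missing", []
    if actual["status_code"] == -1:
        return "Error", []  # no response from the server
    if expected is None:
        return "New", []    # nothing to validate against
    statuses = []
    if actual["time_ms"] > expected["max_time_ms"]:
        statuses.append("Slow")
    if actual["checksum"] != expected["checksum"]:
        statuses.append("Changed")
    if actual["size"] != expected["size"]:
        statuses.append("Fail")
    # Any failed validation makes the overall result Fail.
    return ("Fail" if statuses else "Pass"), statuses


expected = {"max_time_ms": 2000, "checksum": "abc", "size": 1024}
slow_and_changed = {"status_code": 200, "time_ms": 2500, "checksum": "xyz", "size": 1024}
print(classify(expected, slow_and_changed))
```

Here the request exceeded the max limit and the checksum differs, so the sketch reports a Fail result with the statuses Slow and Changed.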
Status Code : The standard HTTP status code of the response, as below
| Status Code | Client / Server | Description |
| --- | --- | --- |
| 200 | Server | The user received a successful response. |
| 40X | Client | Something was missing or wrong in the user's request. |
| 50X | Server | The server was unable to respond, or took too long to handle the request. |
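As a rough illustration of how these codes could be grouped when reading a report, here is a small Python sketch. The `describe_status` helper and its labels are hypothetical. The 4xx and 5xx ranges follow the standard HTTP status classes, and -1 is the value the report uses when the server gave no response.

```python
def describe_status(code):
    """Map a status code from the report to a coarse category."""
    if code == -1:
        return "error: no response from server"
    if code == 200:
        return "success"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


for code in (200, 404, 503, -1):
    print(code, describe_status(code))
```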
Missing Images : Missing images are quite common when a website has hundreds of web pages. QATTS site scan reports "YES" when an HTML page contains missing images.
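The missing-image check can be pictured as a simple substring search over the page source. The sketch below assumes a case-sensitive match and a "NO" value when nothing is found; both are assumptions, and `has_missing_image` is a hypothetical helper name.

```python
def has_missing_image(html, placeholders):
    """Return "YES" if any configured placeholder name appears in the HTML."""
    return "YES" if any(name in html for name in placeholders) else "NO"


page = '<img src="/assets/noImage.jpg" alt="product">'
print(has_missing_image(page, ["noImage.jpg"]))
```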
How to prepare the site scan INPUT data?
Prerequisite : Have the scanOnly report ready. Analyze it and make sure there are no errors.
In the report, filter by status code 200 only, because copying failed rows into the input would reproduce those failures in later scans.
Copy the base URL, resource URL, Max Limit, Check Sum, and Content Size columns and paste them into the input site scan.
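If the report has been exported and loaded as rows, the filter-and-copy step could look like the following Python sketch. The dictionary keys are hypothetical stand-ins for the report columns, and the actual .xlsx column names may differ.

```python
# Assumed keys for the five columns copied into the input site scan.
INPUT_COLUMNS = ["base_url", "resource_url", "max_limit", "checksum", "content_size"]


def build_input_rows(report_rows):
    """Keep only rows with status code 200 and only the input columns."""
    return [
        {col: row[col] for col in INPUT_COLUMNS}
        for row in report_rows
        if row["status_code"] == 200
    ]


report = [
    {"base_url": "https://qatts.com", "resource_url": "https://qatts.com/contact-us",
     "max_limit": 2000, "checksum": "abc", "content_size": 1024, "status_code": 200},
    {"base_url": "https://qatts.com", "resource_url": "https://qatts.com/old-page",
     "max_limit": 2000, "checksum": "", "content_size": 0, "status_code": 404},
]
print(build_input_rows(report))
```

Only the first row survives the filter, and its `status_code` field is dropped because it is not one of the input columns.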
Q : My site scan was stable and new pages were added. Do I need to run scanOnly again?
A : No. Filter the report by result (NEW) and copy only those rows to your input.
Q : I want to run W3C checks / console error checks / spelling checks only for specific URLs. Is it possible?
A : Yes, this can be configured in the ACTIONS column of the input site scan.
Q : How can I maintain my site scan?
A : Maintaining a site scan is easy. If new URLs were added, add them to the input scan. If the content changed completely, run the base scan once with saveContent, then prepare your input data and execute.
Q : How can I get the report after a site scan?
A : Navigate to your test run -> click on download site scan. A .xlsx file is generated and provided along with analytics.
Q : How do I understand and filter failures in the site scan report?
A : The report can be filtered in multiple ways:
Filter by status code -> To get the failure count easily.
Filter by Result (New) -> To know how many new URLs have been extracted.
Filter by content size -> To see how many sources have changed, which can be sent to dev for quick fixes.