Behind the Screen: An insight into Context’s testing data

One of our Context Account Managers, Alexander Roxon, took a closer look at Context’s 2019 penetration testing data sets and caught up with some of our Consultants to get their view on the findings.

By Alexander Roxon

Account Manager

21 Apr 2020

Across 2019, Context conducted an enormous amount of penetration testing for our clients, testing many websites, applications and products that millions of us use on a daily basis. All this testing produced a treasure trove of data to explore. In this blog we will look into three separate trends and points of intrigue seen in Context’s 2019 data set. To add perspective on these trends, Context’s assurance consultants were asked to give their opinion on the various findings.

In this first section, we look at Context’s individual service lines and the most frequently reported high severity issues within each. This is followed by a section exploring the effectiveness of time limited testing, and we finish with a comparison of the different vulnerability ratings systems utilised.


High severity issues and where to find them

Service Comparison

Below is the distribution of findings per day for each of Context’s most commonly delivered service lines.

For most of the data you can see a reasonably standard spread of 2 to 3 vulnerabilities per day in five out of seven service lines. It is unsurprising that build & configuration reviews take the top spot: the scripts that are run undergo continuous improvement, maximising their efficiency. On top of this, build review engagements typically only span a single day of testing.

Cloud security assessments produce both the second most findings per day and the second most critical/high severity issues per day and, crucially, are not entirely script based. It therefore seemed logical to get the view of one of our consultants working at the proverbial coal face:

“The vast majority of clients have either already deployed all or part of their infrastructure in the cloud or are assessing the feasibility of doing so. As a comparatively new IT infrastructure hosting option security is not as well understood as with traditional on premise environments. This gap is evident in the results and illustrates why securing the cloud should be a priority for any security team.

The complexity of deploying a secure cloud configuration is compounded by the fine grained availability of permissions and how they are assigned to various computer and human users. Due to this complexity it is very common that the principle of least privilege is not followed leaving cloud services accessible from the internet or from within a private environment by exploiting an application layer weakness.” (Ranulf Green – Technical Lead North America)

Vulnerability Comparison

Next we took a look at the top three most commonly reported critical or high severity vulnerabilities found for each service line across 2019.

We asked a consultant for their thoughts on these findings.

“The statistics show an unexpected consistency for the same vulnerabilities across all service lines. This indicates a common root cause which is independent of vendor, environment, type of assessment or product.

A closer look at the two most common vulnerabilities shows that they correlate to inadequate patching policies or patch management as well as the usage of clear-text protocols for the transfer of sensitive information. The underlying issue for both of these issues can be traced back to missing security processes in the development, deployment and maintenance of IT assets. This highlights the need for continuous security analysis at all stages of a project life-cycle.” (Dominik Schelle – Senior Assurance Consultant)

It should be noted that complex services such as scenario assessments and Red Teams are objective led as opposed to methodology led. This makes their results difficult to interpret: depending on the scenario, testing may end prematurely because an objective has been met or the Red Team has been discovered. For this reason these services are excluded.

If you would like to read about the trends seen in Red Teams feel free to check out a previous post that looked into this area.


Does Time Limited Testing Work?

Time Limited Testing

Time limited testing is essentially a penetration test of a system, with no guarantee of full coverage. This typically occurs for one of three reasons:

  • Budgetary constraint - the client would like the test to occur under a fixed cost.
  • Ambiguous scope - the client struggles to define a scope but knows that some testing needs to be done.
  • Low risk assets - the client is comfortable with us performing a test with limited coverage as the asset is either reaching end of life or is deemed to be of low business value.

Any of these reasons results in time limited testing, where less than the ideal number of scoped days is allocated to the penetration tester.

Context had a look at how time limited testing impacts the tester’s rate of vulnerabilities found per day within web application testing.

Context find 7 % more findings per day when the tester is not restricted to time limited testing. Some customers believe that a tester can focus their efforts on finding the juiciest vulnerabilities when time limited, as the tester can ignore the little informational issues and focus on finding critical vulnerabilities. So how does the hypothesis compare to reality? To check, we change the question to: "Do we find more impactful findings per day in time limited testing?"

The percentage difference is now a whopping 20 %. In other words, Context find 20 % more critical/high severity findings per day in normal web application testing as compared to time limited testing. The majority of Context consultants predicted this, with 75 % believing they find more vulnerabilities per day when not restricted by time limited testing.
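The comparison above comes down to two small calculations: a findings-per-day rate for each group of tests, and the relative difference between the two rates. The following is a minimal sketch under stated assumptions. The engagement records are invented for illustration and are not Context's data, the function names are hypothetical, and the choice to pool findings and days across engagements (rather than average per-engagement rates) is an assumption, since the post does not say which was used.

```python
def findings_per_day(engagements):
    """Total findings divided by total testing days across (findings, days) pairs."""
    total_findings = sum(findings for findings, _ in engagements)
    total_days = sum(days for _, days in engagements)
    if total_days == 0:
        raise ValueError("no testing days recorded")
    return total_findings / total_days


def percent_more(rate_a, rate_b):
    """How much larger rate_a is than rate_b, as a percentage of rate_b."""
    if rate_b == 0:
        raise ValueError("baseline rate is zero")
    return (rate_a - rate_b) / rate_b * 100


# Invented (critical/high findings, testing days) pairs, for illustration only.
normal_tests = [(3, 5), (6, 10), (3, 5)]        # 12 findings over 20 days
time_limited_tests = [(1, 2), (2, 4), (2, 4)]   # 5 findings over 10 days

normal_rate = findings_per_day(normal_tests)
limited_rate = findings_per_day(time_limited_tests)
difference = percent_more(normal_rate, limited_rate)

print(f"{difference:.0f} % more critical/high findings per day when not time limited")
```

Using the time limited rate as the baseline matches the phrasing "X % more when not time limited"; swapping the baseline would give a different percentage for the same data.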

Rather than jumping to conclusions, let’s get more data and look at another service line. We will be looking at the data for Internal Infrastructure testing as this has a much higher rate of time limited testing (45 %), compared to External Infrastructure testing (12 %).

We begin with the same question: “Do we find more vulnerabilities per day when time limited?”

Seemingly not, with testers finding 30 % more vulnerabilities per day when not time limited. Again, is this reduced efficiency justified as the testers are focusing on areas likely to yield critical/high severity findings?

The difference is even starker than that seen in web applications. During internal infrastructure tests, Context consultants find 45 % more critical/high severity findings per day when not time limited. As discussed, this doesn’t surprise Context consultants at all:

“Low hanging fruit, or findings that typically result in lower impact issues are sacrificed in time limited testing. A lack of time means a lack of coverage and a lack of coverage means some elements remain untested. These elements could contain issues of a low or high impact. Crucially, those low impact issues could have been chained together to form a higher impact finding under the right circumstances, however time-limited testing reduces the amount of time a consultant is able to do this. This is why we recommend comprehensive testing.” (Sasha Zivojinovic – Complex Projects Lead)

In summary, if you want to increase the chances of your penetration tester finding the most significant vulnerabilities within your applications and infrastructure, remember to place as few restrictions on the tester as is within reason. In section three we finish up by looking into the different vulnerability matrices used to report vulnerabilities, analysing the bias that each introduces to penetration testing findings.


Vulnerability Ratings System Comparison

Vulnerability ratings

When Context discovers a vulnerability, the way we represent the finding in the final report will vary depending on the vulnerability matrix the client has chosen. The four most commonly used matrices are the Common Vulnerability Scoring System (CVSS) v2, CVSS v3 and two Context proprietary ratings systems named Impact rating and Risk rating. These systems all try to provide a simple and intuitive measure that lets customers instantly understand how important a penetration test finding is within the context of their business and the item tested.

CVSS v2 and v3 aim to compute a number of technical components belonging to an issue and to summarise their overall score in a rating from 0 to 10 (10 being the most critical). The Risk and Impact ratings systems output a rating based on either the product of impact & likelihood, as outlined below, or based purely on impact.
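As a rough illustration of the difference between an impact-only rating and one that also weighs likelihood, the sketch below multiplies two ordinal scores and bands the product. The three-level scale, the thresholds and the function names are all assumptions made for this example; this is not Context's actual lookup table.

```python
# Assumed ordinal scale shared by impact and likelihood.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def impact_rating(impact):
    """Impact-only rating: the severity is simply the impact label."""
    if impact not in LEVELS:
        raise ValueError(f"unknown impact level: {impact}")
    return impact


def risk_rating(impact, likelihood):
    """Band the product of impact and likelihood (thresholds are assumed)."""
    product = LEVELS[impact] * LEVELS[likelihood]
    if product >= 6:
        return "High"
    if product >= 3:
        return "Medium"
    return "Low"


# The same high impact issue rates differently once likelihood is considered.
print(impact_rating("High"))
print(risk_rating("High", "Low"))
print(risk_rating("High", "High"))
```

Under these assumed thresholds, a high impact issue that is unlikely to be exploited drops to Medium, which is the kind of reordering a likelihood-aware rating is meant to produce.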

Risk Lookup Table:

Below is a table which shows how each of the vulnerability ratings systems roughly align with each other.

Because each system deduces severity differently, a vulnerability can appear as “Critical” severity under one rating but only “High” severity under another. Context developed the Risk rating system with the goal of providing a more holistic view on vulnerabilities than the in-house developed Impact rating or externally developed CVSS systems. But did this work?

Does Risk rating prioritise the vulnerabilities Context thinks matter most?

To answer this we needed to identify vulnerabilities that one rating system prioritises compared to another. In essence, we wanted to know which vulnerabilities are given more importance by one rating system than by the others. To calculate this, we first convert every vulnerability instance into a standardised score as shown below.

For example, an instance of Cross Site Scripting reported as “Medium” impact would have a Standardised Severity score of 3, while an instance of Cross Site Scripting reported as 7.1 in CVSS v2 would have a score of 4. We can then take averages for each vulnerability over each ratings system. These averages are then expressed as a % difference, and we look for the largest differences between the averaged scores.
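The scoring and comparison steps can be sketched roughly as follows. Several things here are assumptions: the 1 to 5 label mapping is inferred from the two examples in the text (Medium maps to 3, a CVSS v2 score of 7.1 maps to 4), the CVSS v2 bands are the commonly used Low/Medium/High ranges, the sample findings are invented, and the baseline used for the percentage difference is a guess, since the post does not specify it.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from severity label to standardised score.
LABEL_SCORE = {"Info": 1, "Low": 2, "Medium": 3, "High": 4, "Critical": 5}


def cvss_v2_label(score):
    """Commonly used CVSS v2 bands: Low 0-3.9, Medium 4.0-6.9, High 7.0-10.0."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0 to 10")
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"


def standardise(system, rating):
    """Convert one reported rating into a standardised severity score."""
    if system == "cvss_v2":
        return LABEL_SCORE[cvss_v2_label(rating)]
    return LABEL_SCORE[rating]  # label-based systems such as Impact or Risk


def average_scores(findings):
    """Average standardised score per (vulnerability, rating system)."""
    grouped = defaultdict(list)
    for name, system, rating in findings:
        grouped[(name, system)].append(standardise(system, rating))
    return {key: mean(values) for key, values in grouped.items()}


def percent_difference(averages, name, system_a, system_b):
    """Percentage by which system_a's average exceeds system_b's (assumed baseline)."""
    a = averages[(name, system_a)]
    b = averages[(name, system_b)]
    return (a - b) / b * 100


# Invented sample findings: (vulnerability, rating system, reported rating).
findings = [
    ("Cross Site Scripting", "risk", "High"),
    ("Cross Site Scripting", "risk", "High"),
    ("Cross Site Scripting", "cvss_v2", 7.1),
    ("Cross Site Scripting", "cvss_v2", 4.3),
]

averages = average_scores(findings)
gap = percent_difference(averages, "Cross Site Scripting", "risk", "cvss_v2")
print(f"Risk rating scores this issue {gap:.1f} % higher than CVSS v2")
```

Repeating the last step for every vulnerability and sorting by the size of the gap would surface the largest spikes in either direction.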

We took the 3 biggest spikes in either direction, representing the vulnerabilities with the biggest discrepancy between the two ratings systems being compared. As an example:

Top 3 vulnerabilities that Risk rating prioritises above CVSS v2 rating

  • Digitally Signed SMB Signatures not enforced
  • Debugging enabled
  • HTML Source Code Leaks Software Version Details

Top 3 vulnerabilities that CVSS v2 rating prioritises above Risk rating

  • Verbose Tomcat Error Pages
  • Certification Authority Authorisation not Implemented
  • TLS Version 1.0 and 1.1 Supported

Which list of vulnerabilities would you be more worried about if they existed within your estate?

By putting the same conundrum (and a few others) to the Context Information Security assurance team, we were able to analyse whether the Risk rating system aligns best with the consultant population’s impression of severity. It should be noted that this was conducted blind: consultants did not know which rating system each list was aligned to.

The Results

77.3 % of Context consultants thought that the first list contained the more significant vulnerabilities, in essence concluding that 77.3 % of consultants think that Context’s Risk rating system is “better” than CVSS v2.

97.8 % of Context consultants thought that Risk rating was “better” than CVSS v3 rating.

83.7 % of Context consultants thought that Risk rating was “better” than Impact rating.

This is good news for Context, validating the design and usage of the in-house developed Risk rating system.

Consultant Comment

“Risk rating is definitely an improvement over using Impact as it allows us to provide a more granular and representative assessment of a finding; but if you want to harmonise results from multiple pentest suppliers, or your risk management system works really well with technical ratings, then there’s a strong case for using CVSS v3 instead.” (Robert Nicholls – Managing Consultant (Assurance))

“The Context Risk rating gives more flexibility for rating a vulnerability than CVSS v2 or CVSS v3. In particular it offers our consultants more freedom when determining the likelihood of an attacker successfully exploiting the vulnerability, which is then reflected in the overall rating. This translates into a more appropriate ordering of the issues in the report, making it easier for you to prioritise fixing the issues most likely to be actively exploited by an attacker first.” (Dan Cater – Senior Consultant (Assurance))


In a blind experiment, Context consultants thought that the vulnerabilities prioritised by the Risk rating system were more important than the vulnerabilities prioritised by the CVSS v2, CVSS v3 and Impact ratings systems. You could always take the quiz yourself if you would like to make sure your vulnerability rating system is best aligned with your own perception and priorities; just ask your account manager for a link.

Series Review

Across this three part series we have looked to provide insight to customers, showcasing where and what findings we saw across our service lines in 2019. From there we moved into analysis probing the effectiveness of time limited testing compared to a comprehensive scope. Lastly, we looked at the different vulnerability ratings systems, gaining a view on how each one holds up against Context’s recommended Risk rating system. Hopefully this has provided some food for thought and distraction during this difficult time.

If you have a question for us or require any further information, please get in touch.

About Alexander Roxon

Account Manager

Alex is an account manager who specialises in designing penetration testing programmes that prioritise business risk. SSCP, FAIR.
