Help me design coverage metrics

I’m building a Threat Modeling Tool and I am facing the following problem:

There are a bunch of threat models (:gift:). Each has threats (:cloud_with_lightning:). Threats have their associated mitigations (:umbrella:).

I see all of those threat models in an overview.

I want to answer: How far along are they? Then display each threat model on a red-yellow-green gradient that shows whether it is mature / has made good progress.

Threat models that suffer from admiration for the problem (few mitigations) should rank badly.

How would you compute that?

=> I’m looking for coverage metrics.

Check out The Metrics Manifesto by Richard Seiersen — tough but worth it. It introduces BOOM (Burndown, Arrival, Wait, Escape):

• Burndown — time‑to‑remediate. Track the distribution (median, percentiles, survival curve). Example: median patch time 72h → 24h after a process change. Time = risk.
• Arrival — rate/timestamps of new risks (vulns, alerts, new assets). Example: arrivals spike after a big feature/asset onboarding. Why: rising inflow can overwhelm capacity.
• Wait — time from arrival to work start (triage latency). Example: average wait 4h → 16h on Mondays. Why: long waits increase exposure and backlog.
• Escape — fraction of risky events that become incidents. Example: 0.5% of phishing attempts lead to compromise. Why: the ultimate KPI for control effectiveness.
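A rough Python sketch of how the four BOOM numbers could be pulled out of timestamped events. The `RiskEvent` record and its field names are assumptions made up for this example; they come neither from the book nor from any tool. The median here only looks at resolved items, so it ignores still-open ones, which a proper survival analysis would not.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class RiskEvent:
    # Hypothetical record; all field names are assumptions for this sketch.
    arrived: datetime                     # when the risk showed up
    started: Optional[datetime] = None    # when work on it began
    resolved: Optional[datetime] = None   # when it was remediated
    escaped: bool = False                 # True if it became an incident


def hours(delta):
    return delta.total_seconds() / 3600


def boom_summary(events):
    remediation = [hours(e.resolved - e.arrived) for e in events if e.resolved]
    waits = [hours(e.started - e.arrived) for e in events if e.started]
    # Arrival: number of new events per (ISO year, ISO week).
    arrivals = Counter(tuple(e.arrived.isocalendar())[:2] for e in events)
    return {
        "burndown_median_h": median(remediation) if remediation else None,
        "arrivals_per_week": dict(arrivals),
        "wait_median_h": median(waits) if waits else None,
        "escape_rate": sum(e.escaped for e in events) / len(events) if events else None,
    }
```

Tracking the full distributions (percentiles, survival curve) instead of a single median would be closer to what the list above describes.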

You need timestamped events to build a life table (survival analysis) and a versioned history of your threat models to track trends. I thought about adding BOOM support to Risquanter (risquanter/register on GitHub). It tracks model versions, but implementing full BOOM metrics would still be non‑trivial.

It may help to have a connection to issue tracking software where the various dev teams keep their backlogs. You would get quite interesting stats from looking at the relationship between threats, backlog issues, status, and time. To connect a model with an issue you could use labels; using the issue ID in the threat model will help too.

Thank you for the inspiration, @agota.daniel and @Johan_Sydseter . That sounds like opportunities for interesting life cycle metrics.

The Arrival metric is closest to what I am looking for.

Let’s say I have 20 scoped threat models, some of them have some threats, some of them mitigations. How would I know how “far” / “complete” they are?

Back to Arrival… when nobody has spotted the threat yet, how would I have its arrival? I need to somehow incorporate the known/unknown unknowns also…

I’ve been thinking about the following:

For each threat model, estimate its final size (Small, Medium, Large) and set a parameter M accordingly (10, 20, 40).

For N threats,
set ThreatCoverage(N, M) = N / (N+M).

Problem: rewards quantity over quality. How would I know M? Will never be 100%.
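In Python the formula could look like the sketch below. The `SIZE_TO_M` table just restates the Small/Medium/Large values from above; the function name and the input checks are my own additions.

```python
# M per estimated final size, as proposed above (Small, Medium, Large).
SIZE_TO_M = {"small": 10, "medium": 20, "large": 40}


def threat_coverage(n: int, m: float) -> float:
    """N / (N + M): 0 with no threats, 0.5 when n == m, below 1 for any finite n."""
    if n < 0 or m <= 0:
        raise ValueError("n must be >= 0 and m must be > 0")
    return n / (n + m)


if __name__ == "__main__":
    for size, m in SIZE_TO_M.items():
        print(size, [round(threat_coverage(n, m), 2) for n in (0, 5, 10, 20, 40)])
```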

For N threats where O have at least one mitigation,
set MitigationCoverage(N, O) = O / N.

Problem: rewards quantity over quality. Does not reward richness in mitigations or consider mitigation effectiveness.
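A small sketch of this one as well. The input shape (a mapping from threat ID to its list of mitigations) is an assumption for the example, and so is the choice to score a threat model without any threats as 0.

```python
from typing import Mapping, Sequence


def mitigation_coverage(mitigations_by_threat: Mapping[str, Sequence[str]]) -> float:
    """O / N, where O is the number of threats with at least one mitigation."""
    n = len(mitigations_by_threat)
    if n == 0:
        return 0.0  # assumption: a model without threats counts as uncovered
    o = sum(1 for mitigations in mitigations_by_threat.values() if mitigations)
    return o / n


if __name__ == "__main__":
    model = {"T1": ["M1"], "T2": [], "T3": ["M2", "M3"], "T4": []}
    print(mitigation_coverage(model))
```

Note that T3 with two mitigations counts the same as T1 with one, which is exactly the "does not reward richness" problem.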

What are your thoughts?

What is M supposed to mean here? Number of Mitigations? Or number of Mitigated threats? Something else?

If M is the number of mitigated threats and N is the number of threats, then in N / (N + M) the denominator counts all threats once and then adds the mitigated ones a second time.

What does Low, Medium, High refer to? The threat model or the threats? Their impact or likelihood? I am not really sure based on the description… but basically this is a good example of why I avoid qualitative representation wherever I can: you will have a very hard time figuring out what the other person meant :sweat_smile:

First of all I would bucket your threats so that you are tracking Low, Medium, High categories separately.

Each threat has a "discovery date". In most cases this is what you realistically know, unless you are able to tie it to a commit, a version, or a specific update which introduced it… then you would have a real "date of exposure". Each threat should also have its own ID. Then there is a date when it got mitigated. Depending on how much you want to complicate it, this can be the date when your team committed a patch or when that patch got deployed… I recommend the latter.

At this point you have for each threat an ID, when it was introduced (discovered) and fixed.

Ideally you have an SLA for fixing things, say 30 days for category M.

You iterate over your data and check for each day how many threats are open and how many got mitigated within the SLA. The proportion is what is of interest to you, I think. It will be a running number.
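One way to read the daily iteration, as a Python sketch. The tuple layout `(threat_id, discovered, fixed_or_None)` is an assumption, and so is the choice of denominator: here the proportion is "mitigated within SLA" over "all threats known on that day", which is only one possible interpretation.

```python
from datetime import date, timedelta


def daily_sla_series(threats, sla_days, start, end):
    """threats: list of (threat_id, discovered, fixed_or_None) with date values.

    Returns one row per day: (day, open_count, fixed_within_sla, proportion).
    """
    rows = []
    day = start
    while day <= end:
        known = [t for t in threats if t[1] <= day]
        open_count = sum(1 for _, _, fixed in known if fixed is None or fixed > day)
        in_sla = sum(
            1
            for _, discovered, fixed in known
            if fixed is not None and fixed <= day and (fixed - discovered).days <= sla_days
        )
        proportion = in_sla / len(known) if known else None
        rows.append((day, open_count, in_sla, proportion))
        day += timedelta(days=1)
    return rows
```

Running this once per severity bucket, each with its own `sla_days`, would match the idea of tracking Low, Medium, High separately.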

The data structure you need to track this is a "life table", and if you dump this on ChatGPT I think it will be able to produce PoC code for calculating it.
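To give an idea of what such a life table could look like, here is a minimal sketch using the Kaplan-Meier product-limit estimate. That is my choice of technique for the example and not necessarily what the book's R code does. Input is assumed to be `(days_open, fixed)` pairs, where `fixed=False` means the threat is still open today (a censored observation).

```python
def life_table(durations):
    """durations: list of (days_open, fixed) pairs.

    Returns rows of (day, at_risk, fixed_on_day, survival), where survival is
    the estimated share of threats still open after that day.
    """
    rows = []
    survival = 1.0
    # Only days on which at least one threat got fixed change the estimate.
    for t in sorted({d for d, fixed in durations if fixed}):
        at_risk = sum(1 for d, _ in durations if d >= t)
        fixed_at_t = sum(1 for d, fixed in durations if fixed and d == t)
        survival *= 1 - fixed_at_t / at_risk
        rows.append((t, at_risk, fixed_at_t, survival))
    return rows


if __name__ == "__main__":
    data = [(5, True), (10, True), (10, False), (20, True), (30, False)]
    for row in life_table(data):
        print(row)
```

Open threats stay in the "at risk" count up to their current age instead of being dropped, which is the main reason to use a life table over a plain average of fix times.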

The book I linked has working (but a bit buggy) code in R. I would use an agent to analyse what it’s doing and translate it to your favourite language. You can clean up the result as a learning exercise :hugs:

I think metrics should be between 0 (worst) and 1 (best). The M in ThreatCoverage(N,M) = N/(N+M) is just a parameter that helps turn ever-growing threat count into something that is in [0, 1[.

For M=1, this would be 0, 1/2, 2/3, 3/4, 4/5, 5/6, 6/7, …

M is the value of N that will result in 50%.

See also plot n/(n+10) from 0 to 50 - Wolfram|Alpha


Example: Two threat models.
One with small scope => “There should be some threats”. There are 5. => ThreatCoverage(5, 5=“small”) = 5/(5+5) = 50%
One with large scope => “There should be a lot of threats”. There are 5. => ThreatCoverage(5, 20=“large”) = 5 / (5+20) = 20%
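For the red-yellow-green display, any score in [0, 1] could be mapped to a colour roughly like this. The linear RGB blend and the function name are assumptions for illustration; a UI might well prefer fixed thresholds or a different colour scale.

```python
def score_to_rgb(score):
    """Map 0 -> red, 0.5 -> yellow, 1 -> green with a linear blend."""
    s = min(1.0, max(0.0, score))  # clamp out-of-range input
    if s <= 0.5:
        return (255, round(510 * s), 0)      # red towards yellow
    return (round(510 * (1 - s)), 255, 0)    # yellow towards green


if __name__ == "__main__":
    # The two example models from above: 5/(5+5) and 5/(5+20).
    for coverage in (5 / (5 + 5), 5 / (5 + 20)):
        print(coverage, score_to_rgb(coverage))
```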

Yes, I think that metric is a good idea, too.

SLARespecting(threat) = min{1, SLA(severity(threat)) / [TimeFixed(threat) - TimeDiscovered(threat)] }

With TimeFixed(threat) unset, treat as a separate group and assume TimeFixed(threat) = today.
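Sketched in Python, that could be something like the following. The `SLA_DAYS` values are placeholders (only the 30 days for medium was mentioned in this thread), and returning 1.0 for a threat fixed on the day it was discovered is an extra assumption to avoid dividing by zero.

```python
from datetime import date

# Hypothetical SLA table in days; only "medium: 30" comes from the discussion.
SLA_DAYS = {"low": 90, "medium": 30, "high": 7}


def sla_respecting(severity, discovered, fixed=None, today=None):
    """Return (score, is_open) with score = min(1, SLA / days taken).

    Open threats (fixed is None) are scored as if fixed today and flagged
    so they can be reported as a separate group.
    """
    today = today or date.today()
    is_open = fixed is None
    elapsed = ((today if is_open else fixed) - discovered).days
    if elapsed <= 0:
        return 1.0, is_open  # assumption: same-day fix fully respects the SLA
    return min(1.0, SLA_DAYS[severity] / elapsed), is_open
```

For open threats the score keeps dropping every day once the SLA has passed, so an ignored threat drags the model towards red.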

:partying_face: