Introducing “KPO%”: Why Mitigating Shot Location Might Be the Next Important Layer of Measuring Defensive Value

 

This article is being co-posted on Hockey Prospectus as well as on my own site, OriginalSixAnalytics.com. Find me @OrgSixAnalytics on Twitter.

Although hockey analytics has come a long way, there is still a lot of room for improvement – particularly when evaluating the defensive contributions of skaters. Most analytics users are well aware of shot rate (CF%/relative CF%) and shot suppression (CA/60) stats by now – but beyond those, there aren’t many other easy-to-use defensive metrics. As a result, ‘single number’ stats like Wins Above Replacement from the (former) website War On Ice (WOI) often seem to undervalue defensive players.

In order to find another dimension for defensive evaluation, a logical area that many authors have thought to test is whether a skater can influence his goalie’s Sv% while he is on the ice. For those who haven’t been following, this is a hotly contested topic; but the short summary is that it is extremely hard to tell if players can actually influence On-Ice Sv%. Studies that show skaters can influence On-Ice Sv% tend to be inconclusive, at best – and most work has suggested that impacting On-Ice Sv% is largely driven by randomness.

Intuitively, many think it should be possible for a skater to impact Sv%, so the work continues. However, the most fundamental question that we are all asking is really, ‘What are the best tools we can use to measure a skater’s defensive contribution?’ So – let’s attack the problem at a slightly higher level.

Underlying Drivers of Sv% Impacts

Presumably, if skaters could impact on-ice Sv%, they would do so by reducing the ‘quality’ of the shots taken against their team – easier shots against, fewer goals. Even simpler than ‘quality’, if a skater can consistently mitigate the location of the shots against his team (e.g. by ‘keeping pucks to the outside’), we know that the decrease in Sh% as shots are taken further from the net should ease the burden on his goalie, regardless of whether the goalie actually stops those shots.

Fortunately, websites like Corsica and the former WOI have created quite rigorous Scoring Chance (SC) metrics that we can use to test this. With these, we can measure Scoring Chance mitigation in two ways: through an overall rate stat (e.g. SCA/60), or as a proportion of all shot attempts against (as used by Scott Cullen, here). In Scott’s article he simply divides SC Against by Corsi Against, showing what portion of all shot attempts are Scoring Chances when a certain player is on the ice. Due to its straightforward nature, I’m sure many others have used or alluded to this figure in the past.
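To make the two measures concrete, here is a minimal Python sketch. The function names and both skaters' totals are hypothetical (not Corsica data); the point is that two skaters can allow the same SCA/60 while a different share of their attempts against are Scoring Chances, which is exactly what the proportion isolates.

```python
def sca_per_60(sca: float, toi_minutes: float) -> float:
    """Rate stat: Scoring Chances Against per 60 minutes of on-ice time."""
    return sca * 60.0 / toi_minutes


def sc_share_of_attempts(sca: float, ca: float) -> float:
    """Proportion stat (Scott Cullen's SCA/CA): share of attempts against that are SCs."""
    return sca / ca


# Hypothetical 5v5 totals: same chances and ice time, different attempt volume
skater_a = {"sca": 120, "ca": 1000, "toi": 1000}
skater_b = {"sca": 120, "ca": 800, "toi": 1000}

for name, s in (("A", skater_a), ("B", skater_b)):
    print(name,
          round(sca_per_60(s["sca"], s["toi"]), 2),
          round(sc_share_of_attempts(s["sca"], s["ca"]), 3))
# A 7.2 0.12
# B 7.2 0.15
```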

Although this stat is not at all complicated, in this article I will explore the idea that mitigating shot location/quality could actually be one of the next important layers in quantifying a player’s defensive contribution. Granted, some of the most complex, advanced models (e.g. xG from Corsica) already go into great detail to factor in shot quality. Despite the value of those models, I hope to make the case that a statistic like Scott’s (On-Ice SCA/CA) can represent a simple, broadly usable metric to evaluate defensive contributions from skaters – a close second to things like CA/60 and CF%.

To make this argument, we need to know: (i) is this a metric that skaters can actually ‘influence’? To test that, we will have to see (ii) if past results are predictive of future results – e.g. do certain players perform the best/worst on this metric, year after year? Along the way, we should also figure out (iii) is it best to use Corsi Against, Fenwick Against or Shots Against as a denominator?

So – let’s dig in.

Defining Scoring Chances

First – let’s define what a ‘Scoring Chance’ (SC) is. @MannyElk has done a great job recently creating a Scoring Chance stat on his Corsica website, and all citations of ‘Scoring Chances’ here use his data and metric – so big ‘thank you’ to the hard work that he does. You can support Corsica here.

Manny goes into great detail on how he reached his metric here. In short, he built upon the War-On-Ice SC definition by putting shots into three danger ‘tiers’ (high, medium and low) – though Manny didn’t stick to the exact locations used by WOI. Instead, he focused on the likelihood of the shot to be a goal (based on a number of factors, like shot angle, rebounds, etc.), and worked backward into his ‘zones’ from there.

Below is Manny’s heat map of shot location by danger zone.

[Figure: Manny’s Corsica heat map of shot location by danger zone]

The next table is Manny’s summary of the Fenwick Shooting %, Shooting %, and percentage of all shots within each danger ‘tier’, or zone.

[Table: Corsica summary of FSh%, Sh% and share of all shots by danger zone]

What is important about this table is the third column from the right. This ‘FSh%’ column summarizes just how dangerous each shot attempt is: low danger attempts have approximately a 2% chance of going in, medium danger a ~6% chance, and high danger (i.e. Scoring Chances) a ~16% chance of becoming goals. Notably, the medium tier was deliberately set to be quite close to the league-wide ‘average’ shot attempt Sh% of 6.79%.

Here is Manny’s definition of a Scoring Chance:

“Scoring chances may be defined as unblocked shots belonging to the High-Danger zone – that is, whose xG is equal to or exceeds 0.09. For convenience, one can approximate that one goal is scored for each 6 scoring chances”. [As compared to 1 in ~16 medium danger chances, and 1 in ~50 low danger chances].

So to be clear – it isn’t quite as simple as ‘if a shot is in the mid-to-low slot, it is a Scoring Chance’ – which is closer to the WOI definition. However, as you can see from his heat map, the vast majority of SCs are originating from the dense yellow area in the mid-to-low slot – so we can consider SCs as largely coming from that location.

Mitigating Scoring Chances – Keeping Pucks Outside % (“KPO%”)

 Earlier I introduced Scott Cullen’s metric of (Scoring Chances / Corsi Against). Most players in the league come out in the 10-20% range of this number, meaning that 10-20% of their shot attempts against are ‘Scoring Chances’.

In order to make this metric somewhat more intuitive, I want to center it on the concept of ‘Keeping Pucks to the Outside’ – a simple, easily understood concept that is core to defensive-zone play. As such, I will make two changes to the stat:

  • Rather than showing the % of shot attempts that ARE scoring chances, I will show the % of attempts that were NOT scoring chances (simply 1 – Scott’s metric).
  • As a result of this, I will give this stat a new name – “Keeping Pucks Outside %”, or KPO% – the percentage of shot attempts against that a skater prevents from being a Scoring Chance, or that he ‘keeps outside’.
    • (As a side note – I have deliberately tried to make this label clear and straightforward, for use by coaches or players who aren’t familiar with most analytics. For those who want a more formal name – you could also use ‘Scoring Chance Mitigation %’)

As a result, most players will instead be in the ~80-90% range – and should be aiming for as high a % as possible.
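In code, the conversion is a one-liner. This is a minimal sketch assuming you already have a skater's 5v5 on-ice SCA and CA totals; the function name `kpo_pct` and the sample numbers are hypothetical.

```python
def kpo_pct(sca: float, ca: float) -> float:
    """Keeping Pucks Outside %: share of on-ice shot attempts against that were
    NOT Scoring Chances, i.e. 100 * (1 - SCA/CA)."""
    if ca <= 0:
        raise ValueError("Corsi Against must be positive")
    return 100.0 * (1.0 - sca / ca)


# Hypothetical skater: 148 SCA out of 1,000 CA at 5v5
print(round(kpo_pct(sca=148, ca=1000), 1))  # 85.2
```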

KPO% – Repeatability

Now, the most important question for this to be a relevant metric is: can skaters actually repeatedly ‘influence’ KPO%? To determine this, we will have to test past results against future results, to see how strong that relationship is.

To do so, I downloaded Corsica data for all Forwards and Defensemen who played from 2010-2016. 2010-2013 represents the ‘first half’ sample and 2013-2016 represents the ‘second half’, and players needed at least 1000 minutes in each. This resulted in a sample of 216 Forwards and 113 Defensemen, which I tested separately. Only 5v5 data was included.

The two charts below summarize the results.

[Chart: Defensemen, past (2010-2013) vs. future (2013-2016) KPO%]

[Chart: Forwards, past (2010-2013) vs. future (2013-2016) KPO%]

As you can see, across both D and F there was a considerable relationship between past and future performance on the KPO% metric, at R^2 = 28.9% and 20.6%, respectively. This suggests that KPO% has solid predictive capability, supporting its use for player evaluation. Intuitively, it also makes sense that Defensemen would be able to more consistently influence this metric (shown in the higher R^2), as it is a larger part of their role.
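For readers who want to replicate the repeatability test, here is a minimal Python sketch of the split-half approach described above: keep players with at least 1,000 5v5 minutes in each half, compute KPO in each half, and take the squared Pearson correlation, which equals the R^2 of a one-predictor linear fit. The dict keys (`toi1`, `sca1`, `ca1`, etc.) are a hypothetical schema, not Corsica's export format, and the toy numbers are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_half_r2(players, min_toi=1000.0):
    """R^2 of second-half KPO regressed on first-half KPO.

    `players` holds dicts with hypothetical keys toi1/sca1/ca1 (first half,
    e.g. 2010-2013) and toi2/sca2/ca2 (second half, e.g. 2013-2016).
    Players need at least `min_toi` 5v5 minutes in EACH half.
    """
    eligible = [p for p in players
                if p["toi1"] >= min_toi and p["toi2"] >= min_toi]
    if len(eligible) < 3:
        raise ValueError("need at least 3 eligible players")
    past = [1 - p["sca1"] / p["ca1"] for p in eligible]
    future = [1 - p["sca2"] / p["ca2"] for p in eligible]
    r = pearson_r(past, future)
    # With a single predictor, the regression R^2 equals r squared
    return r * r, len(eligible)


toy = [
    {"toi1": 1200, "sca1": 110, "ca1": 1000, "toi2": 1150, "sca2": 120, "ca2": 1000},
    {"toi1": 1400, "sca1": 160, "ca1": 1000, "toi2": 1300, "sca2": 150, "ca2": 1000},
    {"toi1": 1100, "sca1": 190, "ca1": 1000, "toi2": 1050, "sca2": 200, "ca2": 1000},
    {"toi1": 600, "sca1": 300, "ca1": 1000, "toi2": 1200, "sca2": 50, "ca2": 1000},
]
r2, n = split_half_r2(toy)
print(n, round(r2, 3))  # n is 3: the last player has < 1,000 first-half minutes
```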

It is worth noting that the charts above use SCA/CA to calculate KPO%, as CA had the strongest relationship tested. I also ran the results with SCA/Fenwick Against, and SCA/Shots Against, and the results are below:

[Table: Past vs. future R^2 of KPO% using CA, FA and SA as the denominator]

On the defensive side, CA and FA are quite close, but after that there is a slight drop to SA. I think it is also positive to see that Corsi Against, which includes blocked shots in the denominator, has the strongest relationship. Given that blocking a potential Scoring Chance is a meaningful way for a skater to add defensive value, it would be logical to include that in the calculation.
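A small sketch makes the difference between the three denominators explicit. Since Manny's Scoring Chances are unblocked shots, a blocked attempt enters CA but can never be an SC, so only the CA version credits a skater for blocking a would-be chance. The function name and the shot-type breakdown below are hypothetical.

```python
def kpo_by_denominator(sca: int, on_goal: int, missed: int, blocked: int) -> dict:
    """KPO (as a fraction) using each of the three shot-attempt denominators.

    Corsi Against (CA)   = on goal + missed + blocked   (all attempts)
    Fenwick Against (FA) = on goal + missed             (unblocked attempts)
    Shots Against (SA)   = on goal                      (shots on goal)
    """
    denominators = {
        "CA": on_goal + missed + blocked,
        "FA": on_goal + missed,
        "SA": on_goal,
    }
    return {name: 1 - sca / d for name, d in denominators.items()}


# Hypothetical season: 100 SCA, 500 shots on goal, 250 missed, 250 blocked
for name, value in kpo_by_denominator(100, 500, 250, 250).items():
    print(name, round(100 * value, 1))
# CA 90.0
# FA 86.7
# SA 80.0
```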

For the last few sections, I will quickly summarize how performance on KPO% tends to be distributed around the league, and if we can quantify how much ‘value’ it really contributes.

2015-2016 League-wide Performance

 In order for KPO% to have value, there needs to be a wide-enough distribution of results across the league in order for players to differentiate themselves. Below is a histogram of the distribution of defensemen on this metric, across the 2015-2016 season:

[Chart: Histogram of defensemen KPO%, 2015-2016]

 (Note – I have omitted the forward chart as it follows the same general pattern)

In the 2015-2016 season, KPO% follows a relatively normal distribution with a reasonable variation of results, spanning a range of 8.7%. Given that there is a moderate amount of variation across the league, how big an impact does a change in KPO% have on expected goals against?

What is +/- 1% of KPO% actually worth?

Now – I want to try to understand how big an impact the best players in the league can have on KPO%. Two good examples from the sample were Marc-Edouard Vlasic and Roman Josi, at 89.4% and 88.1%, respectively.

To answer this, I have done a very basic, ‘back of the envelope’ calculation for how many theoretical ‘goals’ a skater adds to his team over a season (at 5v5) if he were to have a KPO% of +1% or -1% from the league average.

[Table: Goal value of a +/- 1% change in KPO%]

So, to walk through the high-level math here:

  • The average defenseman from the original sample had 997 5v5 Corsi Against over a season
  • About 14.8% of those are HD SCA, on average – or 147.6 Scoring Chances
  • Increasing a skater’s KPO% by +1% above the league average results in 137.6 HD SCA per season, or a reduction of 10 SCA
  • With 6.2 SCA per goal, that is 1.6 goals prevented
  • However, given these shot attempts are being substituted by lower quality chances, we need to add-back the value of those chances:
    • Manny’s table showed LD and MD chances each make up ~40% of all shots – or a roughly 50/50 split of all non-HD shot attempts
    • Thus, for each 10 HD SC mitigated, there will be 5 LD and 5 MD added back, or 0.4 goals ‘substituted’
  • Thus – the net goals prevented from a skater improving his KPO% by 1% is 1.21 goals per season
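The same back-of-the-envelope arithmetic can be written as a short Python function, using the article's inputs as defaults (997 CA per season, 6.2/16.0/51.3 chances per goal, and a 50/50 MD/LD substitution). It is a sketch of the calculation above, not a model, and the function and parameter names are hypothetical. With these inputs it comes out at about 1.20 goals, consistent with the ~1.2 figure up to rounding.

```python
def net_goals_prevented(ca_per_season: float = 997.0,
                        kpo_gain: float = 0.01,
                        hd_per_goal: float = 6.2,
                        md_per_goal: float = 16.0,
                        ld_per_goal: float = 51.3,
                        md_share: float = 0.5) -> float:
    """Net goals prevented per season from raising KPO% by `kpo_gain`.

    Defaults are the article's inputs; the chances-per-goal figures are
    Fenwick-based proxies, so the result is directional only.
    """
    hd_removed = ca_per_season * kpo_gain   # ~10 fewer HD chances against
    gross = hd_removed / hd_per_goal        # goals those HD chances would have produced
    md_added = hd_removed * md_share        # removed HD attempts reappear as MD...
    ld_added = hd_removed - md_added        # ...or LD attempts
    added_back = md_added / md_per_goal + ld_added / ld_per_goal
    return gross - added_back


print(f"{net_goals_prevented():.2f}")  # 1.20
```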

The one big caveat: the KPO% I am using is derived with Corsi Against, while Manny has only been able to calculate the Fenwick Sh% of his Scoring Chances. As such, the number of chances per goal stats (6.2, 16.0, 51.3) are proxies – and we should consider this calculation to be illustrative of the ‘directional’ impact, rather than actual.

How big an impact is 1.2 goals per 1%? With a range of 8.7% across the sample, the best player on KPO% prevents roughly 10 more goals over the season than the worst player (8.7 × 1.2 ≈ 10.4). If we define ‘replacement level’ as approximately the bottom 20% of the league, then the top quartile of defenders could add roughly 3.5-4 goals ‘above replacement’ on this metric. Given that 6 goals are approximately equivalent to one win, the top 25% of the league is adding roughly 0.58-0.67 of a win for their teams – which is not immaterial in a league where every little edge counts.
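The conversions in this paragraph can be sketched as follows. The 1.2 goals per KPO% point and 6 goals per win come from the text; treating the 20th percentile as replacement level and the 75th percentile as the top-quartile cut follows the paragraph's definitions, while the evenly spaced league of KPO% values is purely hypothetical.

```python
from statistics import quantiles

GOALS_PER_KPO_POINT = 1.2  # the article's estimate per 1% of KPO%
GOALS_PER_WIN = 6.0        # the article's goals-to-wins rule of thumb


def kpo_goal_value_summary(kpo_values):
    """kpo_values: league KPO% figures in percentage points (e.g. 86.3).

    Returns (best-vs-worst goal gap per season, list of wins above replacement
    for players at or above the 75th percentile), with replacement level taken
    as the 20th percentile.
    """
    gap_goals = (max(kpo_values) - min(kpo_values)) * GOALS_PER_KPO_POINT
    replacement = quantiles(kpo_values, n=5)[0]       # 20th percentile cut point
    top_quartile_cut = quantiles(kpo_values, n=4)[2]  # 75th percentile cut point
    wins = [(v - replacement) * GOALS_PER_KPO_POINT / GOALS_PER_WIN
            for v in kpo_values if v >= top_quartile_cut]
    return gap_goals, wins


# Hypothetical league: 88 evenly spaced values spanning 8.7 points
toy = [80.0 + 0.1 * i for i in range(88)]
gap, wins = kpo_goal_value_summary(toy)
print(round(gap, 1))  # 10.4
# The paragraph's own conversion: 3.5-4 goals at 6 goals per win
print(round(3.5 / GOALS_PER_WIN, 2), round(4 / GOALS_PER_WIN, 2))  # 0.58 0.67
```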

For the sake of clarity, I am not arguing that KPO% is ‘more important’ than Corsi Against/60 – e.g., if a skater gives up 300 additional CA over a season, that will more than offset a 1% improvement in his KPO%. Rather, I am arguing that these two elements are important to consider in conjunction with one another, as a poor CA/60 can be somewhat mitigated by a strong KPO%, just as a great CA/60 can be offset by a terrible KPO%.

Along those lines – if we were to add this to the WOI Goals Above Replacement (GAR) calculation, KPO% is looking like it could be a close 2nd to shot rate stats for the highest-value way to measure a skater’s defensive contributions. Given Forwards have ~5 areas where they add goal-value to WOI GAR – versus ~2 for D-men (CF/CA) – simply adding another element of defensive contribution will help to off-set the F/D value imbalance in today’s metrics.

Top/Bottom KPO% Defensemen

Before concluding, I wanted to share the top 12 and bottom 12 KPO% performing defensemen from the 2015-2016 season, for reference. In the tables below, I have also added the column ‘KPO% Above Average’, which is simply a player’s KPO% minus the league average. Thus the top players show a positive figure (their distance above average), and the bottom players a negative figure (their distance below average).

[Table: Top 12 KPO% defensemen, 2015-2016]

[Table: Bottom 12 KPO% defensemen, 2015-2016]
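The ‘KPO% Above Average’ column and the top/bottom split can be reproduced with a few lines of Python. This sketch assumes ‘league average’ means the unweighted mean of the players in the sample (the article does not specify), and the player names and values are hypothetical.

```python
def kpo_above_average(kpo_by_player: dict, n: int = 12):
    """Return (top n, bottom n) players as (name, KPO% minus league average).

    Assumption: 'league average' is the unweighted mean of the players passed in;
    a TOI-weighted or league-wide SCA/CA average would shift every value slightly.
    Both lists are sorted best-first, so the worst player is last in `bottom`.
    """
    league_avg = sum(kpo_by_player.values()) / len(kpo_by_player)
    ranked = sorted(((name, kpo - league_avg) for name, kpo in kpo_by_player.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:n], ranked[-n:]


# Hypothetical players and values
top, bottom = kpo_above_average({"Player A": 90.0, "Player B": 85.0, "Player C": 80.0}, n=1)
print(top, bottom)  # [('Player A', 5.0)] [('Player C', -5.0)]
```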

As you can see, there is an interesting set of defensemen in each category. The top defensemen have some well-renowned players like Vlasic and Josi, mentioned earlier, as well as some not-necessarily-analytically-loved players like Shea Weber and Roman Polak.

My own hypothesis for why we get this result is that there may be a connection between a defenseman’s play-style/skill set and his resulting shot rate/KPO% stats, sometimes in an offsetting fashion. For example, Polak and Weber’s ‘stay at home’ style may help them lock down the front of the net defensively, while causing them to struggle on the shot rate side of the equation. On the other hand, some of the league’s more dynamic defensemen (not listed, but Doughty and Klingberg both come out roughly 2% below average on KPO%) may have quite strong Corsi stats, but their play-style causes them to give up higher quality chances against as a result. Granted – this is just a hypothesis; only more study and time will tell.

Conclusion

Despite having gone all the way from introducing KPO% to taking a high-level estimate of its goal-value, I definitely see this analysis as exploratory rather than ‘complete’. Hopefully this article encourages some others to dig into the KPO% metric (or other, similar ones) – allowing us to continue to learn more about how to measure individual-skater defensive contribution beyond simple shot rate stats.

Some future areas to build on this analysis include adding the impact on and value in special teams (e.g. PK) situations, creating more detailed versions of the stat (e.g. KPO% relative to teammates, usage adjustments), and developing a more statistically rigorous calculation of its goal-value. Hopefully you have found this analysis interesting and thought-provoking – or, alternatively, that KPO% helps to decrease the number of On-Ice Sv% debates in the world…

 

Myth-busters Series: Three Arguments Against the Idea that “The Leafs are Decreasing Their Focus on Analytics”

This article is being co-posted on Maple Leaf Hotstove as well as on my own site, OriginalSixAnalytics.com. Find me @OrgSixAnalytics on Twitter.

As we move past Labour Day and the hockey world turns its attention to the upcoming World Cup and 2016-2017 season, there is a fresh set of narratives that came to life this past summer – Stammer-geddon, Vesey-gate, and the 2016 draft, to name a few. In particular, Leafs Nation seems to have shifted its tone slightly: although most are still quite optimistic about the team’s future, some have also started to call into question the front office’s focus on analytics, its effectiveness at ‘salesmanship’, and more.

Reflecting on some of these storylines, there were a few that I thought might be interesting to test out with some objective, data-based analysis – and see just how accurate they really are. As a result – this article will be the first of a few I will call my ‘Mythbusters Series’. So – let’s get into it.

Toronto’s 2016 Draft Picks

The first narrative I will focus on – and one of the biggest coming out of the summer – is the widely alluded-to ‘decreased emphasis on analytics’ coming out of Toronto’s front office. This storyline has come to life in part due to the picks made by Toronto in the 2016 draft, and in part due to the term and AAV of Matt Martin’s signing. Although a lot of ink has been spilled over the tradable, four-year contract of a 27-year-old, representing 3.5% of the Leafs’ relatively flexible mid-term salary cap situation, today I will just be focusing on the 2016 draft.

We all know the story by now: in the 2016 draft, (i) Toronto picked a bunch of over-age players, many of whom were ‘off the board’ (e.g. unranked/not well known) (ii) Toronto seemed to prioritize height/size this year, and (iii) these two things combine to suggest that ‘analytics’ – and the implicit preference for small, speedy, skilled players – has departed from Toronto’s thought process.

Factually speaking, (i) and (ii) are quite accurate. Five of the Leafs’ picks were over-agers, and eight of their eleven picks were 6’2 or taller. However, what I will question today is the conclusion of (iii), and the idea that targeting size and over-age players suggests anything ‘anti-analytics’ about the Leafs’ front office.

In this article I will argue there is significant analytical support in favor of the type of player Toronto targeted (e.g. over-age and bigger players in general, rather than the specific individuals the Leafs picked). Further – if any team in any sport is truly trying to be on the ‘leading edge’ and develop innovative approaches to the game – that often might actually require doing things others see as questionable at the time.

Let’s dig into the three reasons why the Leafs’ older/bigger picks may be more supported by analytics than we all think:

  1. (Asset Management) Portfolio Theory

First off – let’s talk about size. Most in the analytics community tend to prioritize small, skilled players that can drive puck possession above anything else – and for good reason. However, I also think most can agree that there is some value to (the very different benefits brought by) large, physical players as well. Does ‘conventional thinking’ over-value size relative to other characteristics of players? Probably. But is there no value to having a physical presence on your team? Probably not.

That’s where the asset management concept of ‘Portfolio Theory’ comes in. In the financial world, diversity reigns supreme: “Don’t put all your eggs in one basket” sums it up. Put differently, no investor wants to be too concentrated in one stock, in equities, in bonds, or in any other asset class, lest they find themselves in a situation where that asset class underperforms.

After an excellent draft in 2015 and a strong prioritization of bringing fast, skilled players into the organization, the Leafs have arguably reached the point of diminishing marginal returns on that type of player – with an extremely deep pool of forwards in that mold. Portfolio theory suggests that the ‘return on investment’ of their next few 6-foot-plus players – who ideally have some speed and skill as well – will be much greater than that of picking an Alex DeBrincat-type player, even though guys like Nylander and DeBrincat are hugely valuable in an absolute sense.

The main point here: it should be safe to say that there is some logic to having a supporting cast of size to supplement Toronto’s already strong focus on speed and skill. Especially with a younger team, lacking ‘grinder’ type players – the tougher teams in the league would be silly not to make physicality a deliberate part of their game planning against Toronto this year. Compared to some of the observable alternatives (e.g. $6M AAV, 7 year signing of Milan Lucic…) – drafting some size seems like a solid idea.

Last – what are some of the other, innovative teams in the league with small, skilled line-ups saying on this topic? From Sportsnet:

In Crouse’s 6-foot-4, 212-pound frame, [John] Chayka [Arizona Coyotes GM] brings size to a club currently more focused on speed and skill in an effort to diversify the type of player the Coyotes are putting on the ice—the “portfolio theory,” he says.

  2. Over-Aged Players as a Market Inefficiency

Second – let’s talk about drafting over-aged players, or those who have ‘re-entered’ the NHL draft. Most critics of the Leafs’ 2016 draft found the decision to draft five re-entries strange, unexpected – and likely questionable. Even the analytically minded crowd seems to see ‘less upside’ to over-agers, despite interesting analysis supporting targeting over-agers as a strategy.

Before we jump to that conclusion, I did a quick bit of analysis to compare the results of players drafted at 18, 19, and 20 years old. A few trains of thought that lead into this chart:

  • The top, top players will likely be identifiable when they are young, so it makes sense for most of the picks in the 1st and 2nd rounds to be focused on players in their first year of eligibility (e.g. 18 year olds) – you won’t be finding Olli Juolevi or Ivan Provorov in the 3rd round of the draft
  • For any draft pick – the theoretical goal should be to pick players that will be above replacement level, who can add significant value beyond just ‘filling a seat’
    • Replacement level can be defined in simple terms as top AHL players or free agents that can be signed for approximately a league-minimum contract

Thus – ‘success’ for a draft pick shouldn’t mean a player who just ‘makes’ the NHL as a 4th-line F or 3rd-pair D. Those players are available essentially for free in the free agent/waiver market. Rather, success for a draft pick is someone who outperforms that replacement level of production.

All that said – exactly what counts as replacement level is very open to interpretation. We should be cautious about using strictly games played as the ‘success’ determinant – loads of 4th liners play 150+ NHL games without adding significant value to their teams, or earning more than single-digit minutes per night. For the purposes of the chart below, I have included only players with >200 NHL games played, and also included scoring rate data across thresholds ranging from >0.2 Pts/GM to >0.5 Pts/GM.

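[Chart: Share of players drafted after Rounds 1-2 exceeding replacement level (>200 NHL GP, >0.2 to >0.5 Pts/GM), by draft age]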

Note – This data is not adjusted for the rule change with respect to NCAA eligibility and declaring for the NHL draft – however, I believe the impact would be relatively minor.

I won’t go into huge detail on my methodology, as much of it is summarized in the fine print on the chart. In terms of what the chart tells us:

  • After the 1st and 2nd Rounds, players drafted at 19 years old (e.g. Draft year + 1) are roughly equally as likely to exceed replacement level as 18 year olds
  • Even more interesting – players drafted at 20 years old (e.g. Draft year +2) are significantly more likely to surpass replacement level than the other two ages, if defined as >200 NHL GP and anywhere between 0.2 to 0.5 career Pts/GM
  • (Note – to save time, I blended forwards and defensemen in this analysis – though the replacement level definition would of course be very different for each)

Put differently – if we choose >0.3 Pts/GM as our ‘replacement level’ threshold – in the time period sampled, only 55 players drafted after the 1st & 2nd rounds surpassed the ‘replacement level’ definition. Of these 55, 53% were drafted as 19 or 20 year olds (e.g. in their D+1 or D+2 years) – a huge portion of the players who ultimately were ‘NHL contributors’ to their teams.
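A minimal sketch of the tally behind these numbers, assuming a list of drafted players with a hypothetical schema (`draft_round`, `draft_age`, `gp`, `points`). It applies the thresholds from the text (picks after Rounds 1-2, >200 NHL GP, >0.3 Pts/GM) and reports each draft age's share of the players who clear them; the toy records are invented, not the article's data.

```python
from collections import Counter


def above_replacement_share_by_age(players, min_gp=200, min_ppg=0.3, min_round=3):
    """Share of 'above replacement' picks by draft age (e.g. 18, 19, 20).

    `players` holds dicts with hypothetical keys: draft_round, draft_age, gp, points.
    Only picks from `min_round` onward count (i.e. after the 1st and 2nd rounds),
    and 'above replacement' means > min_gp NHL games and > min_ppg career Pts/GM.
    """
    hits = [p for p in players
            if p["draft_round"] >= min_round
            and p["gp"] > min_gp                  # checked first, so gp > 0 below
            and p["points"] / p["gp"] > min_ppg]
    if not hits:
        return 0, {}
    counts = Counter(p["draft_age"] for p in hits)
    return len(hits), {age: counts[age] / len(hits) for age in sorted(counts)}


# Hypothetical draft records
toy = [
    {"draft_round": 3, "draft_age": 18, "gp": 300, "points": 120},  # 0.40 Pts/GM: counts
    {"draft_round": 4, "draft_age": 19, "gp": 250, "points": 100},  # 0.40 Pts/GM: counts
    {"draft_round": 5, "draft_age": 20, "gp": 400, "points": 160},  # 0.40 Pts/GM: counts
    {"draft_round": 1, "draft_age": 18, "gp": 800, "points": 600},  # 1st round: excluded
    {"draft_round": 6, "draft_age": 19, "gp": 150, "points": 100},  # <=200 GP: excluded
    {"draft_round": 3, "draft_age": 20, "gp": 300, "points": 60},   # 0.20 Pts/GM: excluded
]
total, shares = above_replacement_share_by_age(toy)
print(total, round(shares.get(19, 0) + shares.get(20, 0), 2))  # 3 0.67
```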

FYI – I’m definitely not the first person to dig into this subject – here is a tweet from last June from recently promoted London Knights Assistant General Manager and Director of Analytics, Jake Goldberg:

[Tweet from Jake Goldberg]

Some may disagree – but I would argue it’s probably a ‘good news story’ for Leafs fans that the rest of the league spent the last five draft rounds focused largely on first-year players who will make up 47% of the ‘above-replacement-level’ pool. Meanwhile, the Leafs spent the 2016 draft significantly prioritizing picking through that other 53%. If that is not taking advantage of market inefficiency, I don’t know what is.

  3. Draft Expected Value (DEV)

 Finally, just to round out the analytical view that is ‘pro’ overage players, let’s give credit to another pair who have done some great work on predicting prospect success: @Zac_Urback and @3Hayden2, and their Draft Expected Value model. I won’t go through every detail of their approach, as they have already summarized it very well in their posts: Introducing DEV, Explaining DEV, Limitations of DEV and – you guessed it – Draft Inefficiencies: Overage Prospects. These guys have rightly got a lot of attention since the draft, so I am happy to pile on.

In short, much like the ‘Prospect Cohort Success’ model created by @MoneyPuck_ (now of the Florida Panthers), DEV generates a list of the most comparable prospects to a particular one, based on age, league, adjusted scoring, and size – in particular, adjusting for whether they are in their Draft Year (D), before it (D-1), or one or two years after it (D+1, D+2). The model then converts that list to an expected NHL result, and then directly assigns a value to a prospect in terms of approximately when he should be picked, and his expected output.
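To illustrate the general ‘comparable prospects’ idea, here is a generic k-nearest-neighbours sketch in Python. It is not DEV's (or PCS's) actual implementation: the features mirror the ones listed above (league, draft-year offset, adjusted scoring, size), but the distance scaling, the value of k, the `became_nhler` outcome flag and all data are hypothetical assumptions. It also stops at a success probability rather than converting that into a draft slot.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class Prospect:
    name: str
    league: str
    draft_offset: int           # -1 = D-1, 0 = draft year (D), 1 = D+1, 2 = D+2
    adj_ppg: float              # league/age-adjusted points per game (hypothetical scale)
    height_in: float
    became_nhler: bool = False  # outcome flag; only meaningful for historical prospects


def expected_success(target: Prospect, history: List[Prospect], k: int = 10,
                     ppg_scale: float = 0.25, height_scale: float = 2.0) -> Optional[float]:
    """Share of the k most comparable historical prospects who 'succeeded'.

    Comparables must share the target's league and draft-year offset; among those,
    similarity is a scaled Euclidean distance on adjusted scoring and height.
    """
    pool = [h for h in history
            if h.league == target.league and h.draft_offset == target.draft_offset]
    if not pool:
        return None

    def distance(h: Prospect) -> float:
        return hypot((h.adj_ppg - target.adj_ppg) / ppg_scale,
                     (h.height_in - target.height_in) / height_scale)

    comparables = sorted(pool, key=distance)[:k]
    return sum(h.became_nhler for h in comparables) / len(comparables)


# Hypothetical data
history = [
    Prospect("a", "OHL", 2, 1.45, 70, True),
    Prospect("b", "OHL", 2, 1.55, 71, True),
    Prospect("c", "OHL", 2, 0.50, 75, False),
    Prospect("d", "WHL", 2, 1.50, 70, False),  # different league: excluded
    Prospect("e", "OHL", 0, 1.50, 70, False),  # different draft-year offset: excluded
]
target = Prospect("overager", "OHL", 2, 1.50, 70)
print(expected_success(target, history, k=2))            # 1.0
print(round(expected_success(target, history, k=3), 2))  # 0.67
```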

You don’t need to look much further than Zac’s article on Overage Prospects to get an idea of whether the Leafs are still putting their analytics team to good use. In it, Zac makes the following point:

“Looking forward to the 2016 NHL draft, I ran the DEV numbers for all draft eligible overage players. One player in particular that I want to discuss is Adam Brooks. Brooks is relatively undersized at 5’10, but in his 3rd year of draft eligibility DEV suggests he’s worth selecting with a pick from 28 – 33 overall. Brooks was valued as a pick from 55 – 82 last year, demonstrating two things: 30 NHL teams passed over a prospect worth selecting in the 3rd round with their late round picks last year, and Brooks has improved considerably since last year.

Some of Brooks’ successful comparables include players like Claude Giroux, Derek Roy, Ondrej Palat, Patrick O’Sullivan, Martin Erat & Jordan Eberle. I suspect he will not be selected as high as DEV values him, but if he’s available in the mid-rounds, Brooks seems like the obvious candidate to draft if a team is looking for a value selection. Obviously Brooks is not a lock to be a successful NHL player, but DEV indicates that he’s just as likely to be an impact NHL player as any other player who is optimally selected in the top of the 2nd round.”

And wouldn’t you know it – in the fourth round at #92 overall – which team selected the small, skilled, but overage player, Adam Brooks? The Toronto Maple Leafs. Mark Hunter and Kyle Dubas seem to be right back at it with their old tricks, trying to create value for their franchise. Well done, gentlemen.

Conclusion

To wrap things up – I think it is safe to say that the analytics function in the Leafs’ organization still seems to be playing an important role – and doing well to convince the broader organization to make bold, unloved picks based on statistics that suggest those moves will maximize value. If anything, the TML front office deserves a bit of credit for making their innovative decisions seem like they are not analytically driven. The only downside of the approach (and, to a small extent, of articles like these) is that now the TML management team will need to continue searching for the next ‘new’ thing if they want to keep their edge in 2017.