Wading into a Minefield: Why On-ice Sv% Stats May Not Be Completely Meaningless

This article is being co-posted on the new Puckalytics Blog as well as on my own site, OriginalSixAnalytics.com. Find me @michael_zsolt on twitter.

For those who don’t know – there is a hotly contested debate in the hockey analytics community about whether players are able to influence their teams’ save percentages. Many hockey analysts have looked into the subject, and some of our best and brightest have come out on opposite sides of the debate (e.g. Eric Tulsky, Garret Hohl, David Johnson and Kyle Dubas). As you can imagine, it is not a straightforward concept to pin down.

The core of the debate centers on the idea that for a statistic to be meaningful, it has to be repeatable – there needs to be evidence that players can consistently outperform/underperform on the metric over time. The current consensus is that there has been no evidence found to date to show that impacting on-ice Sv% is a repeatable skill. Instead, when you look across the league as a whole, what you find are outcomes that essentially resemble randomness.

Having been a goaltender my whole life, I never felt right about this conclusion. All of the analysis I have seen was sound, but my ‘gut’ (twitter explosion alert) told me that there must be some variance in the type of chances a goalie faces with different skaters in front of him. As a result, I have attempted to review past work on the subject and to dig a little deeper – in particular, by trying to control for some additional variables that could be creating ‘noise’ in our findings.

Although I think I have reached some interesting conclusions, before getting into those results I’d like to openly invite the online ‘peer-review’ process. Only once others have pressure-tested these findings and concluded the work is (or is not) statistically meaningful will I be fully confident in the legitimacy of this outcome. However, hopefully this work is able to begin to move the conversation forward on the topic of whether or not relSv% and Sv%RelTM are relevant metrics in player analysis.

(For those who don’t know – Sv%RelTM represents a team’s save percentage when a particular player is on the ice, compared to how his teammates performed without him. It is expressed as a percentage above or below zero (much like CF%RelTM) – representing the amount he is above or below his teammates’ time-weighted average. Due to data availability, I have focused much of my analysis on Sv%RelTM, and to be clear, relSv% and Sv%RelTM are slightly different metrics. However, the findings shown here are almost exactly the same when using relSv%).
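To make the definition concrete, here is a minimal Python sketch of how a Sv%RelTM-style number could be computed. It is a simplification under stated assumptions, not the exact Puckalytics calculation: the function names, the input layout and the toy numbers are all hypothetical, and the teammate baseline is assumed to be weighted by the ice time each teammate shared with the player.

```python
def save_pct(shots_against, goals_against):
    """On-ice save percentage, expressed in percent."""
    return 100.0 * (1.0 - goals_against / shots_against)


def sv_rel_tm(on_ice_sv_pct, teammates):
    """Player's on-ice Sv% minus a time-weighted teammate baseline.

    teammates: list of (toi_with_player, teammate_sv_pct_without_player).
    Weighting by shared ice time is an assumption made for this sketch.
    """
    total_toi = sum(toi for toi, _ in teammates)
    baseline = sum(toi * sv for toi, sv in teammates) / total_toi
    return on_ice_sv_pct - baseline


# Hypothetical toy numbers, not real player data.
on_ice = save_pct(shots_against=800, goals_against=60)
teammates = [(600, 92.0), (300, 91.0), (100, 93.0)]
rel = sv_rel_tm(on_ice, teammates)
print(f"on-ice Sv%: {on_ice:.1f}, Sv%RelTM: {rel:+.1f}")
```

A positive value means the team stopped a higher share of shots with the player on the ice than his teammates managed without him.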

Prior Analysis of On-ice Sv% / relSv%

To start, I’d like to re-visit some of the historical work done on the subject. Eric Tulsky, a very well-regarded member of the hockey analytics community, wrote the initial ‘book’ on the subject. Tulsky demonstrated that on a league-wide basis, neither forwards nor defensemen are able to consistently display above/below average performance on the metric. He showed this in his 2013 article by using on-ice Sv% itself (rather than a relative stat), and by comparing a player’s ‘first three seasons’ of on-ice Sv% to that player’s next three seasons. If this type of approach revealed a strong correlation, it would suggest players can consistently impact their team’s Sv% – for better or worse.

Tulsky’s analysis of forwards from 2007-2010 and 2010-2013 is shown below (his analysis of defensemen reaches largely the same conclusion):

Tulsky

(Source: Eric Tulsky)

As you can see, the plot appears to be scattered almost evenly and has an R2 value of less than 1%. This result suggests there is almost no correlation between past and future impact on on-ice Sv%, and that it is not a ‘repeatable’ skill.
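For readers who want to reproduce this kind of repeatability test, the sketch below computes the R2 between a first-period and a second-period value for each player. It is an illustration only: the helper names are my own, the numbers are made-up placeholders rather than Tulsky's data, and it assumes a simple linear fit, for which R2 is the square of the Pearson correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R2 of a simple linear fit of ys on xs."""
    return pearson_r(xs, ys) ** 2


# Hypothetical placeholder values: one entry per player, in percentage points.
first_period = [0.5, -0.3, 1.1, -0.8, 0.2]
second_period = [0.1, 0.4, -0.2, -0.5, 0.6]
print(f"R2 between periods: {r_squared(first_period, second_period):.3f}")
```

An R2 near zero means the first period tells you almost nothing about the second, which is what the chart above shows.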

However, following Tulsky’s work, there continued to be articles suggesting that certain NHL players do in fact have a consistent impact on on-ice Sv% relative to their teammates over time – arguably based on ‘anecdotal’ evidence. As a result, another well-known member of the community, Hockey Graphs’ Garret Hohl, conducted an updated piece of analysis to show that – even when using a relative metric (here he used relSv%) – there remains limited evidence that the metric is able to predict future performance. I have copied his output below:

Hohl

(Source: Garret Hohl)

As you can see, Garret used a similar methodology and demonstrated that the historical relSv% of a defenseman is not a reliable predictor of how that defender will do in the future – coming out at an R2 of only 2.6%.

Areas to Dig Further

However – as I suggested at the beginning – while these are absolutely valid findings based on the samples used, personal on-ice experience has kept this topic on my mind. Many writers often debate the wide range of factors that could be causing ‘noise’ in these results, such as:

  • Player usage
  • Quality of teammate
  • Quality of competition
  • Team-level systems
  • Quality of goaltender
  • Etc.

However, the fact that several factors could be causing noise in the findings does not, by itself, make the statistic valid. Thinking through the lens of a goaltender, I hypothesized two specific factors I saw as major potential drivers of the noise in the relationship. They were:

  1. Changes to the starting goaltender a skater typically plays with
  2. Changes to the team a skater plays on

Intuitively, I think these factors make sense as things that could possibly impact on-ice / relative Sv% numbers over time. For the first point, if a skater is playing in front of an elite goalie like Braden Holtby or Henrik Lundqvist – only specific, highly dangerous defensive lapses will actually turn into goals. However, if a skater is playing in front of Ray Emery – those same types of lapses may turn into additional goals against. As a result, I wanted to investigate how Sv%RelTM’s predictive capabilities changed when a skater is consistently playing with the same starting goalie.

Secondly, when a player changes teams one can safely assume it would create a mess for this metric. For example, when Phil Kessel went from Toronto to Pittsburgh he changed not only the systems he played within, but also the individual teammates he is being compared against (on an innately relative metric). He also implicitly changed #1 – the starting goalie that he is most frequently playing with.

Controlling for Goaltender and Team Changes

As such, I set out to conduct a very similar analysis to what Garret did, while also attempting to control for the two factors above. I did so by limiting my sample to the following criteria:

  • Players included must have played for a single team for all four seasons studied (2010-2014)
  • Teams included must have had the same starting goalie for all four seasons
    • Starting goalie was defined as 50+ starts in full seasons and 20+ starts in the 2012-2013 lockout season
  • Players included must have played at least 500 minutes per season, and have been on the ice for at least 800 shots against
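The criteria above can be sketched as a pair of filters. This is a rough illustration rather than the exact procedure I used: the data layout, function names and toy records are hypothetical, each player is assumed to have one team per season, and the 800 shots-against threshold is interpreted here as a total across the four seasons.

```python
SEASONS = ["2010-11", "2011-12", "2012-13", "2013-14"]
# 50+ starts in full seasons, 20+ in the lockout-shortened 2012-13 season.
MIN_STARTS = {"2010-11": 50, "2011-12": 50, "2012-13": 20, "2013-14": 50}


def consistent_starter(goalie_starts):
    """Return the goalie who met the starter threshold in every season, else None.

    goalie_starts: {season: {goalie_name: starts}} for a single team.
    """
    starters = None
    for season in SEASONS:
        qualified = {
            goalie
            for goalie, starts in goalie_starts.get(season, {}).items()
            if starts >= MIN_STARTS[season]
        }
        starters = qualified if starters is None else starters & qualified
    return min(starters) if starters else None


def player_qualifies(player_seasons, min_toi=500, min_shots=800):
    """Check the single-team, ice-time and shots-against criteria.

    player_seasons: {season: {"team": str, "toi": minutes, "shots_against": int}}
    """
    if any(season not in player_seasons for season in SEASONS):
        return False
    if len({player_seasons[s]["team"] for s in SEASONS}) != 1:
        return False
    if any(player_seasons[s]["toi"] < min_toi for s in SEASONS):
        return False
    # Assumption: the shots-against threshold applies to the four-season total.
    return sum(player_seasons[s]["shots_against"] for s in SEASONS) >= min_shots


# Hypothetical toy records, not real NHL data.
stable_team = {"2010-11": {"Goalie A": 60, "Goalie B": 22},
               "2011-12": {"Goalie A": 55}, "2012-13": {"Goalie A": 30},
               "2013-14": {"Goalie A": 58}}
changed_team = {"2010-11": {"Goalie C": 60}, "2011-12": {"Goalie C": 52},
                "2012-13": {"Goalie D": 35}, "2013-14": {"Goalie D": 61}}
steady_player = {s: {"team": "AAA", "toi": 900, "shots_against": 350} for s in SEASONS}
traded_player = dict(steady_player)
traded_player["2013-14"] = {"team": "BBB", "toi": 900, "shots_against": 350}

print(consistent_starter(stable_team), consistent_starter(changed_team))
print(player_qualifies(steady_player), player_qualifies(traded_player))
```

A player would enter the sample only if both checks pass: his team has a consistent starter and he meets the playing-time criteria himself.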

As you would expect, using all of the filters listed significantly reduced the size of my relevant sample. For example, only nine teams met the starting goalie criterion in this time period: SJS, MTL, CHI, NYR, DET, ANA, LAK, DAL and PIT. Further, across these nine teams, only 21 defensemen and 25 forwards (total n=46) played four consecutive seasons for one of those teams over this period. Of course, I need to point out that samples this small limit how much statistical confidence can be placed in the results. However, given the need to control for various factors, I think this was a necessary trade-off, and trying to increase sample size will be an important area to further investigate in the future.

As I move into the results in the next section, I have to say: they surprised me.

Save % Relative to Team – Defensemen only

Naturally, I think everyone considers supporting the goaltender and/or playing a major defensive role to be the primary job of just that – defensemen. This was my hypothesis going into the analysis, so, like Garret, I initially focused on just those players. However, as you can see from my chart below – the results were far from groundbreaking:

1 - Sv%RelTM Defense

(Note – all data shown is even strength in order to remove the impact of PP and PK Sv%).

As you can see – there appears to be almost no relationship (R2 < 1%) between time periods in terms of a defenseman’s ability to impact his team’s on-ice Sv%. This finding is consistent when using relSv%.

One thing worth pointing out: this data has two relatively unique results in the top-left quadrant – Nick Leddy (CHI) and Cam Fowler (ANA). Both of these players had very poor Sv%RelTM results in 2010-2012 (roughly -2%), but improved significantly in 2012-2014 (1%). Given they both had their rookie seasons in 2010, one could argue their own development could have impacted their differences between periods. For interest’s sake, removing the two of them increases the R2 to approximately 15%. Although this is interesting to note, this adjustment would need to be applied more broadly before we could draw conclusions from it.
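This kind of sensitivity check, recomputing R2 with selected players left out, can be sketched as follows. The helper name, player labels and coordinates are made up for illustration and are not the actual values from the chart; the point is only to show how much a couple of points can move R2 in a small sample.

```python
import math


def r_squared(points):
    """R2 of a simple linear fit over a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return (sxy / math.sqrt(sxx * syy)) ** 2


def r_squared_without(players, exclude):
    """Recompute R2 after dropping the players named in `exclude`."""
    kept = [xy for name, xy in players.items() if name not in exclude]
    return r_squared(kept)


# Hypothetical (first period, second period) values in percentage points.
players = {
    "Player A": (-1.0, -1.0),
    "Player B": (0.0, 0.0),
    "Player C": (1.0, 1.0),
    "Player D": (2.0, 2.0),
    "Rookie X": (-2.0, 1.0),
    "Rookie Y": (-2.0, 1.2),
}
r2_all = r_squared_without(players, exclude=set())
r2_trimmed = r_squared_without(players, exclude={"Rookie X", "Rookie Y"})
print(f"all players: {r2_all:.2f}, rookies removed: {r2_trimmed:.2f}")
```

Because any small sample can be made to look better by dropping inconvenient points, a check like this is best treated as a diagnostic rather than as a result in its own right.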

Although this was ultimately less meaningful than I had hoped, my next thought was to add forwards to the sample – if nothing else, this would increase the sample size. This brings me to the next category:

Save % Relative to Team – Defensemen & Forwards

2 - Sv%RelTM Forwards & D

Now, as you can see in the chart above, what I found here was both exciting and surprising. The exciting aspect is that there actually was some relationship – something otherwise unheard of in this type of analysis. The surprising aspect of this came up when I asked myself: why are forwards the category of skaters who are able to show a sustained skill in Sv%RelTM, and not defenders? Before I had a good answer, I proceeded to do one final chart to test this unexpected result on its own:

Save % Relative to Team – Forwards only

3 - Sv%RelTM Forwards only

Looking at this chart is where I was blown away. By removing defensemen altogether (getting to a total n=25), you can see that the predictive value of Sv%RelTM reaches the highest found in any analysis I have reviewed – an R2 of 41% (or 38% for the same players when using relSv%). Although this isn’t a massive correlation, it is within the same range that xG and CF% fall into in terms of predicting a team’s future GF% over a season – metrics that are both widely regarded as valid for forecasting future outcomes.
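Given the small sample, one way to gauge the uncertainty around this number is an approximate confidence interval for the correlation using the Fisher z-transformation. The sketch below is my own illustration, not part of the original analysis: it assumes the relationship is positive (so r is the positive square root of R2), that the players are independent observations, and that the data are roughly bivariate normal. The function name is hypothetical.

```python
import math


def fisher_confidence_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is treated as roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Values taken from the forwards-only chart: R2 of 41% with n=25.
r = math.sqrt(0.41)  # assumes a positive relationship
low, high = fisher_confidence_interval(r, n=25)
print(f"r = {r:.2f}, 95% CI roughly ({low:.2f}, {high:.2f})")
print(f"implied R2 range roughly ({low ** 2:.2f}, {high ** 2:.2f})")
```

Under these assumptions the interval for r runs from roughly 0.33 to 0.83, or about 11% to 68% in R2 terms. That is wide, but it sits clearly above zero, which fits both the excitement and the caution expressed here.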

Explaining These Findings

Now – as mentioned – the most difficult question here is: why are we getting this result? Why would a defenseman who fits the same team/goaltender-controlled data as the forwards sampled not be able to repeatedly impact his team’s on-ice Sv%? I certainly can’t say I know the answer, but one hypothesis that came up when discussing this with David Johnson: defensemen often play generalist roles, while forwards more often play specialized ones. For example – a 1st forward line is typically a scoring line, and a 3rd is typically a shutdown line. On the other hand, defensemen often get paired with their ‘opposite’ – an offensively-minded defenseman paired with a stay-at-home one. I think these hypotheses intuitively make sense; however, the major drivers of these results remain open for interpretation.

Cross-team Applicability

One last question that will likely come up: given players can only consistently demonstrate Sv%RelTM impact when on a single team with a single starting goalie, does that mean it is not a skill transferable to other teams? My own (short) answer to this would be ‘no’. The relationship shown here for forwards indicates to me that it is in fact a skill, whether or not it would be displayed to the same degree on a different team or with a different goalie.

CF% has significant team-level and player-level components and won’t necessarily be directly transferable between teams and systems, yet we still think of players as being strong/weak at driving shot-attempt differentials. I think the same logic can be applied here. But again – I am very curious about others’ thoughts.

Conclusion

In the end – this has been a hotly debated topic for quite a long period of time, and I doubt my work today will stop people from having different views on it. Although the analysis I have shown reaches a particular conclusion, I personally will not rely too heavily on this finding until it is broadly tested by the analytics community. However, given the desperate need in hockey to find additional ways to evaluate a player’s defensive contribution, hopefully this analysis helps us all continue to learn more about the merits of relSv% / Sv%RelTM as metrics for player evaluation. I look forward to anyone’s feedback or thoughts, @michael_zsolt on twitter or at OriginalSixAnalytics [at] gmail [dot] com.