Introducing hbGAR, Part 2: A Full Analysis of the Updated Model

More details about my GAR model for hockey, including a goalie adaptation

Almost a month ago (June went by very fast), I debuted my own model for Goals Above Replacement, a hockey statistic meant to quantify a player’s value. In Part 1 of this article, I went over the models behind each subcategory that goes into hbGAR (which stands for HB Analytics’ Goals Above Replacement, in case you didn’t already know) and showed the visual I made to go with it: a set of charts depicting a player’s hbGAR total in each subcategory and how they attained it. Since that original piece went up on June 1, the model has been revised four times, I have created a model for goalies, and all hbGAR visuals, including sortable tables, have been made fully public on my Tableau profile. Given all that, I figured I’d bring you up to speed on the goalie model and go in-depth on how my model differs from the most popular GAR model out there, by Evolving-Hockey. I also hope that having the hbGAR data and visuals open to the public makes it easier to take in the concepts discussed here; you can even follow along by experimenting on my Tableau.

The Goalie Model

Forecasting hockey goalie performance is something that nobody in this world has even come close to mastering. Goaltending is voodoo, and we’ve seen many goalies go from among the best in the league to average or worse (John Gibson, Frederik Andersen), or vice versa (Connor Hellebuyck), in just two seasons. My skater model can be used to predict a player’s future performance, but for goalies, that is not the case. There’s only one statistic that shows the full extent of a goaltender’s performance in a season while accounting for the influence of the team in front of him, and that is GSAx, or goals saved above expected. Naturally, it forms the basis of my model. Evolving-Hockey’s GAR model for goalies comprises two major categories, even-strength performance and shorthanded performance, and my interpretation is no different. The formulas for both strength states are listed below.

Even-Strength

EV hbGAR = Z-score of EV GSAx + (inverted Z-score of EV xGA × (TOI / 1000 × 0.25))

Shorthanded

SH hbGAR = Z-score of SH GSAx × 1.5

Again, there isn’t much to it. I threw xGA into the even-strength formula to give slightly more credit to goalies who had to deal with large workloads, which should explain why Connor Hellebuyck grades out so well according to my model. As with my skater model at even strength, I included TOI so as not to skew ratings for goalies who hardly played this season. And I weighted the shorthanded portion slightly heavier (1.5x) simply for the sake of rough similarity to Evolving-Hockey’s model.
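For anyone who finds formulas easier to parse as code, here is a minimal sketch of the two goalie formulas, assuming a pandas DataFrame with one row per goalie and hypothetical column names (ev_gsax, ev_xga, ev_toi, sh_gsax); I’m also reading “inverted” per the stated intent, so that a heavier workload earns credit:

```python
import pandas as pd

def z(series: pd.Series) -> pd.Series:
    """Z-score a stat across all goalies in the sample."""
    return (series - series.mean()) / series.std()

def goalie_hbgar(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Even strength: GSAx plus a TOI-scaled workload term. The xGA
    # z-score is "inverted" relative to the usual lower-is-better
    # convention, so facing more expected goals against earns credit.
    out["ev_hbgar"] = z(out["ev_gsax"]) + z(out["ev_xga"]) * (out["ev_toi"] / 1000 * 0.25)
    # Shorthanded: GSAx alone, weighted 1.5x for rough similarity
    # to Evolving-Hockey's scale.
    out["sh_hbgar"] = z(out["sh_gsax"]) * 1.5
    out["goalie_hbgar"] = out["ev_hbgar"] + out["sh_hbgar"]
    return out
```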

But seriously, Connor Hellebuyck was so much better than every other NHL goalie this year. It’s not even fair.

So, now that I’ve outlined the parameters of my model for both skaters and goalies, I think you’ll notice that my interpretation of this statistic is both similar to and different from Evolving-Hockey’s. Both models try to answer the same question - how much value does a player provide to his team? - but they go about it in different ways. Here’s an in-depth look at how the two models compare and contrast, the benefits and drawbacks of each, and the philosophical objectives that went into hbGAR.

hbGAR vs Evolving-Hockey’s GAR and xGAR: A Detailed Analysis

Let’s start by showing how hbGAR and GAR correlate (for skaters only):

As you can see, the relationship between the two is pretty weak. In Part 1, I went over how my model is essentially a hybrid between GAR and xGAR: it uses both the shot-based metrics, like Corsi and expected goals, that go into E-H’s xGAR model, and the individual-level statistics that go into E-H’s GAR model. The contrast between the two models is a little more complicated than that, which I’ll get to later, but that’s the primary reason the models don’t correlate. E-H’s model was very kind to players like Ryan Pulock, Oscar Klefbom, and Rasmus Ristolainen, whereas mine was not. On the other hand, players like Sean Walker, Lars Eller, and Mike Hoffman graded out far better in my model than in E-H’s. Not all of the above players are great analytically, but these outliers have one thing in common: the correlation between their xGAR and hbGAR ratings was much stronger. Across the entire league, though, I regret to inform you that those two models don’t have a very strong relationship either:

The two models tend to agree on who the league’s top players are, but some players still grade out very differently between them. E-H’s xGAR model is a lot higher on players like Patrick Kane, Mats Zuccarello, and Kyle Connor than my model is, whereas my model gives more credit to outliers such as Jan Rutta, Adrian Kempe, and Morgan Rielly.

I decided to investigate this further: I took each NHL skater’s average percentile between their GAR and xGAR ratings in 2019-20 and compared that to my model. As I said, though, hbGAR is much different from a simple GAR/xGAR hybrid.
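For reference, here’s a minimal sketch of that comparison, assuming a DataFrame with hypothetical gar, xgar, and hbgar columns, one row per skater:

```python
import pandas as pd

def percentile(series: pd.Series) -> pd.Series:
    """Each skater's percentile rank (0-100) within the league."""
    return series.rank(pct=True) * 100

def eh_blend_vs_hbgar(df: pd.DataFrame) -> float:
    # Average each skater's GAR and xGAR percentiles...
    eh_avg = (percentile(df["gar"]) + percentile(df["xgar"])) / 2
    # ...then correlate that blended E-H percentile with hbGAR's.
    return eh_avg.corr(percentile(df["hbgar"]))
```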

This is easily the strongest correlation we’ve seen so far, but it’s still a lot weaker than I expected when I put these scatter plots together. Altogether, E-H’s models like Tyler Bertuzzi, Mats Zuccarello, Patrick Kane, Brent Burns, and Oscar Klefbom a lot more than mine does, while mine gives far more credit to Kevin Labanc, Adrian Kempe, Jan Rutta, Sean Walker, and Lars Eller than theirs do.

So, why the abundance of outliers? One of the biggest reasons is the modelling process itself. Let me be very clear: I have next to no knowledge about coding, and while my model isn’t the simplest thing out there, it’s beginner-level work compared to the other GAR/WAR models, like Evolving-Hockey’s and its predecessors, some of which were created by figures like Emmanuel Perry and Dawson Sprigings, two very respected members of the hockey analytics community. All of the models before mine use linear weights; meanwhile, I went through trial and error with a bunch of different weights until I found the set that produced a GAR scale most similar to Evolving-Hockey’s. I’m not here to discuss the complexity of my model because, relative to the models it’s based on, it’s virtually non-existent.

Evolving-Hockey explains the process behind building their GAR model in an extremely detailed three-part article on Hockey Graphs. It’s an intricate project, by no means a simple one. Their model includes things like team adjustments and conversion to replacement level itself. Mine doesn’t, because the metrics in my model already do so in a sense. In fact, most of them adjust for much more than how a player compares to the rest of his team.

Evolving-Hockey’s model is built considerably on relative-to-teammate metrics, whereas mine is built almost entirely on total (not rate, or /60) RAPM values. This is the primary reason why their model likes players like Patrick Kane and Brent Burns more than mine does, especially defensively: hbGAR punishes players like those two a lot more for being bad defensively, because I take only shot-based metrics into account for my EVD (even-strength defense) section, whereas theirs also includes stats like takeaways, giveaways, and hits. There are also a lot more relative-to-teammate metrics, such as GF, xGF, and SF (shots for), in their EVO model.
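To make the rate-versus-total distinction concrete, here’s a quick sketch with purely hypothetical numbers: a /60 rate only becomes total value once it’s multiplied by ice time, so a sheltered player and a workhorse with identical rates accumulate very different totals.

```python
def rapm_total(rapm_per_60: float, toi_minutes: float) -> float:
    """Convert a per-60 RAPM rate into a season total."""
    return rapm_per_60 * toi_minutes / 60

# Two hypothetical players with the same per-60 impact:
sheltered = rapm_total(0.5, 600)   # 600 minutes  -> 5.0 goals of impact
workhorse = rapm_total(0.5, 1500)  # 1500 minutes -> 12.5 goals of impact
```

Building on totals rewards players who sustain their impact over big minutes, which is part of why hbGAR diverges from relative-to-teammate approaches.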

To put this into context even further, let’s use Ryan Pulock as an example. He was near the top of the league in GAR, and his xGAR total wasn’t too shabby either, but my model was not nearly as kind to him. Pulock posted a 5.4 EVD GAR rating, far better than his 2.5 for xEVD. E-H’s GAR is based more on goals for and against than their xGAR model is, and the Islanders didn’t allow many goals this season because they got great goaltending - but should Pulock be given defensive credit for the work his team’s goaltenders did? I know that’s what xGAR can be used to correct for, but this is where the philosophical differences between the two models come into play. I don’t think Pulock deserves the defensive credit, from a player-value standpoint, that E-H’s GAR gives him. He actually grades out as below replacement level defensively according to hbGAR: he’s great at limiting quality chances against, but not so great at keeping large quantities of shot attempts away from his team’s net. Additionally, Pulock posted a 10.4 EVO GAR total while managing only a 1.2 in xEVO. That’s because he saw great offensive results in 2019-20 but doesn’t generate a large volume of shots.

The difference in the metrics the two models use is the primary culprit for the contrast in GAR and hbGAR values we see for certain players, but there are more layers of methodology to discuss. hbGAR has an A3Z section, which E-H’s models do not: it takes into account offensive zone entries and defensive zone exits, possession plays only. I felt I should give a certain amount of credit to players who grade out well in the statistical categories based on what is easiest to see when watching the game (yes, I actually watch hockey games. I didn’t want to have to admit it).
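The exact A3Z weighting isn’t restated here, but since the other hbGAR components are z-score based, a hedged sketch of the idea (hypothetical column names, assumed equal weighting of entries and exits) might look like this:

```python
import pandas as pd

def z(series: pd.Series) -> pd.Series:
    return (series - series.mean()) / series.std()

def a3z_sketch(df: pd.DataFrame) -> pd.Series:
    # Credit controlled (possession) zone entries and exits only;
    # dump-ins and clears off the glass are deliberately excluded.
    return z(df["possession_entries"]) + z(df["possession_exits"])
```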

As well, I was somewhat surprised at how much hbGAR’s penalty-taking and penalty-drawing values differed from E-H’s. Their versions of Take and Draw GAR eliminate penalties that resulted in offsetting minors and, of course, include a team adjustment that mine do not. I can see why this could be considered a shortcoming of my model, but I also see the benefit of counting only the raw number of penalties a player took or drew. The leader in hbGAR’s Take category was Auston Matthews, at 1.0; the worst was Evander Kane, at -2.7. That’s quite a gap between best and worst, and you’ll notice the scale flips for Draw: the highest Draw hbGAR totals are much farther from 0 than the lowest Draw values are. Additionally, many players have very low Take ratings alongside very high Draw ratings, which perfectly illustrates the number one problem with NHL officiating today: making a call against a player or team before making a call for them. That unnecessary balancing effect is on full display in the hbGAR model.
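As a hedged illustration of that scale asymmetry (hypothetical column names; the actual Take and Draw weights aren’t restated here), simply z-scoring raw penalty counts makes the skew visible:

```python
import pandas as pd

def z(series: pd.Series) -> pd.Series:
    return (series - series.mean()) / series.std()

def take_draw_ranges(df: pd.DataFrame) -> None:
    take = -z(df["penalties_taken"])  # taking penalties costs value
    draw = z(df["penalties_drawn"])   # drawing penalties adds value
    # If officials "even up" calls, the two distributions skew in
    # opposite directions: Take's worst values sit far below zero,
    # while Draw's best values sit far above it.
    print(f"Take range: {take.min():+.1f} to {take.max():+.1f}")
    print(f"Draw range: {draw.min():+.1f} to {draw.max():+.1f}")
```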

At this point, the most accurate way to describe the contrast between hbGAR and E-H’s models is that mine is a more isolated version of theirs. It focuses more on macro-level adjusted statistics, which also explains why far more players have negative hbGAR ratings than show up as below replacement level in E-H’s models - and, by extension, why the two models effectively define “replacement level” differently.

This also shows why I see no need to create an xGAR model of my own. Since my model is more RAPM-fuelled and already takes into account both the “results” and the “expected” sides of things, creating an xGAR model would be counterproductive and a waste of time.

So, which model do you trust more? I’m not here to tell you that my model is better than Evolving-Hockey’s, because the two go about quantifying a player’s value in different ways, and comparing them would be somewhat unfair because of that. It would be like, I don’t know, comparing Nichushkin and Draisaitl (please do not take that literally). I don’t convert everything to replacement level or include team adjustments because the statistics I use already do so. This will be unsurprising, but I choose to use both models when it comes to player evaluation.

Alas, no two models are the same, and no model is perfect. To conclude, I’ll go over what I believe is the biggest shortcoming of my model, and how I could potentially fix it in the near future.

Improvements

This is Jeff Petry. According to hbGAR, he’s one of the best defensemen in the league. According to me, having seen how many other statistics interpret him, he’s not. Petry is one of those players whose offensive ability can be overstated by the hockey analytics community because of his high xGF and CF ratings, despite the fact that he has zero ability to get results on his chances. This has been the case his whole career. Petry is 32 years old, and I’d wager it’s safe to say he hasn’t been screwed over by bad luck for all 10 years of his career.

His EVO chart shows that he generates a lot of shots but doesn’t see results, yet he still has an 8.6 EVO total. The same criticisms are often thrown at players such as Shea Theodore, but it’s not quite the same thing, since Theodore grades out well in multiple shooting talent models and has only been in the league for three seasons. It’s fair to assume that he’s on the verge of a big breakout season offensively in 2020-21; some could argue his breakout season has already happened, as he was on pace for over 50 points before the shutdown.

Still, why does my model give so much offensive credit to players like this, even when they don’t get the results to show for it? The answer is that I weighted all components of hbEVO equally. Likewise, hbEVD weighs shot quality against and shot quantity against evenly, despite the fact that allowing few quality shots against is far more indicative of defensive ability and future defensive success. The same can be said for the powerplay and penalty-kill sections.
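As a sketch of the kind of fix this implies (the 70/30 split and column names here are purely illustrative, not the model’s actual values), tilting the even-strength defense weights toward shot quality would look something like this:

```python
import pandas as pd

def z(series: pd.Series) -> pd.Series:
    return (series - series.mean()) / series.std()

# Current approach: quality against (xGA) and quantity against (CA)
# weighted equally, both inverted so that less against = more value.
def evd_equal(df: pd.DataFrame) -> pd.Series:
    return -z(df["xga_rapm"]) - z(df["ca_rapm"])

# Possible fix: weight quality more heavily, since limiting quality
# chances against is more indicative of real defensive ability.
def evd_quality_tilted(df: pd.DataFrame) -> pd.Series:
    return -(0.7 * z(df["xga_rapm"])) - (0.3 * z(df["ca_rapm"]))
```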

This weighting issue is the biggest thing about my model that I hope to change in the near future. Evolving-Hockey’s models recognize outliers such as Petry and deduct credit accordingly. I feel like tweaking this would not only better show a player’s true value through the eyes of my model, but also make it more indicative of future success offensively and defensively. As well, converting hbGAR to hbWAR (Wins Above Replacement) and hbSPAR (Standings Points Above Replacement) is something I feel would add a bit more context to this statistic. I’ll get to that soon, but model-tweaking is mentally demanding, and each time I’ve edited the ins and outs of hbGAR, I’ve come out of it thinking my brain is about to explode. I’ll give it a week or two and then make those changes, because there are always improvements to be made with things like this.

So there you have it! Hopefully this piece provides a bit more clarity about what hbGAR is all about and the unique ways it goes about quantifying player value. You can interact with hbGAR visuals, tables, and dashboards on my Tableau (link above and in my Twitter bio). I would like to acknowledge Evolving-Hockey for the extensive work they’ve done with their GAR and xGAR models, and for providing the data and the inspiration that went into my own take on the ever-controversial topic of Goals Above Replacement.