IFT6758 Project A blog about an NHL data analysis

Warm-up

Warm-up

Question 1

What issues do you notice by using this metric (‘SV%’) to rank goalies? What could be done to deal with this?

As we can see on the dataframe below (which is the top 10 goalies ranked by ‘SV%’ for season 2018-2019), some of the goalies have only played a few games. Since they faced way less shots against them than most of the other goalies, they can reach a very high SV% in case they stop these. Let’s take a look at Jack Campbell as an example. He is one of the top goalies according to this metric (‘SV%’ = 1.0) with only 5 shots against and 5 shots saved. While for the goalies which played most games of the season, it is way more difficult to reach a ratio close to 1 since they face way more shots. Thus, it’s almost impossible for them to reach such a ratio.

Player Tm GP SA SV SV%
63 Alex Nedeljkovic CAR 1 17 17 1.000
85 Dustin Tokarski ANA 1 5 5 1.000
11 Jack Campbell LAK 1 5 5 1.000
29 Kristers Gudlevskis TBL 1 3 3 1.000
26 Jon Gillies CGY 1 28 27 .964
51 Charlie Lindgren MTL 2 59 56 .949
81 Alex Stalock MIN 2 54 51 .944
17 Aaron Dell SJS 20 533 496 .931
9 Sergei Bobrovsky CBJ 63 1854 1727 .931
33 Magnus Hellberg NYR 2 28 26 .929

In the other case, if the goalies who played only a few games didn’t play well, their score is pretty low and they could not increase it, they had less opportunity than a goalie who played many games. The dataframe below is 10 goalies with the lowest SV% score.

Player Tm GP SA SV SV%
50 Michael Leighton CAR 4 92 80 .870
55 Spencer Martin COL 3 96 83 .865
58 Zane McIntyre BOS 8 155 133 .858
24 Anton Forsberg CBJ 1 27 23 .852
57 Marek Mazanec NSH 4 87 73 .839
32 Andrew Hammond OTT 6 86 72 .837
13 Pheonix Copley STL 1 29 24 .828
83 Malcolm Subban BOS 1 16 13 .813
19 Chris Driedger OTT 1 15 11 .733
1 Jorge Alves CAR 1 0 0 NaN

So this scenario might be in favor or against the goalies while ranking them using SV%.

To face this issue, we could filter out the goalies which have played less than X games or which have faced less than Z shots. The NHL website has set this threshold X to 20 games. It make sense to us to use this value.

Question 2

Filter out the goalies using your proposed approach above, and produce a bar plot with player names on the y-axis and save percentage (‘SV%’) on the x-axis. You can keep the top 20 goalies. sv20

Question 3

Save percentage is obviously not a very comprehensive feature. Discuss what other features could potentially be useful in determining a goalie’s performance.

Save percentage is a bit simplistic, we would need to take into account the game winning streak of the goalie, the number of shots faced per game, the average distance of the shots and the save percentage in 5v5 situations, since a goalie which spends a lot of game time in penalty kill is bound to let a couple of goals in more often.