IFT6758 Project A blog about an NHL data analysis

Feature Engineering 2

| Column name in df | Description |
| --- | --- |
| event | Either goal or shot |
| game_seconds | Total number of seconds elapsed in the game |
| game_period | The period number of the event |
| coord_x | x coordinate of the event |
| coord_y | y coordinate of the event |
| shot_distance | The distance between the event and the net |
| shot_angle | The angle at which the shot was taken relative to the net |
| shot_type | The type of shot ('Wrist Shot', 'Snap Shot', 'Tip-In', 'Backhand', 'Slap Shot', 'Deflected', 'Wrap-around') |
| empty_net | Binary column indicating whether the goalie is present on the ice |
| last_event_type | The type of the event that happened just before the current one (it may include events other than shots and goals) |
| last_event_coord_x | x coordinate of the last event |
| last_event_coord_y | y coordinate of the last event |
| time_from_last_event | The elapsed time between the last event and the current one (seconds) |
| distance_from_last_event | The distance between the last event and the current one |
| rebound | True if the last event was also a shot, otherwise False |
| change_shot_angle | The difference between the current shot angle and that of the last event |
| speed | The distance from the previous event, divided by the time since the previous event |
| shooter_rank | 1-10 scale evaluating the shooter's performance (1=best, 10=worst) |
| goalie_rank | 1-10 scale evaluating the goalie's performance (1=best, 10=worst) |
* Columns that are present in the dataframe but not listed here were used only to create the listed ones.
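To make the "last event" columns more concrete, here is a minimal sketch of how they could be derived with pandas. It is not the project's actual code: the function name `add_last_event_features`, the lowercase event labels (`"shot"`, `"goal"`, `"faceoff"`) and the assumption that the frame holds the events of a single game are all ours, chosen for illustration.

```python
import numpy as np
import pandas as pd


def add_last_event_features(events: pd.DataFrame) -> pd.DataFrame:
    """Add previous-event features to the events of ONE game (assumption).

    Expects the columns event, game_seconds, coord_x and coord_y.
    """
    df = events.sort_values("game_seconds").reset_index(drop=True)
    prev = df.shift(1)  # the event just before each row (NaN for the first row)

    df["last_event_type"] = prev["event"]
    df["last_event_coord_x"] = prev["coord_x"]
    df["last_event_coord_y"] = prev["coord_y"]
    df["time_from_last_event"] = df["game_seconds"] - prev["game_seconds"]
    df["distance_from_last_event"] = np.hypot(
        df["coord_x"] - prev["coord_x"], df["coord_y"] - prev["coord_y"]
    )
    # Leave speed undefined (NaN) when no time has elapsed, to avoid dividing by zero.
    elapsed = df["time_from_last_event"].where(df["time_from_last_event"] > 0)
    df["speed"] = df["distance_from_last_event"] / elapsed
    # Rebound: the previous event was also a shot (label "shot" is an assumption).
    df["rebound"] = prev["event"].eq("shot")
    return df


# Toy data, invented for the example.
events = pd.DataFrame(
    {
        "event": ["faceoff", "shot", "shot", "goal"],
        "game_seconds": [0, 10, 12, 40],
        "coord_x": [0.0, 60.0, 63.0, 80.0],
        "coord_y": [0.0, 0.0, 4.0, 4.0],
    }
)
features = add_last_event_features(events)
# Keep only shots and goals once the previous-event columns are filled in.
shots_and_goals = features[features["event"].isin(["shot", "goal"])]
```

Computing the features on the full event list before filtering is what lets `last_event_type` contain events other than shots and goals.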

The two main custom features are the shooter and goalie rankings.

To compute the shooter ranking, we calculated each player's success rate on the training data: the number of goals divided by the sum of the number of goals and the number of shots. Players who took fewer than 15% of the average number of shots per player per season were excluded from the computation and given the average rank (5). Every other shooter was then given a rank from 1 to 10 based on their success rate.
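The sketch below illustrates one way to implement this ranking. Several details are assumptions rather than the project's method: ranks are assigned by deciles of the success rate, a player's volume is measured as their average number of attempts per season, and the column names `shooter`, `season` and `is_goal` as well as the function `rank_shooters` are hypothetical.

```python
import pandas as pd


def rank_shooters(
    shots: pd.DataFrame, min_share: float = 0.15, default_rank: int = 5
) -> pd.Series:
    """Rank shooters from 1 (best) to 10 (worst) by success rate.

    `shots` has one row per attempt (shot or goal) with the columns
    shooter, season and is_goal (0 or 1).
    """
    stats = shots.groupby("shooter")["is_goal"].agg(goals="sum", attempts="count")
    # attempts = goals + non-goal shots, so this is goals / (goals + shots).
    stats["success_rate"] = stats["goals"] / stats["attempts"]

    # Volume threshold: 15% of the average attempts per player and per season.
    per_season = shots.groupby(["shooter", "season"]).size()
    threshold = min_share * per_season.mean()
    # Assumption: compare each player's own per-season average to the threshold.
    player_avg = per_season.groupby(level="shooter").mean()
    eligible = player_avg[player_avg >= threshold].index

    ranks = pd.Series(default_rank, index=stats.index, name="shooter_rank")
    rates = stats.loc[eligible, "success_rate"]
    # Assumption: decile-based ranks. Ranking first (ties broken by order)
    # keeps the bin edges unique; the highest rate lands in decile 1.
    order = rates.rank(method="first", ascending=False)
    ranks.loc[eligible] = (pd.qcut(order, 10, labels=False) + 1).astype(int)
    return ranks


# Toy data: ten regular shooters with 20 attempts each, plus one low-volume player.
rows = [
    {"shooter": f"p{i}", "season": 2018, "is_goal": int(k <= i)}
    for i in range(10)
    for k in range(20)
]
rows.append({"shooter": "rookie", "season": 2018, "is_goal": 1})
ranks = rank_shooters(pd.DataFrame(rows))
```

In this toy data `rookie` scored on their only attempt, but falls below the volume threshold and therefore receives the default rank of 5 instead of being ranked first.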

Likewise, the goalie ranking was computed from each goalie's save percentage. As with the first feature, goalies who faced fewer than 150 shots (that is, those who played fewer than 5 games on average) were excluded from the computation and received the average rank (5). The remaining goalies were each given their respective rank.

To infer these values on the test set, players who do not appear in the training data are given the median rank of 5, while known players keep the rank they had in the training set.
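As an illustration of this lookup, here is a minimal sketch using pandas. The helper `apply_train_ranks`, the player identifiers and the column names are hypothetical; only the rule itself (known players keep their training rank, new players get 5) comes from the description above.

```python
import pandas as pd


def apply_train_ranks(
    players: pd.Series, train_ranks: pd.Series, default_rank: int = 5
) -> pd.Series:
    """Look up each player's training-set rank; unseen players get the default."""
    return players.map(train_ranks).fillna(default_rank).astype(int)


# Ranks learned on the training set (toy values).
train_ranks = pd.Series({"player_a": 2, "player_b": 9}, name="shooter_rank")

# "player_c" never appears in the training data.
test = pd.DataFrame({"shooter": ["player_a", "player_c", "player_b"]})
test["shooter_rank"] = apply_train_ranks(test["shooter"], train_ranks)
```

Because the ranks are computed on the training data only, no information from the test set leaks into the feature.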