20 Nov 2021
Feature Engineering 2
Column name in df | Description |
---|---|
event | Either goal/shot |
game_seconds | Total number of seconds elapsed in the game |
game_period | The period number of the event |
coord_x | Coordinate x of the event |
coord_y | Coordinate y of the event |
shot_distance | The distance between the event and the net |
shot_angle | The angle at which the shot was taken relative to the net |
shot_type | The type of shot ('Wrist Shot', 'Snap Shot', 'Tip-In', 'Backhand', 'Slap Shot', 'Deflected', 'Wrap-around') |
empty_net | Binary column indicating whether the net is empty (no goalie on the ice) |
last_event_type | The type of the event that happened just before the current one (it may be an event other than a shot or goal) |
last_event_coord_x | Coordinate x of the last event |
last_event_coord_y | Coordinate y of the last event |
time_from_last_event | The elapsed time between the last event and the current one (seconds) |
distance_from_last_event | The distance between the last event and the current one |
rebound | True if the last event was also a shot, otherwise False |
change_shot_angle | The difference between the current shot angle and that of the last event |
speed | The distance from the previous event, divided by the time since the previous event |
shooter_rank | 1-10 scale evaluating the shooter's performance (1=best, 10=worst) |
goalie_rank | 1-10 scale evaluating the goalie's performance (1=best, 10=worst) |
* | Columns present in the dataframe but not listed here were only used to build the listed ones. |
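
To make the geometric columns of the table concrete, here is a minimal sketch of how they could be derived from raw coordinates. It is not our exact pipeline: the net position `(89, 0)`, the event label `"shot"`, the helper names, and the handling of same-second events are all assumptions made for illustration.

```python
import math

# Assumption: the attacked net sits at (89, 0); the post does not state the net position.
NET_X, NET_Y = 89.0, 0.0


def shot_distance(x, y):
    """Euclidean distance between the event and the assumed net position."""
    return math.hypot(NET_X - x, NET_Y - y)


def shot_angle(x, y):
    """Angle in degrees relative to the line facing the net (0 = straight on)."""
    return math.degrees(math.atan2(y - NET_Y, NET_X - x))


def previous_event_features(curr, prev):
    """Features that compare the current event with the one just before it."""
    dt = curr["game_seconds"] - prev["game_seconds"]
    dist = math.hypot(curr["coord_x"] - prev["coord_x"],
                      curr["coord_y"] - prev["coord_y"])
    # Assumption: the previous event is labelled "shot" when it was a shot.
    rebound = prev["event"] == "shot"
    angle_now = shot_angle(curr["coord_x"], curr["coord_y"])
    angle_prev = shot_angle(prev["coord_x"], prev["coord_y"])
    return {
        "time_from_last_event": dt,
        "distance_from_last_event": dist,
        "rebound": rebound,
        # Assumption: the angle change is only computed for rebounds, 0 otherwise.
        "change_shot_angle": abs(angle_now - angle_prev) if rebound else 0.0,
        # Assumption: events in the same second get speed 0 to avoid dividing by zero.
        "speed": dist / dt if dt > 0 else 0.0,
    }
```

For example, a shot taken at `(86, 8)` two seconds after a shot at `(80, 0)` is 10 units away from the previous event, so its `speed` is 5 units per second.
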
The two main additional custom features are the shooter and goalie rankings.
To compute the shooter ranking, we calculated each player's success rate on the training data: the number of goals
divided by the sum of the number of goals and the number of shots.
Players who took fewer than 15% of the average number of shots per player per season were excluded from the ranking
and given the average rank (5). Every other shooter was then assigned a rank from 1 to 10
based on their success rate.
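
The shooter ranking described above can be sketched as follows. This is an illustration under stated assumptions rather than our exact code: the function name `shooter_ranks`, the `(player, season, event)` input format, the per-season interpretation of the 15% threshold, and the use of ten equal-sized bins are all assumptions.

```python
from collections import Counter


def shooter_ranks(events, min_share=0.15, default_rank=5, n_bins=10):
    """Rank shooters from 1 (best) to n_bins (worst) by goals / (goals + shots).

    events: iterable of (player, season, event) tuples, event being "shot" or "goal".
    """
    attempts, goals, seasons = Counter(), Counter(), {}
    for player, season, event in events:
        attempts[player] += 1                  # goals + non-goal shots
        goals[player] += event == "goal"
        seasons.setdefault(player, set()).add(season)

    # 15% of the average number of attempts per (player, season) pair.
    n_player_seasons = sum(len(s) for s in seasons.values())
    threshold = min_share * sum(attempts.values()) / n_player_seasons

    # Assumption: a player is kept if their own per-season average reaches the threshold.
    eligible = {p: goals[p] / attempts[p] for p in attempts
                if attempts[p] / len(seasons[p]) >= threshold}

    # Players below the threshold keep the average rank.
    ranks = {p: default_rank for p in attempts}
    ordered = sorted(eligible, key=eligible.get, reverse=True)
    for pos, p in enumerate(ordered):
        # Assumption: equal-sized bins, best success rate first.
        ranks[p] = 1 + pos * n_bins // len(ordered)
    return ranks
```

With only two eligible shooters the bins are coarse (ranks 1 and 6); on a full season of data each rank would hold roughly a tenth of the eligible players.
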
Likewise, the goalie ranking was computed from each goalie's save percentage.
As with the shooter ranking, we excluded from the computation the goalies who faced fewer than 150 shots,
i.e. those who played fewer than about 5 games on average. They received the average rank (5).
The remaining goalies were ranked on the same 1 to 10 scale.
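
A comparable sketch for the goalies, again under assumptions: the function name `goalie_ranks`, the dictionary inputs, and the ten equal-sized bins are hypothetical, and the save percentage is taken as the share of shots faced that did not result in a goal.

```python
def goalie_ranks(shots_faced, goals_against, min_shots=150, default_rank=5, n_bins=10):
    """Rank goalies from 1 (best) to n_bins (worst) by save percentage.

    shots_faced / goals_against: dicts mapping a goalie to a count.
    """
    # Save percentage, computed only for goalies who faced at least min_shots shots.
    save_pct = {g: 1 - goals_against.get(g, 0) / n
                for g, n in shots_faced.items() if n >= min_shots}

    # Goalies below the threshold keep the average rank.
    ranks = {g: default_rank for g in shots_faced}
    ordered = sorted(save_pct, key=save_pct.get, reverse=True)
    for pos, g in enumerate(ordered):
        # Assumption: equal-sized bins, best save percentage first.
        ranks[g] = 1 + pos * n_bins // len(ordered)
    return ranks
```
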
To infer these values on the test set, players who do not appear in the training set are given the median rank of 5,
while known players keep the rank they had
in the training set.
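
Carrying the ranks over to the test set amounts to a lookup with a fallback. A minimal sketch, where `apply_ranks` and its arguments are hypothetical names:

```python
DEFAULT_RANK = 5  # median rank given to players unseen during training


def apply_ranks(players, train_ranks, default_rank=DEFAULT_RANK):
    """Return the training-set rank of each player, or the default for unknown players."""
    return [train_ranks.get(player, default_rank) for player in players]
```

Because the lookup only reads ranks computed on the training data, no information from the test set leaks into the feature.
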