IFT6758 Project A blog about an NHL data analysis

Tidy Data

Tidy Data

Question 1

Include a small snippet of your final dataframe

Here are the first 10 rows of our dataframe: tidydata

Note that the nan in the net_empty come from the fact that we do not have any information about the emptiness of the net when it’s a saved shot. We only have access to this data when it’s a goal. Therefore, we filled with nan the net_empty column for the saved shots. It’s the same scenario for the strength column.

Question 2

Discuss how you could add the actual strength information (i.e. 5 on 4, etc.) to both shots and goals, given the other event types (beyond just shots and goals) and features available.

By using the events “Penalty” and “penaltySeverity” we can know when a team has a player in the penalty box. We could change the “strength field” whenever some of these events happen and revert it back again if the full-time of the penalty has passed before there was any goal or right after a goal is scored in power-play situation. We will need to take into account the particularity of multiple penalties happenning on bothe sides. From our understanding, there can’t be less than 3 skaters on the ice even if the team in question has add more than 2 penalties.

Question 3

Discuss some additional features you could consider creating from the data available in this dataset.

Here are our ideas:

  • Define events as counter-attack or not. We could try to determine it by checking if the time interval between a takeaway and a shot.
  • We could create the feature rebound when the time interval between 2 shots of the same time is smaller than a threshold.
  • We would also like to define the number of players in the offense and defense so we would have a useful extra information for each shot such as: 2 defense players vs 2 attack players. Unfortunately, we don not have enough information to achieve this with the current API.