Tidy Data
14 Oct 2021Tidy Data
Question 1
Include a small snippet of your final dataframe
Here are the first 10 rows of our dataframe:
Note that the nan in the net_empty come from the fact that we do not have any information about the emptiness of the net when it’s a saved shot. We only have access to this data when it’s a goal. Therefore, we filled with nan the net_empty column for the saved shots. It’s the same scenario for the strength column.
Question 2
Discuss how you could add the actual strength information (i.e. 5 on 4, etc.) to both shots and goals, given the other event types (beyond just shots and goals) and features available.
By using the events “Penalty” and “penaltySeverity” we can know when a team has a player in the penalty box. We could change the “strength field” whenever some of these events happen and revert it back again if the full-time of the penalty has passed before there was any goal or right after a goal is scored in power-play situation. We will need to take into account the particularity of multiple penalties happenning on bothe sides. From our understanding, there can’t be less than 3 skaters on the ice even if the team in question has add more than 2 penalties.
Question 3
Discuss some additional features you could consider creating from the data available in this dataset.
Here are our ideas:
- Define events as counter-attack or not. We could try to determine it by checking if the time interval between a takeaway and a shot.
- We could create the feature rebound when the time interval between 2 shots of the same time is smaller than a threshold.
- We would also like to define the number of players in the offense and defense so we would have a useful extra information for each shot such as: 2 defense players vs 2 attack players. Unfortunately, we don not have enough information to achieve this with the current API.