wsba_hockey package
Subpackages
- wsba_hockey.tools package
- Submodules
- wsba_hockey.tools.agg module
- wsba_hockey.tools.globals module
- wsba_hockey.tools.http module
- wsba_hockey.tools.plotting module
- wsba_hockey.tools.scraping module
adjust_coords(), analyze_shifts(), apply_passing_imputation(), assign_target(), clean_html_pbp(), combine_data(), combine_pbp(), combine_shifts(), edge_stat_entry(), espn_game_id(), fix_players(), get_game_coaches(), get_game_info(), get_game_roster(), logical_sort(), parse_espn(), parse_game_roster(), parse_html(), parse_json(), parse_shift_events(), parse_shifts_html(), parse_shifts_json(), strip_html_pbp()
- wsba_hockey.tools.xg_model module
- Module contents
Submodules
wsba_hockey.wsba_main module
- wsba_hockey.wsba_main.nhl_agg_stats(games_df: DataFrame, group_by: list[Literal['player_id', 'season', 'team_abbr', 'position', 'season_type', 'strength_state']] = ['player_id', 'season', 'team_abbr', 'position', 'season_type', 'strength_state'], params: dict | None = None, sort: dict = {}, metrics: list[tuple] = [], rates: bool = True, comparison: bool = True, exclude: list = [], manual_agg: dict[str] = {}, schedule_path: str = '/home/runner/work/wsba_hockey/wsba_hockey/src/wsba_hockey/tools/schedule/schedule.csv', roster_path: str | None = None) DataFrame[source]
Given statistical data, columns, and rosters, return aggregated statistics at the skater, goalie, or team level.
- Parameters:
games_df (pd.DataFrame) – A DataFrame already containing game-by-game statistical data (generated with nhl_calculate_stats).
group_by (list[str], optional) – List of columns to group by. Columns outside the listed options may be provided, but this is currently unstable.
params (dict or None, optional) –
Filters to apply to games_df. Default is None. Set each key to the desired column name in the dataframe and each value to a tuple holding the expression to filter by. An optional final tuple element, ‘before’ or ‘after’, indicates whether the filter is applied before or after aggregating. By default, it is applied before.
Ex. ‘TOI’: (‘>=’, 150, ‘before’) or ‘Date’: (‘between’, ‘2025-12-01’, ‘2026-01-01’)
sort (dict[str], optional) – Dict of values formatted with the sort column as the key and a bool determining whether to sort ascending or not as the value. Default is empty leading to default sort.
metrics (list[tuple], optional) –
List of additional metrics to calculate. Use one of ‘+’, ‘-’, ‘*’, ‘/’ to perform an operation on any existing column (using pd.eval). The first tuple element should be the name of the metric value, the second the numerator, and the third should be the denominator (if there is none then pass None).
Ex. [(‘time_on_ice_per_games_played’, ‘time_on_ice’, ‘games_played’), (‘goals_saved_above_expected’, ‘expected_goals_against-goals_against’, None)]
rates (bool, optional) – If True, calculates rate (per-sixty minutes of ice time) stats. Default is True.
comparison (bool, optional) – If True, calculates percentiles for all (applicable) numeric values in the dataframe. Default is True.
exclude (list[str], optional) – List of columns to exclude from summation. Default is an empty list (summing all columns that are not grouped by).
manual_agg (dict[str], optional) – Dict with manual aggregation clause. Default is empty dict.
schedule_path (str, optional) – Path to the schedule data needed to add schedule information to games_df.
roster_path (str or None, optional) – File path to the roster data used for mapping players and teams.
- Returns:
A DataFrame containing the aggregated statistics according to the selected parameters.
- Return type:
pd.DataFrame
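The argument formats described above can be sketched in plain Python. The helper `split_filters` is hypothetical and not part of wsba_hockey; it only illustrates one reading of the `params` convention, assuming that a trailing ‘before’ or ‘after’ string is the timing flag and that filters without one run before aggregating.

```python
# Hypothetical helper, not part of wsba_hockey. It illustrates the documented
# shape of the `params`, `sort`, and `metrics` arguments of nhl_agg_stats.
TIMINGS = ("before", "after")


def split_filters(params):
    """Split filter specs into those applied before and after aggregation."""
    before, after = {}, {}
    for column, spec in (params or {}).items():
        # Assumption: a trailing 'before'/'after' string is the timing flag.
        # When it is absent, the filter is treated as a 'before' filter.
        if spec[-1] in TIMINGS:
            timing, expression = spec[-1], spec[:-1]
        else:
            timing, expression = "before", spec
        target = before if timing == "before" else after
        target[column] = expression
    return before, after


# Keys are column names; values are filter tuples.
params = {
    "TOI": (">=", 150, "after"),
    "Date": ("between", "2025-12-01", "2026-01-01"),
}

# Keys are sort columns; values say whether to sort ascending.
sort = {"TOI": False}

# (metric name, numerator expression, denominator or None)
metrics = [
    ("time_on_ice_per_games_played", "time_on_ice", "games_played"),
    ("goals_saved_above_expected", "expected_goals_against-goals_against", None),
]

before, after = split_filters(params)
print("before aggregation:", before)
print("after aggregation:", after)
```

The `params`, `sort`, and `metrics` objects built here have the shapes the parameter descriptions call for, so they could be passed to nhl_agg_stats alongside a games_df produced by nhl_calculate_stats.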
- wsba_hockey.wsba_main.nhl_calculate_stats(pbp: DataFrame, group: Literal['skater', 'goalie', 'team', 'game_score'], game_strength: Literal['all'] | str | list[str] = 'all', season_types: int | list[int] = [2, 3], schedule_path: str = '/home/runner/work/wsba_hockey/wsba_hockey/src/wsba_hockey/tools/schedule/schedule.csv', roster_path: str = '/home/runner/work/wsba_hockey/wsba_hockey/src/wsba_hockey/tools/rosters/nhl_rosters.csv') DataFrame[source]
Given play-by-play data, seasonal information, game strength, rosters, and an xG model, return raw-total statistics at the game level for skaters, goalies, or teams.
- Parameters:
pbp (pd.DataFrame) – A DataFrame containing play-by-play event data.
group (Literal['skater', 'goalie', 'team', 'game_score']) – Type of statistics to calculate. Must be one of ‘skater’, ‘goalie’, ‘team’, or ‘game_score’ (specific combination of skaters and goaltenders by game).
season (int) – The NHL season formatted such as “20242025”.
game_strength (str or list[str], optional) – Strength state or list of strength states to include (e.g., [‘5v5’, ‘5v4’, ‘4v5’]). Default is ‘all’.
season_types (int or List[int], optional) – List of season_types to include in scraping process. Default is all regular season and playoff games which are the integers 2 and 3 respectively.
split_game (bool, optional) – If True, aggregates stats separately for each game; otherwise, stats are aggregated across all games. Value is ignored when group == ‘game_score’. Default is False.
roster_path (str, optional) – File path to the roster data used for mapping players and teams.
- Returns:
A DataFrame containing the aggregated statistics according to the selected parameters.
- Return type:
pd.DataFrame
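Because game_strength and season_types each accept either a single value or a list, callers may find it useful to normalize them first. The sketch below is an assumption about how such normalization could look; both function names are hypothetical and do not come from wsba_hockey.

```python
def normalize_strengths(game_strength="all"):
    """Return None for 'all', otherwise a list of strength-state strings."""
    if game_strength == "all":
        return None  # None here means "do not filter by strength state"
    if isinstance(game_strength, str):
        return [game_strength]
    return list(game_strength)


def normalize_season_types(season_types=(2, 3)):
    """Return season types as a list (2 = regular season, 3 = playoffs)."""
    if isinstance(season_types, int):
        return [season_types]
    return list(season_types)


print(normalize_strengths("5v5"))
print(normalize_strengths(["5v5", "5v4", "4v5"]))
print(normalize_season_types(2))
```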
- wsba_hockey.wsba_main.nhl_plot_events(pbp: DataFrame, group: Literal['skater', 'goalie', 'team', 'coach', 'game'], entities: int | str | list[int] | list[str], events: Literal['all'] | str | list[str] = ['missed-shot', 'shot-on-goal', 'goal'], season: int | list[int] | None = None, strengths: Literal['all'] | str | list[str] = 'all', season_types: int | list[int] = 2, strengths_title: str | None = None, marker_dict: dict = {'blocked-shot': 'v', 'faceoff': 'X', 'giveaway': '1', 'goal': '*', 'hit': 'P', 'missed-shot': 'o', 'shot-on-goal': 'D', 'takeaway': '2'}, team_colors: dict = {'away': 'primary', 'home': 'primary'}, titles: str | list[str] | None = None, legend: bool = False, rotation: int | None = 0, display_range: str = 'full')[source]
Given play-by-play data, plot arbitrary event locations for a group of entities.
- Parameters:
pbp (pd.DataFrame) – A DataFrame containing play-by-play event data.
group (Literal['skater','goalie','team','coach','game']) – Entity type to plot (skater, goalie, team, coach, or game).
entities (int|str|list[int]|list[str]) – List of entities for the specified group:
  - skater/goalie: NHL API player_id(s)
  - team: team_abbr(s) (e.g. ‘BOS’)
  - coach: coach name(s) as stored in pbp
  - game: game_id(s)
events (str or list[str] or 'all', optional) – Event types to plot. Defaults to wsba.FENWICK_EVENTS. Use ‘all’ to plot all wsba.EVENTS.
season (int|list[int]|None) – If provided, filters season(s). If an int is provided with multiple entities, that season is used for all. If a list is provided, it must align one-to-one with entities. If None, seasons are inferred from pbp.
strengths (str or list[str] or 'all', optional) – Strength states to include. Default is ‘all’.
season_types (int or list[int], optional) – Season type(s) to include. Default is 2 (regular season).
strengths_title (str or None, optional) – Optional label for the selected strengths (used on non-game plots).
marker_dict (dict, optional) – Mapping from event_type to matplotlib marker.
team_colors (dict, optional) – For game plots, selects ‘primary’ or ‘secondary’ for away/home team colors.
titles (str or list[str] or None, optional) – Optional title(s) aligned with entities.
legend (bool, optional) – If True, show a legend.
display_range (str, optional) – Rink display range. Passed to wsba_rink() / hockey_rink.NHLRink.draw() (e.g. ‘full’, ‘offense’, ‘defense’). Default is ‘full’.
rotation (int or None, optional) – Rink rotation (degrees). Default is 0.
- Returns:
A dictionary of matplotlib figures: {entity: fig}.
- Return type:
dict
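The alignment rule for season and entities can be illustrated with a short sketch. `align_seasons` is a hypothetical helper, not a wsba_hockey function, and the player ID used below is only a placeholder value.

```python
def align_seasons(entities, season=None):
    """Pair each entity with a season following the documented rules."""
    if not isinstance(entities, (list, tuple)):
        entities = [entities]
    if season is None or isinstance(season, int):
        # A single int (or None) applies to every entity.
        seasons = [season] * len(entities)
    else:
        # A list must align one-to-one with entities.
        seasons = list(season)
        if len(seasons) != len(entities):
            raise ValueError("season list must match entities one-to-one")
    return list(zip(entities, seasons))


print(align_seasons(["BOS", "TOR"], 20242025))
print(align_seasons(["BOS", "TOR"], [20232024, 20242025]))
print(align_seasons(1234567))  # placeholder player_id, season inferred from pbp
```

Since the function returns a dictionary of the form {entity: fig}, the figures can be handled one at a time by iterating over its items.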
- wsba_hockey.wsba_main.nhl_scrape_draft_rankings(arg: str | Literal['now'] = 'now', category: int = 0) DataFrame[source]
Returns draft rankings.
- Parameters:
arg (str, optional) – Date formatted as ‘YYYY-MM-DD’ to scrape draft rankings for a specific date, or ‘now’ for current draft rankings. Default is ‘now’.
category (int, optional) – Category number for prospects. When arg='now' this does not apply. Categories: 1=North American Skaters, 2=International Skaters, 3=North American Goalies, 4=International Goalies. Default is 0 (all prospects).
- Returns:
A DataFrame containing draft rankings.
- Return type:
pd.DataFrame
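The category numbers and the date format can be summarized in a small lookup. This is an illustrative sketch only: `describe_request` is a hypothetical name, and it assumes that the category is simply ignored when arg is 'now'.

```python
from datetime import datetime

# Category numbers as listed in the parameter description.
CATEGORIES = {
    1: "North American Skaters",
    2: "International Skaters",
    3: "North American Goalies",
    4: "International Goalies",
}


def describe_request(arg="now", category=0):
    """Describe which rankings a given (arg, category) pair asks for."""
    if arg == "now":
        # Assumption: category does not apply to current rankings.
        return "current rankings"
    # Raises ValueError if arg cannot be parsed as YYYY-MM-DD.
    datetime.strptime(arg, "%Y-%m-%d")
    label = "all prospects" if category == 0 else CATEGORIES[category]
    return f"{label} as of {arg}"


print(describe_request())
print(describe_request("2025-06-01", 3))
```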
- wsba_hockey.wsba_main.nhl_scrape_edge(season: int, group: Literal['skater', 'goalie', 'team'], scrape: list[int | str], season_type: int = 2) DataFrame[source]
Returns NHL Edge stats and data for a selection of skaters, goalies, or teams in a given season.
- Parameters:
season (int) – The NHL season formatted such as “20242025”.
group (Literal['skater', 'goalie', 'team']) – Type of statistics to calculate. Must be one of ‘skater’, ‘goalie’, or ‘team’.
scrape (list[int or str]) – List of skaters, goalies, or teams to scrape (player_ids for skaters/goalies and three-letter abbreviations, e.g. ‘BOS’, for teams).
season_type (int, optional) – Season type to include in the scraping process. Default is 2 (regular season).
- Returns:
A DataFrame containing NHL EDGE metrics for the requested skaters, goalies, and/or teams for the specified season.
- Return type:
pd.DataFrame
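Several functions on this page take the season as an eight-digit integer such as 20242025. A minimal sketch of building and splitting that value follows; `season_id` and `split_season` are hypothetical helpers, not part of wsba_hockey.

```python
def season_id(start_year):
    """Build the eight-digit season value from its starting year."""
    return start_year * 10000 + start_year + 1


def split_season(season):
    """Split an eight-digit season into (start_year, end_year)."""
    start_year, end_year = divmod(season, 10000)
    if end_year != start_year + 1:
        raise ValueError(f"{season} is not two consecutive years")
    return start_year, end_year


print(season_id(2024))
print(split_season(20242025))
```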
- wsba_hockey.wsba_main.nhl_scrape_game(game_ids: int | list[int], split_shifts: bool = False, export_roster: bool = False, remove: list[str] = [], xg: bool = False, sources: bool = False, errors: bool = False) DataFrame | dict[str, DataFrame] | tuple[DataFrame | dict[str, DataFrame], DataFrame][source]
Given a set of game_ids (NHL API), return complete play-by-play information as requested.
- Parameters:
game_ids (int or List[int] or ['random', int, int, int]) – List of NHL game IDs to scrape or use [‘random’, n, start_year, end_year] to fetch n random games.
split_shifts (bool, optional) – If True, returns a dict with separate ‘pbp’ and ‘shifts’ DataFrames. Default is False.
export_roster (bool, optional) – If True, returns a second DataFrame with rosters for all players in the provided games. Default is False.
remove (List[str], optional) – List of event types to remove from the result. Default is an empty list.
xg (bool, optional) – If True, calculates xG for the play-by-play data (for most accurate values leave ‘remove’ empty).
sources (bool, optional) – If True, saves raw HTML, JSON, SHIFTS, and single-game full play-by-play to a separate folder in the working directory. Default is False.
errors (bool, optional) – If True, includes a list of game IDs that failed to scrape in the return. Default is False.
- Returns:
If split_shifts is False, returns a single DataFrame of play-by-play data.
If split_shifts is True, returns a dictionary with keys:
‘pbp’: play-by-play events
‘shifts’: shift change events
‘errors’ (optional): list of game IDs that failed if errors=True
If export_roster is True, returns a tuple of (pbp, roster_df), where pbp is either a DataFrame or a dict (depending on split_shifts).
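Since the return shape depends on split_shifts and export_roster, a caller may want to unpack it in one place. The sketch below uses stand-in strings instead of real DataFrames; `unpack_scrape_result` is a hypothetical helper, and it assumes the ‘errors’ key only appears in the dictionary form, as the list above suggests.

```python
def unpack_scrape_result(result, split_shifts=False, export_roster=False):
    """Normalize the documented return shapes into one flat dict."""
    roster = None
    if export_roster:
        # Documented shape: (pbp, roster_df), where pbp may itself be a dict.
        result, roster = result
    if split_shifts:
        pbp = result["pbp"]
        shifts = result["shifts"]
        errors = result.get("errors")  # present only when errors=True
    else:
        pbp, shifts, errors = result, None, None
    return {"pbp": pbp, "shifts": shifts, "roster": roster, "errors": errors}


# Stand-in strings take the place of real DataFrames in this sketch.
fake_result = (
    {"pbp": "pbp_df", "shifts": "shifts_df", "errors": [2024020001]},
    "roster_df",
)
parts = unpack_scrape_result(fake_result, split_shifts=True, export_roster=True)
print(parts)
```

Passing the same flag values to the unpacking helper that were passed to the scraper keeps the two in step.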
- wsba_hockey.wsba_main.nhl_scrape_game_info(game_ids: list[int]) DataFrame[source]
Given a set of game_ids (NHL API), return information for each game.
- Parameters:
game_ids (List[int] or ['random', int, int, int]) – List of NHL game IDs to scrape or use [‘random’, n, start_year, end_year] to fetch n random games.
- Returns:
A DataFrame containing information for each game.
- Return type:
pd.DataFrame
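The game_ids parameter of the game-level scrapers accepts either explicit IDs or a [‘random’, n, start_year, end_year] request. The following sketch shows one way to tell the two apart; `describe_game_ids` is hypothetical, the ID shown is only an illustrative value, and the check that start_year does not exceed end_year is an assumption rather than documented behavior.

```python
def describe_game_ids(game_ids):
    """Classify a game_ids argument as explicit IDs or a random request."""
    if isinstance(game_ids, int):
        return {"mode": "ids", "ids": [game_ids]}
    if len(game_ids) > 0 and game_ids[0] == "random":
        if len(game_ids) != 4:
            raise ValueError("expected ['random', n, start_year, end_year]")
        _, n, start_year, end_year = game_ids
        # Assumption: the year range must be given in ascending order.
        if start_year > end_year:
            raise ValueError("start_year must not exceed end_year")
        return {
            "mode": "random",
            "n": n,
            "start_year": start_year,
            "end_year": end_year,
        }
    return {"mode": "ids", "ids": list(game_ids)}


print(describe_game_ids(2024020001))
print(describe_game_ids(["random", 5, 2015, 2024]))
```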
- wsba_hockey.wsba_main.nhl_scrape_game_roster(game_ids: int | list[int]) DataFrame[source]
Returns rosters for a list of individual games
- Parameters:
game_ids (int or List[int] or ['random', int, int, int]) – List of NHL game IDs to scrape or use [‘random’, n, start_year, end_year] to fetch n random games.
- Returns:
A DataFrame containing the rosters for all games in the specified list.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.nhl_scrape_player_info(player_ids: list[int]) DataFrame[source]
Returns player data for specified players.
- Parameters:
player_ids (list[int]) – List of NHL API player IDs to retrieve information for.
- Returns:
A DataFrame containing player data for specified players.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.nhl_scrape_prospects(team: str) DataFrame[source]
Returns prospects for specified team
- Parameters:
team (str) – Three character team abbreviation such as ‘BOS’
- Returns:
A DataFrame containing the prospect data for the specified team.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.nhl_scrape_roster(season: int, teams: str | list[str] | None = None) DataFrame[source]
Returns rosters for a selection of teams in a given season.
- Parameters:
season (int) – The NHL season formatted such as “20242025”.
teams (str or list[str], optional) – Team or list of teams (three-letter abbreviations) to scrape.
- Returns:
A DataFrame containing the rosters for all teams in the specified season.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.nhl_scrape_schedule(season: int | Literal['now'] = 'now', start: str | None = None, end: str | None = None) DataFrame[source]
Given season and an optional date range, retrieve NHL schedule data.
- Parameters:
season (int or str, optional) – The NHL season formatted such as “20242025” or “now”. Default is “now”.
start (str, optional) – The date string (MM-DD) to start the schedule scrape at. Default is None
end (str, optional) – The date string (MM-DD) to end the schedule scrape at. Default is None
- Returns:
A DataFrame containing the schedule data for the specified season and date range.
- Return type:
pd.DataFrame
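The start and end parameters are month-day strings. A minimal sketch of validating such a value before passing it along is shown here; `check_month_day` is a hypothetical helper and not part of wsba_hockey.

```python
from datetime import datetime


def check_month_day(value):
    """Validate an 'MM-DD' string and return (month, day), or None."""
    if value is None:
        return None
    # Parse against a leap year so that '02-29' is accepted.
    parsed = datetime.strptime(f"2024-{value}", "%Y-%m-%d")
    return parsed.month, parsed.day


print(check_month_day("10-01"))
print(check_month_day("02-29"))
print(check_month_day(None))
```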
- wsba_hockey.wsba_main.nhl_scrape_season(season: int, split_shifts: bool = False, export_roster: bool = False, season_types: list[int] = [2, 3], remove: list[str] = [], start: str | None = None, end: str | None = None, local: bool = False, local_path: str = '/home/runner/work/wsba_hockey/wsba_hockey/src/wsba_hockey/tools/schedule/schedule.csv', xg: bool = False, sources: bool = False, errors: bool = False) DataFrame | dict[str, DataFrame] | tuple[DataFrame | dict[str, DataFrame], DataFrame][source]
Given a season, scrape all play-by-play occurring within the season.
- Parameters:
season (int) – The NHL season formatted such as “20242025”.
split_shifts (bool, optional) – If True, returns a dict with separate ‘pbp’ and ‘shifts’ DataFrames. Default is False.
export_roster (bool, optional) – If True, returns a second DataFrame with rosters for all players in the provided games. Default is False.
season_types (List[int], optional) – List of season_types to include in scraping process. Default is all regular season and playoff games which are 2 and 3 respectively.
remove (List[str], optional) – List of event types to remove from the result. Default is an empty list.
start (str, optional) – The date string (MM-DD) to start the schedule scrape at. Default is None
end (str, optional) – The date string (MM-DD) to end the schedule scrape at. Default is None
local (bool, optional) – If True, use a local file to retrieve schedule data.
local_path (str, optional) – Path to the schedule data necessary to scrape a season’s games (only relevant if local = True).
xg (bool, optional) – If True, calculates xG for the play-by-play data (for most accurate values leave ‘remove’ empty).
sources (bool, optional) – If True, saves raw HTML, JSON, SHIFTS, and single-game full play-by-play to a separate folder in the working directory. Default is False.
errors (bool, optional) – If True, includes a list of game IDs that failed to scrape in the return. Default is False.
- Returns:
If split_shifts is False, returns a single DataFrame of play-by-play data.
If split_shifts is True, returns a dictionary with keys:
‘pbp’: play-by-play events
‘shifts’: shift change events
‘errors’ (optional): list of game IDs that failed if errors=True
- wsba_hockey.wsba_main.nhl_scrape_seasons(analytic: bool = False) DataFrame[source]
Returns list of NHL seasons
- Parameters:
analytic (bool, optional) – Filters list of seasons to those only included in the WSBA Hockey package (2007-2008 and beyond) if True. Default is False.
- Returns:
A DataFrame containing a list of all NHL seasons.
- Return type:
pd.DataFrame
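The effect of the analytic flag can be pictured as a filter on the season values. The sketch below works on a plain list rather than the returned DataFrame, and `analytic_only` is a hypothetical name used only for illustration.

```python
# First season covered by the package, per the parameter description.
FIRST_ANALYTIC_SEASON = 20072008


def analytic_only(seasons):
    """Keep only seasons from 2007-2008 onward."""
    return [season for season in seasons if season >= FIRST_ANALYTIC_SEASON]


print(analytic_only([20052006, 20062007, 20072008, 20242025]))
```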
- wsba_hockey.wsba_main.nhl_scrape_seasons_info(seasons: list[int] = []) DataFrame[source]
Returns info related to NHL seasons (by default, all seasons are included).
- Parameters:
seasons (List[int], optional) – The NHL seasons formatted such as “20242025”.
- Returns:
A DataFrame containing the information for requested seasons.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.nhl_scrape_standings(arg: int | list[int] | Literal['now'] = 'now', season_type: int = 2) DataFrame[source]
Returns standings or playoff bracket.
- Parameters:
arg (int or list[int] or str, optional) – Date formatted as ‘YYYY-MM-DD’ to scrape standings, NHL season such as “20242025”, list of NHL seasons, or ‘now’ for current standings. Default is ‘now’.
season_type (int, optional) – Part of season to scrape. If 3 (playoffs), scrapes the playoff bracket for the season implied by arg. When arg = ‘now’ this defaults to the most recent playoff year. Any dates passed through are parsed as seasons. Default is 2.
- Returns:
A DataFrame containing the standings information (or playoff bracket).
- Return type:
pd.DataFrame
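Because arg can be a date string, a season, a list of seasons, or ‘now’, it may help to see the four cases side by side. This sketch is illustrative only: `classify_standings_request` is a hypothetical helper, and it does not attempt to reproduce how the package maps a date onto a season.

```python
from datetime import datetime


def classify_standings_request(arg="now", season_type=2):
    """Return (kind of arg, what the request asks for)."""
    if arg == "now":
        kind = "now"
    elif isinstance(arg, list):
        kind = "seasons"
    elif isinstance(arg, int):
        kind = "season"
    else:
        # Raises ValueError if arg cannot be parsed as YYYY-MM-DD.
        datetime.strptime(arg, "%Y-%m-%d")
        kind = "date"
    # season_type 3 (playoffs) requests the bracket instead of standings.
    target = "playoff bracket" if season_type == 3 else "standings"
    return kind, target


print(classify_standings_request())
print(classify_standings_request(20242025, season_type=3))
print(classify_standings_request("2025-01-15"))
```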
- wsba_hockey.wsba_main.nhl_scrape_team_info(country: bool = False) DataFrame[source]
Returns team or country information from the NHL API.
- Parameters:
country (bool, optional) – If True, returns country information instead of NHL team information.
- Returns:
A DataFrame containing team or country information from the NHL API.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.repo_load_rosters(seasons: int | list[int] | None = None) DataFrame[source]
Returns roster data from repository
- Parameters:
seasons (int | list[int] | None, optional) – Season or seasons to return. If None, all repository roster data is returned.
- Returns:
A DataFrame containing roster data for supplied seasons.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.repo_load_schedule(seasons: int | list[int] | None = None) DataFrame[source]
Returns schedule data from repository
- Parameters:
seasons (int | list[int] | None, optional) – Season or seasons to return. If None, all repository schedule data is returned.
- Returns:
A DataFrame containing schedule data for the supplied seasons.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.repo_load_teaminfo() DataFrame[source]
Returns team data from repository
- Returns:
A DataFrame containing general team information.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.utility_get_schema(df: DataFrame) DataFrame[source]
Returns schema for provided dataframe
- Parameters:
df (pd.DataFrame) – Any dataframe generated by functions in the wsba-hockey package
- Returns:
A DataFrame containing the schema for the specified dataframe.
- Return type:
pd.DataFrame
- wsba_hockey.wsba_main.utility_get_unique(df: DataFrame) DataFrame[source]
Returns unique values in each column for provided dataframe.
- Parameters:
df (pd.DataFrame) – Any dataframe generated by functions in the wsba-hockey package
- Returns:
A DataFrame containing the unique values in each column for the specified dataframe.
- Return type:
pd.DataFrame