StatsBomb

mplsoccer contains functions that return StatsBomb data as flat, tidy dataframes. If you would rather flatten the JSON into dictionaries, set dataframe=False.

You can read more about the StatsBomb open-data on their resource centre page.

mplsoccer can be used with the StatsBomb open-data or, if you are lucky enough to have access, with the StatsBomb API:

StatsBomb API:

# this only works if you have access to the StatsBomb API
# and assumes you have set the environment variables
# SB_USERNAME and SB_PASSWORD
# otherwise pass the arguments:
# parser = Sbapi(username='changeme',
#                password='changeme')
from mplsoccer import Sbapi
parser = Sbapi(dataframe=True)
(events, related,
freeze, tactics) = parser.event(3788741)
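
As the comments above note, the API parser expects credentials in SB_USERNAME and SB_PASSWORD unless they are passed explicitly. A minimal sketch of checking for them up front is shown below. The helper name `statsbomb_credentials` is hypothetical and not part of mplsoccer; it only reads the environment and does not contact the API.

```python
import os


def statsbomb_credentials(environ=None):
    """Return (username, password) read from environment variables.

    Hypothetical helper for illustration; it is not part of mplsoccer.
    Raises RuntimeError if either variable is missing or empty.
    """
    environ = os.environ if environ is None else environ
    names = ('SB_USERNAME', 'SB_PASSWORD')
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return environ['SB_USERNAME'], environ['SB_PASSWORD']
```

The returned pair could then be passed on as Sbapi(username=..., password=...), matching the commented-out call above.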

StatsBomb local data:

from mplsoccer import Sblocal
parser = Sblocal(dataframe=True)
(events, related,
freeze, tactics) = parser.event(3788741)

Alternatives to mplsoccer’s statsbomb module also exist. The rest of this page uses the StatsBomb open-data parser, Sbopen:

from mplsoccer import Sbopen

# instantiate a parser object
parser = Sbopen()

Competition data

Get the competition data as a dataframe.

df_competition = parser.competition()
df_competition.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71 entries, 0 to 70
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   competition_id             71 non-null     int64
 1   season_id                  71 non-null     int64
 2   country_name               71 non-null     object
 3   competition_name           71 non-null     object
 4   competition_gender         71 non-null     object
 5   competition_youth          71 non-null     bool
 6   competition_international  71 non-null     bool
 7   season_name                71 non-null     object
 8   match_updated              71 non-null     object
 9   match_updated_360          54 non-null     object
 10  match_available_360        8 non-null      object
 11  match_available            71 non-null     object
dtypes: bool(2), int64(2), object(8)
memory usage: 5.8+ KB
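
The competition_id and season_id columns are the values that parser.match() takes as arguments in the next section. As a sketch of looking up such a pair by name, the example below filters a small hand-made dataframe that mimics a few of the columns listed above. The rows are illustrative stand-ins, not real open-data values.

```python
import pandas as pd

# Illustrative rows only: the names and IDs are made up, not taken from the open-data.
df_competition = pd.DataFrame({
    'competition_id': [100, 100, 200],
    'season_id': [1, 2, 1],
    'competition_name': ['Example League', 'Example League', 'Example Cup'],
    'season_name': ['2001/2002', '2002/2003', '2001/2002'],
})

# select the row for one competition and season by name
mask = ((df_competition['competition_name'] == 'Example League')
        & (df_competition['season_name'] == '2002/2003'))
row = df_competition.loc[mask].iloc[0]

# plain ints are convenient to pass on as keyword arguments
competition_id = int(row['competition_id'])
season_id = int(row['season_id'])
print(competition_id, season_id)
```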

Match data

Get the match data as a dataframe. Note that the number of matches here can differ from the number of event files, because some event files in the open-data have no corresponding match data.

df_match = parser.match(competition_id=11, season_id=1)
df_match.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 52 columns):
 #   Column                           Non-Null Count  Dtype
---  ------                           --------------  -----
 0   match_id                         36 non-null     int64
 1   match_date                       36 non-null     datetime64[ns]
 2   kick_off                         36 non-null     datetime64[ns]
 3   home_score                       36 non-null     int64
 4   away_score                       36 non-null     int64
 5   match_status                     36 non-null     object
 6   match_status_360                 36 non-null     object
 7   last_updated                     36 non-null     datetime64[ns]
 8   last_updated_360                 36 non-null     datetime64[ns]
 9   match_week                       36 non-null     int64
 10  competition_id                   36 non-null     int64
 11  country_name                     36 non-null     object
 12  competition_name                 36 non-null     object
 13  season_id                        36 non-null     int64
 14  season_name                      36 non-null     object
 15  home_team_id                     36 non-null     int64
 16  home_team_name                   36 non-null     object
 17  home_team_gender                 36 non-null     object
 18  home_team_group                  0 non-null      object
 19  home_team_country_id             36 non-null     int64
 20  home_team_country_name           36 non-null     object
 21  home_team_managers_id            36 non-null     int64
 22  home_team_managers_name          36 non-null     object
 23  home_team_managers_nickname      36 non-null     object
 24  home_team_managers_dob           36 non-null     datetime64[ns]
 25  home_team_managers_country_id    36 non-null     int64
 26  home_team_managers_country_name  36 non-null     object
 27  away_team_id                     36 non-null     int64
 28  away_team_name                   36 non-null     object
 29  away_team_gender                 36 non-null     object
 30  away_team_group                  0 non-null      object
 31  away_team_country_id             36 non-null     int64
 32  away_team_country_name           36 non-null     object
 33  away_team_managers_id            36 non-null     int64
 34  away_team_managers_name          36 non-null     object
 35  away_team_managers_nickname      36 non-null     object
 36  away_team_managers_dob           36 non-null     datetime64[ns]
 37  away_team_managers_country_id    36 non-null     int64
 38  away_team_managers_country_name  36 non-null     object
 39  metadata_data_version            36 non-null     object
 40  metadata_shot_fidelity_version   36 non-null     object
 41  metadata_xy_fidelity_version     29 non-null     object
 42  competition_stage_id             36 non-null     int64
 43  competition_stage_name           36 non-null     object
 44  stadium_id                       36 non-null     int64
 45  stadium_name                     36 non-null     object
 46  stadium_country_id               36 non-null     int64
 47  stadium_country_name             36 non-null     object
 48  referee_id                       28 non-null     float64
 49  referee_name                     28 non-null     object
 50  referee_country_id               28 non-null     float64
 51  referee_country_name             28 non-null     object
dtypes: datetime64[ns](6), float64(2), int64(17), object(27)
memory usage: 14.8+ KB
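
One way to inspect the mismatch mentioned above is to compare the match IDs in the match data with the IDs of the event files using set operations. The sketch below uses made-up IDs. In practice the first set could be built from df_match['match_id'], and the second from however you list the event files.

```python
# Made-up IDs for illustration only.
match_ids = {1, 2, 3}          # e.g. set(df_match['match_id'])
event_file_ids = {1, 2, 3, 4}  # IDs of the event files you have

# event files that have no corresponding row in the match data
events_without_match = sorted(event_file_ids - match_ids)
print(events_without_match)
```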

Lineup data

df_lineup = parser.lineup(7478)
df_lineup.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   player_id        36 non-null     int64
 1   player_name      36 non-null     object
 2   player_nickname  36 non-null     object
 3   jersey_number    36 non-null     int64
 4   match_id         36 non-null     int64
 5   team_id          36 non-null     int64
 6   team_name        36 non-null     object
 7   country_id       36 non-null     int64
 8   country_name     36 non-null     object
dtypes: int64(5), object(4)
memory usage: 2.7+ KB

Event data

df_event, df_related, df_freeze, df_tactics = parser.event(7478)

# exploring the data
df_event.info()
df_related.info()
df_freeze.info()
df_tactics.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3380 entries, 0 to 3379
Data columns (total 70 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   id                              3380 non-null   object
 1   index                           3380 non-null   int64
 2   period                          3380 non-null   int64
 3   timestamp                       3380 non-null   object
 4   minute                          3380 non-null   int64
 5   second                          3380 non-null   int64
 6   possession                      3380 non-null   int64
 7   duration                        2583 non-null   float64
 8   match_id                        3380 non-null   int64
 9   type_id                         3380 non-null   int64
 10  type_name                       3380 non-null   object
 11  possession_team_id              3380 non-null   int64
 12  possession_team_name            3380 non-null   object
 13  play_pattern_id                 3380 non-null   int64
 14  play_pattern_name               3380 non-null   object
 15  team_id                         3380 non-null   int64
 16  team_name                       3380 non-null   object
 17  tactics_formation               3 non-null      object
 18  player_id                       3341 non-null   float64
 19  player_name                     3341 non-null   object
 20  position_id                     3341 non-null   float64
 21  position_name                   3341 non-null   object
 22  pass_recipient_id               775 non-null    float64
 23  pass_recipient_name             775 non-null    object
 24  pass_length                     1018 non-null   float64
 25  pass_angle                      1018 non-null   float64
 26  pass_height_id                  1018 non-null   float64
 27  pass_height_name                1018 non-null   object
 28  end_x                           1844 non-null   float64
 29  end_y                           1844 non-null   float64
 30  sub_type_id                     421 non-null    float64
 31  sub_type_name                   421 non-null    object
 32  body_part_id                    963 non-null    float64
 33  body_part_name                  963 non-null    object
 34  x                               3334 non-null   float64
 35  y                               3334 non-null   float64
 36  under_pressure                  607 non-null    float64
 37  pass_switch                     29 non-null     object
 38  outcome_id                      612 non-null    float64
 39  outcome_name                    612 non-null    object
 40  ball_recovery_recovery_failure  10 non-null     object
 41  counterpress                    97 non-null     float64
 42  foul_won_defensive              7 non-null      object
 43  aerial_won                      27 non-null     object
 44  block_offensive                 2 non-null      object
 45  shot_statsbomb_xg               25 non-null     float64
 46  end_z                           19 non-null     float64
 47  technique_id                    36 non-null     float64
 48  technique_name                  36 non-null     object
 49  goalkeeper_position_id          22 non-null     float64
 50  goalkeeper_position_name        22 non-null     object
 51  pass_assisted_shot_id           17 non-null     object
 52  pass_goal_assist                3 non-null      object
 53  shot_key_pass_id                17 non-null     object
 54  pass_cross                      27 non-null     object
 55  pass_backheel                   5 non-null      object
 56  dribble_overrun                 2 non-null      object
 57  dribble_nutmeg                  1 non-null      object
 58  pass_shot_assist                14 non-null     object
 59  shot_one_on_one                 4 non-null      object
 60  foul_committed_penalty          1 non-null      object
 61  foul_won_penalty                1 non-null      object
 62  foul_committed_card_id          1 non-null      float64
 63  foul_committed_card_name        1 non-null      object
 64  foul_committed_offensive        2 non-null      object
 65  substitution_replacement_id     4 non-null      float64
 66  substitution_replacement_name   4 non-null      object
 67  ball_recovery_offensive         1 non-null      object
 68  bad_behaviour_card_id           1 non-null      float64
 69  bad_behaviour_card_name         1 non-null      object
dtypes: float64(23), int64(10), object(37)
memory usage: 1.8+ MB
<class 'pandas.core.frame.DataFrame'>
Index: 6424 entries, 0 to 4847
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   match_id           6424 non-null   int64
 1   id                 6424 non-null   object
 2   index              6424 non-null   int64
 3   type_name          6424 non-null   object
 4   id_related         6424 non-null   object
 5   index_related      6424 non-null   int64
 6   type_name_related  6424 non-null   object
dtypes: int64(3), object(4)
memory usage: 401.5+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   teammate         279 non-null    bool
 1   match_id         279 non-null    int64
 2   id               279 non-null    object
 3   x                279 non-null    float64
 4   y                279 non-null    float64
 5   player_id        279 non-null    int64
 6   player_name      279 non-null    object
 7   position_id      279 non-null    int64
 8   position_name    279 non-null    object
 9   event_freeze_id  279 non-null    int64
dtypes: bool(1), float64(2), int64(4), object(3)
memory usage: 20.0+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   jersey_number     33 non-null     int64
 1   match_id          33 non-null     int64
 2   id                33 non-null     object
 3   player_id         33 non-null     int64
 4   player_name       33 non-null     object
 5   position_id       33 non-null     int64
 6   position_name     33 non-null     object
 7   event_tactics_id  33 non-null     int64
dtypes: int64(5), object(3)
memory usage: 2.2+ KB
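
The four dataframes share the id and match_id columns shown above, so they can be combined with ordinary pandas merges. The sketch below assumes that id in df_freeze refers to the id of the corresponding row in df_event (an assumption based on the shared column names). It uses tiny made-up frames in place of the real parser output.

```python
import pandas as pd

# Tiny made-up stand-ins for df_event and df_freeze; only a few columns are kept.
df_event = pd.DataFrame({
    'id': ['a', 'b'],
    'type_name': ['Pass', 'Shot'],
    'shot_statsbomb_xg': [None, 0.12],
})
df_freeze = pd.DataFrame({
    'id': ['b', 'b', 'b'],
    'teammate': [True, False, False],
    'x': [100.0, 110.0, 119.0],
    'y': [40.0, 38.0, 40.0],
})

# attach event-level columns to each freeze-frame row;
# validate checks the assumption that id is unique in df_event
df_shot_freeze = df_freeze.merge(
    df_event[['id', 'type_name', 'shot_statsbomb_xg']],
    on='id', how='left', validate='many_to_one')
print(df_shot_freeze)
```

A left merge keeps every freeze-frame row even if no matching event is found, which makes missing links easy to spot as null values.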

360 data

df_frame, df_visible = parser.frame(3788741)

# exploring the data
df_frame.info()
df_visible.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45737 entries, 0 to 45736
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   teammate  45737 non-null  bool
 1   actor     45737 non-null  bool
 2   keeper    45737 non-null  bool
 3   match_id  45737 non-null  int64
 4   id        45737 non-null  object
 5   x         45737 non-null  float64
 6   y         45737 non-null  float64
dtypes: bool(3), float64(2), int64(1), object(1)
memory usage: 1.5+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3370 entries, 0 to 3369
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   match_id      3370 non-null   int64
 1   id            3370 non-null   object
 2   visible_area  3370 non-null   object
dtypes: int64(1), object(2)
memory usage: 79.1+ KB
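
Judging from the columns and row counts above, df_frame appears to hold one row per tracked player location and df_visible one row per frame, both keyed by id. A sketch of counting the players in each frame and attaching that count to df_visible follows, again with made-up rows. The n_players column name is introduced here for illustration only.

```python
import pandas as pd

# Made-up stand-ins for df_frame and df_visible.
df_frame = pd.DataFrame({
    'id': ['e1', 'e1', 'e1', 'e2', 'e2'],
    'teammate': [True, True, False, True, False],
    'actor': [True, False, False, True, False],
    'keeper': [False, False, True, False, False],
    'x': [60.0, 55.0, 118.0, 30.0, 35.0],
    'y': [40.0, 20.0, 40.0, 10.0, 12.0],
})
df_visible = pd.DataFrame({'id': ['e1', 'e2']})

# number of player rows recorded for each frame
n_players = (df_frame.groupby('id').size()
             .rename('n_players').reset_index())

# one row per frame, now with the player count attached
df_visible = df_visible.merge(n_players, on='id', how='left')
print(df_visible)
```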
