StatsBomb
mplsoccer contains functions to return StatsBomb data in a flat, tidy dataframe.
However, if you want to flatten the json into a dictionary you can also set dataframe=False.
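To make the idea of flattening concrete, here is a minimal sketch that turns one nested record into a flat dictionary. It is not mplsoccer's implementation: the flatten helper and the sample record are hypothetical, and the underscore-joined key style is only modelled on column names such as type_name that appear in the output further down.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dictionaries into a single-level dictionary.

    Hypothetical helper for illustration; not part of mplsoccer.
    """
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, prefixing with the parent key
            flat.update(flatten(value, new_key, sep=sep))
        else:
            flat[new_key] = value
    return flat


# made-up record, shaped loosely like a nested json event
event = {"id": "abc", "period": 1, "type": {"id": 30, "name": "Pass"}}
flat_event = flatten(event)
print(flat_event)
```

Nested keys such as type and name end up as a single key, type_name, which is the same naming pattern as the dataframe columns.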
You can read more about the StatsBomb open-data on their resource centre page.
mplsoccer can be used with the StatsBomb open-data or, if you are lucky enough to have access, the StatsBomb API:
StatsBomb API:
# this only works if you have access to the StatsBomb API
# and assumes you have set the environment variables
# SB_USERNAME and SB_PASSWORD
# otherwise pass the arguments:
# parser = Sbapi(username='changeme', password='changeme')
from mplsoccer import Sbapi
parser = Sbapi(dataframe=True)
(events, related,
freeze, tactics) = parser.event(3788741)
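As a rough sketch of the credential handling described in the comments above, the snippet below reads SB_USERNAME and SB_PASSWORD and fails early if either is missing. The get_statsbomb_credentials helper is hypothetical and not part of mplsoccer; it only shows one way to check the variables before creating a parser.

```python
import os


def get_statsbomb_credentials(env=None):
    """Return (username, password) read from SB_USERNAME and SB_PASSWORD.

    Hypothetical helper for illustration; not part of mplsoccer.
    """
    env = os.environ if env is None else env
    username = env.get("SB_USERNAME")
    password = env.get("SB_PASSWORD")
    missing = [name for name, value in
               (("SB_USERNAME", username), ("SB_PASSWORD", password))
               if not value]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return username, password


# demonstration with a plain dictionary standing in for os.environ
creds = get_statsbomb_credentials({"SB_USERNAME": "changeme",
                                   "SB_PASSWORD": "changeme"})
print(creds[0])
```

The returned values could then be passed explicitly, as in the commented-out Sbapi(username=..., password=...) call.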
StatsBomb local data:
from mplsoccer import Sblocal
parser = Sblocal(dataframe=True)
(events, related,
freeze, tactics) = parser.event(3788741)
There are also alternatives to mplsoccer’s statsbomb module. The rest of this example uses the Sbopen parser:
from mplsoccer import Sbopen
# instantiate a parser object
parser = Sbopen()
Competition data
Get the competition data as a dataframe
df_competition = parser.competition()
df_competition.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74 entries, 0 to 73
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 competition_id 74 non-null int64
1 season_id 74 non-null int64
2 country_name 74 non-null object
3 competition_name 74 non-null object
4 competition_gender 74 non-null object
5 competition_youth 74 non-null bool
6 competition_international 74 non-null bool
7 season_name 74 non-null object
8 match_updated 74 non-null object
9 match_updated_360 56 non-null object
10 match_available_360 10 non-null object
11 match_available 74 non-null object
dtypes: bool(2), int64(2), object(8)
memory usage: 6.1+ KB
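The competition dataframe is mainly useful for looking up the competition_id and season_id pair that the match data needs. The sketch below shows one way to do that lookup with pandas. The rows are made up for illustration; only the column names come from the info() output above.

```python
import pandas as pd

# made-up rows; only the column names come from the info() output
df_competition = pd.DataFrame({
    "competition_id": [11, 11, 99],
    "season_id": [1, 2, 5],
    "competition_name": ["League A", "League A", "Cup B"],
    "season_name": ["Season X", "Season Y", "Season Z"],
})

# select one competition by name and list its seasons
mask = df_competition["competition_name"] == "League A"
seasons = df_competition.loc[mask, ["competition_id", "season_id", "season_name"]]
print(seasons)

# pick the ids of the first matching season
first = seasons.iloc[0]
competition_id, season_id = int(first["competition_id"]), int(first["season_id"])
print(competition_id, season_id)
```

The two integers are the kind of values passed as competition_id and season_id when requesting match data.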
Match data
Get the match data as a dataframe. Note that the number of matches in this data does not equal the number of event files, because some event files in the open-data have no match data.
df_match = parser.match(competition_id=11, season_id=1)
df_match.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 52 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 match_id 36 non-null int64
1 match_date 36 non-null datetime64[ns]
2 kick_off 36 non-null datetime64[ns]
3 home_score 36 non-null int64
4 away_score 36 non-null int64
5 match_status 36 non-null object
6 match_status_360 36 non-null object
7 last_updated 36 non-null datetime64[ns]
8 last_updated_360 36 non-null datetime64[ns]
9 match_week 36 non-null int64
10 competition_id 36 non-null int64
11 country_name 36 non-null object
12 competition_name 36 non-null object
13 season_id 36 non-null int64
14 season_name 36 non-null object
15 home_team_id 36 non-null int64
16 home_team_name 36 non-null object
17 home_team_gender 36 non-null object
18 home_team_group 0 non-null object
19 home_team_country_id 36 non-null int64
20 home_team_country_name 36 non-null object
21 home_team_managers_id 36 non-null int64
22 home_team_managers_name 36 non-null object
23 home_team_managers_nickname 36 non-null object
24 home_team_managers_dob 36 non-null datetime64[ns]
25 home_team_managers_country_id 36 non-null int64
26 home_team_managers_country_name 36 non-null object
27 away_team_id 36 non-null int64
28 away_team_name 36 non-null object
29 away_team_gender 36 non-null object
30 away_team_group 0 non-null object
31 away_team_country_id 36 non-null int64
32 away_team_country_name 36 non-null object
33 away_team_managers_id 36 non-null int64
34 away_team_managers_name 36 non-null object
35 away_team_managers_nickname 36 non-null object
36 away_team_managers_dob 36 non-null datetime64[ns]
37 away_team_managers_country_id 36 non-null int64
38 away_team_managers_country_name 36 non-null object
39 metadata_data_version 36 non-null object
40 metadata_shot_fidelity_version 36 non-null object
41 metadata_xy_fidelity_version 29 non-null object
42 competition_stage_id 36 non-null int64
43 competition_stage_name 36 non-null object
44 stadium_id 36 non-null int64
45 stadium_name 36 non-null object
46 stadium_country_id 36 non-null int64
47 stadium_country_name 36 non-null object
48 referee_id 28 non-null float64
49 referee_name 28 non-null object
50 referee_country_id 28 non-null float64
51 referee_country_name 28 non-null object
dtypes: datetime64[ns](6), float64(2), int64(17), object(27)
memory usage: 14.8+ KB
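One way to see the mismatch noted at the start of this section is to compare the two sets of match ids. This is only a sketch: the ids below are placeholders, and how you collect the ids of the event files depends on how you store the data.

```python
# placeholder ids, not real open-data values
match_ids = {101, 102, 103}             # e.g. set(df_match["match_id"])
event_file_ids = {101, 102, 103, 104}   # ids of the event files you have

# event files with no corresponding row in the match data
events_without_match = sorted(event_file_ids - match_ids)
# matches for which no event file was found
matches_without_events = sorted(match_ids - event_file_ids)

print(events_without_match)
print(matches_without_events)
```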
Lineup data
df_lineup = parser.lineup(7478)
df_lineup.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 player_id 36 non-null int64
1 player_name 36 non-null object
2 player_nickname 36 non-null object
3 jersey_number 36 non-null int64
4 match_id 36 non-null int64
5 team_id 36 non-null int64
6 team_name 36 non-null object
7 country_id 36 non-null int64
8 country_name 36 non-null object
dtypes: int64(5), object(4)
memory usage: 2.7+ KB
Event data
df_event, df_related, df_freeze, df_tactics = parser.event(7478)
# exploring the data
df_event.info()
df_related.info()
df_freeze.info()
df_tactics.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3380 entries, 0 to 3379
Data columns (total 70 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 3380 non-null object
1 index 3380 non-null int64
2 period 3380 non-null int64
3 timestamp 3380 non-null object
4 minute 3380 non-null int64
5 second 3380 non-null int64
6 possession 3380 non-null int64
7 duration 2583 non-null float64
8 match_id 3380 non-null int64
9 type_id 3380 non-null int64
10 type_name 3380 non-null object
11 possession_team_id 3380 non-null int64
12 possession_team_name 3380 non-null object
13 play_pattern_id 3380 non-null int64
14 play_pattern_name 3380 non-null object
15 team_id 3380 non-null int64
16 team_name 3380 non-null object
17 tactics_formation 3 non-null object
18 player_id 3341 non-null float64
19 player_name 3341 non-null object
20 position_id 3341 non-null float64
21 position_name 3341 non-null object
22 pass_recipient_id 775 non-null float64
23 pass_recipient_name 775 non-null object
24 pass_length 1018 non-null float64
25 pass_angle 1018 non-null float64
26 pass_height_id 1018 non-null float64
27 pass_height_name 1018 non-null object
28 end_x 1844 non-null float64
29 end_y 1844 non-null float64
30 sub_type_id 421 non-null float64
31 sub_type_name 421 non-null object
32 body_part_id 963 non-null float64
33 body_part_name 963 non-null object
34 x 3334 non-null float64
35 y 3334 non-null float64
36 under_pressure 607 non-null float64
37 pass_switch 29 non-null object
38 outcome_id 612 non-null float64
39 outcome_name 612 non-null object
40 ball_recovery_recovery_failure 10 non-null object
41 counterpress 97 non-null float64
42 foul_won_defensive 7 non-null object
43 aerial_won 27 non-null object
44 block_offensive 2 non-null object
45 shot_statsbomb_xg 25 non-null float64
46 end_z 19 non-null float64
47 technique_id 36 non-null float64
48 technique_name 36 non-null object
49 goalkeeper_position_id 22 non-null float64
50 goalkeeper_position_name 22 non-null object
51 pass_assisted_shot_id 17 non-null object
52 pass_goal_assist 3 non-null object
53 shot_key_pass_id 17 non-null object
54 pass_cross 27 non-null object
55 pass_backheel 5 non-null object
56 dribble_overrun 2 non-null object
57 dribble_nutmeg 1 non-null object
58 pass_shot_assist 14 non-null object
59 shot_one_on_one 4 non-null object
60 foul_committed_penalty 1 non-null object
61 foul_won_penalty 1 non-null object
62 foul_committed_card_id 1 non-null float64
63 foul_committed_card_name 1 non-null object
64 foul_committed_offensive 2 non-null object
65 substitution_replacement_id 4 non-null float64
66 substitution_replacement_name 4 non-null object
67 ball_recovery_offensive 1 non-null object
68 bad_behaviour_card_id 1 non-null float64
69 bad_behaviour_card_name 1 non-null object
dtypes: float64(23), int64(10), object(37)
memory usage: 1.8+ MB
<class 'pandas.core.frame.DataFrame'>
Index: 6424 entries, 0 to 4847
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 match_id 6424 non-null int64
1 id 6424 non-null object
2 index 6424 non-null int64
3 type_name 6424 non-null object
4 id_related 6424 non-null object
5 index_related 6424 non-null int64
6 type_name_related 6424 non-null object
dtypes: int64(3), object(4)
memory usage: 401.5+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 teammate 279 non-null bool
1 match_id 279 non-null int64
2 id 279 non-null object
3 x 279 non-null float64
4 y 279 non-null float64
5 player_id 279 non-null int64
6 player_name 279 non-null object
7 position_id 279 non-null int64
8 position_name 279 non-null object
9 event_freeze_id 279 non-null int64
dtypes: bool(1), float64(2), int64(4), object(3)
memory usage: 20.0+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 jersey_number 33 non-null int64
1 match_id 33 non-null int64
2 id 33 non-null object
3 player_id 33 non-null int64
4 player_name 33 non-null object
5 position_id 33 non-null int64
6 position_name 33 non-null object
7 event_tactics_id 33 non-null int64
dtypes: int64(5), object(3)
memory usage: 2.2+ KB
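The four dataframes share an id column, so they can be combined with an ordinary pandas merge. The sketch below attaches the event type and acting player to each freeze-frame row. It assumes that id in the freeze-frame data refers to the event the frame belongs to, and it uses made-up rows; only the column names follow the info() output above.

```python
import pandas as pd

# made-up rows; the column names follow the info() output
df_event = pd.DataFrame({
    "id": ["e1", "e2", "e3"],
    "type_name": ["Pass", "Shot", "Pass"],
    "player_name": ["Player A", "Player B", "Player C"],
})
df_freeze = pd.DataFrame({
    "id": ["e2", "e2"],
    "teammate": [True, False],
    "x": [100.0, 110.0],
    "y": [40.0, 38.0],
    "player_name": ["Player D", "Player E"],
})

# attach the event type and the acting player to each freeze-frame row;
# the event's player_name gets the suffix _event to avoid a name clash
df_shot_freeze = df_freeze.merge(
    df_event[["id", "type_name", "player_name"]],
    on="id",
    how="left",
    suffixes=("", "_event"),
)
print(df_shot_freeze)
```

A left merge keeps every freeze-frame row even if its event is missing from the event dataframe, which makes gaps easy to spot as missing values.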
360 data
df_frame, df_visible = parser.frame(3788741)
# exploring the data
df_frame.info()
df_visible.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45737 entries, 0 to 45736
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 teammate 45737 non-null bool
1 actor 45737 non-null bool
2 keeper 45737 non-null bool
3 match_id 45737 non-null int64
4 id 45737 non-null object
5 x 45737 non-null float64
6 y 45737 non-null float64
dtypes: bool(3), float64(2), int64(1), object(1)
memory usage: 1.5+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3370 entries, 0 to 3369
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 match_id 3370 non-null int64
1 id 3370 non-null object
2 visible_area 3370 non-null object
dtypes: int64(1), object(2)
memory usage: 79.1+ KB