{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Expected threat\n\nThis example shows how to create an expected threat (xT) model. Expected threat is a method\nfor valuing possession of the ball at a position on the football pitch\nby the likelihood that it leads to a goal.\n\nExpected threat is based on [Markov chains](https://en.wikipedia.org/wiki/Markov_chain).\nThe main assumption for modelling soccer in this way is that the probability of scoring\ndepends only on the current action: the model is memoryless and does not consider what\nhappened before or after the event. Often in soccer this isn't a fair assumption, as attacks\nmay form quickly on the counter or from pressuring the opponent high up the field. In\nreality, how an action came about and how the defence has shifted\nmay affect what happens next.\n\nI recommend reading through this excellent\n[blog post](https://soccermatics.medium.com/explaining-expected-threat-cbc775d97935) by\n[David Sumpter (@soccermatics)](https://twitter.com/Soccermatics) on the history of expected\nthreat, its limitations, and possible extensions.\n\nThe first use of Markov chains to evaluate the probability of scoring was by\n[Sarah Rudd](https://twitter.com/srudd_ok) in their\n[conference presentation](http://nessis.org/nessis11/rudd.pdf) \"A framework for tactical\nanalysis and individual offensive production assessment in soccer using Markov chains.\" Although\nit was not named expected threat, it contained many of the ideas used here.\n[Karun Singh](https://twitter.com/karun1710) then popularised and named the idea\nin their [fantastic interactive blog post](https://karun.in/blog/expected-threat.html). In this\ntutorial, we model expected threat using the ideas in Karun's blog post.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.patheffects as path_effects\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\n\nfrom mplsoccer import Sbopen, Pitch\n\nparser = Sbopen()\npitch = Pitch(line_zorder=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up the grid\nOur first decision is how to divide the soccer field into a grid.\nHere we copy Karun's setup and use 16 cells in the x-direction and\n12 cells in the y-direction.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bins = (16, 12) # 16 cells x 12 cells"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get event data\nGet event data from the FA Women's Super League 2019/20.\nHere we include only the carries, shots, and passes used to model expected threat.\nYou may additionally want to filter out set pieces and counter-attacks.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# first let's get the match file which lists all the match identifiers for\n# the 87 games from the FA WSL 2019/20\ndf_match = parser.match(competition_id=37, season_id=42)\nmatch_ids = df_match.match_id.unique()\n\n# next we create a dataframe of all the events\nall_events_df = []\ncols = ['match_id', 'id', 'type_name', 'sub_type_name', 'player_name',\n 'x', 'y', 'end_x', 'end_y', 'outcome_name', 'shot_statsbomb_xg']\nfor match_id in match_ids:\n # get carries/ passes/ shots\n event = parser.event(match_id)[0] # get the first dataframe (events) which has index = 0\n event = event.loc[event.type_name.isin(['Carry', 'Shot', 'Pass']), cols].copy()\n\n # boolean columns for working out probabilities\n event['goal'] = event['outcome_name'] == 'Goal'\n event['shoot'] = event['type_name'] == 'Shot'\n event['move'] = event['type_name'] != 'Shot'\n all_events_df.append(event)\nevent = pd.concat(all_events_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bin the data\nHere we calculate, for each grid cell, the probability of a shot,\na move (pass or carry), and a goal (given a shot).\nAveraging the boolean columns (True = 1, False = 0) gives\na probability between zero and one.\n\n"
]
},
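{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what these averages compute, the three per-cell probabilities can be written as ratios of event counts. The symbols $s$, $m$, $g$, and $N$ are labels chosen here for illustration and do not appear in the code: $N^{\\mathrm{shot}}_{x,y}$, $N^{\\mathrm{move}}_{x,y}$, and $N^{\\mathrm{goal}}_{x,y}$ are the numbers of shots, moves, and goals that start in grid cell $(x, y)$.\n\n$$s_{x,y} = \\frac{N^{\\mathrm{shot}}_{x,y}}{N^{\\mathrm{shot}}_{x,y} + N^{\\mathrm{move}}_{x,y}}, \\qquad m_{x,y} = 1 - s_{x,y}, \\qquad g_{x,y} = \\frac{N^{\\mathrm{goal}}_{x,y}}{N^{\\mathrm{shot}}_{x,y}}$$\n\nHere $s$ corresponds to `shot_probability`, $m$ to `move_probability`, and $g$ to `goal_probability`. Cells with no events (or no shots, for $g$) have an undefined ratio, which the binned statistic reports as NaN.\n\n"
]
},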
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"shot_probability = pitch.bin_statistic(event['x'], event['y'], values=event['shoot'],\n statistic='mean', bins=bins)\nmove_probability = pitch.bin_statistic(event['x'], event['y'], values=event['move'],\n statistic='mean', bins=bins)\ngoal_probability = pitch.bin_statistic(event.loc[event['shoot'], 'x'],\n event.loc[event['shoot'], 'y'],\n event.loc[event['shoot'], 'goal'],\n statistic='mean', bins=bins)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot shot probability\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fig, ax = pitch.draw()\nshot_heatmap = pitch.heatmap(shot_probability, ax=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot move probability\nWe only consider moves and shots, so the move probability is the mirror of the shot probability.\nThe shot_probability + move_probability adds up to one for each grid cell,\nas we assume only these two event types occur when in possession.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fig, ax = pitch.draw()\nmove_heatmap = pitch.heatmap(move_probability, ax=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot goal probability\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"fig, ax = pitch.draw()\ngoal_heatmap = pitch.heatmap(goal_probability, ax=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculate the move transition matrix\nThe move transition matrix takes into account the success probability of carrying\nout the transitions. It is the probability of moving the ball successfully from one grid\ncell to another grid cell.\n\n"
]
},
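{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to write this definition down, using notation introduced here for illustration only (the symbols $T$ and $N$ are not names from the code):\n\n$$T_{(x,y) \\rightarrow (z,w)} = \\frac{N^{\\mathrm{success}}_{(x,y) \\rightarrow (z,w)}}{N^{\\mathrm{move}}_{x,y}}$$\n\nwhere $N^{\\mathrm{success}}_{(x,y) \\rightarrow (z,w)}$ is the number of successful moves that start in grid cell $(x, y)$ and end in grid cell $(z, w)$, and $N^{\\mathrm{move}}_{x,y}$ is the number of all moves, successful or not, that start in $(x, y)$. Because unsuccessful moves count in the denominator but not in the numerator, the entries for a given starting cell sum to at most one rather than exactly one.\n\n"
]
},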
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# get a dataframe of move events and filter it\n# so the dataframe only contains actions inside the pitch.\nmove = event[event['move']].copy()\nbin_start_locations = pitch.bin_statistic(move['x'], move['y'], bins=bins)\nmove = move[bin_start_locations['inside']].copy()\n\n# get the successful moves, which filters out the events that ended outside the pitch\n# or were not successful (a successful move has a null outcome_name)\nbin_end_locations = pitch.bin_statistic(move['end_x'], move['end_y'], bins=bins)\nmove_success = move[(bin_end_locations['inside']) & (move['outcome_name'].isnull())].copy()\n\n# get a dataframe of the successful moves\n# and the grid cells they started and ended in\nbin_success_start = pitch.bin_statistic(move_success['x'], move_success['y'], bins=bins)\nbin_success_end = pitch.bin_statistic(move_success['end_x'], move_success['end_y'], bins=bins)\ndf_bin = pd.DataFrame({'x': bin_success_start['binnumber'][0],\n 'y': bin_success_start['binnumber'][1],\n 'end_x': bin_success_end['binnumber'][0],\n 'end_y': bin_success_end['binnumber'][1]})\n\n# calculate the bin counts for the successful moves, i.e. the number of moves between grid cells\nbin_counts = df_bin.value_counts().reset_index(name='bin_counts')\n\n# create the move_transition_matrix of shape (num_y_bins, num_x_bins, num_y_bins, num_x_bins)\n# this is the number of successful moves between grid cells.\nnum_y, num_x = shot_probability['statistic'].shape\nmove_transition_matrix = np.zeros((num_y, num_x, num_y, num_x))\nmove_transition_matrix[bin_counts['y'], bin_counts['x'],\n bin_counts['end_y'], bin_counts['end_x']] = bin_counts.bin_counts.values\n\n# and divide by the starting locations for all moves (including unsuccessful)\n# to get the probability of moving the ball successfully between grid cells\nbin_start_locations = pitch.bin_statistic(move['x'], move['y'], bins=bins)\nbin_start_locations = np.expand_dims(bin_start_locations['statistic'], (2, 3))\nmove_transition_matrix = np.divide(move_transition_matrix,\n bin_start_locations,\n out=np.zeros_like(move_transition_matrix),\n where=bin_start_locations != 0,\n )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get the matrices\nGet the matrices from the dictionaries and turn NaNs into zeros.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"move_transition_matrix = np.nan_to_num(move_transition_matrix)\nshot_probability_matrix = np.nan_to_num(shot_probability['statistic'])\nmove_probability_matrix = np.nan_to_num(move_probability['statistic'])\ngoal_probability_matrix = np.nan_to_num(goal_probability['statistic'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculate xT\nCalculate xT until convergence. Initially the expected threat is set to the shot probability\nmultiplied by the goal probability. This means the expected value\nin the first step is the probability that the player in the grid cell shoots and scores.\n\n"
]
},
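{
"cell_type": "markdown",
"metadata": {},
"source": [
"The update applied at each iteration can be written out explicitly. This is a sketch in notation chosen here for illustration (the symbols $s$, $g$, $m$, and $T$ are labels for the matrices computed above, not names from the code):\n\n$$\\mathrm{xT}_{x,y} = s_{x,y} \\, g_{x,y} + m_{x,y} \\sum_{z=1}^{16} \\sum_{w=1}^{12} T_{(x,y) \\rightarrow (z,w)} \\, \\mathrm{xT}_{z,w}$$\n\nHere $s$ is `shot_probability_matrix`, $g$ is `goal_probability_matrix`, $m$ is `move_probability_matrix`, and $T$ is `move_transition_matrix`. The first term is the threat from shooting immediately. The second term is the threat from moving the ball: the xT of each possible destination cell $(z, w)$, weighted by the probability of successfully moving there. Note that the arrays in the code are indexed in `[y, x]` order, so the sum runs over the last two axes of `move_transition_matrix`.\n\n"
]
},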
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"xt = np.multiply(shot_probability_matrix, goal_probability_matrix)\ndiff = 1\niteration = 0\nwhile np.any(diff > 0.00001): # iterate until the differences between the old and new xT is small\n xt_copy = xt.copy() # keep a copy for comparing the differences\n # calculate the new expected threat\n xt = (np.multiply(shot_probability_matrix, goal_probability_matrix) +\n np.multiply(move_probability_matrix,\n np.multiply(move_transition_matrix, np.expand_dims(xt, axis=(0, 1))).sum(\n axis=(2, 3)))\n )\n diff = (xt - xt_copy)\n iteration += 1\nprint('Number of iterations:', iteration)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot xT grid\nPlot the converged xT values on the pitch, labelling each grid cell with its value.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"path_eff = [path_effects.Stroke(linewidth=1.5, foreground='black'),\n path_effects.Normal()]\n# new bin statistic for plotting xt only\nfor_plotting = pitch.bin_statistic(event['x'], event['y'], bins=bins)\nfor_plotting['statistic'] = xt\nfig, ax = pitch.draw(figsize=(14, 9.625))\n_ = pitch.heatmap(for_plotting, ax=ax)\n_ = pitch.label_heatmap(for_plotting, ax=ax, str_format='{:.2%}',\n color='white', fontsize=14, va='center', ha='center',\n path_effects=path_eff)\n# sphinx_gallery_thumbnail_path = 'gallery/tutorials/images/sphx_glr_plot_xt_004'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scoring events\nWe score each successful move as the additional expected threat gained from\nmoving from one grid cell to another grid cell.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# first get grid start and end cells\ngrid_start = pitch.bin_statistic(move_success.x, move_success.y, bins=bins)\ngrid_end = pitch.bin_statistic(move_success.end_x, move_success.end_y, bins=bins)\n\n# then get the xT values from the start and end grid cell\nstart_xt = xt[grid_start['binnumber'][1], grid_start['binnumber'][0]]\nend_xt = xt[grid_end['binnumber'][1], grid_end['binnumber'][0]]\n\n# then calculate the added xT\nadded_xt = end_xt - start_xt\nmove_success['xt'] = added_xt\n\n# show players with top 5 total expected threat\nmove_success.groupby('player_name')['xt'].sum().sort_values(ascending=False).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improvements\nNow that we have a simple model, the next step is to try to improve it:\nsee the `sphx_glr_gallery_tutorials_plot_xt_improvements.py` tutorial.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.show() # If you are using a Jupyter notebook you do not need this line"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 0
}