.. currentmodule:: socceraction.data.opta

=========================
Loading Opta data
=========================

`Opta's event stream data`_ comes in many different flavours. The
:class:`OptaLoader` class provides an API client that loads data from the
following feeds as pandas DataFrames:

- Opta F1, F9 and F24 JSON feeds
- Opta F7 and F24 XML feeds
- StatsPerform MA1 and MA3 JSON feeds
- WhoScored.com JSON data

Currently, only loading data from local files is supported.

--------------------------
Connecting to a data store
--------------------------

First, you have to create an :class:`OptaLoader` object and configure it
for the data feeds you want to use.

Generic setup
=============

To set up an :class:`OptaLoader`, you have to specify the root
directory, the filename hierarchy of the feeds, and a parser for each feed.
For example::

  from socceraction.data.opta import OptaLoader, parsers

  api = OptaLoader(
      root="data/opta",
      feeds={
          "f7": "f7-{competition_id}-{season_id}-{game_id}.xml",
          "f24": "f24-{competition_id}-{season_id}-{game_id}.xml",
      },
      parser={
          "f7": parsers.F7XMLParser,
          "f24": parsers.F24XMLParser,
      },
  )


Since the loader uses the directory structure and file names to determine
which files should be parsed, the root directory must follow the file
hierarchy defined in the ``feeds`` argument. A wide range of file names
and directory structures is supported. However, the file paths must include
the competition, season, and game identifiers so that the loader can locate
the files for each entity. For example, you might have grouped feeds
by competition and season as follows::

  root
  ├── competition_<competition_id>
  │   ├── season_<season_id>
  │   │   ├── f7_<game_id>.xml
  │   │   └── f24_<game_id>.xml
  │   └── ...
  └── ...

In this case, you can use the following feeds configuration::

    feeds = {
        "f7": "competition_{competition_id}/season_{season_id}/f7_{game_id}.xml",
        "f24": "competition_{competition_id}/season_{season_id}/f24_{game_id}.xml",
    }

.. note::

   On Windows, the backslash character should be used as a path separator.
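
As a rough illustration of how such a template maps to a concrete file, the
placeholders can be filled in with Python's ``str.format``. This is only
a sketch of the idea, not the loader's actual implementation:
``resolve_feed_path`` is a hypothetical helper, and the identifiers passed to
it are made-up values.

.. code-block:: python

  from pathlib import Path


  def resolve_feed_path(root, template, **ids):
      """Fill the placeholders of a feed template and join it to the root.

      Hypothetical helper for illustration; not part of socceraction.
      """
      return Path(root) / template.format(**ids)


  feeds = {
      "f7": "competition_{competition_id}/season_{season_id}/f7_{game_id}.xml",
      "f24": "competition_{competition_id}/season_{season_id}/f24_{game_id}.xml",
  }

  # Made-up identifiers, used only to show the substitution.
  path = resolve_feed_path(
      "data/opta", feeds["f24"], competition_id=8, season_id=2021, game_id=123456
  )

  # as_posix() renders the path with forward slashes on every platform.
  print(path.as_posix())
  # data/opta/competition_8/season_2021/f24_123456.xml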

Furthermore, a few standard configurations are provided. These are listed below.


Opta F7 and F24 XML feeds
=========================

.. code-block:: python

  from socceraction.data.opta import OptaLoader

  api = OptaLoader(root="data/opta", parser="xml")

The root directory should have the following structure:

.. code-block::

  root
  ├── f7-{competition_id}-{season_id}.xml
  ├── f24-{competition_id}-{season_id}-{game_id}.xml
  └── ...


Opta F1, F9 and F24 JSON feeds
==============================

.. code-block:: python

  from socceraction.data.opta import OptaLoader

  api = OptaLoader(root="data/opta", parser="json")

The root directory should have the following structure:

.. code-block::

  root
  ├── f1-{competition_id}-{season_id}.json
  ├── f9-{competition_id}-{season_id}.json
  ├── f24-{competition_id}-{season_id}-{game_id}.json
  └── ...

StatsPerform MA1 and MA3 JSON feeds
===================================

.. code-block:: python

  from socceraction.data.opta import OptaLoader

  api = OptaLoader(root="data/statsperform", parser="statsperform")

The root directory should have the following structure:

.. code-block::

  root
  ├── ma1-{competition_id}-{season_id}.json
  ├── ma3-{competition_id}-{season_id}-{game_id}.json
  └── ...


WhoScored
=========

`WhoScored.com`_ is a popular website that provides detailed live match statistics.
These statistics are compiled from Opta's event feed, which can be scraped
from the website's source code using a library such as `soccerdata`_. Once you
have downloaded the raw JSON data, you can parse it using the :class:`OptaLoader`
with:

.. code-block:: python

  from socceraction.data.opta import OptaLoader

  api = OptaLoader(root="data/whoscored", parser="whoscored")

The root directory should have the following structure:

.. code-block::

  root
  ├── {competition_id}-{season_id}-{game_id}.json
  └── ...


Alternatively, the soccerdata library provides a wrapper that immediately
returns an :class:`OptaLoader` object for a scraped dataset.

.. code-block:: python

  import soccerdata as sd

  # Set up a scraper for the 2021/2022 Premier League season
  ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021)
  # Scrape all games and return an OptaLoader object
  api = ws.read_events(output_fmt='loader')


.. warning::

   Scraping data from WhoScored.com violates the site's terms of service, so
   the legal status of scraped data is a grey area. If you decide to use this
   data anyway, you do so at your own responsibility.


------------
Loading data
------------

Next, you can load the match event stream data and metadata by calling the
corresponding methods on the :class:`OptaLoader` object.

- :func:`OptaLoader.competitions()`
- :func:`OptaLoader.games()`
- :func:`OptaLoader.teams()`
- :func:`OptaLoader.players()`
- :func:`OptaLoader.events()`
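
The methods are typically chained: list the games of a competition season,
then load the events of each game. The sketch below assumes that
``games()`` takes a competition and a season identifier, that its result has
a ``game_id`` column, and that ``events()`` takes a game identifier.
``load_season`` is a hypothetical helper, not part of the library.

.. code-block:: python

  def load_season(api, competition_id, season_id):
      """Collect the events of every game in one competition season.

      ``api`` is assumed to be a configured OptaLoader, or any object that
      exposes the same ``games`` and ``events`` methods.
      """
      # One row per game; the ``game_id`` column name is an assumption.
      games = api.games(competition_id, season_id)

      # Map each game identifier to the events loaded for that game.
      events_per_game = {}
      for game_id in games["game_id"]:
          events_per_game[game_id] = api.events(game_id)
      return games, events_per_game

Given a loader ``api`` created as described above, calling
``load_season(api, competition_id, season_id)`` returns the games together
with a dictionary of events keyed by game identifier.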

.. _Opta's event stream data: https://www.statsperform.com/opta-event-definitions/
.. _soccerdata: https://soccerdata.readthedocs.io/en/latest/datasources/WhoScored.html
.. _WhoScored.com: https://www.whoscored.com/