Task
Tell a data story of British traffic accidents ending with death or
serious injury (“road casualties”) in the past five years, based on the
authentic data from the British government’s website
data.gov.uk
.
Introduction
In this exercise, you are going to work with authentic data “in the
wild”. Fortunately, these data sets are quite neat. Also, the site
provides a metadata file (guide.xlsx
). It seems to have
been created for traffic data safety data sets produced by the
Department of Transport, and it is not clear how well it corresponds to
the casualty data sets. It may be outdated or just incomplete. This is
very often the case in open data provided by public administration.
This real-world exercise will: - make you familiar with the RNotebook
format - RMarkdown. You can comfortably write a long text in it as well
as embed code chunks. With RNotebooks you can generate nice reports that
you can directly export to html, pdf, or even MS Word. For a quick
reference to Markdown formatting, go to RStudio Help > Markdown Quick
Reference (which opens in the RStudio Help pane) or to https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.pdf.
Road casualties 2019
rc_2019 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2019.csv", show_col_types = FALSE)
rc_2020 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2020.csv", show_col_types = FALSE)
rc_2021 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2021.csv", show_col_types = FALSE)
rc_2022 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2022.csv", show_col_types = FALSE)
rc_2023 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2023.csv", show_col_types = FALSE)
Error in (function (con, what, n = 1L, size = NA_integer_, signed = TRUE, :
Failure when receiving data from the peer
All casualties
Make labels comprehensible
Here is an excel spreadsheet with some metadata. It obviously
contains many more statistics.
Find values of the field name
column that also occur in
our data (all
) Use the base R intersect
Select interesting columns
Select a few labels from the guide that you find interesting
How to check that many tables have identical columns (including
their order)
By pairs
The base R way, a bit of manual work but good enough. Take the
vectors of column names of pairs of tables and call ==
or
all.equal()
. Note that the all.equal
function
also compares only two vectors, although its name tempts us to think
otherwise.
Create a data frame of column names
Read the data
LS0tDQp0aXRsZTogIlJvYWQgY2FzdWFsdGllcyBpbiB0aGUgVUsgKGRhdGEuZ292LnVrKSINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCiMgVGFzaw0KVGVsbCBhIGRhdGEgc3Rvcnkgb2YgQnJpdGlzaCB0cmFmZmljIGFjY2lkZW50cyBlbmRpbmcgd2l0aCBkZWF0aCBvciBzZXJpb3VzIGluanVyeSAoInJvYWQgY2FzdWFsdGllcyIpIGluIHRoZSBwYXN0IGZpdmUgeWVhcnMsIGJhc2VkIG9uIHRoZSBhdXRoZW50aWMgZGF0YSBmcm9tIHRoZSBCcml0aXNoIGdvdmVybm1lbnQncyB3ZWJzaXRlIGBkYXRhLmdvdi51a2AuICAgDQoNCg0KIyMgSW50cm9kdWN0aW9uDQpJbiB0aGlzIGV4ZXJjaXNlLCB5b3UgYXJlIGdvaW5nIHRvIHdvcmsgd2l0aCBhdXRoZW50aWMgZGF0YSAiaW4gdGhlIHdpbGQiLiANCkZvcnR1bmF0ZWx5LCB0aGVzZSBkYXRhIHNldHMgYXJlIHF1aXRlIG5lYXQuIEFsc28sIHRoZSBzaXRlIHByb3ZpZGVzIGEgbWV0YWRhdGEgZmlsZSAoYGd1aWRlLnhsc3hgKS4gDQpJdCBzZWVtcyB0byBoYXZlIGJlZW4gY3JlYXRlZCBmb3IgdHJhZmZpYyBkYXRhIHNhZmV0eSBkYXRhIHNldHMgcHJvZHVjZWQgYnkgdGhlIERlcGFydG1lbnQgb2YgVHJhbnNwb3J0LCBhbmQgaXQgaXMgbm90IGNsZWFyIGhvdyB3ZWxsIGl0IGNvcnJlc3BvbmRzIHRvIHRoZSBjYXN1YWx0eSBkYXRhIHNldHMuIEl0IG1heSBiZSBvdXRkYXRlZCBvciBqdXN0IGluY29tcGxldGUuIFRoaXMgaXMgdmVyeSBvZnRlbiB0aGUgY2FzZSBpbiBvcGVuIGRhdGEgcHJvdmlkZWQgYnkgcHVibGljIGFkbWluaXN0cmF0aW9uLiAgDQoNClRoaXMgcmVhbC13b3JsZCBleGVyY2lzZSB3aWxsOg0KLSBtYWtlIHlvdSBmYW1pbGlhciB3aXRoIHRoZSBSTm90ZWJvb2sgZm9ybWF0IC0gUk1hcmtkb3duLiBZb3UgY2FuIGNvbWZvcnRhYmx5IHdyaXRlIGEgbG9uZyB0ZXh0IGluIGl0IGFzIHdlbGwgYXMgZW1iZWQgY29kZSBjaHVua3MuIFdpdGggUk5vdGVib29rcyB5b3UgY2FuIGdlbmVyYXRlIG5pY2UgcmVwb3J0cyB0aGF0IHlvdSBjYW4gZGlyZWN0bHkgZXhwb3J0IHRvIGh0bWwsIHBkZiwgb3IgZXZlbiBNUyBXb3JkLiBGb3IgYSBxdWljayByZWZlcmVuY2UgdG8gTWFya2Rvd24gZm9ybWF0dGluZywgZ28gdG8gUlN0dWRpbyBIZWxwID4gTWFya2Rvd24gUXVpY2sgUmVmZXJlbmNlICh3aGljaCBvcGVucyBpbiB0aGUgUlN0dWRpbyBIZWxwIHBhbmUpIG9yIHRvIGh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9yc3R1ZGlvL2NoZWF0c2hlZXRzL21haW4vcm1hcmtkb3duLnBkZi4gDQoNCg0KDQoNCg0KDQoNCiMjIA0KDQpgYGB7cn0NCmxpYnJhcnkocmVhZHIpDQpsaWJyYXJ5KGRwbHlyKQ0KYGBgDQoNClJvYWQgU2FmZXR5IERhdGEgKERlcGFydG1lbnQgb2YgVHJhbnNwb3J0KQ0KDQoNCmh0dHBzOi8vd3d3LmRhdGEuZ292LnVrL2RhdGFzZXQvY2I3YWU2ZjAtNGJlNi00OTM1LTkyNzctNDdlNWNlMjRhMTFmL3JvYWQtc2FmZXR5LWRhdGENCg0KIyBSb2FkIGNhc3VhbHRpZXMgMjAxOQ0KYGBge3J9DQpyY18yMDE5IDwtIHJlYWRfY3N2KCJodHRwczovL2RhdGEuZGZ0Lmdvdi51ay9yb2FkLWFjY2lkZW50cy1zYWZldHktZGF0YS9kZnQtcm9hZC1jYXN1YWx0eS1zdGF0aXN0aWNzLWNhc3VhbHR5LTIwMTkuY3N2Iiwgc2hvd19jb2xfdHlwZXMgPSBGQUxTRSkNCmBgYA0KYGBge3J9DQpyY18yMDIwIDwtIHJlYWRfY3N2KCJodHRwczovL2RhdGEuZGZ0Lmdvdi51ay9yb2FkLWFjY2lkZW50cy1zYWZldHktZGF0YS9kZnQtcm9hZC1jYXN1YWx0eS1zdGF0aXN0aWNzLWNhc3VhbHR5LTIwMjAuY3N2Iiwgc2hvd19jb2xfdHlwZXMgPSBGQUxTRSkNCmBgYA0KDQpgYGB7cn0NCnJjXzIwMjEgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMS5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQpgYGB7cn0NCnJjXzIwMjIgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMi5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQpgYGB7cn0NCnJjXzIwMjMgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMy5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQoNCiMjIEFsbCBjYXN1YWx0aWVzDQoNCmBgYHtyfQ0KYWxsX3JjIDwtIGJpbmRfcm93cyhyY18yMDE5LCByY18yMDIwLCByY18yMDIxLCByY18yMDIyLCByY18yMDIzKQ0KI3N1bW1hcnkoYWxsKQ0KI3N0cihhbGwpDQpnbGltcHNlKGFsbF9yYykNCmBgYA0KIyMgTWFrZSBsYWJlbHMgY29tcHJlaGVuc2libGUNCkhlcmUgaXMgYW4gZXhjZWwgc3ByZWFkc2hlZXQgd2l0aCBzb21lIG1ldGFkYXRhLiBJdCBvYnZpb3VzbHkgY29udGFpbnMgDQptYW55IG1vcmUgc3RhdGlzdGljcy4gDQoNCmBgYHtyfQ0KZG93bmxvYWQuZmlsZSh1cmwgPSAiaHR0cHM6Ly9kYXRhLmRmdC5nb3YudWsvcm9hZC1hY2NpZGVudHMtc2FmZXR5LWRhdGEvZGZ0LXJvYWQtY2FzdWFsdHktc3RhdGlzdGljcy1yb2FkLXNhZmV0eS1vcGVuLWRhdGFzZXQtZGF0YS1ndWlkZS0yMDIzLnhsc3giLCANCiAgICAgICAgICAgICAgZGVzdGZpbGUgPSAiZ3VpZGUueGxzeCIpDQpsaWJyYXJ5KHJlYWR4bCkgDQpndWlkZSA8LSByZWFkX3hsc3gocGF0aCA9ICJndWlkZS54bHN4Iiwgc2hlZXQ9MSkgIyB0aGlzIGZ1bmN0aW9uIHJlYWRzIG9ubHkgbG9jYWwgZmlsZXMsIG5vIHVybHMNCmdsaW1wc2UoZ3VpZGUpDQpgYGANCkZpbmQgdmFsdWVzIG9mIHRoZSBgZmllbGQgbmFtZWAgY29sdW1uIHRoYXQgYWxzbyBvY2N1ciBpbiBvdXIgZGF0YSAoYGFsbGApIA0KVXNlIHRoZSBiYXNlIFIgYGludGVyc2VjdGANCmBgYHtyfQ0KaW50ZXJzZWN0KHggPSBjb2xuYW1lcyhhbGwpLCB5ID0gZ3VpZGUkYGZpZWxkIG5hbWVgKQ0KYGBgDQojIyMgU2VsZWN0IGludGVyZXN0aW5nIGNvbHVtbnMNClNlbGVjdCBhIGZldyBsYWJlbHMgZnJvbSB0aGUgZ3VpZGUgdGhhdCB5b3UgZmluZCBpbnRlcmVzdGluZw0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCiMgSG93IHRvIGNoZWNrIHRoYXQgbWFueSB0YWJsZXMgaGF2ZSBpZGVudGljYWwgY29sdW1ucyAoaW5jbHVkaW5nIHRoZWlyIG9yZGVyKQ0KIyMjIEJ5IHBhaXJzDQpUaGUgYmFzZSBSIHdheSwgYSBiaXQgb2YgbWFudWFsIHdvcmsgYnV0IGdvb2QgZW5vdWdoLiANClRha2UgdGhlIHZlY3RvcnMgb2YgY29sdW1uIG5hbWVzIG9mIHBhaXJzIG9mIHRhYmxlcyBhbmQgY2FsbCBgPT1gIG9yIGBhbGwuZXF1YWwoKWAuIE5vdGUgdGhhdCB0aGUgYGFsbC5lcXVhbGAgZnVuY3Rpb24gYWxzbyBjb21wYXJlcyBvbmx5IHR3byB2ZWN0b3JzLCBhbHRob3VnaCBpdHMgbmFtZSB0ZW1wdHMgdXMgdG8gdGhpbmsgb3RoZXJ3aXNlLiANCmBgYHtyfQ0KY29sbmFtZXMocmNfMjAxOSkgPT0gY29sbmFtZXMocmNfMjAyMCkNCmBgYA0KYGBge3J9DQphbGwuZXF1YWwoY29sbmFtZXMocmNfMjAxOSksIGNvbG5hbWVzKHJjXzIwMjEpKQ0KYGBgDQojIyMgQ3JlYXRlIGEgZGF0YSBmcmFtZSBvZiBjb2x1bW4gbmFtZXMgDQoNCg0KDQpgYGB7cn0NCmRhdGEuZnJhbWUoY29sbmFtZXMocmNfMjAxOSksDQogICAgICAgICAgIGNvbG5hbWVzKHJjXzIwMjApLCANCiAgICAgICAgICAgY29sbmFtZXMocmNfMjAyMSksIA0KICAgICAgICAgICBjb2xuYW1lcyhyY18yMDIyKSwNCiAgICAgICAgICAgY29sbmFtZXMocmNfMjAyMykpICU+JSANCiAgcm93d2lzZSgpICU+JQ0KICBtdXRhdGUoaG93X21hbnlfZGlzdGluY3QgPQ0KICAgICAgICAgICBuX2Rpc3RpbmN0KHBpY2soZXZlcnl0aGluZygpKSkpICU+JSANCiAgdW5ncm91cCgpICU+JSANCiAgcmVsb2NhdGUoaG93X21hbnlfZGlzdGluY3QpICU+JQ0KICBhcnJhbmdlKGRlc2MoaG93X21hbnlfZGlzdGluY3QpKQ0KICAgICAgICAgICAgICAgICAgICAgICAgICANCg0KYGBgDQoNCg0KDQoNCg0KIyMgUmVhZCB0aGUgZGF0YQ0K