Task

Tell a data story of British traffic accidents ending with death or serious injury (“road casualties”) in the past five years, based on the authentic data from the British government’s website data.gov.uk.

Introduction

In this exercise, you are going to work with authentic data “in the wild”. Fortunately, these data sets are quite neat. Also, the site provides a metadata file (guide.xlsx). It seems to have been created for traffic data safety data sets produced by the Department of Transport, and it is not clear how well it corresponds to the casualty data sets. It may be outdated or just incomplete. This is very often the case in open data provided by public administration.

This real-world exercise will: - make you familiar with the RNotebook format - RMarkdown. You can comfortably write a long text in it as well as embed code chunks. With RNotebooks you can generate nice reports that you can directly export to html, pdf, or even MS Word. For a quick reference to Markdown formatting, go to RStudio Help > Markdown Quick Reference (which opens in the RStudio Help pane) or to https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.pdf.

library(readr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Road Safety Data (Department of Transport)

https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data

Road casualties 2019

rc_2019 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2019.csv", show_col_types = FALSE)
rc_2020 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2020.csv", show_col_types = FALSE)
rc_2021 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2021.csv", show_col_types = FALSE)
rc_2022 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2022.csv", show_col_types = FALSE)
rc_2023 <- read_csv("https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2023.csv", show_col_types = FALSE)
Error in (function (con, what, n = 1L, size = NA_integer_, signed = TRUE,  : 
  Failure when receiving data from the peer

All casualties

Make labels comprehensible

Here is an excel spreadsheet with some metadata. It obviously contains many more statistics.

Find values of the field name column that also occur in our data (all) Use the base R intersect

Select interesting columns

Select a few labels from the guide that you find interesting

How to check that many tables have identical columns (including their order)

By pairs

The base R way, a bit of manual work but good enough. Take the vectors of column names of pairs of tables and call == or all.equal(). Note that the all.equal function also compares only two vectors, although its name tempts us to think otherwise.

Create a data frame of column names

Read the data

LS0tDQp0aXRsZTogIlJvYWQgY2FzdWFsdGllcyBpbiB0aGUgVUsgKGRhdGEuZ292LnVrKSINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCiMgVGFzaw0KVGVsbCBhIGRhdGEgc3Rvcnkgb2YgQnJpdGlzaCB0cmFmZmljIGFjY2lkZW50cyBlbmRpbmcgd2l0aCBkZWF0aCBvciBzZXJpb3VzIGluanVyeSAoInJvYWQgY2FzdWFsdGllcyIpIGluIHRoZSBwYXN0IGZpdmUgeWVhcnMsIGJhc2VkIG9uIHRoZSBhdXRoZW50aWMgZGF0YSBmcm9tIHRoZSBCcml0aXNoIGdvdmVybm1lbnQncyB3ZWJzaXRlIGBkYXRhLmdvdi51a2AuICAgDQoNCg0KIyMgSW50cm9kdWN0aW9uDQpJbiB0aGlzIGV4ZXJjaXNlLCB5b3UgYXJlIGdvaW5nIHRvIHdvcmsgd2l0aCBhdXRoZW50aWMgZGF0YSAiaW4gdGhlIHdpbGQiLiANCkZvcnR1bmF0ZWx5LCB0aGVzZSBkYXRhIHNldHMgYXJlIHF1aXRlIG5lYXQuIEFsc28sIHRoZSBzaXRlIHByb3ZpZGVzIGEgbWV0YWRhdGEgZmlsZSAoYGd1aWRlLnhsc3hgKS4gDQpJdCBzZWVtcyB0byBoYXZlIGJlZW4gY3JlYXRlZCBmb3IgdHJhZmZpYyBkYXRhIHNhZmV0eSBkYXRhIHNldHMgcHJvZHVjZWQgYnkgdGhlIERlcGFydG1lbnQgb2YgVHJhbnNwb3J0LCBhbmQgaXQgaXMgbm90IGNsZWFyIGhvdyB3ZWxsIGl0IGNvcnJlc3BvbmRzIHRvIHRoZSBjYXN1YWx0eSBkYXRhIHNldHMuIEl0IG1heSBiZSBvdXRkYXRlZCBvciBqdXN0IGluY29tcGxldGUuIFRoaXMgaXMgdmVyeSBvZnRlbiB0aGUgY2FzZSBpbiBvcGVuIGRhdGEgcHJvdmlkZWQgYnkgcHVibGljIGFkbWluaXN0cmF0aW9uLiAgDQoNClRoaXMgcmVhbC13b3JsZCBleGVyY2lzZSB3aWxsOg0KLSBtYWtlIHlvdSBmYW1pbGlhciB3aXRoIHRoZSBSTm90ZWJvb2sgZm9ybWF0IC0gUk1hcmtkb3duLiBZb3UgY2FuIGNvbWZvcnRhYmx5IHdyaXRlIGEgbG9uZyB0ZXh0IGluIGl0IGFzIHdlbGwgYXMgZW1iZWQgY29kZSBjaHVua3MuIFdpdGggUk5vdGVib29rcyB5b3UgY2FuIGdlbmVyYXRlIG5pY2UgcmVwb3J0cyB0aGF0IHlvdSBjYW4gZGlyZWN0bHkgZXhwb3J0IHRvIGh0bWwsIHBkZiwgb3IgZXZlbiBNUyBXb3JkLiBGb3IgYSBxdWljayByZWZlcmVuY2UgdG8gTWFya2Rvd24gZm9ybWF0dGluZywgZ28gdG8gUlN0dWRpbyBIZWxwID4gTWFya2Rvd24gUXVpY2sgUmVmZXJlbmNlICh3aGljaCBvcGVucyBpbiB0aGUgUlN0dWRpbyBIZWxwIHBhbmUpIG9yIHRvIGh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9yc3R1ZGlvL2NoZWF0c2hlZXRzL21haW4vcm1hcmtkb3duLnBkZi4gDQoNCg0KDQoNCg0KDQoNCiMjIA0KDQpgYGB7cn0NCmxpYnJhcnkocmVhZHIpDQpsaWJyYXJ5KGRwbHlyKQ0KYGBgDQoNClJvYWQgU2FmZXR5IERhdGEgKERlcGFydG1lbnQgb2YgVHJhbnNwb3J0KQ0KDQoNCmh0dHBzOi8vd3d3LmRhdGEuZ292LnVrL2RhdGFzZXQvY2I3YWU2ZjAtNGJlNi00OTM1LTkyNzctNDdlNWNlMjRhMTFmL3JvYWQtc2FmZXR5LWRhdGENCg0KIyBSb2FkIGNhc3VhbHRpZXMgMjAxOQ0KYGBge3J9DQpyY18yMDE5IDwtIHJlYWRfY3N2KCJodHRwczovL2RhdGEuZGZ0Lmdvdi51ay9yb2FkLWFjY2lkZW50cy1zYWZldHktZGF0YS9kZnQtcm9hZC1jYXN1YWx0eS1zdGF0aXN0aWNzLWNhc3VhbHR5LTIwMTkuY3N2Iiwgc2hvd19jb2xfdHlwZXMgPSBGQUxTRSkNCmBgYA0KYGBge3J9DQpyY18yMDIwIDwtIHJlYWRfY3N2KCJodHRwczovL2RhdGEuZGZ0Lmdvdi51ay9yb2FkLWFjY2lkZW50cy1zYWZldHktZGF0YS9kZnQtcm9hZC1jYXN1YWx0eS1zdGF0aXN0aWNzLWNhc3VhbHR5LTIwMjAuY3N2Iiwgc2hvd19jb2xfdHlwZXMgPSBGQUxTRSkNCmBgYA0KDQpgYGB7cn0NCnJjXzIwMjEgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMS5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQpgYGB7cn0NCnJjXzIwMjIgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMi5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQpgYGB7cn0NCnJjXzIwMjMgPC0gcmVhZF9jc3YoImh0dHBzOi8vZGF0YS5kZnQuZ292LnVrL3JvYWQtYWNjaWRlbnRzLXNhZmV0eS1kYXRhL2RmdC1yb2FkLWNhc3VhbHR5LXN0YXRpc3RpY3MtY2FzdWFsdHktMjAyMy5jc3YiLCBzaG93X2NvbF90eXBlcyA9IEZBTFNFKQ0KYGBgDQoNCiMjIEFsbCBjYXN1YWx0aWVzDQoNCmBgYHtyfQ0KYWxsX3JjIDwtIGJpbmRfcm93cyhyY18yMDE5LCByY18yMDIwLCByY18yMDIxLCByY18yMDIyLCByY18yMDIzKQ0KI3N1bW1hcnkoYWxsKQ0KI3N0cihhbGwpDQpnbGltcHNlKGFsbF9yYykNCmBgYA0KIyMgTWFrZSBsYWJlbHMgY29tcHJlaGVuc2libGUNCkhlcmUgaXMgYW4gZXhjZWwgc3ByZWFkc2hlZXQgd2l0aCBzb21lIG1ldGFkYXRhLiBJdCBvYnZpb3VzbHkgY29udGFpbnMgDQptYW55IG1vcmUgc3RhdGlzdGljcy4gDQoNCmBgYHtyfQ0KZG93bmxvYWQuZmlsZSh1cmwgPSAiaHR0cHM6Ly9kYXRhLmRmdC5nb3YudWsvcm9hZC1hY2NpZGVudHMtc2FmZXR5LWRhdGEvZGZ0LXJvYWQtY2FzdWFsdHktc3RhdGlzdGljcy1yb2FkLXNhZmV0eS1vcGVuLWRhdGFzZXQtZGF0YS1ndWlkZS0yMDIzLnhsc3giLCANCiAgICAgICAgICAgICAgZGVzdGZpbGUgPSAiZ3VpZGUueGxzeCIpDQpsaWJyYXJ5KHJlYWR4bCkgDQpndWlkZSA8LSByZWFkX3hsc3gocGF0aCA9ICJndWlkZS54bHN4Iiwgc2hlZXQ9MSkgIyB0aGlzIGZ1bmN0aW9uIHJlYWRzIG9ubHkgbG9jYWwgZmlsZXMsIG5vIHVybHMNCmdsaW1wc2UoZ3VpZGUpDQpgYGANCkZpbmQgdmFsdWVzIG9mIHRoZSBgZmllbGQgbmFtZWAgY29sdW1uIHRoYXQgYWxzbyBvY2N1ciBpbiBvdXIgZGF0YSAoYGFsbGApIA0KVXNlIHRoZSBiYXNlIFIgYGludGVyc2VjdGANCmBgYHtyfQ0KaW50ZXJzZWN0KHggPSBjb2xuYW1lcyhhbGwpLCB5ID0gZ3VpZGUkYGZpZWxkIG5hbWVgKQ0KYGBgDQojIyMgU2VsZWN0IGludGVyZXN0aW5nIGNvbHVtbnMNClNlbGVjdCBhIGZldyBsYWJlbHMgZnJvbSB0aGUgZ3VpZGUgdGhhdCB5b3UgZmluZCBpbnRlcmVzdGluZw0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCiMgSG93IHRvIGNoZWNrIHRoYXQgbWFueSB0YWJsZXMgaGF2ZSBpZGVudGljYWwgY29sdW1ucyAoaW5jbHVkaW5nIHRoZWlyIG9yZGVyKQ0KIyMjIEJ5IHBhaXJzDQpUaGUgYmFzZSBSIHdheSwgYSBiaXQgb2YgbWFudWFsIHdvcmsgYnV0IGdvb2QgZW5vdWdoLiANClRha2UgdGhlIHZlY3RvcnMgb2YgY29sdW1uIG5hbWVzIG9mIHBhaXJzIG9mIHRhYmxlcyBhbmQgY2FsbCBgPT1gIG9yIGBhbGwuZXF1YWwoKWAuIE5vdGUgdGhhdCB0aGUgYGFsbC5lcXVhbGAgZnVuY3Rpb24gYWxzbyBjb21wYXJlcyBvbmx5IHR3byB2ZWN0b3JzLCBhbHRob3VnaCBpdHMgbmFtZSB0ZW1wdHMgdXMgdG8gdGhpbmsgb3RoZXJ3aXNlLiANCmBgYHtyfQ0KY29sbmFtZXMocmNfMjAxOSkgPT0gY29sbmFtZXMocmNfMjAyMCkNCmBgYA0KYGBge3J9DQphbGwuZXF1YWwoY29sbmFtZXMocmNfMjAxOSksIGNvbG5hbWVzKHJjXzIwMjEpKQ0KYGBgDQojIyMgQ3JlYXRlIGEgZGF0YSBmcmFtZSBvZiBjb2x1bW4gbmFtZXMgDQoNCg0KDQpgYGB7cn0NCmRhdGEuZnJhbWUoY29sbmFtZXMocmNfMjAxOSksDQogICAgICAgICAgIGNvbG5hbWVzKHJjXzIwMjApLCANCiAgICAgICAgICAgY29sbmFtZXMocmNfMjAyMSksIA0KICAgICAgICAgICBjb2xuYW1lcyhyY18yMDIyKSwNCiAgICAgICAgICAgY29sbmFtZXMocmNfMjAyMykpICU+JSANCiAgcm93d2lzZSgpICU+JQ0KICBtdXRhdGUoaG93X21hbnlfZGlzdGluY3QgPQ0KICAgICAgICAgICBuX2Rpc3RpbmN0KHBpY2soZXZlcnl0aGluZygpKSkpICU+JSANCiAgdW5ncm91cCgpICU+JSANCiAgcmVsb2NhdGUoaG93X21hbnlfZGlzdGluY3QpICU+JQ0KICBhcnJhbmdlKGRlc2MoaG93X21hbnlfZGlzdGluY3QpKQ0KICAgICAgICAgICAgICAgICAgICAgICAgICANCg0KYGBgDQoNCg0KDQoNCg0KIyMgUmVhZCB0aGUgZGF0YQ0K