NOTE: web update Nov 28, 2024. Next year, check the links again. They happened to update it in the middle of this lecture series… :-O

Introduction

This is a continuation of work with UK government data on Road Accidents. Here comes a brief summary. For details, please refer to the file data_wrangling_task.Rmd in the folder 2024-11-15and22.

library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
library(readr, warn.conflicts = FALSE, quietly = TRUE)
library(readxl, warn.conflicts = FALSE, quietly = TRUE)

Road Safety Data (Department of Transport)

This is the website that contains the casualty datasets.

https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data

This is a cleaned version of the source data. Each row corresponds to one person who was killed or injured in a traffic accident on British roads between 2019 and 2023.

casualties <- read_tsv("all_casualties_labeled.tsv")
Rows: 665408 Columns: 10
── Column specification ─────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (6): ID, accident_reference, casualty_class, casualty_type, sex_of_casualty, casualty_severity
dbl (4): rowid, accident_year, age_of_casualty, vehicle_reference

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(casualties)
Rows: 665,408
Columns: 10
$ ID                 <chr> "rc_2019", "rc_2019", "rc_2019", "rc_2019", "rc_2019", "rc_2019", "rc_2019",…
$ rowid              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ accident_reference <chr> "010128300", "010128300", "010128300", "010152270", "010155191", "010155192"…
$ accident_year      <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019…
$ age_of_casualty    <dbl> 58, -1, -1, 24, 21, 68, 47, 16, 20, 41, 25, 40, 24, 20, 25, 24, 28, 74, 34, …
$ vehicle_reference  <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
$ casualty_class     <chr> "Driver or rider", "Passenger", "Passenger", "Driver or rider", "Passenger",…
$ casualty_type      <chr> "Car occupant", "Car occupant", "Car occupant", "Car occupant", "Cyclist", "…
$ sex_of_casualty    <chr> "Male", "Female", "Female", "Female", "Female", "Male", "Female", "Female", …
$ casualty_severity  <chr> "Slight", "Slight", "Slight", "Slight", "Slight", "Serious", "Slight", "Slig…

Which labels do we have there and how are the casualties distributed within these labels?

Casualties by year

casualties %>% group_by(accident_year) %>% count()

Bar plot of casualties counts by year

Accidents including a casualty broken by years

Hint: Casualties are individual people. There could be more than one casualty in an accident. You need to de-duplicate rows with identical accident ids or aggregate the data frame accordingly.

Computing accidents (not casualties) broken by years

Plotting accidents broken by years

Casualties broken by years and casualty class

Casualty class says whether the casualty was the driver, passenger or pedestrian

Compute casualty_class broken by year

Plot casualties broken by years and casualty classes

Draw a bar plot. Play around with the position and fill of the bars.

Casualty severity

Is is the same every year? Facet the bar plot by year

Casualty’s age

Do the age distributions of the casualties differ by year? Create an aggregated table containing for each year the mean, median, 25th percentile, and 75th percentile of casualties’ ages.

Draw boxplots or violin plots for the individual years

Casualty’s age and casualty type

A question about how the data is labeled: how are casualty_class and casualty_type associated?

The worst accident - who were the casualties?

Find the worst accident (max count of casualties)

aggreg <- casualties %>% 
  group_by(accident_reference) %>% 
  count() %>% 
  ungroup()
glimpse(aggreg)
Rows: 519,549
Columns: 2
$ accident_reference <chr> "010128300", "010152270", "010155191", "010155192", "010155194", "010155195"…
$ n                  <int> 3, 1, 1, 1, 2, 3, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1…
max_reference <- aggreg %>% 
  slice_max(order_by = n, n = 1) %>% 
  pull(accident_reference)

Create a separate data frame of casualties from this accident and present as many relevant details about them as you can find.

worst_accident <- casualties %>% filter(accident_reference == max_reference)

Model the age and sex of casualties

What was the age and sex of the motorbike rider? We know that the person was on Vehicle 1

LS0tDQp0aXRsZTogIkJyaXRpc2ggQ2FzdWFsdGllcyBpbiB0cmFmZmljIGFjY2lkZW50cyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQo+IE5PVEU6IHdlYiB1cGRhdGUgTm92IDI4LCAyMDI0LiBOZXh0IHllYXIsIGNoZWNrIHRoZSBsaW5rcyBhZ2Fpbi4gVGhleSBoYXBwZW5lZCB0byB1cGRhdGUgIGl0IGluIHRoZSBtaWRkbGUgb2YgdGhpcyBsZWN0dXJlIHNlcmllcy4uLiA6LU8gDQoNCiMgSW50cm9kdWN0aW9uDQpUaGlzIGlzIGEgY29udGludWF0aW9uIG9mIHdvcmsgd2l0aCBVSyBnb3Zlcm5tZW50IGRhdGEgb24gUm9hZCBBY2NpZGVudHMuIA0KSGVyZSBjb21lcyBhIGJyaWVmIHN1bW1hcnkuIEZvciBkZXRhaWxzLCBwbGVhc2UgcmVmZXIgdG8gdGhlIGZpbGUgYGRhdGFfd3JhbmdsaW5nX3Rhc2suUm1kYCBpbiB0aGUgZm9sZGVyIGAyMDI0LTExLTE1YW5kMjJgLiANCg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyLCB3YXJuLmNvbmZsaWN0cyA9IEZBTFNFLCBxdWlldGx5ID0gVFJVRSkNCmxpYnJhcnkoZ2dwbG90Miwgd2Fybi5jb25mbGljdHMgPSBGQUxTRSwgcXVpZXRseSA9IFRSVUUpDQpsaWJyYXJ5KHJlYWRyLCB3YXJuLmNvbmZsaWN0cyA9IEZBTFNFLCBxdWlldGx5ID0gVFJVRSkNCmxpYnJhcnkocmVhZHhsLCB3YXJuLmNvbmZsaWN0cyA9IEZBTFNFLCBxdWlldGx5ID0gVFJVRSkNCmBgYA0KDQoNCiMjIFJvYWQgU2FmZXR5IERhdGEgKERlcGFydG1lbnQgb2YgVHJhbnNwb3J0KQ0KDQpUaGlzIGlzIHRoZSB3ZWJzaXRlIHRoYXQgY29udGFpbnMgdGhlIGNhc3VhbHR5IGRhdGFzZXRzLiANCg0KPGh0dHBzOi8vd3d3LmRhdGEuZ292LnVrL2RhdGFzZXQvY2I3YWU2ZjAtNGJlNi00OTM1LTkyNzctNDdlNWNlMjRhMTFmL3JvYWQtc2FmZXR5LWRhdGE+DQoNClRoaXMgaXMgYSBjbGVhbmVkIHZlcnNpb24gb2YgdGhlIHNvdXJjZSBkYXRhLiBFYWNoIHJvdyBjb3JyZXNwb25kcyB0byBvbmUgcGVyc29uIHdobyB3YXMga2lsbGVkIG9yIGluanVyZWQgaW4gYSB0cmFmZmljIGFjY2lkZW50IG9uIEJyaXRpc2ggcm9hZHMgYmV0d2VlbiAyMDE5IGFuZCAyMDIzLiANCg0KYGBge3J9DQpjYXN1YWx0aWVzIDwtIHJlYWRfdHN2KCJhbGxfY2FzdWFsdGllc19sYWJlbGVkLnRzdiIpDQpnbGltcHNlKGNhc3VhbHRpZXMpDQpgYGANCg0KV2hpY2ggbGFiZWxzIGRvIHdlIGhhdmUgdGhlcmUgYW5kIGhvdyBhcmUgdGhlIGNhc3VhbHRpZXMgZGlzdHJpYnV0ZWQgd2l0aGluIHRoZXNlIGxhYmVscz8NCg0KIyBDYXN1YWx0aWVzIGJ5IHllYXINCg0KYGBge3J9DQpjYXN1YWx0aWVzICU+JSBncm91cF9ieSguLi4uKSAlPiUgY291bnQoKQ0KYGBgDQpCYXIgcGxvdCBvZiBjYXN1YWx0aWVzIGNvdW50cyBieSB5ZWFyDQoNCmBgYHtyfQ0KY2FzdWFsdGllcyAlPiUgDQogIGdncGxvdChtYXBwaW5nID0gYWVzKHggPSBhY2NpZGVudF95ZWFyKSkgKyANCiAgZ2VvbV9iYXIoKSArIA0KICBzY2FsZV95X2NvbnRpbnVvdXMoYnJlYWtzID0gc2VxKDAsIDE2MDAwMCwgMjAwMDApLCANCiAgICAgICAgICAgICAgICAgICAgIG5hbWUgPSAiQ2FzdWFsdGllcyBjb3VudCIpICsNCiAgc2NhbGVfeF9jb250aW51b3VzKG5hbWUgPSAiQWNjaWRlbnQgeWVhciIpICsNCiAgZ2d0aXRsZShsYWJlbCA9ICJDYXN1YWx0aWVzIG9uIEJyaXRpc2ggcm9hZHMgYnJva2VuIGJ5IHllYXIiKQ0KYGBgDQoNCiMgQWNjaWRlbnRzIGluY2x1ZGluZyBhIGNhc3VhbHR5IGJyb2tlbiBieSB5ZWFycw0KSGludDogQ2FzdWFsdGllcyBhcmUgaW5kaXZpZHVhbCBwZW9wbGUuIFRoZXJlIGNvdWxkIGJlIG1vcmUgdGhhbiBvbmUgY2FzdWFsdHkgaW4gYW4gYWNjaWRlbnQuIA0KWW91IG5lZWQgdG8gZGUtZHVwbGljYXRlIHJvd3Mgd2l0aCBpZGVudGljYWwgYWNjaWRlbnQgaWRzIG9yIGFnZ3JlZ2F0ZSB0aGUgZGF0YSBmcmFtZSBhY2NvcmRpbmdseS4gDQoNCkNvbXB1dGluZyAqYWNjaWRlbnRzKiAobm90IGNhc3VhbHRpZXMpIGJyb2tlbiBieSB5ZWFycw0KDQpgYGB7cn0NCg0KYGBgDQoNClBsb3R0aW5nIGFjY2lkZW50cyBicm9rZW4gYnkgeWVhcnMNCg0KDQoNCiMgQ2FzdWFsdGllcyBicm9rZW4gYnkgeWVhcnMgYW5kIGNhc3VhbHR5IGNsYXNzIA0KQ2FzdWFsdHkgY2xhc3Mgc2F5cyB3aGV0aGVyIHRoZSBjYXN1YWx0eSB3YXMgdGhlIGRyaXZlciwgcGFzc2VuZ2VyIG9yIHBlZGVzdHJpYW4NCg0KQ29tcHV0ZSBgY2FzdWFsdHlfY2xhc3NgIGJyb2tlbiBieSB5ZWFyDQpgYGB7cn0NCmNhc3VhbHRpZXMgJT4lIGdyb3VwX2J5KCkgJT4lIGNvdW50KCkNCmBgYA0KUGxvdCBjYXN1YWx0aWVzIGJyb2tlbiBieSB5ZWFycyBhbmQgY2FzdWFsdHkgY2xhc3Nlcw0KDQpEcmF3IGEgYmFyIHBsb3QuIFBsYXkgYXJvdW5kIHdpdGggdGhlIHBvc2l0aW9uIGFuZCBmaWxsIG9mIHRoZSBiYXJzLiAgIA0KDQpgYGB7cn0NCg0KYGBgDQoNCg0KIyBDYXN1YWx0eSBzZXZlcml0eQ0KDQpgYGB7cn0NCmNhc3VhbHRpZXMgJT4lIGdyb3VwX2J5KGNhc3VhbHR5X3NldmVyaXR5KSAlPiUgY291bnQoKQ0KYGBgDQoNCmBgYHtyfQ0KY2FzdWFsdGllcyAlPiUgDQogIGdncGxvdChhZXMoeCA9IGNhc3VhbHR5X3NldmVyaXR5LCBmaWxsID0gY2FzdWFsdHlfY2xhc3MpKSArDQogIGdlb21fYmFyKHBvc2l0aW9uID0gImRvZGdlIikNCg0KYGBgDQoNCklzIGlzIHRoZSBzYW1lIGV2ZXJ5IHllYXI/IEZhY2V0IHRoZSBiYXIgcGxvdCBieSB5ZWFyDQoNCmBgYHtyfQ0KY2FzdWFsdGllcyAlPiUgDQogIGdncGxvdChhZXMoeCA9IGNhc3VhbHR5X3NldmVyaXR5LCBmaWxsID0gY2FzdWFsdHlfY2xhc3MpKSArDQogIGdlb21fYmFyKHBvc2l0aW9uID0gImRvZGdlIikgKw0KICBmYWNldF93cmFwKH4gYWNjaWRlbnRfeWVhcikNCmBgYA0KDQoNCg0KYGBge3J9DQpjYXN1YWx0aWVzICU+JSBnZ3Bsb3QoYWVzKHggPSBjYXN1YWx0eV9zZXZlcml0eSwgeSA9IGNhc3VhbHR5X2NsYXNzKSkgKyANCiAgZ2VvbV9iaW4yZCgpIA0KYGBgDQoNCiMgQ2FzdWFsdHkncyBhZ2UgDQpEbyB0aGUgYWdlIGRpc3RyaWJ1dGlvbnMgb2YgdGhlIGNhc3VhbHRpZXMgZGlmZmVyIGJ5IHllYXI/IA0KQ3JlYXRlIGFuIGFnZ3JlZ2F0ZWQgdGFibGUgY29udGFpbmluZyBmb3IgZWFjaCB5ZWFyIHRoZSBtZWFuLCBtZWRpYW4sIDI1dGggcGVyY2VudGlsZSwgYW5kIDc1dGggcGVyY2VudGlsZSBvZiBjYXN1YWx0aWVzJyBhZ2VzLiANCg0KYGBge3J9DQpjYXN1YWx0aWVzICU+JSBncm91cF9ieShhY2NpZGVudF95ZWFyKSAlPiUgDQogIHN1bW1hcml6ZShtZWFuX2FnZSA9IG1lYW4oYWdlX29mX2Nhc3VhbHR5KSwgDQogICAgICAgICAgICBtZWRpYW5fYWdlID0gbWVkaWFuKGFnZV9vZl9jYXN1YWx0eSksDQogICAgICAgICAgICBRMV9hZ2UgPSBxdWFudGlsZShhZ2Vfb2ZfY2FzdWFsdHksIHByb2JzID0gMC4yNSksDQogICAgICAgICAgICBRM19hZ2UgPSBxdWFudGlsZShhZ2Vfb2ZfY2FzdWFsdHksIHByb2JzID0gMC43NSkpDQpgYGANCg0KRHJhdyBib3hwbG90cyBvciB2aW9saW4gcGxvdHMgZm9yIHRoZSBpbmRpdmlkdWFsIHllYXJzDQoNCmBgYHtyfQ0KDQpgYGANCg0KDQoNCg0KDQojIENhc3VhbHR5J3MgYWdlIGFuZCBjYXN1YWx0eSB0eXBlDQoNCg0KDQoNCg0KDQojIEEgcXVlc3Rpb24gYWJvdXQgaG93IHRoZSBkYXRhIGlzIGxhYmVsZWQ6IGhvdyBhcmUgYGNhc3VhbHR5X2NsYXNzYCBhbmQgYGNhc3VhbHR5X3R5cGVgIGFzc29jaWF0ZWQ/IA0KDQpgYGB7cn0NCmNhc3VhbHRpZXMgJT4lIGdyb3VwX2J5KGNhc3VhbHR5X2NsYXNzLCBjYXN1YWx0eV90eXBlKSAlPiUgY291bnQoKQ0KYGBgDQoNClRoZSB3b3JzdCBhY2NpZGVudCAtIHdobyB3ZXJlIHRoZSBjYXN1YWx0aWVzPw0KDQpGaW5kIHRoZSB3b3JzdCBhY2NpZGVudCAobWF4IGNvdW50IG9mIGNhc3VhbHRpZXMpDQoNCmBgYHtyfQ0KYWdncmVnIDwtIGNhc3VhbHRpZXMgJT4lIA0KICBncm91cF9ieShhY2NpZGVudF9yZWZlcmVuY2UpICU+JSANCiAgY291bnQoKSAlPiUgDQogIHVuZ3JvdXAoKQ0KZ2xpbXBzZShhZ2dyZWcpDQpgYGANCg0KYGBge3J9DQptYXhfcmVmZXJlbmNlIDwtIGFnZ3JlZyAlPiUgDQogIHNsaWNlX21heChvcmRlcl9ieSA9IG4sIG4gPSAxKSAlPiUgDQogIHB1bGwoYWNjaWRlbnRfcmVmZXJlbmNlKQ0KYGBgDQoNCg0KQ3JlYXRlIGEgc2VwYXJhdGUgZGF0YSBmcmFtZSBvZiBjYXN1YWx0aWVzIGZyb20gdGhpcyBhY2NpZGVudCBhbmQgcHJlc2VudCBhcyBtYW55IHJlbGV2YW50IGRldGFpbHMgYWJvdXQgdGhlbSBhcyB5b3UgY2FuIGZpbmQuIA0KDQoNCmBgYHtyfQ0Kd29yc3RfYWNjaWRlbnQgPC0gY2FzdWFsdGllcyAlPiUgZmlsdGVyKGFjY2lkZW50X3JlZmVyZW5jZSA9PSBtYXhfcmVmZXJlbmNlKQ0KYGBgDQoNCmBgYHtyfQ0Kd29yc3RfYWNjaWRlbnQgJT4lIGdyb3VwX2J5KGNhc3VhbHR5X3R5cGUpICU+JSBjb3VudCgpDQpgYGANCmBgYHtyfQ0Kd29yc3RfYWNjaWRlbnQgJT4lIGdyb3VwX2J5KGNhc3VhbHR5X2NsYXNzKSAlPiUgY291bnQoKQ0KYGBgDQoNCmBgYHtyfQ0Kd29yc3RfYWNjaWRlbnQgJT4lIA0KICBncm91cF9ieSh2ZWhpY2xlX3JlZmVyZW5jZSwgY2FzdWFsdHlfY2xhc3MpICU+JSANCiAgY291bnQoKQ0KYGBgDQoNCk1vZGVsIHRoZSBhZ2UgYW5kIHNleCBvZiBjYXN1YWx0aWVzDQoNCmBgYHtyfQ0Kd29yc3RfYWNjaWRlbnQgJT4lIA0KICBnZ3Bsb3QoYWVzKHggPSBhZ2Vfb2ZfY2FzdWFsdHkpKSArIA0KICBnZW9tX2JhcihhZXMoZmlsbCA9IHNleF9vZl9jYXN1YWx0eSkpICsgDQogIHNjYWxlX3hfY29udGludW91cyhicmVha3MgPSBzZXEoMCwgbWF4KHdvcnN0X2FjY2lkZW50JGFnZV9vZl9jYXN1YWx0eSksIDIpKSsNCiAgc2NhbGVfeV9jb250aW51b3VzKGJyZWFrcyA9IHNlcSgwLCA2LCAxKSkNCmBgYA0KDQoNCldoYXQgd2FzIHRoZSBhZ2UgYW5kIHNleCBvZiB0aGUgbW90b3JiaWtlIHJpZGVyPyBXZSBrbm93IHRoYXQgdGhlIHBlcnNvbiB3YXMgb24gVmVoaWNsZSAxIA0KDQpgYGB7cn0NCndvcnN0X2FjY2lkZW50ICU+JSBmaWx0ZXIodmVoaWNsZV9yZWZlcmVuY2UgPT0gMSkNCmBgYA0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQo=