New International Swimming League Season - New Version(s) of SwimmeR
Welcome back to Swimming + Data Science, friends. Last week something exciting happened - two things actually! First, the International Swimming League kicked off its 2020 season in Budapest. Second, and directly related, I released version 0.5.0 of SwimmeR to CRAN with functions to read ISL results from the inaugural 2019 season and from the newly begun 2020 season. The ISL functions in SwimmeR v0.5.0 were fully tested on all the available meets and were working great. Then, however, there was a problem. ISL did me dirty. They changed their reporting for the second meet in a way that broke SwimmeR v0.5.0. What a pain! I quickly patched the problem and was preparing another CRAN submission, but then I thought “hmmm, there are more ISL meets next week. What if they do it again?” I’ve decided that rather than pestering the good folks at CRAN with another version of SwimmeR every time someone at ISL decides to mess around, I’m going to limit myself to releasing development versions of SwimmeR throughout the 2020 ISL season. Hopefully that will give ISL time to settle on a results format, at which point I’ll do another CRAN release. We start today with the now-available SwimmeR v0.5.1.
SwimmeR v0.5.1 includes the function swim_parse_ISL, specifically for dealing with ISL results. So update your version of SwimmeR with devtools::install_github() and let’s get going.
devtools::install_github("gpilgrim2670/SwimmeR", build_vignettes = TRUE, force = TRUE)
In addition to the new SwimmeR v0.5.1 we’ll use the always excellent dplyr, purrr, and stringr to take these new ISL results for a spin. I also want flextable for reporting, and my special flextable_style function.
library(SwimmeR)
library(dplyr)
library(purrr)
library(stringr)
library(flextable)
flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) # center alignment
}
ISL Results
There have been two ISL meets thus far, both in the “Budapest Bubble”. Results are available at SwimSwam.
match_1 <- "https://cdn.swimswam.com/wp-content/uploads/2020/10/Results_Book_Match_1_V2.pdf"
match_2 <- "https://cdn.swimswam.com/wp-content/uploads/2020/10/Results_Book_Full_M2-1.pdf"
ISL_matches <- c(match_1, match_2)
swim_parse_ISL works just like our old friend swim_parse. It takes the output of read_results and returns a data frame. It’s dead simple.
match_1 %>%
  read_results() %>%
  swim_parse_ISL() %>%
  head(5) %>%
  flextable_style()
We do have a list of match results though, so rather than doing them individually let’s do them all at once with some tidyverse magic. All we have to do is pass our list of ISL match links to read_results and then pass the output to swim_parse_ISL. Since we have a list we’ll use map to do the passing, applying read_results and swim_parse_ISL to each element, that is, to each match in the list of matches. Then we’ll name the resulting list elements, which are two data frames, by match number (1 and 2) with setNames and stick them together with bind_rows.
ISL_results <-
  map(ISL_matches, read_results) %>% # map SwimmeR::read_results over the list of links
  map(swim_parse_ISL) %>% # now it's swim_parse_ISL's turn
  setNames(c(1, 2)) %>% # name the dataframes 1 and 2 respectively
  bind_rows(.id = "Match") %>% # stick the dataframes together, with a new column called "Match" which will contain the relevant dataframe name, either 1 or 2
  mutate(Match = as.numeric(Match)) # names are stored as text, so convert Match back to a number
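If the map, setNames, bind_rows pattern is new to you, here is a minimal sketch of it on made-up data, so it runs without downloading anything. The two toy data frames and the toy_parse function are stand-ins I invented for illustration. They are not real ISL results and toy_parse is not part of SwimmeR.

```r
library(dplyr)
library(purrr)

# Two made-up "match" data frames standing in for parsed results (not real ISL data)
toy_matches <- list(
  data.frame(Name = c("SWIMMER A", "SWIMMER B"), Points = c(9, 7)),
  data.frame(Name = c("SWIMMER C", "SWIMMER D"), Points = c(9, 7))
)

# Hypothetical per-match processing step, playing the role swim_parse_ISL plays above
toy_parse <- function(df) {
  mutate(df, Name = toupper(Name))
}

toy_results <- toy_matches %>%
  map(toy_parse) %>% # apply the same function to each list element
  setNames(c(1, 2)) %>% # names are coerced to the text "1" and "2"
  bind_rows(.id = "Match") %>% # the list names become the Match column
  mutate(Match = as.numeric(Match)) # convert the text names back to numbers

toy_results
```

The .id argument is what carries the match number along: bind_rows copies each list element's name into the new column for every row that came from that element.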
ISL Dataframe - Now What?
Now we have one big data frame of ISL results, almost exactly like we do when we use swim_parse.
ISL_results %>%
  head() %>%
  flextable_style()
Match | Place | Lane | Name | Team | Finals_Time | Event | Points | DQ |
1 | 1 | 4 | SJOSTROM Sarah | ENS | 56.00 | Women's 100m Butterfly | 9 | 0 |
1 | 2 | 3 | SHKURDAI Anastasiya | ENS | 56.07 | Women's 100m Butterfly | 7 | 0 |
1 | 3 | 5 | DAHLIA Kelsi | CAC | 56.70 | Women's 100m Butterfly | 6 | 0 |
1 | 4 | 6 | BROWN Erika | CAC | 56.80 | Women's 100m Butterfly | 5 | 0 |
1 | 5 | 8 | SURKOVA Arina | NYB | 57.18 | Women's 100m Butterfly | 4 | 0 |
1 | 6 | 7 | OTTESEN Jeanette | NYB | 57.81 | Women's 100m Butterfly | 3 | 0 |
Those of you who are close readers of SwimmeR documentation (so all of you, right?) know that Lilly King is a hero around these parts. Let’s see how she’s doing in the ISL.
ISL_results %>%
  filter(Name == "KING Lilly") %>% # only want Lilly's results
  flextable_style()
Match | Place | Lane | Name | Team | Finals_Time | Event | Points | DQ |
1 | 1 | 5 | KING Lilly | CAC | 2:17.11 | Women's 200m Breaststroke | 15 | 0 |
1 | 1 | 5 | KING Lilly | CAC | 28.86 | Women's 50m Breaststroke | 19 | 0 |
1 | 1 | 3 | KING Lilly | CAC | 1:03.16 | Women's 100m Breaststroke | 24 | 0 |
1 | 1 | 3 | KING Lilly | CAC | 29.16 | Women's 50m Breaststroke Skins | 15 | 0 |
1 | 1 | 3 | KING Lilly | CAC | 29.25 | Women's 50m Breaststroke Skins Round 2 | 14 | 0 |
1 | 1 | 4 | KING Lilly | CAC | 28.90 | Women's 50m Breaststroke Skins Final | 21 | 0 |
So Lilly swam 6 races and won all of them. That sounds like her. Lilly only swam in the first match though. Let’s look at the women’s breaststrokes in both matches. We’ll exclude the skins races because in match 2 the skins races weren’t breaststroke. Probably because none of the teams in that match had Lilly King.
First we’ll filter out events that aren’t women’s breaststroke. Then we’ll create a new column with times in seconds format (total seconds) rather than minutes:seconds.hundredths using the sec_format function from SwimmeR. Next we’ll group_by event, arrange the entries in order of time, change the places with mutate to reflect our new ordering and finally, check out the results.
ISL_results %>%
  filter(str_detect(Event, "Women's \\d{2,3}m Breaststroke$")) %>% # only want women's breaststroke events; the trailing $ leaves out the skins races
  mutate(Time_sec = sec_format(Finals_Time)) %>% # convert times to seconds format
  group_by(Event) %>%
  arrange(Time_sec) %>% # order entries by increasing time
  mutate(Place = rank(Time_sec)) %>% # recode place to new order, based on time within each event
  select(-Time_sec) %>% # don't need the helper column in the table
  slice(1:3) %>% # top three finishers in each event
  flextable_style()
Match | Place | Lane | Name | Team | Finals_Time | Event | Points | DQ |
1 | 1 | 3 | KING Lilly | CAC | 1:03.16 | Women's 100m Breaststroke | 24 | 0 |
1 | 2 | 5 | PILATO Benedetta | ENS | 1:03.67 | Women's 100m Breaststroke | 7 | 0 |
2 | 3 | 4 | ATKINSON Alia | LON | 1:04.21 | Women's 100m Breaststroke | 10 | 0 |
1 | 1 | 5 | KING Lilly | CAC | 2:17.11 | Women's 200m Breaststroke | 15 | 0 |
1 | 2 | 7 | ESCOBEDO Emily | NYB | 2:18.46 | Women's 200m Breaststroke | 7 | 0 |
2 | 3 | 4 | LAZOR Annie | LON | 2:18.85 | Women's 200m Breaststroke | 15 | 0 |
1 | 1 | 5 | KING Lilly | CAC | 28.86 | Women's 50m Breaststroke | 19 | 0 |
1 | 2 | 4 | PILATO Benedetta | ENS | 28.97 | Women's 50m Breaststroke | 7 | 0 |
1 | 3 | 6 | HANNIS Molly | CAC | 29.04 | Women's 50m Breaststroke | 6 | 0 |
Turns out Lilly King’s times from match 1 are the fastest across both matches in all three events. Makes sense. SwimmeR v0.5.1 is working well.
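For anyone who wants to poke at the filter, convert, rank, slice steps without SwimmeR installed, here is a rough sketch on made-up data. The to_seconds helper is something I wrote just for this illustration, assuming times look like "1:03.16" or "28.86". It is not SwimmeR's sec_format and makes no claim about how sec_format works internally. The swimmers and times are invented too.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (NOT SwimmeR::sec_format): "m:ss.hh" or "ss.hh" to total seconds
to_seconds <- function(x) {
  parts <- str_split(x, ":")
  vapply(parts, function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p[1] * 60 + p[2] else p[1]
  }, numeric(1))
}

# Invented results: four 100m swims across two matches plus one skins swim
toy <- data.frame(
  Match = c(2, 1, 2, 1, 1),
  Name = c("SWIMMER C", "SWIMMER A", "SWIMMER D", "SWIMMER B", "SWIMMER A"),
  Finals_Time = c("1:04.21", "1:03.16", "1:05.00", "1:03.67", "29.16"),
  Event = c(rep("Women's 100m Breaststroke", 4), "Women's 50m Breaststroke Skins"),
  stringsAsFactors = FALSE
)

toy_top <- toy %>%
  filter(str_detect(Event, "Women's \\d{2,3}m Breaststroke$")) %>% # $ drops the skins row
  mutate(Time_sec = to_seconds(Finals_Time)) %>%
  group_by(Event) %>%
  arrange(Time_sec, .by_group = TRUE) %>% # fastest first within each event
  mutate(Place = rank(Time_sec)) %>% # place across both matches
  slice(1:3) %>% # top three per event
  ungroup()

toy_top
```

The $ at the end of the pattern is what keeps the skins races out: "Women's 50m Breaststroke Skins" continues past "Breaststroke", so it doesn't match.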
In Closing
So that’s how you can use SwimmeR v0.5.1 to get ISL results into R. Not too bad, right? Now we’ll just wait and see what other mischief the results people over at ISL can come up with next week. If and when they strike again I’ll update SwimmeR appropriately. If you find an issue please leave a comment, or post it on GitHub. That’s it until next time here at Swimming + Data Science.
Updated: 18 November, 2021
Created: 20 October, 2020