Swimming + Data Science

New Version of SwimmeR and the Next Round of the State-Off Tournament

The first round of the 2020 High School Swimming State-Off Tournament is in the books and saw California (1), Texas (2), Florida (3), and Pennsylvania (5) advance.

Before beginning the next round there are a few administrative details I’d like to cover.

  1. First and foremost: SwimmeR version 0.4.1 is now available on CRAN! The State-Off has been the first major outing for my SwimmeR package. We’ve used it extensively to read in and parse swimming results from a variety of sources, including “normal” html web pages, Hy-Tek real time results pages, and .pdf files. It’s performed admirably, but some bugs have revealed themselves behind the scenes. Version 0.4.1 contains bug fixes plus a host of new features:
  • A version of results_score, the function we developed during the State-Off. It handles timed finals style meets (like the State-Off) but also scores prelims-finals style meets, a more common and also more complex format.
library(stringr)
library(dplyr)
library(purrr)
library(SwimmeR)
library(flextable)

Please note the following analysis was updated November 22nd 2020 to reflect changes beginning with SwimmeR v0.6.0 released via CRAN on November 22nd 2020. Please make sure your version of SwimmeR is up-to-date.


base <- "http://sidearmstats.com/auburn/swim/200218F0"
event_numbers <-
  1:42 # sequence of numbers, total of 42 events across men and women
event_numbers <-
  str_pad(event_numbers,
          width = 2,
          side = "left",
          pad = "0") # add leading zeros to single digit numbers
SEC_Links <-
  paste0(base, event_numbers, ".htm") # paste together base urls and sequence of numbers (with leading zeroes as needed)
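
As a quick sanity check on the link construction above, here is a minimal base R sketch that builds the same zero-padded URLs with sprintf() instead of str_pad(). The names check_base and check_links are illustrative only and are not used elsewhere in this post.

```r
# Base R alternative to str_pad(): "%02d" pads single digits with a leading zero
check_base <- "http://sidearmstats.com/auburn/swim/200218F0"
check_links <- paste0(check_base, sprintf("%02d", 1:42), ".htm")

head(check_links, 2) # inspect the first two generated links
length(check_links)  # one link per event
```

Either approach should give one link per event; str_pad() is kept in the main code because stringr is already loaded.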

SEC_Results <-
  map(SEC_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
  map(
    swim_parse,
    typo = c(
      "A&M",
      "FLOR",
      "Celaya-Hernande",
      # names which were cut off, and missing the last, first structure
      "Hernandez-Tome",
      "Garcia Varela,",
      "Von Biberstein,"
    ),
    replacement = c(
      "AM",
      "Florida",
      "Celaya, Hernande",
      # replacement names that artificially impose last, first structure.  Names can be fixed after parsing
      "Hernandez, Tome",
      "Garcia, Varela",
      "Von, Biberstein"
    )
  ) %>%
  bind_rows()


# some diving finals results don't list places 9-24, which do score.  we can get those divers from the prelim results
SEC_Diving_Prelims_Links <-
  c(
    "http://sidearmstats.com/auburn/swim/200218P015.htm", # M 1m prelims
    "http://sidearmstats.com/auburn/swim/200218P001.htm", # W 1m prelims
    "http://sidearmstats.com/auburn/swim/200218P022.htm", # W 3m prelims
    "http://sidearmstats.com/auburn/swim/200218P029.htm", # M platform prelims
    "http://sidearmstats.com/auburn/swim/200218P040.htm"  # W platform prelims
  )

SEC_Diving_Prelims <-
  map(SEC_Diving_Prelims_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
  map(
    swim_parse,
    typo = c("A&M", "FLOR", "Celaya-Hernande", "Garcia Varela,"),
    replacement = c("AM", "Florida", "Celaya, Hernande", "Garcia, Varela")
  ) %>%
  bind_rows()

SEC_Diving_Prelims <- SEC_Diving_Prelims %>%
  anti_join(SEC_Results, by = c("Name", "Team", "Event")) # make sure divers aren't counted twice for a given event

SEC_Results <- bind_rows(SEC_Results, SEC_Diving_Prelims)
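
To make the de-duplication step above concrete, here is a small sketch on made-up data. The divers, teams, and event name are hypothetical; only anti_join() and bind_rows() are taken from the code above.

```r
library(dplyr)

# Hypothetical toy data: two divers appear in finals, three in prelims
finals <- data.frame(
  Name = c("Diver A", "Diver B"),
  Team = c("X", "Y"),
  Event = "Women 1 mtr Diving"
)
prelims <- data.frame(
  Name = c("Diver A", "Diver B", "Diver C"),
  Team = c("X", "Y", "Z"),
  Event = "Women 1 mtr Diving"
)

# Keep only prelims rows with no match in finals on all three keys
prelims_only <- anti_join(prelims, finals, by = c("Name", "Team", "Event"))

# Finals rows first, then only the diver who did not appear in finals
combined <- bind_rows(finals, prelims_only)
combined$Name
```

Because Diver A and Diver B already have finals rows, only Diver C is carried over from prelims, so nobody is counted twice in the event.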

SEC_Results <-
  SEC_Results %>% # actual use of new results_score function
  results_score(
    events = unique(SEC_Results$Event),
    meet_type = "prelims_finals",
    lanes = 8,
    scoring_heats = 3,
    point_values = c(
      32,
      28,
      27,
      26,
      25,
      24,
      23,
      22,
      20,
      17,
      16,
      15,
      14,
      13,
      12,
      11,
      9,
      7,
      6,
      5,
      4,
      3,
      2,
      1
    )
  )

SEC_Results_Gender <- SEC_Results %>%
  mutate(Gender = case_when(str_detect(Event, "Men") ~ "M",
                            str_detect(Event, "Women") ~ "F")) %>%
  group_by(Team, Gender) %>%
  summarise(Score = sum(Points, na.rm = TRUE)) %>%
  arrange(Gender, desc(Score)) %>% # women ("F") sort first, highest score at the top
  ungroup() %>%
  group_split(Gender)
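
For intuition about what the point_values vector does, here is a rough base R sketch that maps places to points and totals them by team. It is not how results_score works internally: it skips ties and the prelims-finals heat logic, and the toy data frame is hypothetical.

```r
# Point values for places 1-24, copied from the results_score call above
point_values <- c(32, 28, 27, 26, 25, 24, 23, 22, 20, 17, 16, 15,
                  14, 13, 12, 11, 9, 7, 6, 5, 4, 3, 2, 1)

# Hypothetical finishers in a single event
toy <- data.frame(
  Team = c("X", "Y", "X", "Z"),
  Place = c(1, 2, 9, 25)
)

# Look up points by place; places beyond 24 index past the vector and give NA
toy$Points <- point_values[toy$Place]

# Total per team, dropping NA (non-scoring) places
team_scores <- tapply(toy$Points, toy$Team, sum, na.rm = TRUE)
team_scores
```

Team X scores from both its 1st and 9th place finishers, while the 25th place finisher from team Z falls outside the 24 scoring places and contributes nothing.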


The scored results match the official results for women:

SEC_Results_Gender[[1]] %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()
SEC Women Final Scores



Scores also match for men:

SEC_Results_Gender[[2]] %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()
SEC Men Final Scores


  • The ability to read in .hy3 files. Hy-Tek .hy3 files are another form of results, intended to be read into Team Manager. As of version 0.4.1, SwimmeR can read them too. This feature is not complete and will evolve in future releases. Bug reports are welcome at the SwimmeR GitHub page. Here, though, we can use it to read in results from the USA Swimming 2019 December Sectional Meet for CA and NV.
temp <- tempfile()
temp2 <- tempfile()
url <-
  "http://www.pacswim.org/userfiles/meets/documents/1691/meet-results-speedo-sectionals-2019-ca-nv-december-2019-13dec2019-003.zip"

download.file(url, temp)
unzip(zipfile = temp, exdir = temp2)
raw_results <-
  read_results(
    file.path(
      temp2,
      "Meet Results-Speedo Sectionals 2019 CA-NV December 2019-13Dec2019-003.hy3"
    )
  )
unlink(c(temp, temp2), recursive = TRUE) # temp2 is a directory, so recursive = TRUE is needed to remove it

results <- swim_parse(raw_results) %>%
  mutate(Event = str_replace(Event, "NA", "Yard"))

results %>%
  filter(Event == "100 Yard Butterfly",
         Gender == "M") %>%
  select(Name, Team, Prelims_Time, Finals_Time) %>%
  arrange(Finals_Time) %>%
  head(5) %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()


  • Recording of DQ and Exhibition swims in the output of swim_parse, as the columns DQ and Exhibition respectively. This ended up being important for results_score, since Exhibition and DQ swimmers can’t score.


Ithaca_Union <-
  swim_parse(
    read_results(
      "https://athletics.ithaca.edu/services/download_file.ashx?file_location=https://s3.amazonaws.com/sidearm.sites/bombers.ithaca.edu/documents/2020/2/1/ithaca_vs_union_2020.pdf"
    )
  )

Ithaca_Union %>%
  filter(Event == "Men 400 Yard Freestyle Relay") %>%
  select(Place, Team, Finals_Time, Exhibition, DQ) %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

We can see that in the Men 400 Yard Freestyle Relay the third-place relay was an exhibition swim (Exhibition == 1) and that another relay was disqualified (DQ == 1).

Official Results: Men 400 Yard Freestyle Relay
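
As a minimal sketch of why those two columns matter for scoring, the following filters a made-up relay result down to the swims that are eligible to score. The teams and times are hypothetical; only the Exhibition and DQ column names come from the swim_parse output described above.

```r
library(dplyr)

# Hypothetical relay results using the DQ and Exhibition flag columns
relay <- data.frame(
  Team = c("A", "B", "C", "D"),
  Finals_Time = c("3:10.11", "3:12.40", "3:14.05", "3:15.90"),
  Exhibition = c(0, 0, 1, 0),
  DQ = c(0, 0, 0, 1)
)

# Only swims that are neither exhibition nor disqualified can score
eligible <- relay %>%
  filter(Exhibition == 0, DQ == 0)

eligible$Team
```

Here teams C (exhibition) and D (disqualified) drop out, leaving A and B as the only relays that could be awarded points.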


  • Bug fixes include resolving an issue where tied athletes, marked with "*" in front of their places, would not be imported, and an issue where times or scores with a “J” in front of them (a Hy-Tek marker meaning a time/score was judged) would not be imported.
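
To illustrate the kind of markers involved in those two fixes, here is a small sketch that strips a leading "*" from places and a leading "J" from times or scores. This is not SwimmeR's internal code, and the raw values are invented for the example.

```r
library(stringr)

# Hypothetical raw values as they might appear in a results file
raw_places <- c("1", "*2", "*2", "4")
raw_times <- c("J1:45.32", "50.11", "J275.40")

# Remove a leading "*" (tie marker) so places can be treated as numbers
clean_places <- as.numeric(str_remove(raw_places, "^\\*"))

# Remove a leading "J" (judged marker) from times or scores
clean_times <- str_remove(raw_times, "^J")

clean_places
clean_times
```

Anchoring both patterns with "^" means only a marker at the very start of the value is removed.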
  2. Since we’ve already read in results for each state, I’m not going to re-read them in each State-Off post going forward. Instead, I’m hosting the results on GitHub and will just pull them from there. Don’t worry, there will still be plenty of work for SwimmeR to do.

  3. Continuing from point 2, the focus of the first round was mostly on demonstrating how to read in swimming data with SwimmeR. This next round will focus more on exactly what that data is and how to use it.

Thanks for joining us, and don’t forget to update your version of SwimmeR in preparation for the next round of the High School Swimming State-Off Tournament!