Swimming + Data Science

Handling Splits with SwimmeR

Splits are generally reported in one of two formats, cumulative or lap. When working with data I find lap format to be more useful, but what’s most useful is to have all splits in the same format. This post discusses how to do just that with data from swimming and track. The first step is to make sure you have the most recent versions of SwimmeR and JumpeR installed. SwimmeR is available from CRAN; JumpeR is available from GitHub.

install.packages("SwimmeR")
devtools::install_github("gpilgrim2670/JumpeR")

This post demonstrates the tools available in SwimmeR for converting between split formats, and shows how they apply to both swimming and track data.

library(SwimmeR)
library(JumpeR)
library(dplyr)
library(flextable)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>%  # puts gray background behind the header row
    autofit()
}



Split Formats

Cumulative splits accumulate over the duration of an event. Say an athlete was clocked at 30.00 seconds for her first 50 yards. That 30.00 seconds is her 50 split. If the clock keeps running, and she’s clocked at 1:05.00 at the 100y mark, that 1:05.00 is her cumulative 100 split. It contains the 30.00 50 split inside it. Her lap 100y split is 1:05.00 minus 30.00, which is 35.00. Lap splits are generally preferred, because they’re more specific. Rather than containing information about the entire race up to a given point, they only contain information about one specific component (lap) of a race.
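The arithmetic above is easy to check by hand in base R. This is just a sketch of the idea with the times written in seconds, not a look at how SwimmeR does the conversion internally, and the variable names are mine.

```r
# Cumulative splits from the example above, in seconds (1:05.00 is 65 seconds)
cumulative <- c(30.00, 65.00)

# Lap splits: each cumulative split minus the one before it
# (a leading 0 stands in for the start of the race)
lap <- diff(c(0, cumulative))
lap
# 30 35

# Adding the laps back up recovers the cumulative splits
cumsum(lap)
# 30 65
```

The first split is the same in both formats, since there is nothing before it to subtract.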

Results that report only cumulative splits do exist. Luckily, the SwimmeR package contains functions to convert between the two formats.



Cumulative Splits to Lap Splits

link <- "https://swimswam.com/wp-content/uploads/2019/03/D3.NCAA-2005.pdf"
df <- link %>% 
  SwimmeR::read_results() %>% 
  swim_parse(splits = TRUE, avoid = c("QUALIFYING", "NCAA"))

df_demo <- df %>% 
  filter(Event == "WOMEN's 200 Yard BUTTERFLY") %>% 
  filter(!is.na(Name)) %>% # drop rows without a swimmer name
  head(3) %>% # keep the first three rows
  select(Name, Finals_Time, contains("Split")) %>% 
  select(where(~ !all(is.na(.x)))) # drop columns that are entirely NA

df_demo %>% 
  flextable_style() %>% 
  set_caption("Raw Results, Cumulative Splits")

These splits, from the 2005 DIII NCAA championships, are cumulative, which is not ideal. Enter SwimmeR’s splits_to_lap function, which, uhhh, converts splits to lap format.

df_demo %>% 
  splits_to_lap() %>% 
  flextable_style() %>% 
  set_caption("Splits Converted to Lap Format")

Of course, if you’re some kind of sicko and like cumulative splits, SwimmeR is also (begrudgingly) here for you with the splits_to_cumulative function. Here’s splits_to_cumulative undoing all the good work of splits_to_lap.

df_demo %>% 
  splits_to_lap() %>% 
  splits_to_cumulative() %>% 
  flextable_style() %>% 
  set_caption("Splits Converted back to Cumulative Format")



Data Frames with Mixed Cumulative and Lap Splits

Here at Swimming + Data Science we often assemble data frames from multiple meets. That means that some splits in a given data frame could be in cumulative format, while others are in lap format. How can we deal with a mixed-format data frame? Well, here’s an example data frame with data from two swimmers, one with lap format splits and the other with cumulative.

df_mixed <- data.frame(
  Place = 1,
  Name = c("Lenore Lap", "Casey Cumulative"),
  Team = rep("KVAC", 2),
  Event = rep("Womens 200 Freestyle", 2),
  Finals_Time = rep("1:58.00", 2),
  Split_50 = rep("28.00", 2),
  Split_100 = c("31.00", "59.00"),
  Split_150 = c("30.00", "1:29.00"),
  Split_200 = c("29.00", "1:58.00")
)

df_mixed %>% 
  flextable_style() %>% 
  set_caption("Mixed Lap and Cumulative Splits")

In order to convert cumulative splits to lap format, but not interfere with those splits already in lap format, it’s necessary to set a parameter called threshold in splits_to_lap. Setting threshold defines a maximum acceptable split value. All splits greater than threshold will be converted to lap format, and all splits less than threshold will be unchanged. Looking at the table above, all of Lenore Lap’s splits are less than 31.01, and all of Casey Cumulative’s splits after her first 50 (which reads the same in either format) are greater than 58.99, so any value between 31.01 and 58.99 will work for threshold. I’ll use threshold = 35 for this example.

df_mixed %>% 
  splits_to_lap(threshold = 35) %>% 
  flextable_style() %>% 
  set_caption("All Splits in Lap Format")
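If you’d rather not eyeball the table, the reasoning behind picking a threshold can be sketched in base R. The to_seconds helper here is hypothetical (it is not part of SwimmeR), and the split values are copied by hand from df_mixed. Casey’s first 50 is left out because a first split is identical in lap and cumulative format.

```r
# Hypothetical helper, not from SwimmeR: turns "m:ss.hh" or "ss.hh" into seconds
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    if (length(parts) == 2) parts[1] * 60 + parts[2] else parts[1]
  }, numeric(1))
}

# Lenore's lap splits, copied from df_mixed
lenore_lap <- to_seconds(c("28.00", "31.00", "30.00", "29.00"))

# Casey's cumulative splits after the first 50, copied from df_mixed
casey_cumulative <- to_seconds(c("59.00", "1:29.00", "1:58.00"))

slowest_lap <- max(lenore_lap)               # 31
fastest_cumulative <- min(casey_cumulative)  # 59

# A usable threshold sits strictly between the two
threshold <- 35
threshold > slowest_lap && threshold < fastest_cumulative
# TRUE
```

With real meet data the gap between the slowest lap split and the fastest cumulative split may be much narrower than it is here, so it’s worth checking before settling on a value.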

Similarly, splits_to_cumulative also has a threshold parameter, which serves the same purpose. In splits_to_cumulative the threshold parameter is basically a minimum split time. The fastest (i.e. minimum) split in df_mixed is 28.00, so any value less than 28.00 will work. I’ll use threshold = 27.99 and all splits will be converted to cumulative format.

df_mixed %>% 
  splits_to_cumulative(threshold = 27.99) %>% 
  flextable_style() %>% 
  set_caption("All Splits in Cumulative Format")
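A handy way to sanity-check conversions on df_mixed by hand: Lenore and Casey actually swam the same race, just reported in different formats. Here’s a base R sketch of that check, with the splits typed in as seconds (my own transcription of the table, not output from SwimmeR).

```r
# Lenore's lap splits from df_mixed, in seconds
lenore_lap <- c(28.00, 31.00, 30.00, 29.00)

# Casey's cumulative splits from df_mixed, in seconds
# (59.00, 1:29.00 and 1:58.00 are 59, 89 and 118 seconds)
casey_cumulative <- c(28.00, 59.00, 89.00, 118.00)

# Accumulating Lenore's laps reproduces Casey's cumulative splits
all.equal(cumsum(lenore_lap), casey_cumulative)
# TRUE

# Differencing Casey's splits reproduces Lenore's laps
all.equal(diff(c(0, casey_cumulative)), lenore_lap)
# TRUE
```

So once everything is in a single format, the two swimmers’ split columns should match each other, and the final cumulative split (118 seconds) should match the 1:58.00 Finals_Time.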



Track and Field

Hadley has the tidyverse, his empire of interconnected packages. I’ve got my two, SwimmeR and JumpeR, which together make up the name-pending-verse. Send in your ideas.

A goal for the two packages going forward is to make utility functions, like splits_to_lap and splits_to_cumulative, work for data gathered with both packages. This is an ongoing goal, and not fully realized, but the split handling functions are a step in the right direction.

Here’s an example of track data read in with JumpeR. The splits are in cumulative format.

df_track <- "https://www.flashresults.com/2017_Meets/Outdoor/04-29_VirginiaGrandPrix/025-1-01.htm" %>% 
  flash_parse_table(clean = TRUE, wide_format = TRUE) %>% 
  select(Name, Event, contains("Split")) %>% 
  head(3)

df_track %>% 
  flextable_style() %>% 
  set_caption("Track Results in Cumulative Format")

Converting these track splits to lap format is done exactly the same way as with swimming results, via SwimmeR::splits_to_lap.

df_track %>% 
  splits_to_lap() %>% 
  flextable_style() %>% 
  set_caption("Track Results in Lap Format")



In Closing

SwimmeR now offers functions for regularizing split formats, and they’re also applicable to track results collected with JumpeR. Continued and expanded interoperability between the two packages is a development focus going forward. Thanks for joining us here at Swimming + Data Science.