Swimming + Data Science

Introducing JumpeR - For Track and Field Data

Ordinarily, posts on Swimming + Data Science focus on swimming, or sometimes diving. Today, though, we’re going to visit some of our more gravity-afflicted colleagues and do a bit of cross-training. That’s because, following what I’m going to call the SwimmeR package’s massive success, literally several people reached out to me about developing a similar package for track and field. That package, called JumpeR, is now available on CRAN.

Please note: this post was updated to reflect changes in JumpeR v0.3.0, released November 2021.

You can get your very own copy of this cutting edge sports-data-science package, for free, today!

install.packages("JumpeR")
library(JumpeR)
library(flextable)
library(dplyr)
library(ggplot2)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>%  # puts gray background behind the header row
    autofit()
}

What does JumpeR do?

JumpeR is very similar to SwimmeR. Both mostly serve to convert results from human-readable documents into data frames that are readable by both machines and humans, all within the R programming environment.

Supported Results Format

JumpeR currently supports single column Hy-Tek results, like these, and Flash Results .pdf files like these. JumpeR does not support multi-column Hy-Tek results or Flash .html files. Further details are available in the package readme file.

Examples

A Running Race

Here’s an example, reading in the 2019 Ivy League Championships and looking at the finals of the Women’s 200M Dash.

df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm")
  )

df %>% 
  filter(Event == "Women 200 Meter Dash") %>% 
  group_by(Name, Team) %>% # one group per athlete
  slice(2) %>% # keep each athlete's second row, which removes prelims
  arrange(Place) %>% # arrange by Place
  flextable_style()

Discus, with Rounds

But wait, there’s more! Field events, like jumping and throwing, allow athletes to try several times, with each try called a “round”. Rounds can be captured as well. Here’s the Men’s Discus from the 2019 Virginia Grand Prix.

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-27_VirginiaGrandPrix/038-1.pdf"),
  rounds = TRUE
  )

df %>% 
  flextable_style()

Pole Vault, with Rounds and Attempts

JumpeR can even capture attempts for vertical jumping events, like in these Women’s Pole Vault results from the 2019 Texas A&M Invite. These results do get quite wide, so here they’re cut off at Round 2.

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  rounds = TRUE,
  round_attempts = TRUE
  )

df %>% 
  select(Place:Round_2_Attempts) %>% 
  flextable_style()

Pole Vault Long Format

These results do get quite wide, but don’t worry. Switching to long format is easy with JumpeR::attempts_split_long.

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  rounds = TRUE,
  round_attempts = TRUE
  )


df %>% 
  attempts_split_long() %>% 
  filter(Place == 1) %>% # only first place athlete
  select(Place, Name, Age, Team, Finals_Result, Event, Bar_Height, Attempt, Result) %>% 
  flextable_style()
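For a feel of what going from wide to long means here, below is a rough base-R sketch. It is not how attempts_split_long is implemented. The data frame is made up for illustration, and the assumption is that each Round_N_Attempts column holds a string with one character per attempt (O for a clearance, X for a miss).

```r
# Made-up illustration data, not taken from the meet results above
wide <- data.frame(
  Name = c("Athlete A", "Athlete B"),
  Round_1_Attempts = c("O", "XO"),
  Round_2_Attempts = c("XXO", "XXX"),
  stringsAsFactors = FALSE
)

# Columns assumed to hold attempt strings, one character per attempt
attempt_cols <- grep("_Attempts$", names(wide), value = TRUE)

# Build one row per athlete, per round, per attempt
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  do.call(rbind, lapply(attempt_cols, function(col) {
    marks <- strsplit(wide[[col]][i], "")[[1]] # "XXO" -> "X", "X", "O"
    data.frame(
      Name = wide$Name[i],
      Round = as.integer(gsub("[^0-9]", "", col)), # round number from the column name
      Attempt = seq_along(marks),
      Result = marks,
      stringsAsFactors = FALSE
    )
  }))
}))

long
```

Each athlete goes from one wide row to one row per attempt, which is much friendlier for filtering and plotting.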

Relay Athletes

Going back to those Ivy League results, we can pull out the names of the relay athletes for each relay.

df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm"),
  relay_athletes = TRUE
  )

df %>% 
  filter(Event == "Men 4x400 Meter Relay") %>% 
  select(-Tiebreaker, -Name) %>% 
  flextable_style()

Formatting Results

Track and field results come in two forms: times, as “MM:SS.HH”, and lengths/heights, often as “X.XXm”. JumpeR has math_format for converting these result strings into numerics, which is useful for comparisons and plotting. Here’s the men’s pole vault at the USA T&F 2019 Championships.

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/07-25_USATF_CIS/026-1.pdf"))


df %>%
  mutate(Finals_Math = math_format(Finals_Result)) %>% # results to numerics
  mutate(Name = factor(Name, unique(Name))) %>% # order names by order of finish
  ggplot(aes(x = Name, y = Finals_Math)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(
    angle = 90,
    vjust = 0.5,
    hjust = 1
  )) +
  labs(y = "Height Cleared (m)",
       title = "USA Pole Vault Championships")

One can use math_format on mixed-format lists too. Times are converted to seconds, meters remain in meters, and standard units (feet, inches) are converted to inches. Units, however, are not included in the output, so be aware.

demo_list <- c(
  "1.23m", # a height/length in meters, output in meters
  "5-06.45", # a height/length in standard, output in inches
  "10:34.34", # a time with minutes, output in seconds
  "9.45" # a time without minutes, output in seconds
)

math_format(demo_list)
## [1]   1.23  66.45 634.34   9.45
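To make the conversion rules concrete, here is a minimal base-R sketch of the same arithmetic. The helper to_numeric_sketch is hypothetical and is not part of JumpeR or a copy of how math_format works internally. It assumes every string is in one of the four formats from the demo list.

```r
# Hypothetical helper, not part of JumpeR: mimics the conversion rules described above
to_numeric_sketch <- function(x) {
  vapply(x, function(s) {
    if (grepl("m$", s)) {
      # metric mark such as "1.23m": drop the unit, keep meters
      as.numeric(sub("m$", "", s))
    } else if (grepl("-", s, fixed = TRUE)) {
      # feet-inches mark such as "5-06.45": feet * 12 + inches
      parts <- as.numeric(strsplit(s, "-", fixed = TRUE)[[1]])
      parts[1] * 12 + parts[2]
    } else if (grepl(":", s, fixed = TRUE)) {
      # time with minutes such as "10:34.34": minutes * 60 + seconds
      parts <- as.numeric(strsplit(s, ":", fixed = TRUE)[[1]])
      parts[1] * 60 + parts[2]
    } else {
      # bare number such as "9.45": treated as seconds already
      as.numeric(s)
    }
  }, numeric(1), USE.NAMES = FALSE)
}

sketch_out <- to_numeric_sketch(c("1.23m", "5-06.45", "10:34.34", "9.45"))
sketch_out
```

So "5-06.45" becomes 5 * 12 + 6.45 = 66.45 inches and "10:34.34" becomes 10 * 60 + 34.34 = 634.34 seconds. Since the units are dropped, it's worth keeping an Event column alongside any converted values.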

JumpeR Going Forward

I plan to maintain JumpeR, fix bugs, and respond to feature requests as I’m able. Another useful improvement would be increasing the number and types of supported results formats. More contributors are certainly welcome. If you’d like to be involved, get in touch or visit the project repo on GitHub.