Using gganimate and tweenr to animate a tSNE plot

{All Code} {Session Info} {Updates}

I recently posted a GIF of our new transcriptomics clustering in the form of a tSNE plot, which appears to resolve from a cloud of random points into a nice, orderly structure. Below, I’ll show how this was built with the help of gganimate, tweenr, and of course ggplot2 in R.

The main gist of the process is that we need to set up several states for the points to shift between, which the tweenr package will interpolate to generate all of the positions in each frame, and then the gganimate package (along with ImageMagick if you’d like to make a GIF) can be used to build an animation.

To get started, you’ll need to download gganimate and tweenr from Github using the devtools package:

if (!require(devtools)) {
    install.packages("devtools")
}
if (!require(gganimate)) {
  devtools::install_github("dgrtwo/gganimate")
}
if (!require(tweenr)) {
  devtools::install_github("thomasp85/tweenr")
}

In addition to these packages, I’ll also use dplyr for data manipulation, ggplot2 for plotting, and data.table to quickly read the data from .csv.

library(data.table)
library(dplyr)
library(ggplot2)
library(gganimate)
library(tweenr)

In this demo, I’ll be starting with tSNE coordinates and cluster labels for each point that we’ve generated previously. These can be retrieved from Github:

tsne_dims <- fread("https://github.com/hypercompetent/plottery_barn_data/raw/master/gganimate-tweenr-tsne-plot/tsne_dims.csv")

head(tsne_dims)
##             sample_id cluster_id cluster_color    tsne_x     tsne_y
## 1: LS-14690_S02_E1-50         35       #994C00  25.07267 -2.7706899
## 2: LS-14690_S03_E1-50         56       #00979D -39.18723 20.9199491
## 3: LS-14690_S05_E1-50         56       #00979D -44.35434 12.0985906
## 4: LS-14690_S06_E1-50         56       #00979D -42.82413 21.0532487
## 5: LS-14690_S07_E1-50         56       #00979D -42.08457  0.6795734
## 6: LS-14690_S08_E1-50         56       #00979D -41.15694 13.0337231
ggplot() +
  geom_point(data = tsne_dims,
             aes(x = tsne_x,
                 y = tsne_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

At the beginning of the animation, I’d like all of the points to start in a random position, and come into focus in their tSNE positions. This function will generate random positions for num_points distributed over the surface of a disc with the given disc_radius.

random_disc_jitter <- function(num_points,
                               disc_radius,
                               random_seed = 42) {
  
  if(!is.null(random_seed)) {
    set.seed(random_seed)
  }
  
  # Random radius positions
  r <- runif(num_points, 0, disc_radius ^ 2)
  # Random angles
  t <- runif(num_points, 0, 2 * pi)
  
  # Convert radius and angles to cartesian coordinates
  data.frame(x = sqrt(r) * cos(t),
             y = sqrt(r) * sin(t))
}

I’ll use dplyr’s mutate() to add this to the tsne_dims data:

# Get the max x or y value from the tsne dims.
# I'll use this for the radius
max_tsne <- max(abs(c(tsne_dims$tsne_x, tsne_dims$tsne_y)))

tsne_dims <- tsne_dims %>%
  mutate(random_x = random_disc_jitter(num_points = n(),
                                       disc_radius = max_tsne)$x,
         random_y = random_disc_jitter(num_points = n(),
                                       disc_radius = max_tsne)$y)

head(tsne_dims)
##            sample_id cluster_id cluster_color    tsne_x     tsne_y
## 1 LS-14690_S02_E1-50         35       #994C00  25.07267 -2.7706899
## 2 LS-14690_S03_E1-50         56       #00979D -39.18723 20.9199491
## 3 LS-14690_S05_E1-50         56       #00979D -44.35434 12.0985906
## 4 LS-14690_S06_E1-50         56       #00979D -42.82413 21.0532487
## 5 LS-14690_S07_E1-50         56       #00979D -42.08457  0.6795734
## 6 LS-14690_S08_E1-50         56       #00979D -41.15694 13.0337231
##    random_x   random_y
## 1  52.58733  10.462040
## 2 -54.22300  -2.175301
## 3 -24.48068  17.318257
## 4  32.24114  39.626810
## 5 -16.53971 -41.751597
## 6  31.92132  24.745645
ggplot() +
  geom_point(data = tsne_dims,
             aes(x = random_x,
                 y = random_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

For the final state in the animation, I’d like the points to gather around their cluster centroids. A nice way to densely pack points is with Fermat’s spiral. Thanks to a StackOverflow reply by user Manish Nag, I was able to write a function to arrange points along this spiral.

This function will generate coordinates for num_points around a central position (center_x and center_y), and return x and y coordinates for spiral positioning in a circle with radius supplied by the size argument. I’ll use this to generate positions for the third phase of the plot, which gathers the points around the centroid for each cluster.

fermat_jitter <- function(num_points, 
                          size, 
                          center_x, 
                          center_y) {
  
  golden_ratio <- (sqrt(5) + 1) / 2
  fibonacci_angle <- 360 / (golden_ratio ^ 2)
  
  ci <- sqrt(size / num_points)
  
  x <- rep(center_x, num_points)
  y <- rep(center_y, num_points)
  
  
  for (m in 1:(num_points - 1)) {
    n <- m - 1
    r <- ci * sqrt(n)
    theta <- fibonacci_angle * (n)
    x[n] <- center_x + r * cos(theta)
    y[n] <- center_y + r * sin(theta)
    
  }
  
  data.frame(x = x, 
             y = y)
  
}

To generate these for the plot, I’ll use dplyr’s group_by() function to find the centroid of each cluster in tSNE coordinates, then jitter around the centroids using the fermat_jitter() function.

# Some trial and error for this value to find a size that packs the points without getting
# too much overlap
max_size <- 16
# We'll also need the max cluster n to scale the jitter radius
max_n <- max(table(tsne_dims$cluster_id))

tsne_dims <- tsne_dims %>%
  group_by(cluster_id) %>%
  mutate(centroid_x = mean(tsne_x, trim = 0.5),
         centroid_y = mean(tsne_y, trim = 0.5)) %>%
  mutate(fermat_x = fermat_jitter(num_points = n(),
                                  size = max_size * n() / max_n,
                                  center_x = centroid_x[1],
                                  center_y = centroid_y[1])$x,
         fermat_y = fermat_jitter(num_points = n(),
                                  size = max_size * n() / max_n,
                                  center_x = centroid_x[1],
                                  center_y = centroid_y[1])$y) %>%
  ungroup()

head(tsne_dims)
## # A tibble: 6 x 11
##   sampl~ clust~ clus~ tsne~  tsne_y rand~ rando~ cent~ centr~ ferm~ ferma~
##   <chr>   <int> <chr> <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl>
## 1 LS-14~     35 #994~  25.1 - 2.77   52.6  10.5   25.5 - 4.84  25.6 - 4.91
## 2 LS-14~     56 #009~ -39.2  20.9   -54.2 - 2.18 -42.7  12.4  -42.6  12.3 
## 3 LS-14~     56 #009~ -44.4  12.1   -24.5  17.3  -42.7  12.4  -42.7  12.2 
## 4 LS-14~     56 #009~ -42.8  21.1    32.2  39.6  -42.7  12.4  -42.8  12.2 
## 5 LS-14~     56 #009~ -42.1   0.680 -16.5 -41.8  -42.7  12.4  -42.9  12.3 
## 6 LS-14~     56 #009~ -41.2  13.0    31.9  24.7  -42.7  12.4  -42.9  12.5
ggplot() +
  geom_point(data = tsne_dims,
             aes(x = fermat_x,
                 y = fermat_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

For gganimate and tweenr, I’ll need these to be separate data frames with matching column names.

state_1 <- data.frame(x = tsne_dims$random_x,
                      y = tsne_dims$random_y,
                      color = tsne_dims$cluster_color)

state_2 <- data.frame(x = tsne_dims$tsne_x,
                      y = tsne_dims$tsne_y,
                      color = tsne_dims$cluster_color)

state_3 <- data.frame(x = tsne_dims$fermat_x,
                      y = tsne_dims$fermat_y,
                      color = tsne_dims$cluster_color)

Now, tweenr can be used for interpolation. To get a round-trip animation, I’ll build a list of states that goes from 1 -> 2 -> 3 -> 2 -> 1. The “cubic-in-out” option for the ease parameter applies some acceleration to the points at the beginning and end of each transition, which makes it look smoother.

state_list <- list(state_1, state_2, state_3, state_2, state_1)

tweened_states <- tween_states(state_list, 
                               tweenlength = 2, 
                               statelength = 1,
                               ease = "cubic-in-out",
                               nframes = 96)

Then, the plot can be assembled with the frame aesthetic, which will be used by gganimate to build the animation.

tweened_plot <- ggplot(data = tweened_states, 
                       aes(x = x, 
                           y = y)) +
  geom_point(aes(frame = .frame, 
                 color = color), 
             size = 1.5) +
  scale_color_identity() +
  theme_void()
## Warning: Ignoring unknown aesthetics: frame

Finally, gganimate can be used to build a GIF of the animation.

Side note: In order to get GIF output (at least on Windows, which is where I’m using this), you’ll need ImageMagick. The easiest way I’ve found to get this to work is to get the portable version of ImageMagick. I installed this at C:/ImageMagick/, which I need to add to the R PATH to get gganimate to work.

Sys.setenv(PATH = paste("C:/ImageMagick/",
                        Sys.getenv("PATH"), 
                        sep = ";"))

gganimate(tweened_plot, 
          "tsne_to_centroids_twitter.gif", 
          ani.width = 506, 
          ani.height = 506, 
          interval = 1/8, 
          title_frame = F)

And that’s it! Note that the plot can be made smoother and clearer by making a larger version (increase nframes in tween_states(), and increase ani.width and ani.height in gganimate()) at the expense of more time to generate the additional frames, and a larger final file size.

All Code

Below is all of the code together in one section.

library(dplyr)
library(ggplot2)
library(gganimate)
library(tweenr)
library(data.table)

## tSNE dimensions
tsne_dims <- fread("https://github.com/hypercompetent/plottery_barn_data/raw/master/gganimate-tweenr-tsne-plot/tsne_dims.csv")

head(tsne_dims)

ggplot() +
  geom_point(data = tsne_dims,
             aes(x = tsne_x,
                 y = tsne_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

## Random dimensions
random_disc_jitter <- function(num_points,
                               disc_radius,
                               random_seed = 42) {
  
  if(!is.null(random_seed)) {
    set.seed(random_seed)
  }
  
  # Random radius positions
  r <- runif(num_points, 0, disc_radius ^ 2)
  # Random angles
  t <- runif(num_points, 0, 2 * pi)
  
  # Convert radius and angles to cartesian coordinates
  data.frame(x = sqrt(r) * cos(t),
             y = sqrt(r) * sin(t))
}

# Get the max x or y value from the tsne dims.
# I'll use this for the radius
max_tsne <- max(abs(c(tsne_dims$tsne_x, tsne_dims$tsne_y)))

tsne_dims <- tsne_dims %>%
  mutate(random_x = random_disc_jitter(num_points = n(),
                                       disc_radius = max_tsne)$x,
         random_y = random_disc_jitter(num_points = n(),
                                       disc_radius = max_tsne)$y)

head(tsne_dims)

ggplot() +
  geom_point(data = tsne_dims,
             aes(x = random_x,
                 y = random_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

## Fermat spiral dimensions
fermat_jitter <- function(num_points, 
                          size, 
                          center_x, 
                          center_y) {
  
  golden_ratio <- (sqrt(5) + 1) / 2
  fibonacci_angle <- 360 / (golden_ratio ^ 2)
  
  circle_r <- sqrt(size / num_points)
  
  x <- rep(center_x, num_points)
  y <- rep(center_y, num_points)
  
  
  for (m in 1:(num_points - 1)) {
    n <- m - 1
    r <- circle_r * sqrt(n)
    theta <- fibonacci_angle * (n)
    x[n] <- center_x + r * cos(theta)
    y[n] <- center_y + r * sin(theta)
    
  }
  
  data.frame(x = x, 
             y = y)
  
}

# Some trial and error for this value to find a size that packs the points without getting
# too much overlap
max_size <- 16
# We'll also need the max cluster n to scale the jitter radius
max_n <- max(table(tsne_dims$cluster_id))

tsne_dims <- tsne_dims %>%
  group_by(cluster_id) %>%
  mutate(centroid_x = mean(tsne_x, trim = 0.5),
         centroid_y = mean(tsne_y, trim = 0.5)) %>%
  mutate(fermat_x = fermat_jitter(num_points = n(),
                                  size = max_size * n() / max_n,
                                  center_x = centroid_x[1],
                                  center_y = centroid_y[1])$x,
         fermat_y = fermat_jitter(num_points = n(),
                                  size = max_size * n() / max_n,
                                  center_x = centroid_x[1],
                                  center_y = centroid_y[1])$y) %>%
  ungroup()

head(tsne_dims)

ggplot() +
  geom_point(data = tsne_dims,
             aes(x = fermat_x,
                 y = fermat_y,
                 color = cluster_color)) +
  scale_color_identity() +
  theme_void()

## Build the animation
state_1 <- data.frame(x = tsne_dims$random_x,
                      y = tsne_dims$random_y,
                      color = tsne_dims$cluster_color)

state_2 <- data.frame(x = tsne_dims$tsne_x,
                      y = tsne_dims$tsne_y,
                      color = tsne_dims$cluster_color)

state_3 <- data.frame(x = tsne_dims$fermat_x,
                      y = tsne_dims$fermat_y,
                      color = tsne_dims$cluster_color)

state_list <- list(state_1, state_2, state_3, state_2, state_1)

tweened_states <- tween_states(state_list, 
                               tweenlength = 2, 
                               statelength = 1,
                               ease = "cubic-in-out",
                               nframes = 96)

tweened_plot <- ggplot(data = tweened_states, 
                       aes(x = x, 
                           y = y)) +
  geom_point(aes(frame = .frame, 
                 color = color), 
             size = 1) +
  scale_color_identity() +
  theme_void()

# May need to set the path

#Sys.setenv(PATH = paste("C:/ImageMagick/",
#                        Sys.getenv("PATH"), 
#                        sep = ";"))

gganimate(tweened_plot, 
          "tsne_to_centroids.gif", 
          ani.width = 300, 
          ani.height = 300, 
          interval = 1/8, 
          title_frame = F)
Session Info
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] bindrcpp_0.2         tweenr_0.1.5.9999    gganimate_0.1.0.9000
## [4] ggplot2_2.2.1        dplyr_0.7.4          data.table_1.10.4-3 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14     pillar_1.0.1     compiler_3.4.3   plyr_1.8.4      
##  [5] bindr_0.1        tools_3.4.3      digest_0.6.13    evaluate_0.10.1 
##  [9] tibble_1.4.1     gtable_0.2.0     pkgconfig_2.0.1  rlang_0.1.6     
## [13] cli_1.0.0        curl_3.1         yaml_2.1.16      blogdown_0.4    
## [17] stringr_1.2.0    knitr_1.18       rprojroot_1.3-1  grid_3.4.3      
## [21] glue_1.2.0       R6_2.2.2         rmarkdown_1.8    bookdown_0.5    
## [25] magrittr_1.5     backports_1.1.2  scales_0.5.0     htmltools_0.3.6 
## [29] assertthat_0.2.0 colorspace_1.3-2 labeling_0.3     utf8_1.1.2      
## [33] stringi_1.1.6    lazyeval_0.2.1   munsell_0.4.3    crayon_1.3.4
Updates

2017-01-07: Added section links to the top of the page.

2017-12-31: First version