I’m excited to be part of a new research group - the Conservation Aquaculture Research Team at NCEAS. Our group is interested in helping shape the future of sustainable aquaculture. Over the past couple of decades, farmed seafood has grown significantly and is likely to become the dominant source of where we get our fish as global populations increase. I wanted to take a look at this trend over time using data from the Food and Agriculture Organization (FAO). This also gave me the opportunity to play with some of the packages out there that create dynamic visualizations. I’ve used the
googlevis R packages to create interactive plots, and the
animation package to create gifs from raster data (e.g. Sea Surface Temperature over time), but I’ve never created a high-quality dynamic visualization. I know D3.js is often used to create beautiful figures, but I don’t have the time to learn D3.js (yet) and I figured there had to be a way to do something fancy in R. And of course, I was right. So this is my process of creating a dynamic visualization with the
tweenr package. ___
Install and/or load packages:
tweenr for customizing the figures.
library(ggplot2) #devtools::install_github("dgrtwo/gganimate") library(gganimate) library(tweenr) library(ggthemes) library(tidyverse)
I start with data downloaded from FAO’s website, specifically the “Total Production” dataset CSV. One problem with using this dataset to understand how seafood production from wild capture (i.e. wild-caught from the ocean) and aquaculture (farmed, not wild) has grown over time, is that this data does not differentiate between fish caught for consumption versus non-consumption (e.g. fish used for feed).
#this data identifies aquaculture vs wild capture production source <- read_csv("https://raw.githubusercontent.com/CART-sci/storymap/master/data/GlobalProuction_2017.1.1/CL_FI_PRODUCTION_SOURCE.csv") #this is the time series data for global production (does not discern b/w seafood and nonseafood) fao <- read_csv("https://raw.githubusercontent.com/CART-sci/storymap/master/data/GlobalProuction_2017.1.1/TS_FI_PRODUCTION.csv") ##species info spp <- read_csv("https://raw.githubusercontent.com/CART-sci/storymap/master/data/GlobalProuction_2017.1.1/CL_FI_SPECIES_GROUPS.csv") %>% mutate(Species = `3Alpha_Code`)
By using a second dataset on global food supply of seafood, I can calculate the amount of fish caught for consumption. Why not only use this dataset? Because this one does not differentiate between wild capture and aquaculture.
If you’re interested in the data I used, here is a screenshot of the manual query on FAOSTAT.
I am still hopeful that someone, somewhere will create an FAO R package that lets me query all of their data directly from R. In the meantime, I’ll wrangle these two datasets together as best I can.
#read in the seafood data queried from FAOSTAT and get totals per year seafood <- read_csv("https://raw.githubusercontent.com/CART-sci/storymap/master/data/FAOSTAT_data_12-21-2017.csv") %>% group_by(Year) %>% summarize(sf_tons = sum(Value))
Calculate annual capture and aquaculture production
Since no dataset from FAO has exactly what I want, I can take the seafood dataset, calculate total production (tons) per year, and then remove the total aquaculture production for each year calculated from the
fao dataset. Then I have wild capture seafood per year (from the
seafood dataset), and aquaculture production per year (from the
data <- fao %>% left_join(spp) %>% mutate(source = case_when( Source %in% c(1,2,3,5) ~ "Aquaculture", Source == 4 ~ "Wild_Capture" )) %>% filter(Major_Group != "PLANTAE AQUATICAE") %>% #removing aquatic plants mutate(source = as.factor(source)) %>% #doing this for tweenr...still don't know why we need to group_by(source, Year) %>% summarize(tons = sum(Quantity, na.rm = T)/1000000) %>% filter(Year > 1989) %>% #only interested in showing 1990 onwards spread(source, tons) %>% left_join(seafood, by = "Year") %>% mutate(Wild_for_food = (sf_tons/1000000) - Aquaculture) %>% #subtract aquaculture from seafood series to get wild capture for seafood filter(!is.na(Wild_for_food)) %>% select(Year, Aquaculture, Wild_for_food)
Forecast production values
The data only goes to 2013, but I want to include forecasted growth of these two sectors. The 2016 State of the Worlds Fisheries and Aquaculture report projects a 39% growth in Aquaculture production and just a 1% growth in Wild Capture. To make this easy, I simply used the 2013 production values and set the 2025 values to 139% and 101% of those values.
I create two new dataframes, one for all years 2014-2024 with two years full of
NA, one for Aquaculture and one for
Wild_for_food (Wild Capture). The second is a one row dataframe for the year 2025 with the
Wild_fod_food values equal to 139% and 101% of their 2013 values respectively. Using the
zoo::na.approx() function I simply do a linear interpolation of production values between 2013 and 2025.
#forecast forward to 2025 #Projected 1% of growth in wild capture by 2025 #projected 39% for aquaculture yrs <- data.frame(Year = 2014:2024, Aquaculture = NA, Wild_for_food = NA) data_2025 <- data.frame(Year = 2025, Aquaculture = 1.39*last(data$Aquaculture), Wild_for_food = 1.01*last(data$Wild_for_food)) plot_data <- data %>% rbind(yrs, data_2025) %>% mutate(Aquaculture = zoo::na.approx(.$Aquaculture), Wild_for_food = zoo::na.approx(.$Wild_for_food)) %>% gather(source, tons, Aquaculture, Wild_for_food)%>% mutate(ease = "linear", x = Year) %>% rename(y = tons, id = source, time = Year)
Notice I renamed the columns to
time. This is for use in the next section with
x column identifies what I want on the x-axis (years),
y identifies the y-axis (tons),
id identifies the different series for plotting (Aquaculture or Wild_for_food) and
time is used to tell
tween_elements() what the different time points will be for interpolation. The
ease column will tell
tween_elements() what the easing (or interpolation) function will be. In this case, I just want a linear interpolation between my datapoints.
Making the plot
Here’s how I went from creating a static
ggplot to animating with
gganimate then improving it with
static_plot <- ggplot(plot_data, aes(x = time, y = y)) + geom_line(aes(color = id), show.legend = F) + labs(x = "Year", y = "Seafood Production (million tonnes)") + theme_minimal() static_plot
You can animate a static ggplot just with the
dynam_plot <- ggplot(plot_data, aes(x = x, y = y, cumulative = TRUE, frame = time)) + geom_line(aes(color = id), show.legend = F) + labs(x = "Year", y = "Seafood Production (million tonnes)") + theme_hc() + scale_y_continuous(breaks = seq(0, 100, by = 25)) + scale_color_manual(values = c("#24757A", "#7FBAC0")) + ylim(0, 100) + theme_hc() gganimate(dynam_plot, filename = "fao_gganimate.gif", title_frame = F)
Smooth animation with
To make the animation smoother, I’m using the tweenr package. Specifically, the
tween_elements() function creates a new dataframe with interpolated points between your datapoints (called “tweens” !) allowing
gganimate to plot all these points one after the other, resulting in a smooth dynamic visualization. The
nframes argument allows you to set how many total timepoints you want. I played around with this and chose 100 because I thought it gave the right speed. The more
nframes, the more points to plot and thus the slower the visualization. I suggest just playing with this argument until the animation looks right to you.
After creating the new dataframe with
tween_elements() you use
gganimate to create the final animation.
data_tween <- plot_data %>% tween_elements(., "time", "id", "ease", nframes = 100) %>% #using tweenr! mutate(year = round(time), id = .group) %>% left_join(plot_data) tween_plot <- ggplot(data_tween, aes(x = x, y = y, frame = .frame, color = id)) + geom_path(aes(group = id, cumulative = T), size = 1, show.legend = F) + xlab("") + ylab("Seafood Production (million tonnes)") + scale_y_continuous(breaks = seq(0, 100, by = 25)) + scale_color_manual(values = c("#24757A", "#7FBAC0")) + ylim(0, 100) + theme_hc() + theme(axis.title.y = element_text(size=14), axis.text.y = element_text( size=12), axis.text.x = element_text(size = 12)) + annotate(geom = "text", x = 1994, y = 29, label = "Aquaculture", cex = 6, angle = 22, fontface = "bold", color = "#24757A") + annotate(geom = "text", x = 1994, y = 70, label = "Wild Capture", cex = 6, fontface = "bold", color = "#7FBAC0") + geom_segment(aes(x=2015,xend=2020, y=88, yend=88),arrow=arrow(length=unit(0.2,"cm")),show.legend=F, color = "darkgray") + annotate(geom = "text", x = 2013, y = 88, label = "Estimated \nfuture growth", color= "darkgray", fontface = "bold", cex = 4.5, angle = 90) + geom_vline(xintercept=c(2015), linetype="dotted"); tween_plot gganimate(tween_plot, title_frame = FALSE, interval = 0.05)
I spent a lot of time messing with the text sizes, colors, label placement, etc. I also learned how to add an arrow (thanks,
geom_segment()). While I could keep tweaking this forever, I think it’s good enough as is. The whole process of learning how to use
tweenr took me just a couple hours and I’m excited for the next opportunity to make something like this!