ISRaD user manual
Alison Hoyt, Sophie von Fromm
6/03/2025
Helpful packages for this tutorial:
#Packages to install (you only need to do this once)
#For data analysis:
install.packages("tidyverse") #includes dplyr, ggplot2
install.packages("maps")
install.packages("ggmap")
#For loading the ISRaD package below:
install.packages("devtools")
install.packages("rcrossref")
#Load these packages (you need to do this every time you restart R)
library(tidyverse)
library(maps)
library(ggmap)
library(devtools)
library(rcrossref)
Working with the database in R
The ISRaD database includes:
1) ISRaD_data: The complete collection of data, reported from publications, compiled in one place
2) ISRaD_extra: An augmented dataset, with useful global variables and radiocarbon calculations completed for you
3) Tools: Options to compile your own dataset to compare to ISRaD, and functions to work with the data
Two options to download ISRaD data:
Manually Download & load into R:
- Download from www.soilradiocarbon.org following the purple “Download Spreadsheet Files (.zip)” link on this page.
- Unzip the files
- Load the .rda files into R. Specify the folder.
load("C:/Users/YourPathHere/ISRaD_data_v 2.6.6.2024-01-25.rda") #Loads ISRaD_data
load("C:/Users/YourPathHere/ISRaD_extra_v 2.6.6.2024-01-25.rda") #Loads ISRaD_extra
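load() restores objects under the names they were saved with and invisibly returns those names, so capturing its return value is a quick way to confirm what a file contained. The sketch below uses a stand-in list saved to a temporary file rather than the real ISRaD download; the object toy_db and its contents are made up for illustration.

```r
# Stand-in for a downloaded .rda file: a small list saved to a temporary path
toy_db <- list(site = data.frame(site_name = c("A", "B")))
rda_path <- tempfile(fileext = ".rda")
save(toy_db, file = rda_path)
rm(toy_db)

# load() returns the names of the objects it restored
loaded_names <- load(rda_path)
print(loaded_names)

# The restored object is available again under its original name
names(toy_db)
```

With the real files, the same pattern should report which database object each .rda file provides.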
Install the data through the ISRaD R package:
- Install the ISRaD package from CRAN
- Use the ‘ISRaD.getdata’ function to download the latest version of the database to a local directory
# 1) Install package 'ISRaD' from CRAN:
install.packages("ISRaD")
library(ISRaD) # load the package
# 2) Load data using the ISRaD.getdata function:
mydir <- "C:/Users/YourPathHere/" # replace with local path to desired directory
ISRaD_extra <- ISRaD.getdata(directory = mydir, dataset = "full", extra = TRUE, force_download = TRUE)
# note that force_download = TRUE will replace any previous versions of the data you may have in the directory
Some other useful packages for working with the data:
Tidyverse and dplyr (part of tidyverse) are useful packages for exploration of ISRaD data. This is a helpful cheatsheet.
We use the packages dplyr for filtering data and ggplot2 for plotting. Both packages are included in the tidyverse library.
Structure of ISRaD data
The ISRaD data are formatted as a list in an R object called ISRaD_data. ISRaD_data is a list of 8 data.frames (metadata, site, profile, flux, layer, interstitial, fraction, and incubation), which correspond to the template data tables. ISRaD_extra follows the same structure. Here is a conceptual overview of the data structure (scroll down to “Data Structure” section).
#Check that the database object is loaded by listing the tables it contains:
names(ISRaD_extra)
#View a single table (e.g. Site, Profile, Layer, etc):
#This will show you all the sites compiled in the database
View(ISRaD_extra$site)
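To get a feel for the list-of-tables layout without downloading anything, the sketch below builds a small stand-in list and inspects it the same way one might inspect ISRaD_extra. The tables, column names, and values here are illustrative only and far smaller than the real database.

```r
# Toy list mimicking the ISRaD layout (names and values are illustrative)
toy_db <- list(
  metadata = data.frame(entry_name = "Study_1"),
  site     = data.frame(entry_name = "Study_1", site_name = c("A", "B")),
  profile  = data.frame(entry_name = "Study_1",
                        site_name = c("A", "A", "B"),
                        pro_name = c("p1", "p2", "p1"))
)

# List the tables, then count the rows in each one
names(toy_db)
table_rows <- sapply(toy_db, nrow)
table_rows

# Access a single table with $, as with ISRaD_extra$site
toy_db$site
```

Each level of the hierarchy repeats the identifying columns of the level above it, which is what makes joining the tables possible.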
Creating Your Own “Flat” Data Frame of Interest
Ready to start your analysis? You may want to create your own “flat” data frame with the data of interest to you. To do this you can:
- Use the ISRaD built-in flatten function and specify the table of interest. The function flatten (called as ISRaD.flatten) creates a data frame with all relevant layers of the hierarchy. You can create a flat data frame with data from the flux, layer, interstitial, fraction, or incubation tables. You can flatten data from ISRaD_data, ISRaD_extra, or your own compiled data with a similar structure.
inc_data <- ISRaD.flatten(ISRaD_extra, 'incubation')
lyr_data <- ISRaD.flatten(ISRaD_extra, 'layer')
frc_data <- ISRaD.flatten(ISRaD_extra, 'fraction')
- Or flatten the data yourself by joining the data of interest with other levels of the hierarchy. This does the same thing!
#Flatten layer data:
lyr_data <- ISRaD_extra$layer %>% #Start with layer data
  left_join(ISRaD_extra$profile) %>% #Join to profile data
  left_join(ISRaD_extra$site) %>% #Join to site data
  left_join(ISRaD_extra$metadata) #Join to metadata
#or merge fraction data with other data up the hierarchy in an object called "frc_data"
frc_data <- ISRaD_extra$fraction %>% #Start with fraction data
  left_join(ISRaD_extra$layer) %>% #Join to layer data
  left_join(ISRaD_extra$profile) %>% #Join to profile data
  left_join(ISRaD_extra$site) %>% #Join to site data
  left_join(ISRaD_extra$metadata) #Join to metadata
#Take a look at it:
View(frc_data)
#or do the same for incubation data in an object called "inc_data"
inc_data <- ISRaD_extra$incubation %>% #Start with incubation data
  left_join(ISRaD_extra$layer) %>% #Join to layer data
  left_join(ISRaD_extra$profile) %>% #Join to profile data
  left_join(ISRaD_extra$site) %>% #Join to site data
  left_join(ISRaD_extra$metadata) #Join to metadata
- Or join with a more limited set of information:
#Merge flux data with site information only
flx_data <- ISRaD_data$flux %>% #Start with flux data
left_join(ISRaD_data$site) #Join to site data
#Take a look at it:
View(flx_data)
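When left_join is called without a by argument, dplyr matches on every column the two tables share and prints a message naming them. A minimal sketch of the same join with the keys spelled out is shown below. It uses two toy tables, and the key columns (entry_name, site_name) are assumed to mirror the naming in the ISRaD tables, so check names(ISRaD_data$site) before reusing them.

```r
library(dplyr)

# Toy tables; key column names are assumed to mirror ISRaD's naming
site <- data.frame(entry_name = "Study_1",
                   site_name = c("A", "B"),
                   site_lat = c(45.0, 60.5))
layer <- data.frame(entry_name = "Study_1",
                    site_name = c("A", "A", "B"),
                    lyr_name = c("0-10", "10-20", "0-10"),
                    lyr_14c = c(50.2, -10.4, NA))

# Naming the keys silences the "Joining with `by = ...`" message and
# guards against accidental matches on unrelated shared columns
flat <- left_join(layer, site, by = c("entry_name", "site_name"))
flat
```

Every layer row is kept, and the site columns are repeated for each layer at that site.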
Filtering the Data & Summary Statistics
How much incubation data do we have? How is it distributed? To answer this:
- Start with all incubation data
- Keep layers whose bottom depth is shallower than 50 cm
- Keep only rows with incubation 14C data
- Group by land cover class
- Print the number of data points and the mean 14C for each class
inc_data %>%
  filter(lyr_bot < 50) %>% #keeps layers with a bottom depth shallower than 50 cm
  filter(!is.na(inc_14c)) %>% #keeps rows with incubation 14C data only
  group_by(pro_land_cover) %>% #groups by land cover class
  summarise(num_data_points = n(), #counts data points per class
            mean_inc_14c = mean(inc_14c, na.rm = TRUE)) #calculates mean 14C per class
## # A tibble: 8 × 3
## pro_land_cover num_data_points mean_inc_14c
## <fct> <int> <dbl>
## 1 bare 8 49.1
## 2 cultivated 83 79.4
## 3 forest 892 92.3
## 4 rangeland/grassland 97 6.17
## 5 shrubland 132 51.7
## 6 tundra 348 -9.25
## 7 wetland 78 62.1
## 8 <NA> 37 -25.3
Filter & Plot:
How does 14C in the density fractions compare across parent materials?
frc_data %>%
  filter(frc_scheme == "density") %>% #density fractions only
  filter(!is.na(frc_property),
         !is.na(pro_parent_material)) %>% #drop rows with missing values
  ggplot() +
  geom_boxplot(aes(x = frc_property, y = frc_14c)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "black"),
        axis.title.x = element_blank()) +
  facet_wrap(~pro_parent_material) #Makes one plot for each parent material
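When filtering out missing values, comparing against the string "NA" and testing with is.na() behave differently. The short base R sketch below, on a made-up vector, shows the difference. dplyr's filter() drops rows where the condition evaluates to NA, so both forms remove truly missing values, but is.na() states the intent directly.

```r
# Made-up vector with one genuinely missing value
frc_property <- c("free light", NA, "heavy")

# Comparing with the string "NA" yields NA (not FALSE) for the missing entry
cmp <- frc_property != "NA"
cmp

# is.na() gives an unambiguous TRUE/FALSE for every entry
keep <- !is.na(frc_property)
keep

# Base subsetting with the is.na() test keeps only the observed values
frc_property[keep]
```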
Does 14C in the free light fraction depend on temperature in forests globally?
frc_data %>%
dplyr::filter(lyr_bot <= 30 & lyr_top >=0) %>% #filter for depth: 0-30cm
dplyr::filter(pro_treatment == 'control') %>% #filter for control treatment only
dplyr::filter(pro_land_cover == "forest") %>% #filter for forests only
dplyr::filter(frc_scheme == "density") %>% #filter for density fraction only
dplyr::filter(frc_property == "free light") %>% #filter for free light fraction only
dplyr::filter(lyr_obs_date_y >=1990 & lyr_obs_date_y <= 2010) %>% #filter for sampling year: 1990-2010
dplyr::filter(!is.na(frc_14c)) %>% #remove missing 14C data
ggplot() +
geom_point(aes(x = pro_MAT, y = frc_14c, color = pro_MAP)) + #plot 14C against temperature and color by precip
theme_bw() +
xlab("Mean Annual Temperature (C)") +
ylab(expression(Delta^14 *"C of Free Light Fraction")) +
ggtitle("Free Light Fractions, 0-30cm, Forests, 1990-2010") +
scale_color_gradient(low = "orange", high = "blue") +
theme(axis.text = element_text(size = 16, color = "black"),
axis.title = element_text(size = 16))
Mapping ISRaD Data
Map all ISRaD layer data, highlighting sites with 14C layer data
First set up the basemap:
#Get world map:
world_map <- map_data("world")
#Function to plot sampling locations for the different datatypes in ISRaD
map_fun <- function(dataset){
ggplot() +
geom_map(
data = world_map, map = world_map,
aes(long, lat, map_id = region),
color = "#d9d9d9", fill = "#d9d9d9") +
geom_point(data = dataset,
aes(x = pro_long, y = pro_lat, color = data_14c),
size = 3, alpha = 0.5) +
theme_bw(base_size = 14) +
theme(rect = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_line(color = "black"),
axis.text = element_text(color = "black"),
axis.line = element_line(color = "black"),
legend.position = c(0.1,0.2)) +
scale_x_continuous("", expand = c(0,0)) +
scale_y_continuous("", expand = c(0,0)) +
scale_color_manual(values = c("#a1d99b", "#4D36C6")) +
coord_cartesian()
}
Plot locations of ISRaD bulk data (layer) on the map. Blue points have bulk soil (layer) 14C data. Green points have other data, but do not have bulk soil 14C data.
# Create column that indicates if 14C was measured or not:
lyr_data_14c <- lyr_data %>%
mutate(data_14c = case_when(
is.na(lyr_14c) ~ FALSE,
!is.na(lyr_14c) ~ TRUE
))
# Map layer data
map_fun(dataset = lyr_data_14c) +
ggtitle("ISRaD - layer data")
Plot locations of ISRaD incubation data on the map. Blue points have incubation 14C data. Green points have other data, but do not have incubation 14C data.
# Create column that indicates if incubation 14C was measured or not:
inc_data_14c <- inc_data %>%
  mutate(data_14c = case_when(
    is.na(inc_14c) ~ FALSE,
    !is.na(inc_14c) ~ TRUE
  ))
# Map incubation data
map_fun(dataset = inc_data_14c) +
  ggtitle("ISRaD - incubation data")
Compiling your own data offline:
Advanced users may wish to compile their own data locally in order to view it in the context of the larger database. (Note: this operation is not the same as submitting your data for ingest.)
The compile function is used to QA/QC and assemble additional datasets (those that pass the QA/QC test) into a new list, which can later be merged with ISRaD_data. Note that it cannot be merged with ISRaD_extra.
In order to run compile on a set of user-specified data entries, the user must create a local folder whose path is specified with dataset_directory. This folder must contain only the entries to be compiled, in .xlsx format. If other file types exist in the directory, compile will fail. (Note: entries cannot be open in Excel while compile runs.)
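Because stray files in the folder cause compile to fail, it can help to check the folder contents first. The sketch below is one possible pre-check in base R. The helper non_xlsx_files is hypothetical (it is not part of the ISRaD package), and skipping subfolders is an assumption made for this sketch; the demonstration runs on a temporary folder rather than real data.

```r
# Hypothetical helper (not part of ISRaD): list files that are not .xlsx
non_xlsx_files <- function(dataset_directory) {
  files <- list.files(dataset_directory)
  # Assumption: subfolders are ignored, only plain files are checked
  files <- files[!dir.exists(file.path(dataset_directory, files))]
  files[tolower(tools::file_ext(files)) != "xlsx"]
}

# Demonstration on a temporary folder with one valid and one stray file
demo_dir <- file.path(tempdir(), "israd_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Study_1.xlsx", "notes.txt"))))

stray <- non_xlsx_files(demo_dir)
stray
```

An empty result suggests the folder holds only .xlsx files; anything listed could be moved elsewhere before compiling.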
compiled <- compile(dataset_directory = "~/Directory/to/data/", write_report = TRUE,
                    write_out = TRUE, return = "list")
The parameter return determines the format of the object that is returned and should be set to “list” unless the user prefers a flattened version of the database formatted as a single data.frame.
When set to TRUE, the parameter write_out will trigger the creation of several output files:
Description | Location | File name |
---|---|---|
Report files that identify issues with the files in the dataset_directory | dataset_directory/QAQC | QAQC_*.txt (* corresponds to dataset file names) |
Flattened database file | dataset_directory/database | ISRaD_flat.csv |
List structured database file in the same format as template | dataset_directory/database | ISRaD_list.xlsx |
Log file generated by compile function. Most importantly, tells you which files passed QAQC. | dataset_directory/database | ISRaD_log.txt |
Summary statistics for datasets compiled into database | dataset_directory/database | ISRaD_summary.csv |
QAQC check on compiled database. | dataset_directory/database | QAQC_ISRaD_list.txt |
Merging a user compiled list with ISRaD_data
The function mapply can be used to merge the user-compiled list with ISRaD_data as follows:
#set extra = FALSE to only get ISRaD_data (templates as they have been entered)
ISRaD_data <- ISRaD.getdata(directory = mydir, dataset = "full", extra = FALSE, force_download = TRUE)
# or alternatively:
load("C:/Users/YourPathHere/ISRaD_data_v 2.6.6.2024-01-25.rda") #Loads ISRaD_data
merged_data <- mapply(rbind, ISRaD_data, compiled, SIMPLIFY = FALSE)
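To see what the mapply call does, the sketch below applies the same pattern to two small made-up lists. mapply pairs list elements by position, not by name, so this approach assumes both lists hold the same tables in the same order and that matching tables have the same columns.

```r
# Two toy lists with the same tables, in the same order, with the same columns
db_a <- list(site  = data.frame(site_name = "A", site_lat = 45.0),
             layer = data.frame(site_name = "A", lyr_14c = 50.2))
db_b <- list(site  = data.frame(site_name = "B", site_lat = 60.5),
             layer = data.frame(site_name = c("B", "B"),
                                lyr_14c = c(-10.4, 3.1)))

# rbind is applied pairwise: site with site, layer with layer
merged <- mapply(rbind, db_a, db_b, SIMPLIFY = FALSE)

# The result keeps the table names of the first list
names(merged)
sapply(merged, nrow)
```

Comparing names(ISRaD_data) with names(compiled) before merging is a simple way to confirm that the two lists line up.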
Have fun exploring the global repository of all things soil radiocarbon (ISRaD)!