Since this is our first module on downloading data, I’m going to try to emphasize the general approach to downloading from a remote data source. I’ll break it down into 4 steps:

1. Find the data. Browse the provider’s website until you can see the actual files sitting on their server.
2. Figure out the URL pattern. Over HTTP(S), a file’s URL will look something like https://greatdata.gov/bestdata/2021/raster.tif, and over FTP it’ll be something like ftp://greatdata.gov/bestdata/2021/raster.tif. You can probably separate that into a base URL (https://greatdata.gov/bestdata/), plus some parameters like the year (2021/) and the filename (raster.tif).
3. Construct the URL(s) in R. Combine the pieces with R’s paste0() function, and you’re ready to go!
4. Download the file(s). R has a built-in function, download.file(), that you can use. You can also pass the text from R to the system shell and use a system downloader (e.g., curl, wget).

In this module, I’ll emphasize these 4 steps for each of our SWE data sources, but in the following modules, we will jump straight into the downloading.
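Before we get to the real data sources, here’s a minimal sketch of what steps 3 and 4 might look like for the made-up URL above (greatdata.gov is not a real server; only the pattern matters):

# Step 3: build the URL from its pieces (this server is fictional)
base <- "https://greatdata.gov/bestdata/"
year <- 2021
file <- "raster.tif"
url <- paste0(base, year, "/", file)

# Step 4: download it (mode = "wb" because rasters are binary files)
download.file(url = url, destfile = file, mode = "wb")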
Snow Water Equivalent (SWE) is a common snowpack measurement. It is the amount of water contained within the snowpack. It can be thought of as the depth of water that would theoretically result if you melted the entire snowpack instantaneously.
In other words, SWE accounts for how fluffy or dense the snow is. That tends to make it a better measurement for understanding animal movements than snow depth alone. However, note that many of the data sources we cover here also have snow depth data, so you should be able to get snow depth if you prefer.
For a detailed discussion of how snowpack affects animal movement, you might be interested in Mahoney et al. (2018).
Mahoney, Peter J., et al. 2018. “Navigating snowscapes: scale‐dependent responses of mountain sheep to snowpack properties.” Ecological Applications 28(7): 1715-1729. DOI: 10.1002/eap.1773.
There are multiple data sources that have broad spatial coverage of SWE data. We will discuss these three:
| Dataset | Source | Link | Citation |
|---|---|---|---|
| Daymet v4 | NASA | ORNL DAAC | Thornton et al. 2020 |
| ERA5-Land Hourly | ECMWF | ERA5-Land | DOI: 10.24381/cds.e2161bac |
| SNODAS | NOAA | NSIDC | DOI: 10.7265/N5TB14TC |
Each of these three data sources has its pros and cons.
The Daymet data covers all of North America. It models surface weather at a 1 km resolution based on weather station data. Data are available from 1980 – present. Unfortunately, the model’s treatment of SWE is not ideal. From the Daymet user manual:
Snowpack, quantified as snow water equivalent (SWE), is estimated as part of the Daymet processing in order to reduce biases in shortwave radiation estimates related to multiple reflections between the surface and atmosphere that are especially important when the surface is covered by snow (Thornton et al. 2000). The Daymet (v3.0) dataset includes estimated SWE as an output variable since this quantity may be of interest for research applications in addition to its primary intended use as a component of the Daymet shortwave radiation algorithm. An important caveat in the use of SWE from the Daymet (v3.0) dataset is that the algorithm used to estimate SWE is executed with only a single calendar year of primary surface weather inputs (daily maximum and minimum temperature and daily total precipitation) available for the estimation of a corresponding calendar year of snowpack. Since northern hemisphere snowpack accumulation is commonly underway already at the beginning of the calendar year, the SWE algorithm uses data from a single calendar year to make a two-year sequence of temperature and precipitation, then predicts the evolution of snowpack over this two-year period to provide an estimate of yearday 365 (December 31 for non-leapyears) snowpack as an initial condition for the January 1 time step of the actual calendar year. The problem with this approach is that it ignores the dependence of January 1 snowpack on preceding calendar year temperature and precipitation conditions, and so generates potential biases in mid-season snowpack which can propagate to biases in late-season timing of snow melt.
What I take away from this warning is that the SWE is calculated to adjust for the other meteorological variables, but that the calculation itself is sub-optimal (because it ignores the dependence of January 1 snowpack on the preceding year’s temperature and precipitation). I’ve also seen instances in the ecological literature where someone used Daymet for temperature and precipitation but not for SWE. Ultimately, you should make your own judgement call, but note that Daymet is available at a finer scale than ERA5 and for a longer timeseries than SNODAS, so it may be the best choice for you.
The ERA5-Land Hourly dataset is meant to be a global, historical climate reconstruction. It predicts SWE (among other variables) at hourly intervals across the globe. Its native resolution is coarse at ~9 km (technically 0.1° × 0.1° cells), so it’s well suited to very large-scale analyses. Data are currently available from 1981 – present, but they will soon provide data from 1950 – present.
SNODAS is a modeling and data assimilation system developed by NOHRSC to provide the best possible estimates of snow cover and associated parameters to support hydrologic modeling and analysis. The aim of SNODAS is to provide a physically consistent framework to integrate snow data from satellite, airborne platforms, and ground stations with model estimates of snow cover.
Okay, so SNODAS is fancy 🤵 🍸. It was developed by NOAA’s National Operational Hydrologic Remote Sensing Center (NOHRSC) and distributed by the National Snow and Ice Data Center (NSIDC), so it seems apparent that it should do a good job of estimating snow and ice cover. It is available for the contiguous US at a 1 km resolution. Data are available from 2004 – present.
I have seen SNODAS used in many ecology publications, and in my opinion, it is the highest quality of these three datasets. However, it also covers the shortest time window and has the most limited geographic coverage.
We’ll start by looking at how to download data the hard way. I want to go through the process here because it gives a good overview of the diversity of formats and tools by which geospatial data are made available. If you can master these skills, you can find a way to download almost any geospatial data from R. In the next section, we will cover downloading these data using an R package that I am writing, snowdl.
The obvious place to start is on the Daymet website. From there, it’s fairly easy to find the link to “Get Data”.
From there, you can find the section “Direct Download”, with a link that will take you to the Daymet THREDDS Data Server. This webpage shows the directories available to you, and you can pick one of your choice. We want the daily tile data, so we’ll click Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4 (tiles). In this directory, we can see the data are organized by year. Inside a year directory, the data are organized by tile. And in each tile’s folder, you can find various NetCDF (*.nc) rasters, including swe.nc. Click on swe.nc, and it takes you to a page that describes how to download it.
Okay, we’ve found our data. Let’s take a look at where we found it and see if we can figure out the pattern. A good rule of thumb if you aren’t very experienced with remote data is to look for an HTTP or HTTPS server. We can see that the THREDDS Data Server does provide us a link via an HTTP server. Here is an example:
https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2019/10090_2019/swe.nc
We can figure out (from our browsing in Step 1) that the base URL, i.e., the part of the link that all the files share in common, is everything up until the year:
https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/
The next directory in the path is simply the year, in this case 2019/.

The next directory is <tile>_<year>, so in this case (tile 10090 and year 2019), 10090_2019.
You can figure out what tiles you need for your study area at the Daymet Tile Selection Tool.
Now we’re in the directory with our rasters, and we want to pick swe.nc in all cases. (Of course, if you want more data from Daymet, feel free to change which file you download.)
With all of this information, we can construct the URL(s) to the file(s) we need.
So say we want SWE data for Logan, UT for 2018. The tile we need is 11735. Here’s some R code to create our URL:
# Year and Tile
year <- 2018
tile <- 11735
# Base URL
base <- "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/"
daymet_swe_url <- paste0(base,
year, "/",
tile, "_", year, "/",
"swe.nc")
print(daymet_swe_url)
## [1] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2018/11735_2018/swe.nc"
What if we want all the data for Logan from 1990 – 2019? We can pass a vector of years to paste0() and get all of our URLs at once.
# Year and Tile
year <- 1990:2019
tile <- 11735
# Base URL
base <- "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/"
daymet_swe_urls <- paste0(base,
year, "/",
tile, "_", year, "/",
"swe.nc")
print(daymet_swe_urls)
## [1] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1990/11735_1990/swe.nc"
## [2] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1991/11735_1991/swe.nc"
## [3] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1992/11735_1992/swe.nc"
## [4] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1993/11735_1993/swe.nc"
## [5] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1994/11735_1994/swe.nc"
## [6] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1995/11735_1995/swe.nc"
## [7] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1996/11735_1996/swe.nc"
## [8] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1997/11735_1997/swe.nc"
## [9] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1998/11735_1998/swe.nc"
## [10] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/1999/11735_1999/swe.nc"
## [11] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2000/11735_2000/swe.nc"
## [12] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2001/11735_2001/swe.nc"
## [13] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2002/11735_2002/swe.nc"
## [14] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2003/11735_2003/swe.nc"
## [15] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2004/11735_2004/swe.nc"
## [16] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2005/11735_2005/swe.nc"
## [17] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2006/11735_2006/swe.nc"
## [18] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2007/11735_2007/swe.nc"
## [19] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2008/11735_2008/swe.nc"
## [20] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2009/11735_2009/swe.nc"
## [21] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2010/11735_2010/swe.nc"
## [22] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2011/11735_2011/swe.nc"
## [23] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2012/11735_2012/swe.nc"
## [24] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2013/11735_2013/swe.nc"
## [25] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2014/11735_2014/swe.nc"
## [26] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2015/11735_2015/swe.nc"
## [27] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2016/11735_2016/swe.nc"
## [28] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2017/11735_2017/swe.nc"
## [29] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2018/11735_2018/swe.nc"
## [30] "https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840/tiles/2019/11735_2019/swe.nc"
Now that we’ve got our URLs, we’re ready to download the files. We’ll use R’s download.file() here.
First, check the help file to understand how the function works.
?download.file
We can see that we need to pass it the URL as a character string, the destination file (what you want to save it as) as a character string, and possibly some other optional arguments. One of those arguments is important for us now. The mode argument tells download.file() how to write the file. Since this file stores binary data, we need to set mode = "wb". This is generally true of raster data. Let’s download our 2018 file for Logan.
download.file(url = daymet_swe_url,
destfile = "Logan_2018.nc",
mode = "wb")
We can load it and plot it using the raster package to make sure it worked. The file has 365 bands, each representing a day of the year. We can tell raster to load band = 1 to get January 1st.
library(raster)
## Loading required package: sp
daymet_rast <- raster("Logan_2018.nc", band = 1)
## Loading required namespace: ncdf4
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
plot(daymet_rast)
Choose your favorite method for iterating over many files. You could write a for() loop or perhaps use mapply() to download any number of tiles/years you need. In a later section, we’ll see some R functions I wrote to make this easy for you.
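For example, here is a minimal mapply() sketch that reuses the URLs we built above (the destination file names are just my own choice, not a convention):

# Download every year's SWE tile for Logan, one file per year
dest_files <- paste0("Logan_", year, ".nc")
mapply(download.file,
       url = daymet_swe_urls,
       destfile = dest_files,
       MoreArgs = list(mode = "wb"))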
Turns out, there is an R package for downloading data from the European Centre for Medium-Range Weather Forecasts (ECMWF) called ecmwfr. The vignette for the package gives a good overview of how the process works.
You first need to register with ECMWF for a free account to be able to download the data through the API.
The need for us to follow the general steps goes out the window when an organization builds a dedicated API. We no longer need to figure out a URL, but rather, we need to figure out how to construct a “request” that the API can understand. All of this is detailed, including an animated GIF, in the ecmwfr vignette.
Later, we’ll see the wrapper functions that I wrote to make this even simpler for getting SWE data, but if you want other data from ERA5 (or any other climate data from the ECMWF), check out the vignette.
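To give you a feel for it, here is a rough sketch of a CDS request made with ecmwfr directly. The credentials are fake, and the exact argument and list-element names (e.g., dataset_short_name, or whether user is an email address or a numeric UID) depend on the ecmwfr version and the service you registered with, so treat this as a template and check the vignette before using it:

library(ecmwfr)

# Store your CDS credentials (these are fake placeholders)
wf_set_key(user = "74133", key = "123", service = "cds")

# A request is just a named list describing what you want
req <- list(
  dataset_short_name = "reanalysis-era5-land",
  variable = "snow_depth_water_equivalent",
  year = "2019",
  month = "02",
  day = "22",
  time = "12:00",
  area = c(46, -112, 44, -109), # N, W, S, E
  format = "netcdf",
  target = "era5_swe.nc"
)

# Submit the request and download the result to the working directory
wf_request(user = "74133", request = req, transfer = TRUE, path = ".")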
Let’s begin at the SNODAS home page. We can see there is a tab called “Download Data”. Click on it.
You’ll immediately notice that there is information about an FTP site. Jackpot! 🤑
FTP data can be downloaded through a Web browser (Firefox or Edge are recommended) or command line via FTP. For help downloading data through an FTP client, go to the How to access data using an FTP client support page.
Click the link that says How to access data using an FTP client.
There is information here about how to use FTP from the command line. We’re not going to do this, but it tells us how to construct our URLs.
The command line instructions give us a good idea of what the base URL will be. First off, it’s an FTP site, so it will start with ftp://. The node to connect to is sidads.colorado.edu, so to make that a full URL, add the protocol to the front: ftp://sidads.colorado.edu.
Don’t worry about the login part, because we’re not browsing.
Then it tells you to use the cd command. We know what that does! It changes directory. So cd /pub/DATASETS/NOAA/XXXX would make our new URL ftp://sidads.colorado.edu/pub/DATASETS/NOAA/XXXX.
At this point, it is probably still unclear what exactly you should put for XXXX. So you can actually follow the directions on their page to log in via FTP from your shell and explore. If you were to do that, you’d find the URL of a file you want and be able to generalize the pattern. Here’s an example I found exactly that way:
ftp://sidads.colorado.edu/DATASETS/NOAA/G02158/masked/2019/01_Jan/SNODAS_20190101.tar
Note that a *.tar file is a “tarball”, which is an archive of a directory (similar to a ZIP file on Windows). R can extract tarballs with untar().
Okay, now that we’ve found a file, let’s construct the URL. We can see that after the base URL, we need to navigate into folders that use parts of the date to make the folder names. R has functions to handle dates fairly easily, so let’s try to build a URL from an actual Date object.
Let’s say we want data for February 22, 2020.
# Date
date <- as.Date("2020-02-22")
# Base URL
base_url <- "ftp://sidads.colorado.edu/DATASETS/NOAA/G02158/masked/"
# Grab date components
y <- format(date, "%Y")
m <- format(date, "%m")
mon <- format(date, "%b") #Gets the abbreviated month, e.g. "Feb"
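# Note: "%b" is locale-dependent. The server's folder names use English
# abbreviations (e.g., "Feb"), so if your locale returns something else, run
# Sys.setlocale("LC_TIME", "C") first.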
d <- format(date, "%d")
# Filename
fn <- paste0("SNODAS_", y, m, d, ".tar")
# Construct URL
snodas_url <- paste0(base_url, y, "/", m, "_", mon, "/", fn)
print(snodas_url)
## [1] "ftp://sidads.colorado.edu/DATASETS/NOAA/G02158/masked/2020/02_Feb/SNODAS_20200222.tar"
Now that we’ve got our URL, we can pass it to download.file(), just like before. Remember to set mode = "wb".
download.file(url = snodas_url,
destfile = "SNODAS_file.tar",
mode = "wb")
Since SNODAS gives us access to tarballs, we still need to extract the tarball to get to our data. AND THEN, it turns out the files inside are also compressed with gzip (*.gz), so they need to be extracted as well.
Linux and macOS users should be able to extract these files easily with built-in tools. Windows users need another utility. I highly recommend 7-Zip if you are on Windows and you need to unpack any sort of compressed file.
Turns out, R can also unpack these files, but I’m going to skip the details for now. I have written functions in snowdl that will handle this for you, which we’ll look at in just a few minutes.
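If you’re curious anyway, here is roughly what the unpacking could look like using untar() plus the R.utils package (just a sketch; the folder name is my own choice, and snowdl does the equivalent work for you):

# Extract the tarball into its own folder
untar("SNODAS_file.tar", exdir = "SNODAS_20200222")

# The files inside are gzipped; decompress each one (keeping the originals)
gz_files <- list.files("SNODAS_20200222", pattern = "\\.gz$", full.names = TRUE)
lapply(gz_files, R.utils::gunzip, remove = FALSE)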
Now that we’ve seen the process, we can see what all of this looks like wrapped up in some nice R functions to do it for you. I wrote snowdl to make my own life easier when I was comparing SWE data sources, but I’m hoping that you also find that it makes your life easier.

As long as you have all the software required for this workshop, you should have no problem installing snowdl from GitHub. You can download and install it from R with this command:
devtools::install_github("bsmity13/snowdl")
Once it’s installed, attach it to your R session the usual way:
library(snowdl)
This package is still a work in progress, so it could change significantly in the coming months. I have some already-planned changes that are laid out on GitHub, but who knows what I’ll decide as I go. When you need this for your own work, make sure to check the GitHub repo to see if there are any updates.
https://github.com/bsmity13/snowdl
One day soon there will be nice wrapper functions that do all the steps for you in one line of code, but for now, I’m planning to keep this step-by-step workflow intact.
Daymet data is quite easy to get, since it doesn’t require logging in to an API or extracting compressed files. To do it, use the function get_daymet_swe(). Check out the help file with ?get_daymet_swe.
Here’s an example that downloads 5 years of data for 2 different tiles to a folder called “test”. Note the output directory here is a relative path, so it will create a folder called “test” inside your current working directory.
get_daymet_swe(year = 2015:2019,
tile = c(12095, 12096),
out_dir = "test")
The ERA5 data is a little trickier because it requires you to log in via the API and create a “request”. Remember, see the ecmwfr vignette for details on how to do that. snowdl has wrapper functions to make things a little bit quicker.
The process requires 3 functions: (1) e_key() to register your user key, (2) e_request() to build the request, and (3) get_request() to download the data.
This example (if it had a real API key) would download data for February 22 for the years from 2015 – 2019. It also crops it to a specific bounding box.
# Register key (this one is fake)
e_key(user = "test@mail.com", key = "123", service = "cds")
# Build request list
req <- e_request(variable = "snow_depth_water_equivalent",
years = 2015:2019,
months = 2,
days = 22,
area = c(46, -112, 44, -109),
out_file = "test.nc")
# Get data
ecmwf_path <- get_request(user = "74133",
request = req)
The SNODAS workflow also requires 3 functions: (1) download_SNODAS() to download the tarballs, (2) unpack_SNODAS() to extract the contents from the compressed files, and (3) rasterize_SNODAS() to load the rasters (and possibly crop them) and save them in a format that is easy to load into R.
This example downloads data for February 22 of 2019 and 2020, then unpacks it all and rasterizes it.
# Download
download_SNODAS(as.Date("2020-02-22"),
out_dir = "test")
download_SNODAS(as.Date("2019-02-22"),
out_dir = "test")
# Unpack
unpack_SNODAS(tar_dir = "test") #unpacks all tarballs in directory
# Write to raster
rasterize_SNODAS(extent = extent(-111, -109, 44, 46))
In this module, we covered three potential data sources for snow-water equivalent data. Each has pros and cons, and it is up to you to decide which is best for your application.
Hopefully, you have a better understanding of how to find the files you need on a webserver, how to download them with R, and how to decompress them and load them.
I also hope you find snowdl useful next time you need SWE data.