Soil Data Access (SDA2) Tutorial

2016-01-29
Dylan Beaudette

Introduction

This is a short tutorial on how to interact with the Soil Data Access (SDA) web-service using R. Queries are written using a dialect of SQL. On first glance SQL appears similar to the language used to write NASIS queries and reports, however, these are two distinct languages. Soil Data Access is a "window" into the spatial and tabular data associated with the current SSURGO snapshot. Queries can contain spatial and tabular filters. If you are new to SDA or SQL, have a look at this page.

Spatial queries can be included in SQL statements submitted to SDA as long as the geometry has first been transformed to WGS84 geographic (or psuedo-Web Mercator) coordinates and formatted as "well known text" (WKT). The sp and rgdal packages provide functionality for converting between coordinate systems via spTransform(). Coordinate reference system definitions (a "CRS") are typically provided using proj4 notation. You can search for various CRS definitions in a variety of formats using spatialreference.org/.

The soilDB library for R provides a helper function (SDA_query()) for submitting queries to SDA, processing the result, and reformatting the results into a rectangular table (a data.frame). Most of the work required to use the SDA_query() function will be writing SQL to describe the columns you would like returned and how the data should be filtered and possibly grouped.

Follow along with the blocks of code below by copying / pasting into a new R "script" document. Each block of command can be run by pasting into the R console, or by "stepping through" lines of code by moving the cursor to the top of a block (in the R script panel) and repeatedly pressing ctrl + enter.

If you are feeling adventurous, have a look at a draft tutorial on queries that return geometry from SDA. Additional tips on advanced spatial queries can be found here.

Install Required R Packages

You only need to do this once. If you haven't installed these packages, then copy the code below and paste into the RStudio "console" pane.

# run these commands in the R console
# stable version from CRAN + dependencies
install.packages("httr", dep=TRUE)
install.packages("soilDB", dep=TRUE)
install.packages("rgdal", dep = TRUE)
install.packages("raster", dep = TRUE)
install.packages("rgeos", dep = TRUE)
# latest versions from r-forge
install.packages("soilDB", repos = "http://R-Forge.R-project.org", type = "source")

Simple Queries

Now that you have the required packages, load them into the current R session.

library(soilDB)
library(sp)
library(rgdal)
library(plyr)
library(raster)
library(rgeos)

When was the CA653 survey area last exported to SSURGO?

SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol = 'CA653'")
##   areasymbol             saverest
## 1      CA653 10/1/2015 1:41:01 PM

Are there any survey areas that haven't been updated since Jan 1, 2010?

SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE saverest < '01/01/2010'")
##   areasymbol              saverest
## 1    MXNL001 11/27/2009 9:36:38 AM
## 2         US   7/6/2006 8:49:17 AM

What is the most recently updates survey in CA?

SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol LIKE 'CA%' ORDER BY saverest DESC")[1, ]
##   areasymbol             saverest
## 1      CA653 10/1/2015 1:41:01 PM

A simple query from the component table, for a single map unit: mukey = '461958'.

q <- "SELECT 
mukey, cokey, comppct_r, compname, taxclname
FROM component
WHERE mukey = '461958'"

# run the query
res <- SDA_query(q)

# check
head(res)
##    mukey    cokey comppct_r    compname                                 taxclname
## 1 461958 12008609        85 San Joaquin Fine, mixed, thermic Abruptic Durixeralfs
## 2 461958 12008610         4        Galt                                      <NA>
## 3 461958 12008611         4     Bruella                                      <NA>
## 4 461958 12008612         3       Hedge                                      <NA>
## 5 461958 12008613         3     Kimball                                      <NA>
## 6 461958 12008614         1     Unnamed                                      <NA>

Get a list of map units that contain "Amador" as minor component.

q <- "SELECT 
muname, mapunit.mukey, cokey, compname, comppct_r
FROM mapunit INNER JOIN component on mapunit.mukey = component.mukey
WHERE compname LIKE '%amador%'
AND majcompflag = 'No'"

# run the query
res <- SDA_query(q)

# check
head(res)
##                                                                    muname  mukey    cokey compname
## 1                        Whiterock rocky silt loam, 3 to 8 percent slopes 463199 12757553   Amador
## 2                Whiterock rocky silt loam, 3 to 8 percent slopes, eroded 463200 12757558   Amador
## 3                       Whiterock rocky silt loam, 8 to 30 percent slopes 463201 12757563   Amador
## 4               Whiterock rocky silt loam, 8 to 30 percent slopes, eroded 463202 12757568   Amador
## 5 Hicksville sandy clay loam, 0 to 2 percent slopes, occasionally flooded 461904 12008311   Amador
## 6                       Pardee-Ranchoseco complex, 3 to 15 percent slopes 461931 12008455   Amador
##   comppct_r
## 1         5
## 2         5
## 3         5
## 4         5
## 5         3
## 6         3
# optionally save the results to CSV file
# write.csv(res, file='path-to-file.csv', row.names=FALSE)

Get basic map unit and component data for a single survey area, Yolo County (ca113).

q <- "SELECT 
component.mukey, cokey, comppct_r, compname, taxclname, 
taxorder, taxsuborder, taxgrtgroup, taxsubgrp
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
LEFT OUTER JOIN component ON component.mukey = mapunit.mukey
WHERE legend.areasymbol = 'ca113'"

# run the query
res <- SDA_query(q)

# check
head(res)
##    mukey    cokey comppct_r compname                                                      taxclname
## 1 757748 12207591        80 Scribner    Fine-loamy, mixed, superactive, thermic Cumulic Endoaquolls
## 2 757748 12207592        10     Vina  Coarse-loamy, mixed, superactive, thermic Pachic Haploxerolls
## 3 757748 12207593         8 Corbiere   Fine, mixed, superactive, thermic Cumulic Vertic Endoaquolls
## 4 757748 12207594         2  Unnamed                                                           <NA>
## 5 757749 12207595         5 Hustabel Coarse-loamy, mixed, superactive, thermic Cumulic Haploxerolls
## 6 757749 12207596        80  Westfan    Fine-loamy, mixed, superactive, thermic Pachic Haploxerolls
##    taxorder taxsuborder  taxgrtgroup                  taxsubgrp
## 1 Mollisols     Aquolls  Endoaquolls        Cumulic Endoaquolls
## 2 Mollisols     Xerolls Haploxerolls        Pachic Haploxerolls
## 3 Mollisols     Aquolls  Endoaquolls Cumulic Vertic Endoaquolls
## 4      <NA>        <NA>         <NA>                       <NA>
## 5 Mollisols     Xerolls Haploxerolls       Cumulic Haploxerolls
## 6 Mollisols     Xerolls Haploxerolls        Pachic Haploxerolls

Cross tabulate the occurrence of components named "Auburn" and "Dunstone" with survey areasymbol.

q <- "SELECT areasymbol, component.mukey, cokey, comppct_r, compname, compkind, taxclname
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
LEFT OUTER JOIN component ON component.mukey = mapunit.mukey
WHERE compname IN ('Auburn', 'Dunstone')
AND areasymbol != 'US'
ORDER BY areasymbol, compname"

res <- SDA_query(q)

xtabs(~ areasymbol + compname, data=res)
##           compname
## areasymbol Auburn Dunstone
##      CA067      9        0
##      CA607     21        0
##      CA612      8       19
##      CA618     31        1
##      CA619     25        1
##      CA620     14        0
##      CA624     24        0
##      CA628     20        0
##      CA632      4        0
##      CA644     13        0
##      CA648      6        0
##      CA649     21        0
##      CA707     11        0
##      CA731      5        0
##      CA750      1        0

Queries Using Simple Spatial Filters

Get the map unit key and name at a single, manually-defined point (-121.77100 37.368402).

q <- "SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('point(-121.77100 37.368402)')
)"

SDA_query(q)
##     mukey                                        muname
## 1 1882921 Diablo clay, 15 to 30 percent slopes, MLRA 15

Get the map names and mukey values for a 1000m buffer around a manually-defined point (-121.77100 37.368402). A 1000m buffer applied to geographic coordinates will require several spatial transformations. First, the query point needs to be initialized in a geographic coordinate system with WGS84 datum. Next, the point is transformed to a planar coordinate system with units in meters; the NLCD coordinate reference system works well for points within the continental US. After computing a buffer in planar coordinates, the resulting polygon is converted back to geographic coordinates--this is what SDA is expecting.

# the query point is in geographic coordinates with WGS84 datum
p <- SpatialPoints(cbind(-121.77100, 37.368402), proj4string = CRS('+proj=longlat +datum=WGS84'))
# transform to planar coordinate system for buffering
p.aea <- spTransform(p, CRS('+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs '))
# create 1000 meter buffer
p.aea <- gBuffer(p.aea, width = 1000)
# transform back to WGS84 GCS
p.buff <- spTransform(p.aea, CRS('+proj=longlat +datum=WGS84'))
# convert to WKT
p.wkt <- writeWKT(p.buff)

q <- paste0("SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('", p.wkt, "')
)")

res <- SDA_query(q)
head(res)
##     mukey                                             muname
## 1  456983                Diablo clay, 9 to 15 percent slopes
## 2  456993              Gaviota loam, 15 to 30 percent slopes
## 3  457017 Los Gatos-Gaviota complex, 50 to 75 percent slopes
## 4 1882920               Diablo clay, 30 to 50 percent slopes
## 5 1882921      Diablo clay, 15 to 30 percent slopes, MLRA 15
## 6 1882923      Alo-Altamont complex, 15 to 30 percent slopes

It is possible to download small collections of SSURGO map unit polygons from SDA using a bounding-box in WGS84 geographic coordinates. SDA will return polygons and their map unit keys that overlap with the BBOX query.

# extract bounding-box from out last point
# coordinates are in WGS84 GCS
b <- as.vector(bbox(p.buff))
# download map unit polygons that overlap with bbox
p.mu.polys <- mapunit_geom_by_ll_bbox(b)

Graphical description of the previous steps: query point, 1000m buffer, buffer bounding box (BBOX), intersecting map unit polygons, and overlapping polygons.

# plot
par(mar=c(0,0,0,0))
plot(p.mu.polys)
plot(p.mu.polys[which(p.mu.polys$mukey %in% setdiff(p.mu.polys$mukey, res$mukey)), ], add=TRUE, col='grey')
lines(p.buff, col='red', lwd=2)
plot(extent(bbox(p.buff)), add=TRUE, col='RoyalBlue')
points(p, col='orange', pch=15)
legend('bottomleft', legend=c('query point', '1000m buffer', 'buffer BBOX', 'intersected polygons', 'overlapping polygons'), col=c('orange', 'red', 'royalblue', 'black', 'grey'), lwd=c(NA, 2, 2, 2, 2), pch=c(15, NA, NA, NA, NA))
box()

Get some component data for a manually-defined bounding box, defined in WGS84 geographic coordinates.

# define a bounding box: xmin, xmax, ymin, ymax
#
#         +-------------------(ymax, xmax)
#         |                        |
#         |                        |
#     (ymin, xmin) ----------------+

b <- c(-120.9, -120.8, 37.7, 37.8)
# convert bounding box to WKT
p <- writeWKT(as(extent(b), 'SpatialPolygons'))
# compose query, using WKT BBOX as filtering criteria
q <- paste0("SELECT mukey, cokey, compname, comppct_r
            FROM component 
            WHERE mukey IN (SELECT DISTINCT mukey FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('", p, "') )
            ORDER BY mukey, cokey, comppct_r DESC")

res <- SDA_query(q)

# check
head(res)
##    mukey    cokey    compname comppct_r
## 1 462527 11158457      Madera        10
## 2 462527 11158458       Alamo        85
## 3 462527 11158459 San Joaquin         5
## 4 462541 11158510     Chualar        85
## 5 462541 11158511     Oakdale         5
## 6 462541 11158512     Modesto         5

Queries Using Complex Spatial Filters

The examples above illustrate how to perform SDA queries using a single spatial filter. Typically we need to perform these queries over a collection of points, lines or polygons. The soilDB package provides some helper functions for iterating over a collection of features (usually points). Note that spatial queries for more than 1000 features should probably be done using a local copy of the map unit polygons.

The first function SDA_make_spatial_query() will convert a single Spatial* (Points, Lines, Polygons) object to WKT and submit a spatial query to SDA, returning intersecting map unit keys and names. Let's try it using a SpatialPoints object with 1 feature.

# the query point is in geographic coordinates with WGS84 datum
p <- SpatialPoints(cbind(-121.77100, 37.368402), proj4string = CRS('+proj=longlat +datum=WGS84'))
SDA_make_spatial_query(p)

The second function SDA_query_features() will iterate over the features of a Spatial* (Points, Lines, Polygons) object, send a query to SDA, and return a set of the results as a data.frame object. This time, our example set of 2 points will also have some fake pedons IDs.

# the query points are in geographic coordinates with WGS84 datum
p <- SpatialPointsDataFrame(cbind(c(-121, -122), c(37, 37)), data=data.frame(pedon_id=1:2), proj4string = CRS('+proj=longlat +datum=WGS84'))
SDA_query_features(p, id='pedon_id')

Let's apply the SDA_query_features() function to some real data; KSSL pedons correlated to "Auburn". Not all of these pedons have coordinates, so we will have to do some filtering first. See the in-line comments for details on each line of code.

# get KSSL pedons with taxonname = Auburn
# coordinates will be NAD83 GCS
auburn <- fetchKSSL('auburn')
# keep only those pedons with valid coordinates
auburn <- subsetProfiles(auburn, s='!is.na(x) & !is.na(y)')
# init spatial information
coordinates(auburn) <- ~ x + y
# define projection
proj4string(auburn) <- '+proj=longlat +datum=NAD83'

# extract "site data"
auburn.sp <- as(auburn, 'SpatialPointsDataFrame')
# perform SDA query on each "site", result is a data.frame
mu.data <- SDA_query_features(auburn.sp, id='pedlabsampnum')

# join results to original SoilProfileCollection using 'pedlabsampnum'
site(auburn) <- mu.data

Check the results and plot the "Auburn" KSSL pedons, grouped by intersecting map unit key and coloring horizons according to clay content.

# check results
print(mu.data)
##    pedlabsampnum   mukey                                                       muname
## 1        40A3004  461845             Amador-Gillender complex, 2 to 15 percent slopes
## 2        40A3005  461922              Mokelumne gravelly loam, 2 to 15 percent slopes
## 3        83P0801  461854 Auburn-Argonaut-Rock outcrop complex, 8 to 30 percent slopes
## 4        84P0879  460384               Auburn-Sobrante complex, 3 to 8 percent slopes
## 5        91P0411  460408    Auburn-Timbuctoo-Argonaut complex, 8 to 15 percent slopes
## 6        91P0414  460408    Auburn-Timbuctoo-Argonaut complex, 8 to 15 percent slopes
## 7        01N0262  461425    Dunstone-loafercreek complex, dry, 1 to 15 percent slopes
## 8        05N0395 1403441                     Auburn silt loam, 5 to 15 percent slopes
## 9     UCD6445143  459938             Auburn clay loam, 8 to 30 percent slopes, eroded
## 10    UCD6505005 2600526                                    No Digital Data Available
## 11    UCD6604002  461423                Dunstone-Loafercreek , 2 to 15 percent slopes
## 12    UCD6605008 2600526                                    No Digital Data Available
## 13    UCD6605014 2600526                                    No Digital Data Available
## 14    UCD6705021 2600526                                    No Digital Data Available
## 15    UCD6705022 2600526                                    No Digital Data Available
## 16    UCD7355010 2600526                                    No Digital Data Available
## 17    UCD7455019 2600526                                    No Digital Data Available
## 18    UCD7455021 2600526                                    No Digital Data Available
## 19    UCD8005083 2600526                                    No Digital Data Available
# plot profiles, grouped by mukey
# color horizons with clay content
par(mar=c(0,0,4,0))
groupedProfilePlot(auburn, groups='mukey', group.name.cex=0.65, color='clay', name='hzn_desgn', id.style='side', label='pedon_id', max.depth=100)
# describe IDs
mtext('user pedon ID', side=2, line=-1.5)
mtext('mukey', side=3, line=-1, at = c(0,0), adj = 0)

More examples pending...


This document is based on soilDB version 1.6.9.