Title           : Description of L4C Nature Run v4.1, Vv3040, and Associated Subsets
Author          : Lucas A. Jones, lucas@ntsg.umt.edu, Univ. of Montana
Date            : 01/24/2018
Project         : NASA IDS Project 2018, Nick Parazoo (PI), Nicholas.C.Parazoo@jpl.nasa.gov, JPL/Caltech
Temporal Extent : L4C Nature Run, daily Jan 1, 2000 - Sept 30, 2017
                  L4C Vv3040, daily Mar 31, 2015 - Dec 31, 2017
Spatial Format  : Global NSIDC EASE Grid (v2) 9-km and 9-km summarized by Plant Functional Type (L4C processing done at 1-km and aggregated to 9-km). Subsets cover 15 Alaska study locations. All datasets are provided in HDF-5 file format.

The dataset archives are provided with MD5 checksums so that users can verify that delivery has not corrupted the contents.

Examples : The MATLAB script 'plotSubset.m' demonstrates loading and plotting timeseries subsets for the Alaska study sites.

Notes : This distribution consists of two runs of the L4C algorithm, L4C Nature Run v4.1 (Vv3040) and L4C Ops Vv3040, which differ in input forcing, initialization, and temporal extent.

The L4C NRv4.1 dataset uses MERRA, L4SM Nature Run (v4.1, "open loop", i.e. no data assimilation), and MODIS MOD16A2H. This run uses the same model calibration as L4C Ops Vv3040. It was initialized on Jan 1, 2000, using spun-up soil organic carbon computed by cycling the NRv4.1 input data. This dataset was produced at the Univ. of Montana and is available by special request.

The L4C Ops Vv3040 dataset uses GEOS-5 FP, L4SM (with data assimilation), and MODIS MOD16A2H. This run incorporates SMAP information through L4SM data assimilation, which affects the surface soil moisture, root zone soil moisture, and surface soil temperature state variables. This run was initialized on March 31, 2015, using spun-up soil organic carbon computed by cycling the NRv4.1 input data. This dataset is produced operationally at NASA GMAO/GSFC using code developed at the Univ.
of Montana and is publicly available from the National Snow and Ice Data Center (NSIDC; https://nsidc.org/data/SPL4CMDL).

Differences between the two datasets result from either (1) forcing data or (2) initialization, and (to a lesser extent) the cumulative impacts of (1) and (2) over the simulation period. Note: some metadata fields differ among the two runs and subset granules.

Subset granules contain 2-dimensional matrices of time (row) x site (col) data. An additional group ('/SUBSET') contains relevant subset-associated data. The 'year_doy' dataset gives the year and day-of-year for each row of the data matrices. The 'site_name' dataset indicates which column goes with which site. The 'grid_row_col' dataset gives the EASE Grid row and column coordinates used for the timeseries; it can be used to determine whether a site shares a grid cell with spatially adjacent sites, in which case the row and column coordinates are identical. The 'site_latlon' field contains the provided point lat/lon information used to create the subset. The 'site_to_center_disk_km' field gives the spherical-earth distance (Haversine formula) from the site location to the center of the extracted 9-km grid cell. This distance can be greater than the grid cell radius (i.e. 4.5 km) because of North/South distance distortion at high latitudes. The distance can also be considered a possible predictor of spatial representativeness.

See granule- and dataset-level metadata for additional information. Relevant documents, including the ATBD and Product Specification Documents describing the algorithm and metadata fields, respectively, are available from NSIDC.

Known problems : The user might notice a spatial 'stamping' effect caused by differences in the effective spatial resolution of the input datasets, particularly the MERRA and GEOS-5 FP datasets.
Much less obvious is temporal stamping, which arises from the 8-day MOD16A2H FPAR dataset. It is masked by day-to-day variability in incoming radiation and vapor pressure deficit, which also control Gross Primary Productivity.
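
Checksum example : The MD5 checksums mentioned above can be checked with any standard tool. The Python sketch below is one possible way to do it and is not part of the distribution. The function names are illustrative, and the expected checksum is whatever value accompanies the archive being checked.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 with the published value (case and whitespace tolerant)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A call such as matches_checksum(archive_path, published_md5) returns True when the archive arrived intact. Both arguments are placeholders to be filled in by the user.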
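
Subset metadata example : The Python sketch below illustrates how the '/SUBSET' fields described in the Notes might be used once they have been read from a granule. It is not the code used to build the subsets. It assumes a mean earth radius of 6371 km (the value actually used is not stated here), assumes that 'year_doy' supplies separate year and day-of-year values, and uses illustrative function names.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

# Assumption: mean spherical-earth radius; the subset code may use another value.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Spherical-earth (Haversine) distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def shared_cells(site_names, grid_row_col):
    """Map each (row, col) grid cell used by more than one site to the site names in it."""
    cells = defaultdict(list)
    for name, (row, col) in zip(site_names, grid_row_col):
        cells[(int(row), int(col))].append(name)
    return {cell: names for cell, names in cells.items() if len(names) > 1}


def year_doy_to_date(year, doy):
    """Convert a year and 1-based day-of-year to a calendar date."""
    return date(int(year), 1, 1) + timedelta(days=int(doy) - 1)
```

Sites returned by shared_cells have identical timeseries columns because they were extracted from the same 9-km cell. The haversine_km function can be used to recompute the site-to-cell-center distance when the cell center coordinates are known.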