Data Cookbook

Obtaining Data

For most satellite Level 3 or modeled monthly products at the Goddard Earth Sciences Data and Information Services Center (GES DISC), one file contains one month of data. Similarly, a daily file contains one day of data. Data files downloaded through the current data services at GES DISC keep the original file structure. Thus, downloading 10 years of monthly data means downloading 120 files. Sometimes, however, a user would like to have the data in a single file for simpler data processing. This data recipe shows how to create a single netCDF file by concatenating multiple files along the time dimension with NCO commands.

This data recipe shows an example of downloading data files from an HTTP service at GES DISC with the GNU wget command. GNU wget is free software for non-interactive downloading of files from the Web. It is a Unix-based command-line tool that is also available for other operating systems, such as Linux, Windows, and Mac OS X.

The NetCDF (Network Common Data Form) data format is widely used in Earth science modeling and application communities. Data in NetCDF format can be obtained through the Mirador service for some datasets at GES DISC, such as products from the Tropical Rainfall Measuring Mission (TRMM), Atmospheric Infrared Sounder (AIRS), and Global and North American Land Data Assimilation Systems (GLDAS and NLDAS). This recipe provides an example demonstrating how to download data in NetCDF format through the Mirador data search and order service.

A very common Earth science data request is to acquire data for a certain period of each year, over a period of several years. This request could mean data for successive months of April, or data for successive seasons, such as winter. Mirador can provide this type of data with its “seasonal search” feature. This data recipe provides an example of how to get data files for a specific time period of interest for all available years in a single order with Mirador.
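The idea behind a seasonal request can be sketched in a few lines of Python. This is only an illustration of the date logic, not how Mirador implements its seasonal search; the function name `seasonal_ranges` and its month-based parameters are hypothetical. It returns one start/end date pair per year, and lets a season such as December through February run into the following calendar year.

```python
import calendar
from datetime import date


def seasonal_ranges(first_year, last_year, start_month, end_month):
    """Return one (start, end) date pair per year for a recurring period.

    The period runs from the first day of start_month to the last day of
    end_month. If end_month is earlier in the calendar than start_month
    (for example December through February), the end date falls in the
    following year.
    """
    ranges = []
    for year in range(first_year, last_year + 1):
        end_year = year + 1 if end_month < start_month else year
        # monthrange returns (weekday of first day, number of days in month)
        last_day = calendar.monthrange(end_year, end_month)[1]
        ranges.append((date(year, start_month, 1),
                       date(end_year, end_month, last_day)))
    return ranges


# Successive Aprils, and successive winters (December through February)
aprils = seasonal_ranges(2001, 2003, 4, 4)
winters = seasonal_ranges(2001, 2003, 12, 2)
```

Each pair could then be used as the time range of one search, which is the repetitive work a seasonal search does in a single order.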
 

This data recipe provides a walk-through example of how to find and download all of the data at the GES DISC for a hurricane event by using the data search engine, and how to visualize the data online by using Giovanni, or offline with other tools. The recipe starts with a list of data products at the GES DISC that may be useful for conducting a hurricane case study.

  1. Product list
  2. Search and download data via Mirador (Example)
  3. Search and download data via Simple Subset Wizard (SSW) (Example)
  4. View TRMM Level 3 daily data in Giovanni (Example)
  5. View TRMM Level 2 precipitation data with Panoply (Example)

The NetCDF (Network Common Data Form) data format is commonly used in many Earth science modeling and application communities. Obtaining data from the GES DISC in NetCDF format may be possible using the Simple Subset Wizard (SSW) service. This recipe provides an example demonstrating how to download data in the NetCDF format through the SSW service.

In general, a data file contains more than one data variable.  If a researcher would like to conduct a regional study for a long period of time, downloading the original data files with all of their data variables may be time-consuming and also require a large volume of data storage on a user’s local system.   The Simple Subset Wizard (SSW) service can select specific data variables in a data file and also create spatial (regional) subsets from the original data coverage.  This recipe will show you how to download a variable for a region and time period of interest through the SSW.

The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) provides a number of data acquisition services, such as Mirador and Simple Subset Wizard (SSW), which allow users to obtain data in netCDF format regardless of the original archived data format. However, one limitation of these services is that a user will frequently acquire a large number of files. For example, daily data products are usually stored as one file per day, and monthly data products are stored as one file per month. In many cases, a user would prefer to have time-series data encapsulated in a single netCDF file. This recipe gives an example demonstrating how to obtain a time series in a single netCDF file through the GrADS Data Server (GDS) by using a netCDF Operator (NCO) command, ncks.

The GrADS Data Server (GDS) allows users to obtain a time series of a region of interest from GDS-enabled data sets. This recipe gives an example of how to get data in ASCII comma-delimited format for a parameter of interest and a user-specified temporal/spatial coverage with a simple command URL.

This data recipe demonstrates how to extract a subset of variables for part of an OCO-2 orbit using OPeNDAP. Downloading one full orbit of OCO-2 Level 1B data from the GES DISC (Goddard Earth Sciences Data and Information Services Center) can take more than 10 minutes even over a fast internet connection. However, if a user is only interested in one small region, it is not necessary to download the entire orbit. The procedure below describes how to identify granules in a region of interest and read just the Longitude and Latitude from the file to find the indices for the geographic region of interest. The procedure also shows how the indices can be used to read spatial subsets into Python over the internet or to create a URL that will download a NetCDF-4 file containing only the data in the selected region of interest. A small spatial and variable subset of OCO-2 radiances can take seconds to download rather than minutes.
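The index-based subsetting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recipe's own code: the coordinates are made up and treated as one-dimensional along the track, the host and the variable names `latitude`, `longitude`, and `radiance` are hypothetical, and the `.nc4` suffix assumes a server that returns NetCDF-4 for that extension. The bracketed index ranges follow the OPeNDAP (DAP2) constraint expression convention, in which both ends of the range are included.

```python
def index_range_in_box(lats, lons, south, north, west, east):
    """Return (first, last) indices of the points inside the box, or None."""
    inside = [i for i, (lat, lon) in enumerate(zip(lats, lons))
              if south <= lat <= north and west <= lon <= east]
    if not inside:
        return None
    return inside[0], inside[-1]


def subset_url(base_url, variables, first, last):
    """Build a URL asking for indices first..last (inclusive) of each variable.

    The ".nc4" suffix is an assumption about the server's output formats.
    """
    constraint = ",".join(f"{name}[{first}:{last}]" for name in variables)
    return f"{base_url}.nc4?{constraint}"


# Made-up along-track coordinates standing in for the values read from a granule
lats = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
lons = [-100.0, -101.0, -102.0, -103.0, -104.0, -105.0]

bounds = index_range_in_box(lats, lons,
                            south=25.0, north=45.0, west=-110.0, east=-95.0)
url = subset_url("https://example.invalid/opendap/granule.h5",  # hypothetical
                 ["latitude", "longitude", "radiance"], *bounds)
```

Variables with more than one dimension need an index range for every dimension, so a real request for radiances would carry additional bracketed ranges.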

The NetCDF (Network Common Data Form) data format is widely used in Earth science modeling and application communities. Data served through OPeNDAP at GES DISC can be downloaded as NetCDF files regardless of the original data format. This recipe is an example of how to download data in NetCDF format through an OPeNDAP service.

Viewing Data

McIDAS-V (Man computer Interactive Data Access System – the Fifth Generation) is well suited to displaying weather satellite (including hyperspectral) and other geophysical data in two and three dimensions, and to analyzing and manipulating the data with its mathematical functions. This data recipe provides an example of reading 3-dimensional AIRS Level 3 and Level 2 data into McIDAS-V, for quick display of an image map, vertical profile, and vertical cross section. Other data available from the GES DISC can be visualized with McIDAS-V using a similar approach.

Many new data products at the Goddard Earth Sciences Data and Information Services Center (GES DISC) are in HDF-5 format, as it has advantages over the HDF-4 format for storing data, such as grouping variables, bigger file sizes, faster data extraction, etc.  The Grid Analysis and Display System (GrADS) supports many data file formats, including binary, GRIB, netCDF, HDF (version 4 and 5), and BUFR (for station data).  This recipe is an example demonstrating how to read and display data in HDF-5 format with GrADS.

R is a software package for statistical computing and graphics. Data in netCDF format can be imported into R by using netCDF-enabled R packages. This recipe shows how to import a gridded data file by using either of two R packages, RNetCDF and ncdf.

This recipe shows how to read data from the Global Precipitation Measurement (GPM) mission's IMERG dataset using Python.

If you are a user of GrADS and wish to subset gridded data for an irregular area, such as a country, a state, or a watershed, it can be done in a two-step process. The first step is to convert the irregular area into a mask, and the second step is to apply this mask to the gridded data in GrADS. If you have a shapefile, you may use software such as GDAL to make the mask.
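The second step, applying a mask to gridded data, can be illustrated with a short Python sketch. This is not the GrADS procedure the recipe uses; it only shows the idea on a small made-up grid, assuming the mask has already been produced on the same grid with 1 inside the area and 0 outside. The function names are hypothetical.

```python
def apply_mask(grid, mask, fill=None):
    """Keep grid values where the mask is nonzero; replace the rest with fill."""
    if len(grid) != len(mask) or any(len(g) != len(m)
                                     for g, m in zip(grid, mask)):
        raise ValueError("grid and mask must have the same shape")
    return [[value if flag else fill for value, flag in zip(g_row, m_row)]
            for g_row, m_row in zip(grid, mask)]


def masked_mean(grid, mask):
    """Average the grid values that fall inside the masked area."""
    values = [value
              for g_row, m_row in zip(grid, mask)
              for value, flag in zip(g_row, m_row) if flag]
    return sum(values) / len(values) if values else None


# Made-up 2 x 3 grid and a mask marking three cells as inside the area
grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
mask = [[0, 1, 1],
        [0, 0, 1]]

masked = apply_mask(grid, mask)
area_mean = masked_mean(grid, mask)
```

The unweighted mean here ignores the fact that grid cells on a latitude-longitude grid cover different areas; an area-weighted average would be needed for a real regional statistic.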

GrADS is able to access data remotely through the GrADS Data Server (GDS, formerly known as GrADS-DODS Server). Performing analyses such as averaging or correlation on the remote data server greatly improves performance, because users only need to download the small volume of result data instead of the entire data set. This recipe gives examples of how to use GrADS scripts to do remote calculations on the data at GES DISC GDS servers. It also describes how to view the results by plotting maps and time series.

Most satellite data, including those served by NASA GES DISC, are time-stamped, meaning each data value is explicitly associated with a particular date and/or time. NASA GES DISC data are provided in two major forms: one form is “Gridded”, and the other is “Swath” data. Both Gridded and Swath data are raster data in ArcGIS terminology. In Gridded data, time is usually defined at hourly, daily, or monthly resolution, and each layer of the Grid has one particular time, such as a particular day. In Swath data, each raster pixel carries a time value, and the time value changes very fast, usually within seconds or a fraction of a second. This recipe deals only with the time dimension in Grid data, describing how to define and enable time and how to visualize temporal grid data in ArcGIS.

The Modern Era Retrospective-analysis for Research and Applications (MERRA) is a NASA atmospheric reanalysis for the satellite era. It focuses on historical analyses of the hydrological cycle on a broad range of weather and climate time scales. The data products contain many variables relevant to Geographic Information Systems (GIS), including land surface variables such as moisture, precipitation, runoff, and energy fluxes, and are thus useful to a wide range of GIS applications. However, the HDF-EOS formatted MERRA data will have incorrect geolocation information when directly added into ArcGIS, due to MERRA’s spatial gridding and its HDF-EOS file writing convention. This recipe shows how to correctly import MERRA data into ArcGIS.

Some satellite data, including data served by the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC), include a vertical spatial dimension in addition to the two horizontal spatial dimensions. Although the vertical dimension commonly represents the height (elevation, altitude) or depth, it may also be expressed in other forms, most frequently in pressure units for atmospheric data. Such data sets are 3-dimensional (3D) raster data. When importing such 3D data into ArcGIS, one can display and visualize the data at different vertical levels.

Integrated Multi-satellite Retrievals for Global Precipitation Measurement (IMERG) data products provide global precipitation data at high spatial and temporal resolution: 0.1 degree, every half hour. The data are archived in native HDF5 format but are available in several other formats through GES DISC services, including netCDF format. This recipe is intended for users who prefer downloading the IMERG data in HDF5 format from GES DISC FTP sites. Users who get the data in netCDF format should refer to the How to Import Gridded Data in NetCDF Format into ArcGIS recipe.

 

Integrated Multi-satellite Retrievals for Global Precipitation Measurement (IMERG) data products provide global precipitation data at high spatial and temporal resolution: 0.1 degree, every half hour. The data are archived in native HDF5 format but are available in several other formats through GES DISC services, including netCDF format. This recipe is intended for users who prefer running an ArcPy script on IMERG data downloaded in HDF5 format from GES DISC FTP sites. Users who get the data in netCDF format should refer to the How to Import Gridded Data in NetCDF Format into ArcGIS recipe.

Satellite observation and climate model data are becoming more and more widely used in GIS. ArcGIS is one of the dominant software packages in the GIS community. NetCDF is not a traditional GIS format, although it is gaining popularity in the community. This recipe shows how to import satellite swath data (Level 1 or Level 2 in NASA terminology) in NetCDF format into ArcGIS as point features.

Satellite observation and climate model data are becoming more and more widely used in GIS. ArcGIS is one of the dominant software packages in the GIS community. NetCDF is not a traditional GIS format, although it is gaining popularity in the community. This recipe shows how to import a gridded model or satellite (Level 3 or Level 4) data file in NetCDF format into ArcGIS.

Panoply is a data viewer that quickly displays geo-referenced arrays in netCDF, HDF, and GRIB formats. This recipe shows the steps that will create a wind vector plot in Panoply.

Panoply is a data viewer that quickly displays geo-referenced arrays in netCDF, HDF, and GRIB data formats and exports images in KMZ for Google Earth. This recipe gives an example describing how to export an image in KMZ and display the image in Google Earth.
 

Panoply is a data viewer that displays geo-referenced arrays in Network Common Data Form (NetCDF), Hierarchical Data Format (HDF), and GRIdded Binary (GRIB) formats. This recipe is an example of accessing data directly through OPeNDAP with Panoply.

Panoply is a data viewer that quickly displays geo-referenced arrays in NetCDF, HDF, and GRIB formats. Panoply can combine two arrays using a number of algorithms to make a single plot, such as combine, difference, average, etc. This recipe shows the steps for making a combined plot with the same variable from two files.

Panoply is a data viewer that displays geo-referenced arrays in NetCDF, HDF and GRIB formats. This recipe provides a tutorial on how to view data quickly with Panoply.

This tutorial demonstrates how to plot GPM IMERG (Global Precipitation Measurement Integrated Multi-Satellite Retrievals for GPM) data using NASA's Giovanni data visualization website.
