1. What is S4PA?
The Simple, Scalable, Script-Based, Science Processing Archive (S4PA) is a radically simplified data archive architecture for supporting our DAAC users with online access to data. S4PA is already being used operationally and its deployment will be expanded in the months and years to come.
2. How does S4PA store data?
S4PA stores data on disk in a hierarchical structure. At the top are dataset groups (sets of closely related datasets). Below those are the datasets themselves. Within each dataset, the data are divided into directories by the begin time of the data files.
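As a rough illustration of the hierarchy described above, the following Python sketch builds a directory path from a dataset group, a dataset, and a file's begin time. The root path, the group and dataset names, and the year/day-of-year layout are all assumptions made for the example. The FAQ only states that directories are divided by begin time, not how finely.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archive_dir(root, group, dataset, begin_time):
    """Return the directory that would hold a data file.

    Layout: <root>/<dataset group>/<dataset>/<year>/<day of year>.
    The year/day-of-year split is an assumed granularity, not a
    documented S4PA convention.
    """
    return (
        PurePosixPath(root)
        / group
        / dataset
        / f"{begin_time:%Y}"
        / f"{begin_time:%j}"
    )


# Hypothetical group and dataset names, for illustration only.
path = archive_dir("/archive", "TRMM_L2", "TRMM_2A25", datetime(2007, 3, 1, 12, 30))
print(path)  # /archive/TRMM_L2/TRMM_2A25/2007/060
```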
3. Can S4PA store data in automated tape libraries?
S4PA is designed to work with disks, not tape libraries.
4. How are data backed up in S4PA?
Data are saved to tape at regular intervals, using standard system backup procedures.
5. What about metadata in S4PA?
S4PA stores metadata in separate XML files alongside each data file.
6. What metadata are required for S4PA?
At a minimum, the XML metadata file needs a short name identifying the dataset, the version of the dataset, and the start and stop (aka begin and end) date/time of the data file. For data that need to be searchable by spatial area, some geographic information (polygons or bounding boxes) is also necessary.
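To make the minimum concrete, here is a small Python sketch that builds a metadata document with the fields listed above and checks that none of the required ones is missing. The element names (GranuleMetadata, ShortName, Version, BeginDateTime, EndDateTime, BoundingBox) are hypothetical placeholders, not the actual S4PA schema, and the dataset name is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names standing in for the required fields.
REQUIRED = ("ShortName", "Version", "BeginDateTime", "EndDateTime")


def build_metadata(short_name, version, begin, end, bbox=None):
    """Build a minimal metadata element; bbox is (west, south, east, north)."""
    root = ET.Element("GranuleMetadata")
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Version").text = version
    ET.SubElement(root, "BeginDateTime").text = begin
    ET.SubElement(root, "EndDateTime").text = end
    if bbox is not None:
        # Spatial information is only needed for spatially searchable data.
        box = ET.SubElement(root, "BoundingBox")
        for tag, value in zip(("West", "South", "East", "North"), bbox):
            ET.SubElement(box, tag).text = str(value)
    return root


def missing_fields(root):
    """Return the required element names that are absent or empty."""
    return [tag for tag in REQUIRED if not (root.findtext(tag) or "").strip()]


meta = build_metadata(
    "TRMM_2A25", "6", "2007-03-01T12:30:00Z", "2007-03-01T14:02:00Z",
    bbox=(-180.0, -38.0, 180.0, 38.0),
)
print(ET.tostring(meta, encoding="unicode"))
print(missing_fields(meta))  # []
```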
7. What about all the other metadata that describes how the data were produced?
Metadata needed for full documentation of the data file and its production should be included within the data file itself, in a form that is at the discretion of the data producer. S4PA does not need this metadata to manage the data.
8. I don't want to create separate XML files with metadata. Can my data still be ingested into S4PA?
Yes, S4PA can include dataset-specific metadata extractors, in order to extract the key metadata from the data files.
9. How are data transferred to an S4PA instance?
The preferred method is the Product Delivery Record method, aka the SIPS interface. However, S4PA can also poll remote directories for refreshed data files.
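The polling alternative mentioned above can be pictured with the following Python sketch, which compares a directory listing against the set of files already seen. It is an assumption-laden illustration only: it polls a local temporary directory rather than a remote one, detects new files but not files that changed in place, and the function and file names are hypothetical rather than part of S4PA.

```python
import os
import tempfile


def poll_new_files(directory, seen):
    """Return names in `directory` not seen before, and record them in `seen`."""
    current = set(os.listdir(directory))
    new_files = sorted(current - seen)
    seen.update(current)
    return new_files


# Simulate two polling cycles against a stand-in "remote" directory.
with tempfile.TemporaryDirectory() as incoming:
    seen = set()
    open(os.path.join(incoming, "granule_001.hdf"), "w").close()
    first = poll_new_files(incoming, seen)
    open(os.path.join(incoming, "granule_002.hdf"), "w").close()
    second = poll_new_files(incoming, seen)

print(first)   # ['granule_001.hdf']
print(second)  # ['granule_002.hdf']
```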
10. How do users search S4PA data?
The GES DISC will provide search interfaces for the data. The current one is known as the Web Hierarchical Ordering Mechanism (WHOM). A new interface, in development, is known as Mirador. In addition, S4PA exports metadata to the EOS ClearingHOuse (ECHO), making the data visible through ECHO clients such as Reverb.
11. How do users order data from S4PA?
In S4PA, orders are passé. The data are available online, all the time, so they can be downloaded at any time. No order is necessary. However, some interfaces, such as WHOM and Mirador, do offer the construction of a batch download script to simplify the download of large numbers of data files.
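A batch download script of the kind mentioned above might be generated along these lines. This Python sketch is not how WHOM or Mirador build their scripts. The use of wget, the function name, and the example URLs are assumptions made for illustration.

```python
import shlex


def batch_download_script(urls):
    """Return the text of a shell script that fetches each URL with wget."""
    lines = ["#!/bin/sh"]
    # shlex.quote guards against shell metacharacters in a URL.
    lines += [f"wget {shlex.quote(url)}" for url in urls]
    return "\n".join(lines) + "\n"


# Placeholder URLs; a real search interface would supply the file locations.
urls = [
    "https://example.com/data/granule_001.hdf",
    "https://example.com/data/granule_002.hdf",
]
script = batch_download_script(urls)
print(script)
```

Saving the output to a file and running it with sh downloads the files one after another, so no order has to be placed.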
12. Can I get data by subscription?
Currently, you can sign up to get a notification when new data of a specified type are added to the archive. In the future, FTP and SFTP push subscriptions will be added.
13. What other services will be available on the data?
Services that are currently provided on-the-fly, such as Giovanni analysis and on-the-fly subsetting, will be available for data hosted in an S4PA instance.
14. How much data can S4PA hold? Is it scalable?
Though S4PA presents a lightweight management system for data, its holding capacity is limited only by that of the disk systems it manages. Likewise, its ingest performance is close to the disk transfer rates, unless significant processing is needed for ingest (e.g. compression or decompression).
It is also easy to stand up multiple S4PA instances by partitioning data holdings by dataset. (The TRMM archive is currently spread over two S4PA instances.) Because metadata from every instance are published to a common clearinghouse, users still get almost "transparent" access across the S4PA instances.
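Partitioning by dataset amounts to a lookup from dataset to instance, as in the Python sketch below. The instance names and dataset names are hypothetical, and the table stands in for whatever configuration a real deployment would use.

```python
# Hypothetical assignment of datasets to two S4PA instances.
INSTANCE_BY_DATASET = {
    "TRMM_2A25": "s4pa-instance-1",
    "TRMM_3B42": "s4pa-instance-2",
}


def instance_for(dataset):
    """Return the instance holding `dataset`, or raise if it is unassigned."""
    try:
        return INSTANCE_BY_DATASET[dataset]
    except KeyError:
        raise ValueError(f"no S4PA instance configured for {dataset!r}") from None


print(instance_for("TRMM_3B42"))  # s4pa-instance-2
```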