Data Storage, Data Lakes, Mining, Curation, Analytics, Memory

Both Climate Modeling and Earth System Modeling involve petabytes (~10^15 bytes), if not exabytes (~10^18 bytes), of observational and sensor network data, as well as the vast amounts of data output by the simulations themselves.

Generally speaking, observational data is best stored close to where it was collected, both because moving data has a cost and because the 'pride of ownership' factor helps preserve the quality and integrity of the data over the long term. Simulation output, on the other hand, is best stored near the computing centres where the simulations are run.
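To give a feel for the cost of moving data at this scale, the following minimal sketch estimates how long a bulk transfer would take. The 10 Gbit/s link speed and the `efficiency` parameter are illustrative assumptions, not figures from any particular facility, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes
EXABYTE = 10**18   # bytes


def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually achieved (1.0 means the link is fully and exclusively used).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    link = 10e9  # assumed dedicated 10 Gbit/s link
    print(f"1 PB: {transfer_days(PETABYTE, link):.2f} days")
    print(f"1 EB: {transfer_days(EXABYTE, link) / 365:.1f} years")
```

Even under the optimistic assumption of a fully used, dedicated link, one petabyte takes more than nine days to move and one exabyte takes roughly a quarter of a century, which is why keeping data close to where it is produced or processed is usually preferable.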

When useful data is held in several separate facilities, it needs to be 'federated' and 'harmonized' so that it becomes accessible and usable as a whole. In all such cases, access to the remotely stored data over global networks is required.

Data is only useful if it is of known quality and the circumstances of its collection are well understood. This descriptive information, the 'meta-data', is itself another level of data that needs to be carefully curated and kept together with the underlying source data.
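One simple way to keep meta-data together with its source data is a small descriptive record that travels with each data file. The sketch below is only an illustration under stated assumptions: the field names (`station_id`, `instrument`, `collected_utc`, `units`, `quality_flag`) and the example values are hypothetical and do not follow any particular community standard. A checksum ties the record to the exact bytes it describes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ObservationMetadata:
    """Hypothetical meta-data record kept alongside one data file."""
    station_id: str      # where the data was collected
    instrument: str      # what collected it
    collected_utc: str   # when it was collected (ISO 8601 text)
    units: str           # physical units of the values
    quality_flag: str    # e.g. "raw", "checked", "suspect"
    sha256: str          # checksum of the data the record describes


def describe(data: bytes, **fields: str) -> ObservationMetadata:
    """Build a record whose checksum is computed from the data itself."""
    return ObservationMetadata(sha256=hashlib.sha256(data).hexdigest(), **fields)


def matches(data: bytes, meta: ObservationMetadata) -> bool:
    """Return True if the data is byte-for-byte what the record describes."""
    return hashlib.sha256(data).hexdigest() == meta.sha256


if __name__ == "__main__":
    data = b"12.4,12.6,12.9\n"  # made-up sample values
    meta = describe(
        data,
        station_id="EXAMPLE-001",
        instrument="thermometer",
        collected_utc="2020-01-01T00:00:00Z",
        units="degC",
        quality_flag="raw",
    )
    # The record can be written next to the data as a JSON sidecar file.
    print(json.dumps(asdict(meta), indent=2))
    print(matches(data, meta))
```

Storing the checksum in the record means that a later reader can detect whether the data has been altered or corrupted since the meta-data was written.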

This section covers many aspects of the entire data life cycle:

