Data science, big data, data storage & mining, curation, analytics, in-memory techniques, graph databases

Both Climate Modeling and Earth System Modeling entail petabytes (10^15 bytes), if not exabytes (10^18 bytes), of observational data and sensor network data, as well as the vast amounts of data output by the simulation process itself.

Generally speaking, observational data is best stored locally, near where it was collected, simply because moving data is costly and the 'pride of ownership' factor helps preserve the quality and integrity of such data over the long term. Simulation output data, on the other hand, is best stored near the computing centres at which the simulations are run, although analysts and decision-makers need easy access to such data wherever they may be located.

When useful data is stored in several separate facilities, it needs to be 'federated' and 'harmonized' so as to become accessible and useful. Such cases require access to remotely stored data through global networks, and the security and integrity of such data must be preserved through encryption techniques.
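
As a rough illustration of what 'harmonizing' can involve, the sketch below maps records from two facilities onto one common schema before merging them. The site layouts, the field names (`temp_c`, `tempF`, and so on) and the `Reading` type are hypothetical, chosen only for this example; real archives will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Common (harmonized) schema: UTC timestamp string and temperature in kelvin."""
    station: str
    time_utc: str
    temperature_k: float


def from_site_a(row: dict) -> Reading:
    # Hypothetical site A reports Celsius under the key "temp_c".
    return Reading(row["station"], row["time"], row["temp_c"] + 273.15)


def from_site_b(row: dict) -> Reading:
    # Hypothetical site B reports Fahrenheit under the key "tempF".
    kelvin = (row["tempF"] - 32.0) * 5.0 / 9.0 + 273.15
    return Reading(row["id"], row["timestamp"], kelvin)


def federate(site_a_rows: list, site_b_rows: list) -> list:
    """Convert both sources to the common schema and merge them in time order."""
    merged = [from_site_a(r) for r in site_a_rows]
    merged += [from_site_b(r) for r in site_b_rows]
    # ISO 8601 UTC strings in the same format sort chronologically as text.
    return sorted(merged, key=lambda r: (r.time_utc, r.station))
```

Once every record shares the same units and field names, an analyst can query the merged set without knowing which facility holds which record.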

Data is only useful if it is of known quality and the circumstances of its collection are well understood. Such 'metadata' is itself another level of data that needs to be carefully curated and kept safe, just like the original source data.
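
A minimal sketch of such a metadata record follows, assuming a simple dictionary layout. The field names and the helpers `make_metadata` and `verify` are illustrative only, not an established standard. Storing a SHA-256 checksum alongside the descriptive fields lets a later reader confirm that the underlying data has not changed since the record was made.

```python
import hashlib


def make_metadata(data: bytes, *, source: str, collected_at: str,
                  instrument: str, units: str) -> dict:
    """Describe a data file: where it came from, plus size and checksum."""
    return {
        "source": source,              # who collected the data
        "collected_at": collected_at,  # when it was collected
        "instrument": instrument,      # how it was collected
        "units": units,                # how to interpret the values
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify(data: bytes, metadata: dict) -> bool:
    """Return True if the data still matches the recorded size and checksum."""
    return (len(data) == metadata["size_bytes"]
            and hashlib.sha256(data).hexdigest() == metadata["sha256"])
```

For example, calling `verify` on a copy retrieved from a remote facility returns `False` if even a single byte differs from the data that was originally described.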

