Both Climate Modeling and Earth System Modeling entail petabytes (10^15 bytes), if not exabytes (10^18 bytes), of observational and sensor network data, as well as the vast amounts of data output by the simulation process itself.
Generally speaking, observational data is best stored locally, near where it was collected, simply because moving data has a high cost and because the 'pride of ownership' factor helps preserve the quality and integrity of such data over the long term. Simulation output data, on the other hand, is best stored near the computing centres at which the simulations are run, although analysts and decision-makers need easy access to such data wherever they may be located.
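To give a rough sense of what 'a high cost' means in practice, the sketch below estimates how long it takes to move a dataset over a network link. The function name, the 10 Gbit/s link speed, and the 70% efficiency factor are illustrative assumptions, not figures from any particular facility.

```python
PETABYTE = 10**15  # bytes, decimal definition


def transfer_days(size_bytes, link_gbit_per_s, efficiency=0.7):
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually sustained end to end (protocol overhead, contention, etc.).
    """
    if size_bytes < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, rate > 0, efficiency in (0, 1]")
    # Convert gigabits per second to sustained bytes per second.
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    seconds = size_bytes / bytes_per_second
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    days = transfer_days(PETABYTE, link_gbit_per_s=10)
    print(f"1 PB over a 10 Gbit/s link: about {days:.1f} days")
```

Under these assumptions, a single petabyte occupies a 10 Gbit/s link for roughly two weeks, which is one reason to leave large observational archives where they were collected.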
When useful data happens to be stored in several separate facilities, it needs to be 'federated' and 'harmonized' so as to become accessible and useful. Gaining access to remotely stored data through global networks is required in such cases, and the security and integrity of such data must be preserved through encryption techniques.
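One common way to check that remotely fetched data has not been altered or corrupted in transit is to compare a digest published by the source facility with one computed after the transfer. Strictly speaking this is cryptographic hashing rather than encryption, and it addresses integrity only, not confidentiality. The following is a minimal sketch; the function names and the chunk size are illustrative assumptions.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file's digest matches the published digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

Reading in chunks keeps memory use flat regardless of file size, which matters for the multi-gigabyte files typical of simulation output.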
Data is only useful if it is of known quality and the circumstances of its collection are well understood. This descriptive information, or 'meta-data', is itself another level of data that needs to be carefully curated and kept safe, just like the underlying source data.
This webpage covers most aspects of the entire data life cycle:
Data science:
Big data, big dreams
The world after big data
Big data's dirty little secret
The future of data science
Don't be a big data snooper
Building the big data highway
Data-Intensive system evolution
The evolution of the data scientist
Finding patterns in corrupted data
Is your smartphone spying on you?
A bottom-up approach to data quality
The future of big data is ... JavaScript?
Software as a service for data scientists
New techniques turbo-charge data mining
Big data requires big vision for big change
Reinventing society in the wake of big data
Charting a course out of the big data doldrums
The evolving art (and business) of data curation
Why developers need to think like data scientists
Which programming language is best for big data?
Expert panel: what’s around the bend for big data?
What exactly is big data - if it's neither big nor data?
The new era of computing: an interview with 'Dr. Data'
The third age of data and the unfolding scale-out world
Big data challenges and advanced computing solutions
The origins of 'Big Data': an etymological detective story
Only a fraction of the 160 zettabyte 'datasphere' to be stored
Big data and the creative destruction of today's business models
From microprocessors to nanostores: rethinking data-centric systems
What CIOs and CTOs need to know about big data and data intensive computing
Re-platforming the enterprise, or putting data back at the center of the data center
Moving big data:
Globus moves 1 exabyte
Five reasons for leaving your data where it is
How to move 80 petabytes of data without down time
Sizing up big data:
Global datasphere to hit 175 zettabytes by 2025
Is big data dead? MotherDuck raises $47M to prove it
Data analytics - realtime and in-memory:
Pulling insights from unstructured data
Getting ready for real-time decisioning
Rating the advanced analytics vendors
Software engineering for data analytics
Five steps to de-mystify big data analytics
Analyzing video, the biggest data of them all
What's driving the rise of real-time analytics?
Arrow aims to defrag big in-memory analytics
Algorithms trump big data, apps and analytics
Peering into the crystal ball of advanced analytics
Beyond big: the analytically powered organization
Inflexible data, analytics fueling failures, survey finds
Text analytics and machine learning: a virtuous combination
Operationalizing data-driven decisions: a 5-step methodology
Mission analytics: data-driven decision making in government
5 ways big geospatial data is driving analytics in the real world
Combining HPC and big data analytics on the same infrastructure
Understanding data intensive analysis on large-scale HPC compute systems
Reducing big data using ideas from quantum theory makes it easier to interpret
Transitioning from big data to discovery: data management as a keystone analytics strategy
Big data analytical advances from academia, business are enhancing exploration of our universe
In-memory big data:
In-memory database goes 'translytical'
In-memory computing is the key to real-time analytics
Using in-memory data grids for global data integration
How IMDGs can analyze fast-changing data in real-time
In-memory boosts Oracle OLTP by 2X, analytics by 1000X
Using an in-memory data grid for near real-time data analysis
Cloud-based big data:
What is the CAP theorem?
Big data in the public cloud
Tracking the rapid rise in cloud data
Microsoft scales Azure Data Lake into exascale territory
AI and big data:
Why knowledge graphs are foundational to AI
Five reasons machine learning is moving to the cloud
From data to knowledge: machine-learning with real-time and streaming applications
Big data in science:
EarthServer
Seeing stars through the cloud
How big data advances physics
NOAA launches big data project
Why science really needs big data
Big data revolution in astrophysics
EU project looks to scale Earth data
A geodata fabric for the 21st century
AI called in to tackle LHC data deluge
Next generation team science platform
DOE focuses on scientific data integration
Supercomputer sails through world history
DOE exascale roadmap highlights big data
Astronomers leverage 'unprecedented' data set
JPL, Caltech team up to tackle big data projects
SKA prepares for the ultimate big data challenge
Big data in space: martian computational archeology
Really, really big data - NASA at the forefront of analytics
Tool enables scientists to uncover patterns in vast data sets
Los Alamos releases file index product to software community
15 million Euro boost to manage European astronomy big data
As supercomputers approach exascale, experts wrestle with big data
Networking, data experts design a better portal for scientific discovery
To know, but not understand: David Weinberger on science and big data
Spatial data platform from SpaceCurve for real-time operational intelligence
Codesign challenges for exascale systems: performance, power and reliability
Core scientific dataset model: a lightweight and portable model and file format for multi-dimensional data
Storage Systems, Data Lakes & Data Warehousing:
Storage at exascale
SDSC cloud storage services
Big Data file formats demystified
Data storage using individual molecules
High performance scalable unified storage
Optimize storage placement in sensor networks
ArangoDB reaping the fruits of its multi-modal labor
Is GOLAP the next wave for big data warehousing?
Availability in globally distributed storage systems
Multiparadigm data storage for enterprise applications
Stepping up to the life science storage system challenge
Software-defined storage takes off as big data gets bigger
Data lakes and overcoming the waste of 'data janitor' duties
New data storage is to Dye for - avoids DNA storage pitfalls
Big data, big demand: navigating the cloud storage landscape
Data warehouse modernization in the age of big data analytics
A four-phased approach to building an optimal data warehouse
To centralize or not to centralize your data - that is the question
The top 5 reasons to use multi-tier storage for managing scientific data
Storage system for 'big data' dramatically speeds access to information
Phase change memory-based Moneta system points to the future of computer storage
Vendor specific storage & tools:
HPE:
HP: Exascale Data Center
VMware:
The complexity of VMware storage management
Fujitsu:
Fujitsu lets big data cloud flag fly
Fujitsu develops world's first cloud platform to leverage big data
IBM:
IBM big data VP surveys landscape
IBM design wins the storage challenge at SC10
IBM announces HPC storage solution for streaming data
IBM demos record-breaking parallel file system performance
IBM storage breakthrough paves way for 330TB tape cartridges
Parallel file system OrangeFS starts to build a following
MINE:
MINE: Detecting novel associations in large data sets
MINE: Maximal Information-based Nonparametric Exploration
Presto:
Presto poised for a breakout year as data explosion continues
Forrester:
Forrester reshuffles the deck on BI and analytic tools
Lustre:
The State of the Lustre Community
Why Lustre Is Set to Excel in Exascale
Xyratex announces acquisition of Oracle's Lustre assets
Hadoop:
Can Hadoop be simple again?
Hate Hadoop? Then you are doing it wrong
Hadoop: Big Data, Big Analytics, Big Insights
Large-scale seismic signal processing with Hadoop
Why Hadoop isn't the Big Data solution that you think it is
Spark just passed Hadoop in popularity on the web - here's why
Database choices:
Different databases for different strokes
Oracle aims to break big data silos with SQL
RDBMS remains popular as data sources grow
Array databases: the next big thing in data analytics?
Self-driving databases are coming: what next for DBAs?
The polyglot problem: solving the paradox of the 'right' database
SQL vs non-SQL:
The new math driving NoSQL analytics
How SQL++ makes JSON more queryable
Crowded NoSQL wave shows abundant options
Graph Databases:
Graph databases worth $5.1B by 2026
AWS unveils 'Neptune' graph database
A look at the graph database landscape
Azure joins the Graph500 with Top20 showing
5 factors driving the graph database explosion
Graph databases gaining enterprise ready features
Why young developers don't get knowledge graphs
DIVE: a graph-based visual-analytics framework for big data
How mathematicians use homology to make sense of topology
KAIST introduces T-GPS, a tool for processing a trillion-edge graph on one computer
Neo4j delivers graph database hardened container in collaboration with DoD Platform One
Graph visualization:
Giga graph cities: their buckets, buildings, waves and fragments
Evaluating representation learning and graph layout methods for visualization
Graph maths:
Mathematicians answer old question about graphs
Database virtualization:
Is now the time for database virtualization?