Data Storage, Data Lakes, Mining, Curation, Analytics, Memory

Both Climate Modeling and Earth System Modeling entail petabytes (~10^15 bytes), if not exabytes (~10^18 bytes), of observational and sensor network data, as well as the vast amounts of data output by the simulation process itself.

Generally speaking, observational data is best stored locally, close to where it was collected - simply because moving data has a cost, and the 'pride of ownership' factor helps preserve the quality and integrity of such data over the long term. Simulation output data, on the other hand, is best stored close to the computing centres at which the simulations are run.
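To give a rough sense of what 'moving data has a cost' means in practice, the short Python sketch below estimates how long it would take to move one petabyte over a network link. The 10 Gbit/s link speed is an assumed figure chosen purely for illustration, and the calculation is idealised: it ignores protocol overhead, contention and storage bottlenecks, so real transfers would take longer.

```python
def transfer_days(num_bytes: float, link_bits_per_second: float) -> float:
    """Idealised transfer time in days (no overhead, link fully utilised)."""
    seconds = num_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15        # bytes
TEN_GBIT = 10 * 10**9    # bits per second; an assumed link speed

days = transfer_days(PETABYTE, TEN_GBIT)
print(f"Moving 1 PB at 10 Gbit/s takes about {days:.1f} days")
```

Even under these generous assumptions a single petabyte takes more than a week to move, which is one reason for keeping data close to where it is produced or used.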

When useful data is stored in several separate facilities, it needs to be 'federated' and 'harmonized' so as to become accessible and useful. Gaining access to remotely stored data through global networks is required in all such cases.

Data is only useful if it is of known quality and the circumstances of its collection are well understood. Such 'meta-data' is itself another level of data that needs to be carefully curated and kept along with the original underlying source data.
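As a minimal sketch of what keeping meta-data 'along with' the source data might look like, the Python example below builds a small descriptive record for a block of observations and stores a checksum so that later corruption of the data can be detected. The field names (station_id, instrument, and so on) and the sample values are hypothetical, not taken from any particular metadata standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Provenance:
    """Hypothetical meta-data record kept beside the raw observations."""
    station_id: str
    instrument: str
    collected_utc: str
    units: str
    quality_flag: str
    sha256: str  # checksum of the raw data this record describes


def describe(payload: bytes, **fields: str) -> Provenance:
    """Build a record whose checksum ties it to one exact payload."""
    return Provenance(sha256=hashlib.sha256(payload).hexdigest(), **fields)


def verify(payload: bytes, record: Provenance) -> bool:
    """Return True if the payload still matches the recorded checksum."""
    return hashlib.sha256(payload).hexdigest() == record.sha256


# Illustrative values only.
readings = b"2020-01-01T00:00Z,271.4\n2020-01-01T01:00Z,271.1\n"
record = describe(
    readings,
    station_id="EXAMPLE-001",
    instrument="thermistor",
    collected_utc="2020-01-01",
    units="K",
    quality_flag="unchecked",
)

# The record could be written out as a JSON 'sidecar' next to the data file.
sidecar = json.dumps(asdict(record), indent=2)
print(sidecar)
print("intact:", verify(readings, record))
```

Storing the checksum inside the record means the meta-data and the data can be curated as a pair: if either is altered independently, the mismatch becomes visible.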

This section covers many aspects of the entire data life cycle process: 

Big data, big dreams

Data-intensive system evolution

The evolution of the data scientist

The CAP Theorem's growing impact 

Software as a service for data scientists

New techniques turbo-charge data mining

The evolving art (and business) of data curation

Global datasphere to hit 175 zettabytes by 2025

Expert panel: what’s around the bend for big data?

What exactly is big data - if it's neither big nor data?

How to move 80 petabytes of data without down time

The new era of computing: an interview with 'Dr. Data'

What CIOs and CTOs need to know about big data and data intensive computing 

Big data analytics:

Pulling insights from unstructured data

Rating the advanced analytics vendors

Software engineering for data analytics

Five steps to de-mystify big data analytics

Mission analytics: data-driven decision making in government

From data to knowledge: machine-learning with real-time and streaming applications

Big data issues in science:

EarthServer

Why science really needs big data

Big data revolution in astrophysics

A geodata fabric for the 21st Century

Next generation team science platform

Supercomputer sails through world history

DOE exascale roadmap highlights big data

DOE Focuses on scientific data integration

Astronomers leverage 'unprecedented' data set

Big data in space: martian computational archeology  

Tool enables scientists to uncover patterns in vast data sets

Los Alamos releases file index product to software community

As supercomputers approach exascale, experts wrestle with big data

From microprocessors to nanostores: rethinking data-centric systems

To know, but not understand: David Weinberger on science and big data

Understanding data intensive analysis on large-scale HPC compute systems

Codesign challenges for exascale systems: performance, power and reliability

Storage Systems, Data Lakes & Data Warehousing:

Storage at Exascale

SDSC cloud storage services

Data storage in DNA becomes a reality

Data storage using individual molecules

High performance scalable unified storage

Optimize storage placement in sensor networks

Is GOLAP the next wave for big data warehousing?

Availability in globally distributed storage systems

Stepping up to the life science storage system challenge

Software-defined storage takes off as big data gets bigger

Data lakes and overcoming the waste of 'data janitor' duties

Big data, big demand: navigating the cloud storage landscape

Data warehouse modernization in the age of big data analytics

To centralize or not to centralize your data - that is the question

The top 5 reasons to use multi-tier storage for managing scientific data

Storage system for 'big data' dramatically speeds access to information

Phase change memory-based Moneta system points to the future of computer storage

Vendor specific storage:

The future of storage: hardware

HP: Exascale Data Center

Multiparadigm Data Storage for Enterprise Applications

Fujitsu Lets Big Data Cloud Flag Fly     

Fujitsu Develops World's First Cloud Platform to Leverage Big Data

Samsung now mass producing industry's first 2nd generation, 10-nanometer class DRAM

IBM Scientists Demonstrate Phase-Change Memory Breakthrough

IBM storage breakthrough paves way for 330TB tape cartridges
    
IBM big data VP surveys landscape

IBM announces 3 bits/cell PCM

IBM Design Wins the Storage Challenge at SC10

IBM Demos Record-Breaking Parallel File System Performance

Parallel File System OrangeFS Starts to Build a Following

IBM Announces HPC Storage Solution for Streaming Data

The Complexity of VMware storage management

ArangoDB reaping the fruits of its multi-modal labor

Forrester reshuffles the deck on BI and analytic tools

MINE: Maximal Information-based Nonparametric Exploration

MINE: Detecting novel associations in large data sets

Big Data file formats demystified

Lustre:

The State of the Lustre Community

Why Lustre Is Set to Excel in Exascale 

Xyratex announces acquisition of Oracle's Lustre assets

Hadoop:

Hate Hadoop? Then you are doing it wrong

Hadoop: Big Data, Big Analytics, Big Insights

Large-scale seismic signal processing with Hadoop

Why Hadoop isn't the Big Data solution that you think it is

Spark just passed Hadoop in popularity on the web - here's why

Database choices:

Different databases for different strokes

Array databases: the next big thing in data analytics?

Self-driving databases are coming: what next for DBAs?

The polyglot problem: solving the paradox of the 'right' database

SQL vs non-SQL:

Crowded NoSQL wave shows abundant options

How SQL++ makes JSON more queryable

The new math driving NoSQL analytics

Graph Databases:

5 factors driving the graph database explosion

A look at the Graph Database landscape

AWS unveils 'Neptune' graph database

In-Memory Computing:

In-memory computing is the key to real-time analytics

Using in-memory data grids for global data integration

In-memory boosts Oracle OLTP by 2X, analytics by 1000X

Using an in-memory data grid for near real-time data analysis 

Memory Technologies:

Future memory technology

New angle for optical memories

Storage approach mimics DNA in fossils

Hybrid memory cube angles for exascale

The switch that could double USB memory

Making steps toward improved data storage

Towards data storage at the single molecule level

Patent granted for super-fast MRAM data storage

UK researchers develop super-fast memory chip

New technology of ultrahigh density optical storage 

New 3D chip combines computing and data storage

Tantalizing discovery may boost memory technology

Scientists stored an Amazon gift card on some DNA

Solid state quantum memories set endurance records

Write speeds for phase-change memory reach record limits

T-rays will 'speed up' computer memory by a factor of 1,000

DNA storage crams 700 terabytes of data into a single gram

Rice, UCLA slash energy needs for next-generation memory

A single-atom magnet breaks new ground for future data storage

5D 'superman memory crystal' heralds unlimited lifetime data storage

Industry leaders join forces to promote new high performance interconnect 

Next generation photonic memory devices are 'light-written', ultrafast and energy efficient

Room-temperature operation of low-voltage, non-volatile, compound semiconductor memory cells    

Battery and memory device in one

New computer memory can hold data 20 years without power