White Paper Title: 
Advances in Data Sharing Related to the Ridge 2000 Program

Vicki L. Ferrini, Suzanne M. Carbotte, Suzanne O’Hara, Robert Arko, William Ryan, Kerstin Lehnert

A principal goal of the Ridge 2000 Program is to develop focused, quantitative, whole-system models of Oceanic Spreading Center processes through coordinated, integrated, and interdisciplinary experiments at a small number of sites. Pivotal to achieving this goal is an emphasis on interdisciplinary collaboration through the timely dissemination of data and results. The Ridge 2000 Data Management Office (DMO) was established to assist the science community in meeting its data sharing requirements by (1) establishing the infrastructure to document field programs and provide access to field and derived data sets, and (2) developing tools and services to facilitate the discovery, analysis, and integration of data funded by and related to the Ridge 2000 Program.

The Ridge 2000 Data Portal (www.marine-geo.org/portals/ridge2000) was established as part of the integrated Marine Geoscience Data System (MGDS, www.marine-geo.org). At its core, the Ridge 2000 Data Portal includes an inventory of field programs and sampling activities, and provides access to raw and derived data sets held within the data system and hosted at specialized disciplinary data systems (e.g. PetDB, www.petdb.org; GenBank, www.ncbi.nlm.nih.gov; IRIS, www.iris.edu; NGDC, www.ngdc.noaa.gov). The data system was designed to fully document field program activities so that scientists not in the field can discover relevant information about data acquisition activities. The system links information about sampling and surveying activities to the resulting data, provides basic provenance information linking field (raw) data with derived (processed) data products, and links data to references and NSF award information. Data are discoverable through several web applications, including text-based and map-based search interfaces, OGC-compliant web services, and Google Earth services. As the system has evolved and more data have become available, functionality has been added to accommodate new data types, leverage emerging technology, and adapt to the needs of the user community.
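As an illustration of what access through an OGC-compliant web service can look like, the following minimal sketch builds a Web Map Service (WMS) 1.1.1 GetMap request for a geographic bounding box. The request parameters are those defined by the WMS standard; the endpoint URL, the layer name, and the function name build_getmap_url are hypothetical placeholders and do not describe the actual Data Portal services.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.1.1 GetMap URL for bbox = (west, south, east, north) in degrees."""
    west, south, east, north = bbox
    if west >= east or south >= north:
        raise ValueError("bbox must satisfy west < east and south < north")
    # Standard WMS 1.1.1 GetMap parameters; STYLES is left empty to
    # request the server's default style for the layer.
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, used for illustration only.
url = build_getmap_url(
    "https://example.org/wms", "topography", (-105.0, 8.0, -103.0, 11.0)
)
print(url)
```

Because the request is a plain URL, the same map can be retrieved by a web browser, a desktop mapping application, or a script, which is one reason standards-based services suit a diverse community of users.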

In addition to the main database, the Ridge 2000 DMO has constructed two desktop applications for data visualization and analysis. GeoMapApp (www.geomapapp.org) and Virtual Ocean (www.virtualocean.org) provide geographic context (map-view) for a variety of online data sets hosted by the Data Portal and by partner data systems (e.g. geochemical data hosted by PetDB, near-bottom imagery hosted by NDSF). These tools not only access online data but also provide options to import data from the user’s local computer. They facilitate integration and synthesis across disciplinary boundaries by making specialized data sets (e.g. geophysical data) visually and quantitatively accessible to users from other disciplines. In addition, these tools make available the Global Multi-Resolution Topography (GMRT) synthesis (www.marine-geo.org/portals/gmrt), which serves as the bathymetric basemap for several papers and presentations.

Developments in partner data systems have also played an important role in Ridge 2000 data management efforts. The System for Earth Sample Registration (SESAR, www.geosamples.org), part of the Geoinformatics for Geochemistry (GfG, www.geoinfogeochem.org) Program, is a centralized registry that provides and administers unique identifiers for geoscience samples. The use of International Geo Sample Numbers (IGSNs) prevents ambiguity in documenting samples by systematizing sample designation and ensuring that all information associated with a sample is preserved for accessibility on a global scale. A partnership between SESAR and the Ridge 2000 Data Portal ensures that IGSNs for samples registered with SESAR are made available through the Data Portal search interface and GeoMapApp.

As the Ridge 2000 Program has progressed, the emphasis on data sharing and preservation across all NSF-funded programs has increased, and the culture within the science community with respect to data sharing has begun to shift.  More and more scientists are routinely submitting data, and sharing data is becoming an integral aspect of how we conduct our research, collaborate, and build upon each other’s work.  As a result of increased community engagement and technological advances, data submission, access, and analysis tools are constantly evolving.  The design of the integrated data system has enabled rapid development of new functionality and interfaces in response to the needs of the Ridge 2000 Program.  Recent system enhancements include the development and release of an integrated data compliance web form to help PIs document the status of their data submissions, and a fully integrated reference search that provides access to field program information and data sets based on publications.

Much of the functionality developed within the Ridge 2000 Data Portal has been scaled up and applied to data management efforts for other NSF-funded programs (e.g. MARGINS) and the broader community of ocean scientists. Database design and functionality developed by the Ridge 2000 DMO have been used to build a prototype database for the National Deep Submergence Facility (NDSF), and to create a next-generation digital event logger for use with NDSF vehicles to facilitate accurate documentation of sampling metadata. In addition, Ridge 2000 data management efforts have informed developing and evolving data management efforts at NSF and NASA. Lessons learned and technical achievements should also help inform parallel data management strategies such as OOI.

Future Directions
An integrated data system that serves the needs of the science community is much more than a catalog of metadata and archived data with a single data access interface. A well-designed data system that provides a variety of data access options can leverage emerging technology and provide customized access for a diverse community of users. Fundamental to the success of such a system is that it be populated with high-quality metadata that describe the digital data collection. While documenting data can be very time consuming for certain data types, some tools for routinely capturing metadata have already been developed. In the future, we can expect that more tools will be constructed and refined to facilitate our efforts to routinely document data. Further, as data format standards are accepted for more data types, tools that can leverage those standards will facilitate data submission. Integrated tools for online data submission are already being designed as part of the formal partnership between MGDS and GfG, Integrated Earth Data Applications (IEDA). Consolidating existing data submission information and tools for these systems into a centralized location will help clarify data submission requirements and provide information to the community about new data submission tools as they come online.

As technology evolves and the culture of the science community changes to accept this new paradigm, data sharing will become the norm. As data are more frequently used by scientists not involved with the initial acquisition of the data, data publication and citation will become integral to publishing in peer-reviewed journals. As data sharing becomes more broadly recognized as a critical component of the modern scientific process, we are likely to see professional credit given for data submission.

Significant new scientific discoveries will inevitably result from increased data sharing. In addition to faster growth of regional and global data compilations, more data will be available to construct and validate models that investigate earth processes. Data management activities will continually evolve, leveraging new computer technology and increasing bandwidth to improve data discoverability, facilitate collaboration, and meet the evolving needs of a growing scientific user community.