Digital Preservation Framework for Risk Assessment and Preservation Planning

NARA developed its Digital Preservation Framework to document and share recommended preservation actions based on its electronic record holdings and current capabilities. It is a comprehensive resource that includes:

  • A Matrix for file format risk analysis and prioritization for action;
  • Preservation Plans for 16 categories of electronic records (or “record types”), such as email, still images, and software, which identify “Significant Properties,” the properties that should, if possible, be retained in any format migration; and
  • Preservation Action Plans for over 650 file formats, including proposed preservation actions and tools.

The Digital Preservation Framework is publicly available on the NARA GitHub account for reuse, adaptation, and discussion.
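For readers who want to work with the Framework programmatically, here is a minimal sketch: it assumes a CSV export of the risk matrix with columns such as "Format Name" and "NARA Risk Level" (the file name and column names are illustrative placeholders, not NARA's published schema) and simply lists the highest-risk formats first.

    # A minimal sketch (not NARA's tooling): load a hypothetical CSV export of
    # the risk matrix and list the highest-risk formats first. The file name and
    # column names ("Format Name", "NARA Risk Level") are assumptions.
    import csv

    def highest_risk_formats(path, limit=10):
        """Return up to `limit` rows, sorted so that 'High' risk formats come first."""
        risk_order = {"High": 0, "Moderate": 1, "Low": 2}
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        rows.sort(key=lambda r: risk_order.get(r.get("NARA Risk Level", ""), 3))
        return rows[:limit]

    if __name__ == "__main__":
        for row in highest_risk_formats("risk_matrix.csv"):
            print(row.get("Format Name"), "-", row.get("NARA Risk Level"))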

Keeping science reproducible in a world of custom code and data

Read the full story at Ars Technica.

Since the mid-1600s, the output from a typical scientific study has been an essay-style journal article describing the results. But today, in fields ranging from astronomy to microbiology, much of the technical work for a journal article involves writing code to manipulate data sets. If the data and code are not available, other researchers can’t reproduce the original authors’ work and, more importantly, may not be able to build upon the work to explore new methods and discoveries.

Thanks to cultural shifts and funding requirements, more researchers are warming up to open data and open code. Even 100-year-old journals like the Quarterly Journal of Economics or the Journal of the Royal Statistical Society now require authors to provide replication materials—including data and code—with any quantitative paper. Some researchers welcome the new paradigm and see the value in pushing science forward via deeper collaboration. But others feel the burden of learning to use distribution-related tools like Git, Docker, Jupyter, and other not-quite-words.
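What a replication package looks like varies by field, but the core idea is a single entry point that rebuilds every result from the archived data. The sketch below is a generic illustration under assumed file names and a hypothetical checksum step, not any journal's required workflow.

    # A minimal sketch of a one-command replication script (file names are
    # hypothetical): anyone with the data and code can regenerate the results.
    import hashlib
    import pathlib

    RAW = pathlib.Path("data/raw/survey.csv")   # assumed input file
    OUT = pathlib.Path("results/summary.txt")   # assumed output file
    EXPECTED_SHA256 = None  # publish the real checksum alongside the data

    def sha256(path):
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def main():
        # 1. Verify the input data match what the paper used.
        if EXPECTED_SHA256 and sha256(RAW) != EXPECTED_SHA256:
            raise SystemExit("Input data do not match the published checksum.")
        # 2. Re-run the analysis deterministically (placeholder computation that
        #    assumes the second CSV column holds the values of interest).
        values = [float(line.split(",")[1]) for line in RAW.read_text().splitlines()[1:]]
        if not values:
            raise SystemExit("No observations found in the input file.")
        OUT.parent.mkdir(parents=True, exist_ok=True)
        OUT.write_text(f"n={len(values)}, mean={sum(values)/len(values):.4f}\n")

    if __name__ == "__main__":
        main()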

Parts of the web are disappearing every day. Here’s how to save Internet history

Read the full story from Fast Company.

The websites of today are the historical evidence of tomorrow—but only if they are archived.
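One widely used way to archive a page on demand is the Internet Archive's Save Page Now service. The sketch below requests a capture through its public save endpoint; response headers and rate limits can change, so treat this as an illustration rather than an official client.

    # A small sketch: ask the Wayback Machine to capture a page via its public
    # "Save Page Now" endpoint. Illustrative only; not an official client.
    from urllib.request import Request, urlopen

    def save_to_wayback(url: str) -> str:
        req = Request("https://web.archive.org/save/" + url,
                      headers={"User-Agent": "archive-example/0.1"})
        with urlopen(req) as resp:
            # The Content-Location header (when present) points at the snapshot.
            return resp.headers.get("Content-Location", "") or resp.geturl()

    if __name__ == "__main__":
        print(save_to_wayback("https://example.com/"))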

File not found: Biodiversity needs better data archiving

Read the full story from Michigan State University.

A Spartan-led research team reveals surprising gaps in ecological genetic data that could otherwise help global conservation efforts.

The United States Geological Survey Science Data Lifecycle Model

Download the document. See also this article in the Journal of eScience Librarianship for a discussion of the model’s development.

U.S. Geological Survey (USGS) data represent corporate assets with potential value beyond any immediate research use, and therefore need to be accounted for and properly managed throughout their lifecycle. Recognizing this need, a USGS team developed a Science Data Lifecycle Model (SDLM) as a high-level view of data—from conception through preservation and sharing—to illustrate how data management activities relate to project workflows, and to assist with understanding the expectations of proper data management. In applying the Model to research activities, USGS scientists can ensure that data products will be well-described, preserved, accessible, and fit for reuse. The Model also serves as a structure to help the USGS evaluate and improve policies and practices for managing scientific data, and to identify areas in which new tools and standards are needed.
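As a rough illustration of how the Model's elements might be tracked on a project, the sketch below encodes the primary stages and cross-cutting elements as a simple per-project checklist. The stage names follow the published SDLM as I understand it; the checklist structure and project name are assumptions, not USGS tooling.

    # A minimal sketch of the lifecycle as a checklist structure. Stage names
    # follow the published SDLM; the checklist itself is illustrative only.
    PRIMARY_STAGES = ["Plan", "Acquire", "Process", "Analyze", "Preserve", "Publish/Share"]
    CROSS_CUTTING = ["Describe (Metadata, Documentation)", "Manage Quality", "Backup & Secure"]

    def lifecycle_checklist(project: str) -> dict:
        """Build an empty per-stage checklist for a project."""
        return {
            "project": project,
            "stages": {s: {"complete": False, "notes": ""} for s in PRIMARY_STAGES},
            "cross_cutting": {c: {"complete": False, "notes": ""} for c in CROSS_CUTTING},
        }

    if __name__ == "__main__":
        checklist = lifecycle_checklist("streamflow-2024")  # hypothetical project
        checklist["stages"]["Plan"]["complete"] = True
        print(checklist["stages"]["Plan"])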

How Scientists Can Protect Their Data From the Trump Administration

Read the full story in The Intercept.

If you’re an American scientist who’s worried that your data might get censored or destroyed by Trump’s radically anti-science appointees, here are some technologies that could help you preserve it, and preserve access to it.

Researchers make soil experiment data available in real time

Read the full story in EnvironmentalResearchWeb.

Traditionally, the scientific world has had closed doors. Since the 17th century, scientists have published their findings in learned journals, but the details of exactly what they did, how they gathered their data, and what tools they used to reach their conclusions are not usually communicated beyond the laboratory walls. But times are changing. Modern technology is enabling scientists to share their data worldwide and put information into the hands of ordinary citizens. However, sharing data openly is not without its challenges.

October 2015 issue of NTIS National Technical Reports Newsletter features e-cycling publications

The October 2015 issue of NTIS’ National Technical Reports Newsletter features a sampling of new and historic information on electronics recycling that is available from NTIS via the NTRL website. The issue also includes links to the public access plans of several federal agencies and an overview of NTIS’ new NTRL database.

Geospatial Data: Progress Needed on Identifying Expenditures, Building and Utilizing a Data Infrastructure, and Reducing Duplicative Efforts

GAO-15-193: Published: Feb 12, 2015. Publicly Released: Mar 16, 2015.

Download at http://www.gao.gov/products/GAO-15-193.

What GAO Found

Federal agencies and state governments use a variety of geospatial datasets to support their missions. For example, after Hurricane Sandy in 2012, the Federal Emergency Management Agency used geospatial data to identify 44,000 households that were damaged and inaccessible and reported that, as a result, it was able to provide expedited assistance to area residents. Federal agencies report spending billions of dollars on geospatial investments; however, the estimates are understated because agencies do not always track geospatial investments. For example, these estimates do not include billions of dollars spent on earth-observing satellites that produce volumes of geospatial data. The Federal Geographic Data Committee (FGDC) and the Office of Management and Budget (OMB) have started an initiative to have agencies identify and report annually on geospatial-related investments as part of the fiscal year 2017 budget process.

FGDC and selected federal agencies have made progress in implementing their responsibilities for the National Spatial Data Infrastructure as outlined in OMB guidance; however, critical items remain incomplete. For example, the committee established a clearinghouse for records on geospatial data, but the clearinghouse lacks an effective search capability and performance monitoring. FGDC also initiated plans and activities for coordinating with state governments on the collection of geospatial data; however, state officials GAO contacted are generally not satisfied with the committee’s efforts to coordinate with them. Among other reasons, they feel that the committee is focused on a federal perspective rather than a national one, and that state recommendations are often ignored. In addition, selected agencies have made limited progress in their own strategic planning efforts and in using the clearinghouse to register their data to ensure they do not invest in duplicative data. For example, 8 of the committee’s 32 member agencies have begun to register their data on the clearinghouse, and they have registered 59 percent of the geospatial data they deemed critical. Part of the reason that agencies are not fulfilling their responsibilities is that OMB has not made it a priority to oversee these efforts. Until OMB ensures that FGDC and federal agencies fully implement their responsibilities, the vision of improving the coordination of geospatial information and reducing duplicative investments will not be fully realized.

OMB guidance calls for agencies to eliminate duplication, avoid redundant expenditures, and improve the efficiency and effectiveness of the sharing and dissemination of geospatial data. However, some data are collected multiple times by federal, state, and local entities, resulting in duplication in effort and resources. A new initiative to create a national address database could potentially result in significant savings for federal, state, and local governments. However, agencies face challenges in effectively coordinating address data collection efforts, including statutory restrictions on sharing certain federal address data. Until there is effective coordination across the National Spatial Data Infrastructure, there will continue to be duplicative efforts to obtain and maintain these data at every level of government.

Why GAO Did This Study

The federal government collects, maintains, and uses geospatial information—data linked to specific geographic locations—to help support varied missions, including national security and natural resources conservation. To coordinate geospatial activities, in 1994 the President issued an executive order to develop a National Spatial Data Infrastructure—a framework for coordination that includes standards, data themes, and a clearinghouse. GAO was asked to review federal and state coordination of geospatial data.

GAO’s objectives were to (1) describe the geospatial data that selected federal agencies and states use and how much is spent on geospatial data; (2) assess progress in establishing the National Spatial Data Infrastructure; and (3) determine whether selected federal agencies and states invest in duplicative geospatial data. To do so, GAO identified federal and state uses of geospatial data; evaluated available cost data from 2013 to 2015; assessed FGDC’s and selected agencies’ efforts to establish the infrastructure; and analyzed federal and state datasets to identify duplication.

What GAO Recommends

GAO suggests that Congress consider assessing statutory limitations on address data to foster progress toward a national address database. GAO also recommends that OMB improve its oversight of FGDC and federal agency initiatives, and that FGDC and selected agencies fully implement initiatives. The agencies generally agreed with the recommendations and identified plans to implement them.

For more information, contact David A. Powner at (202) 512-9286 or pownerd@gao.gov.

Computer equal to or better than humans at cataloging science

Read the full story from the University of Wisconsin.

In 1997, IBM’s Deep Blue computer beat chess wizard Garry Kasparov. This year, a computer system developed at the University of Wisconsin-Madison equaled or bested scientists at the complex task of extracting data from scientific publications and placing it in a database that catalogs the results of tens of thousands of individual studies…

The development, described in the current issue of PLoS, marks a milestone in the quest to rapidly and precisely summarize, collate and index the vast output of scientists around the globe, says first author Shanan Peters, a professor of geoscience at UW-Madison.