Read the full post from the Library of Congress.
The Digital Content Management section has been working on a project to extract and make available sets of files from the Library’s significant web archives holdings. This is another step to explore the web archives and make them more widely accessible and usable. Our aim in creating these sets is to identify reusable, “real world” content in the Library’s digital collections, which we can provide for public access. The outcome of the project will be a series of datasets, each containing 1,000 files of related media types selected from .gov domains. We will announce and explore these datasets here on The Signal, and the data will be made available through LC Labs. Although we invite usage and interest from a wide range of digital enthusiasts, we are particularly hoping to interest practitioners and scholars working on digital preservation education and digital scholarship projects.