The New York Times has millions of printed photographs stored in an underground archive nicknamed "the morgue," and it has begun the arduous task of digitizing this collection. Google is part of the project, according to a post on one of the company's blogs, where it explains that its machine learning and cloud technologies will help The New York Times store, process, and search its archive.

The morgue houses between 5 and 7 million photographs dating back to the late 19th century, all of them stored in folders within file cabinets. Many of the photos haven't been viewed in decades and all of them are at risk of damage. In 2015, for example, the morgue experienced minor damage after water leaked in from a broken pipe.

The New York Times' CTO Nick Rockwell said in a statement to Google:

The morgue is a treasure trove of perishable documents that are a priceless chronicle of not just The Times's history, but of nearly more than a century of global events that have shaped our modern world ... Staff members across the photo department and on the business side have been exploring possible avenues for digitizing the morgue’s photos for years. But as recently as last year, the idea of a digitized archive still seemed out of reach.

To help preserve this visual history, Google is providing The New York Times with its cloud storage product to hold high-resolution digital copies of the photographs. The New York Times has built a processing pipeline for the digitization project that resizes images on Google Kubernetes Engine and stores metadata in PostgreSQL, and that also relies on the open-source command-line tools ExifTool and ImageMagick.
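The article does not say how the pipeline invokes these tools, so the following is only a rough sketch of what a resize-and-extract step could look like. The function names, file names, and the 1024-pixel default are hypothetical; the sketch assumes ImageMagick's `convert` command with its `-resize` option and ExifTool's `-json` output option.

```python
def resize_command(src, dst, max_side=1024):
    """Build an ImageMagick command line (hypothetical step, not The Times's code).

    "-resize WxH>" fits the image inside a W x H box, keeps the aspect
    ratio, and only shrinks images that are larger than the box.
    """
    return ["convert", str(src), "-resize", f"{max_side}x{max_side}>", str(dst)]


def metadata_command(src):
    """Build an ExifTool command line that prints a file's metadata as JSON."""
    return ["exiftool", "-json", str(src)]
```

Each list could be handed to `subprocess.run` on a machine where both tools are installed, and the JSON that ExifTool prints could then be parsed and written to a database such as PostgreSQL.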

Google's machine learning technology augments the system to offer insights into the digitized content. The company's Cloud Vision API is used to detect text, logos, objects, and more within photographs, while the Cloud Natural Language API uses the detected text to categorize the images. This data makes it possible to search the digitized collection for specific images that would otherwise be lost in the vast archive.
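To illustrate why detected labels make an archive searchable, here is a minimal sketch of an inverted index that maps each label to the photos it was found in. It does not call the Cloud Vision API or the Cloud Natural Language API; the photo IDs and labels are assumed to be already-extracted output, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercased label to the set of photo IDs that carry it."""
    index = defaultdict(set)
    for photo_id, labels in annotations.items():
        for label in labels:
            index[label.lower()].add(photo_id)
    return index


def search(index, *terms):
    """Return the sorted photo IDs that match every search term."""
    if not terms:
        return []
    # .get avoids creating empty entries for labels that were never seen
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))
```

With an index like this, a query for two labels returns only the photographs annotated with both, which is the kind of lookup that is impractical across millions of paper folders.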