The New York Times' massive photo archive is being digitized with Google's help

They are using Google services to store, organize, and retrieve the digital versions of each photo. Google’s AI software can identify the content of the images to efficiently create an index.

Google does have special-purpose hardware for rapid digitization of books, but the article doesn’t mention Google helping with that part of the project, though that’s not to say they aren’t. The book-scanning hardware flips through pages and photographs each one at very high speed; think scanning an entire book in a minute. It’s conceivable that a machine could be built to open envelopes...

Dec 3, 2018

PostModernBloke

"Google's machine learning technology augments the system to offer insights into the digitized content" - so, history according to Google.

Am I the only person here who would require Google to pay its taxes, stop its covert data harvesting, and generally just f*!k off?

Meanwhile, I logged on with a Gmail account.

Is resistance futile? Never!

Cheers

Nov 14, 2018

Seeky

You are certainly not the only one, but we may be a minority.

Nov 14, 2018

amateriat

Futile? No. Just complicated.

Nov 15, 2018

By insight they mean basic characteristics about each photo. Does it have people? Is it indoors or out? Are there any words or numbers visible and what do they say? What news articles were published that day? They’re not rewriting anything, just making it easier to find a specific photo.
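To illustrate the kind of lookup described above, here is a minimal sketch of an inverted index in Python: each label points to the set of photos that carry it, so a search is a set intersection. The photo IDs and labels are hypothetical, and nothing here reflects how Google's system actually stores or labels the NYT archive.

```python
from collections import defaultdict

# Hypothetical per-photo attributes of the kind listed above
# (people, indoors/outdoors, visible text). IDs and labels are made up.
photos = {
    "photo-001": {"people", "outdoors", "text:times square"},
    "photo-002": {"indoors", "people"},
    "photo-003": {"outdoors", "text:ebbets field"},
}


def build_index(records):
    """Map each label to the set of photo IDs that carry it."""
    index = defaultdict(set)
    for photo_id, labels in records.items():
        for label in labels:
            index[label].add(photo_id)
    return index


def search(index, *labels):
    """Return the IDs of photos that have every requested label."""
    if not labels:
        return set()
    result = set(index.get(labels[0], set()))
    for label in labels[1:]:
        result &= index.get(label, set())
    return result


index = build_index(photos)
print(sorted(search(index, "people", "outdoors")))  # ['photo-001']
```

Nothing in the photos themselves changes; the index only makes them faster to find.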

Dec 3, 2018

John Koch

To digitize 5 million photos related to historical events or people might be a worthy feat, provided they are indexed with keywords and aren't disproportionately redundant photos of the same leaders or celebrities. The NYT will have to monetize this somehow, or at least use the archives as a tool to attract subscribers or ad space.

Conversely, the world's population snaps over 5 million photos every three hours, and maybe 5 million or more are posted daily to social media. Who has the time or impulse to look at more than a tiny fraction? Yet they track "news" for all the ordinary people and, more and more, beat old-fashioned photojournalists to the scoop.

Nov 13, 2018

PostModernBloke

You make some interesting points. But 'old-fashioned' photojournalists are few and far between these days, because the agency model is just so much cheaper - why pay a photographer's salary when you can cherry-pick the images from hundreds of sources, and just pay for the images you use?

After Vietnam, governments (and their masters) learned the power of images, and have worked to control photographers' access. 'Embedding' and censoring became the norm. So we saw many photographs of coffins returning home, containing our compatriots, but very few photographs depicting the hundreds of thousands of dead Iraqi civilians.

So in a sense the smartphone camera is an important source, but it'll never wield the power of a Nick Ut image, simply because photographers have the skills to make good photos.

Nov 13, 2018

No offense, Postmodernbloke, but the sheer number of people using smartphones and digital cameras with access to the internet will overwhelm the work of actual photojournalists.

The amateur video of Rodney King being beaten was a harbinger of this. And many other videos of police shootings and abuse have been distributed since. And there was the Zapruder film that basically forced the press to film and photograph the president at virtually every opportunity possible.

Google Image Search is pretty powerful. You can drop a photo into it and there is a good chance it will identify the subject (if it is relatively common) or at least find images that have a similar look. So a lot of these amateur photos can be uploaded and easily found even if they don't have great captions or key words.
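As a toy illustration of matching images by their "look", the sketch below uses average hashing, a simple perceptual-hash technique chosen here for illustration only. Google has not published how its image search works, so this is not its method. The function names and the 4x4 sample grids are assumptions; a real average hash would first shrink the image to a small grayscale thumbnail (commonly 8x8), a step skipped here.

```python
def average_hash(pixels):
    """Bit string with '1' wherever a pixel is brighter than the image mean.

    Assumes `pixels` is an already-downscaled grayscale grid (list of rows).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


original = [[10, 200, 10, 200], [200, 10, 200, 10]] * 2
brighter = [[value + 20 for value in row] for row in original]  # same scene, exposure shifted
different = [[10, 10, 200, 200]] * 4                            # a different pattern

print(hamming_distance(average_hash(original), average_hash(brighter)))   # 0
print(hamming_distance(average_hash(original), average_hash(different)))  # 8
```

A small distance suggests similar images; where to set the match threshold is a design choice.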

Nov 14, 2018

I didn't hear any information about copyright ownership, usage and licensing of these photos. Despite being in the NYT morgue, I would assume the copyright has expired on a good number of these images. (Maybe all prior to 1923.) So making them accessible to the general public may be against the interests of the NYT. These are the published images. But what about any unpublished images they scan from prints or negs?

Nov 13, 2018

Bobthearch

I get the impression this is being done for in-house use only at the NYT. The article doesn't say anything about public access or external distribution.

Maybe they could recover some of the costs by selling books of select 'lost' photos.

Nov 13, 2018

clickhiker

The NY Times might not have a clue who took their photos. In the mid-'80s, my work was credited to a different photographer. The paper was called well before deadline, before the pictures were published (on two different occasions), and the flailing NY Times botched the credits both times. One was accompanying a front-page story. If I look at the images online, it STILL shows credit to the wrong person. I did a lot of work for a lot of northern papers back then, and the NY Times was the only publication incapable of crediting the right person!

Nov 20, 2018

Bobthearch

If the photos were taken by staff photographers, the New York Times itself is the copyright owner.

Nov 20, 2018

clickhiker

Not all photos in the Times are by staffers. As a freelancer, I photographed the mayor of a large southern city, a politician the Times wished to publicize, and the photo ran on page 1. The staff at the Times didn't seem to care about accuracy when it came to who took the image, but sure wanted positive publicity for one of their political friends. It is okay. I'm doing okay, and have a nice roof over my head. It was a long time ago, and the morons who botched the credit are long gone.

Nov 21, 2018

D Gold

I look forward to it as a long time subscriber to the paper and online.

Nov 13, 2018

DuckShots

If Sunday's insert on CA was any indication of the paper's intentions, sign me up. Fabulous. I hope to see many more. I loved the images, the Mosely intro and the retro material, including the captions and the money paid for the images. JUST GREAT.

Nov 13, 2018

DiffractionLtd

Couldn't some kind of production-line platform be created to do jobs like this? Rather than have humans physically handling and positioning, removing and replacing each image? Production lines today can do incredibly intricate tasks, automatically.

Nov 13, 2018

George Zip

I imagine all the images will need to be given descriptions and keywords. This would add a layer of complexity, I would imagine.

Nov 13, 2018

davesurrey

I imagine it's the collection of metadata that will be the most important and difficult aspect of this, along with the time taken.
No doubt that's where Google's expertise comes in.

Nov 13, 2018

Old Cameras

Patience. A robot will replace you soon enough.

Nov 13, 2018

DiffractionLtd

Robots can replace us all. In fact, one study out of Europe said that it would be cheaper for society to get rid of about 1/3 of the jobs out there and simply pay people to not work. The cost of allowing them to work exceeds their productivity gain.

Nov 13, 2018

DiffractionLtd

This could be a disaster, just like Google Books was.

Nov 13, 2018

aris14

"Staff members across the photo department and on the business side have been exploring possible avenues for digitizing the morgue’s photos for years."...
Staff members..? Staff..? Members..?
If that is not efficiency I don't know what efficiency is...

Nov 13, 2018

bayville126

I'm sure many major publications would love to have this technology and money to do this.

Nov 13, 2018

aris14

They probably have staff of the same attitude and quality on their payroll...

Nov 14, 2018

https://gizmodo.com/google-removes-nearly-all-mentions-of-dont-be-evil-from-1826153393

From the volume of images, they have a big task ahead of them. I am no expert on archiving, but it seems to me they could at least use high speed scanners with auto feeding. But maybe they need to handle the photos more carefully than that would allow.

I can see that the prints have various captions and other info on the back which is also important. But what happened to the original negatives? Might the negatives produce better scans? Plus there are surely many more negatives than there are prints.

Could there be overlooked jewels in these negatives? (Neil Leifer's famous photo of Ali was not recognized as special at first.) Or subjects that had no obvious value when the photo was made and so were never printed? (Thinking of Dirck Halstead having staff search for slides he shot of Clinton with Monica Lewinsky after the story broke.)

Nov 13, 2018

falconeyes

Split the morgue into 8 parts and give 8 workers a part each, plus a decent digital system camera with an electronic shutter. Photograph both sides of each photo, make an LR (Lightroom) catalog for each part, and run the Excire plugin for the AI part. A year later, the entire morgue is digital.

I don’t see why this was considered challenging, why this wasn’t done years ago, or why Google was asked for help. I am sure this could have turned into a revenue stream for TNYT.

Nov 13, 2018

Kafka2000

Why electronic shutter? Just curious.

Nov 13, 2018

Loro Husk

Why Electronic shutter?

Nov 13, 2018

falconeyes

Wear.

Nov 13, 2018

davesurrey

@falconeyes,
Hmmm...so if, as you say, they divide the archive amongst 8 folk, each takes just one minute per pic (very optimistic estimate) and works an 8-hour day for 350 days per year, then that's about 4.5 years' work. Not your figure of a year.
I guess this will be a very labour intensive project if they do it right.

Nov 13, 2018

Johnny B

Cause Google must have its fingers in everything.

Nov 13, 2018

IR1234

Why would you photograph photographs when you can scan them? And depending on the fragility of the photograph they can be scanned in batch at some speed. Your time consuming thing is actually annotating what you've scanned.

Nov 13, 2018

John Koch

Old photos tend to attract lots of dust or shred. It could take a minute between scans just to clean off the platen and the photos. Any text, whether written on the back or separate notes, might be in messy cursive script, and require transcription. Notes on old newsprint or onion skin copies might disintegrate unless handled carefully. Were the work compressed to one year and a small staff, expect lots of dirty results and flawed indexing.

Nov 13, 2018

falconeyes

@davesurrey
I assumed a stand with a glass plate to digitize the photos, with an optimized workflow in which, e.g., all folders in a box are processed at once, reusing their common context.

I assumed 5s per photo and side (10s per photo) which is generous, actually. If you pay the workers per digitized photo, I assure you they'll be a lot faster ;) You could also photograph both sides at once (using two cameras).

This is photographing prints. Can be done a lot faster than scanning negatives. Dust, scratches etc. are no big concern. Just use a good lighting rig.

There are machines for automated scanning, which use air to separate, rotate and transport sheets. But those are expensive, and the most labor-intensive work will be getting the prints out of their folders and back in. So I dismissed the digitizing machines and still got quite feasible figures.
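The two estimates in this exchange are easy to compare with a quick calculation. This is a sketch under stated assumptions: 5 million photos (the figure mentioned earlier in the thread; the real count may differ), the work split evenly among 8 workers, and davesurrey's 8-hour days and 350 working days per year.

```python
SECONDS_PER_HOUR = 3600


def years_to_digitize(total_photos, workers, seconds_per_photo,
                      hours_per_day=8, days_per_year=350):
    """Working years needed when the photos are split evenly among workers."""
    photos_per_worker = total_photos / workers
    photos_per_day = hours_per_day * SECONDS_PER_HOUR / seconds_per_photo
    return photos_per_worker / photos_per_day / days_per_year


# Assumption: 5 million photos, the figure mentioned earlier in the thread.
print(round(years_to_digitize(5_000_000, 8, 60), 2))  # davesurrey, 1 min/photo: 3.72
print(round(years_to_digitize(5_000_000, 8, 10), 2))  # falconeyes, 10 s/photo: 0.62
```

With these inputs, one minute per photo comes to roughly 3.7 years; davesurrey's "about 4.5 years" would correspond to a total closer to 6 million photos. At falconeyes's 10 seconds per photo the job drops to about 0.6 working years, which fits his "a year later". Handling time for folders, which both posters mention, is not modeled.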

Nov 13, 2018

PostModernBloke

"Who controls the past controls the future. Who controls the present controls the past." George Orwell.

"Don't be evil" - Google

And yet, the New York Times, with all the evidence of the actual evil Google does, hands over that unique archive.

Damn, this is depressing.

Nov 13, 2018

Johnny B

Just when the NYT profits are up ... can't they invest some of their own resources?

Nov 13, 2018

TillmanB

Google dropped the "Don't be evil" thing last Spring. Game on!

Nov 14, 2018

tangbunna

Digitize in 14-bit RAW? If it's JPEG, it will lose dynamic range from those B&W photos.
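For reference, bit depth sets how many tonal steps each channel can record. The quick check below only counts levels; how much dynamic range a scan actually captures depends on the sensor and the processing, not on the file format alone.

```python
def tonal_levels(bits):
    """Distinct values a single channel can store at the given bit depth."""
    return 2 ** bits


for label, bits in [("8-bit JPEG", 8), ("14-bit raw", 14)]:
    print(f"{label}: {tonal_levels(bits)} levels per channel")
# 8-bit JPEG: 256 levels per channel
# 14-bit raw: 16384 levels per channel
```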

Nov 12, 2018

Constantin V

So where can we look at these great photographs? Otherwise, why should I care?

Nov 12, 2018

santamonica812

1. I agree with you, re the hope that digitizing will allow mass viewing of this historical treasure.
2. I completely disagree with your argument that if you can NOT view these, one should not care at all. As human beings, I would argue the opposite--that the historical value is incalculable and every reasonable effort should be taken in order to ensure that these images are available 10 years, and 50 years, and 700 years into the future.

Even though you and I will not be around that long, do you not feel *any* sense of responsibility to ensure that our descendants will have a full understanding of what happened in their history?

I find your approach a tiny bit self-centered. YMMV, of course. :-)

Nov 12, 2018

Constantin V

Well, you are right about everything. I'm quite egocentric (I see nothing sinister in that), and I'm glad someone is less so (that's good for me too :)). As for your concern about our descendants... If you were me, you would probably be concerned with other things. For example, you wouldn't be able to get into MoMA. There wouldn't be a single album of Henri Cartier-Bresson's photos published in your country and in your language. Getting such books wouldn't be as easy as you think. The Internet would be your last hope, and yet it is overrun with nothing but reviews, advertisements and so-so photos. Imagine that, and you'll probably still be happy that aliens got a massive archive of Earth's history.

Nov 14, 2018

Constantin V

@santamonica812 one more thing for a love'n'peace human being. Not everywhere do people think the way you're used to. I'm just now reading a post on a social network about the 'hidden treasures' of museums in our country. The author is astonished that many items are kept locked away for safety and are forbidden to be shown (or even photographed!) to the general public. I can't give you a link, since you wouldn't understand the language anyway, but in short this is for economic and bureaucratic reasons. So no, I fully disagree with a 'responsible' person like you. This should be publicly available.

Mar 10, 2019

cdembrey

Morgue isn't a nickname. News organizations have been calling their archives the morgue since forever.

Nov 12, 2018

bodensee

As someone who works in a photo archive I was staggered at the kind of storage used, the way the photos were handled and the casual way they were put on a scanner. They need educating in how a real archivist would do it. The potential is enormous but the methodology is abysmal.

Nov 12, 2018

Johnny B

Sad to hear.

Nov 13, 2018

davesurrey

@bodensee,
I fully agree, but sadly this is not unusual. Take, for example, the numerous TV programmes that have been lost forever because the TV companies preferred to erase the tapes for reuse. Very short-sighted.

Nov 13, 2018

Mssimo

Question: If a bear crapped in the woods and the one person who had the picture made it disappear, did the bear crap in the woods? The question goes beyond this one article.

Nov 12, 2018

Tourlou

Google HELP, never heard about that application!!! Lol!

Nov 12, 2018

(unknown member)

Google is digitising the whole world, and they look at what you are interested in so they can sell it to advertisers and become even richer and more profitable. Your privacy is gone, and gone for decades, certainly if you use Android-based systems. That is the way things go now. Governments should regulate these things, but they are bought by Google too ...

Nov 12, 2018

I think the last thing needed is for government(s) to stick their noses and greasy fingers into the greater internet. If you want privacy, use a private system or learn how to hide your actions on the net. If you want to use a public system, then don't reasonably expect "privacy". I think that's the bottom line :)

Nov 12, 2018

@Teila
You cannot expect everyone to know how to do these things. But everyone deserves privacy, especially because soon there will be no alternative to the internet. We cannot leave people alone against Google, Facebook, etc. Otherwise they will turn them into zombies serving Google's interests.

Nov 12, 2018

Impulses

Better move to a zombie proof cabin in the woods, don't forget the tin foil tho.

Nov 12, 2018

Google and Facebook decide what the vast majority of people can see. They can tweak their algorithms however they want, and people are not fully aware that they are viewing what these companies want them to see.

Nov 12, 2018

Government does regulate these things. The way they do it is by enforcing regulations that make Google capture and send your data (to the gov)!

Nov 12, 2018

Will, I do not hold Google or Facebook responsible for the sheer number of idiot information consumers roving the planet today. Fake news is a personal problem. I don't have that problem... why do others? Oh... wait, because I don't read tripe is why.

Nov 12, 2018

@teila: no, fake news is NOT a personal problem. It exists because the advertising and information-gathering system on the Internet rewards grabbing attention with exaggerated and false information. In the long run it leads to extreme polarization and mass hysteria, and that will affect you too, no matter what your Internet habits are.

Nov 12, 2018

MyReality

@Impulses - The Unabomber did that, but the government still found him.

Nov 12, 2018

@Teila, fake news is a problem for everyone. Look who got themselves elected president of the USA! (by leveraging fake news to rally voters).

Also, don't kid yourself that you have any privacy. If the bits and bytes can go back and forth between you and some server, it's possible to track you down. Tor is not secure. Unless by "private system" you mean a computer that has zero connections to the world, as opposed to one that communicates over public internet infrastructure.

Nov 12, 2018

desertsp... I've been online since the days when we were using 150 baud modems; I well understand the realities of privacy online, which is why I'm an advocate of people being responsible-for-self when it comes to their dealings online.

Fake news has no bearing whatsoever on how I view a political candidate. The fact that so many people are affected by junk on Facebook isn't a problem with Facebook, but rather a testament to how brain-dead the average American of voting age really is...

Nov 13, 2018

Totally agree with you. All I’m saying is that it does impact you via the actions of others. That’s why we should expect Facebook and other platforms to do what they can to facilitate “real information” exchanges, and minimize fake news. Like it or not these platforms have massive influence on how people think and behave.

If it’s easy to identify something as fake and potentially “viral”, then they could easily put a damper on it. Don’t put the story about “Hillary’s secret hitlist !!!” on every republican’s newsfeed, for instance.

Nov 13, 2018

I think we're breeding a too-coddled society when we expect business to do an individual's job. Facebook's job is to make money however they legally can. My job is to take care of me... that's not something I need Facebook or Google to help me do. If fake news sells content then so be it. I don't have a problem with it because I don't consume it.... just like I don't read Nat'l Enquirer at the check out line, because I don't care what Kim K is going to name her next spawn, or whether or not Trump is an alien... ;)

The responsibility to filter out non-news remains with me and I will never expect a company that makes money from gullibility to put the brakes on their cash cow. Will be interesting to see how it all shakes out though.

Nov 14, 2018

Of course a company isn't going to shoot their cash cow! But if the cow is rampaging and causing widespread destruction, then maybe it's the role of society to stop it.

Nov 14, 2018

I think it comes down to culpability. Facebook is just a business that is predicated on one’s personal election to socially engage with others, unlike a business that provides water, power, or air travel. My position is that Facebook et al, are no more responsible for fake news, than fast food restaurants are responsible for the gross number of Americans being overweight.

Ultimately, if we are to actually help and further society, we need to hold people responsible for their own actions. Even kids learn to check sources and to consider a periodical’s strengths before using it as a source-of-information... but we want to hold adults’ hands and hold Facebook responsible for fake news?

That’s like having members of Congress waste tax dollars holding an inquiry into whether MySpace, YouTube, or Snapchat needs to police itself for misleading information. The cow isn’t the problem; the problem is the sheer number of people too eager to drink from an obviously dirty udder ;)

Dec 1, 2018

MyReality

@Teila Day - Agree, but you are putting forth a somewhat revolutionary idea in today's society: that there is such a thing as personal responsibility. It does not matter whether it is blaming your equipment for your bad photos, believing what you read at face value, or crossing the desert to storm a border crossing. Every person is responsible for the decisions they make. Blaming the system is an easy way out.

Dec 2, 2018

AksCT

This is an amazing archive, and I'm quite surprised at how carelessly that guy handles the prints.

Nov 12, 2018

HowaboutRAW

How does Google do that?

There are numerous ways of disguising yourself when searching Google.

Google has given up reading Gmails.

Nov 12, 2018

T Olivier

@HowaboutRAW - "Google has given up reading Gmails". If you meant "Google has given up reading content of Gmail mailboxes", you are wrong.
Google reads everybody's emails all the time. Google reads Gmail emails to build a search index database. Each time you use the search function within a Gmail mailbox, you are not asking the Google search engine to perform a "live" search of your mailbox. The search you perform is a search of the database that is built and updated by an "always on" search. It is the same search engine, with important modifications, as Google's web search engine.
Google not only searches your Gmail mailbox. Google also searches the **content** of the documents you store on Google Drive and in Google Documents. All these searches are performed all the time. Users' need for fast, efficient search results is the reason Google searches through your stuff all the time. This also affects companies that use Google's commercial services.

Nov 12, 2018

HowaboutRAW

T Olivier:

I've read, not confirmed, that Google gave up reading Gmails in 2016.

I didn't make any claims about Google Drive.

So I believe in fact you are wrong about the Gmail reading.

Nov 13, 2018

Frank_BR

No technical information about digitizing equipment, format (RAW, JPEG), Mpixels, etc.?

Nov 12, 2018

JEROME NOLAS

There are too many pics on this planet.

Nov 12, 2018

Recipe for digital amnesia:
Step 1: digitize analog stuff
Step 2: get rid of the originals
Step 3: change storage medium and lose everything
Step 4: welcome to the post-digital dark age

Nov 12, 2018

panther fan

Steps for analogue amnesia
Step 1: Put everything in a warehouse
Step 2: People cannot access or even forget it is there
Step 3: After years when people really need it everything is rotten and gone

Digitalisation is important, especially for searchability and for preventing long-term degradation. I am pretty sure archives like this will be backed up in one of the national US bunker archives, stored on microfilm and magnetic tapes preserved for millennia.

Nov 12, 2018

Impulses

Or, you know, keep multiple redundant copies on multiple media and step 4 becomes a non-issue. I have no need to, but I could still access everything I ever stored on a floppy, Zip disk, or CD if I ever really needed to... But anything of value is on multiple hard drives and syncs to some online account, so no single failure results in a loss. That's how you back things up, right?

Nov 12, 2018

tkbslc

1. Digitize photos
2. Maintain backups
3. Move to new, cheaper storage medium every 3-5 years.
4. Repeat forever
5. Maintain a perfect digital copy forever.
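Step 3 only keeps a "perfect digital copy" if each migration is verified. One common approach is a fixity check: hash the file before and after copying and compare the digests. Below is a minimal Python sketch using SHA-256; the function names and the temporary-directory demo are illustrative assumptions, not any particular archive's workflow.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(src, dst_dir):
    """Copy a file to new storage and confirm the copy is bit-identical."""
    dst = Path(dst_dir) / Path(src).name
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"checksum mismatch for {dst}")
    return dst


# Demo: temporary directories stand in for the old and new storage media.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    scan = Path(old) / "photo-001.tif"
    scan.write_bytes(b"placeholder scan data")
    copy = migrate_and_verify(scan, new)
    print(copy.name, "verified")
```

Storing the digest alongside each file also lets later migrations, and periodic re-checks of the backups in step 2, detect silent corruption.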

Nov 12, 2018

jnd

With proper storage techniques, digital data will last practically forever and doesn't suffer any degradation. Even better, it can be easily analyzed, searched and used. If you don't store the originals properly, soon you won't have anything to look at. Ideally keep both, but over time the physical stuff will degrade no matter how well it was made. Even not-so-old movies require extensive restoration work to hide all the dust and color fading that accumulated while they sat in the archives.

Nov 13, 2018