Tuesday, July 22, 2014

Robots could do better than some journalists

When the Associated Press announced plans to use computers to write corporate earnings stories, a number of journalists asked me if I was as horrified by the prospect as they were.  In fact, I think robots could do better than some reporters.
With all respect and affection for my fellow journalists, I have concluded that a well-programmed set of algorithms can be far more analytic and precise than the sorts of harried, math-averse humans who are widely employed to write about complex business matters. Here is a case in point:
Poynter.org today ran a story about Gannett’s second-quarter earnings under the headline “Circulation Revenue Rises at Gannett Local Papers.” The problem with the headline is that the press release and the accompanying financial tables provided by the company showed unambiguously that circulation revenues actually FELL in both the first and second quarters of the year.
For the record, as reported in the press release, circulation revenue at Gannett’s newspapers for the first half of the year was down 1%, ad sales revenue was down 5.3% and “all other publishing” revenue was down 2.4%. 
So, how did the Poynter reporter get it wrong?  Because, in his haste to crank out a story, the author evidently relied on the bafflegab in Gannett’s press release instead of looking at the several pages of detailed financial tables appended to it. In fairness to the writer, who was alerted to this issue but so far has not amended his article, what human wouldn’t be confused by the following statement from the company:
“Circulation revenues were $277.9 million, down just 0.6 percent from $279.7 million in the second quarter in 2013. An increase in circulation revenue at Newsquest [GCI’s division in the United Kingdom] was offset by circulation revenue declines at domestic publishing operations. At local domestic publishing sites, home delivery circulation revenue was up in the quarter due, in part, to strategic pricing actions associated with enhanced content.”
You can’t blame Gannett for trying to put the best face on the umpteenth weak quarter in a row for its publishing operation. And you can sort of see, sort of, how a time-constrained journalist fell into the PR trap by seizing his lede from a fragment of the third sentence in the sixteenth paragraph of the press release. But a well-programmed computer could have done better.  
A half-decent, natural-language engine could have assimilated, organized and analyzed the facts and figures provided by the company in far less time than an ordinary human could read, much less unpack the meaning of, the document.
The robot would organize the data into normalized tables for instant publication and then drop the key information into templates designed to produce concise and understandable narratives. Knowing in advance the Wall Street consensus on a company's upcoming earnings, the robot could determine whether the company beat or failed to meet investor expectations. Templates would be pre-programmed with dictionaries that would know a drop in revenue from 2013 to 2014 was, depending on the degree of decline, a dip, a slip, a tumble or a plunge.
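The template logic described above can be sketched in a few lines of code. This is a hypothetical illustration of the approach, not any vendor's actual system; the verb thresholds and the consensus figure are invented for the example, while the revenue figures are the ones quoted from Gannett's release.

```python
# Hypothetical sketch of template-driven earnings-story generation.
# Thresholds and the consensus figure are illustrative assumptions.

def describe_change(pct: float) -> str:
    """Map a year-over-year percentage change to a verb from a pre-built dictionary."""
    if pct >= 0:
        return "rose"
    drop = abs(pct)
    if drop < 1:
        return "dipped"
    elif drop < 5:
        return "slipped"
    elif drop < 15:
        return "tumbled"
    return "plunged"

def earnings_lede(company: str, metric: str, current: float, prior: float,
                  consensus: float) -> str:
    """Fill a narrative template from normalized figures (in $ millions)."""
    pct = (current - prior) / prior * 100
    verdict = "beat" if current > consensus else "fell short of"
    return (f"{company} {metric} {describe_change(pct)} {abs(pct):.1f}% "
            f"to ${current:.1f} million, which {verdict} the Wall Street "
            f"consensus of ${consensus:.1f} million.")

# Using the circulation figures quoted in Gannett's release
# (the $280.0M consensus is invented for illustration):
print(earnings_lede("Gannett", "circulation revenue", 277.9, 279.7, 280.0))
# → Gannett circulation revenue dipped 0.6% to $277.9 million, which fell
#   short of the Wall Street consensus of $280.0 million.
```

Note that the robot, fed the actual tables, would never write "rises" over a 0.6 percent decline; the dictionary forces the verb to match the sign of the number.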
Because the variables in the realms of financial and sporting news are largely standardized and predictable, robots for the most part can be quicker and more accurate than humans, freeing time for journalists to dig deeper and more analytically into stories.  While it is highly unlikely that a robot would have caught the massive Enron accounting fraud, it took a long time before even one smart human, Bethany McLean, figured it out.
So, bring on robo-journalism.  Like chicken soup, it couldn’t hurt.  And it probably will help.


Blogger Ddddddffre said...

A half-decent, natural-language engine could have assimilated, organized and analyzed the facts and figures provided by the company in far less time than an ordinary human could read, much less unpack the meaning of, the document.

I worked with AI (neural nets, expert systems, NLP) for some time a while ago, and AFAIK the above statement is simply untrue.

There are companies that hype something like this, but their demos actually boil down to fortunate cases which their software algos happen to perform well in. There are also very narrow verticals where something vaguely resembling an arbitrary paragraph's semantic content can be approximated, sometimes, but not anything like what is claimed in the above. Nothing is even on the horizon that can perform even as well as a cub reporter.

AI products, as distinguished from research, are where stupid money goes to die. The worst offender is IBM, which has taken non-value-producing business activity to new heights. When it's not initiating software patent lawsuits, overcharging for failed projects, and then suing the hapless cities that couldn't Google and therefore hired it for said projects, IBM busies itself churning out dog-and-pony shows of AI-driven sleight of hand, such as its Jeopardy player and, before that, Deep Blue.

Yes, they both perform: Deep Blue by massive lookahead with a dash of chess strategy (let's see how it performs with human-level lookahead), and the Jeopardy player because of the unvarying and unusually clueful rules and norms of Jeopardy. The kind of thing to be returned, that is, the type of the correct answer, is stated up front ('Cats for 1000, please, Alex'), and the stereotyped phrasing of the questions ('This x was recommended as a main course for her subjects by Marie Antoinette') lets a 1960s-level parser assign correct SVO to the sentences; the rest is word association with zero understanding. Just Google the key words in the clues sometime and see how often the right answer lands in your lap.

I enjoy your blog and especially the technophilic take you have. In this instance, you're ahead of where we really are. Way. For instance, I know the C.I.A. has been keen on automatic newspaper reading with comprehension for some time, but I am pretty sure they still don't have it. It's all PageRanky statistical word co-occurrences and the like. Even the extraordinary achievements in translation are not semantically based, but just more of the same. That dog can do some tricks for sure, but it has never yet been shown to be much good at discovering the meaning of even simple paragraphs describing completely ordinary events.

Anyway, I do look forward to your posts in my inbox.

8:18 PM  
Blogger policywonk said...

Yeah, but a robo-writer wouldn't know to look in the accounting firm's notes in the 10K or 10Q for hints that there's something the company might not want to discuss, then ask the appropriate follow-up questions of the right source -- something that I've done on more than one occasion, to the consternation of a company's media relations department! Bring on the robo-writers: they're not gonna beat me to a scoop! I NEVER rely on the press release -- I always look at the 10K or 10Q instead and read the fine print. ;D

But what really concerns me is that reliance on robo-writers WILL miss just that kind of buried information, and then no one else will think to follow up. In fact, that's exactly what you can expect will happen, all in the name of so-called efficiency. Efficiency my ass. The important part about writing a quarterly earnings story is looking for what's buried or spun in the prose -- so no, I suppose I'm not willing to hand that over to a robo-writer ... but I would be willing to hand that to an assistant business editor who then taught all the newbie business writers how to spot stuff like that and then go digging for a different story.

This is what used to happen on the job, back in the day: seasoned reporters would pass on their own investigative techniques to younger greenhorns, starting with the supposedly routine stories. You can bury a lot in a 10K or 10Q or annual report if you really try, but a good reporter can still find it if he or she knows where to look. And they should be looking, every single time.

12:15 PM  
