How useful is data mining without human judgment?

A recent exchange on Mathbabe’s blog about the meaning of Big Data led me to some insights about where decisions need human judgment and analysis, and where we can turn decisions over to automated data mining. For example, serving up “you might also like X” in a web store will work a lot better than estimating how many people have flu. Why?

Here’s what I wrote. (Not clear if her WordPress interface picked it up.)

Cathy, big data in your sense does not work widely. If you say that “no human judgment is needed,” this is approximately equivalent to “the relationships do not need to be supported by causal theory, just by raw correlation.” This works great in certain domains. But the underlying correlations have to be changing relatively slowly, compared to the amount of data that is available. With enough data for “this month,” an empirical relationship which holds for multiple months can be data mined (discovered) and used to make decisions, without human judgment.

But many of the world’s important problems don’t have that much stability. For example, trying to use searches to track the spread of an annual flu at the state-by-state level won’t be very reliable without human judgment. The correlation between search terms and flu incidence in 2012 is not likely to be the same in 2013. One reason is that news cycles vary from year to year, so in some years people are more frightened of the flu than in other years, and do more searches. Consider the following experiment: use the “big data relationships” from 2010 to track the incidence of flu in 2014. It won’t work very well, will it?

On the other hand, if you could get accurate weekly data about flu incidence, the same methods might work much better. Using the correlations between search terms and flu in November might give reasonably accurate estimates in December.
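
Here is a toy sketch of that experiment in Python. Everything in it is an assumption made for illustration: the flu counts are invented, the "searches per case" factors (standing in for how frightened people are in a given period) are invented, and the model is just a one-parameter least-squares fit. It is not real flu or search data, and not anyone's actual method. It only shows why a relationship learned in 2010 can fail in 2014 while one learned last month still holds up.

```python
# Toy illustration only: every number here is made up, not real data.

def fit_slope(searches, flu):
    """Least-squares slope through the origin, so that flu ~ slope * searches."""
    numerator = sum(s * f for s, f in zip(searches, flu))
    denominator = sum(s * s for s in searches)
    return numerator / denominator

def mean_abs_pct_error(slope, searches, flu):
    """Average of |predicted - actual| / actual over all weeks."""
    errors = [abs(slope * s - f) / f for s, f in zip(searches, flu)]
    return sum(errors) / len(errors)

def simulate_searches(flu, searches_per_case):
    """Hypothetical search volume: flu cases times an 'attention' factor."""
    return [f * searches_per_case for f in flu]

# Hypothetical weekly flu counts.
flu_2010 = [100, 200, 400, 300, 150]
flu_nov_2014 = [120, 180, 260, 310]
flu_dec_2014 = [350, 420, 390, 280]

# Assumed attention factors: a quiet news year in 2010, a frightened
# public in late 2014, drifting only slightly from November to December.
searches_2010 = simulate_searches(flu_2010, 2.0)
searches_nov = simulate_searches(flu_nov_2014, 5.0)
searches_dec = simulate_searches(flu_dec_2014, 5.5)

stale_slope = fit_slope(searches_2010, flu_2010)     # learned four years ago
fresh_slope = fit_slope(searches_nov, flu_nov_2014)  # learned last month

stale_error = mean_abs_pct_error(stale_slope, searches_dec, flu_dec_2014)
fresh_error = mean_abs_pct_error(fresh_slope, searches_dec, flu_dec_2014)

print(f"December error using the 2010 relationship:     {stale_error:.0%}")
print(f"December error using the November relationship: {fresh_error:.0%}")
```

With these invented numbers the 2010 relationship overestimates December flu by well over 100 percent, while the November relationship is off by about 10 percent. The gap is built into the assumptions, of course. The point is only that the error tracks how far the underlying relationship has drifted since the data was collected.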

Automated systems based on data mining are a form of closed-loop decision systems. (Closed loop basically means “no human in the loop.”) Closed-loop feedback works great under certain conditions, and very poorly under others. A key difference is whether the system designer has sufficient (accurate) knowledge about the system’s true behavior.

Once again “it all comes back to knowledge.”

NOT FLYING BY THE BOOK: SLOW ADOPTION OF CHECKLISTS AND PROCEDURES IN WW2 AVIATION.

This is the “entry page” for my paper on the slow adoption of better flying methods in WW2. Please link to this page, rather than to the actual PDF, which I will be updating. Here is the paper itself. (July 19 version)

In the late 1930s, military aviators in the American Army and Navy began using aviation checklists. Checklists became part of a new paradigm for how to fly, which I call Standard Procedure Flying, colloquially known as “flying by the book.” It consisted of elaborate standardized procedures for many activities, checklists to ensure the key steps had been done, and quantitative tables and formulas that specified the best settings, under different conditions, for speed, engine RPM, gasoline/air mixture, engine cooling, and many other parameters. This new paradigm had a major influence on reducing aviation accidents and increasing military effectiveness during World War II, particularly because of the rapidly increasing complexity of military aircraft, and the huge number of new pilots.

Changing flying from a craft to a science: what went right, and what went wrong, in World War II

I have just finished a working paper called NOT FLYING BY THE BOOK: SLOW ADOPTION OF CHECKLISTS AND PROCEDURES IN WW2 AVIATION. It tells how, in 1937, shortly before World War 2, the American air forces invented a much better way to train new pilots, and to fly complex aircraft and missions. What they invented is now used all over the world, by all licensed pilots and military aviators. But during the war, even American pilots resisted switching to the new way of flying. The only full-speed adopters were the strategic bombing forces attacking Germany and Japan. The US Navy, despite being one of the 1937 inventors, did not fully make the switch until after 1960!

Precise flying was a matter of life or death.


POMS talk: Aviation 1940 = Medicine 2005


B-17 Throttles (Photo credit: rkbentley)

On Sunday I gave a capstone talk at the Production and Operations Management Society (POMS) meeting in Denver. I oriented my talk toward a comparison of health care now with aviation’s transition to Standard Procedure Flying in the 1940s and 50s. BOHN POMS Standard procedure flying 2013e

As in medicine now, experienced expert flyers who did not use standard procedures were still better than newly trained pilots who did. And there was resistance to the changes. But aviation had a couple of advantages in making the transition: new pilots who did not learn SPF died quickly, usually in accidents. And the old experts got rotated out of combat positions (United States Army Air Forces), or eventually got shot down no matter how good they were (Germany).


Technology’s Real Benefits: NOT so much in cancer research

The first example is cancer research. … The genomic approach helps establish the right treatments today, and will likely lead to new and better drugs in the next few years. … “this is something that will be useful 200 years from now. This is a landmark that will stand the test of time.”

via Technology’s Real Benefits (Hint: They’re Not Economic).

Sorry, Andy, we have been getting hype about contributions of computers to biotech, and biotech to cancer, for 20+ years. It’s past time to be highly skeptical that medical breakthroughs are “around the corner… just give us another $X billion for research…” Although the research results have been fascinating, the practical impacts have been modest. I think one reason is that the Big Pharma/Big Academia model of R&D is inefficient and ineffective. Everyone hoards their data, and pursues their own stovepipe. There’s little collaboration or interchange among computer modelers, in-vitro researchers, animal modelers, epidemiologists, etc. This is not something that better technology can solve – it’s a problem with business incentives and the academic promotion system.

Case in point: According to a friend, there have been no randomized clinical trials on the relationship between crystalline salt and kidney disease. Everyone assumes there is a relationship, but what is the exact causal link? What’s the magnitude? What are the mediators of the effect (e.g. different diets, different climates)? And what effects do interventions at different points (diet versus medications) have? This is not cancer research, but the same principles hold.

Other benefits of technology: sure. Cultural and scientific and business. Mapping Inca ruins: awesome. Effect of Facebook on daily lives: large, and not captured in GDP statistics. So your basic thesis is good; just don’t use medical promises as cases in point!

Are You Really Drowning in Data?

This blog post challenges the “drowning in big data” cliche. The author explains that most organizations don’t have useful access to most of their raw data – it sits somewhere in the IT department, but it’s not accessible, it has quality problems, and so forth.

But I think that is precisely where the “drowning” comes in. The psychological weight of all that unused data presses down and causes a sensation of “drowning.” The part of the data that is actually indexed, described, readily accessible and so forth is the data that we surf instead of drown under.

This applies on a personal level as well…. I drown under the weight of my “to read” pile; I surf the few things I actually sit and study.

Are You Really Drowning in Data? Challenging the Big Data Assumption – FICO Labs Blog.

A publisher who only supports Internet Explorer!

I encouraged my TOM students to check out the Investext database for their term projects. Imagine my surprise to learn that Thomson, the publisher, thinks that the world is Windows only. Here is a note from the UCSD librarian.

Hi Roger. On the new Thomson One interface/platform, Investext really only works with IE.
Other browsers may not load at all, or have functionality/displays that are hobbled.

P.S. Here is what MIT library says: “Microsoft Internet Explorer is required. Thomson One will not work with other browsers such as Firefox, Chrome, and Safari.”

We have to “love” the academic publishing industry. Still trying to saddle everything they provide with DRM. UC San Diego and the whole UC system are moving toward an open publishing approach – I expect most of my colleagues will adopt it, although only slowly.