… on the Delusions of Big Data … Interview from IEEE Spectrum

Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts – IEEE Spectrum.

I agree 100% with the following discussion of big data learning methods, which is excerpted from an interview. Big Data is still in the ascending phase of the hype cycle, and its abilities are being way over-promised. In addition, there is a great shortage of expertise. Even people who take my course on the subject are only learning “enough to be dangerous.” It will take them months more of applied work to begin to develop reasonable instincts, and appropriate skepticism.

As we are now realizing, standard econometrics/regression analysis has many of the same problems, such as publication biases and excess re-use of data. And one can argue that its effects, e.g. in health care, have also been overblown to the point of being dangerous. (In particular, the randomized controlled trials approach to evaluating pharmaceuticals is much too optimistic about evaluating side effects. I’ve posted messages about this before.) The important difference is that now the popular press has adopted Big Data as its miracle du jour.

One result is excess credulity. On the NPR Marketplace program recently, they had a breathless story about The Weather Channel, and its ability to forecast amazing things using big data. The specific example was that certain weather conditions in Miami in January predict raspberry sales. What nonsense. How many Januaries of raspberry sales can they be basing that relationship on? 3? 10?
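To make the worry concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not The Weather Channel's data or method: 10 Januaries of sales, 1,000 candidate weather features, and all of it pure random noise. The threshold 0.632 is the standard two-sided 5% critical value of the Pearson correlation for 10 observations. The function names `pearson` and `count_spurious` are made up for this example.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_years=10, n_features=1000, r_crit=0.632, seed=0):
    """Count noise features that look 'significantly' correlated with noise sales.

    All numbers are illustrative assumptions. Sales and features are
    independent random draws, so every hit is a false positive.
    """
    rng = random.Random(seed)
    sales = [rng.gauss(0, 1) for _ in range(n_years)]  # fake January sales
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_years)]  # fake weather variable
        if abs(pearson(feature, sales)) > r_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 1000 pure-noise features pass the 5% significance test")
```

With only 10 observations, roughly 5% of the noise features should clear the bar by chance alone, so the more weather variables one tries, the more "predictors" of raspberry sales one will find.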

Why Big Data Could Be a Big Fail [this is the headline that the interviewee objected to - see below]

Spectrum: If we could turn now to the subject of big data, a theme that runs through your remarks is that there is a certain fool’s gold element to our current obsession with it. For example, you’ve predicted that society is about to experience an epidemic of false positives coming out of big-data projects.

Michael Jordan: When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.

Spectrum: How so?

Continue reading

The Ph.D. Student’s Ticking Clock

The Ph.D. Student’s Ticking Clock – Graduate Students – The Chronicle of Higher Education.

Many of my former students come back years later and ask my advice about getting a PhD. I generally tell them that a PhD program is like a monastery – you have to love the pursuit of knowledge, for its own sake, to make it bearable. If you are doing it only in pursuit of a post-graduation goal, it is too hard a life.

This article includes a startling graph on time-to-graduation. I graduated from MIT in 1982, after 4 years. According to the graph, the average time in social sciences then was 8 years?! I had a lot of breaks (NSF Fellowship, stipend from one of my thesis advisors, pregnant wife to provide emotional support and incentive!) but 4 to 5 years seemed like the norm in my program.

In any case, the second half of the article has some realistic advice about the stresses of protracted graduate programs, and about the importance of your particular advisor’s style.

TOM IRGN438 course information

This post is for students who want to take my course, Technology and Operations Management, IRGN438, but have not been able to register. Here is the syllabus. Take a careful look, and realize that it involves a considerable amount of work. If you want permission to take the course, please send me an email with:

Continue reading

Ted for IRPS faculty: setting up your first Ted class

This page is for my colleagues who are using Ted for the first time in Fall 2014. The main difficulty you will encounter is that it is too flexible. There are many ways to do just about every operation, but they often look different to students. One result is that students don’t know where to find material. Another result (again, speaking from my experience last year) is that you will design a Ted setup that you want to change after a few weeks, potentially adding further confusion for students. This is just for IRPS faculty, although other faculty new to Blackboard are welcome to look. Everyone else should ignore it.

Continue reading

Kindle books and academic research = needless pain

I’ve probably purchased 300 books in the last year for research purposes, not to mention all the fiction my wife gets (and so do I, if it costs $3 or less). For the newer ones, buying them as eBooks is generally an option. But the state of software, DRM, and copy protection for Kindle books is a mess. Kindle’s software (like iBooks) is deliberately crippled – no copying into another document, no printing, and especially no way to copy diagrams. I’m running Kindle’s software on my Mac and on an iPad, rather than using a Kindle tablet, but that barely helps.

Librarians against DRM

Continue reading

Joe Stiglitz disses TPP treaty: it’s for corporations, not people

Joe Stiglitz’s critique of the Trans-Pacific Partnership (TPP) treaty:

 Corporations on both sides of the Pacific have an interest at lowering regulatory standards—to protect the environment, to protect consumers, to protect workers, to protect health. But ordinary citizens, our society, will suffer. So you can get corporations on both sides pushing an agenda that will be increasing corporate profits at the cost of the well-being of people on both sides of the Pacific.

…Philip Morris is suing Uruguay under an investment agreement. It says, “This interferes with our basic right to sell products to kill people.” It’s like the Opium War 150 years ago, where the West went to war because China said, “We don’t want opium,” and we said, “That interferes with the basic right to trade.”

Web Special: Joseph Stiglitz on TPP, Cracking Down on Corporate Tax Dodgers & New BRICS Bank

More analysis of corporate capture of the TPP treaty.

Comparing OCR program compression: PDFpen, Acrobat, and Abbyy Finereader

I have been doing a lot of OCR, as I study more than 100 old aircraft manuals to see how aviation procedures evolved. I have them all in a database, and it’s useful to search the DB for key terms like V1 and density altitude. In the end, no single OCR program did everything, and I have ended up with 3. (OCR = Optical Character Recognition = takes scanned documents and makes them searchable, copyable, etc.) Here are some notes on my experience, with the goal of saving time for others in the future.

Continue reading