… on the Delusions of Big Data … Interview from IEEE Spectrum

Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts – IEEE Spectrum.

I agree 100% with the following discussion of big data learning methods, which is excerpted from an interview. Big Data is still in the ascending phase of the hype cycle, and its abilities are being way over-promised. In addition, there is a great shortage of expertise. Even people who take my course on the subject are only learning “enough to be dangerous.” It will take them months more of applied work to begin to develop reasonable instincts, and appropriate skepticism.

As we are now realizing, standard econometrics/regression analysis has many of the same problems, such as publication biases and excess re-use of data. And one can argue that its effects, e.g. in health care, have also been overblown to the point of being dangerous. (In particular, the randomized controlled trials approach to evaluating pharmaceuticals is much too optimistic about detecting side effects. I’ve posted messages about this before.) The important difference is that now the popular press has adopted Big Data as its miracle du jour.

One result is excess credulity. On the NPR Marketplace program recently, they had a breathless story about The Weather Channel, and its ability to forecast amazing things using big data. The specific example was that certain weather conditions in Miami in January predict raspberry sales. What nonsense. How many Januaries of raspberry sales can they be basing that relationship on? 3? 10?
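To make the small-sample worry concrete, here is a minimal simulation sketch. It is not The Weather Channel's method, which I have not seen. It assumes 10 Januaries of "sales" that are pure random noise, and 1,000 candidate weather variables that are also pure noise (both numbers are my own illustrative choices), then reports the strongest correlation found by searching through all of them.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_spurious_correlation(n_years=10, n_candidates=1000, seed=0):
    """Largest |r| between noise 'sales' and a pile of noise 'predictors'.

    Every series is independent Gaussian noise, so any correlation found
    here is an artifact of searching many hypotheses on few observations.
    """
    rng = random.Random(seed)
    sales = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        predictor = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(predictor, sales)))
    return best


if __name__ == "__main__":
    r = best_spurious_correlation()
    print(f"strongest |r| among pure-noise predictors: {r:.2f}")
```

With so few years and so many candidates, the "best" predictor typically shows a strong-looking correlation even though nothing real is there. Raising `n_years` or lowering `n_candidates` shrinks the effect.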

Why Big Data Could Be a Big Fail [this is the headline that the interviewee objected to – see below]

Spectrum: If we could turn now to the subject of big data, a theme that runs through your remarks is that there is a certain fool’s gold element to our current obsession with it. For example, you’ve predicted that society is about to experience an epidemic of false positives coming out of big-data projects.

Michael Jordan: When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.

Spectrum: How so?


The Ph.D. Student’s Ticking Clock

The Ph.D. Student’s Ticking Clock – Graduate Students – The Chronicle of Higher Education.

Many of my former students come back years later and ask my advice about getting a PhD. I generally tell them that a PhD program is like a monastery – you have to love the pursuit of knowledge, for its own sake, to make it bearable. If you are doing it only in pursuit of a post-graduation goal, it is too hard a life.

This article includes a startling graph on time-to-graduation. I graduated from MIT in 1982, after 4 years. According to the graph, the average time in social sciences then was 8 years?! I had a lot of breaks (NSF Fellowship, stipend from one of my thesis advisors, pregnant wife to provide emotional support and incentive!) but 4 to 5 years seemed like the norm in my program.

In any case, the second half of the article has some realistic advice about the stresses of protracted graduate programs, and about the importance of your particular advisor’s style.

Joe Stiglitz disses TPP treaty: it’s for corporations, not people

Joe Stiglitz’s critique of the Trans-Pacific Partnership (TPP) treaty:

 Corporations on both sides of the Pacific have an interest at lowering regulatory standards—to protect the environment, to protect consumers, to protect workers, to protect health. But ordinary citizens, our society, will suffer. So you can get corporations on both sides pushing an agenda that will be increasing corporate profits at the cost of the well-being of people on both sides of the Pacific.

…Philip Morris is suing Uruguay under an investment agreement. It says, “This interferes with our basic right to sell products to kill people.” It’s like the Opium War 150 years ago, where the West went to war because China said, “We don’t want opium,” and we said, “That interferes with the basic right to trade.”

Web Special: Joseph Stiglitz on TPP, Cracking Down on Corporate Tax Dodgers & New BRICS Bank

More analysis of corporate capture of the TPP treaty,



What happens if California’s solar panels start to fail over the next few years?

Solar Industry Anxious Over Defective Panels – NYTimes.com.

I had not realized that solar panel quality was becoming such an acute issue “so soon.” Judging by this NYT article, many Chinese-branded PV panels are not reliable. This article sounds straight out of the book that Barry Naughton pointed me to, Poorly Made in China. The performance degradation data on well-made panels are pretty encouraging: 0.5% per year is typical, but the key is “well made.” There are many manufacturing shortcuts and quality problems that will lead to failure of electrical connections after a few thousand temperature cycles, for example. (Think night/day in Colorado!)

testing solar panels

Power inverters, which are straight power semiconductor products, apparently may also be unreliable. https://www.greentechmedia.com/articles/read/3-Reasons-Why-Chinese-Solar-Inverters-Cost-Half-of-American-Inverters

It will be interesting to see what this problem with Chinese panels leads to in trade/market share. California and other states that subsidize PV should only pay for systems that pass good certification – for both performance and safety. For obvious reasons, testing long-lifetime behavior of electronics is very tricky. I wonder if we will see a repeat of the “solar water heating” fiasco of the 1980s, when lots of houses put pool heaters on their roof that started to leak and ended up getting ripped out. When the economics of a project are based on a 20-year life, and it only lasts 5 years, that is a colossal fail. If it catches on fire, as described in the NYT article, that is another situation entirely! What is the typical guarantee for homeowners in California?
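A back-of-the-envelope sketch of why early failure matters so much more than normal degradation. It assumes the 0.5% per year figure compounds annually and measures energy in "first-year equivalents"; the function names are my own, and real systems will have other losses that this ignores.

```python
def remaining_output(years, annual_degradation=0.005):
    """Fraction of original output left after `years`, compounding yearly."""
    return (1 - annual_degradation) ** years


def lifetime_energy(years, annual_degradation=0.005):
    """Cumulative output over `years` of service, in first-year equivalents.

    Year 0 counts as 1.0; each later year is reduced by the degradation rate.
    """
    return sum(remaining_output(t, annual_degradation) for t in range(years))


if __name__ == "__main__":
    planned = lifetime_energy(20)   # the 20-year life the economics assume
    actual = lifetime_energy(5)     # a panel that fails after 5 years
    print(f"output remaining at year 20: {remaining_output(20):.1%}")
    print(f"energy delivered vs. plan:   {actual / planned:.1%}")
```

Under these assumptions a well-made panel still produces roughly 90% of its original output at year 20, while a panel that dies at year 5 delivers only about a quarter of the energy the project was priced on.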

Technology’s Real Benefits- NOT so much in cancer research

The first example is cancer research. … The genomic approach helps establish the right treatments today, and will likely lead to new and better drugs in the next few years. …. “this is something that will be useful 200 years from now. This is a landmark that will stand the test of time.”

via Technology’s Real Benefits (Hint: They’re Not Economic).

Sorry, Andy, we have been getting hype about contributions of computers to biotech, and biotech to cancer, for 20+ years. It’s past time to be highly skeptical that medical breakthroughs are “around the corner… just give us another $X billion for research…” Although the research results have been fascinating, the practical impacts have been modest. I think one reason is that the Big Pharma/Big Academia model of R&D is inefficient and ineffective. Everyone hoards their data, and pursues their own stovepipe. There’s little collaboration or interchange among computer modelers, in-vitro researchers, animal modelers, epidemiologists, etc. This is not something that better technology can solve – it’s a problem with business incentives and the academic promotion system.

Case in point: According to a friend, there have been no Randomized Clinical Trials on the relationship between crystalline salt and kidney disease. Everyone assumes there is a relationship, but what is the exact causal link? What’s the magnitude? What are the mediators of the effect (e.g. different diets, different climates)? And what effects do interventions at different points (diet versus medications) have? This is not cancer research, but the same principles hold.

Other benefits of technology: sure. Cultural and scientific and business. Mapping Inca ruins: awesome. Effect of Facebook on daily lives: large, and not captured in GDP statistics. So your basic thesis is good; just don’t use medical promises as cases in point!

Invented by a data scientist: the first anti-scam – AnalyticBridge

Invented by a data scientist: the first anti-scam – AnalyticBridge.

An interesting concept: create a lottery which is really a disguised form of savings. That’s not quite what this proposal does, but it could be modified very easily.

From what I read, accumulating savings is a big problem for many poor people. Some nonetheless play the lottery. Create a lottery-squared, which takes in ticket payments from participants, accumulates most of the money in an account for the payer, and puts a fraction into a true lottery. Then the ticket-buyer can “win” a small amount according to some rule. The rule may be hidden from them, as in the original proposal, or could be partially under their control.
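Here is a toy sketch of how such a lottery-squared might keep its books. Everything specific in it is an assumption for illustration: the class name, the 90/10 split between savings and prize pool, and the rule that the winner is drawn in proportion to accumulated savings. It is not the rule from the original proposal.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SavingsLottery:
    """Hypothetical 'lottery-squared': most of each ticket is saved, the rest pooled."""

    savings_share: float = 0.9                    # assumed split, for illustration only
    accounts: dict = field(default_factory=dict)  # player -> accumulated savings
    prize_pool: float = 0.0

    def buy_ticket(self, player, price):
        """Credit most of the ticket price to the buyer; pool the remainder."""
        saved = price * self.savings_share
        self.accounts[player] = self.accounts.get(player, 0.0) + saved
        self.prize_pool += price - saved

    def draw(self, rng):
        """Pay the whole pool to one player, chosen in proportion to savings."""
        players = list(self.accounts)
        weights = [self.accounts[p] for p in players]
        winner = rng.choices(players, weights=weights, k=1)[0]
        self.accounts[winner] += self.prize_pool
        self.prize_pool = 0.0
        return winner


if __name__ == "__main__":
    lottery = SavingsLottery()
    for week in range(4):
        lottery.buy_ticket("alice", 5.0)
        lottery.buy_ticket("bob", 5.0)
    print("winner:", lottery.draw(random.Random(1)))
    print("balances:", lottery.accounts)
```

The point of the sketch is that no money leaves the system: every dollar paid in ends up in some participant's account, so the "lottery" only redistributes a small fraction among savers.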

Lots of legal problems with this, to say the least. The middleman is acting like a bank, with all the issues that brings. The record-keeping and security could be a problem. And so forth. This is more of a problem in some countries than others.

By the way, this is similar to what “Christmas club accounts” in banks did in the 1950s, apparently. Customers would put $5 into the account each week, and get it all back in December.

Classic definition of recession no longer useful

We now need 2 separate concepts of recession.

Graph: Time for employment to return to pre-recession

“Total GDP recession” is the conventional one. “Actual people recession” is the one that matters to almost everyone. The rationale is that all of the “growth” after the 2001 recession, and after this one as well, has gone to the upper few percent of the income distribution or to corporate profits. Median income, for example, has been roughly stagnant or declining for a decade. Much more important to “the 99%” is what happens to their share of national income, which is much slower to recover. (If you adjust for increasing health care costs, then median income is doing even worse, but that’s another story.)

(This interpretation is my own. I got the graph link from Andy McAfee