… on the Delusions of Big Data … Interview from IEEE Spectrum

Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts – IEEE Spectrum.

I agree 100% with the following discussion of big data learning methods, which is excerpted from an interview. Big Data is still in the ascending phase of the hype cycle, and its abilities are being way over-promised. In addition, there is a great shortage of expertise. Even people who take my course on the subject are only learning “enough to be dangerous.” It will take them months more of applied work to begin to develop reasonable instincts, and appropriate skepticism.

As we are now realizing, standard econometrics/regression analysis has many of the same problems, such as publication biases and excess re-use of data. And one can argue that its effects, e.g. in health care, have also been overblown to the point of being dangerous. (In particular, the randomized controlled trials approach to evaluating pharmaceuticals is much too optimistic about evaluating side effects. I’ve posted messages about this before.) The important difference is that now the popular press has adopted Big Data as its miracle du jour.

One result is excess credulity. On the NPR Marketplace program recently, they had a breathless story about The Weather Channel, and its ability to forecast amazing things using big data. The specific example was that certain weather conditions in Miami in January predict raspberry sales. What nonsense. How many Januaries of raspberry sales can they be basing that relationship on? 3? 10?
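To see why a handful of Januaries cannot support a claim like that, here is a minimal simulation sketch. Everything in it is made up for illustration: the "sales" and the candidate "weather variables" are pure random noise, and the names `pearson` and `best_spurious_correlation` are mine, not anything The Weather Channel uses. The idea is simply to search many unrelated variables for the one that correlates best with a short series, and see how strong the best accidental match looks.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def best_spurious_correlation(n_years, n_candidates, seed=0):
    """Largest |correlation| found between a noise 'sales' series and
    n_candidates independent noise 'weather' series of the same length.

    All series are unrelated by construction, so any correlation
    that turns up is an accident of searching.
    """
    rng = random.Random(seed)
    sales = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        weather = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(weather, sales)))
    return best


if __name__ == "__main__":
    # Hypothetical setup: 1000 candidate weather variables, varying history length.
    for n_years in (3, 10, 100):
        r = best_spurious_correlation(n_years, n_candidates=1000)
        print(f"{n_years:>3} years of data: best accidental |r| = {r:.2f}")
```

With only 3 or 10 observations, the best match among many candidates tends to look impressively strong even though nothing real is there; with 100 observations the best accidental match is much weaker. That is the sense in which the number of hypotheses can outrun the statistical strength of the data.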

Why Big Data Could Be a Big Fail [this is the headline that the interviewee objected to – see below]

Spectrum: If we could turn now to the subject of big data, a theme that runs through your remarks is that there is a certain fool’s gold element to our current obsession with it. For example, you’ve predicted that society is about to experience an epidemic of false positives coming out of big-data projects.

Michael Jordan: When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.

Spectrum: How so?

Continue reading

Ted for IRPS faculty: setting up your first Ted class

This page is for my colleagues who are using Ted for the first time in Fall 2014. The main difficulty you will encounter is that it is too flexible. There are many ways to do just about every operation. But they often look different to students. One result is that students don’t know where to find material. Another result (again, speaking from my experience last year) is that you will design a Ted setup that you want to change after a few weeks, potentially adding further confusion for students. This is just for IRPS faculty although other faculty new to Blackboard are welcome to look. Everyone else should ignore it.

Continue reading

Kindle books and academic research = needless pain

I’ve probably purchased 300 books in the last year for research purposes, not to mention all the fiction my wife gets (and so do I, if it costs $3 or less). For the newer ones, buying them as eBooks is generally an option. But the state of software, DRM, and copy protection for Kindle books is a mess. Kindle’s software (like iBooks) is deliberately crippled – no copying into another document, no printing, and especially no way to copy diagrams. I’m running Kindle’s software on my Mac and on an iPad, rather than using a Kindle tablet, but that barely helps.

Librarians against DRM

Continue reading

Google contact lens: it won’t work.

Time to debunk another widely covered press story about wonderful new inventions coming from a tech giant. Ars Technica had one of many articles about Google’s “announcement” of a blood glucose sensor in a contact lens. The discussion after the article is good, as often happens with Ars. Here’s my quick explanation of why the concept will fail. Unfortunately.

Non-invasive glucose testing is the perennial “pot of gold at the end of the rainbow.” Google is not the first to try using tears; the others have failed, and Google will too. They say it is “5 years away,” which is equivalent to saying “We have not yet tested it on real diabetics.”

The problem is basically that tears won’t track blood glucose levels closely. Tears are secreted by the lacrimal gland. I’ve never studied it, but the composition of its secretion is sure to depend on a multitude of variables. (Think: sweat, saliva, etc.) Even if a relationship exists and can be quantified  “on average,” there will be lags.

It’s possible that a device like this could supplement other measurement systems.  But nothing will be as good as actual blood measurements.  Therefore finger sticks will always be needed for calibration. The best realistic case is that a contact lens device could serve as an early warning; but finger sticks will still be needed for validation before taking any action.

via Google introduces smart contact lens project to measure glucose levels | Ars Technica.

NYT review of photo drone recommends illegal and unsafe behavior

This review really missed the boat on both law and safety issues for drones. Some of what it discussed is illegal (unfortunately – I think the present law against commercial use of UAVs is too strong). A lot of it is unsafe, or rather it will be unsafe in the hands of newbies who buy this expensive but very-easy-to-use piece of technology. Review – The Phantom 2 Vision Photo Drone From DJI – NYTimes.com.

If you have the $1200 for one of these undeniably cool machines, and the interest, the best approach is simple: buy one, and give it to me.  More seriously, here’s some good advice about learning to do photography with these.  It’s written for photographers who fundamentally are not interested in the flying part, and it’s not nearly “sufficient” for safety, but it gives a good idea of what you are in for.

Here are two videos of idiots flying these vehicles and having nasty crashes.  After the break: my two exchanges with the NY Times about the article.

Continue reading

Software, Design Defects Cripple Health-Care Website – WSJ.com

Software, Design Defects Cripple Health-Care Website – WSJ.com.

Poor software design is still common. I notice the developer was Experian, a private company. Outsourcing the web system for the Affordable Care Act was the right idea, but it looks like they picked a weak company.

It will be interesting to get a post-mortem in a year or two. I hope someone writes it up for the New Yorker. It should make a good case study on software product development.

System is down...
