What is under the Internet of Things?

MEMS (Micro-Electro-Mechanical Systems) is a comparatively new and little-known class of semiconductor chips. They are physical devices, mostly sensors, built with standard semiconductor technologies so that they are small and cheap. Amazing new sensors are opening up all kinds of low-cost measurements, and when/if the IoT world materializes, MEMS sensors will be ubiquitous.

Detail of a MEMS chip from Analog Devices. Width of image is about 0.5 mm.

An early application was accelerometers to measure what happens during an auto crash. Precisely calculating when to set off the airbags, and with how much force, substantially reduced the incidental injuries caused by airbags expanding at 100 mph or faster. Another early application was the Wii’s wand. Now they are common in many products such as phones and toys. They also play key roles in “lab on a chip” technologies. In the future, household appliances may include MEMS microphones, vibration sensors, chemical sensors, and many others.

So MEMS is one of the multitude of “important but invisible” technologies that make the world work. As an aside, too many of my students look for jobs only with companies they have heard of, i.e., they completely miss industries like MEMS.

Many years ago I helped Analog Devices with some manufacturing problems. ADI was a MEMS pioneer, so of course it had many new problems to deal with. Their fab (plant) was in Cambridge, MA, right next to MIT! Now the technology is more widely diffused, so the industry is more competitive and apparently not very profitable. This article has a short discussion of price pressure and product directions: Semiconductor Engineering, “The Trouble With MEMS.”

A short description of the technology itself is at “MEMS Motion Sensors: The Technology Behind the Technology.”

Econometrics versus statistical analysis

I teach a course on Data Mining, called Big Data Analytics. (See here for the course web site.) As I began to learn its culture and methods, clear differences from econometrics showed up. Since my students are well trained in standard econometrics, the distinctions are important for guiding them.

One important difference, at least where I teach, is that econometrics formulates statistical problems as hypothesis tests. Students do not learn other tools, and therefore they have trouble recognizing problems where hypothesis tests are not the right approach. Example: given satellite images, distinguish urban from non-urban areas. That is a classification problem, and it cannot be solved well in a hypothesis-testing framework.
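To make the contrast concrete, here is a minimal sketch in R (the language we use in the course) on made-up data; the feature names and numbers are invented for illustration, not taken from any real satellite dataset. A classifier assigns a label to every image block, which is the question we actually need answered, while a hypothesis test only tells us whether a coefficient is distinguishable from zero.

    # Minimal sketch with simulated data: urban vs. non-urban as classification.
    set.seed(1)
    n <- 500
    brightness <- runif(n)                # hypothetical image feature 1
    texture    <- runif(n)                # hypothetical image feature 2
    # Simulated labels: urban blocks tend to be brighter and less textured
    urban <- rbinom(n, 1, plogis(4 * brightness - 3 * texture - 0.5))
    d <- data.frame(urban, brightness, texture)

    # The classification question: predict a label for every block
    fit  <- glm(urban ~ brightness + texture, data = d, family = binomial)
    pred <- as.integer(predict(fit, type = "response") > 0.5)
    mean(pred == d$urban)                 # accuracy: the quantity we actually care about

    # A hypothesis test answers a different question entirely:
    # "is the brightness coefficient zero?", not "which blocks are urban?"
    summary(fit)$coefficients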

Another difference is less fundamental, but also important in practice: using out-of-sample methods to validate and test estimators is a religious practice in data mining, but is almost never taught in standard econometrics. (Again, I’m sure PhD courses at UCSD are an exception, but it is still rare to see economics papers that use out-of-sample tests.) Of course, in theory econometric formulas give good error bounds on fitted equations (I still remember the matrix formulas that Jerry Hausman and others drilled into us in the first year of grad school). But the theory assumes that there are no omitted variables and no measurement errors! Of course all real models have many omitted variables. Doubly so, since “omitted variables” include all nonlinear transforms of included variables.
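To show what the out-of-sample discipline buys, here is a minimal sketch in R on simulated data (nothing here comes from a real study): a deliberately over-flexible regression looks good on the data it was fitted to and worse on a held-out sample, a gap the in-sample theory alone would not reveal.

    # Minimal sketch on simulated data: why data miners insist on out-of-sample tests.
    set.seed(2)
    n  <- 200
    x  <- runif(n, -2, 2)
    y  <- 1 + x + 0.8 * x^2 + rnorm(n)    # "true" relationship, unknown to the analyst
    df <- data.frame(x, y)

    train <- sample(n, n / 2)             # simple 50/50 train/test split
    fit   <- lm(y ~ poly(x, 10), data = df[train, ])  # deliberately over-flexible fit

    rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
    rmse(df$y[train],  predict(fit))                          # in-sample error (flattering)
    rmse(df$y[-train], predict(fit, newdata = df[-train, ]))  # out-of-sample error (honest)

The gap here is modest because the example is tame; with more flexible learners (trees, boosting, and so on) the in-sample number can be wildly optimistic, which is why the train/test split is treated as non-negotiable.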

Here are two recent columns on other differences between economists’ and statisticians’ approaches to problem solving.

“I am not an econometrician,” by Rob Hyndman.

and

“Differences between econometrics and statistics: From varying treatment effects to utilities, economists seem to like models that are fixed in stone, while statisticians tend to be more comfortable with variation,” by Andrew Gelman.

Good data mining reference books

The students in my Big Data Analytics course asked for a list of books on the subject that they should have in their library. UCSD has an excellent library, including digital versions of many technical books, so my list consists entirely of books that can be downloaded on our campus. Many are from Springer. There are several other books that I have purchased, generally from O’Reilly, that are not listed here because they are not available on campus.

These are intended as reference books for people who have taken one course in R and data mining. Some of them are “cookbooks” for R. Others discuss various machine learning techniques. The list is here: BDA16 reference book suggestions.

If you have other suggestions, please add them in the comments with a brief description of what is covered.

Who Will Debunk The Debunkers? Reality and myth in the history of science

Separating historical truth from myth is as hard in science as anywhere else. This article has several examples, including whether Darwin got his ideas from someone else, and a dispute about whether Semmelweis was really ignored after his discovery of the link between hand-washing and disease.

Semmelweis teaches doctors to wash their hands, c. 1850 – it is still an issue today

The Hamblin article [about a supposed misplaced decimal point], unscholarly and unsourced, would become the ultimate authority for all the citations that followed. (Hamblin graciously acknowledged his mistake after Sutton published his research, as did Arbesman.)

In 2014, a Norwegian anthropologist named Ole Bjorn Rekdal published an examination of how the decimal-point myth had propagated through the academic literature. He found that bad citations were the vector. Instead of looking for its source, those who told the story merely plagiarized a solid-sounding reference: “(Hamblin, BMJ, 1981).” Or they cited someone in between — someone who, in turn, had cited Hamblin. This loose behavior, Rekdal wrote, made the transposed decimal point into something like an “academic urban legend,” its nested sourcing more or less equivalent to the familiar “friend of a friend” of schoolyard mythology. Source: Who Will Debunk The Debunkers? | FiveThirtyEight

I found a similar myth about aviation checklists. It’s a myth that they were invented because of the crash of a B-17 bomber prototype in 1935. The first B-17 checklist was in 1937, and by then many Navy aircraft had more complete checklists, including one published before the 1935 crash.

As far as I could tell when I researched this, the B-17 checklist story was first told in a 1965 book by Edward Jablonski. Since then the myth has been passed from article to article to book, such as Atul Gawande’s generally excellent The Checklist Manifesto. The crash did happen, but checklists were invented independently of it.

Art to science: Dating sites take another step toward science. 

Many years ago I wrote a popular (for an academic) article, “Measuring and Managing Technological Knowledge.” The basic idea is that some concepts are well understood, many others are not, and over time the tendency is to move from poorly understood (crafts) to well understood (science). Anyway, in class I used the example of romance to show that this model is very general. “When you were 14, you had absolutely no idea how to impress a girl. When you were 20, you at least knew what the key variables were, even though you didn’t know how to make them happen reliably.” Etc. (Another example is the increasingly scientific business of prostitution – but I won’t tell that one here, and I doubt I had the courage to tell it in class.)

Death by GPS | Ars Technica

Why do we follow digital maps into dodgy places? Something is happening to us. Anyone who has driven a car through an unfamiliar place can attest to how easy it is to let GPS do all the work. We have come to depend on GPS, a technology that, in theory, makes it impossible to get lost. Not only are we still getting lost, we may actually be losing a part of ourselves. Source: Death by GPS | Ars Technica

As usual, aviation is way “ahead.” Use of automated navigation reduces pilots’ navigation skills; automated flight reduces hand-flying skills. Commercial aviation is starting to grapple with this, but there is no easy solution.

Art to science in moderating internet content

This article describes the efforts of Facebook, YouTube, and similar hosts of user-generated content to screen unacceptable material (both speech and images). It’s apparently a grim task, because of the depravity of some material. For the first decade, moderation methods were heavily ad hoc, but they gradually grew more complex and formalized in response to questions such as when to allow violent images as news. In aviation terms, it was at Stage 2: Rules + Instruments. Now, some companies are developing Stage 3 (standard procedures) and Stage 4 (automated) methods.
