I teach a course on data mining called Big Data Analytics. (See here for the course web site.) As I began to learn the culture and methods of data mining, clear differences from econometrics showed up. Since my students are well trained in standard econometrics, pointing out these distinctions helps guide them.
One important difference, at least where I teach, is that econometrics formulates statistical problems as hypothesis tests. Students do not learn other tools, so they have trouble recognizing problems where a hypothesis test is not the right approach. For example, consider distinguishing urban from non-urban areas in satellite images. This is a classification problem, and it cannot be solved well in a hypothesis-testing framework.
Another difference is less fundamental, but also important in practice: using out-of-sample methods to validate and test estimators is a religious practice in data mining, but is rarely taught in standard econometrics. (Again, I’m sure PhD courses at UCSD are an exception, but it is still rare to see economics papers that use out-of-sample tests.) Of course, in theory, econometric formulas give good error bounds on fitted equations (I still remember the matrix formulas that Jerry Hausman and others drilled into us in the first year of grad school). But the theory assumes that there are no omitted variables and no measurement errors! All real models have many omitted variables, doubly so since “omitted” variables include all nonlinear transforms of the included variables.
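To make the point about out-of-sample checks concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the simulated data, the 200/100 train/holdout split, and the helper names `fit_simple_ols` and `mse`. The "true" relationship depends on the square of x, so a regression on x alone omits a nonlinear transform of an included variable. Fitting both specifications on the training rows and scoring them on the holdout rows shows the gap directly.

```python
import random


def fit_simple_ols(xs, ys):
    """Closed-form OLS for y = intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(xs, ys, intercept, slope):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Simulated data (an assumption for illustration): y depends on x squared.
rng = random.Random(0)
x = [rng.uniform(-2.0, 2.0) for _ in range(300)]
y = [1.0 + 2.0 * xi ** 2 + rng.gauss(0.0, 0.5) for xi in x]

# Hold out the last 100 observations; estimate only on the first 200.
x_train, y_train = x[:200], y[:200]
x_test, y_test = x[200:], y[200:]

# Specification A: regress y on x (omits the nonlinear transform).
a_lin, b_lin = fit_simple_ols(x_train, y_train)
holdout_mse_linear = mse(x_test, y_test, a_lin, b_lin)

# Specification B: regress y on x squared.
x2_train = [v ** 2 for v in x_train]
x2_test = [v ** 2 for v in x_test]
a_quad, b_quad = fit_simple_ols(x2_train, y_train)
holdout_mse_quadratic = mse(x2_test, y_test, a_quad, b_quad)

print("holdout MSE, y on x:  ", round(holdout_mse_linear, 3))
print("holdout MSE, y on x^2:", round(holdout_mse_quadratic, 3))
```

The holdout comparison requires no assumption that either specification is correct, which is why it remains informative when the textbook error formulas are not.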
Here are two recent columns on other differences between economists’ and statisticians’ approaches to problem solving.
“I am not an econometrician,” by Rob Hyndman.