Should we teach data mining without using a programming language?

Should data mining newcomers have to learn programming at the same time? Here is a contrarian view that advocates a GUI (“drag and drop”) environment, even though the popularity of R (and, more recently, Python) keeps increasing.

I have certainly considered this question in the Big Data course I teach. All my own coding is in R, but I don’t have the time to teach “programming,” nor do I see enough value in it. Over time, as better tools have come out of the R community, I have found ways to teach only a minimal amount of R. I start my students off with a menu-driven system called Rattle. It handles data importing, data manipulation, descriptive statistics, and more, and it has modules for a number of standard mining algorithms. Finally, it generates R code as it goes, so students can edit the methods when they need something a little different.

I have considered taking the next step and using an IBM or Microsoft web-based platform for mining. But there is little or no teaching material for students using these platforms, which is a deterrent. By contrast, there are at least five good books on data mining in R.

So for now, I’m happy with using a minimal subset of R for teaching. But I continue to look at alternatives.

Source: Why R is Bad for You – Data Science Central

Additional discussion of environments for data mining.

And for further alternatives, here is a new cheat sheet on data mining using Stata (which all economists at UCSD use, and which most of my students therefore learn).
