Web site: Data mining with R for MBA level students.

I just completed teaching a 10-week course on data mining for MS-level professional degree students. Most of the material is on a web site (https://irgn452.wordpress.com/chron/). The course assumes good knowledge of OLS regression, but other than that is self-contained.
Software is R, with a heavy dose of Rattle for the first few weeks. (Rattle is a front end for R.) The main algorithms I emphasize are Random Forests and LASSO, for both classification and regression. I emphasize creating new variables that correspond to the physical/economic characteristics of the problem under study. The course requires a major project; some students scrape or mash their own data. Because we have only 10 weeks, I provide a timetable and a lot of milestones for the projects, and frequent one-on-one meetings.
The web site is not designed for public consumption, and is at best in “early beta” status. I am making it available in case anyone wants to mine it for problem sets, discussions of applied issues not covered in most books, etc. Essentially, it is a crude draft of a text for MBAs on data mining using R. This was about the fifth time I taught the course.

By the way, a lot of the lecture notes are modestly modified versions of the excellent lecture material from Matt Taddy. His emphasis is more theoretical than my course, but his explanations and diagrams are great. Readings were generally short sections from either ISLR by James et al. or Data Mining with Rattle and R. Both are available as ebooks at many universities. My TA was Hyeonsu Kang.

 

The Tesla Dividend: Better Internet Access — Interesting but Wrong

Elon Musk’s newest car doesn’t just run on electricity — it needs a world class fiber network. Source: The Tesla Dividend: Better Internet Access — Backchannel — Medium

This is an interesting attempt to give still more importance to Tesla and very smart cars. “Tesla cars generate about 1 Gigabyte per minute of [raw] data.”

But the argument is wrong. They generate plenty of data internally – so do today’s other advanced cars with their 100+ processors. But that data is thrown away as fast as it is created. It’s part of what I called “dark data” in my report on Measuring Information. Neither Tesla nor anyone else needs the massive detail. Even for deep learning, only a few seconds are going to be useful, per hour of operation. See my response to the original article, here.

Computers don’t belong in classrooms?!

I recently audited some lectures by friend and China expert Prof. Susan Shirk. She bans computers in her lectures. But one student sitting near me had his machine out and was “busy” with the usual distractions. (Didn’t he know the Associate Dean was a few seats away?) I asked Susan about him after class. “He told me he can’t take notes without a computer.” Obviously the computer is not the big issue in his note taking. Actually, it probably IS the issue – but in a negative way.

Not one computer mirrors the overheads.

James Kwak has beaten the distraction of cell phones – by removing most apps, including browsers.

I know that its enormous powers of distraction also make me lose focus on work, tune out in meetings, stay up too late at night, and, worst of all, ignore people in the same room with me. We all know this. We’re addicted to the dopamine hit we get when we look at our email and there’s actually something good in there, so we keep checking our email hoping to feel it again.

via How I Achieved Peace by Crippling My Phone — Bull Market — Medium.

Clay Shirky, an Internet sociologist, has a good discussion of why he recently banned computers in his classrooms. Excerpt:

I came late and reluctantly to this decision — I have been teaching classes about the internet since 1998, and I’ve generally had a laissez-faire attitude towards technology use in the classroom. This was partly because the subject of my classes made technology use feel organic, …. And finally, there’s not wanting to infantilize my students, who are adults, even if young ones — time management is their job, not mine.

Despite these rationales, the practical effects of my decision to allow technology use in class grew worse over time. The level of distraction in my classes seemed to grow, even though it was the same professor and largely the same set of topics, …

Over the years, I’ve noticed that when I do have a specific reason to ask everyone to set aside their devices (‘Lids down’, in the parlance of my department), it’s as if someone has let fresh air into the room. The conversation brightens, and more recently, there is a sense of relief from many of the students. Multi-tasking is cognitively exhausting — when we do it by choice, being asked to stop can come as a welcome change.

So this year, I moved from recommending setting aside laptops and phones to requiring it, adding this to the class rules: “Stay focused. (No devices in class, unless the assignment requires it.)” …


Facebook AI Director on “Deep Learning”

This short interview has some good explanations.

LeCun: Actually, I think the basics of machine learning are quite simple to understand….

A pattern recognition system is like a black box with a camera at one end, a green light and a red light on top, and a whole bunch of knobs on the front. The learning algorithm tries to adjust the knobs so that when, say, a dog is in front of the camera, the red light turns on, and when a car is put in front of the camera, the green light turns on. You show a dog to the machine. If the red light is bright, don’t do anything. If it’s dim, tweak the knobs so that the light gets brighter. If the green light turns on, tweak the knobs so that it gets dimmer. Then show a car, and tweak the knobs so that the red light gets dimmer and the green light gets brighter. If you show many examples of the cars and dogs, and you keep adjusting the knobs just a little bit each time, eventually the machine will get the right answer every time.
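The knob-tweaking loop in that description can be sketched in a few lines. This is only an illustration under my own assumptions, not anything from the interview: the "machine" is a single logistic unit, the two input features and all the data are synthetic, and the names `red_light`, `make_examples`, and `train_knobs` are made up for this sketch. The green light is simply one minus the red light.

```python
import math
import random


def red_light(knobs, bias, features):
    """Brightness of the red ("dog") light, between 0 and 1."""
    z = bias + sum(k * x for k, x in zip(knobs, features))
    return 1.0 / (1.0 + math.exp(-z))


def make_examples(n_each=50, seed=1):
    """Synthetic pictures: two made-up features each; label 1 = dog, 0 = car."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_each):
        examples.append(([rng.gauss(0.9, 0.1), rng.gauss(0.1, 0.1)], 1))  # dog
        examples.append(([rng.gauss(0.1, 0.1), rng.gauss(0.9, 0.1)], 0))  # car
    return examples


def train_knobs(examples, steps=2000, nudge=0.1, seed=0):
    """Show one example at a time and adjust every knob just a little."""
    rng = random.Random(seed)
    knobs = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(steps):
        features, is_dog = rng.choice(examples)
        # Positive error: red light too dim for a dog.
        # Negative error: red light too bright for a car.
        error = is_dog - red_light(knobs, bias, features)
        knobs = [k + nudge * error * x for k, x in zip(knobs, features)]
        bias += nudge * error
    return knobs, bias


if __name__ == "__main__":
    knobs, bias = train_knobs(make_examples())
    print("red light for a typical dog:", red_light(knobs, bias, [0.9, 0.1]))
    print("red light for a typical car:", red_light(knobs, bias, [0.1, 0.9]))
```

When the red light is already bright for a dog, the error is near zero and the knobs barely move, which matches the "don't do anything" step in the quote. Real deep learning systems have millions of knobs arranged in layers, but the update has the same flavor.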

Why unsupervised learning is critical in the long run, but does not yet work:

The type of learning that we use in actual Deep Learning systems is very restricted. What works in practice in Deep Learning is “supervised” learning. You show a picture to the system, and you tell it it’s a car, and it adjusts its parameters to say “car” next time around. Then you show it a chair. Then a person. And after a few million examples, and after several days or weeks of computing time, depending on the size of the system, it figures it out.

Now, humans and animals don’t learn this way. You’re not told the name of every object you look at when you’re a baby. And yet the notion of objects, the notion that the world is three-dimensional, the notion that when I put an object behind another one, the object is still there—you actually learn those. You’re not born with these concepts; you learn them. We call that type of learning “unsupervised” learning.

Facebook AI Director Yann LeCun on His Quest to Unleash Deep Learning and Make Machines Smarter – IEEE Spectrum.

No, a study did not link GM crops to 22 diseases


And a candidate for worst graph of the year, appearing to show that deaths from a certain class of diseases grew in parallel with some farming trends (Figure 16 in the article, which is at http://www.organic-systems.org/journal/92/JOS_Volume-9_Number-2_Nov_2014-Swanson-et-al.pdf ). Any two steadily increasing time series can be plotted so that they lie approximately on top of each other, if you distort the scales enough. Other “causes” they could have plotted, with approximately the same results: cell phones per capita, percentage of cars on the road with ABS brakes, and (for all I know) average campaign spending per Congressional race.
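Here is a small sketch of the scaling trick, using entirely synthetic numbers (nothing here comes from the article or from real data). Two unrelated series share only an upward trend. Once each is rescaled to its own axis they nearly coincide and their correlation is high, but the year-to-year changes show essentially no relationship. The series parameters and the function names are assumptions made up for the illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def trending_series(start, slope, noise, years, rng):
    """A straight upward trend plus independent random noise."""
    return [start + slope * t + rng.gauss(0, noise) for t in range(years)]


def rescale(xs):
    """Stretch a series to run from 0 to 1, like giving it its own y-axis."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def differences(xs):
    """Year-to-year changes, which remove most of the shared trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


if __name__ == "__main__":
    rng = random.Random(0)
    deaths = trending_series(1000, 50, 40, 20, rng)      # made-up "deaths"
    phones = trending_series(0.2, 0.03, 0.02, 20, rng)   # made-up "phones per capita"
    gap = max(abs(a - b) for a, b in zip(rescale(deaths), rescale(phones)))
    print("largest gap between rescaled curves:", gap)
    print("correlation of levels:", pearson(deaths, phones))
    print("correlation of yearly changes:",
          pearson(differences(deaths), differences(phones)))
```

Comparing changes rather than levels is one simple check, not the only one. The point is just that a shared trend by itself will produce a picture like Figure 16.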

How did the Ukrainian govt. know who was demonstrating against it?

[edits Jan. 31] A poli sci friend recently blogged about the Ukrainian government’s “text that changed the world,” a mass text message sent to thousands of anti-government demonstrators in Kiev. She asked 1) How did the government know who was in the main square of Kiev that day? (Cell phone location) and 2) How did it send the same message to everyone at once? (Mass SMS)

Demonstrators in Kiev. From CNN 

The second question is easy: phone companies routinely provide mass-SMS services to large customers. For example, I’m on the “emergency alert” texting service of UC San Diego’s campus police. It was designed for earthquakes, but it has been used for other kinds of messages “between earthquakes.” The same message goes out to every phone number on their list.

What to do to avoid tracking? Short version: Leave your phone at home. Second best is to shut it off or switch to airplane mode, but those work only if the government is not making an effort to target you.
