Police body cams will cost $1000s per cop per year!

Police body cams sound great, but it will take years to work out all the ramifications, rules for using them, etc. One concern is cost. It’s likely that the initial cost of the cameras is a small fraction of the total cost.

One issue is the cost of storing the video recorded by cams. According to my rough calculations, this could be thousands of dollars per user per year. That will put a hole in any department’s budget.
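For anyone who wants to redo this kind of estimate, here is a minimal sketch in Python. It is not the calculation behind the figure above. The function name, the parameter names, and every number in the example are hypothetical placeholders, chosen only to make the arithmetic easy to follow. It assumes a simple model: a fixed amount of video per shift, kept for a fixed retention period, billed at a flat price per gigabyte per year.

```python
def annual_storage_cost(hours_per_shift, shifts_per_year, gb_per_hour,
                        retention_years, dollars_per_gb_year):
    """Rough steady-state yearly storage bill for one camera user.

    All parameters are hypothetical inputs; plug in a department's own figures.
    """
    # Video generated by one user in a year, in gigabytes.
    gb_per_year = hours_per_shift * shifts_per_year * gb_per_hour
    # Once the archive is full, it holds retention_years' worth of video.
    gb_stored = gb_per_year * retention_years
    # Yearly bill for keeping that archive.
    return gb_stored * dollars_per_gb_year


# Placeholder inputs (assumptions, not measured values).
example = annual_storage_cost(
    hours_per_shift=4,
    shifts_per_year=200,
    gb_per_hour=1.0,
    retention_years=2,
    dollars_per_gb_year=0.5,
)
print(f"${example:,.0f} per user per year")
```

In this simple model the bill scales linearly with each input, so doubling the video bitrate or the retention period doubles the yearly cost.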


Web site: Data mining with R for MBA level students.

I just completed teaching a 10-week course on data mining for MS-level professional degree students. Most of the material is on a web site (https://irgn452.wordpress.com/chron/). The course assumes good knowledge of OLS regression, but other than that is self-contained.
Software is R, with a heavy dose of Rattle for the first few weeks. (Rattle is a front end for R.) The main algorithms I emphasize are Random Forests and LASSO, for both classification and regression. I emphasize creating new variables that correspond to the physical/economic characteristics of the problem under study. The course requires a major project; some students scrape or mash their own data. Because we have only 10 weeks, I provide a timetable and a lot of milestones for the projects, and frequent one-on-one meetings.
The web site is not designed for public consumption, and is at best in “early beta” status. I am making it available in case anyone wants to mine it for problem sets, discussions of applied issues not covered in most books, etc. Essentially, it is a crude draft of a text for MBAs on data mining using R. This was about the fifth time I taught the course.

By the way, a lot of the lecture notes are modestly modified versions of the excellent lecture material from Matt Taddy. His emphasis is more theoretical than my course, but his explanations and diagrams are great. Readings were generally short sections from either ISLR by James et al. or Data Mining with Rattle and R. Both are available as ebooks at many universities. My TA was Hyeonsu Kang.


Good data mining reference books

The students in my Big Data Analytics course asked for a list of books on the subject they should have in their library. UCSD has an excellent library, including digital versions of many technical books, so my list is entirely books that can be downloaded on our campus. Many are from Springer. There are several other books that I have purchased, generally from O’Reilly, that are not listed here because they are not available on campus.

These are intended as reference books for people who have taken one course in R and data mining. Some of them are “cookbooks” for R. Others discuss various machine learning techniques. List: BDA16 reference book suggestions

If you have other suggestions, please add them in the comments with a brief description of what is covered.

Using data mining to ban trolls on League of Legends

Something I just found for my Big Data class.

Riot rolls out automated, instant bans for League of Legends trolls

Machine learning system aims to remove problem players “within 15 minutes.”

An interesting thread of player comments has a good discussion of potential problems with automated bans. Only time will tell how well the company develops the system to get around these issues.

This company also took an experimental approach to banning players, and hired 3 PhDs in Cognitive Science to develop it. (Just to be clear, their experiments did not appear to be automated A/B style experiments.) After the jump is a screen shot from that system.

League of Legends screen shot

But I’m not tempted to play League of Legends to study player behavior and experiment with getting banned! (I don’t think I’ve ever tried an MMO beyond some prototypes 15 years ago.) If any players want to post their observations here, great.

Chartjunk: Second-worst graphic of the month!

A bad graphic from a pro-solar group is perhaps not surprising. (See previous post.) Here is one from Bloomberg that verges on incomprehensible. Bloomberg as a source is surprising.

Which way is up? (Answer: down is up)


Looking closer, it appears that Skill Desirability increases from left to right, and Skill Frequency increases from top to bottom?! Graphs should be drawn so that UP means higher. In any case, it should not take prolonged inspection to deduce which variable is on the X axis.

The graphic also manages to make as many schools as possible look good at something. In Financial Services, the top 3 schools for Communication Skills are listed as Tuck, McCombs, and Kellogg. But in Technology, the top 3 schools change to Fuqua, Haas, and Kellogg. And for Consulting, the top 3 are London, Harvard, and Ivey. Since “Communication Skills” are the most desired skill of all according to the graph, eight schools can say they are in the Top 3 for teaching the most sought-after skill.

When the doctor’s away, the patient is more likely to survive | Ars Technica


Very surprising. When cardiologists are away from the hospital, deaths after heart failure or cardiac arrest decline. I’ll probably use this in my course this spring. (Or perhaps in both courses: Big Data, and Operations Quality in Healthcare.)
