Smaller departments that struggle with the cost of equipment and storage of data are ending or suspending programs aimed at transparency and accountability.
Should data mining newcomers have to learn programming at the same time? Here is a contrarian view, which advocates a GUI (“drag and drop”) environment, even though the popularity of R (and recently, Python) is increasing.
Police body cams sound great, but it will take years to work out all the ramifications, rules for using them, etc. One concern is cost. It’s likely that the initial cost of the cameras is a small fraction of the total cost.
One issue is the cost of storing the video recorded by cams. According to my rough calculations, this could be thousands of dollars per user per year. That will put a hole in any department’s budget.
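For readers who want to try their own numbers, here is a minimal sketch of this kind of back-of-envelope calculation. It is not my original calculation. The function name, the parameters, and every input value below are hypothetical placeholders, and the model assumes a simple steady state in which the archive always holds a fixed number of months of video, billed at a flat price per GB per month.

```python
def annual_storage_cost(hours_per_shift, shifts_per_year, gb_per_hour,
                        dollars_per_gb_month, retention_months):
    """Rough yearly storage bill for one camera user, in dollars.

    Assumes a steady state: video older than retention_months is deleted,
    so the archive size stays constant over the year.
    """
    # Average amount of new video produced each month, in GB.
    gb_per_month = hours_per_shift * shifts_per_year * gb_per_hour / 12
    # Amount of video sitting in the archive at any given time.
    stored_gb = gb_per_month * retention_months
    # Monthly bill for the archive, times 12 months.
    return stored_gb * dollars_per_gb_month * 12


# All inputs here are made-up placeholders, not measurements.
cost = annual_storage_cost(
    hours_per_shift=6,          # hours of video recorded per shift (assumed)
    shifts_per_year=240,        # shifts worked per year (assumed)
    gb_per_hour=1.0,            # video size per recorded hour (assumed)
    dollars_per_gb_month=0.10,  # managed storage price (assumed)
    retention_months=24,        # how long video is kept (assumed)
)
print(f"Estimated storage cost per user per year: ${cost:,.0f}")
```

The retention period multiplies the bill directly, so in this sketch a department that must keep video twice as long pays twice as much for storage.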
I just completed teaching a 10-week course on data mining for MS-level professional degree students. Most of the material is on a web site, https://irgn452.wordpress.com/chron/. The course assumes good knowledge of OLS regression, but other than that is self-contained.
Software is R, with a heavy dose of Rattle for the first few weeks. (Rattle is a front end for R.) The main algorithms I emphasize are Random Forests and LASSO, for both classification and regression. I emphasize creating new variables that correspond to the physical/economic characteristics of the problem under study. The course requires a major project; some students scrape or mash their own data. Because we have only 10 weeks, I provide a timetable and a lot of milestones for the projects, and frequent one-on-one meetings.
The web site is not designed for public consumption, and is at best in “early beta” status. I am making it available in case anyone wants to mine it for problem sets, discussions of applied issues not covered in most books, etc. Essentially, it is a crude draft of a text for MBAs on data mining using R. This was about the fifth time I taught the course.
By the way, a lot of the lecture notes are modestly modified versions of the excellent lecture material from Matt Taddy. His emphasis is more theoretical than mine, but his explanations and diagrams are great. Readings were generally short sections from either ISLR by James et al. or Data Mining with Rattle and R. Both are available as ebooks at many universities. My TA was Hyeonsu Kang.
The students in my Big Data Analytics course asked for a list of books on the subject they should have in their library. UCSD has an excellent library, including digital versions of many technical books, so my list is entirely books that can be downloaded on our campus. Many are from Springer. There are several other books that I have purchased, generally from O’Reilly, that are not listed here because they are not available on campus.
These are intended as reference books for people who have taken one course in R and data mining. Some of them are “cookbooks” for R. Others discuss various machine learning techniques. BDA16 reference book suggestions
If you have other suggestions, please add them in the comments with a brief description of what is covered.
Something I just found for my Big Data class.
Machine learning system aims to remove problem players “within 15 minutes.”
An interesting thread of player comments has a good discussion of potential problems with automated bans. Only time will tell how well the company develops the system to get around these issues.
This company also took an experimental approach to banning players, and hired 3 PhDs in Cognitive Science to develop it. (Just to be clear, their experiments did not appear to be automated A/B style experiments.) After the jump is a screen shot from that system.
But I’m not tempted to play League of Legends to study player behavior and experiment with getting banned! (I don’t think I’ve ever tried an MMO beyond some prototypes 15 years ago.) If any players want to post their observations here, great.
A bad graphic from a pro-solar group is perhaps not surprising. (See previous post.) Here is one from Bloomberg that verges on incomprehensible. Bloomberg as a source is surprising.
Looking closer, it appears that Skill Desirability increases from left to right, and Skill Frequency increases from top to bottom?! Graphs should be drawn so that UP means higher. In any case, it should not take prolonged inspection to deduce which variable is on the X axis.
The graphic also manages to make as many schools as possible look good at something. In Financial Services, the top 3 schools for Communication Skills are listed as Tuck, McCombs, and Kellogg. But in Technology, the top 3 schools change to Fuqua, Haas, and Kellogg. And for Consulting, the top 3 are London, Harvard, and Ivey. Since “Communication Skills” is the most desired skill of all according to the graph, eight schools can say they are in the Top 3 for teaching the most sought-after skill.