Big data and AI are not “objective”

AI, machine learning, and related technologies only appear to be objective. In reality, they reflect the worldview and prejudices of their developers.

Algorithms have been empowered to make decisions and take actions for the sake of efficiency and speed… [despite] the aura of objectivity and infallibility cultures tend to ascribe to them. … [The report details] the shortcomings of algorithmic decisionmaking, identifies key themes around the problem of algorithmic errors and bias, and examines some approaches for combating these problems. This report highlights the added risks and complexities inherent in the use of algorithmic … decisionmaking in public policy. The report ends with a survey of approaches for combating these problems.

Source: An Intelligence in Our Image: The Risks of Bias and Errors in Artificial Intelligence | RAND

Standards wars in home automation: don’t spend big $ yet

TL;DR: It will take 5+ years for standards to get sorted out in home automation. Until they are, devices from different companies will not be compatible. Anything you buy and install now will be inconvenient (you will need multiple interfaces) and will become obsolete in a few years.

Now that there are many genuinely useful and modestly priced home automation devices (and I don’t mean smart refrigerators), we are ready to enter the rising portion of “the S curve,” where penetration increases. Most of the devices can be retrofitted, which will make uptake much easier.

But right now, most vendors have their own protocols. Common protocols are needed at three layers:

1. The user interface, such as a mobile phone/computer app (or web site).
2. Physical communication, such as Bluetooth, Zigbee, or Wi-Fi.
3. Data protocols (APIs, essentially).

Most vendors appear to be moving toward a hub-and-spoke arrangement, where the hub handles communication to the user and outside the home, so there will also be competition for whose hub customers buy. Finally, I would add security as its own “layer,” since it is so important and currently completely neglected.
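To make the layers concrete, here is a minimal sketch of the hub-and-spoke arrangement in Python. All class, vendor, and protocol-format names are hypothetical, invented for illustration; real products differ at every one of these layers.

```python
# A minimal sketch of the hub-and-spoke arrangement and the three protocol
# layers. Vendor and protocol-format names are hypothetical.

class Device:
    """A spoke: a device speaking some physical protocol and vendor API."""
    def __init__(self, name, physical_protocol, data_protocol):
        self.name = name
        self.physical_protocol = physical_protocol  # layer 2: Bluetooth/Zigbee/Wi-Fi
        self.data_protocol = data_protocol          # layer 3: the vendor's API format

class Hub:
    """The hub: pairs with every spoke, handles communication to the user
    and outside the home, and presents one user interface (layer 1)."""
    def __init__(self):
        self.devices = []

    def pair(self, device):
        self.devices.append(device)

    def status(self):
        # What a single app/web UI would show, regardless of each
        # device's underlying physical and data protocols.
        return {d.name: f"{d.physical_protocol}/{d.data_protocol}"
                for d in self.devices}

hub = Hub()
hub.pair(Device("thermostat", "Zigbee", "vendor-A-json"))
hub.pair(Device("door lock", "Bluetooth", "vendor-B-binary"))
print(hub.status())
```

The sketch also shows why the hub matters commercially: whoever owns the hub owns the one interface the customer actually touches.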

Continue reading

“My Galaxy Note7 is still safer than my car.” No, it isn’t.

The odds of dying in a car wreck are twice as high as this thing “exploding.” I’m keeping it.

Source: My Galaxy Note7 is still safer than my car. I’m keeping it

This author does an interesting calculation, but he does it wrong. The 100 Note7s that have exploded, out of 2.5M sold, were all used for two months or less, since the phone has only been on the market that long. When you correct for this exposure time, the rate of fires over a two-year ownership period is roughly 1 in 1,000. (Probably higher, for several reasons.)
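The correction is a simple linear extrapolation, sketched below with the numbers from the post. Note the raw scale-up gives about 1 in 2,000; rounding toward 1 in 1,000 reflects the view that the true count is probably higher.

```python
# Back-of-the-envelope check of the exposure-time correction,
# using the numbers in the post.
fires = 100              # reported Note7 fires so far
units_sold = 2_500_000
months_on_market = 2     # all phones in use for two months or less
ownership_months = 24    # a typical two-year ownership period

rate_so_far = fires / units_sold                    # risk over ~2 months of use
rate_per_ownership = rate_so_far * (ownership_months / months_on_market)
print(f"{rate_per_ownership:.4%}")                  # → 0.0480%, about 1 in 2,000
```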

Second, lithium battery fires are nasty, smelly, and dangerous because they can set other things on fire. I speak from personal experience. Do you want to leave a device plugged in at night that may have a 0.1% chance of burning your house down over the period that you own it? I hope not.

His car-wreck odds calculation (1 in 12,000), by the way, is probably per year, and again he does not realize that this matters. But he is right that cars are plenty dangerous. I once estimated that at birth an American has a 50% chance of being hospitalized due to a car accident during their lifetime.
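If the 1-in-12,000 figure is indeed annual, the apples-to-apples comparison over the same two-year ownership window looks like this (a sketch, assuming the per-year reading):

```python
# Comparing phone-fire risk and car-death risk over the SAME two-year window.
car_death_per_year = 1 / 12_000
car_death_two_years = 2 * car_death_per_year    # about 1 in 6,000
phone_fire_two_years = 1 / 1_000                # the corrected rate above
print(round(phone_fire_two_years / car_death_two_years, 1))  # → 6.0
```

On these numbers the phone is roughly six times riskier than the car over the same period, which is the opposite of the author's conclusion.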

There are many other TOM issues raised by this Samsung Note7 recall. Clearly the company has internal problems, and problems somewhere in management.

Police body cams will cost $1000s per cop per year!

Police body cams sound great, but it will take years to work out all the ramifications, rules for using them, and so on. One concern is cost: it’s likely that the initial price of the cameras is a small fraction of the total cost.

One issue is the cost of storing the video the cameras record. According to my rough calculations, this could run to thousands of dollars per officer per year. That will put a hole in any department’s budget.
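A back-of-the-envelope version of that calculation is below. Every input is my own assumption for illustration, not a vendor quote; bitrate, shift pattern, retention policy, and storage price will all vary by department.

```python
# Order-of-magnitude estimate of per-officer video storage cost.
# Every input here is an assumption for illustration, not a vendor quote.
gb_per_hour = 1.5          # assumed bitrate of ~720p body-cam footage
hours_per_shift = 8
shifts_per_year = 250
cost_per_gb_month = 0.10   # assumed price for managed, evidence-grade storage

gb_generated = gb_per_hour * hours_per_shift * shifts_per_year  # 3,000 GB/year
annual_cost = gb_generated * cost_per_gb_month * 12             # stored year-round
print(round(annual_cost))  # → 3600, i.e. thousands of dollars per officer per year
```

Longer retention requirements (many evidence policies run years, not months) would multiply this further.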

Continue reading

Google/Alphabet continues toward Total Person Awareness: tracking every vehicle + person.

Secretive Alphabet division aims to fix public transit in US by shifting control to Google (from The Guardian)

Documents reveal Sidewalk Labs is offering a system it calls Flow to Columbus, Ohio, to upgrade bus and parking services – and bring them under Google’s management.


The emails and documents show that Flow applies Google’s expertise in mapping, machine learning and big data to thorny urban problems such as public parking. Numerous studies have found that 30% of traffic in cities is due to drivers seeking parking.

Sidewalk said in documents that Flow would use camera-equipped vehicles… It would then combine data from drivers using Google Maps with live information from city parking meters to estimate which spaces were still free. Arriving drivers would be directed to empty spots.

Source: Secretive Alphabet division aims to fix public transit in US by shifting control to Google

Notice that this gives Google/Alphabet a legitimate reason to track every car in the downtown area. Flow can be even more helpful if it knows the destination of every car AND every traveler for the next hour.
The next logical step, a few years from now, will be to track the plans of every person in the city. For example, Mary Smith normally leaves her house in the suburbs at 8:15 AM to drive to her office in downtown Columbus. Today, however, she has to drop off her daughter Emily (born Dec 1, 2008, social security number 043-xx-xxxx) at school, so she will leave a little early. This perturbation in normal traffic can be used to help other drivers choose the most efficient route. Add together thousands of these, and we can have real-time re-routing of buses and Uber cars.
For now, this sounds like science fiction. It certainly has the potential to improve transit efficiency and speed, and “make everyone better off.” But it comes at a price. Yet many people are already comfortable with Waze tracking their drives in detail.
Tune back in 10 years from now and tell me how I did.

Web site: Data mining with R for MBA-level students.

I just completed teaching a 10-week course on data mining for MS-level professional degree students. Most of the material is on a web site. The course assumes good knowledge of OLS regression, but other than that it is self-contained.
Software is R, with a heavy dose of Rattle for the first few weeks. (Rattle is a GUI front end for R.) The main algorithms I emphasize are Random Forests and LASSO, for both classification and regression. I emphasize creating new variables that correspond to the physical/economic characteristics of the problem under study. The course requires a major project; some students scrape or mash up their own data. Because we have only 10 weeks, I provide a timetable, a lot of milestones for the projects, and frequent one-on-one meetings.
The web site is not designed for public consumption, and is at best in “early beta” status. I am making it available in case anyone wants to mine it for problem sets, discussions of applied issues not covered in most books, etc. Essentially, it is a crude draft of a text for MBAs on data mining using R. This was about the fifth time I have taught the course.

By the way, a lot of the lecture notes are modestly modified versions of the excellent lecture material from Matt Taddy. His emphasis is more theoretical than mine, but his explanations and diagrams are great. Readings were generally short sections from either ISLR by James et al. or Data Mining with Rattle and R. Both are available as ebooks at many universities. My TA was Hyeonsu Kang.