Here is an argument for allowing companies to maintain a lot of secrecy about how their data mining (AI) models work. The claim is that revealing information will put companies at a competitive disadvantage. Sorry, that is not enough of a reason. And it’s not actually true, as far as I can tell.
The first consideration when discussing transparency in AI should be data, the fuel that powers the algorithms. Because data is the foundation for all AI, it is valid to want to know where the data…
Here is my response.
Your questions are good ones. But you seem to think that explainability cannot be achieved except by giving away all the work that led to the AI system. That is a straw man. Take deep systems, for example. The IP includes:
1) The training set of data
2) The core architecture of the network (number of layers etc)
3) The training procedures over time, including all the testing and tuning that went on.
4) The resulting system (weights, filters, transforms, etc).
5) Higher-level “explanations,” whatever those may be. (For me, these might be a reduced-form model that is approximately linear, and can be interpreted.)
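To make item 5 concrete, here is a minimal sketch of one kind of reduced-form explanation: a linear model fitted to the outputs of a black box. Everything in it is an assumption for illustration. `black_box` is a made-up stand-in for a proprietary system, and `fit_linear_surrogate` is a hypothetical helper name, not anyone's actual method. The point is only that an interpretable approximation can be produced without disclosing the training data, architecture, or weights.

```python
import numpy as np


def black_box(X):
    """Hypothetical stand-in for a proprietary model (assumed, not a real system).

    Mostly linear in its two inputs, plus a small nonlinear interaction term.
    """
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * np.sin(X[:, 0] * X[:, 1])


def fit_linear_surrogate(predict, X):
    """Fit intercept + weights by least squares to the black box's own outputs.

    Returns the coefficients and the share of output variance they explain.
    """
    y = predict(X)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    r2 = 1.0 - residual.var() / y.var()
    return coef, r2


# Probe the black box with synthetic inputs; no training data is needed.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
coef, r2 = fit_linear_surrogate(black_box, X)

print("intercept and weights:", np.round(coef, 2))
print("share of variance explained:", round(float(r2), 3))
```

The fitted weights say which inputs push the output up or down and by how much, and the variance figure says how far the simple story can be trusted. A real system would need a more careful choice of probe inputs than random draws.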
Revealing even #4 would be somewhat useful to competitors, but not decisive. The original developers will be able to update and refine their model, while people with only #4 will not. The same for any of the other elements.
I suspect the main fear about revealing this, at least among for-profit companies, is that it opens them up to second-guessing. For example, what do you want to bet that the systems now being used to suggest recidivism have bugs? Someone with enough expertise and $ might be able to make intelligent guesses about bugs, although I don’t see how they could prove them.
Sure, such criticism would make companies more cautious, and cost them money. And big companies might be better able to hide behind layers of lawyers and obfuscation. But those hypothetical problems are quite a distance in the future. Society deserves to, and should, do more to figure out where these systems have problems. Let’s allow some experiments, and even some different laws in different jurisdictions, to go forward for a few years. To prevent this is just trusting the self-appointed experts to do what is in everyone else’s best interests. We know that works poorly!
It sounds like what we used to call a “bug” to me. I guess bugs are now promoted to “algorithm failures”.
Nearly half a million elderly women in the United Kingdom missed mammography exams because of a scheduling error caused by one incorrect computer algorithm, and several hundred of those women may have died early as a result. Last week, the U.K. Health Minister Jeremy Hunt announced that an independent inquiry had been launched to determine how a “computer algorithm failure” stretching back to 2009 caused some 450,000 patients in England between the ages of 68 and 71 to not be invited for their final breast cancer screenings.
The errant algorithm was in the National Health Service’s (NHS) breast cancer screening scheduling software, and remained undiscovered for nine years.
“Tragically, there are likely to be some people in this group who would have been alive today if the failure had not happened,” Hunt went on to tell Parliament. He added that based on statistical modeling, the number who may have died prematurely as a result was estimated to be between 135 and 270 women.
There is a lot of concern about AI potentially causing massive unemployment. The question of whether “this time will be different” is still open. But another insidious effect is gaining speed: putting tools in the hands of large companies that make it more expensive and more oppressive to run into financial trouble. In essence, it is getting harder to live on the edges of “The System.”
- Cars with even one late payment can be spotted, and repossessed, faster. “Business has more than doubled since 2014….” This is during a period of ostensible economic growth.
- “Even with the rising deployment of remote engine cutoffs and GPS locators in cars, repo agencies remain dominant. … Agents are finding repos they never would have a few years ago.”
- “So much of America is just a heartbeat away from a repossession — even good people, decent people who aren’t deadbeats,” said Patrick Altes, a veteran agent in Daytona Beach, Fla. “It seems like a different environment than it’s ever been.”
- “The company’s goal is to capture every plate in Ohio and use that information to reveal patterns. A plate shot outside an apartment at 5 a.m. tells you that’s probably where the driver spends the night, no matter their listed home address. So when a repo order comes in for a car, the agent already knows where to look.”
- Source: The surprising return of the repo man – The Washington Post
If you are looking for information about my upcoming Big Data course, which starts on April 2, 2018, it is in a different blog. Please go here to learn about the textbooks, and to see how the course worked last year.
A contributor to Dave Farber’s IP (“Interesting People”) list recently stated that 1 Megabit per second (Mbps) is adequate bandwidth for consumers. This compares to “high speed Internet,” which in the US is 20 Mbps or higher, and to Korea, where speeds over 50 Mbps are common.
My response: 1 Mbps is woefully low for any estimate of “useful bandwidth” to an individual, much less to a home. It’s risky to give regulators any excuse to further ignore consumer desires for faster connections. 1 Mbps is too low by at least one order of magnitude, quite likely by three orders of magnitude, and conceivably by even more. I have written this note in an effort to squash the 1 Mbps idea in case it gets “out into the world.”
The claim that 1 Megabit per second is adequate:
>From: Brett Glass <firstname.lastname@example.org>
>Date: Sun, Dec 31, 2017 at 2:14 PM
> The fact is that, according to neurophysiologists, the entire bandwidth of
> all of the human senses combined is about 1 Mbps. (Some place it slightly
> higher, at 1.25 Mbps.) Thus, to completely saturate all inputs to the human
> nervous system, one does not even need a T1 line – much less tens of megabits.
> And therefore, a typical household needs nowhere near 25 Mbps – even if they
> were all simultaneously immersed in high quality virtual reality. Even the
First, I don’t know where the 1 Mbps number comes from, but a common number is the bandwidth of the optic nerve, which is generally assessed at around 10 Mbps. See references.
Second, a considerable amount of pre-processing occurs in the retina and the layer under the retina, before reaching the optic nerve. These serve as the first layers of a neural network, and handle issues like edge detection.
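The “orders of magnitude” comparison can be checked with a few lines of arithmetic. This is only a sketch using figures already quoted in this note (1 Mbps claimed, roughly 10 Mbps for the optic nerve, 20 and 50 Mbps for typical service tiers). The function name `orders_of_magnitude` is my own, and the sketch says nothing about the larger gaps suggested above.

```python
import math

claimed_mbps = 1.0  # the figure from the mailing-list message

# Comparison figures quoted in this note, in Mbps.
comparisons = {
    "optic nerve estimate": 10.0,
    "US high speed threshold": 20.0,
    "common Korean speed": 50.0,
}


def orders_of_magnitude(actual, claimed):
    """Base-10 log of the ratio: 1.0 means a factor of ten."""
    return math.log10(actual / claimed)


gaps = {name: orders_of_magnitude(mbps, claimed_mbps)
        for name, mbps in comparisons.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} orders of magnitude above the 1 Mbps claim")
```

Even the optic nerve figure alone puts the claim a full order of magnitude low, before counting the other senses or more than one person in a household.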
Gina Kolata in the NY Times has been running a good series of articles on fraudulent academic publishing. The basic business model is an unholy alliance between academics looking to enhance their resumes, and quick-buck internet sites. Initially, I thought these sites were enticing naive academics. But many academics are apparently willing participants, suggesting that it’s easy to fool many promotion and award committees.
All but one academic in 10 who won a School of Business and Economics award had published papers in these journals. One had 10 such articles.