If you are looking for information about my upcoming Big Data course, which starts on April 2, 2018, it is in a different blog. Please go here to learn about the textbooks, and to see how the course worked last year.
A contributor to Dave Farber’s IP list (“Interesting People”) recently stated that 1 Megabit per second (Mbps) is adequate bandwidth for consumers. This compares to “high speed Internet,” which in the US is 20 Mbps or higher, and to Korea, where speeds over 50 Mbps are common.
My response: 1 Mbps is woefully low for any estimate of “useful bandwidth” to an individual, much less to a home. It’s risky to give regulators any excuse to further ignore consumer desires for faster connections. 1 Mbps is too low by at least one order of magnitude, quite likely by three orders of magnitude, and conceivably by even more. I have written this note in an effort to squash the 1 Mbps idea in case it gets “out into the world.”
The claim that 1 Megabit per second is adequate:
>From: Brett Glass <email@example.com>
>Date: Sun, Dec 31, 2017 at 2:14 PM
> The fact is that, according to neurophysiologists, the entire bandwidth of
> all of the human senses combined is about 1 Mbps. (Some place it slightly
> higher, at 1.25 Mbps.) Thus, to completely saturate all inputs to the human
> nervous system, one does not even need a T1 line – much less tens of megabits.
> And therefore, a typical household needs nowhere near 25 Mbps – even if they
> were all simultaneously immersed in high quality virtual reality. Even the
First, I don’t know where the 1 Mbps number comes from, but a commonly cited figure is the bandwidth of the optic nerve alone, which is generally assessed at around 10 Mbps. See references.
Second, a considerable amount of pre-processing occurs in the retina and the layer under the retina, before reaching the optic nerve. These serve as the first layers of a neural network, and handle issues like edge detection.
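To make the "orders of magnitude" arithmetic concrete, here is a rough sketch in Python. The 1 Mbps and 10 Mbps figures come from the text above. Counting two optic nerves per person and a four-person household are my own illustrative assumptions, not measurements. Because of the retinal pre-processing just described, the optic nerve figure is, if anything, a lower bound on raw visual input.

```python
import math

def orders_of_magnitude(needed_mbps, offered_mbps):
    """Number of powers of ten separating two bandwidths."""
    return math.log10(needed_mbps / offered_mbps)

CLAIMED_MBPS = 1.0       # figure from the quoted email
OPTIC_NERVE_MBPS = 10.0  # rough figure cited above, per optic nerve
EYES = 2                 # assumption: count both optic nerves
HOUSEHOLD = 4            # hypothetical household size, illustration only

# Gap between the claim and a single optic nerve.
per_nerve_gap = orders_of_magnitude(OPTIC_NERVE_MBPS, CLAIMED_MBPS)

# Gap for a whole (hypothetical) household, vision only.
household_mbps = OPTIC_NERVE_MBPS * EYES * HOUSEHOLD
household_gap = orders_of_magnitude(household_mbps, CLAIMED_MBPS)

print(f"one optic nerve: {per_nerve_gap:.1f} orders of magnitude above the claim")
print(f"household ({household_mbps:.0f} Mbps): {household_gap:.1f} orders of magnitude")
```

Even this vision-only estimate already accounts for the "at least one order of magnitude" part of the argument.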
Gina Kolata in the NY Times has been running a good series of articles on fraudulent academic publishing. The basic business model is an unholy alliance between academics looking to enhance their resumes, and quick-buck internet sites. Initially, I thought these sites were enticing naive academics. But many academics are apparently willing participants, suggesting that it’s easy to fool many promotion and award committees.
All but one academic in 10 who won a School of Business and Economics award had published papers in these journals. One had 10 such articles.
Two emerging technologies are revolutionizing industries, and will soon have big impacts on our health, jobs, entertainment, and entire lives. They are Artificial Intelligence, and Big Data. Of course, these have already had big effects in certain applications, but I expect that they will become even more important as they improve. My colleague Dr. James Short is putting together a conference called Data West at the San Diego Supercomputer Center, and I came up with a list of fears that might disrupt their emergence.
1) If we continue to learn that ALL large data repositories will be hacked from time to time (Experian; National Security Agency), what blowback will that create against data collection? Perhaps none in the US, but in some other countries, it will cause less willingness to allow companies to collect consumer data.
2) Consensual reality is unraveling, mainly as a result of deliberate, sophisticated, distributed attacks. That should concern all of us as citizens. Should it also worry us as data users, or will chaos in public venues not leak over into formal data? For example, if information portals (YouTube, Facebook, etc.) are forced to take a more active role in censoring content, will advertisers care? Again, Europe may be very different. We can presume that any countermeasures will only be partly effective – the problem probably does not have a good technical solution.
3) Malware, extortion, etc. aimed at companies. Will this “poison the well” in general?
4) Malware, extortion, doxing, etc. aimed at users of Internet of Things devices, such as household thermostats, security cameras, and cars. Will this cause a backlash against sellers of these systems, or will people accept it as the “new normal”? So far, people have seemed willing to bet that it won’t affect them personally, but will that change? For example, what will happen when auto accidents are caused by deliberate but unknown parties who advertise their success? When someone records all conversations within reach of the Alexa box in the living room?
Each of these scenarios has at least a 20% chance of becoming common. At a minimum, they will require more spending on defenses. Will any become large enough to suppress entire applications of these new technologies?
I have not said anything about employment and income distribution. They may change for the worse over the next 20 years, but the causes and solutions won’t be simple, and I doubt that political pressure will become strong enough to alter technology evolution.
TL;DR: In Southern California, we should put PV on houses and buildings that are far from the coast, because coastal areas are cloudy much of the summer. But the actual pattern is the opposite. I estimate the resulting loss at about 30%. Even my employer, UCSD, has engaged in this foolishness in order to appear trendy.
Should data mining newcomers have to learn programming at the same time? Here is a contrarian view, which advocates a GUI (“drag and drop”) environment, even though the popularity of R (and recently, Python) is increasing.
I have just finished my Big Data course for 2017, and noted some concepts that I want to teach better next year. One of them is how to interpret and use the coefficient estimates from linear regression. All economists are familiar with dense tables of coefficients and standard errors, but they require experience to read, and are not at all intuitive. Here is a more intuitive and useful way to display the same information. The blue dots show the coefficient estimates, while the lines show +/- 2 standard errors on the coefficients. It’s easy to see that the first two coefficients are “statistically significant at the 5% level”, the third one is not, and so on. More important, the figure gives a clear view of the relative importance of different variables in determining the final outcomes.
The heavy lifting for this plot is done by the function sjp.lm from the sjPlot library. The main argument linreg is the standard result of a linear regression model, which is a complex list with all kinds of information buried in it.
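For readers who want to see what lies behind such a plot, here is a minimal sketch of the underlying computation: a coefficient estimate plus and minus 2 standard errors for each variable. It is written in plain Python rather than R, and it does not use or reproduce sjp.lm. The data are synthetic, and the names `solve`, `ols_intervals`, `x1`, and `x2` are made up for illustration. An interval that excludes zero corresponds roughly to "significant at the 5% level."

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_intervals(rows, y):
    """Return (estimate, low, high) per coefficient; the first is the intercept."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(x), len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    out = []
    for j in range(k):
        # j-th diagonal element of (X'X)^-1, obtained by solving against a unit vector
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se = math.sqrt(sigma2 * solve(xtx, unit)[j])
        out.append((beta[j], beta[j] - 2 * se, beta[j] + 2 * se))
    return out

# Synthetic data: x1 truly matters (slope 2), x2 truly does not (slope 0).
random.seed(0)
rows = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
y = [1.0 + 2.0 * x1 + 0.0 * x2 + random.gauss(0, 1) for x1, x2 in rows]

results = ols_intervals(rows, y)
for name, (est, low, high) in zip(["intercept", "x1", "x2"], results):
    flag = "excludes 0" if low > 0 or high < 0 else "includes 0"
    print(f"{name:>9}: {est:6.2f}  [{low:6.2f}, {high:6.2f}]  {flag}")
```

Plotting each estimate as a dot with a line from low to high gives the kind of figure described above. Comparing the size of coefficients across variables is only meaningful when the variables are on comparable scales.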