Parachutes are not an argument against RCTs for medical treatments!

Another doctor has recently used parachutes as an example of why some medical treatments don’t need to be tested before using them on patients. That historical claim is wrong.

Arguing for “The search for perfect evidence may be the enemy of good policy,” Greenhalgh, a physician and expert in health care delivery at the University of Oxford, fumed in the Boston Review. “As with parachutes for jumping out of airplanes, it is time to act without waiting for randomized controlled trial evidence.” [emphasis added]….

COVID-19, she argues, has revealed the limits of evidence-based medicine—masks being a potent case in point.

The United Kingdom’s mask crusader
Ellen Ruppel Shell
Science 16 Oct 2020: Vol. 370, Issue 6514, pp. 276-277
DOI: 10.1126/science.370.6514.276

A 2003 article in British Medical Journal claimed after a literature search that “No randomised controlled trials of parachute use have been undertaken,[sic]” and went on to claim that “Individuals who insist that all interventions need to be validated by a randomised controlled trial need to come down to earth with a bump.” This is nonsense. Parachutes were heavily tested by the British air force late in WW I, for example. The issue was controversial at the time because German pilots already had parachutes, and the British military was slow to adopt, perhaps because of NIH (Not Invented Here). Continued trials delayed deployment until after the war was over.

Jet ejection seats, a “super-parachute” invented in the 1940s, received comprehensive engineering tests as various designs were experimented on. Tests ultimately included multiple human trials with volunteers. Despite that, many pilots at the time were hesitant to trust them, but field experience (and lack of alternatives when you were about to crash) led to still-reluctant acceptance. The reluctance stemmed from the dangers of ejection – severe injuries were common, due to high accelerations, collisons with pieces of the aircraft, and so forth. Continued experimentation at many levels (simulations, scale models, dummy pilots, etc.) have led to many improvements over the early designs, and most pilots who eject now are not permanently injured.

Test of a 0/0 ejection by Major Jim Hall, 1965

So parachutes have been, and new designs continue to be, heavily tested. Perhaps the 2003 authors missed them because they did not search obscure engineering and in-house journals written decades before the Internet. What about the “controlled” part of Randomized Controlled Trials? They had not even been invented in 1918; R.A. Fisher’s seminal work on experimental statistics was done in the 1920s and 30s.

More important, engineering trials have something better than randomization: deliberate “corner tests.” With humans and diseases we don’t know all the variables that affect treatment effectiveness, and even if we knew them, we couldn’t measure many of them. But with engineered systems we can figure out most key variables ahead of time. So trials can be run with:

  • Low pilot weight / high pilot weight
  • Low airspeed/high airspeed
  • Low, intermediate, and high altitudes
  • Aircraft at 0 pitch and yaw, all the way to aircraft inverted.
  • Delayed or early seat ejection.
  • Testing prototypes (and now, finite element simulations) can tell us which conditions are most extreme, so not all corners need full-scale tests.

Of course some of these tests will “fail,” e.g. early ejection seats did not work at low altitude and airspeed. Those limits are then written into pilots’ manuals. That is considerably better than we do with many RCT’s, which deliberately choose trial subjects who are more healthy than patients who will ultimately take the medicine.

So let’s stop using this analogy. Parachutes were never adopted without (the equivalent of) RCTs. Thereare many reasons to adopt masks without years of testing, but this is not one of them.

(I have written more about this in my book draft about the evolution of flying from an art to a science.)

Smarter contact tracing

Not just another cell-phone idea

Can Contact Tracing Work At COVID Scale? Amit Kaushal and Russ B. Altman JULY 8, 2020 10.1377/hblog20200630.746159

You cannot trace everybody, so be smart about who you trace. This article points out the impracticality of massive contact tracing, and how to build a learning system to make it useful anyway. Contact tracing is hard, and when there are too many cases it starts to break down. But we need to figure it out, especially in high-priority settings and in places with limited outbreaks. There are also many idiosyncrasies in Covid infection patterns. A well-executed learning system can gradually make smarter judgments about where to look for cases, who to test, who to quarantine, and when to lift the quarantine.

As we build our nation’s tracing operations, we need to ensure that they are effective at identifying contacts while attempting to quarantine as few people as possible, for as short a duration as possible. To ensure contact tracing remains viable at scale, we must develop data-driven metrics to evaluate and adapt our contact tracing efforts. Historically, successful contact tracing has been measured by its sensitivity [based on more is better]. However, at scale “more is better” breaks down. We must have corresponding metrics for specificity, to … exclude from quarantine those people who have not themselves become carriers of the virus.

Can Contact Tracing Work At COVID Scale? | Health Affairs

But will America’s current political decision-making paralysis, chaos, and suspicion allow the systematic tracing program that would be required? At the national level it seems unlikely. But this approach can be done by states or smaller units. There are probably some states with enough leadership and public willingness to be serious about suppressing Covid before it wipes out another 6 months of jobs and education!

Covid-19 and exponential growth: Actual cases = 22 × Reported cases

Covid-19 is growing exponentially. Exponential growth is funny (counterintuitive). For a while, nothing seems to be happening. Then very quickly “everything explodes.” (Also strange: this visual behavior repeats at different scales.) For example, here is a chart of reported cases in Italy. (Source: dev_thetromboneguy )

Exponentially growing data looks small at first; then it appears to zoom upwards

As an applied mathematician (among other things), I’ve worked with lots of exponential growth. I fear that reporters and others are not understanding the implications.)

  • Covid-19 infections are growing exponentially, with a doubling time of 5 days. This is based on Italy, and on China before they took drastic measures to limit spread. (On the Italian chart above, it’s even faster.)
  • The incubation period between getting sick and knowing you are sick averages 14 days.
  • In the US, it is taking multiple days to get tested when you ask for it. Optimistically, a 4 day delay on average.
  • The test results are apparently taking days to come back. Approximately 3 days
  • So total lag between being infected, and being reported in the statistics as a “confirmed new case” is 14 + 4 + 3 = 21 days.
  • This is 4.2 doubling intervals = 21/5
  • 2^4.2 = 22.6 ;2 raised to the 4.2 power is 22
  • So for every confirmed case, there are about 21 more that are sick but not yet reported.
  • This assumes everyone with the disease is eventually tested. If half of people who are sick are never tested, double this number.

This is why we each need to take personal defensive action before there are many reported cases in our city. By the time we know there are many cases near us, it is 3 weeks later and the real number is 22 times higher.

Of course there are many approximations in this model. If you know anywhere that has done similar analysis more rigorously, please post it in the comments or tweet it to me, @RogerBohn

Update March 17: The 20:1 calculation is driven by the 21 days assumed between being infected, and getting positive test results. The average may be lower, and it will fall as more tests become available. On the other hand Italy reports a case growth rate of 35% per day! So the ratio could be as low as 8.

John IOANNIDIS calls for random testing of entire population, which would give an unbiased estimate. That’s not yet feasible for the U.S. due to CDC’s inexplicable foot dragging on test development and procurement. But Korea could do it. The required sample size is large due to uneven incidence in different parts of a country. But a running sample e.g. 1000 per day would give very useful numbers.
If the test is sensitive enough, random sampling would also show how many people are sick with few or no symptoms.

Don’t expect Level 5 Autonomous cars for decades

Why I don’t expect fully autonomous city driving in my lifetime (approx 25 years).

Paraphrase: The strange and crazy things that people do. .. a ball bouncing in front of your car, a child falling down, a car running a red light, head-down pedestrian. A level-5 car has to handle all of these cases, reliably.

These situations require 1) a giant set of learning data 2) Very rapid computing 3) Severe braking. Autonomous cars today are very slow + very cautious in order to allow more time for decisions and for braking.

My view:

There is no magic bullet that can solve these 3 problems, except keeping autonomous cars off of city streets. And all 3 get worse in bad weather, including fog much less in snow.

Also, there are lots of behavioral issues, such as “knowing” the behavior of pedestrians in different cities. Uber discovered that frequent braking/accelerating makes riders carsick – so they re-tuned their safety margins, and their car killed a pedestrian.

A counter-argument (partly from Don Norman, jnd1er): Human drivers are not good at these situations either, and occasionally hit people. Therefore, we should not wait for perfection, but instead systems that on balance are better than humans.  As distracted driving gets worse, the tradeoff in favor of autonomous cars will shift.

But there is another approach to distracted driving. Treat it like drunk driving. Make it socially and legally unacceptable. Drunk driving used to be treated like an accident, with very light penalties even in fatal accidents.

Finally, I’m not sure if any amount of real-life driving will be good enough to develop  training datasets for the rarest edge cases. Developers will need supplemental methods to handle them, including simulated accidents and some causal modeling. For example, the probabilities of different events change by location and time of day. Good drivers know this, and adjust. Perhaps cars will need adjustable parameters that shift their algorithm tuning in different circumstances.

Source of the quotation: Experts at the Table: The challenges to build a single chip to handle future autonomous functions of a vehicle span many areas across the design process.

Source: Semiconductor Engineering – Challenges To Building Level 5 Automotive Chips

Semiconductors get old, and eventually die. It’s getting worse.

I once assumed that semiconductors lasted effectively forever. But even electronic devices wear out. How do semiconductor companies plan for aging?

There has never been a really good solution, and according to this article, problems are getting worse. For example, electronics in cars continue to get more complex (and more safety critical). But cars are used in very different ways after being sold, and in very different climates.This makes it impossible to predict how fast a particular car will age.

Electromigration
Electromigration is one form of aging. Credit:  JoupYoup – Own work, CC BY-SA 4.0, 

When a device is used constantly in a heavy load model for aging, particular stress patterns exaggerate things. An Uber-like vehicle, whether fully automated or not, has a completely different use model than the standard family car that actually stays parked in a particular state a lot of the time, even though the electronics are always somewhat alive. There’s a completely different aging model and you can’t guard-band both cases correctly.

The First Smart Air Quality Tracker?

The first microprocessor is almost 50 years old, but microprocessors (MPUs) continue to revolutionize new areas. (First MPU = Intel 4004, in 1971, which Intel designed for a calculator company!) In concert with Moore’s Law and now ubiquitous wireless two-way wireless data transmission (thanks, Qualcomm!). smartphones have become a basic building block of many products.

A companion to explain what’s in your air, anywhere. Flow is the intelligent device that fits into your daily life and helps you make the best air quality choices for yourself, your family, your community.

Source: Flow, by Plume Labs | The First Smart Air Quality Tracker

Here is a quick review I wrote of the “Flow” pollution meter, after using it for a few months.  I wrote it as a comment on a blog post by Meredith Fowlie about monitoring the effects of fires in N. California.

Memorial Sloan Kettering’s Season of Turmoil – The New York Times

America’s health care research system has many problems. The overall result is poor return on the money spent. The lure of big $ is a factor in many of them. Two specific problems:

  • What gets research $ (including from Federal $) is heavily driven by profit potential, not medical potential. Ideas that can’t be patented get little research.
  • Academic career incentives distort both topics of research (what will corporate sponsors pay for?) and publication. The “replicability crisis” is not just in social sciences.

This NYT article illustrates one way that drug companies indirectly manipulate research agendas: huge payments to influential researchers. In this article, Board of Directors fees. Large speaking fees for nominal work are another common mechanism. Here are some others:

Flacking for Big Pharma

Drugmakers don’t just compromise doctors; they also undermine top medical journals and skew medical research. By Harriet A. Washington | June 3, 2011

I could go on and on about this problem, partly because I live in a biotech town and work at a biotech university. I have posted about this elsewhere in this blog. But since it’s not an area where I am doing research, I will restrain myself.