Rescuing a medical treatment from failure in a clinical trial by using post hoc Bayesian analysis

How can researchers maximize learning from experiments, especially from very expensive experiments such as clinical trials? This article shows how a Bayesian analysis of the data would have been much more informative, and likely would have saved a useful new technique for dealing with ARDS (acute respiratory distress syndrome).

I am a big supporter of Bayesian methods, which will become even more useful as machine learning spreads. But a colleague, Dr. Nick Eubank, pointed out that the data could also have been re-analyzed using frequentist statistics. The problem with the original analysis was not primarily that it used frequentist statistics. Rather, it was that the investigators set a fixed (and rather demanding) threshold for defining success, a threshold that was probably unattainable. The clinical trial could still have been “saved,” even by conventional statistics.

Source: Extracorporeal Membrane Oxygenation for Severe Acute Respiratory Distress Syndrome and Posterior Probability of Mortality Benefit in a Post Hoc Bayesian Analysis of a Randomized Clinical Trial. JAMA.

Here is a draft of a letter to the editor on this subject. Apologies for the very academic tone – that’s what we do for academic journals!

The study analyzed in their article was stopped prematurely because it was judged unlikely to attain the target level of performance. Their paper shows that this might have been avoided, and the technique shown to have benefit, if their analysis had been performed before terminating the trial. A related analysis could usefully have been done within the frequentist statistical framework. According to their Table 2, a frequentist analysis (equivalent to an uninformative prior) would have suggested a 96% chance that the treatment was beneficial, and an 85% chance that it had RR < 0.9.

The reason the original study appeared to be failing was not solely that it was analyzed with frequentist methods. It also appeared to fail because the bar for “success” was set very high, namely RR < 0.67, i.e., at least a 33% relative reduction in mortality. Thus, although the full Bayesian analysis of the article was more informative, even frequentist statistics can be used to investigate the implications of different definitions of success.

Credit for this observation goes to Nick. I will ask him for permission to include one of his emails to me on this subject.
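
To make the threshold point concrete, here is a minimal sketch of the kind of calculation involved. It uses Python and a normal approximation to the posterior of log(RR) under a flat (uninformative) prior, which is one standard way to approximate such posterior probabilities; it is not the paper’s actual model. The event counts are hypothetical placeholders, not the trial’s data, and the thresholds 1.0, 0.9, and 0.67 correspond to “any benefit,” the more modest target, and the original success criterion.

    # Posterior probability that the relative risk (RR) falls below a threshold,
    # via a normal approximation to the posterior of log(RR) under a flat prior.
    # The event counts below are hypothetical placeholders, not the trial's data.
    import numpy as np
    from scipy.stats import norm

    deaths_treat, n_treat = 40, 120   # treatment arm: deaths / enrolled (made up)
    deaths_ctrl,  n_ctrl  = 55, 120   # control arm:   deaths / enrolled (made up)

    rr_hat = (deaths_treat / n_treat) / (deaths_ctrl / n_ctrl)

    # Delta-method standard error of log(RR) for a 2x2 table
    se_log_rr = np.sqrt(1/deaths_treat - 1/n_treat + 1/deaths_ctrl - 1/n_ctrl)

    # With a flat prior, the posterior of log(RR) is approximately
    # Normal(log(rr_hat), se_log_rr^2), so each probability is a normal CDF value.
    for threshold in (1.0, 0.9, 0.67):
        p = norm.cdf((np.log(threshold) - np.log(rr_hat)) / se_log_rr)
        print(f"Pr(RR < {threshold:.2f}) = {p:.2f}")

With counts of this order, the sketch gives a high probability of any benefit, a substantial probability that RR < 0.9, and a much lower probability of clearing RR < 0.67, which is exactly the pattern that makes the choice of success threshold so consequential.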

Fraudulent academic journals are growing

Gina Kolata in the NY Times has been running a good series of articles on fraudulent academic publishing. The basic business model is an unholy alliance between academics looking to enhance their resumes and quick-buck internet sites. Initially, I thought these sites were enticing naive academics. But many academics are apparently willing participants, suggesting that it’s easy to fool many promotion and award committees.

All but one of the 10 academics who won a School of Business and Economics award had published papers in these journals. One had 10 such articles.


Going back to school again – a shopping list | The Thesis Whisperer

Hardware and software list for new PhD students. It’s a good starting place. (I guess February is the start of the new academic year “Down Under.”) Source: Going back to school again – a shopping list | The Thesis Whisperer

My additional suggestions:
1. Reference manager software is essential. EndNote is radically overpriced and behind in terms of features, but unfortunately is standard in certain fields. There are open-source alternatives such as Zotero. I use BibDesk, which is free and open source.

2. A document manager is also essential. And I disagree with her suggestion to use a different manager for each type of source (PDFs, web pages, etc.). To start with, Evernote is an OK but very lightweight document manager – it is not easy to find things once your library gets large (I have more than 10,000 documents, but that is after years in academia). Better alternatives are:

  • EagleFiler
  • DEVONthink (much more complex and correspondingly harder to learn, but also more powerful; for example, you can link to specific locations inside documents)

Both of these let you store all kinds of documents and easily display, search, reorganize, and annotate them without leaving the main application. For web pages, you have the option of storing them as HTML or converting them to PDF.

  • A third option, which went through years of bugs but is apparently now stable, is Papers (papersapp.com). It has native support only for PDF – everything else has to be converted to PDF. Still, if I were starting over I would give it a serious look.
  • Finally, BibDesk or another reference manager may be OK for managing documents, although I’ve never tried that.

Scrivener and its ilk for writing have been discussed elsewhere on her blog.

Good data mining reference books

The students in my Big Data Analytics course asked for a list of books on the subject that they should have in their library. UCSD has an excellent library, including digital versions of many technical books, so my list contains only books that can be downloaded on our campus. Many are from Springer. Several other books that I have purchased, generally from O’Reilly, are not listed here because they are not available on campus.

These are intended as reference books for people who have taken one course in R and data mining. Some of them are “cookbooks” for R. Others discuss various machine learning techniques. The full list: BDA16 reference book suggestions

If you have other suggestions, please add them in the comments with a brief description of what is covered.

Tumblr? Pinterest? What should I use?

What’s a good place to put supplemental information, especially photos and tables, for my book? I have a lot of old photographs, and putting them into the book itself gets expensive. Some are in color and some are very large. Here are a few examples.

I could set up my own site, or use my publisher’s, but places like Tumblr know how to run photo sites. The features I want include the ability to link to pictures on other sites (due to copyright restrictions), the ability to create tables of contents, and so on. Straight chronology won’t suffice.

Obvious candidates include Tumblr, Pinterest, and Instagram. I don’t use any of them except to dabble, so I don’t know their strengths. Possibly Twitter or Facebook?

All advice welcome. Email me, or post comments here.

The Ph.D. Student’s Ticking Clock

The Ph.D. Student’s Ticking Clock – Graduate Students – The Chronicle of Higher Education.

Many of my former students come back years later and ask my advice about getting a PhD. I generally tell them that a PhD program is like a monastery – you have to love the pursuit of knowledge, for its own sake, to make it bearable. If you are doing it only in pursuit of a post-graduation goal, it is too hard a life.

This article includes a startling graph on time-to-graduation. I graduated from MIT in 1982, after 4 years. According to the graph, the average time in social sciences then was 8 years?! I had a lot of breaks (NSF Fellowship, stipend from one of my thesis advisors, pregnant wife to provide emotional support and incentive!) but 4 to 5 years seemed like the norm in my program.

In any case, the second half of the article has some realistic advice about the stresses of protracted graduate programs, and about the importance of your particular advisor’s style.

Kindle books and academic research = needless pain

I’ve probably purchased 300 books in the last year for research purposes, not to mention all the fiction my wife gets (and so do I, if it costs $3 or less). For the newer ones, buying them as eBooks is generally an option. But the state of software, DRM, and copy protection for Kindle books is a mess. Kindle’s software (like iBooks) is deliberately crippled – no copying into another document, no printing, and especially no way to copy diagrams. I’m running Kindle’s software on my Mac and on an iPad, rather than using a Kindle tablet, but that barely helps.

Librarians against DRM
