cultivating & crashing

an organic collection of notes, observations, and thoughts

Tag: statistics

TIL

intercept and slope = ordenada al origen y pendiente (the Spanish terms)

epi epi epi

Yesterday I went to an amazing seminar by Madhu Pai on open access publishing. Everyone doing research should consider open access publishing, including asking non-open-access journals to publish their articles under an open access license.

Notes:

  • In Pai’s experience, most editors say yes when asked if the paper can be open access. So no matter what the journal, just ask, providing some rationale.
  • Space for negative findings, and access to original data!
  • Leaders: the Wellcome Trust and the Gates Foundation require the work they fund to be published under an OA license.
  • OA has forced traditional publishing to evolve: many now have OA branches.
  • The last 5 years have seen an explosion of fraudulent OA journals. BEWARE OF THESE. Beall’s list is a good blacklist, but when will there be a whitelist?

Today I learned about factor analysis, and it strikes me that 1. statisticians are just humans like the rest of us (including being not so smart sometimes), and 2. math and statistics are still the best things ever and I should do a Master’s in stats one day.

Bayesian analysis in the NYTimes

The Odds, Continually Updated

I had read the story of the fisherman who was found, but had no idea Bayesian statistics was the hero.

Multiple imputation

Multiple imputation is my new favourite thing. I still need to figure out how Stata manages (pretends?) to do it, or if it’s just executing an approximation of the Bayesian process of iteratively drawing samples from distributions of alpha and the beta(s) in the model. To be continued.
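
To make that concrete, below is a minimal hand-rolled sketch in Python (not Stata’s actual mi machinery) of one “proper” imputation: draw the residual variance and the coefficients (alpha, beta) from their approximate posterior given the complete cases, draw the missing values from the resulting predictive distribution, repeat M times, and pool. The data and model are invented purely for illustration.

```python
# A minimal sketch of multiple imputation for one partially observed variable x,
# with a fully observed y, under an assumed normal linear model
# x = alpha + beta*y + noise. Illustrative only; not Stata's implementation.
import numpy as np

rng = np.random.default_rng(42)

# Simulate data with ~30% of x missing completely at random
n = 200
y = rng.normal(0, 1, n)
x = 1.0 + 0.5 * y + rng.normal(0, 1, n)
x_obs = np.where(rng.random(n) < 0.3, np.nan, x)

def draw_imputation(x_obs, y, rng):
    """One proper imputation: draw (sigma^2, alpha, beta) from their approximate
    posterior given the complete cases, then draw the missing x's from the
    predictive distribution."""
    obs = ~np.isnan(x_obs)
    X = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef_hat = np.linalg.lstsq(X, x_obs[obs], rcond=None)[0]
    resid = x_obs[obs] - X @ coef_hat
    # Residual variance: scaled inverse-chi-square draw (flat prior)
    sigma2 = (resid @ resid) / rng.chisquare(obs.sum() - 2)
    # Coefficients: multivariate normal draw around the least-squares fit
    alpha, beta = rng.multivariate_normal(coef_hat, sigma2 * np.linalg.inv(X.T @ X))
    # Missing values: draws from the predictive distribution
    x_imp = x_obs.copy()
    mis = np.isnan(x_obs)
    x_imp[mis] = alpha + beta * y[mis] + rng.normal(0, np.sqrt(sigma2), mis.sum())
    return x_imp

# M completed datasets; analyze each, then pool (here: the mean of x)
M = 20
estimates = [draw_imputation(x_obs, y, rng).mean() for _ in range(M)]
print("pooled estimate of E[x]:", np.mean(estimates))
```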

Null hypothesis testing: methodus non grato

The journal Basic and Applied Social Psychology just announced that it will no longer publish p-values, test statistics, confidence intervals, or any other null hypothesis testing procedures. This is pretty extreme, and equally exciting. From the little that I understand, Bayesian analysis is slowly but surely gaining ground against frequentist analysis, and this is one more (rather large) step in that direction.

Read the article here.

R Notebook

IPython, but for R. I cannot wait to use this on my next data analysis assignment!

http://nbviewer.ipython.org/gist/msund/d3e00e5e27dff31a7b6d

When statistical significance is statistically insignificant

This is my new favourite thing.

By the authors of Bayesian Data Analysis: a paper published in The American Statistician entitled “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant.”

http://www.stat.columbia.edu/~gelman/research/published/signif4.pdf
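
The point is easy to see with invented numbers (my own toy illustration, not numbers taken from the paper): one estimate clears the 0.05 threshold, another does not, yet the difference between the two is nowhere near significant.

```python
# Toy illustration of Gelman & Stern's point with made-up estimates.
import numpy as np
from scipy.stats import norm

def two_sided_p(est, se):
    return 2 * norm.sf(abs(est / se))

est_a, se_a = 25.0, 10.0   # study A: z = 2.5  -> "significant"
est_b, se_b = 10.0, 10.0   # study B: z = 1.0  -> "not significant"

print("p(A)     =", round(two_sided_p(est_a, se_a), 3))   # ~0.012
print("p(B)     =", round(two_sided_p(est_b, se_b), 3))   # ~0.317

# The comparison that actually matters: the difference A - B
se_diff = np.sqrt(se_a**2 + se_b**2)        # assuming independent estimates
print("p(A - B) =", round(two_sided_p(est_a - est_b, se_diff), 3))  # ~0.29
```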

How to install and run WinBUGS on Mac OS X

A guide for non-geeks

  1. Download Wineskin Winery
    (Page: http://wineskin.urgesoftware.com/tiki-index.php?page=Downloads
    Direct link: http://sourceforge.net/projects/wineskin/files/Wineskin%20Winery.app%20Version%201.7.zip/download)
    a. Click “Wineskin Winery … (click me to download)”
    b. Click “Save File” to accept the Wineskin Winery app
    c. Find the .zip file in your Downloads folder and click on it to expand (unzip) it
    d. The Winery app should now appear (red icon)
    e. Click on the Winery app to open it
  2. Download the WinBUGS.exe file from the MRC website
    Page: http://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-winbugs/
    Direct link: http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/WinBUGS14.exe
  3. Save the key
    a. Click the link for the key
    (Direct link: http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/WinBUGS14_immortality_key.txt)
    b. If the .txt file opens in the browser, save it (⌘S) to your Downloads folder
  4. Save the patch
    a. Click the link for the patch under the Quick start heading
    (Direct link:
    http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/WinBUGS14_cumulative_patch_No3_06_08_07_RELEASE.txt)
    b. If the .txt file opens, save it, too
  5. Configure your Wineskin app
    a. Go to your opened Wineskin app, click on the + to install an engine
    b. Click “Download and Install” to get the most recent engine
    c. Click “OK” at the dialog box
    d. Click “Update” to update Wineskin, and “OK” at the dialog box
    e. Click “Create New Blank Wrapper” to do exactly that
    f. Type in WinBUGS to name your new wrapper
    g. Allow Wineskin to accept incoming network connections
    h. Click “Install” at the Wine Mono Installer
    i. Click to view wrapper in Finder
    j. Drag your WinBUGS wrapper to your Applications folder
    k. Click on the WinBUGS wrapper to open it
  6. Install WinBUGS in Wineskin
    a. Click on “Install Software”
    b. Click “Choose Setup Executable”
    c. Select the WinBUGS14.exe file from your Downloads folder
    d. The WinBUGS installation wizard will now open; click “Next>” until installation is complete, then click “Finish”
    e. Wineskin will ask you which executable file to use; select WinBUGS14.exe and click “OK”
  7. Run WinBUGS
    a. Click on your WinBUGS wrapper in your Applications folder to open WinBUGS
  8. To add the key and patch to WinBUGS, follow the directions in each .txt file

Statistical armageddon / scientific postmodernism

Idea: as our scientific precision increases, we need to proportionally (or exponentially) expand our capacity to compute the effects we care about.

Example: in order to definitively, statistically detect a difference between males and females in the incidence of heart attacks, we might need, say, a total sample size of 100 (50 people in each group, so that our statistical calculations are robust enough to support solid conclusions). But as we learn more about heart attacks, we realize that there are more variables that affect them, like age, diet, physical activity, abdominal adiposity, educational attainment, socio-economic status, smoking, and alcohol consumption. Assuming each of these variables is dichotomized into only two groups (young, old; healthy diet, unhealthy diet; active, inactive; etc.) and that 50 people in each group is still enough to detect a true difference for each variable (which is unrealistic), we would now need 25,600 people to tease out the effects of all of these different variables. With more nuanced categories, this number climbs very quickly. We know a lot about heart attacks nowadays, but there is still unexplained variation in the effects we see, which means there are other things we’re not measuring and accounting for.
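
As a quick sanity check of the arithmetic (sex plus eight dichotomized covariates gives 2^9 = 512 strata at 50 people each), and to see how fast it grows with more nuanced categories:

```python
# Back-of-the-envelope stratum counting for the heart-attack example above.
people_per_stratum = 50
n_variables = 9                      # sex + 8 dichotomized covariates

strata_2_levels = 2 ** n_variables   # 512
print(strata_2_levels * people_per_stratum)   # 25,600 participants

# Same variables with three levels each instead of two:
strata_3_levels = 3 ** n_variables   # 19,683
print(strata_3_levels * people_per_stratum)   # 984,150 participants
```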

Now think of whole-genome research, where we are handling something like 125 megabytes of information from just one person. Then think of the entire genome of one person’s microbiome. How many people would we need then? More than exist on Earth.

My prediction is that soon we’ll realize that the more we know, the less we can continue to learn. We will be reduced to underpowered tests of small questions. We will have hit an upper limit on what we can isolate or definitively know about anything, and there will be nothing we can do about it. (But before that, our computers will not have enough computing power, and even before that, we will never have enough money to even begin one sound study of these proportions.) We’ll get to the point where we must resign ourselves to not knowing what we want to know. Science will become postmodern, accepting that we can’t do what our methods set out to do. It will be a kind of scientific Armageddon, having arrived at the limits of statistical possibility.

Just a thought.

In the meantime I’m now wondering if I can use Bayesian data analysis to estimate the prevalence of diabetes while accounting for error in diagnosis. “Live by the harmless untruths that make you brave and kind and healthy and happy.”
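
For the diabetes idea, here is a rough sketch of the kind of model I have in mind: with an imperfect diagnostic test, the apparent prevalence is prev·Se + (1 − prev)·(1 − Sp), so a Bayesian analysis can put priors on sensitivity and specificity and back out the true prevalence. All the numbers, priors, and the importance-resampling shortcut below are invented for illustration.

```python
# Sketch: true prevalence from an imperfect test, via importance resampling.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n_tested, n_positive = 1000, 140          # hypothetical survey result

draws = 20000
prev = rng.uniform(0, 1, draws)           # flat prior on true prevalence
se = rng.beta(80, 20, draws)              # sensitivity prior, centred near 0.80
sp = rng.beta(95, 5, draws)               # specificity prior, centred near 0.95

# Probability that a sampled person tests positive, given prevalence and test accuracy
p_apparent = prev * se + (1 - prev) * (1 - sp)

# Weight each prior draw by the likelihood of the observed data, then resample
weights = binom.pmf(n_positive, n_tested, p_apparent)
weights /= weights.sum()
posterior = rng.choice(prev, size=5000, p=weights)

print("true prevalence (2.5%, 50%, 97.5% percentiles):",
      np.percentile(posterior, [2.5, 50, 97.5]))
```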

Semester 2

Today began the second semester of epi, and I’m already in love with the Data Analysis course. In it we learn how to analyze data using both frequentist and Bayesian approaches. The former is the dominant, standard approach, and in sum it’s arbitrary, clunky, and susceptible to manipulation. The latter is a more informed approach to analysis that takes all prior knowledge into consideration; this makes it more informative and rigorous, but also harder to do. Frequentist analysis is what I’ve learned since high school; Bayes is what I now get to get my hands dirty with. I can’t wait!
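
A tiny side-by-side of the two approaches on a single proportion, just to make the “prior knowledge” point concrete (the data and the prior below are invented):

```python
# Frequentist vs Bayesian estimate of a single proportion, with made-up data.
import numpy as np
from scipy.stats import norm, beta

events, n = 12, 40                        # e.g. 12 of 40 patients with some outcome

# Frequentist: point estimate and 95% Wald confidence interval
p_hat = events / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
print("frequentist:", p_hat, p_hat + np.array([-1, 1]) * norm.ppf(0.975) * se)

# Bayesian: Beta(4, 16) prior (prior belief that the proportion is around 0.20),
# conjugate update, and a 95% credible interval from the posterior
a0, b0 = 4, 16
post = beta(a0 + events, b0 + n - events)
print("bayesian:   ", post.mean(), post.ppf([0.025, 0.975]))
```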

Bayes vs Frequentist