intercept and slope = ordenada al origen y pendiente (in Spanish)
Yesterday I went to an amazing seminar by Madhu Pai on open access publishing. Everyone doing research should consider open access publishing, including asking non-open access journals to publish your article under an open access license.
Today I learned about factor analysis, and it strikes me that 1. statisticians are just humans like the rest of us (including being not so smart sometimes), and 2. math and statistics are still the best things ever and I should do a Master’s in stats one day.
Multiple imputation is my new favourite thing. I still need to figure out how Stata manages (pretends?) to do it, or if it’s just executing an approximation of the Bayesian process of iteratively drawing samples from distributions of alpha and the beta(s) in the model. To be continued.
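To see what that iterative process might look like in practice, here is a minimal sketch of multiple imputation by chained equations in Python. This is not Stata's actual implementation, just an approximation of the same idea using scikit-learn's IterativeImputer with posterior sampling; the dataset and variable names are invented for illustration.

```python
# A minimal sketch of multiple imputation by chained equations,
# roughly the same idea as Stata's `mi impute chained`. The data
# and column names here are made up for illustration.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Fake dataset: outcome y depends on x1 and x2; x2 has missing values.
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
x2[rng.random(n) < 0.3] = np.nan  # ~30% of x2 missing at random
data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Create m completed datasets; sample_posterior=True draws each imputed
# value from an (approximate) posterior predictive rather than plugging
# in the conditional mean.
m = 20
estimates = []
for i in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    completed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
    fit = LinearRegression().fit(completed[["x1", "x2"]], completed["y"])
    estimates.append(np.r_[fit.intercept_, fit.coef_])

# Rubin's rules: the pooled point estimate is the average across imputations.
pooled = np.mean(estimates, axis=0)
print("pooled intercept and slopes:", pooled.round(2))
```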
The journal Basic and Applied Social Psychology just announced that it will no longer publish p-values, test statistics, confidence intervals, or any other null hypothesis testing procedures. This is pretty extreme, and equally exciting. From the little that I understand, Bayesian analysis is slowly but surely gaining ground against frequentist analysis, and this is one more (rather large) step in that direction.
Read the article here.
IPython, but for R. I cannot wait to use this on my next data analysis assignment!
This is my new favourite thing.
By the authors of Bayesian Data Analysis, a paper published in The American Statistician entitled “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant.”
A guide for non-geeks
Idea: As our scientific precision increases, it is necessary to proportionally (or exponentially) expand our capacities for computing the effect. Example: in order to definitively detect a statistical difference between males and females in the incidence of heart attacks, we might need, say, a total sample size of 100 (50 people in each group, so that our statistical calculations are robust enough to draw solid conclusions from). But as we learn more about heart attacks, we realize that there are more variables that affect them, like age, diet, physical activity, abdominal adiposity, educational attainment, socio-economic status, smoking, and alcohol consumption. Assuming each of these variables is dichotomized into only two groups (young, old; healthy diet, unhealthy diet; active, inactive; etc.) and that 50 people in each group is still enough to detect a true difference in each variable (which is unrealistic), we would now need 25,600 people to tease out the effects of all of these different variables (the arithmetic is sketched below). With more nuanced categories, this number climbs very quickly. We know a lot about heart attacks nowadays, but there is still unexplained variation in the effects we see, which means there are other things we're not measuring and accounting for.
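A quick back-of-the-envelope check of that arithmetic (the 50-per-cell figure is just the placeholder from above):

```python
# The arithmetic from the paragraph above: 50 people per cell, 2 sexes,
# and 8 additional dichotomized variables.
per_cell = 50
cells = 2 * 2**8             # sex times 8 two-level variables = 512 cells
print(per_cell * cells)      # 25600

# With, say, 3 categories per variable instead of 2, the number explodes:
print(per_cell * 2 * 3**8)   # 656100
```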
Now think of whole-genome research, where we are handling roughly 125 megabytes of information from a single person. Now think of the genome of one person's entire microbiome. How many people would we need then? More than exist on Earth.
My prediction is that soon we'll realize that the more we know, the less we can continue to learn. We will be reduced to underpowered tests of small questions. We will have hit an upper limit of what we can isolate or definitively know about anything, and there will be nothing we can do about it. (Though before that, our computers won't have enough computing power, and even before that we will never have enough money to even begin one sound study of these proportions.) We'll get to the point where we must resign ourselves to not knowing what we want to know. Science will become postmodern, accepting that we can't do what our methods set out to do. It will be a kind of scientific Armageddon, having arrived at the limits of statistical possibility.
Just a thought.
In the meantime, I'm now wondering whether I can use Bayesian data analysis to estimate the prevalence of diabetes while accounting for error in diagnosis; a rough sketch of that idea is below.

“Live by the harmless untruths that make you brave and kind and healthy and happy.”
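As a rough sketch of how that prevalence idea could work: if a diagnostic test has a known (or assumed) sensitivity and specificity, the apparent prevalence is a mixture of true and false positives, and a posterior for the true prevalence falls out of Bayes' theorem. All the numbers below (1,000 people screened, 120 positives, sensitivity 0.85, specificity 0.95) are made up for illustration.

```python
# Bayesian estimate of true prevalence from an imperfect diagnostic test,
# using a simple grid approximation with a flat prior. Numbers are invented.
import numpy as np

n, y = 1000, 120        # screened, tested positive
se, sp = 0.85, 0.95     # assumed sensitivity and specificity

# Grid over true prevalence pi with a flat prior.
pi = np.linspace(0, 1, 2001)
p_apparent = pi * se + (1 - pi) * (1 - sp)   # P(test positive | true prevalence)
log_lik = y * np.log(p_apparent) + (n - y) * np.log(1 - p_apparent)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

mean = (pi * post).sum()
cdf = post.cumsum()
lo, hi = pi[cdf.searchsorted(0.025)], pi[cdf.searchsorted(0.975)]
print(f"posterior mean prevalence: {mean:.3f}, 95% credible interval ({lo:.3f}, {hi:.3f})")
```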
Today began the second semester of epi, and I'm already in love with the Data Analysis course. In it we learn how to analyze data using both frequentist and Bayesian approaches. The former is the dominant, standard approach, and in sum it's arbitrary, clunky, and susceptible to manipulation. The latter is a more informed approach that takes prior knowledge into account, which makes it more informative and rigorous but also harder to do (a toy comparison is sketched below). Frequentist analysis is what I've learned since high school; Bayes is what I get to get my hands dirty in now. I can't wait!
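For a feel of the contrast, here is a toy example estimating a single proportion both ways. The data (12 events out of 40) and the Beta(8, 32) prior standing in for "prior knowledge" are invented for illustration.

```python
# A toy contrast between frequentist and Bayesian estimation of a proportion.
import numpy as np
from scipy import stats

y, n = 12, 40
p_hat = y / n

# Frequentist: point estimate plus a 95% Wald confidence interval.
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(f"frequentist: {p_hat:.3f}, 95% CI ({wald[0]:.3f}, {wald[1]:.3f})")

# Bayesian: combine the data with a Beta(8, 32) prior (prior mean 0.20).
# By conjugacy the posterior is Beta(8 + y, 32 + n - y).
post = stats.beta(8 + y, 32 + n - y)
print(f"bayesian:    {post.mean():.3f}, 95% credible interval "
      f"({post.ppf(0.025):.3f}, {post.ppf(0.975):.3f})")
```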