cultivating & crashing

an organic collection of notes, observations, and thoughts

Tag: biostats

Semester 2

Today began the second semester of epi, and I’m already in love with the Data Analysis course. In it we learn how to analyze data using both frequentist and Bayesian approaches. The former is the dominant and most standard approach, and in sum it’s arbitrary, clunky, and susceptible to manipulation. The latter is a more informed approach to analysis that takes all prior knowledge into consideration; this makes it more informative and rigorous but also harder to do. Frequentist analysis is what I’ve learned since high school; Bayes is what I get to get my hands dirty in now. I can’t wait!

Bayes vs Frequentist

Notes on Bayesian adaptive design, aka the future

Notes from a biostats seminar today by Kristine Broglio, M.S., statistical scientist at Berry Consultants. I was expecting another biostatistics talk where I would follow 10% and glaze over aggressive-looking letter salads of the Greek alphabet the rest of the time. Instead, I heard about THE FUTURE. This is what randomized controlled trials will look like in the future, I am sure. It’s too spectacular not to.

fixed design is based on the math we were good at doing 200 years ago

– adaptive randomization (randomize to different groups)
– statistical modeling

can chase random highs by randomizing more patients to the arm that looks best; the extra data smooth out random deviations, so more patients end up treated in the best arm
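A minimal sketch of what "randomizing more patients to the arm that looks best" can mean in practice. This is Thompson sampling with flat Beta(1, 1) priors, which is my own choice of illustration and not necessarily what was shown in the seminar; the true response rates, the patient count, and the function name `adaptive_trial` are all made up. Real adaptive trials typically add things like a burn-in period and limits on how lopsided the allocation can get.

```python
import random

def adaptive_trial(true_rates, n_patients, seed=0):
    """Simulate response-adaptive randomization (Thompson sampling).

    true_rates are hypothetical response probabilities, unknown to the 'trial'.
    Returns the number of patients allocated to each arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_patients):
        # Posterior for each arm is Beta(1 + successes, 1 + failures);
        # draw one plausible response rate per arm.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        # Assign the next patient to the arm with the highest draw.
        arm = draws.index(max(draws))
        # Observe the (simulated) patient outcome and update the counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical three-arm trial with 300 patients
allocation = adaptive_trial([0.3, 0.5, 0.7], n_patients=300)
print(allocation)
```

An arm that gets lucky early attracts more patients, and those extra patients are exactly what corrects the estimate if the early high was just noise.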

Roger Perlmutter (Merck executive): we do 21st century biology in our laboratories and then do clinical trials that Hippocrates would have been comfortable with

would you rather be the last person enrolled in a trial or the first person to receive treatment?
Bayesian adaptive design shortens the gap between the two options.
would you rather be the first person enrolled in a trial or the last person enrolled in the same trial?
clearly the latter.

are able to determine which arm of treatment is most likely to come out as the winner during interim analyses.
can stop more trials earlier because trial is more efficient
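One way an interim analysis could estimate which arm is most likely to win, sketched under assumptions: binary outcomes, flat Beta(1, 1) priors, and interim counts that I invented for illustration. The function name `prob_best` is hypothetical too.

```python
import random

def prob_best(successes, failures, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(arm has the highest response rate) per arm."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(n_draws):
        # One joint draw from the independent Beta posteriors
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]

# Hypothetical interim data: responders and non-responders in each arm
probs = prob_best(successes=[6, 10, 15], failures=[14, 10, 5])
print([round(p, 3) for p in probs])
```

A trial protocol could then pre-specify a rule such as "stop early if one arm's probability of being best passes some threshold"; the threshold itself would be a design decision, not something these notes specify.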

bayesian is a natural way to think about adaptation, and it is how drs think anyway. but adaptive design is not limited to bayesian methods; frequentist analyses can also use this design.

clinicians loved it, patients loved it, commercial statisticians loved it. only academic statisticians were not sure about it.

uses almost no priors, or really non-informative ones

on biostats and habits

in final week of biostats, and, as usual, it’s pretty exciting (once I stop procrastinating and sit down to my lectures).

last week we saw how admissions to UC Berkeley’s graduate programs, when aggregated into male/female, accepted/not accepted groups, seemed to expose a glaring bias towards accepting males (i.e., a higher proportion of male applicants was accepted than of female applicants). as it turns out, it only looks like men were favoured in applications: most women were applying to programs that had lower acceptance rates, whereas men applied to ones easier to get into. when comparing the acceptance rates program by program, women and men were about equal, and in one case women were actually favoured. this illustrates the importance of isolating confounding variables/effects. in order to do this, you stratify the data by the confounder, compare the effect within each stratum, verify that the stratum-specific effects are similar, then apply relative weights and pool them to see if the overall effect concurs with the partial effects.
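A toy version of this reversal (usually called Simpson's paradox). The counts below are invented to make the pattern obvious and are not the Berkeley figures; the pooling step uses each program's share of all applicants as the weight, which is one simple choice among several.

```python
# Hypothetical counts per program: (applicants, admitted)
applications = {
    "easy program": {"men": (80, 48), "women": (20, 13)},
    "hard program": {"men": (20, 4), "women": (80, 18)},
}

def crude_rate(sex):
    """Acceptance rate ignoring which program people applied to."""
    applicants = sum(p[sex][0] for p in applications.values())
    admitted = sum(p[sex][1] for p in applications.values())
    return admitted / applicants

def stratum_rate(program, sex):
    """Acceptance rate within a single program."""
    applicants, admitted = applications[program][sex]
    return admitted / applicants

def standardized_rate(sex):
    """Weight each program's rate by that program's share of all applicants."""
    sizes = {name: sum(n for n, _ in p.values())
             for name, p in applications.items()}
    total = sum(sizes.values())
    return sum(stratum_rate(name, sex) * sizes[name] / total
               for name in applications)

for name in applications:
    print(name, {sex: stratum_rate(name, sex) for sex in ("men", "women")})
print("crude", {sex: crude_rate(sex) for sex in ("men", "women")})
print("standardized", {sex: standardized_rate(sex) for sex in ("men", "women")})
```

In the crude comparison men look strongly favoured, yet within each program women do slightly better, and the weighted pooled rates agree with the within-program picture.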

in this week’s lessons the prof explains why odds ratios (relative odds) are so popular/important: the odds ratio for a certain outcome given a certain risk factor is equal to the odds ratio for the risk factor given the outcome. below is a visual representation and the mathematical proof of why this is true. (this example is based on a study concerning infant mortality in the first 180 days of life and night blindness in the mothers.)

odds proof

the practical implication of this trait, called the invariance of the odds ratio, is that you can use case-control studies to approximate relative risk (the odds ratio is close to the relative risk when the outcome is rare). e.g., you can estimate the relative lung cancer risk of smoking by going to the hospital and counting up how many lung cancer patients smoked and didn’t smoke, and doing the same for comparable people without lung cancer. this is valuable because that kind of study is much simpler and often cheaper than working in the opposite direction.
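A quick numerical check of the invariance, using a made-up 2x2 table (not the data from the night blindness study). With a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without, both directions reduce to the same cross product: (a/b)/(c/d) = ad/(bc) = (a/c)/(b/d).

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# Hypothetical counts
a, b = 30, 70   # exposed: with outcome, without outcome
c, d = 10, 90   # unexposed: with outcome, without outcome

# Cohort direction: odds of the outcome, exposed vs unexposed
or_outcome_given_exposure = odds(a / (a + b)) / odds(c / (c + d))

# Case-control direction: odds of exposure, cases vs non-cases
or_exposure_given_outcome = odds(a / (a + c)) / odds(b / (b + d))

cross_product = (a * d) / (b * c)

# Relative risk, for comparison with the odds ratio
relative_risk = (a / (a + b)) / (c / (c + d))

print(or_outcome_given_exposure, or_exposure_given_outcome, cross_product)
print(relative_risk)
```

The two odds ratios agree, while the relative risk is a different number; the gap between them shrinks as the outcome becomes rarer.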

this is one of the instances where math (the art of tinkering with numbers) reveals some really cool shit about how the world works, which I find very exciting. data about the world are just sitting around, waiting to be seen in just the right way so as to reveal the answers to questions we care about quite a lot (does smoking cause cancer or not? does x easily-fixed risk factor cause death in impoverished people or not? are college admissions distorted by sexism?).

my master’s is definitely going to have to include statistics. in fact, my aim at this point is to do the highest level of stats possible given the (non-stats) background I have currently.

in other news, I had been obsessing about how I haven’t been running this summer. every time I got dressed in the morning, every time I looked at the musculature of the summer-exposed arms and legs of athletic people, every time I logged the exercise I’d done (yoga, 7-minute workout, bike 3hrs), I felt the smart of knowing that weeks kept going by of me not running on a regular basis, in a way that was not so much about what had not happened but a sinking feeling that running is just not something I can or will ever do. this is something that I do a lot; my default expectation in meeting people/embarking on projects/taking a class/learning a skill is failure. it takes me weeks of pushing forward blindly, unconvinced, before I will blink and realize I’m not only doing it, but doing it half decently. with the running, it took me realizing that I was committed to the idea that I simply will never be a runner, and so far this week I’ve gone running every other day. obviously it’s just the first week, but the difference is in how I feel about my capacity to do it, if I want to. as I said, this irrational automatic response is present in all aspects of my life, so I hope to counter it in other parts slowly; first realizing I’m doing it, then actively reminding myself that this is irrational because it is unfounded in reality, and then taking steps to show myself otherwise. if we are what we repeatedly do, then I will learn by painstakingly repeating this cycle of steps.

Gaussian distribution

I’m taking a Coursera course on biostatistics given by Scott Zeger of Johns Hopkins. It’s really interesting and it makes me want to do this in a more serious capacity. Other than that, yesterday I finally found an answer to a question I’ve had since my first statistics class back in 2006. We were being introduced to the normal distribution, which is a misnomer since there is actually nothing normal about it. Most people know it as the dreaded bell curve on which people are marked since grade school. It’s called the Gaussian distribution, and, according to Zeger, is the basis of almost all statistical analysis. My question was whether the shape of the distribution was what it was due to something uncannily and uniformly perfect throughout nature or whether it had something to do with computation that makes things fall into such a perfect curve. The prof had told me it just is that way, which made little sense and always bothered me. As it turns out, the answer lies in the central limit theorem, which concerns the means of samples of independent observations or measurements from a certain population. It is these means that are approximately normally distributed once the samples are large enough, and almost any distribution (no matter what shape, as long as its variance is finite) will produce sample means that follow the Gaussian distribution.

Here it is. It’s pretty.

gaussian distribution
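A small simulation of the idea, as a sketch: I picked the exponential distribution (heavily skewed, nothing bell-like about it) as the population, and the sample size of 50 and the 5000 repetitions are arbitrary choices of mine. The function name `sample_means` is made up.

```python
import random
import statistics

def sample_means(n_per_sample, n_samples, seed=0):
    """Means of repeated samples drawn from a skewed Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_sample))
        for _ in range(n_samples)
    ]

means = sample_means(n_per_sample=50, n_samples=5000)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a Gaussian, roughly 68% of values fall within one standard deviation
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

The population mean and standard deviation are both 1 here, so the theorem predicts sample means centred near 1 with a spread near 1/sqrt(50), and a histogram of `means` should look like the bell curve even though the raw data do not.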

On a separate note, I love the prof and the way he speaks, as well as the quirky details he weaves into his lectures. It reminds me of that romantic idea of life in the ivory tower that I used to idolise and want so much. (Or maybe I still do.) In any case, every time I listen to these lectures I feel that I’ve been wasting time studying social constructs when I could have been learning about how to understand and manipulate real-world data. But I feel like I could catch up in a small way if I spend a little bit of time every day messing around with datasets, putting them through R and visualising them, seeing what I can find, figuring out what I can do. It’s a start and it’s already exciting.