cultivating & crashing

an organic collection of notes, observations, and thoughts

Tag: r

How to stats right

This article in PLOS just became my blueprint for learning after I finish this Master’s. I want to spend as much time as I can learning how to do proper data analysis (both the stats and the process) so that I have the skills when I start looking for jobs because, ultimately, I want to get paid to read, analyze, and interpret numbers.

So I’m adding some items to my technical wishlist:

  1. GitHub for version control
  2. R Markdown
  3. knitr

Wish me luck!

Mac vs. PC

R graphic generated on PC

JCog PC

R graphic generated in Mac

JCog Mac

 

Did not know about this.

do not attach()

Note to self: DO NOT USE attach(). Do whatever it takes to avoid it.

I wound up making a mess of my variables, to the point where the variables of the dataframe I was using were masking themselves four times over.

In case this happens in the future, to get out of the mess:

  1. locate the code you used to create dataframe
  2. figure out how you’ve manipulated the variables since you attached the dataframe they came from, so you can re-run these transformations later and get back to what you were supposed to be doing in the first place instead of messing up your Global Environment
  3. search() to look at what you currently have sitting in your path. you will see copies of the attached dataframe. the more copies, the more silly you were
  4. rm() variables that should be IN the dataframe that you cavalierly created outside of it, like a numbnuts
  5. detach(dataframe) to package it back up
  6. rm(dataframe) to delete the whole goddamn thing (one iteration of it, that is)
  7. repeat steps 3-5 until you see no more iterations of the dataframe in your path
  8. re-create dataframe using code from step 1
  9. correct all code to include dataframe$ before variable names
  10. repeat all of your manipulations from step 2
  11. resist the temptation to attach() dataframes ever again
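
A minimal sketch of the cleanup and of what to do instead, assuming a small made-up data frame (the name `dataframe` and its columns `x` and `y` are hypothetical stand-ins, not my real data). The loop detaches every copy on the search path in one go, and the last two lines show the `dataframe$` prefix and `with()` as replacements for attach().

```r
# Hypothetical data frame standing in for the one that got attached
dataframe <- data.frame(x = 1:3, y = c(2, 4, 6))

# Recreate the mess: two copies on the search path
# (suppressMessages hides the "objects are masked" notes)
suppressMessages({
  attach(dataframe)
  attach(dataframe)
})

# Detach every copy, however many times it was attached
while ("dataframe" %in% search()) {
  detach("dataframe", character.only = TRUE)
}

# Safer ways to reach the columns from now on
dataframe$z <- dataframe$x + dataframe$y      # explicit dataframe$ prefix
ratio_mean <- with(dataframe, mean(y / x))    # with() scopes one expression
```

Neither of the last two lines puts anything on the search path, so there is nothing to clean up afterwards.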

Notes

# rename columns
> View(ghq_elsa)
> ghq_elsa <- setNames(ghq_elsa, c("ghqconc", "ghqsleep", "ghquse", "ghqdecis", "ghqstrai", "ghqover", "ghqenjoy", "ghqface", "ghqunhap", "ghqconfi", "ghqworth", "ghqha"))

# export to .txt file
> write.table(ghq_elsa, "c:/[path]/data.txt", sep="\t")

Mass shootings dataset

First small barplot on the mass shootings dataset. Frustrated that I don’t know why the Saturday label won’t appear.

barplot1

# obtaining counts for shootings per day
table(cur15$day)
   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
       35        31        72        68        28        31        31

# creating dataframe
weekday <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
count <- c(68, 31, 31, 31, 28, 35, 72)
wkday.count <- data.frame(weekday, count)

# barplot, days not in alphabetical order
library(ggplot2)
# set the factor levels so the bars follow calendar order
wkday.count$weekday <- factor(wkday.count$weekday, levels = weekday)
ggplot(wkday.count, aes(x = weekday, y = count)) +
  geom_bar(stat = "identity", fill = "dark grey", colour = "black", alpha = 1/3) +
  ggtitle("Mass shootings in 2015 (as of October 7)") + xlab("Day of the week") +
  ylab("Number of mass shootings")
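
A small base-R sketch of how factor levels control the bar order, and of one common way a category can silently drop out. This is not necessarily what happened with my Saturday label. The misspelled level "Saterday" is a deliberate, hypothetical typo for illustration.

```r
weekday <- c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")

# Every value has a matching level, and the order follows `levels`
ok <- factor(weekday, levels = weekday)
levels(ok)

# Hypothetical typo in the levels: "Saturday" has no match and becomes NA
bad_levels <- sub("Saturday", "Saterday", weekday)
bad <- factor(weekday, levels = bad_levels)
is.na(bad)
```

Checking `is.na()` on the factor before plotting is a quick way to see whether any value failed to match a level.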

Note to self

# count of non-NAs in dataframe “cog”
colSums(!is.na(cog))

# write first 50 rows of dataset to csv file
w2ktn <- w2[1:50, ]
View(w2ktn)
write.csv(w2ktn, "w2ktn.csv", row.names = TRUE)

# get column index from dataframe
which(colnames(w2ktn) == "TDIABET")

# drop column from dataframe
w2ktn$diab <- NULL
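
The same four operations on a tiny made-up data frame, so they can be run without my data. The name `demo` and its values are hypothetical, and only the `TDIABET` column name is borrowed from the notes above. Writing to a temporary file with `row.names = FALSE` is my own choice for the sketch.

```r
# Hypothetical stand-in for the cog / w2 data
demo <- data.frame(id = 1:4,
                   score = c(10, NA, 12, NA),
                   TDIABET = c(0, 1, NA, 0))

non_na <- colSums(!is.na(demo))            # non-missing count per column
idx <- which(colnames(demo) == "TDIABET")  # position of a column by name

first2 <- head(demo, 2)                    # same rows as demo[1:2, ]
first2$score <- NULL                       # drop a column

# row.names = FALSE keeps the 1, 2, ... row labels out of the csv
csv_file <- tempfile(fileext = ".csv")
write.csv(first2, csv_file, row.names = FALSE)
```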

R Notebook

IPython, but for R. I cannot wait to use this on my next data analysis assignment!

http://nbviewer.ipython.org/gist/msund/d3e00e5e27dff31a7b6d

R notes

I’ve downloaded swirl so as to begin learning to use R, and do a refresher on stats while I’m at it.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to clear your workspace.
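
A quick sketch of the same ls() and rm() idea, run inside a throwaway environment so it does not wipe the real workspace. The environment name `scratch` and the objects in it are made up for the example.

```r
# A throwaway environment, so the demo leaves the real workspace alone
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", "two", envir = scratch)

before <- ls(scratch)                    # names of everything in scratch
rm(list = ls(scratch), envir = scratch)  # same idea as rm(list = ls())
after <- ls(scratch)                     # now empty
```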

| When you are at the R prompt (>):
| — Typing skip() allows you to skip the current question.
| — Typing play() lets you experiment with R on your own; swirl will ignore what you do…
| — UNTIL you type nxt() which will regain swirl’s attention.
| — Typing bye() causes swirl to exit. Your progress will be saved.
| — Typing main() returns you to swirl’s main menu.
| — Typing info() displays these options again.

https://github.com/swirldev/swirl_courses

| For more information on something, type help.start() at the prompt, which will open a menu of resources.

 

To be continued

R workshop at McGill

I spent Sunday in an R workshop that was part of the Genomes to Biomes conference, put on by the Montreal R User Group, who did a fantastic job. Slides are here.

I don’t have much time but I want to add some graphics to shame myself into playing around with it until I can do more.

Before I forget, the <- symbol can be added with the [Option] + [-] shortcut.

From the City of Montreal’s Open Data Portal.

Image

The following is data about flowers, but the cool part is how simply the regressions are drawn in, as well as the standard error, which is visually represented by the grey area surrounding the linear regression line.

Image
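
A base-R sketch of what that line and band are, assuming the built-in iris data stands in for the flower data (I am not sure it is the same dataset) and assuming the band is a 95% confidence interval, which is the default that ggplot2's geom_smooth() draws. The dashed lines here mark the edges of the region that would be shaded grey.

```r
# Assumption: iris stands in for the flower data from the workshop
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Predict along a grid of x values, asking for the 95% confidence band
grid <- data.frame(Petal.Length = seq(min(iris$Petal.Length),
                                      max(iris$Petal.Length),
                                      length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Scatter plot, fitted line, and the band edges (dashed)
plot(iris$Petal.Length, iris$Petal.Width,
     xlab = "Petal length", ylab = "Petal width")
lines(grid$Petal.Length, band[, "fit"])
lines(grid$Petal.Length, band[, "lwr"], lty = 2)
lines(grid$Petal.Length, band[, "upr"], lty = 2)
```

The band is built from the standard error of the fitted values, so it is narrowest near the middle of the data and widens toward the ends.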

This is info on the sleeping habits of different animals. Later I can add the code for these graphs, and the raw data.

Image

Still have to figure out why I got this error: http://stackoverflow.com/questions/15285089/r-duplicate-row-names-are-not-allowed
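
One common way to trigger that error, sketched with a made-up inline csv. I have not confirmed this is what happened to me. When the header has one field fewer than the data rows, read.csv() uses the first column as row names, and any repeated value there is rejected.

```r
# Header has 2 fields but each data row has 3, so read.csv() treats the
# first column as row names; "x" appears twice, hence the error
txt <- "a,b\nx,1,2\nx,3,4\n"
res <- tryCatch(read.csv(text = txt), error = function(e) e)
inherits(res, "error")

# row.names = NULL forces plain 1, 2, ... row numbers instead
fixed <- read.csv(text = txt, row.names = NULL)
```

With `row.names = NULL` the file loads, but the column names end up shifted by one, so the header still needs checking.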

And this was good motivation: http://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/

And the girl next to me recommended I watch epi docs here: http://topdocumentaryfilms.com/

Siced about R

If I were more gifted, I’d write a little song or poem about learning R and how cool it is. But I’m not, so instead I’m just posting this guide, which I’ll be working on tomorrow when I’m on the train. Whoo!

Getting Started in R