Lisa Green on Open Data

by Sofia

Notes from Lisa Green’s talk chez Montreal Girl Geeks.

YES Montreal Women in Tech event on 30 Oct

– It’s a data world! We have a lot of computation power, but only certain people are allowed to have data, and storage is not cheap enough yet for big computing.
– the more people working on a problem, the faster it gets solved

gratis ………. commercial

libre ………. proprietary

– NHS took leap of faith and published Rx info for England. w/in weeks, someone found out how to save UK some millions of dollars by tracking Drs who prescribed a brand name drug that was proven to be no more effective than its generic form.

Common Crawl
semantic web guy, Gil Elbaz, Applied Semantics
google has a copy of the web!
predict flu trends, economic simulations, etc. better than the lead people in the field
Gil creates Common Crawl to emulate Google’s copy of the web
idea: the web is an incredibly valuable corpus of data, and everyone should have it
~300TB of data, ~8 billion pages
like internet archive, but accessible.
commoncrawl.org/get-started
common crawl makes this data available to research, individuals, education, and commercial entities (e.g., makes it cheaper for startups to start. this is worthwhile! promotes economic diversity)

other examples of things that have been done
– web data commons, in germany. nature of metadata. what’s being adopted? why? who is using RDFa?
– zyxt labs, inc. found that 22% of web pages contain FB urls. 8% of web pages implement Open Graph tags. matthew berk. “how much has FB infected the web?” now in social search.
– data publica, mapping french websites related to open data
– sentiment analysis (but does not detect sarcasm… yet)
– plagiarism detection
– concept mapping
– machine translation
– mozilla study of web design
– product index
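
The FB figures in the list above come down to scanning pages and counting how many match. A rough sketch of that kind of count on a handful of HTML strings follows. This is not the method zyxt labs used: the class name `PageScanner`, the function `scan_pages`, and the matching rules (a `facebook.com` substring in `href`/`src`, a `meta` tag whose `property` starts with `og:`) are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Hypothetical helper: flags Facebook URLs and Open Graph tags in one page."""

    def __init__(self):
        super().__init__()
        self.has_fb_url = False
        self.has_og_tag = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: a page "contains an FB url" if any href/src mentions facebook.com.
        for name in ("href", "src"):
            value = attrs.get(name) or ""
            if "facebook.com" in value.lower():
                self.has_fb_url = True
        # Assumption: Open Graph tags look like <meta property="og:...">.
        if tag == "meta" and (attrs.get("property") or "").lower().startswith("og:"):
            self.has_og_tag = True


def scan_pages(pages):
    """Return the fraction of pages with a Facebook URL and with an Open Graph tag."""
    total = len(pages)
    if total == 0:
        return {"facebook_url": 0.0, "open_graph": 0.0}
    fb = og = 0
    for html in pages:
        scanner = PageScanner()
        scanner.feed(html)
        fb += scanner.has_fb_url
        og += scanner.has_og_tag
    return {"facebook_url": fb / total, "open_graph": og / total}
```

On the real corpus the same per-page check would have to run over billions of pages, which is where the tools in the next section come in.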

hadoop – can process data by the TB
amazon gives free storage to open data
github repository of open source code
map reduce for the masses – steve salevan
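
For anyone who has not seen map reduce, the idea behind the tools above fits in a few lines. This is a toy, single-machine sketch in plain Python, not Hadoop's API and not the code from the talk or from steve salevan's post. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: collapse all the counts for one word into a total."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = []
    for text in documents.values():
        pairs.extend(map_phase(text))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

The map and reduce steps only ever see one document or one key at a time, so a framework like hadoop can spread them across many machines and work through data by the TB.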

open data needs you!
everyone needs to code, she says. discoverability is a big issue. open data is a fledgling, not guaranteed to succeed yet. we need to protect it. and we need to spread the word that it’s not scary.

lisa green @boudicca

open data is very much in line with broadening human knowledge, which is the ultimate purpose of science.

open access movement needs to be supported so that academic knowledge can be open, too.

– netflix prize: gave prize to person who came up with a better algorithm for what people will like. good model, shows commercial entities the value of open data.
– kaggle

revenue models need to be different.

LinkedIn pays people to build Apache, because they use the tool, but then the public has access to it.

oer – open educational resources

reputation and reach on the internet are convertible to money on the internet. cory doctorow and jonathan someone, a photographer, who made much more money with a creative commons image than with all the ones he copyrighted.
