Names You Need to Know in 2011: R Data Analysis Software

Updated Aug 11, 2011, 12:34pm EDT

Simply put by one of its staunchest advocates, "R is the most powerful statistical computing language on the planet; there is no statistical equation that cannot be calculated in R."

Beyond "just" a language, R is a toolset, a community, and a lot of free software.

"Everyone can, with open source R," Norman Nie says in Quentin Hardy's article, "afford to know exactly the value of their house, their automobile," their current business and prospects. Nie has built a successful business providing services and support for R. (Thanks to community member johnkolchak for this correction.)

Ross Ihaka and Robert Gentleman, then both at the University of Auckland in New Zealand, created the R Project informally in the early 1990s. The R Core Team, currently at 19 members, is responsible for the development of the basic R software.

This post is part of an ambitious project to crowdsource the December issue of Forbes Magazine. It's based on a suggestion to "Names You Need To Know" by one of our community members (Hat Tip, Kurt Grela). R is rapidly augmenting or replacing other statistical analysis packages at universities; it is being written about in The Register, The New York Times, and Forbes; and it is exposing data analysis to millions of do-it-yourselfers. All of that makes R a Name You Need to Know in 2011.

The community of users and developers around R is 2 million strong. A significant body of their work on R is freely available on The Comprehensive R Archive Network. Once in people's hands, R can be put to any number of ends.

Dataspora has, for example, used R to analyze every MLB pitcher and pitch of the 2008 season. Here you can see Mariano Rivera's tight distribution, location and velocity. And here you see how the same information for Tim Wakefield makes you want to send his catchers a care package.

You can imagine how Theo Epstein could (does?) put R to use.

Facebook has used R to figure out that "just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile information."

For non-data-wonk professionals, R can make pretty pictures of complex information. Anyone capable of creating a spreadsheet can load that same sheet into R and begin playing with plots, charts, and graphs. This should not be undervalued: a management presentation fraught with data (think: ad page sales by geography; segmented by industry; year-to-date; vs. last year-to-date) is a lot more palatable when there's a stunning infographic in your deck.

It's not that this type of analysis wasn't possible before: statisticians have existed, and commercial software has been available to support them, for decades. What is different is that R is free to use and free to modify, and that its source is open to view, extend and improve. That means students, stock traders-in-training and fantasy football junkies can familiarize themselves with the software. They can write programs against it. They're likely to continue that usage into their professional lives. When they share their work, the community benefits down the line. And the virtuous cycle strengthens.

R is also making incursions into the domain of the dominant non-free players in statistical software, such as SAS, Stata and SPSS (which was created by Nie and is now part of IBM). It's another leveling of the playing field. It's competition making the market vital. Being free and open source makes software more widely available. The potential user pool gets deeper and wider. More comers means more experts.

If you have graphs and charts you can link to in the comments, please do. Success stories, and other kinds, are welcome.


UPDATE: Thanks to community (i.e., your!) participation, this R piece is going to be part of the "Names You Need to Know" feature in a December issue of Forbes Magazine. We're looking for infographics, plots, etc. to use in the print piece. If you would like to be considered and can give us permission to use your images, please add your recommendations in the comments below by Tuesday morning (11.23). Images need to be as high-resolution as possible. Thank you.