Monday, October 20, 2014

Degrees of Choice: A modification of a WSJ graphic

This post uses ggplot2 and rCharts to modify a grouped bar chart from the article “How to Sell a Liberal-Arts Education”, which appeared online on the Wall Street Journal’s site on October 19, 2014. The code for the Rmarkdown file that generated this post can be found on github.

Please click this link to visit the post (http://patilv.com/WSJMod/).



Wednesday, September 17, 2014

Animated choropleths to visualize mortality rates of children under 5 and gender differences using rMaps

This post displays two animated choropleths: one for global mortality rates for children under 5 (per 1000 live births), and a second for the difference in global mortality rates between male and female children under 5 (per 1000). Please click here: http://patilv.com/MortalityUnder5/

Using great circles and ggplot2 to map arrival/departure of 2014 US Open Tennis Players

Please click on the image for information on how to use R and ggplot2 to generate this plot. 

Tuesday, June 17, 2014

Studying Ted Talks, Anscombe's Quartet, and Modern Languages Enrollment

While the feed from a newer github/jekyll blogging platform (patilv.github.io) is registered with blog aggregators, here are snippets of three posts that were recently published at the new site. Please click on the titles to visit the corresponding page.


A recent article on openculture.com by Dan Colman mentioned a list of 1756 Ted Talks maintained by “someone” in a spreadsheet format. A link to this sheet can also be found on this page on Wikipedia. It was titled “Ted Talks as of 5/23/2014”. I downloaded that spreadsheet on 6/12/2014 from this link and saved it as a csv file. It turned out to be a list of 1755 talks. Here, I make a wordcloud of the titles of these talks and a few ggplots to identify speakers with 3 or more appearances, using Karthik Ram’s Wes Anderson palette for R. The code and data for this post can be found on my github site at this link.


Anscombe’s quartet is a set of four datasets, each with two variables (x and y) and 11 observations. It has been used to demonstrate the importance of graphically displaying data. It has appeared not only in books (for example, on the first page of the first chapter of Tufte’s seminal work, Visual Display of Quantitative Information), but also in scholarly papers (for example, see Healy and Moody, 2014), and blog posts (for example, see Hirst). Here, I use ggvis in the shiny environment to play with the quartet. The code for the post and the accompanying shiny app can be found on my github site.
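
As a quick illustration of why the quartet is used this way, here is a minimal sketch in base R. It is not the ggvis/shiny code from the post; it only assumes the `anscombe` data frame that ships with R (columns x1 to x4 and y1 to y4), and the name `quartet_stats` is made up for this example.

```r
# Summary statistics for the four x/y pairs in R's built-in `anscombe`
# data frame (columns x1..x4 and y1..y4, 11 rows).
# `quartet_stats` is a hypothetical name used only for this sketch.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,          # which of the four datasets
    mean_x = mean(x),
    mean_y = mean(y),
    var_x  = var(x),
    cor_xy = cor(x, y)   # Pearson correlation between x and y
  )
}))

# One row per dataset, rounded for easier comparison
print(round(quartet_stats, 2))
```

The four rows should come out nearly identical, even though scatterplots of the four pairs look very different, which is the point of plotting the data rather than relying on summaries alone.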


Published first at KD Nuggets. This was an extension of my earlier post on Modern Languages Enrollments in the US. In that post, I used data from MLA surveys of enrollments in institutions of US higher education between 1983 and 2009 and found that enrollments in Indian languages were low compared to enrollments in 10 other languages, besides English. These 10 languages were French, German, Italian, Japanese, Spanish, Arabic, Chinese, Korean, Portuguese, and Russian. In this extension, I used data from 22 survey years since 1958, the first year for which the modern languages enrollment database provides data, to study the pattern and number of students enrolling in these 11 languages.

Friday, May 9, 2014

Enrollments in US in Different Languages using rCharts and ggplot2


UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1pi5z8l. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE.

Sunday, May 4, 2014

Monday, February 10, 2014

Scraping Pro-Football Data and Interactive Charts using rCharts, ggplot2, and shiny

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1k0mKWI. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE.

This post uses pro-football (American) boxscore data from 1966 through 2013 and generates a few interactive charts using rCharts, ggplot2 and shiny. It also gave me my first exposure to the power of dplyr. Data for these charts were scraped from the excellent reference site, pro-football-reference.com, using a function written in R. (This site has been used previously by other bloggers as a source for their data as well. See here and here for two examples.) The rest of this post has been created using slidify. The code for this post and relevant data are available at github. The code for the shiny application can be found here on github. (shiny is amazing; thanks to the RStudio team, and a big thanks to Ramnath Vaidyanathan for his support on rCharts.)

Friday, January 17, 2014

Animated choropleths using animation, ggplot2, rCharts, googleVis and Shiny to visualize violent crime rates in different US States across 5 decades

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1jccIBN. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE.

This post uses animated choropleths to visualize violent crime rates in different US States across 5 decades (1961-2010). Data are retrieved from Quandl. The rCharts- and googleVis-based choropleths are animated in the shiny server environment, and the ggplot2-based choropleths are animated using the animation package. [Even though using shiny for the ggplot2-based choropleths would've greatly reduced the effort required, using the animation package made the effort worthwhile.] The rest of this post is generated using slidify, and the code for doing so can be found on github. [In an update on 19 Jan, I had to move the shiny app to a different server because of some "technical glitches" with the previous server. All should be well now with the revised code and server.]