Wednesday, July 17, 2013

Lilac Bloomsday Run in Graphs

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1obWRer. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. 

The Lilac Bloomsday Run is a 7.46-mile (12 km) race held annually in Spokane, Washington. Don Kardong started the run in 1977, and 51,613 people registered for it in 2013. In this post, I use R to collect data from the run's website, clean and organize them, and present some information about the run and its history in graphs generated with ggplot2. The markdown file with the code and output can be found here. The code by itself, along with the raw data files needed to replicate everything, can be found on github. A pdf document with all pages from their website relevant for these graphs can be found here.


2013 Geographic representation of participants




* People from Spokane participate enthusiastically, and there is international representation as well.
Gender Composition over Years
* The run is no longer male-dominated: female finishers now make up the higher proportion.
Presidents/Directors of the Run over Years
In alphabetical order by first name
In order of number of times Presidents/Directors served
Number of People Registered and Finished in the General Category
General Category Winners, All Years

Wheelchair Winners, All Years
Winners in All Categories, All Years

The Perennials - Finish times of Runners Who have Finished Bloomsday Run Every Year Since It Began
* It is the spirit of participating and finishing that counts.
Top Performers in Each Age Group, Since 1983
--------------------------------------------------------------------------

Wednesday, July 10, 2013

Visualizing a tiny slice of India's demographics with information from Wikipedia

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1ib8wTl. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE.

This post presents a tiny slice of a complex and diverse India using charts. (Data retrieval from Wikipedia on 9 July, 2013 and the analysis were performed using R; charts were generated using ggplot2, googleVis and wordcloud. More information can be found in the code used for this analysis at github.)

Objective
Present information regarding the following visually, using charts.
1. Official languages, sex ratio, percentage (or ratio) of urban to total population and literacy rates across different states and union territories (UTs) of India.
2. Different religions of the country and demographic variables, such as literacy rates and sex ratios.

Data 
Data on variables with respect to states and UTs were downloaded from this page on Wikipedia (tables related to States and Union Territories of India). Data on different religions and demographic variables were downloaded from this page on Wikipedia (table 2, Census Information for 2001). Since these pages can change, copies of them were downloaded, and a pdf version of both pages can be found in this document.

The following tables were arrived at after data were cleaned. (You can click on the header of any column to arrange the table based on that column.)

States and Union Territories of India (with some demographic information)


Religions in India (with some demographic information)


Visualizing Demographics

Let's focus on the data presented in the first table above.

Literacy Rate (in %) of different states and UTs
(You can hover over the map to get corresponding values)
You can click on the graph below regarding literacy rates to enlarge it.
Sex Ratio of different states and UTs
Sex ratio is the number of females per 1,000 males.
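
As a minimal sketch of this definition (written in Python rather than the R used for this post; the function name and the counts below are made up for illustration and are not census figures):

```python
def sex_ratio(females: int, males: int) -> float:
    """Return the number of females per 1,000 males."""
    if males <= 0:
        raise ValueError("males must be positive")
    return 1000 * females / males


# Hypothetical counts, not actual census data.
print(sex_ratio(females=470, males=500))  # 940.0
```

A value below 1,000 means there are fewer females than males in the population being described.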

(You can hover over the map to get corresponding values)
You can click on the graph below regarding sex ratio to enlarge it.
Urban to Total Population Percent in Different States
(You can hover over the map to get corresponding values)

You can click on the graph below regarding Urban to Total Population Percent to enlarge it.
Number of Official Languages in different States and UTs (graph can be clicked to enlarge)
Which Languages are more popular (are designated as an "Official Language" by different states and UTs)? (Graph and wordcloud can be clicked to enlarge them)
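
One way to arrive at such a popularity count is to tally, for each language, how many states and UTs list it as official. The sketch below is in Python rather than the R used for this post; the region and language names are placeholders, and `language_popularity` is a hypothetical helper, not the code behind the graphs.

```python
from collections import Counter

# Hypothetical mapping of region -> official languages (placeholder names, not real data).
official_languages = {
    "Region A": ["Language X", "Language Y"],
    "Region B": ["Language X"],
    "Region C": ["Language X", "Language Z"],
}


def language_popularity(languages_by_region):
    """Count how many regions designate each language as official."""
    counts = Counter()
    for languages in languages_by_region.values():
        # set() ensures a language listed twice for one region is counted once.
        counts.update(set(languages))
    return counts


popularity = language_popularity(official_languages)
print(popularity.most_common(1))  # [('Language X', 3)]
```

The same counts could feed both a bar chart and a word cloud, since each needs only a language and its frequency.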
Relationship between Sex Ratio, Urban to Total Population Percent, Literacy Rate, and Number of Official Languages (these graphs can be clicked to enlarge them)

For scaling purposes in the graph below, Sex Ratio was divided by 10, making it the number of females per 100 males. In response to a comment (see below): the states in the graph below are arranged in increasing order of sex ratio (Daman and Diu lowest, Kerala highest).
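
The rescaling and ordering described here can be sketched as follows. This is an illustrative Python version, not the R code used for the graph; the region names and ratios are placeholders rather than census values, and `scale_and_sort` is a hypothetical helper.

```python
# Hypothetical (region, females per 1,000 males) pairs; placeholder values, not census data.
sex_ratios = [("Region A", 950), ("Region B", 620), ("Region C", 1080)]


def scale_and_sort(rows):
    """Convert females per 1,000 males to females per 100 males, sorted ascending."""
    scaled = [(name, ratio / 10) for name, ratio in rows]
    return sorted(scaled, key=lambda row: row[1])


for name, per_100 in scale_and_sort(sex_ratios):
    print(name, per_100)
```

Dividing by 10 puts the sex ratio on roughly the same 0 to 100 range as the percentage variables, so all of them can share one axis.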

Correlation Matrix


The correlation coefficients do not attain significance at an alpha of .05, so these data provide no evidence of a linear relationship between these variables.
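
For readers who want to see what such a significance check involves, here is a minimal sketch in Python (not the R code used for this analysis). It computes a Pearson correlation and the corresponding t statistic with n - 2 degrees of freedom. The five data points are made up for illustration, and the critical value 3.182 is the standard two-tailed t value for alpha = .05 with 3 degrees of freedom, which applies only to this toy sample size.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: no linear correlation (undefined when |r| == 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up values for illustration only; not the data from the tables above.
literacy = [60.0, 70.0, 80.0, 90.0, 75.0]
urban = [30.0, 20.0, 45.0, 35.0, 50.0]

r = pearson_r(literacy, urban)
t = t_statistic(r, len(literacy))
T_CRIT = 3.182  # two-tailed critical value for alpha = .05 with df = 3

print(f"r = {r:.3f}, t = {t:.3f}, significant: {abs(t) > T_CRIT}")
```

In this toy sample the correlation is positive but does not reach significance. With few observations, a non-significant result means the test could not detect a linear relationship, not that one has been ruled out.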

Religion and Demographics


Let's take a brief look at data from the religion and demographics table in the two graphs below. (The graphs can be clicked to enlarge them.) For scaling purposes, in both graphs below, the sex-ratio variables were divided by 10, making them the number of females per 100 males.


-----------------------------------------------------------------------------------------------------------