Public Datasets

Update: The blog/site has moved to GitHub. The new link for the blog/site is patilv.github.io, and the link to this post is http://patilv.github.io/public-datasets/. Please update any bookmarks you may have.


Carnegie Mellon:
http://lib.stat.cmu.edu/datasets/
http://lib.stat.cmu.edu/DASL/

Searchable catalog of public datasets: http://3stages.org/idata/


Data mining: http://www.kdnuggets.com/datasets/


UC Irvine machine learning repository: http://archive.ics.uci.edu/ml/


Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan: http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp


General Social Survey: http://www3.norc.org/GSS+Website/


Inside-R, a listing of data sources in economics, finance, government, and many other areas: http://www.inside-r.org/howto/finding-data-internet


Suggestions for large datasets, posted in response to a question on Quora (free login may be required)

A PDF of the above discussion from Quora.

Quandl: numerous datasets on a range of topics


Healthcare: http://phpartners.org/health_stats.html

Transportation and logistics: http://www.rita.dot.gov/bts/data_and_statistics/index.html

Bureau of Labor Statistics: http://www.bls.gov/data/

US Census: http://censtats.census.gov/