Related Open Courses:
Stanford - Mining Massive Data Sets: covers many data mining techniques, with little project work; the programming it does include uses Hadoop.
Stanford - Project in Mining Massive Data Sets: big datasets and computational infrastructure (a large MapReduce cluster) are provided by course staff. Heavily based on Hadoop; uses AWS EC2 for implementation.
UW - Introduction to Data Science on Coursera: good material for learning the Twitter streaming and search APIs.
JHU - Exploratory Data Analysis on Coursera: R-based course with good illustrations of data visualization principles.
MongoDB - Data Wrangling with MongoDB on Udacity: good introductory material for MongoDB, based on Python.
Introduction to Machine Learning
Python Resources:
Python documentation
Python online programming practice for beginners
Sthurlow's Python for beginners.
Hadoop Resources:
AWS EC2
Cloudera Hadoop Quickstart Virtual Machine: you need VirtualBox installed to use the virtual machine.
Data Sets:
UCI archive
AWS public data
APIs with OAuth to extract specific data from Twitter, LinkedIn, Facebook, Google
Free Weather Data, US Government Open Data (powered by CKAN API), Space Science Data from NASA.
A list of Open Data powered by SODA APIs. You can download open data from cities such as New York and Chicago, but unfortunately not Philadelphia.
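As a rough illustration of how a SODA endpoint is typically queried, the sketch below only builds a request URL and makes no network call. The domain `data.example.org` and dataset id `abcd-1234` are hypothetical placeholders, and the helper name `build_soda_url` is made up for this example; `$limit` and `$where` are standard SODA query parameters.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, limit=5, where=None):
    """Build a SODA query URL for a dataset (no request is sent)."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep "$" unescaped so the SODA parameter names stay readable.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Hypothetical domain and dataset id, for illustration only.
url = build_soda_url("data.example.org", "abcd-1234", limit=3)
filtered_url = build_soda_url(
    "data.example.org", "abcd-1234", limit=3, where="year > 2010"
)
print(url)
print(filtered_url)
```

With a real portal domain and dataset id substituted in, the resulting URL can be opened in a browser or passed to any HTTP client to fetch JSON rows.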
Data Visualization:
Hands-on tutorial on static visualization with the Matplotlib and Seaborn packages.
Interactive visualization with the Python Plotly package.
Others:
Kaggle hosts data mining competitions quite often.
Many data analysis examples and books published as Python notebooks.