Temple University - CIS 3715 Spring 2018 Data Science

Related Open Courses:

Stanford - Mining Massive Data Sets: covers many data mining techniques, with less emphasis on projects and programming; programming assignments use Hadoop.
Stanford - Project in Mining Massive Data Sets: big datasets and computational infrastructure (a large MapReduce cluster) are provided by the course staff. The course relies heavily on Hadoop and uses AWS EC2 for implementation.
UW - Introduction to Data Science on Coursera: good material for learning the Twitter Streaming and Search APIs.
JHU - Exploratory Data Analysis on Coursera: an R-based course with good illustrations of data visualization principles.
MongoDB - Data Wrangling with MongoDB on Udacity: good introductory material on MongoDB, based on Python.
Introduction to Machine Learning

Python Resources:

Python documentation
Python online programming practice for beginners
Sthurlow's Python for beginners.

Hadoop Resources:

AWS EC2
Cloudera Hadoop Quickstart Virtual Machine: you need VirtualBox installed to use the virtual machine.
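To make the MapReduce model behind Hadoop concrete before setting up EC2 or the Cloudera VM, here is a minimal word-count sketch in plain Python. It is a single-process simulation, not Hadoop code: under Hadoop Streaming the mapper and reducer would be separate scripts exchanging tab-separated lines over stdin/stdout, and the framework would perform the shuffle/sort step across the cluster. The function names `mapper`, `reducer`, and `run_local` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum counts per word.

    Assumes pairs arrive sorted by key, which Hadoop's shuffle/sort guarantees.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (hypothetical helper)."""
    mapped = mapper(lines)
    shuffled = sorted(mapped, key=itemgetter(0))  # stands in for Hadoop's shuffle
    return dict(reducer(shuffled))
```

For example, `run_local(["the cat", "The dog"])` counts "the" twice because the mapper lowercases tokens. Splitting map and reduce into pure functions keeps the logic easy to test locally before porting it to a real cluster.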

Data Sets:

UCI archive
AWS public data
APIs with OAuth for extracting specific data from Twitter, LinkedIn, Facebook, and Google
Free Weather Data, US Government Open Data (powered by CKAN API), Space Science Data from NASA.
A list of open data portals powered by the SODA API. You can download open data for cities such as New York and Chicago, though not Philadelphia, unfortunately.
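SODA portals expose each dataset as a JSON endpoint that accepts SoQL query parameters such as `$select`, `$where`, `$order`, and `$limit`. The sketch below builds such a query URL and fetches the rows with the standard library only. It assumes the common `https://<domain>/resource/<dataset-id>.json` endpoint layout; the dataset IDs used here are hypothetical placeholders, so look up a real ID on the portal before calling `fetch_rows`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def soda_query_url(domain, dataset_id, select=None, where=None, order=None, limit=None):
    """Build a SODA resource URL with optional SoQL parameters (assumed layout)."""
    params = {}
    if select is not None:
        params["$select"] = select
    if where is not None:
        params["$where"] = where
    if order is not None:
        params["$order"] = order
    if limit is not None:
        params["$limit"] = limit
    base = f"https://{domain}/resource/{dataset_id}.json"
    # urlencode percent-encodes "$" as %24; the server decodes it back.
    return base + ("?" + urlencode(params) if params else "")


def fetch_rows(url, timeout=30):
    """Download the query result and parse it as a list of JSON records."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For instance, `soda_query_url("data.cityofnewyork.us", "abcd-1234", limit=5)` yields a URL for the first five rows of a (hypothetical) NYC dataset; pass that URL to `fetch_rows` to download them.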

Data Visualization:

Hands-on tutorial on static visualization with the Matplotlib and Seaborn packages.

Interactive visualization with the Python Plotly package.

Others:

Kaggle hosts data mining competitions quite often.

Many data analysis examples and books published as Python notebooks