Lab Assignment 1 (Due on Jan. 24 noon)

Description: This is a "Hello world!" lab. In this lab, you will go through a Python tutorial. You will learn about a couple of Python development environments, Python syntax, data types, control flow, functions, and advice about navigating Python. Please download the tutorial file CIS3715 (Temple - Spring 2018) PythonBasics - Lab1.ipynb, read it carefully, and run all of the code shown, trying to understand what is happening and why. Feel free to experiment with the code and follow the provided pointers or use Google search to learn more. After you are done with the tutorial, save the resulting ipynb file as lab1-(your last name).ipynb and submit it through Canvas. This completes your first lab assignment.


Lab Assignment 2 (Due on Jan. 31 noon)

Description: In this lab you will become familiar with using Python to perform Exploratory Data Analysis (EDA). The lab takes the form of a brief tutorial that illustrates how to load and visualize a tabular data set provided in csv format: cars.csv, one of the popular benchmark data sets in data science. You will be asked to go through the provided document, CIS3715 (Temple - Spring 2018) Lab2.ipynb, run the code, and answer a series of questions. The first 12 questions relate to the pieces of code provided to you. In the final question, you will be expected to produce a 2-page document that combines text and plots produced with Python to tell a coherent story about the data set. Submit the two files (the modified .ipynb and your .pdf) through Canvas.
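As a rough illustration of the load-then-summarize step, here is a minimal sketch that uses only the Python 3 standard library. The inline sample and its column names (name, mpg, cylinders) are assumptions made for the example; the real cars.csv may use different columns, and the lab notebook may rely on other tools.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for cars.csv; the real file's columns may differ.
SAMPLE_CSV = """name,mpg,cylinders
car_a,18.0,8
car_b,15.0,8
car_c,24.0,4
car_d,30.0,4
car_e,21.0,6
"""


def mean_mpg_by_cylinders(csv_file):
    """Group rows by cylinder count and average the mpg column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[int(row["cylinders"])].append(float(row["mpg"]))
    return {cyl: sum(vals) / len(vals) for cyl, vals in sorted(groups.items())}


# io.StringIO lets the sample text be read like an open file.
summary = mean_mpg_by_cylinders(io.StringIO(SAMPLE_CSV))
for cyl, avg in summary.items():
    print("{} cylinders: mean mpg = {:.1f}".format(cyl, avg))
```

With a real file, the same function could be called on the object returned by open("cars.csv", newline=""), provided the column names match.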


Lab Assignment 3 (Due on Feb. 7 noon)

Description: More EDA: improving expertise in loading, cleaning, and analyzing data. The objective of Lab 3 is for you to become more proficient in obtaining and working with different types of data. A particular emphasis will be on dealing with text data. Note: Please use Python 2.7 for this lab.


Homework 1 (Assigned Jan. 31. Due on Feb. 9 noon)

This is an extra-credit homework assignment worth 50% of a typical lab assignment. In this assignment, you will be asked to learn the basics of Tableau software for data visualization and apply your knowledge to create a web page with a visualization of the Auto MPG Data. The total estimated effort to accomplish this assignment is 5 hours.

Task 0: Spend a few minutes browsing the Tableau gallery at https://public.tableau.com/en-us/s/gallery to get an idea of what kinds of web pages can be produced with Tableau.

Task 1: Go to Tableau Public web page https://public.tableau.com/s/ and download the app. The app is available for Windows and Mac machines. This should not take more than a few minutes. Open the app.

Task 2: Go to the Tableau tutorial at https://public.tableau.com/en-us/s/resources. There are about 90 minutes of video lectures that will lead you all the way from loading different types of data into Tableau to publishing a web page with your own interactive data visualizations, similar to the examples you have seen in the Tableau gallery. Instead of just watching, you are asked to follow along by repeating everything you see in your own copy of the app. The total estimated time to accomplish this task is 3 hours.

Task 3: Using your knowledge, load the Auto MPG Data (https://CIS3715-temple-2017.github.io/cars.csv) you are already familiar with from your labs. Use what you learned to create at least 3 Tableau Sheets showing different views of the data. Then use those sheets to create a Tableau Dashboard and publish the dashboard as a web page. Provide 2-3 paragraphs explaining why you decided to create your dashboard the way it is and discussing what kinds of insights one could get from your visualization. The total estimated time to accomplish this is 2 hours.

Deliverables: Submit to Blackboard a one-page document containing the link to your Tableau web page and the few paragraphs that describe it.


Lab Assignment 4 (Due on Feb. 14 noon)

Description: In this Lab 4, you will improve your skills in scraping data from web pages, organizing the data in a desired format, and performing EDA. For this lab, we are reusing code from Harvard's CS109: https://github.com/cs109/2015/blob/master/Lectures/02-DataScrapingQuizzes.ipynb
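To give a flavor of the scrape-then-organize workflow, the sketch below parses a small HTML table with the standard library's html.parser. It is not taken from the CS109 notebook; the HTML fragment, the TableParser class, and the column names are all hypothetical, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real page would be fetched over HTTP first.
HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>1200</td></tr>
  <tr><td>Shelbyville</td><td>950</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


parser = TableParser()
parser.feed(HTML)

# Organize the scraped cells into records with typed values.
header, *body = parser.rows
records = [{"city": city, "population": int(pop)} for city, pop in body]
print(records)
```

Once the data is in a list of records like this, the EDA steps from the earlier labs apply unchanged.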


Lab Assignment 5 (Due on Feb. 21 noon)

Description: In this Lab 5 (datasets: d_temple, iris, documents, groupnames, newsgroups, wordlist), you will gain more experience with ranks and Singular Value Decomposition (SVD) and learn how to use SVD in data science.
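The connection between rank and SVD can be sketched on a toy matrix, assuming NumPy is installed. The matrix below is hypothetical and is not one of the lab's data sets; it is built so that its third column is the sum of the first two, which makes its rank 2.

```python
import numpy as np

# Hypothetical 4x3 matrix whose third column equals column 1 + column 2,
# so its rank is 2 rather than 3.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 3.0],
    [1.0, 3.0, 4.0],
])

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Numerical rank = number of singular values above a small tolerance.
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))

# Rank-1 approximation: keep only the largest singular value.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

print("singular values:", s)
print("numerical rank:", rank)
print("rank-1 approximation error:", np.linalg.norm(A - A1))
```

Keeping only the largest singular values is the idea behind using SVD for compression and dimensionality reduction; the approximation error equals the norm of the discarded singular values.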


Lab Assignment 6 (Due on Feb. 28 11:59pm)

Description: In this Lab 6, you will gain more experience with clustering. In particular, you will learn how to use two of the most popular clustering algorithms: Hierarchical Clustering and K-Means Clustering. Then, you will be asked to apply this knowledge on a document data set.
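As a small illustration of one of the two algorithms, here is a sketch of K-Means (Lloyd's algorithm) in plain Python. The points are made up, and initializing the centroids with the first k points is a simplifying assumption chosen to keep the example deterministic; library implementations usually initialize differently.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

For documents, the same procedure would run on numeric feature vectors (for example word counts) rather than 2-D points.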


Lab Assignment 7 (Due on Mar. 14 11:59pm)

Description: In this Lab 7, we will take our first steps in supervised learning. In particular, we will learn about the k-Nearest Neighbor (kNN) algorithm. kNN uses a simple idea: "you are what your neighbors are". This idea works quite well in data science. In the first part of the lab, we will cover some background needed to understand the kNN algorithm. In the second part, you will be asked to apply your knowledge to another data set.
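The "you are what your neighbors are" idea can be sketched in a few lines of plain Python. The function name knn_predict, the toy points, and the labels "a" and "b" are hypothetical; the lab itself may use a library implementation instead.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), label)
        for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the labels of the k closest points.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: two small groups of 2-D points.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 4.8)))
```

There is no training step here: the algorithm simply stores the labeled points and does all of its work at prediction time.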


Lab Assignment 8 (Due on Mar. 21 11:59pm)

Description: In this Lab 8, we will keep working on supervised learning. We will first learn how to train decision trees, and we will see that doing this with sklearn is not much different from running the kNN algorithm.


Lab Assignment 9 (Due on Apr. 4 11:59pm)

Description: In this Lab 9 (datasets: onetweet, smallNYC.json), we will learn how to read JSON files and how to perform exploratory analysis of Twitter data. For extra credit, you will learn how to use an API to download tweets on a topic of your choice.
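A minimal sketch of reading JSON records and summarizing them with the standard library follows. The inline records, the one-object-per-line layout, and the field names (text, user, screen_name) are assumptions made for the example; the lab's files may be structured differently.

```python
import json
from collections import Counter

# Hypothetical stand-in for a file of tweets, one JSON object per line.
# The field names are assumptions; the lab's files may be laid out differently.
RAW_LINES = [
    '{"text": "Morning run in #NYC", "user": {"screen_name": "alice"}}',
    '{"text": "Coffee first #NYC #coffee", "user": {"screen_name": "bob"}}',
    '{"text": "No tags here", "user": {"screen_name": "alice"}}',
]

# json.loads turns each JSON string into nested Python dicts and lists.
tweets = [json.loads(line) for line in RAW_LINES]

# Count hashtags (case-insensitive) across all tweets.
hashtag_counts = Counter(
    word.lower()
    for tweet in tweets
    for word in tweet["text"].split()
    if word.startswith("#")
)

# Count how many tweets each user posted.
tweets_per_user = Counter(tweet["user"]["screen_name"] for tweet in tweets)

print(hashtag_counts.most_common())
print(tweets_per_user.most_common())
```

When the data lives in a file, each line read from the open file can be passed to json.loads in the same way, or json.load can be used if the whole file is a single JSON document.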


Lab Assignment 10 (Due on Apr. 11 11:59pm)

Description: In this Lab 10 we will learn how to train convolutional and recurrent neural networks (CNN and RNN). The part with the CNN is for the regular credit. The part with RNN is for the extra credit (you can earn an additional 50% of the grade if you do it).

Pre-trained model: shakespear100.h5


Lab Assignment 11 (Due on Apr. 18 11:59pm)

Description: This Lab 11 will be a warm-up for your course project. You are asked to identify an interesting data set and perform data collection and cleaning, exploratory data analysis, and supervised learning. You should produce a 3-page report explaining the data and the kind of analysis you performed, and showing and discussing the main results.