Class Listing: ITWS 4600/ITWS 6600/ MATP 4450/ CSCI 4960
Instructor: Professor Peter Fox
TA: Akshay Vyas - vyasa at rpi dot edu
Meeting times: TF 10-11:50
Class Location: Lally 104
Office Hours: By appointment, Winslow 2120 or Lally 207A
Phone: N/A
TA Office Hours: By appointment
Syllabus/ Calendar
Refer to Reading/ Assignment/ Reference list for each week (see below).
Reference material (available through RPI library - RCS login required):
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (online) (RECOMMENDED)
- Big Data Analytics: Turning Big Data into Big Money
- Big Data Analytics: Turning Big Data into Big Money (online)
- Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (online)
- Big Data Analytics with R and Hadoop (online)
- R for Everyone: Advanced Analytics and Graphics (online)
Group 1 - Intro/ Setup
- Week 1 (Aug. 31): Introduction to Course, Case Studies, and Preview of Course Material Week 1 Friday slides [Download], Assignment 1 [Download]
- Week 2 (Sep. 4/7): Introduction/ refresher on basic statistics Week 2 Tuesday slides [Download], Starting with Data and Information Resources, Role of Hypothesis, Synthesis and Model Choices Week 2 slides [Download], R/ RStudio introduction and lab Week 2 Friday R intro slides [Download]
- Week 3 (Sep. 11/14): Introduction to Analytic Methods, Types of Data Mining for Analytics Week 3 Tuesday slides [Download], Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab) Week 3 Friday slides [Download]
Group 2 - Patterns, relations, descriptive analytics
Group 3 - Predictive Analytics
Group 4 - Evaluating and validating, prescriptive analytics
- Week 4 (Sep. 18/21): Weighted kNN, Clustering, early decision trees Week 4 Tuesday slides [Download], Exercises for linear regression, kNN and K-means (lab), trees, plotting Week 4 Friday slides [Download]
- Week 5 (Sep. 25/28): Interpreting: Regression, Weighted kNN, Clustering, and Bayesian Inference Week 5 Tuesday slides [Download], Exercises for clustering, plotting, bayesian inference (lab) Week 5 Friday slides [Download]
- Week 6 (Oct. 2/5): Assignment 5 presentations (Tuesday and Friday)
- Week 7 (Oct. 12): No lecture on Tuesday. Lab: weighted kNN, decision trees, random forest Week 7 Friday slides [Download]
- Week 8 (Oct. 16/19): Cross-Validation Week 8 Tuesday slides [Download], Trees, Dimension Reduction and Multi-Dimensional Scaling Week 8 Friday slides [Download]
- Week 9 (Oct. 23/26): Support Vector Machines Week 9 Tuesday slides [Download], Lab for Trees, DR, MDS, SVM Week 9 Friday slides [Download]
- Week 10 (Oct. 30/Nov. 2): Factor Analysis Week 10 Tuesday slides [Download], Factor Analysis lab Week 10 Friday slides [Download]
- Week 11 (Nov. 6/9): Interpreting PCA, MDS, DR, and FA Week 11 Tuesday slides [Download], Boosting, Bootstrapping, Bagging Week 11 Tuesday slides [Download], Boosting, Bootstrapping, Bagging (lab) Week 11 Friday slides [Download]
- Week 12 (Nov. 13/16): Cross-validation, Revisiting Regression - local methods Week 12 Tuesday slides [Download], Lab - Cross-validation, Regression - local methods and continue project and assignment work Week 12 Friday slides [Download]
- Week 13 (Nov. 20): Local Regression continued, Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 13 Tuesday slides [Download]
- Week 14 (Nov. 27/30): Prior Lab Review, Hierarchical Linear and Mixed Models, Latent Class Mixed Models Week 14 Tuesday slides [Download], Lab - Local Regression continued, Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 14 Friday slides [Download] Assignment 7 due
- Week 15 (Dec. 4/7): Final Project Presentations
Reading/ Assignment/ Reference List (see above)
Class 1: Reading Assignment:
- Sports Analytics – Moneyball (http://www.imdb.com/title/tt1210166/)
- Nate Silver (http://en.wikipedia.org/wiki/Nate_Silver)
- http://www.slideshare.net/lsakoda/case-studies-utilizing-real-time-data-...
- http://www.marketquotient.com/case-studies.html
- http://www.ibm.com/analytics/us/en/case-studies/
Class 2 Reading Assignment: prior to Thursday class
Class 3 Reading Assignment: prior to Monday class
- http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
- http://en.wikipedia.org/wiki/Regression_analysis
- http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- http://varianceexplained.org/r/kmeans-free-lunch/
- http://en.wikipedia.org/wiki/K-means_clustering
Classes 4-5 Reading Assignment: none
Class 6 Reading Assignment:
- http://stat-www.berkeley.edu/users/breiman/RandomForests/ Random Forests
Class 7 Reading Assignment:
- http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf Karatzoglou et al. 2006
- http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf Vert SVM basic
- http://aquarius.tw.rpi.edu/html/DA/svmdoc.pdf SVM documentation
- http://202.141.160.110/CRAN/web/packages/e1071/vignettes/svmdoc.pdf SVM documentation (updated 2017)
- http://www.stjuderesearch.org/site/data/ALL1/ ALL dataset
- http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html RSVM
- http://data-informed.com/focus-predictive-analytics/
Classes 8-9 Reading Assignment: none
Classes 10-13 Reading Assignment: none
Course goals:
- Introduce students to relevant analytic methods so that they can recognize and apply quantitative algorithms and techniques and interpret the results
- Develop students' strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making
- Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems
- Examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science
- Enable students, by the end of the course, to communicate analytic findings effectively to non-specialists
Course Learning Objectives:
- Students will demonstrate knowledge of relevant analytic methods, recognize and apply quantitative algorithms and techniques, and interpret results.
- Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
- Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- Students must effectively communicate analytic findings to non-specialists.
- [graduate level] Students must develop and demonstrate a working knowledge of decision making under uncertainty and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.
Course: Data Analytics