Topics: Predictive Analytics, Big Data, Data Science, Analytics, Data Visualization
Course Numbers:
- 38740, 38741
Class Listing: ITWS 4600/ITWS 6600
Instructor: Professor Peter Fox and Professor Greg Hughes
TA: Dave Ward - wardd4 at rpi dot edu
Meeting times: MR 2-3:50
Class Location: LALLY HALL 102
Office Hours: Monday 1-2pm Winslow 2120 or by appointment in Lally 207A
phone: x4862
TA Office Hours: by appointment
Syllabus/ Calendar
Refer to Reading/ Assignment/ Reference list for each week (see below).
Reference material (available through RPI library - RCS login required):
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (online) (RECOMMENDED)
- Big Data Analytics : Turning Big Data into Big Money
- Big Data Analytics : Turning Big Data into Big Money (online)
- Big Data Analytics : From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (online)
- Big Data Analytics with R and Hadoop (online)
- R for Everyone: Advanced Analytics and Graphics (online)
Group 1 - Intro/ Setup
- Week 1 (Jan. 19): Introduction to Course, Case Studies, and Preview of Course Material Week 1 Thursday slides [Download], Introduction/ refresher on basic statistics Week 1 Thursday slides [Download], Assignment 1 [Download]
- Week 2 (Jan. 23/26): Starting with Data and Information Resources, Role of Hypothesis, Synthesis and Model Choices Week 2 slides to go with video recording [Download] view video here, R/ RStudio bootcamp Week 2 R bootcamp slides and Group 1/ Lab 1 slides, Week 2 Thursday lab slides [Download]
- Week 3 (Jan. 30/Feb 2): Introduction to Analytic Methods, Types of Data Mining for Analytics Week 3 slides [Download], Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab) Week 3 Thursday slides [Download]
Group 2 - Patterns, relations, descriptive analytics
Group 3 - Predictive Analytics
Group 4 - Evaluating and validating, prescriptive analytics
- Week 4 (Feb. 6/9): Weighted kNN, Clustering, early decision trees Week 4 Monday slides [Download], Exercises for linear regression, kNN and K-means (lab), trees, plotting Week 4 Thursday slides [Download]
- Week 5 (Feb. 13/16): Interpreting: Regression, Weighted kNN, Clustering, and Bayesian Inference Week 5 Monday slides [Download], Exercises for clustering, plotting, bayesian inference (lab) Week 5 Thursday slides [Download]
- Week 6 (Feb. 21/23): Bayesian Inference, Decision Trees and Cross-Validation Week 6 Tuesday slides [Download] (lab), lab for trees Week 6 Thursday slides [Download]
- Week 7 (Feb. 27/Mar. 2): Dimension reduction and scaling, Support Vector Machines Week 7 Monday slides [Download], Lab for DR, MDS, SVM Week 7 Thursday lab [Download]
- Week 8 (Mar. 6/9): Factor Analysis Week 8 Monday slides [Download], SVM, Dimension Reduction, MDS, Factor Analysis lab Week 8 Thursday slides [Download]
- Mar. 13/17 - no classes - Spring Break
- Week 9 (Mar. 20/24): Interpreting PCA, MDS, DR, and FA Week 9 Monday slides [Download], lab for FA and DR Week 9 Thursday slides [Download]
- Week 10 (Mar. 27/30): Boosting, Bootstrapping, Bagging Week 10 Monday slides [Download], Boosting, Bootstrapping, Bagging (lab) Week 10 Thursday slides [Download]
- Week 11 (Apr. 3/6): Cross-validation, Revisiting Regression - local methods Week 11 Monday slides [Download], Lab - Cross-validation, Regression - local methods and continue project and assignment work Week 11 Thursday slides [Download]
- Week 12 (Apr. 10/13): Local Regression ctd., Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 12 Monday slides [Download], Lab - Local Regression ctd., Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 12 Thursday slides [Download]
- Week 13 (Apr. 17/Apr 20): TBD Week 13 Monday slides [Download], Open Lab and continue project and assignment work - Assignment 7 due (no slides)
- Week 14 (Apr. 24/ 27): Final Project Presentations
Reading/ Assignment/ Reference List (see above)
Class 1 Reading Assignment:
- Sports Analytics – Moneyball (http://www.imdb.com/title/tt1210166/)
- Nate Silver (http://en.wikipedia.org/wiki/Nate_Silver)
- http://www.slideshare.net/lsakoda/case-studies-utilizing-real-time-data-...
- http://www.marketquotient.com/case-studies.html
- http://www.ibm.com/analytics/us/en/case-studies/
Class 2 Reading Assignment: prior to Thursday class
Class 3 Reading Assignment: prior to Monday class
- http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
- http://en.wikipedia.org/wiki/Regression_analysis
- http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- http://varianceexplained.org/r/kmeans-free-lunch/
- http://en.wikipedia.org/wiki/K-means_clustering
Classes 4-5 Reading Assignment: none
Class 6 Reading Assignment:
- http://stat-www.berkeley.edu/users/breiman/RandomForests/ Random Forests
Class 7 Reading Assignment:
- http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf Karatzoglou et al. 2006
- http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf Vert SVM basic
- http://aquarius.tw.rpi.edu/html/DA/svmdoc.pdf SVM documentation
- http://202.141.160.110/CRAN/web/packages/e1071/vignettes/svmdoc.pdf SVM documentation (updated 2017)
- http://www.stjuderesearch.org/site/data/ALL1/ ALL dataset
- http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html RSVM
- http://data-informed.com/focus-predictive-analytics/
Classes 8-9 Reading Assignment: none
Classes 10-13 Reading Assignment: none
Course goals:
- Introduce students to relevant analytic methods so that they can recognize and apply quantitative algorithms and techniques and interpret the results.
- Develop students' strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
- Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- By the end of the course, students can effectively communicate analytic findings to non-specialists
Course Learning Objectives:
- Students will demonstrate knowledge of relevant analytic methods, recognize and apply quantitative algorithms and techniques, and interpret results.
- Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
- Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- Students must effectively communicate analytic findings to non-specialists.
- [graduate level] Students must develop and demonstrate a working knowledge of decision making under uncertainty, and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.
Course: Data Analytics