Class Listing: ITWS 4600 / ITWS 6600 / MATP 4450 / CSCI 4960 / MGMT 4962 / MGMT 6962
Course Numbers: 84772, 85232, 84773, 85250, 83230, 85431, 85433, 85434
Instructor: Thilanka Munasinghe
TA: Atish Jain - jainm2 at rpi dot edu
Meeting times:
Section 1: Location: VOORHEES SOUTH - Time: 10:00am - 11:50am
Section 2: Location: GREENE 120 - Time: 2:00pm - 3:50pm
Instructor Office Hours: Tue/Fri 12:30pm – 1:30pm or by appointment via email
Instructor Office Location: Amos Eaton 133
TA Office Hours: by appointment
Syllabus/ Calendar
Refer to Reading/ Assignment/ Reference list for each week (see below).
Reference material (available through RPI library - RCS login required):
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (online) (RECOMMENDED)
- Big Data Analytics: Turning Big Data into Big Money (online)
- Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (online)
- Big Data Analytics with R and Hadoop (online)
- R for Everyone: Advanced Analytics and Graphics (online)
- An Introduction to Statistical Learning with Applications in R - 7th Edition
Group 1 - Intro/ Setup
- Week 1 (Aug. 30): Introduction to Course, Case Studies, and Preview of Course Material.
- Assignment 1
- Week 2 (Sept. 03: No classes/ Sept. 06): Introduction/ refresher on basic statistics, Starting with Data and Information Resources, Role of Hypothesis, Synthesis and Model Choices, R/ RStudio introduction and lab.
- Week 3 (Sept. 10/ Sept. 13): Introduction to Analytic Methods, Types of Data Mining for Analytics, Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab)
- (Lab) Assignment 2
Group 2 - Patterns, relations, descriptive analytics
- Week 4 (Sept. 17/ Sept. 20): Weighted kNN, Clustering, early decision trees, Exercises for linear regression, kNN and K-means (lab), trees, plotting
- Assignment 3
- Week 5 (Sept. 24/ Sept. 27): Interpreting: Regression, Weighted kNN, Clustering, and Bayesian Inference; Exercises for clustering, plotting, Bayesian inference (lab)
- Assignment 4
- Assignment 5
- Week 6 (Oct. 1/ Oct. 4): Assignment 5 presentations (Tuesday and Friday)
- Assignment 6
- Week 7 (Oct. 8 / Oct. 11): Lab weighted kNN, decision trees, random forest
Group 3 - Predictive Analytics
- Week 8 (Oct. 15/ Oct. 18): Cross-Validation Trees, Dimension Reduction and Multi-Dimensional Scaling
- Week 9 (Oct. 22/ Oct. 25): Support Vector Machines, Lab for Trees, DR, MDS, SVM
- Week 10 (Oct. 29/ Nov. 01): Factor Analysis, Factor Analysis lab
- Week 11 (Nov. 05/ Nov. 08): Interpreting PCA, MDS, DR, and FA; Boosting, Bootstrapping, and Bagging (lecture and lab)
- Assignment 7
Group 4 - Evaluating and validating, prescriptive analytics
- Week 12 (Nov. 12/ Nov. 15): Cross-validation, Revisiting Regression - local methods, Lab - Cross-validation, Regression - local methods and continue project and assignment work
- Week 13 (Nov. 19/ Nov. 22): Local Regression (continued), Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant)
- Week 14 (Nov. 26/ [Nov. 29: No Classes - Thanksgiving recess]): Prior Lab Review, Hierarchical Linear and Mixed Models, Latent Class Mixed Models, Lab, Assignment 7 due
- Week 15 (Finals Week Dec. 16 - Dec. 20): TBA: Final Project Presentations
Reading/ Assignment/ Reference List (see above)
Class 1 Reading Assignment:
- Sports Analytics: Moneyball (http://www.imdb.com/title/tt1210166/)
- Nate Silver (http://en.wikipedia.org/wiki/Nate_Silver)
- http://www.slideshare.net/lsakoda/case-studies-utilizing-real-time-data-...
- http://www.marketquotient.com/case-studies.html
- http://www.ibm.com/analytics/us/en/case-studies/
Class 2 Reading Assignment: prior to Thursday class
Class 3 Reading Assignment: prior to Monday class
- http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
- http://en.wikipedia.org/wiki/Regression_analysis
- http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- http://varianceexplained.org/r/kmeans-free-lunch/
- http://en.wikipedia.org/wiki/K-means_clustering
Classes 4-5 Reading Assignment: none
Class 6 Reading Assignment:
- Random Forests (http://stat-www.berkeley.edu/users/breiman/RandomForests/)
Class 7 Reading Assignment:
- Karatzoglou et al. 2006 (http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf)
- Vert, SVM basics (http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf)
- SVM documentation (http://aquarius.tw.rpi.edu/html/DA/svmdoc.pdf)
- SVM documentation, updated 2017 (http://202.141.160.110/CRAN/web/packages/e1071/vignettes/svmdoc.pdf)
- ALL dataset (http://www.stjuderesearch.org/site/data/ALL1/)
- RSVM (http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html)
- http://data-informed.com/focus-predictive-analytics/
Classes 8-9 Reading Assignment: None
Classes 10-13 Reading Assignment: None
Course goals:
- Introduce students to relevant analytic methods, and teach them to recognize and apply quantitative algorithms and techniques and to interpret the results.
- Develop students' strategic thinking skills, combined with a solid technical foundation in data- and model-driven decision-making.
- Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- By the end of the course, students will be able to communicate analytic findings effectively to non-specialists.
Course Learning Objectives:
- Students will demonstrate knowledge of relevant analytic methods, and will recognize and apply quantitative algorithms and techniques and interpret the results.
- Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data- and model-driven decision-making.
- Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- Students must communicate analytic findings effectively to non-specialists.
- [graduate level] Students must develop and demonstrate a working knowledge of decision making under uncertainty, and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.
Course: Data Analytics