Class Listing: ITWS 4600/ITWS 6600/ MATP 4450
Instructor: Professor Peter Fox
TA: Akshay Bhasin - bhasia at rpi dot edu
Meeting times: MR 2-3:50
Class Location: CARNEGIE 113 (Monday and Thursday) and LALLY HALL 102 (Thursday)
Office Hours: By appointment Winslow 2120 or Lally 207A
phone: N/A
TA Office Hours: by appointment
Syllabus/ Calendar
Refer to Reading/ Assignment/ Reference list for each week (see below).
Reference material (available through RPI library - RCS login required):
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (online) (RECOMMENDED)
- Big data analytics : turning big data into big money
- Big Data Analytics : Turning Big Data into Big Money (online)
- Big Data Analytics : From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (online)
- Big Data Analytics with R and Hadoop (online)
- R for Everyone: Advanced Analytics and Graphics (online)
Group 1 - Intro/ Setup
- Week 1 (Jan. 18): Introduction to Course, Case Studies, and Preview of Course Material Week 1 Thursday slides [Download], Assignment 1 [Download]
- Week 2 (Jan. 22/25): Introduction/ refresher on basic statistics Week 1 Monday slides [Download], Starting with Data and Information Resources, Role of Hypothesis, Synthesis and Model Choices Week 2 slides [Download], R/ RStudio introduction and lab Week 2 R intro slides [Download]
- Week 3 (Jan. 29/Feb. 1): Introduction to Analytic Methods, Types of Data Mining for Analytics Week 3 slides [Download], Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab) Week 3 Thursday slides [Download]
Group 2 - Patterns, relations, descriptive analytics
Group 3 - Predictive Analytics
Group 4 - Evaluating and validating, prescriptive analytics
- Week 4 (Feb. 5/8): Weighted kNN, Clustering, early decision trees Week 4 Monday slides [Download], Exercises for linear regression, kNN and K-means (lab), trees, plotting Week 4 Thursday slides [Download]
- Week 5 (Feb. 12/15): Interpreting: Regression, Weighted kNN, Clustering, and Bayesian Inference Week 5 Monday slides [Download], Exercises for clustering, plotting, bayesian inference (lab) Week 5 Thursday slides [Download]
- Week 6 (Feb. 20/22): Bayesian Inference, Decision Trees and Cross-Validation Week 6 Tuesday slides [Download] (lab), lab for trees Week 6 Thursday slides [Download]
- Week 7 (Feb. 26/Mar. 1): Dimension reduction and scaling, Support Vector Machines Week 7 Monday slides [Download], Lab for DR, MDS, SVM Week 7 Thursday lab [Download]
- Week 8 (Mar. 5/8): Factor Analysis Week 8 Monday slides [Download], SVM, Dimension Reduction, MDS, Factor Analysis lab Week 8 Thursday slides [Download]
- Mar. 12/16 - no classes - Spring Break
- Week 9 (Mar. 19/22): Lab for FA and DR Week 9 Thursday slides [Download], open lab (Monday), open lab for project work (Thursday)
- Week 10 (Mar. 26/29): Interpreting PCA, MDS, DR, and FA Week 10 Monday slides [Download], Boosting, Bootstrapping, Bagging Week 10 Monday slides [Download], Boosting, Bootstrapping, Bagging (lab) Week 10 Thursday slides [Download]
- Week 11 (Apr. 2/5): Cross-validation, Revisiting Regression - local methods Week 11 Monday slides [Download], Lab - Cross-validation, Regression - local methods and continue project and assignment work Week 11 Thursday slides [Download]
- Week 12 (Apr. 9/12): Local Regression ctd., Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 12 Monday slides [Download], Lab - Local Regression ctd., Mixed Models, Optimizing, Iterating, (Fisher Linear Discriminant) Week 12 Thursday slides [Download]
- Week 13 (Apr. 16/19): Prior Lab Review, Hierarchical Linear and Mixed Models, Latent Class Mixed Models Week 13 Monday slides [Download], Thursday - open lab and continue project and assignment work - Assignment 7 due (no slides)
- Week 14 (Apr. 23/26): Final Project Presentations
Reading/ Assignment/ Reference List (see above)
Class 1 Reading Assignment:
- Sports Analytics – Moneyball (http://www.imdb.com/title/tt1210166/)
- Nate Silver (http://en.wikipedia.org/wiki/Nate_Silver)
- http://www.slideshare.net/lsakoda/case-studies-utilizing-real-time-data-...
- http://www.marketquotient.com/case-studies.html
- http://www.ibm.com/analytics/us/en/case-studies/
Class 2 Reading Assignment: prior to Thursday class
Class 3 Reading Assignment: prior to Monday class
- http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
- http://en.wikipedia.org/wiki/Regression_analysis
- http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- http://varianceexplained.org/r/kmeans-free-lunch/
- http://en.wikipedia.org/wiki/K-means_clustering
Classes 4-5 Reading Assignment: none
Class 6 Reading Assignment:
- http://stat-www.berkeley.edu/users/breiman/RandomForests/ Random Forests
Class 7 Reading Assignment:
- http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf Karatzoglou et al. 2006
- http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf Vert SVM basic
- http://aquarius.tw.rpi.edu/html/DA/svmdoc.pdf SVM documentation
- http://202.141.160.110/CRAN/web/packages/e1071/vignettes/svmdoc.pdf SVM documentation (updated 2017)
- http://www.stjuderesearch.org/site/data/ALL1/ ALL dataset
- http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html RSVM
- http://data-informed.com/focus-predictive-analytics/
Classes 8-9 Reading Assignment: none
Classes 10-13 Reading Assignment: none
Course goals:
- Introduce students to relevant analytic methods so that they can recognize and apply quantitative algorithms and techniques and interpret the results.
- Develop students' strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
- Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- By the end of the course, students should be able to communicate analytic findings effectively to non-specialists.
Course Learning Objectives:
- Students will demonstrate knowledge of relevant analytic methods, recognize and apply quantitative algorithms and techniques, and interpret results.
- Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
- Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
- Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
- Students must effectively communicate analytic findings to non-specialists.
- [graduate level] Students must develop and demonstrate a working knowledge of decision making under uncertainty and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.
Data and information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is concerned less with individual analyses or analysis steps than with an entire methodology. The world at large is confronted with increasingly large and complex sets of structured and unstructured information from sensors, instruments, and computer simulations; data is "hidden" in websites, application servers, social networks, and mobile devices. As a nation, assimilating information across disparate domains (e.g., intelligence, economics, science) has the potential to provide improved capabilities for decision makers. In commerce and industry, analytics-driven enterprises are becoming mainstream. Yet there is a shortfall in the education and skills needed to meet the growing needs. Traditional enterprises are moving toward analytics-driven approaches for core business functions. In government and corporations, cybersecurity problems are prevalent. The investment in advanced analytics capabilities could potentially be more broadly leveraged today and greater than any prior government investments in computing. Emphasis is now placed on disruptive data and information sources on the Web and Internet: using Web Science and informatics to explore social networks, platform competition, the "long tail", and economic or resource impacts of the search for new findings. Key topics include advanced statistical computing theory, multivariate analysis, the application of computer science methods such as data mining and machine learning, and change detection by uncovering unexpected patterns in data.
Learning Objective:
Assessment Criteria:
See above
Academic Integrity:
Course: Data Analytics