Data Analytics 2019 Fall

Class Listing: ITWS 4600/ ITWS 6600/ MATP 4450/ CSCI 4960/ MGMT 4962/ MGMT 6962
Course Numbers: 84772, 85232, 84773, 85250, 83230, 85431, 85433, 85434

Instructor: Thilanka Munasinghe
TA: Atish Jain - jainm2 at rpi dot edu
Meeting times:
Section 1: Location: VORHES SOUTH - Time: 10:00am - 11:50am
Section 2: Location: GREENE 120 - Time: 2:00pm - 3:50pm

Instructor Office Hours: Tue/Fri 12:30pm – 1:30pm or by appointment via email
Instructor Office Location: Amos Eaton 133
TA Office Hours: by appointment

Syllabus/ Calendar

Refer to Reading/ Assignment/ Reference list for each week (see below).

Reference material (available through RPI library - RCS login required):

 

Group 1 - Intro/ Setup

  • Week 1 (Aug. 30): Introduction to Course, Case Studies, and Preview of Course Material.
  • Assignment 1
  • Week 2 (Sept. 03: No classes/ Sept. 06): Introduction/ refresher on basic statistics, Starting with Data and Information Resources, Role of Hypothesis, Synthesis and Model Choices, R/ RStudio introduction and lab.
  • Week 3 (Sept. 10/ Sept. 13): Introduction to Analytic Methods, Types of Data Mining for Analytics, Data filtering, hypothesis exploration, visual analysis, model consideration and assessment (lab)
  • (Lab) Assignment 2

Group 2 - Patterns, relations, descriptive analytics

  • Week 4 (Sept. 17/ Sept. 20): Weighted kNN, Clustering, early decision trees, Exercises for linear regression, kNN and K-means (lab), trees, plotting
  • Assignment 3
  • Week 5 (Sept. 24/ Sept. 27): Interpreting: Regression, Weighted kNN, Clustering, and Bayesian Inference, Exercises for clustering, plotting, Bayesian inference (lab)
  • Assignment 4
  • Assignment 5
  • Week 6 (Oct. 1/ Oct. 4): Assignment 5 presentations (Tuesday and Friday)
  • Assignment 6
  • Week 7 (Oct. 8 / Oct. 11): Lab weighted kNN, decision trees, random forest

Group 3 - Predictive Analytics

  • Week 8 (Oct. 15/ Oct. 18): Cross-Validation Trees, Dimension Reduction and Multi-Dimensional Scaling
  • Week 9 (Oct. 22/ Oct. 25): Support Vector Machines, Lab for Trees, DR, MDS, SVM
  • Week 10 (Oct. 29/ Nov. 01): Factor Analysis, Factor Analysis lab
  • Week 11 (Nov. 05/ Nov. 08): Interpreting PCA, MDS, DR, and FA; Boosting, Bootstrapping, and Bagging; Boosting, Bootstrapping, and Bagging (lab)
  • Assignment 7

Group 4 - Evaluating and validating, prescriptive analytics

  • Week 12 (Nov. 12/ Nov. 15): Cross-validation, Revisiting Regression - local methods, Lab - Cross-validation, Regression - local methods and continue project and assignment work
  • Week 13 (Nov. 19/ Nov. 22): Local Regression (continued), Mixed Models, Optimizing, Iterating (Fisher Linear Discriminant)
  • Week 14 (Nov. 26/ [Nov. 29: No Classes - Thanksgiving recess]): Prior Lab Review, Hierarchical Linear and Mixed Models, Latent Class Mixed Models, Lab, Assignment 7 due
  • Week 15 (Finals Week Dec. 16 - Dec. 20): TBA: Final Project Presentations

Reading/ Assignment/ Reference List (see above)

Class 1: Reading Assignment:

 

Class 2 Reading Assignment: prior to Thursday class

Class 3 Reading Assignment: prior to Monday class

 

Classes 4-5 Reading Assignment: None

Class 6 Reading Assignment:

Class 7 Reading Assignment:

 

Classes 8-9 Reading Assignment: None

Classes 10-13 Reading Assignment: None

Course goals:

  • Introduce students to relevant analytic methods so they can recognize and apply quantitative algorithms and techniques and interpret the results.
  • Develop students' strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
  • Develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
  • Examine real-world examples using modern cyberinfrastructure to place statistical and data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
  • Enable students, by the end of the course, to communicate analytic findings effectively to non-specialists.

Course Learning Objectives:

  • Students will demonstrate knowledge of relevant analytic methods, recognize and apply quantitative algorithms and techniques, and interpret results.
  • Students will demonstrate strategic thinking skills, combined with a solid technical foundation in data and model-driven decision-making.
  • Students will develop the ability to apply critical and analytical methods to formulate and solve science, engineering, medical, and business problems.
  • Students will examine real-world examples to place data-mining techniques in context, to develop data-analytic thinking, and to illustrate that proper application is as much an art as it is a science.
  • Students must effectively communicate analytic findings to non-specialists.
  • [graduate level]
    Students must develop and demonstrate a working knowledge of decision making under uncertainty, and be able to build optimization models that incorporate random parameters: static stochastic optimization, two-stage optimization with recourse, chance-constrained optimization, and sequential decision making.

 

Description:
Data and information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is concerned not so much with individual analyses or analysis steps as with an entire methodology.
The world at large is confronted with increasingly large and complex sets of structured and unstructured information from sensors, instruments, and computer simulations; data is "hidden" in websites, application servers, social networks, and on mobile devices. As a nation, assimilating information across disparate domains (e.g., intelligence, economics, science) has the potential to provide improved capabilities for decision makers. In commerce and industry, analytics-driven enterprises are becoming mainstream. Yet there is a shortfall in the key skills needed to meet the growing needs. Traditional enterprises are moving toward analytics-driven approaches for core business functions. In government and corporations, cybersecurity problems are prevalent.
The investment in advanced analytics capabilities could potentially be more broadly leveraged today and greater than any prior government investments in computing. Emphasis is now placed on disruptive data and information sources on the Web and Internet: using Web Science and informatics to explore social networks, platform competition, the "long tail", and the economic or resource impacts of the search for new findings. Key topics include advanced statistical computing theory, multivariate analysis, and the application of computer science subjects such as data mining and machine learning, and change detection by uncovering unexpected patterns in data.
Assessment Criteria:
See above
Academic Integrity:
Student-teacher relationships are built on trust. For example, students must trust that teachers have made appropriate decisions about the structure and content of the courses they teach, and teachers must trust that the assignments that students turn in are their own. Acts that violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities defines various forms of Academic Dishonesty, and you should make yourself familiar with these. In this class, all assignments that are turned in for a grade must represent the student’s own work. In cases where help was received, or teamwork was allowed, a notation on the assignment should indicate your collaboration. Submission of any assignment that is in violation of this policy will result in a penalty. If found in violation of the academic dishonesty policy, students may be subject to two types of penalties. The instructor administers an academic (grade) penalty, and the student may also enter the Institute judicial process and be subject to such additional sanctions as warning, probation, suspension, expulsion, and alternative actions as defined in the current Handbook of Student Rights and Responsibilities. A first violation on a specific assignment will result in a zero grade. A second violation will result in failure of the course. If you have any questions concerning this policy before submitting an assignment, please ask for clarification.
