
Data Science 2024

  • Instructor: Ahmed Eleish - eleisa2 at rpi dot edu
  • Class Meeting times: Thursdays 11:00 AM - 1:50 PM EST
  • Class Location: Lally, room 104    
  • Instructor Office Hours: Tuesday 12:30 PM - 1:30 PM EST / Wednesday 12:00 PM - 1:00 PM, or by appointment (email)
  • Instructor Office Location: Amos Eaton 134
  • TA: Benita Chinemerem - chineb at rpi dot edu
  • TA Office Hours: Tuesday 12:00 PM - 2:00 PM / Friday 12:00 PM - 1:00 PM
  • TA Office Hours Location: Lally 205

 

Sections: CSCI 4350/6350, ITWS 4350/6350, ERTH 4350/6350

 

Syllabus/ Calendar

Refer to Reading/ Assignment/ Reference list for each week (see below). 

  • Week 1 (Aug. 29): History of Data and Information, Data, Information, Knowledge Concepts and State-of-the-Art Week 1 slides [Download]
  • Week 2 (Sept. 05): Data and information acquisition (curation, preservation) and metadata - management Week 2 slides [Download] Assignment 1 [Download]
  • Week 3 (Sept. 12): Data formats, metadata standards, conventions, reading and writing data and information Week 3 slides [Download] Assignment 2 [Download]
  • Week 4 (Sept. 19): Module 2 & 3 Review, Knowledge Graphs, Data Analysis I Week 4 slides [Download]
  • Week 5 (Sept. 26): Data Analysis II
  • Week 6 (Oct. 03): Class Presentations: present your data, Part of Assignment 2
  • Week 7 (Oct. 10): Class Presentations: present your data, Part of Assignment 2 - Data Analysis II Week 6 slides [Download] Assignment 3 Instructions [Download]
  • Week 8 (Oct. 17): Data Analysis II cont'd - Data Mining I Data Analysis II [Download] Data Mining I [Download]
  • Week 9 (Oct. 24): Project teams and introducing project instructions. Assignment 4 [Download]
  • Week 10 (Oct. 31): Data Quality, Uncertainty and Bias, Final Project Preparation – Start project meetings with team Data Quality, Uncertainty and Bias [Download]
  • Week 11 (Nov. 07): Data Mining II and Class exercise
  • Week 12 (Nov. 14): Data Workflow Management, Preservation, and Data Stewardship
  • Week 13 (Nov. 21): Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Integration, Data Citation
  • Week 14 (Nov. 28): No classes: Thanksgiving break – continue project and assignment work
  • Week 15 (Dec. 05): Final project work discussion with the instructor – Group One-on-Ones
  • Week 16 (Dec. TBA): Final Project Report Submission and Presentations

 

Reading/ Assignment/ Reference List

• Assignment 1 (available 09/05 – due 09/19) – 10% (written)
• Assignment 2 (available 09/12 – due 10/03) – 15% (written) / 5% (presentation)
• Assignment 3 (available 10/10 – due 10/24) – 20% (written)
• Assignment 4 (available 10/24 – due TBA) – 25% (written) / 5% (presentation/poster)
• Assignment 5 (available 11/07 – due 12/01) – 10% (written)

 

Reading Assignments:

Class 1 Reading Assignment (choose 5-6 and at least 2-3 in-depth):

  • Changing Science: Chris Anderson: [1]
  • Rise of the Data Scientist [2]
  • Where to draw the line? [3]
  • Career of the Future [4]
  • What is Data Science (I) [5]
  • Data Science vs. Data Analytics [6]
  • Data Scientist: The Hottest Job You've Never Heard Of [7]
  • What Is a Data Scientist? [8]
  • Data Scientist - sexiest job of the 21st Century ? [9]
  • Big Data  [10]
  • A Very Short History of Data Science [11]
  • 7 Phases of Data-Life-Cycle [12]

Reference

  • Fourth Paradigm: [13]
  • Humanities - Digging into Data [14]
  • National Science Foundation Cyberinfrastructure Plan chapter on Data [14a]

Class 2: Reading Assignment:

  • Data Lineage [15]
  • Earth Science Information Partners Data Management Workshop: [16]
  • Earth Science Information Partners: Course Outline [17]
  • Data Management Plan - Univ. Minnesota [18]
  • Data Management Plan software (DMPTool)  [19]
  • Metadata and Provenance Management [20]
  • Provenance Management in Astronomy (case study) [21]
  • Web Data Provenance for QA [22]
  • W3 PROV Overview [23]
  • W3 PROV Data Model [24]

Class 3: Reading Assignment:

  • Data formats: netCDF [25]
  • Spatial Data Transfer Standard GIS format [26]
  • HDF5 TUTORIAL: Learning HDF5 with HDFVIEW [27]
  • Metadata Encoding and Transfer Standard - METS [28]
  • Open Archives Initiative - Protocol for Metadata Harvesting - OAI-PMH [29]
  • Keyhole Markup Language - KML Tutorial [30]
  • Earth Science Markup Language - ESML [31]
  • HDF5View User's Guide [32]
  • HDF5 files in Python [33]

Class 4: Reading Assignment:

  • Brief Introduction to Data Mining [34]
  • Longer Introduction to Data Mining and slide sets [35]
  • See the software resources list [36]
  • What are Knowledge Graphs [37]

Class 5: Reading Assignment: TBA

Class 6: Reading Assignment: None. 

Class 7: Reading Assignment: preview government and other (science) data repositories 

Some of these have no single "entry point" to their data; you can find them fairly easily by searching for the name of the agency:

  • Department of Energy EIA [41]
  • Humanities - Digging into Data [42]
  • Environmental Protection Agency (EPA)
  • US Geological Survey (and state surveys) (USGS), data.usgs.gov
  • NASA Earth Observing System (EOS) and ECHO, data.nasa.gov
  • National Oceanic and Atmospheric Administration (NOAA) NCEI, data.noaa.gov
  • Department of Energy (DoE): [43]
  • National Library of Medicine (NLM): [43a]
  • data.gov [44]
  • data.ny.gov [45]
  • Find one of your own

Class 8: Reading Assignment:

Class 9: Reading Assignment: pre-reading

  • Another Look at Data (Mealy 1967)! [53]
  • Identifying Content and Levels of Representation in Scientific Data (Wickett et al. 2012) [54]

Class 10: Reading Assignment: none

  • Introduction to Data Management [45]
  • Changing software, hardware a nightmare for tracking scientific data [46] (and Parts I, II and III)
  • Overview of Scientific Workflow Systems, Gil (AAAI08 Tutorial) [47]
  • Comparison of workflow software products, Krasimira Stoilova ,Todor Stoilov [48]
  • Scientific Workflow Systems for 21st Century, New Bottle or New Wine? Yong Zhao, Ioan Raicu, Ian Foster [49]
  • OCLC Sustainable Digital Preservation and Access [50]
  • Preservation and Access of NOAA Open Data [51]
  • NITRD report: [52]

Class 11: Reading Assignment: 

Class 12: Reading Assignment:

  • The Deep Web (Internet Tutorials) [55]
  • Digital Image Resources on the Deep Web [56]
  • Facilitating Discovery of Public Datasets [57]
  • Tom Heath Linked Data Tutorial (2009) [58]
  • Relational Databases on the Semantic Web, Tim Berners-Lee, Design Issue Note, 1998-2009. [59]
  • A Survey of Current Approaches for Mapping of Relational Databases to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan Sequeda, Ahmed Ezzat, 2009-01-31. [60]
  • On directly mapping relational databases to RDF and OWL, 2012, Sequeda, Arenas, Miranker in WWW '12 Proceedings of the 21st international conference on World Wide Web, pp. 649-658 [61]

Class 13: Reading Assignment: 

  • Parsons and Fox, Is Data Publication the Right Metaphor? [61]
  • Beautiful data: [62]
  • Scientific data management: [63]
  • BRDI activities: [64]
  • Data policy [65]

Self-directed study (answer the quiz): [66] 

Reference material (purchase not required - please ask the instructor if you are interested in any of these):

 

Course Goals / Objectives 

  • To instruct future scientists how to sustainably generate, collect, and use data for their own research as well as for others: data science.
  • To instruct future technologists how to understand and support the essential data and information needs of a wide variety of producers and consumers.
  • For both to know the tools and requirements needed to properly handle data and information.
  • Students will learn and be evaluated on the full life-cycle of data and the relevant methods, technologies, and best practices.

Through class lectures, practical sessions, written and oral presentation assignments, and projects, students should:

  • Develop and demonstrate skill in Data Collection and Management
  • Develop Data Models and Generate Metadata
  • Demonstrate Knowledge of Data Standards
  • Demonstrate Skill in Data Science Tool Use and Evaluation
  • Demonstrate the application of the Data Life-Cycle principles
  • Become proficient in Data and Information Product Generation

Academic Integrity: 

Student-teacher relationships are built on trust. For example, students must trust that teachers have made appropriate decisions about the structure and content of the courses they teach, and teachers must trust that the assignments that students turn in are their own. Acts that violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities and The Graduate Student Supplement define various forms of Academic Dishonesty, and you should make yourself familiar with these. In this class, all assignments that are turned in for a grade must represent the student’s own work. In cases where help was received, or teamwork was allowed, a notation on the assignment should indicate your collaboration. Submission of any assignment that is in violation of this policy will result in a penalty. If found in violation of the academic dishonesty policy, students may be subject to two types of penalties. The instructor administers an academic (grade) penalty, and the student may also enter the Institute judicial process and be subject to such additional sanctions as: warning, probation, suspension, expulsion, and alternative actions as defined in the current Handbook of Student Rights and Responsibilities. If you have any questions concerning this policy before submitting an assignment, please ask for clarification. A first violation results in a zero grade for the relevant portion of the work. A second offense results in a failing grade.


 

ACADEMIC ACCOMMODATIONS:  
Rensselaer Polytechnic Institute strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on a disability, please let me know immediately so that we can discuss your options.   
To establish reasonable accommodations, please register with The Office of Disability Services for Students (dss@rpi.edu; 518-276-8197; 4226 Academy Hall). After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion.

 

Rensselaer Polytechnic Institute On- and Off-Campus Support Resources: Fall 2024 
Remember, seeking help is a strength, not a weakness


ON-CAMPUS HEALTH & WELLNESS SUPPORT 

 

Student Health Center* 

Mon-Fri, 8:30 am – 5:00 pm EST

The mission of the Student Health Center (SHC) is to keep students healthy so that they may achieve their academic, personal, and athletic goals. The SHC provides confidential, accessible, cost-effective, current evidence-based treatment for acute and chronic physical health problems. At this time, appointments are being offered virtually (phone and video). Call 518-276-6287 to schedule an appointment, or schedule one through your Student Health portal. There are no walk-in appointments available at the Student Health Center during this time.  
*information subject to change based on pandemic conditions

 

Counseling Center* 
Mon-Fri, 8:30 am – 5:00 pm EST (some weekday evening hours available by appointment) 
The goal of the Counseling Center is to help students maximize their sense of well-being as well as their academic, personal, and social growth. Appointments are free and confidential, and are held in person at the Counseling Center, 4th Floor of Academy Hall. Some WebEx and phone appointments will be offered as needed; please contact the Counseling Center for this service. Appointments can be made by calling 518-276-6479 or emailing counseling@rpi.edu. Counseling Center staff are available in case of a crisis on evenings and weekends (call Public Safety at 518-276-6611 and ask to speak with the on-call counselor).
*information subject to change based on pandemic conditions

 

Office of Health Promotion  
Health promotion initiatives at Rensselaer are evidence-based and comprehensive efforts to improve health knowledge, behaviors, and skills of Rensselaer students. Health Educators provide campus programming on a variety of health topics, and are available for one-on-one consultations around issues including, but not limited to: sleep hygiene, mental health, sexual health, alcohol and other drugs, LGBTQIA+ topics, sexual assault prevention, and more. All appointments are free and confidential and take place via WebEx. To schedule an appointment, email: healthed@rpi.edu. Follow us on social media for daily health tips and event information!
Instagram: rpi.studenthealth | Twitter: @RPIhealth | Facebook: RPI Student Health Services | Discord: https://discord.gg/8DZJJ38zWj
    
Disability Services for Students 
The Office of Disability Services for Students (DSS) assists Rensselaer students with disabilities in gaining equal access to academic programs, extracurricular activities, and physical facilities on campus. DSS is the designated office at Rensselaer that obtains and files disability-related documentation, assesses for eligibility of services, and determines reasonable accommodations in consultation with students. Call 518-276-8197 or email dss@rpi.edu for more information. 

 

 

