Data Science - Fall 2019

  • Instructor: Thilanka Munasinghe, munast at rpi dot edu
  • TA: Mia Price - pricem4 at rpi dot edu
  • TA Office Hours: Monday 9am - 11am EST or by appointment via email
  • Class Meeting times: Thursdays 11:00AM-1:50PM EST (synchronous) and online (asynchronous; see Location)
  • Class Location: Lally 104, Adobe Connect (login as guest), and Learning Management System (LMS) 1709_Data Science (RCS login)
  • Instructor Office Hours: Tue/Fri 12:30PM - 1:30PM EST, or by appointment, email, or online
  • Instructor Office Location: Amos Eaton 133

Syllabus/ Calendar

Refer to Reading/ Assignment/ Reference list for each week (see below).

  • Week 1 (Aug. 29): History of Data and Information, Data, Information, Knowledge Concepts and State-of-the-Art, Data life-cycle for Science; Data acquisition, curation, preservation, metadata
  • Week 2 (Sep. 05): Data and information acquisition (curation) and metadata/ provenance - management
  • Week 3 (Sep. 12): Data formats, metadata standards, conventions, reading and writing data and information
  • Week 4 (Sep. 19): Module 2 and 3 Review, Data Analysis I
  • Week 5 (Sep. 26): Class exercise - collecting data - individual
  • Week 6 (Oct. 03): Presentations: present your data (part of Assignment 2)
  • Week 7 (Oct. 10): Presentations: present your data (part of Assignment 2)
  • Week 8 (Oct. 17): Academic basis for Data Science, Data Models, Schema, Markup Languages, group project, working with someone else's data
  • Week 9 (Oct. 24): Intro to Data Mining for Data Science
  • Week 10 (Oct. 31): Data Analysis II and Class exercise
  • Week 11 (Nov. 07): Data Workflow Management, Preservation and Data Stewardship
  • Week 12 (Nov. 14): Data Quality, Uncertainty, and Bias
  • Week 13 (Nov. 21): Final Project Preparation – Project work discussion with the instructor
  • Week 14 (Nov. 28): No Classes - Thanksgiving recess
  • Week 15 (Dec. 05): Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Integration, Data Citation
  • Week 16 (Dec. TBA): Final Project Presentation (during finals week)

Reading/ Assignment/ Reference List

Class 1: Reading Assignment (choose 5-6 and read at least 2-3 in depth):

Reference

  • Fourth Paradigm: [14]
  • Humanities - Digging into Data [15]
  • National Science Foundation Cyberinfrastructure Plan chapter on Data [15a]

Class 2: Reading Assignment:

  • ISO Lineage Model (NOAA Environmental Data Management) [16]
  • Earth Science Information Partners Data Management Workshop: [17]
  • Earth Science Information Partners: Course Outline [18]
  • Univ. Minnesota [19]
  • Moore et al., Data Management Systems for Scientific Applications, IFIP Conference Proceedings; Vol. 188, Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pp. 273 – 284 (2000) [20]
  • Data Management and Workflows [21]
  • Metadata and Provenance Management [22]
  • Provenance Management in Astronomy (case study) [23]
  • Web Data Provenance for QA [24]
  • W3 PROV Overview [25]
  • W3 PROV Data Model [26]

Class 3: Reading Assignment:

  • Data formats: netCDF [27]
  • Spatial Data Transfer Standard GIS format [28]
  • Metadata resources [29]
  • Metadata Encoding and Transfer Standard - METS [30]
  • Open Archives Initiative - Protocol for Metadata Harvesting - OAI-PMH [31]
  • Keyhole Markup Language - KML Tutorial [32]
  • Earth Science Markup Language - ESML [33]
  • Climate Science Markup Language - CSML [34]
  • Climate and Forecast (CF) conventions [35]

Class 4: Reading Assignment:

  • Brief Introduction to Data Mining [36]
  • Longer Introduction to Data Mining and slide sets [37]
  • See the software resources list [38]
  • Data Analysis - Introduction [39]
  • Example: Data Mining [40]

Class 5: Reading Assignment: None.
Class 6: Reading Assignment: None.
Class 7: Reading Assignment: preview government and other (science) data repositories
Some of these have no single "entry point" to their data; you can find them fairly easily by searching for the name of the agency:

  • Department of Energy EIA [41]
  • Humanities - Digging into Data [42]
  • Environmental Protection Agency (EPA)
  • US Geological Survey (and state surveys) (USGS), data.usgs.gov
  • NASA Earth Observing System (EOS) and ECHO, data.nasa.gov
  • National Oceanic and Atmospheric Administration (NOAA) NCEI, data.noaa.gov
  • Department of Energy (DoE): [43]
  • National Library of Medicine (NLM): [43a]
  • data.gov [44]
  • data.ny.gov [45]
  • Find one of your own

Class 8: Reading Assignment:

  • See Class 4 reading

Class 9: Reading Assignment: pre-reading

  • Another Look at Data (Mealy 1967)! [53]
  • Identifying Content and Levels of Representation in Scientific Data (Wickett et al. 2012) [54]

Class 10: Reading Assignment: none

  • Introduction to Data Management [45]
  • Changing software, hardware a nightmare for tracking scientific data [46] (and Parts I, II and III)
  • Overview of Scientific Workflow Systems, Gil (AAAI08 Tutorial) [47]
  • Comparison of workflow software products, Krasimira Stoilova, Todor Stoilov [48]
  • Scientific Workflow Systems for 21st Century, New Bottle or New Wine? Yong Zhao, Ioan Raicu, Ian Foster [49]
  • OCLC Sustainable Digital Preservation and Access [50]
  • Preservation and Access of NOAA Open Data [51]
  • NITRD report: [52]

Class 11: Reading Assignment:
Class 12: Reading Assignment:

  • The Deep Web (Internet Tutorials) [55]
  • Digital Image Resources on the Deep Web [56]
  • Facilitating Discovery of Public Datasets [57]
  • Tom Heath Linked Data Tutorial (2009) [58]
  • Relational Databases on the Semantic Web, Tim Berners-Lee, Design Issue Note, 1998-2009. [59]
  • A Survey of Current Approaches for Mapping of Relational Databases to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan Sequeda, Ahmed Ezzat, 2009-01-31. [60]
  • On directly mapping relational databases to RDF and OWL, 2012, Sequeda, Arenas, Miranker in WWW '12 Proceedings of the 21st international conference on World Wide Web, pp. 649-658 [61]

Class 13: Reading Assignment: none
Reference material (purchase not required - please ask instructor if you are interested in any of these):

  • Parsons and Fox, Is Data Publication the Right Metaphor? [61]
  • Beautiful data: [62]
  • Scientific data management: [63]
  • BRDI activities: [64]
  • Data policy [65]
  • Self-directed study (answer the quiz): [66]