Description:
Instructor: Thomas Hughes, hughet2 at rpi dot edu
TA: Abinash Koirala, koiraa at rpi dot edu
Meeting times: Tuesday 0900-1150
Office Hours: Monday 1500-1600 or by appointment
phone: x 2315 (JEC 5018)
TA Office Hours: By appointment
Class Listing: CSCI/ERTH/ITWS 4350/6350
Class Location: Lally 102
Syllabus/Calendar
Refer to the Reading/Assignment/Reference List for each week (see below).
- Week 1 (Sep. 1): History of Data and Information, Data, Information, Knowledge Concepts and State-of-the-Art, Data life-cycle for Science; Data acquisition, curation, preservation, metadata (Week 1 slides)
- Week 2 (Sep. 8): Data and information acquisition (curation) and metadata/provenance management (Week 2 slides)
- Week 3 (Sep. 15): Data formats, metadata standards, conventions, reading and writing data and information (Week 3 slides)
- Week 4 (Sep. 22): Data Analysis I (Week 4 slides)
- Week 5 (Sep. 29): Class exercise - collecting data - individual
- Week 6 (Oct. 6): Class Presentations: present your data (4 groups) - start in Lally 102
- Oct. 13 - no classes (Tuesday follows Monday schedule)
- Week 7 (Oct. 20): Data Analysis II and class exercise - group project definitions - working with someone else's data (Week 7 slides)
- Week 8 (Oct. 27): Intro to Data Mining for Data Science, Data Quality, Uncertainty and Bias (Week 8 slides)
- Week 9 (Nov. 3): Data Quality, Uncertainty, and Bias (Week 9 slides; Week 9 alternate slides)
- Week 10 (Nov. 10): Data Workflow Management, Preservation and Data Stewardship (Week 10 slides)
- Week 11 (Nov. 17): Academic basis for Data Science, Data Models, Schema, Markup Languages (Week 11 slides)
- Nov. 24: No lecture - continue project and assignment work
- Week 12 (Dec. 1): Webs of Data and Data on the Web, the Deep Web, Data Infrastructures, Data Discovery, Data Citation (Week 12 slides)
- Week 13 (Dec. 8): Final Project Presentations
Reading/Assignment/Reference List
Class 1 Reading Assignment (choose 5-6 and read at least 2-3 in depth):
- Changing Science, Chris Anderson [1]
- Rise of the Data Scientist [2]
- Where to draw the line? [3]
- Career of the Future [4]
- What is Data Science (I) [5]
- What is Data Science (II) [6]
- Data Scientist: The Hottest Job You've Never Heard Of [7]
- What Is a Data Scientist? [8]
- Data Scientist - sexiest job of the 21st C? [9]
- An example of data science [10]
- Big Data Science [11]
- A Very Short History of Data Science [12]
- Data Science Programs on the Increase [13]
Reference
Class 2: Reading Assignment:
- ISO Lineage Model (NOAA Environmental Data Management) [16]
- Earth Science Information Partners Data Management Workshop: [17]
- Earth Science Information Partners: Course Outline [18]
- Univ. Minnesota [19]
- Moore et al., Data Management Systems for Scientific Applications, IFIP Conference Proceedings; Vol. 188, Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pp. 273 – 284 (2000) [20]
- Data Management and Workflows [21]
- Metadata and Provenance Management [22]
- Provenance Management in Astronomy (case study) [23]
- Web Data Provenance for QA [24]
- W3 PROV Overview [25]
- W3 PROV Data Model [26]
Assignment 1: Data Science 2015 Assignment 1, Preparing for Data Collection (10% of grade), due in week 3, Sep. 15, 2015.
Class 3: Reading Assignment:
- Data formats: netCDF [27]
- Spatial Data Transfer Standard GIS format [28]
- Metadata resources [29]
- Metadata Encoding and Transfer Standard - METS [30]
- Open Archives Initiative - Protocol for Metadata Harvesting - OAI-PMH [31]
- Keyhole Markup Language - KML Tutorial [32]
- Earth Science Markup Language - ESML [33]
- Climate Science Markup Language - CSML [34]
- Climate and Forecast (CF) conventions [35]
Assignment 2: Data Science 2015 Assignment 2, Presenting your Data (20% of grade), due in week 6, Oct. 6, 2015.
Class 4: Reading Assignment:
- Brief Introduction to Data Mining [36]
- Longer Introduction to Data Mining and slide sets [37]
- See the software resources list [38]
- Data Analysis - Introduction [39]
- Example: Data Mining [40]
Class 5: Reading Assignment: None.
Class 6: Reading Assignment: None.
Assignment 3: Data Science 2015 Assignment 3, Reformatting and Submitting Your Data (20% of grade), due in week 8, Oct. 27, 2015.
Class 7: Reading Assignment: preview government and other (science) data repositories
Some of these have no single "entry point" to their data; you can find them fairly easily by searching for the name of the agency:
- Department of Energy EIA [41]
- Humanities - Digging into Data [42]
- Environmental Protection Agency (EPA)
- US Geological Survey (and state surveys) (USGS)
- NASA Earth Observing System (EOS) and ECHO
- National Oceanic and Atmospheric Administration (NOAA) NODC, NGDC, NCDC
- Department of Energy (DoE): [43]
- National Library of Medicine (NLM): [43a]
- Cancer Grid (CaBIG)
- OneGeology
- data.gov [44]
- Find one of your own
Assignment 4: Data Science 2015 Assignment 4, Working with someone else's data (40% of grade), write-up due Dec. 1, 2015; final presentations Dec. 8, 2015.
Class 8: Reading Assignment:
- None
Class 9: Reading Assignment:
- Introduction to Data Management [45]
- Changing software, hardware a nightmare for tracking scientific data [46] (and Parts I, II and III)
- Overview of Scientific Workflow Systems, Gil (AAAI08 Tutorial) [47]
- Comparison of workflow software products, Krasimira Stoilova, Todor Stoilov [48]
- Scientific Workflow Systems for 21st Century, New Bottle or New Wine? Yong Zhao, Ioan Raicu, Ian Foster [49]
- OCLC Sustainable Digital Preservation and Access [50]
- National Science Foundation Cyberinfrastructure Plan chapter on Data [51]
- NITRD report: [52]
Class 10: Reading Assignment:
- Another Look at Data (Mealy 1967)! [53]
- Identifying Content and Levels of Representation in Scientific Data (Wickett et al. 2012) [54]
Final Assignment: Data Science 2015 Final Assignment, Stewardship: Workflow construction for Preservation (10% of grade), due in week 13, Dec. 8, 2015.
Class 11: Reading Assignment:
- The Deep Web (Internet Tutorials) [55]
- Digital Image Resources on the Deep Web [56]
- Tom Heath Linked Data Tutorial (2009) [57]
- Relational Databases on the Semantic Web, Tim Berners-Lee, Design Issue Note, 1998-2009. [58]
- A Survey of Current Approaches for Mapping of Relational Databases to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian Hellmann, Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan Sequeda, Ahmed Ezzat, 2009-01-31. [59]
Class 12: Reading Assignment: None.
Reference material (purchase not required - please ask instructor if you are interested in any of these):
- Parsons and Fox, Is Data Publication the Right Metaphor? [60]
- Beautiful data: [61]
- Scientific data management: [62]
- BRDI activities: [63]
- Data policy [64]
- Self-directed study (answer the quiz): [65]
Course: Data Science