Semantic eScience Meeting September 04, 2013

General Meeting Information

  • [Semantic eScience meeting 2013-09-04 This Pad]
  • [Semantic eScience meeting 2013-05-01 Previous Meeting]
  • Call-in information
    • Goto Meeting
  • Meeting Page
  • Next Meeting

Agenda

  • updates from summer, what you did, new things, what you are doing in the Fall (projects/ research)
  • new RAs in the lab for the Fall (who they are, what they are working on)
  • meeting schedule for Fall, Doodle Poll: http://doodle.com/4ir3q6cnzynakahx
  • sign up for 15-minute talks at coming meetings (e.g. does someone want to take this one (an old one)? I will explain more on Wed: http://firstmonday.org/ojs/index.php/fm/article/view/1145/1065) [Katie takes it]

Attendance

  • Patrick West
  • Boliang
  • Marshall X Ma
  • Jin
  • Harsha
  • Han
  • Katie
  • Peter
  • Linyun Fu
  • Chengcong
  • Mengyu
  • Jun (in a course)

Past Action Items

Action Items

  • Patrick: make sure everyone has accounts

Presentations

  • Keep this list from week to week so we know who's presented and who will present
  • Last year's presentations
    • Stephan September 21, 2012 - Ontology documentation discussion
    • Yu Chen, October 5, 2012 - Continuous Flow Forecast in Southesk River: Where we are and how we proceed, semantically
    • Marshall X Ma, October 19, 2012 - Exploratory visualization of earth science data in a semantic web context
    • Eric Rozell, November 2, 2012 - Resource Discovery for Extreme Scale Collaboration
    • Massimo Di Stefano, November 16, 2012 - IPython Notebook (applied to the ECOOP use case)
    • Han Wang, November 30, 2012 - http://www.odata.org/ - Open Data Protocol: the current state, libraries, functionality
    • Jin Guang Zheng, February 20, 2013 - Semantic Similarity based Entity Mapping: GCIS Case: GCMD-CLEAN Mapping
    • Linyun Fu, March 6, 2013 - CMSPV and ELDA
    • Patrick West, March 20, 2013 - Adaptations and extensions of Drupal
    • Han Wang, April 03, 2013 - QUDT: Quantities, Units, Dimensions and Data Types in OWL and XML (http://www.qudt.org/)
    • Massimo Di Stefano, April 17, 2013 - Data and visualization integration via web based resources
    • Marshall X Ma, May 01, 2013 - A short story of geologic time ontologies and vocabularies
    • Katie Dunn, Sept 18, 2013 - http://firstmonday.org/ojs/index.php/fm/article/view/1145/1065
    • ORCID - Linyun Fu

Notes

Yu Chen (Sorry for the virtual/physical absence...)

    • Summer
      • Developed an ontology-based classification system for cuisine categorization
      • A data mashup of multiple proprietary (allmenus.com, factual, etc.) and non-proprietary (freebase, dbpedia) datasets. These datasets act as the training data (ground truth of which dish belongs to which cuisine). The model is built using natural language processing (NLP) techniques (n-gram, LDA, etc.) as well as machine learning techniques (Mutual-Information Weighted Bayesian Networks).
      • One conference paper ready to be published at one of RecSys, AAAI Spring Symposium, CIKM, etc.
      • Learned a lot about dealing with sparse matrices, feature selection, NLP in practice (data cleanup), and machine learning. Had a wonderful time in the Bay Area: meeting different talented people, having lots of startup ideas burgeon, eating lots of Korean food, being fascinated by the free food at Yahoo, LinkedIn, Google, etc., and getting to know more about the dark side of Samsung....
    • DCO
      • Working on fixing handle bugs now and then
      • Will concentrate on DCO in the fall semester and will try some research ideas taking advantage of the platform
    • Research
      • Data processing/Algorithm/Visualization as a service
        • To create a platform that helps people share algorithms that work on the datasets they provide, in such a way that the algorithms can be referenced/reused.
        • Based on the motion detection algorithm I did in the previous semester, I will try to publish the algorithm online so that people can test it against different datasets, as the first prototype of "data processing as a service".
        • Develop visualization as a service, i.e. describe the input of the visualization in terms of an ontology and visualize the data obtained from the user accordingly.
        • Comments are EXTREMELY welcome. I am trying to combine the work we are going to do in DCO with something that can be regarded as research/thesis/publishable work.
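
The cuisine classification idea from the Summer notes above can be illustrated with a small sketch. This is not the model described in the notes: it swaps in a feature-weighted naive Bayes classifier over word n-grams, with each n-gram weighted by its mutual information with the cuisine label, as a simple stand-in for the "Mutual-Information Weighted Bayesian Networks". The function names and the training examples are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a dish name."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def mutual_information(docs, labels):
    """Mutual information between each feature's presence and the label."""
    total = len(docs)
    label_count = Counter(labels)
    feature_count = Counter()
    joint = Counter()
    for features, label in zip(docs, labels):
        for feature in features:
            feature_count[feature] += 1
            joint[feature, label] += 1
    scores = {}
    for feature, n_feature in feature_count.items():
        score = 0.0
        for label, n_label in label_count.items():
            n_both = joint[feature, label]
            # Two cells per label: feature present, feature absent.
            for n_cell, n_side in ((n_both, n_feature),
                                   (n_label - n_both, total - n_feature)):
                if n_cell:
                    score += (n_cell / total) * math.log(
                        n_cell * total / (n_side * n_label))
        scores[feature] = score
    return scores


def train(examples):
    """Build MI weights, label counts, and per-label feature counts."""
    docs = [ngrams(name) for name, _ in examples]
    labels = [label for _, label in examples]
    per_label = defaultdict(Counter)
    for features, label in zip(docs, labels):
        per_label[label].update(features)
    return mutual_information(docs, labels), Counter(labels), per_label


def predict(model, dish_name):
    """Return the label with the highest MI-weighted naive Bayes score."""
    weights, label_count, per_label = model
    total = sum(label_count.values())
    scores = {}
    for label, n_label in label_count.items():
        score = math.log(n_label / total)  # class prior
        for feature in ngrams(dish_name):
            if feature in weights:  # ignore n-grams never seen in training
                # Laplace-smoothed P(feature present | label), weighted by MI.
                p = (per_label[label][feature] + 1) / (n_label + 2)
                score += weights[feature] * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training examples; the real ground truth came from the data mashup.
EXAMPLES = [
    ("spicy kimchi stew", "korean"),
    ("kimchi fried rice", "korean"),
    ("bulgogi rice bowl", "korean"),
    ("margherita pizza", "italian"),
    ("spaghetti carbonara", "italian"),
    ("mushroom risotto pizza", "italian"),
]
model = train(EXAMPLES)
print(predict(model, "kimchi pancake"))  # korean
```

Only n-grams present in the dish name contribute to the score, which keeps the sketch short; a fuller model would also account for absent features and would need the sparse-matrix handling mentioned above.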

Boliang

  • DCO - http://tw.rpi.edu/web/project/DCO-DS

Marshall X Ma

  • DCO-DS - http://tw.rpi.edu/web/project/DCO-DS
  • GCIS-IMSAP - http://tw.rpi.edu/web/project/GCIS-IMSAP
  • Will co-chair a session at AGU FM2013
  • A proposal for a special issue 'Semantic e-Science' was accepted by journal Earth Science Informatics

Jin

  • GCIS-IMSAP - http://tw.rpi.edu/web/project/GCIS-IMSAP
  • S2S - http://tw.rpi.edu/web/project/SeSF/workinggroups/S2S

Harsha

  • DCO - http://tw.rpi.edu/web/project/DCO-DS

Han Wang

  • DCO - http://tw.rpi.edu/web/project/DCO-DS

Katie Dunn

  • Works up in the Library. Took Peter's Data Science class, semantic e-science.
  • Might have some time to work on a project later this semester (ontology dev or something?), hanging around for meetings in the meantime to listen to article presentations and see if there is something I might be able to get involved in.

Linyun Fu

  • GCIS-IMSAP - http://tw.rpi.edu/web/project/GCIS-IMSAP
  • CMSPV - http://tw.rpi.edu/web/project/CMSPV

Patrick West

  • Senior Software Engineer - on staff with TWC
  • http://tw.rpi.edu/instances/PatrickWest
  • DCO - http://tw.rpi.edu/web/project/DCO-DS
  • CMSPV - http://tw.rpi.edu/web/project/CMSPV
  • RDESC - http://tw.rpi.edu/web/project/RDESC
  • Other Projects
    • TW Web Site - http://tw.rpi.edu
    • ECOOP - http://tw.rpi.edu/web/project/ECOOP
    • GCIS-IMSAP
    • System Administration

Benno Lee

  • PhD Student
  • RDESC - http://tw.rpi.edu/web/project/RDESC

Peter Fox

  • TWC Constellation Chair
  • http://tw.rpi.edu/instances/PeterFox

Massimo Di Stefano

  • Software Engineer on staff
  • ECO-OP - http://tw.rpi.edu/web/project/ECOOP

Peter general comments on e-Science and how it fits in with the other TWC research themes

  • Purpose of this meeting - exchanging ideas at a research level. Not a project meeting.
  • http://tw.rpi.edu/web/ResearchAreas/SemanticeScience
  • Commonalities across projects
  • Thesis research topics
  • Research needs, opportunities
  • New technologies
  • Interacting with collaborators, bigger picture stuff, outreach
  • Deborah's students will sometimes come to this meeting, too.
  • Provenance meeting temporarily suspended (some projects wrapping up) - some discussions moved into this meeting for now.
  • At beginning of each meeting, get an update on a new technology, article, etc. (~15 minutes)
  • Current research areas
    • Data science
    • Resource discovery (Patrick, Benno Lee)
    • Provenance instrumentation / explanation (natl climate assessment, marine fisheries reports) - of interest to govt, private sector - $, thesis topics!
    • data.rpi.edu - institutional data project. relevant to deep carbon observatory (Katie might want to show up to DCO meeting sometime)
    • GCIS (Global Change Information System) - many authors are not contributing content to populate the ontology/instances. The infrastructure is good, but the content is not there yet. How do you get people to contribute?

Marshall:

  • Have people who did internships in an external company give a summary explanation of their activities
  • Harsha (intern at Cisco): stats .... module: wide area optimizer. Optimization of packets. His team looked at traffic between Microsoft Exchange server and .... . Accomplished with read and write optimizations. Is the device functioning as expected? - Reports. Team was 4 developers in the US, 7-8 in Bangalore.
  • Han (IBM Research China) - application discovery & cloud migration - migrating an application from a local server to the cloud (more generally, from one machine to another). To migrate, you need to find configurations/config files. Did this with data mining - classify/distinguish config files from other files on the machine. Once the files are found, parse them to get specific configurations. Did some functionality demos, planning to get a paper out. Brought back some work! Needs more data, needs to do experiments for results.
  • Linyun - Google Maps Global Seattle - creating visualizations for map change. Detecting specific changes, visualizing them on a map. Familiarization with Google infrastructure - compiling/config tools, libraries. Made a poster, but too much confidential information. What happens at Google stays at Google.

Marshall: questions from PhDs - what will people be doing? - Peter answers.

  • GCIS current iteration ends this fall; the next iteration is not in place yet. Folks working on GCIS will need to be funded from a different project. Patrick knows how this fits together. The ecosystem interoperability project is a different end of a very similar problem. Various folks are working on scientific explanation & information systems. GCIS is a written report of the nature of change in climate in the US by region & over time, and the implications of that climate change: text, figures, tables, conclusions. Funded by Congress. The object is to present it to people; one venue is a web interface. Beyond simple links - the underlying material is connected by a provenance ontology. A big problem in the GCIS project is how to extract this info reliably from a report and present it in a web portal. The ECO-OP project (based at Woods Hole Oceanographic Institution; Massimo is primary software engineer, Patrick is doing architecture work) is developing an IPython tool for generation of a report. Fisheries people are generating a report to tell scientists and policymakers whether they should change fishing regulations. They are producing this report and need to be able to explain it. Interactive use of Python in a web form interface. Cut & paste Python, R, Matlab, URLs for datasets that can be pulled in, references - all this goes into generating the report. Provenance is collected semi-automatically. GCIS needs to build in some of this provenance info, and generate it in a form that's useful to users. What the National Climate Assessment wants to do in the next round is use IPython notebooks.
  • Anusha has been working on integrity of use cases. She has seen a lot of use cases across different projects and is in a good position to see commonalities. Has also worked with S2S (application integration framework) - main application is faceted search. Eric Rozell is primary author. Used in a lot of applications. Harder to deploy since Eric left! Lots of dependencies - hard to deploy outside the TWC environment - Anusha did a lot of work last semester analyzing different deployments of S2S. Lots of S2S requirements in projects.
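
As a rough illustration of what "provenance is collected semi-automatically" could look like in a Python session, here is a minimal sketch. It is not the ECO-OP tool or the GCIS ontology: the `track` decorator, the `Entity` class, the example URL, and the identifier scheme are all hypothetical. Only the property and class names `prov:used`, `prov:wasGeneratedBy`, and `prov:Activity` are taken from the W3C PROV-O vocabulary. The analyst names the inputs and outputs by hand; the decorator records the links between them automatically.

```python
import functools
import itertools
from dataclasses import dataclass

PROVENANCE = []  # collected (subject, predicate, object) triples
_activity_ids = itertools.count(1)


@dataclass(frozen=True)
class Entity:
    """A named piece of data, e.g. a dataset URL or a report figure."""
    name: str
    value: object


def track(step):
    """Record which entities a processing step used and generated."""
    @functools.wraps(step)
    def wrapper(*inputs):
        activity = f"activity:{step.__name__}-{next(_activity_ids)}"
        PROVENANCE.append((activity, "rdf:type", "prov:Activity"))
        for item in inputs:
            PROVENANCE.append((activity, "prov:used", item.name))
        result = step(*inputs)
        PROVENANCE.append((result.name, "prov:wasGeneratedBy", activity))
        return result
    return wrapper


@track
def mean_catch(dataset):
    """Hypothetical analysis step: average the values in a dataset."""
    values = dataset.value
    return Entity("figure:mean-catch", sum(values) / len(values))


# Hypothetical dataset URL and made-up numbers, for illustration only.
catch = Entity("http://example.org/catch.csv", [10, 20, 30])
figure = mean_catch(catch)

for triple in PROVENANCE:
    print(triple)
```

Each call to a tracked step adds one activity plus its input and output links, so the triples can later be serialized as RDF and attached to the generated report.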

Marshall: new RAs in the lab for the Fall (who they are, what they are working on)

  • Peter comments for RAs: learn stuff, take advantage of opportunities

Peter: sign up for 15 mins talks for coming meetings (e.g. someone want to take this one (an old one) - I will explain more on Wed -

Next meeting - still collecting times on doodle poll - will be announced.
Frequency: every 2 weeks?