Rui Yan

I am currently a third-year PhD student at the Tetherless World Constellation, Rensselaer Polytechnic Institute. I graduated from Harbin Engineering University with a bachelor's degree in Engineering before I became a full-time PhD student here in 2011. My research advisor is Prof. Deborah L. McGuinness.

About my research

My research focuses on natural language processing, Semantic Web technologies, and ontologies. I am involved in a variety of research projects:

Recently, I have also been working with Prof. Heng Ji on a clinical notes concept recognition project. This project aims to identify concepts, whether stated explicitly or hidden, in unstructured natural language by leveraging NLP technologies and clinical domain ontologies.

I completed an internship at Franz Inc. during the summer of 2013. I benefited a lot from this internship, not only by making great friends with people there, but also by improving my research and presentation abilities. During my internship, I developed a medical devices ontology by leveraging both natural language processing and semantic technologies. I also developed an interactive user interface that allows users to query, navigate, compare, and analyze the costs of diverse medical devices so as to improve efficiency and profits.

Yan, R., Greaves, M.T., Smith, W., and McGuinness, D.L. 2016. Remembering the Important Things: Semantic Importance in Stream Reasoning. In Proceedings of Stream Reasoning Workshop 2016 at International Semantic Web Conference (ISWC) 2016 (October 18 2016, Kobe, Japan).

Yan, R., Praggastis, B., Smith, W., and McGuinness, D.L. 2016. Towards A Cache-Enabled, Order-Aware, Ontology-Based Stream Reasoning Framework. In Proceedings of WWW 2016 (April 11-15 2016, Montreal, Canada).

Yan, R., Praggastis, B., Smith, W., and McGuinness, D.L. 2015. Towards Smart Cache Management for Ontology Based, History-Aware Stream Reasoning. In Proceedings of International Semantic Web Conference (ISWC) 2015 (October 11-15 2015, Bethlehem, PA, US).

Erickson, J.S., Chastain, K., Patton, E.W., Fry, Z., Yan, R., and McGuinness, D.L. 2014. Identifying First Responder Communities Using Social Network Analysis (POSTER ABSTRACT). In Proceedings of International Semantic Web Conference (ISWC) 2014 (October 19-23 2014, Riva del Garda, Trentino, Italy).

McCusker, J., Yan, R., Solanki, K., Erickson, J.S., Chang, C., Dumontier, M., Dordick, J., and McGuinness, D.L. 2014. A Nanopublication Framework for Biological Networks using Cytoscape.js. In Proceedings of International Conference on Biomedical Ontologies (ICBO 2014) (October 6-9 2014, Houston, TX).

Erickson, J.S., Chastain, K., McCusker, J., Fry, Z., Yan, R., and McGuinness, D.L. 2014. Technical Report: Identifying First Responder Communities through Social Network Analysis of Disaster-Related Traffic.

Erickson, J.S., McCusker, J., McGuinness, D.L., Fry, Z., Chastain, K., and Yan, R. 2013. Technical Report: Requirements Gathering through First Responder Social Network Analysis.

Project Collaborator

First Responders Requirements Methodology (FirstResponders)
Principal Investigator: Deborah L. McGuinness
Co Investigator: John S. Erickson
Description: The purpose of this project is to design and prototype a requirements-gathering methodology driven by the first responder community. The methodology will include examining the current state of collecting and synthesizing responder requirements, assessing the effectiveness of that process, evaluating existing candidate platforms for use within this community, and producing a roadmap that can be used by NIST and others to achieve a solution enabling the responder community to engage in more effective dialogue with key stakeholders. A prototype implementation of the methodology will be developed using the roadmap and will be available for testing, evaluation, and requirements gathering.
Foresight and Understanding from Scientific Exposition (FUSE)
Principal Investigator: Deborah L. McGuinness
Co Investigator: Jim Hendler
Description: Technical emergence refers to the process whereby innovative ideas, capabilities, applications, and even entirely new fields of study arise, are tested, mature, and, if conditions are favorable, demonstrate feasibility and impact. IARPA’s Foresight and Understanding from Scientific Exposition (FUSE) Program is sponsoring advanced research and development (R&D) to develop automated systems that aid in the systematic, continuous, and comprehensive assessment of technical emergence using information derived from the published scientific, technical, and patent literature.
Health Data Challenge (HealthData)
Principal Investigator: Deborah L. McGuinness and Jim Hendler
Co Investigator: James McCusker
Description: An infrastructure for large-scale collaboration around aggregation, generation, and publication of health-related Linked Data.
Repurposing Drugs with Semantics (ReDrugS)
Principal Investigator: Jonathan Dordick and Deborah L. McGuinness
Description: We aim to find new effective treatments for disease using existing drugs. Our approach is to gather and integrate existing data using semantic technologies to help discover promising drug repurposing opportunities.
Streaming Data Characterization (SDC)
Principal Investigator: Deborah L. McGuinness and Mark Greaves
Description: This project aims to develop flexible window management strategies and algorithms for stream reasoning. We have proposed a stack of technologies, including a sequential stream reasoning architecture and the notion of semantic importance.
Streaming Hypothesis Reasoning (Shyre)
Principal Investigator: Deborah L. McGuinness and William Smith
Description: AIM will advance streaming reasoning techniques to overcome a limitation in contemporary inference that performs analysis only over data in a fixed cache or a moving window. This research will lead to methods that continuously shed light on proposed hypotheses as new knowledge arrives from streams of propositions, with a particular emphasis on the effect that removing the expectation of completeness has on the soundness and performance of symbolic deduction platforms.
TWC Vocabulary Development (TWC_Schemas)
Principal Investigator: Jim Hendler
Description: Schema.org provides a collection of schemas (HTML tags) that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Since early 2012, researchers at TWC RPI have been working with government and research data providers to define vocabularies for expressing the structured data that powers their web sites, using on-page markup based on schema.org vocabularies. In particular, we developed the schema.org Dataset extension, a concise vocabulary that extends schema.org for describing datasets and data catalogs. Current work includes applying schema.org Dataset to scientific datasets and developing new extensions for use by Web Observatories.