Summarization of Health Data Repositories for Clinical Decision Support

This research is being carried out with Mahsa Rouzbahman. It began in January 2012, when I received a Google Faculty Research Award ($18,000) to develop a new search engine for emergency room physicians. The basic idea in that project was to use summaries of similar patient files as a clinical decision support tool when dealing with a new patient. In the first year of the project we put a lot of effort into getting access to health data, but this proved too difficult because of the confidentiality of the data and the associated privacy concerns. We therefore decided that the best strategy was to work with publicly available data and to demonstrate the value of our approach before applying to hospitals and other organizations for access to their data. We first tried to join the Medical Records track at the Text REtrieval Conference (TREC), but that track was cancelled due to privacy concerns, which reinforced our own experience of the problems associated with getting access to confidential health data.

We then decided to use the publicly available MIMIC II data, which was provided by Beth Israel Deaconess Medical Center in Boston. It was created by obtaining data from the hospital’s ICU information systems, hospital archives and other external data sources. The MIMIC II database contains records for around 25,000 adult ICU patients, covering a total of around 36,000 hospital admissions and over 40,000 ICU stays. The database contains 38 different tables; for the first round of data analysis we chose to use the medication, lab, chart, demographics, ICD code and ICU stay tables.

By the conclusion of the Google Faculty Research Award, we had developed a set of patient types (clusters) that summarize key aspects of the ICU data included in the MIMIC II database. We are now writing a scientific paper on our work and will be looking for other health data sets to which we can apply our methodology. I am very grateful to Google for providing the seed funding for this important research.

In March 2013 I was notified that I had been awarded an NSERC Discovery Grant to continue this work over the next five years ($125,000 in total). Please contact me if you have health data that you would like to mine for possible patient types that could be useful for a variety of purposes, including clinical decision support.