Interview with Dr. Philip de Melo

JS: Hello, Dr. de Melo. My name is Julie Smith and I represent Science Review. Nice to see you in our studio.

PdM: Great to be here.

JS: You are a data scientist working on health projects including public health. What is the major challenge in public health informatics now?

PdM: COVID-19 demonstrated that the IT technologies we used before are no longer suitable for processing big data, which constitutes over 80% of recorded health data.

JS: As a pioneer in big data analytics used in health informatics education and research, can you explain to our readers what big data is?

PdM: Previously, we believed that if we dealt with large data sets, we were working with big data. That definition has recently changed. We can process a data set of 100 records or of 1 million records in almost the same time. Big data is when the data is not static but arrives with high velocity, so that we have no time to process the first arrivals before the next ones are at the doorstep. The inability to work with big data causes a lot of frustration in the health informatics community.

JS: How does this play out in the real world?

PdM: It is estimated that the inability to work efficiently with big data leads to losses of $300 billion to the nation’s health economy annually.

JS: Do you mean that the public health workforce does not have enough qualified professionals able to work with big data?

PdM: I am saying that when the COVID-19 pandemic unfolded, public health agencies understood that existing technologies were not enough. Public health professionals knew how to create simple databases, but combining disparate, large, changing data sets to draw real-time insights was another matter altogether. COVID-19 indicated an urgent need for three things: scale, automation, and speed.

JS: How is the US government reacting to these needs?

PdM: The Office of the National Coordinator for Health Information Technology (ONC), led by Dr. Tripathi, runs an excellent program aimed at improving the nation’s Public Health Informatics and Technology workforce. The ONC funded 10 US universities with the objective of training 5,000 individuals who will bring the necessary knowledge to public health agencies.

JS: Can you recommend to our readers a few universities with strong public health informatics training?

PdM: I would recommend the University of Texas Health Science Center (Dr. Susan Fenton), Jackson State University (Dr. Girmay Berhie), Norfolk State University (Dr. Marie St. Rose), California State University Long Beach (Dr. Kamiar Alaei and Dr. Gora Datta), George Washington University (Dr. Keith Crandall), and the University of California Irvine (Dr. Kai Zheng). These and other universities are committed to building strong public health informatics programs in this country.

JS: Your online big data training platform attracts a substantial number of students in this country and abroad. How does it work?

PdM: First of all, the platform is free of charge (and open source), but we give priority to students who have the necessary background in statistics, data science, and programming. Most important is whether they plan to join the public health workforce. So far, 276 students have been trained in processing big data in the public health domain.

JS: How?

PdM: First of all, we teach students that in public health, data comes from many sources: electronic health records, various registries, administrative databases (such as Medicare or a prescription database), insurance claims, laboratory records, and so on. The data is usually unstructured (it cannot be presented as tables with rows and columns); electronic health records, for example, contain less than 20% structured data that can be stored in conventional databases. Instead, students learn to use cloud technologies and data lakes (such as AWS data lakes), which can efficiently integrate complex public health data of different structures, types, and formats. The online platform teaches students how to choose among modern big data technologies, depending on the problem to solve, such as Cassandra, Hadoop, Spark, Kafka, Elasticsearch, and Flume. The platform even advises the student which technology is most suitable for processing their data in real time.
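[Editor’s note: the integration problem Dr. de Melo describes is often solved with a "schema-on-read" approach: raw records from each source keep their native shape, and a thin mapping layer aligns them at query time. A minimal sketch in Python, with entirely hypothetical field names standing in for real EHR, laboratory, and claims feeds:]

```python
import json

# Hypothetical raw records from three sources, each with its own structure.
ehr_note = json.dumps({"patient_id": "p1", "note": "pt reports cough x3 days"})
lab_result = json.dumps({"pid": "p1", "test": "PCR", "result": "positive"})
claim = json.dumps({"member": "p1", "icd10": "U07.1", "amount": 120.0})

def normalize(raw, source):
    """Schema-on-read: map each source's identifier field onto one common shape,
    while keeping the original payload intact for later analysis."""
    record = json.loads(raw)
    id_field = {"ehr": "patient_id", "lab": "pid", "claims": "member"}[source]
    return {"patient_id": record[id_field], "source": source, "payload": record}

# The "data lake" here is just a list; in practice it would be cloud object storage.
integrated = [
    normalize(ehr_note, "ehr"),
    normalize(lab_result, "lab"),
    normalize(claim, "claims"),
]
```

Because nothing is forced into a rigid table up front, new sources can be added by extending the mapping rather than redesigning a database schema.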

JS: Do you mean that students learn how to process public health data in real time?

PdM: I believe that two technologies will revolutionize public health in this country and worldwide: the first is data interoperability (exchange) using web services (APIs, application programming interfaces), and the second is the ability to process big data in real time. The latter we call big data streaming. Big data streaming implies the creation of a pipeline, and students learn how to build a public health big data pipeline for stream processing. A streaming data pipeline moves data continuously from sources to destinations as it is created, populating data lakes along the way and allowing public health officials to make decisions in real time.
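[Editor’s note: the pipeline idea can be sketched in a few lines of Python. This is an illustration, not the platform’s curriculum: a generator stands in for a message broker such as Kafka, and each event is processed the moment it arrives rather than in a later batch job.]

```python
from collections import defaultdict

def streaming_pipeline(stream, alert_threshold=10):
    """Consume case-report events one at a time (as a broker would deliver them),
    keep running positive counts per region, and raise an alert the moment a
    region crosses the threshold -- the real-time decision point."""
    positives = defaultdict(int)
    alerts = []
    for event in stream:  # events are processed as they arrive, never re-scanned
        if event["positive"]:
            positives[event["region"]] += 1
            if positives[event["region"]] == alert_threshold:
                alerts.append(event["region"])
    return dict(positives), alerts

# Example: a small hand-made stream of case reports.
events = [{"region": "North", "positive": True}] * 12 + [
    {"region": "South", "positive": False}
]
counts, alerts = streaming_pipeline(iter(events), alert_threshold=10)
```

The key property is that state (the counts) is updated incrementally, so an alert fires while the outbreak is unfolding instead of after a nightly batch run.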

JS: I come from clinical medicine. Can you say a few words about the use of big data and streaming analytics to improve medical care?

PdM: Medical devices provide vital signs through physiological streams such as the electrocardiogram, heart rate, blood-oxygen saturation, and respiratory rate. Electronic health records (EHRs) represent a great wealth of medical data. Life-threatening conditions such as nosocomial infection, pneumothorax, heart attacks, and strokes can be detected earlier by big data analytics that integrate these different data sources. Streaming analytics improve patient outcomes and reduce the length of hospital stays by identifying new relationships between data-stream events and medical conditions. Finally, personalized data streams may significantly reduce morbidity and mortality by providing health insights continuously.
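[Editor’s note: a simple form of the physiological-stream analytics Dr. de Melo mentions is a sliding-window alert. The sketch below is illustrative only; the window size and heart-rate limit are made-up parameters, not clinical guidance.]

```python
from collections import deque

def vitals_alert(hr_stream, window=5, hr_limit=120):
    """Flag sustained tachycardia in a heart-rate stream: alert at every reading
    where the moving average of the last `window` values exceeds `hr_limit`.
    Averaging over a window suppresses one-off sensor spikes."""
    recent = deque(maxlen=window)  # only the newest `window` readings are kept
    alerts = []
    for i, hr in enumerate(hr_stream):
        recent.append(hr)
        if len(recent) == window and sum(recent) / window > hr_limit:
            alerts.append(i)  # index of the reading that triggered the alert
    return alerts

# Example: five normal readings followed by five elevated ones.
alerts = vitals_alert([80, 80, 80, 80, 80, 130, 130, 130, 130, 130])
```

A single 130 bpm reading does not trip the alert; only once the whole window is elevated does the average cross the limit, which is the "sustained" part of the detection.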

JS: Thank you, Dr. de Melo, for a very interesting discussion.

PdM: You are very welcome.
