November 2018 | CIOReview

RECENT TRENDS IN VISUALIZATION FOR THE DATA-DRIVEN WORLD

Over the past six years, the business technology community has come to realize the value of collecting data at scale and applying analytical approaches to drive decision-making. This transformation across business, industry, healthcare and science in general has been well documented in CIOReview articles spanning Big Data, artificial intelligence and informatics. The data scientist, a role Harvard Business Review dubbed "the sexiest job of the 21st century," has emerged to meet the challenge of leveraging data from databases, deploying machine learning techniques across computational architectures, and producing actionable insights. Visualization and machine learning are both critically important, and each depends on the other for success.

Generating visual summaries is one of many core skills required of data scientists, as part of their communication and storytelling role. Visualizations serve as a common language that bridges analysts and decision makers, allowing them to exchange the interpretations, assumptions, patterns and artifacts found in data. Our brains have evolved to find patterns and comprehend data visually. Many new open-source tools have recently become available, including interactive web-based tools for Python and R, the standard languages of data science. It is no coincidence that machine learning libraries have proliferated as well, given the strong interplay between the visualization and machine learning fields that data scientists draw on. Machine learning is currently the driving force behind artificial intelligence. It can identify relationships, make predictions, and extract insights from data. Before choosing a machine learning approach, it is crucial to understand the data, since each data set has unique nuances. Some approaches handle missing data entries gracefully, whereas others fail outright. Outliers can distort results or add complexity, although robust methods minimize their effects.
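The point about missing entries and outliers can be made concrete with a small sketch in Python, one of the two languages named above. It is illustrative only: the toy data, the helper names (`ols_slope`, `theil_sen_slope`, `drop_missing`) and the choice of the Theil-Sen estimator as the "robust method" are assumptions for this example, not the author's method. The points lie on y = 2x except for one missing value and one outlier.

```python
# Sketch: a naive vs. a robust straight-line fit on data with a missing value
# and an outlier. The toy data, helper names, and the Theil-Sen estimator are
# illustrative assumptions, not the article's method.
from itertools import combinations
from statistics import median


def ols_slope(xs, ys):
    """Ordinary least-squares slope: every point pulls on the estimate."""
    mx = sum(xs) / len(xs)  # sum() raises TypeError if any value is None
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def theil_sen_slope(xs, ys):
    """Theil-Sen slope: the median of all pairwise slopes, so one outlier barely moves it."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip vertical pairs to avoid division by zero
    ]
    return median(slopes)


def drop_missing(xs, ys):
    """Complete-case analysis: keep only pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [x for x, _ in pairs], [y for _, y in pairs]


# Hypothetical data on the line y = 2x, with one missing y (None) and one
# outlier (y = 60 at x = 8, where 16 would be expected).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, None, 8, 10, 12, 14, 60]

# The naive fit "breaks" on the missing entry.
try:
    ols_slope(xs, ys)
except TypeError as err:
    print("naive fit failed:", err)

cx, cy = drop_missing(xs, ys)
print("OLS slope:      ", round(ols_slope(cx, cy), 2))  # 5.67, pulled up by the outlier
print("Theil-Sen slope:", round(theil_sen_slope(cx, cy), 2))  # 2.0, the true slope
```

With the single outlier present, the least-squares slope is pulled from 2 to about 5.67, while the median-based Theil-Sen slope stays at 2.0. Dropping incomplete rows, as `drop_missing` does, is only one way to handle missing data; imputation is a common alternative, and either choice deserves a look at the data first.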
Many statistical methods assume a particular data distribution. Whether using a simple two-parameter regression or a deep learning model with millions of parameters, analysts must check the model's assumptions and the validity of its results to ensure a correct interpretation. Caruana et al. highlighted this point in their 2017 study predicting hospital readmission for pneumonia patients. They demonstrated that applying appropriate analyses to carefully collected data gave

By Peter V. Henstock, Senior Data Scientist, Pfizer Inc.
IN MY OPINION