Data Quality Deficiencies in Healthcare Records

Joseph A. Yacura, Founder – International Association for Data Quality, Governance and Analytics


The recent coronavirus pandemic has drawn considerable attention to critical errors and gaps in healthcare supply chain data. Limited real-time data, poor data quality, and a lack of vertical data integration all contributed to the difficulty of ordering, shipping, and receiving critical medical supplies.

Before the pandemic, Yili Zhang and Guney Koru conducted a study of healthcare records and published the results as “Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions” in the Journal of the American Medical Informatics Association, Volume 27, Issue 3, March 2020, pages 386–395.

The purpose of their study was to investigate and quantify the quality of the data in healthcare records and to develop a systematic approach to understanding and assessing data quality, which matters more as the volume and use of healthcare data grow. The data set they studied consisted of Medicaid records from an undisclosed state and contained 2.23 million rows and 32 million cells. They grouped the defects they found into five major categories and seventeen subcategories. The five major categories were:

  1. Missing data

  2. Correctness

  3. Syntax violations

  4. Semantic violations

  5. Duplicity
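Several of these categories can be made concrete with small rule-based checks. The sketch below is a minimal Python illustration, not the authors' detection method: the record fields (`member_id`, `dob`, `admit`, `discharge`), the sample rows, and every rule are hypothetical assumptions. Correctness is left out because checking it needs a trusted reference source to compare each value against.

```python
import re
from datetime import date

# Hypothetical records; field names and rules are illustrative assumptions,
# not the schema or rules used by Zhang and Koru.
RECORDS = [
    {"member_id": "M001", "dob": "1980-02-14", "admit": "2019-03-01", "discharge": "2019-03-05"},
    {"member_id": "M002", "dob": "14/02/1975", "admit": "2019-04-10", "discharge": "2019-04-12"},
    {"member_id": "", "dob": "1990-07-30", "admit": "2019-05-02", "discharge": "2019-04-28"},
    {"member_id": "M001", "dob": "1980-02-14", "admit": "2019-03-01", "discharge": "2019-03-05"},
]

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_FIELDS = ("dob", "admit", "discharge")


def parse_iso(value):
    """Parse a strict YYYY-MM-DD string; return None for anything else."""
    if not value or not ISO_DATE.match(value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:  # e.g. "2019-13-45" matches the pattern but is not a date
        return None


def find_defects(records):
    """Return a list of (row_index, category, detail) tuples."""
    defects = []
    seen = set()
    for i, rec in enumerate(records):
        # Missing data: empty or None values.
        for field, value in rec.items():
            if value in (None, ""):
                defects.append((i, "missing data", field))
        # Syntax violation: a non-empty date that is not valid YYYY-MM-DD.
        for field in DATE_FIELDS:
            value = rec.get(field)
            if value and parse_iso(value) is None:
                defects.append((i, "syntax violation", field))
        # Semantic violation: discharge earlier than admission.
        admit, discharge = parse_iso(rec.get("admit")), parse_iso(rec.get("discharge"))
        if admit and discharge and discharge < admit:
            defects.append((i, "semantic violation", "discharge before admit"))
        # Duplicity: an exact copy of an earlier row.
        key = tuple(sorted(rec.items()))
        if key in seen:
            defects.append((i, "duplicity", "exact duplicate row"))
        seen.add(key)
    return defects


for defect in find_defects(RECORDS):
    print(defect)
```

On the sample rows this reports a syntax violation in row 1, missing data and a semantic violation in row 2, and a duplicate in row 3. A real audit would draw its rules from the data dictionary and code sets of the system being checked.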

Results:

Their study found more than 3 million defects in the data set, and defect density exceeded 10% in five tables. Most of the defects fell into subcategories such as format mismatches, invalid codes, dependency-contract violations, and implausible value types.
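A quick calculation puts these figures in context. The sketch below assumes defect density means defects divided by cells inspected; the paper may define it differently. Because both published counts are rounded, the overall figure is only approximate. The per-table names and counts are hypothetical and serve only to show how tables above the 10% level would be flagged.

```python
from __future__ import annotations

THRESHOLD = 0.10  # the 10% level mentioned in the study's results


def defect_density(defects: int, cells: int) -> float:
    """Defects per inspected cell (assumed definition)."""
    if cells <= 0:
        raise ValueError("cells must be positive")
    return defects / cells


def flag_tables(counts: dict[str, tuple[int, int]], threshold: float = THRESHOLD) -> list[str]:
    """Return, sorted, the names of tables whose density exceeds the threshold."""
    return sorted(
        name for name, (defects, cells) in counts.items()
        if defect_density(defects, cells) > threshold
    )


# Rounded figures from the article: about 3,000,000 defects in about 32,000,000 cells.
overall = defect_density(3_000_000, 32_000_000)
print(f"overall density ~ {overall:.1%}")  # prints "overall density ~ 9.4%"

# Hypothetical per-table counts (defects, cells), for illustration only.
example = {
    "claims": (120_000, 1_000_000),
    "providers": (5_000, 200_000),
    "members": (45_000, 300_000),
}
print(flag_tables(example))  # prints ['claims', 'members']
```

An overall density just under 10% alongside five tables above it suggests that the defects are concentrated in particular tables rather than spread evenly.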

These findings suggest a significant opportunity for healthcare organizations to address data quality now. Such an effort should lower operating and administrative costs, raise data quality, and improve patient care.
