Big Data is a Big Bust! (Unless You Do the Little Things First)
Big data is the buzzword of the moment and is all the rage. Big data this, big data that – I think even the Girl Scouts announced a big data initiative recently. This truly is great news; I have been preaching the value of data as an enterprise asset for years! Back then, it was hard to find anyone willing to listen about good old regular enterprise data, because no one wanted to put in the work and, more specifically, spend the money.
Big data has made data a “cool kid” again and is a logical next step in the analytics lifecycle. One benefit of big data over standard data warehousing, business intelligence (BI), and analytics is that it relieves you of some of the classic data management functions, specifically up-front data integration and aggregation.
But many people seem to be taking this a step further and ignoring data management entirely. For example, the Centers for Medicare & Medicaid Services (CMS) announced last month that it was withholding one-third of the submitted records on drug and device industry payments to physicians and teaching hospitals because of suspected inaccuracies in the data published on its Open Payments website.
The problem? Data from physicians with the same or similar names were getting co-mingled. FOR ONE-THIRD OF THE DATA! This is a long-standing, classic data management problem, and it revolves around Master Data Management (MDM).
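To make the failure mode concrete, here is a minimal sketch of why matching on name alone co-mingles distinct people. It is illustrative only – the fields are hypothetical (license_no stands in for whatever stable identifier a real MDM hub would use), and this is not how CMS actually matches records.

```python
# Minimal sketch of the matching problem behind name co-mingling.
# Real MDM tools use far richer scoring and survivorship rules; this
# only shows why a name alone is a dangerous match key.
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1] using Python's stdlib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def safe_to_merge(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Merge only when names match AND a stable identifier agrees.

    Matching on name alone is exactly what co-mingles distinct physicians.
    """
    if name_similarity(rec_a["name"], rec_b["name"]) < threshold:
        return False
    # A stable identifier (here, a hypothetical license number) must agree.
    return rec_a["license_no"] == rec_b["license_no"]


a = {"name": "John A. Smith", "license_no": "TX-12345"}
b = {"name": "John A. Smith", "license_no": "NY-98765"}  # a different person!
print(safe_to_merge(a, b))  # False: identical names, different identifiers
```

The design point: a match rule needs at least one attribute that survives name collisions, or identical names collapse into one record.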
Big data not only (still) requires many classic data management functions such as data quality, MDM, and data governance, it makes the problem exponentially bigger because of the sheer size of the data sets. At the same time, data issues become LESS visible in the raw data at these larger volumes, and often aren’t found until the data is consumed.
The focus is often on the technology, while the underlying data quality issues are ignored. That lets you process your data faster, but quality will actually diminish. Many people do not realize that a historical file with a permanent key has to be constantly monitored to detect possible data conflicts (e.g., a “JR” suddenly popping up in one of the names) in order to keep data quality issues from “snowballing”. This is quite different from files produced for one-time use, or from a retrospective dataset representing a slice of time in the past.
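As a rough illustration of that monitoring, here is a sketch that diffs each new load against the master file by its permanent key and flags changes to identifying attributes instead of silently overwriting them. The record layout and field names are hypothetical.

```python
# Sketch of ongoing conflict detection on a keyed historical file.
# Each incoming load is compared to the master record by permanent key;
# changes to identifying attributes are routed to review, not auto-merged.

master = {
    "P001": {"name": "JOHN SMITH"},
    "P002": {"name": "MARY JONES"},
}

incoming = [
    {"key": "P001", "name": "JOHN SMITH JR"},  # a "JR" suddenly pops up
    {"key": "P002", "name": "MARY JONES"},     # unchanged, passes quietly
]


def detect_conflicts(master: dict, incoming: list):
    """Yield (key, old, new) wherever an identifying attribute changed."""
    for rec in incoming:
        old = master.get(rec["key"])
        if old and old["name"] != rec["name"]:
            yield rec["key"], old["name"], rec["name"]


for key, old, new in detect_conflicts(master, incoming):
    print(f"Review {key}: {old!r} -> {new!r}")  # flag it; don't overwrite
```

Run on every load, a check like this catches the “JR” the day it appears, before it snowballs through downstream reports.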
Data is a living, breathing thing, and as such it needs constant care and attention. Data quality, MDM, and other data management functions are not a one-time event, but rather an ongoing process.
The goal is trusted data. More than a goal, it is a requirement. If people don’t trust the data, you have lost them, and it is very hard to win them back. And data doesn’t just support reporting and analytics; it supports your systems, executive decision making, and the monitoring of the core health of your organization.
Big data is the future, but please take baby steps and get the basics right first.