Data Quality – How Much is Enough?

I read Henrik Liliendahl’s blog post today on “Turning a Blind Eye to Data Quality”.  I believe Henrik and those who commented on the post have some very good points. For data quality professionals, the question might be “How much is enough?” when it comes to data quality.  The answer really depends on the nature of your business and how the leaders in your organization view the value that data quality can bring them.  The question we will most often be asked is “How does DQ help my bottom line?”  If we as data quality professionals can’t tie DQ initiatives directly to bottom-line impact, it will be hard to get serious attention.  And believe me, we want serious attention, or our own value to organizations will be questioned.

This means we probably need to move the conversation away from DQ as an end in itself and towards how selected projects can have a positive impact on the bottom line. That conversation may be happening within many organizations at levels above our pay grades. Our first goal should be to ask our direct managers whether the conversation is going on and how we can help in real, meaningful, tactical, financially relevant ways.  If that conversation is not happening, we need to ask what we can do to get it going in the same real, meaningful, tactical, financially relevant ways.  We must abandon the mythical goal of a single version of the truth for all attributes.  Our goal needs to be about making the business more successful in incremental, tangible ways and thus making ourselves more successful in incremental, tangible ways.  After all, as Henrik points out, our businesses are being successful today despite bad data.

In comments to Henrik’s post, Ira Warren Whiteside mentioned that “As with everything else in order to convince an executive to ‘fix’ something it has to be really easy to do, not involve a lot of collaboration and be cheap”.  I would argue with this perspective.  I think that the decision to go forward with the project should be easy, not necessarily the project itself.  This means clear, unquestionable value to the business (most likely directly to the bottom line) is needed.  And I think such a project should show impact quite quickly.  You can’t have a 12, 24 or 48 month project without value being realized within, say, the first six months or so.  Thus, it is best if the initiative can be absorbed in bite-sized chunks so initial benefits can be quickly realized to help reinforce a culture of DQ. Don’t wait too long to deliver or you will lose your audience and your chances for any future projects will diminish – greatly.

In the retail supply chain, suppliers end up paying, on average, 2% of gross sales in penalties to their retail customers. This is 2% that is taken directly from the bottom line and is usually tied back to inaccuracies in delivering orders – problems with being “on time”, “right quantity”, “right product”, “right location”, “broken products”. Each supplier has people (often a whole team) focused on resolving these issues.  There are two key ways they tackle these problems.  The first is that they identify the dollar amount below which it is too costly for them to address the problem.  In short, it costs them more to fix it than it does to just pay the penalty.  The second is deep root cause analysis of those issues that are too costly not to fix – either in aggregate (i.e., the problem occurs frequently) or as standalone problems.

I think DQ practitioners could learn a lot from what is done to address these retail supply chain problems.  The first step is to identify what is ostensibly “noise”: the data that costs too much to fix given its low impact on the business.  The second is to take the bad data that is too costly to ignore and identify initiatives to resolve the DQ problems – and tie that back to tangible business benefits.  The challenge is easy with “perfect order” delivery problems in the retail supply chain.  Companies know that the penalties are directly impacting the bottom line.  Fix a problem and the resulting money goes straight back to the bottom line.
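The two-step triage described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the `DataIssue` fields, the issue names and every dollar figure are hypothetical, and a real program would need its own estimates of what each problem costs.

```python
from dataclasses import dataclass


@dataclass
class DataIssue:
    name: str
    annual_cost: float  # assumed yearly cost of living with the problem
    fix_cost: float     # assumed cost of fixing it at the root


def triage(issues):
    """Split issues into (worth_fixing, noise).

    An issue counts as noise when fixing it would cost at least as much
    as simply absorbing its impact.
    """
    worth_fixing = [i for i in issues if i.annual_cost > i.fix_cost]
    noise = [i for i in issues if i.annual_cost <= i.fix_cost]
    # Put the largest net benefit first so the clearest wins lead the list.
    worth_fixing.sort(key=lambda i: i.annual_cost - i.fix_cost, reverse=True)
    return worth_fixing, noise


# Hypothetical figures, for illustration only.
issues = [
    DataIssue("wrong case-pack quantity", annual_cost=120_000, fix_cost=30_000),
    DataIssue("misspelled contact names", annual_cost=2_000, fix_cost=15_000),
    DataIssue("transposed item dimensions", annual_cost=60_000, fix_cost=10_000),
]

worth_fixing, noise = triage(issues)
for issue in worth_fixing:
    print(f"fix:    {issue.name} (net benefit {issue.annual_cost - issue.fix_cost:,.0f})")
for issue in noise:
    print(f"ignore: {issue.name}")
```

The point of sorting by net benefit is that the first item on the list is the easiest one to take to a manager, because its payoff is the least arguable.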

I’m not so sure it’s that easy for DQ professionals, at least not with all our DQ problems.  But I’ve been around long enough to know that there is some low-hanging fruit in just about every business.  As a hint, if your company or customers are suppliers to retailers, you might start your quest for a promising DQ project with your compliance team, since supply chain problems can often be tied back to DQ problems.  Imagine that!

 

 

 

 

Size Doesn’t Matter to Donkeys

Should it to you?

When it comes to treats, my donkeys don’t seem to care how big the cookie is – just that they are getting something.  Thus, I keep my donkeys lean and my costs low by breaking their special horse treats in two before letting the boys scarf them down.  Likewise, I break our dog treats into two or three pieces before dispensing them to the pack.  Polly in particular has benefited from a lean diet supported by pieces of biscuit as opposed to the whole thing.  Of course, I could just purchase smaller treats, and sometimes I do.  But what about your company?  When it comes to your EDI and B2B programs, do you go for the small provider or the big one?  For you, does vendor size matter?

In recent articles in EC Connexion and the VCF Report, I outlined my thoughts on the pending GXS/Inovis merger.  In the former I focused more on the B2B practitioner and what it might mean for them.  In the latter I drew on some of the first and provided a higher-level view focused on the business and managerial reader.  In both I pointed out that what you get out of the merger depends on how you and your company react.  There is enormous potential in the merger, but companies will only benefit if they hold the new company accountable for bringing the right mix of solutions to the table – a mix the merged company will have access to, but might not fully exploit.

However, this merged company will be the best positioned of all extant B2B players to provide full end-to-end services on a global basis.  In fact, others who claim to be global will – when you look under the hood – only have one or two people in emerging markets like China, India and Brazil, whereas the merged company will have significant resources in those locations.  In this case, size does matter.  If you have global operations – whether it be your own enterprise or companies in your extended supply chain (supplier, supplier’s supplier, customer, customer’s customer) – you must consider the ability of your B2B partner to help you manage that ever-changing supply chain as you move production from country to country and from far offshore to a mix of far and near shore manufacturing, and as you change your mix of carriers as production changes.

Successfully implementing, maintaining and managing a global business requires that your partners be there – and that they have been there doing what you need done for a while.  You need them to be smarter than you when it comes to new countries and new regions.  There is no use partnering if the partner can’t bring something to the table.  No other B2B player has the global reach within its own organization that the GXS/Inovis merger will bring.  Much of that comes from GXS, but with the addition of Inovis’ unique solution offerings, the global capabilities of the merged company will be significant.

No matter the size of your business, if you do business globally, size should matter to you.  There are other fine players in the space – Tie Commerce is one smaller player with strength in Europe, the US and Brazil – but if you truly need round-the-world visibility, accountability, presence and capability, your partner’s size will matter.  Don’t underestimate the knowledge that doing business in a country can bring, or what being able to do business in nearly 20 languages can get you.  Legal, business and technical knowledge are all there for leveraging.  The question is, if size does matter and you choose the merged company, will you demand that they leverage their size (both global presence and overall solution set) to help you do business better with enhanced global visibility, data quality and synchronization, compliance management and full, uninterrupted end-to-end automation?  Or will you just ask them to do what you’ve always done – automate purchase orders, invoices and ship notices and call it a day?  If that’s all you do, your stakeholders won’t be happy and, when it comes to your B2B partner, size really won’t matter after all.

Single Version of the Truth VS Single Interpretation of the Truth

The recent Data Quality Blog Olympics with Charles Blyth, Jim Harris and Henrik Liliendahl Sørensen got me thinking about this more in depth. From a purely technical standpoint, can you have a single version of the truth? Does it necessarily have to be a shared version of the truth? I’m thinking you can have a single version of the truth but it doesn’t have to be a shared version of the truth. I think what you can’t have is a single interpretation of the truth.

In simplistic terms, data is the data – wherever you get it from. You need to define it and define it well. For instance, the color of the product – as I define it – is “blue”. The height is 6 inches. I define my fields for the height and the unit of measure for that height as well as the color. Those are absolute values. They are the truth, per se.

Let’s try a few more relatively simple ideas. Pretend I run a store. In my store I like to assign a “customer number” to each of my customers because that’s what I started doing dozens of years ago. My preference is to use their social security number (yes, I know it is a bad practice, but I choose to do this). However, some customers don’t have a social security number. I need to have another method of assigning them a number so that I can uniquely identify them. I now need a way to:

  1. capture and store the social security number
  2. identify that the social security number is the customer number (perhaps I store it twice – once in SS# field and once in the customer number field, perhaps I use a pointer, perhaps my business logic looks to the SS# field and only looks to customer number field if SS# is blank)
  3. create a customer number if SS# is not available.
  4. make sure that the created customer number can be uniquely identified as not being a SS#

Now, I know what the customer’s SS# is if they have one (assuming they gave it to me). I also know what my customer number is – which may or may not be their social security number. My single version of the truth for the customer number is whatever number I have for them – SS# or my created number. I need to interpret – perhaps – whether it is a SS# or not.
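The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a recommendation: the nine-digit SSN format check, the `C` prefix for generated numbers and the names `CustomerNumbers` and `is_ssn` are all made up for illustration, and storing social security numbers as identifiers remains the bad practice already acknowledged above.

```python
import itertools
import re

# Assumption: an SSN arrives as nine digits, with or without dashes.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")
# Assumption: generated numbers carry a letter prefix, so they can never
# be mistaken for an all-digit SSN (step 4).
GENERATED_PREFIX = "C"


class CustomerNumbers:
    def __init__(self):
        self._sequence = itertools.count(1)

    def assign(self, ssn=None):
        """Return the customer number for a new customer."""
        if ssn and SSN_PATTERN.fullmatch(ssn):
            # Steps 1 and 2: the SSN itself becomes the customer number.
            return ssn.replace("-", "")
        # Step 3: no usable SSN, so create a number.
        return f"{GENERATED_PREFIX}{next(self._sequence):08d}"


def is_ssn(customer_number):
    """Interpret a stored customer number: SSN or generated?"""
    return not customer_number.startswith(GENERATED_PREFIX)


numbers = CustomerNumbers()
with_ssn = numbers.assign("123-45-6789")
without_ssn = numbers.assign()
print(with_ssn, is_ssn(with_ssn))
print(without_ssn, is_ssn(without_ssn))
```

Either way there is exactly one customer number per customer; the prefix only makes the interpretation step mechanical instead of a guess.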

Perhaps that’s not a good example. What about transient data like “ship to address”? Is the current address accurate? By accurate do we mean “do they live there now?” or do we mean “did we spell it correctly?” Do we keep previous versions of the address so we know history? Still, the information is absolute – whatever the answer is. And we know it is kept separately from “billing address” though the two might be the same. If we want to know who we ship products to, we can know based on ship to or billing address – and doing sales analysis separately against both can yield different results. But the actual data is absolute. And we know when each sale was placed for each ship to and billing address.

Marketing, sales and the finance department might all look at the same data set and interpret it to mean different things, but the absolute numbers are still the same thing. They may aggregate different pieces of information – like color, size and price by ship to address, or color, size and cost by customer number. All those pieces of data may be absolute, but the end result of the use of that data is a different interpretation of the impact to the business.
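A small example may make the distinction concrete. In this sketch the sales records, the field names and the `total_by` helper are all hypothetical; the only claim is that the same rows, grouped two ways, give two different pictures without any value changing.

```python
from collections import defaultdict

# Hypothetical sales records; every value is made up for illustration.
sales = [
    {"customer": "123456789", "ship_to": "Austin", "bill_to": "Dallas", "price": 40.0},
    {"customer": "123456789", "ship_to": "Houston", "bill_to": "Dallas", "price": 25.0},
    {"customer": "C00000001", "ship_to": "Austin", "bill_to": "Austin", "price": 40.0},
]


def total_by(records, key):
    """Sum the sale price per distinct value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["price"]
    return dict(totals)


by_ship_to = total_by(sales, "ship_to")  # one department's view
by_bill_to = total_by(sales, "bill_to")  # another department's view

print(by_ship_to)
print(by_bill_to)
# The underlying numbers are identical, so the grand totals agree.
print(sum(by_ship_to.values()) == sum(by_bill_to.values()))
```

Grouped by ship to address, Austin looks like the biggest market; grouped by billing address, Dallas does. Both readings come from the same single version of the data.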

I’m still having problems with the various case studies for why you can’t accurately and definitively assign absolute values – a single version of the truth – to a field of data. Perhaps it’s free-form text fields that are left to interpretation? Perhaps it is that some of the data entered may be inaccurate? If it is only entered in one place and then electronically shared to other secondary instances of the field (in other systems) you still have a single version of the truth – it just may not be accurate. Is it because we just don’t want to capture the same information in 5 different ways to satisfy the different usages people will have for it? I know that different states and different railroads refer to rail crossing information fields differently. But if the authoritative body defines the meaning of the fields that it uses, the various other entities may interpret this information in their own “language” but it is still defined by the authoritative body.

Perhaps I’ve been lucky in my work so far, as the problems seem to always be in executing on the rules once defined (someone put width in the height field).  Yes, initial agreement on definition can be tough, but agreement can and should be reached.  It may be a leadership issue rather than a data issue if you can’t.

What are your thoughts? Give me some examples of where you can’t have a uniquely defined field in your line of work. Are there different industries, lines of business, or organizations where this is more difficult?

The Data Quality Blog Olympics

Trying to simplify the data quality debate just a bit.

So, I read the three posts by Jim Harris, Henrik Liliendahl Sørensen and Charles Blyth in their good-natured debate on data quality.  The problem is that they seemed so theoretical and – in some ways – abstract, that I found the meanings got lost in their messages.  I found Jim’s to be the most understandable though I think Charles’ entry was perhaps a bit more in line with what I see as data quality.  Still, they all left me wanting in some ways.

My Take

I think my view is a bit simplistic compared to these three data quality thought leaders, but these are my thoughts on data quality with an eye towards these three blogs.

  1. Information is data in context.  If you ensure that data is defined accurately at a granular level and you have an objective way to ensure your data is complete, accurate and normalized, the users of the data will put it in context – as long as they have access to, and understand, the definitions of that granularly defined data.  This “granular definition” is the key point I like from Charles’ post.  In my mind, information “in context” is the key “subjective” dimension mentioned by Jim.
  2. If the data is defined at an appropriately granular level, you should be able to arrive at a single version of the truth.  This may result in many, many granular definitions – many more than any one user might care for.  In fact, any individual user may choose to roll up groups of granularly defined data into meta-groups.  And that is fine if that data fits the context for which they want to use it.  But it first has to be defined at the most granular level possible.
  3. Once the data has been defined at the granular level, a single system of record needs to be chosen for every defined piece of data.  If it can be entered into more than one system, that needs to stop.  All the secondary systems (not the system of record) need to be cut off from manual updates to help 1) ensure accuracy across all instances and 2) remove the potential for “redefinition” by someone with a silo-influenced view.  While this might be thought of as an MDM or governance related concept, I think it is core to the needs of quality data.  Both the definition and the value of the data must be made sacrosanct.

With appropriately defined data at a granular level that has both the definition and the actual data values protected, the users can be free to put that data into any contexts they wish to.  From where I sit, if the user understands the definitions, it is their prerogative – and in some cases their job – to leverage that data which they have access to in different ways to move the organization forward.  They put the data into context and create information.  If they are properly informed, those users can decide for themselves if the data is “fit for their intended purposes” as Henrik mentions.
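The third point, one system of record per field with every other system cut off from updates, can be sketched as a simple write guard. Everything here is an assumption made for illustration: the class name `SystemOfRecordGuard`, the field names and the system names (`PIM`, `ERP`, `WMS`) are hypothetical, and a real deployment would enforce this in integration middleware or database permissions rather than in one in-memory object.

```python
class SystemOfRecordGuard:
    """Accept a value for a field only from that field's system of record."""

    def __init__(self, owners):
        self._owners = dict(owners)  # field name -> owning system
        self._values = {}

    def update(self, field, value, source_system):
        owner = self._owners.get(field)
        if owner is None:
            raise KeyError(f"no system of record defined for {field!r}")
        if source_system != owner:
            raise PermissionError(
                f"{source_system!r} may not update {field!r}; owner is {owner!r}"
            )
        self._values[field] = value

    def read(self, field):
        # Any system may read; only the owner may write.
        return self._values[field]


# Hypothetical field-to-system assignments.
guard = SystemOfRecordGuard({"item_height_in": "PIM", "list_price": "ERP"})
guard.update("item_height_in", 6.0, source_system="PIM")

try:
    guard.update("item_height_in", 9.0, source_system="WMS")
except PermissionError as err:
    print("rejected:", err)

print(guard.read("item_height_in"))
```

Reads stay open to everyone, which matches the idea that users are free to put protected data into whatever context they need.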

In many ways I think the subjective use of the data should be separated from the objective process of securing the accuracy of the individual values.  The data is either objectively accurate or it is not.  The key to this, though, is making sure the users are fully informed of – and understand – the definitions of the data.  In fact, this may be the most difficult aspect of data quality.  Technically, I think defining data at a granular level may be relatively easy compared to making sure users understand those definitions.

Then there is the cross-enterprise use of data – and whether your definitions align with the rest of the world.  A recent study conducted by AMR Research and GXS shows that about 1/3 of all data originates outside an organization (40% in automotive, 30% in high tech, about 34% in retail). Industry standard definitions come in handy but everyone has to use them to bring real value to the various enterprises (see the retail supply chain’s use of the Global Data Synchronization Network as an example of good definitions but limited adoption).  We still probably have the need to translate definitions between organizations.  That is where the breakdown comes.  Like much of the success in the RFID world, data quality is best managed in a closed loop environment.  Once it leaves your four walls (or your back office systems) you can’t easily control 1) its continued accuracy, 2) its definition or 3) the interpretation of the definition by external users.  Even with a global standard, GDSN participants had trouble figuring out how to measure their products, so height, depth and width have been transposed many times.
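Transposed dimensions are one of the few cross-enterprise problems that can be spotted mechanically. The check below is a rough sketch, not anything taken from GDSN itself: the function name `compare_dimensions`, the tuple order and the tolerance are assumptions, and both records are assumed to use the same unit of measure.

```python
def compare_dimensions(ours, theirs, tolerance=0.01):
    """Compare two (height, width, depth) tuples given in the same unit.

    Returns "match" when the values agree position by position,
    "transposed" when the same values appear in a different order,
    and "mismatch" otherwise.
    """
    def close(a, b):
        return all(abs(x - y) <= tolerance for x, y in zip(a, b))

    if close(ours, theirs):
        return "match"
    if close(sorted(ours), sorted(theirs)):
        return "transposed"
    return "mismatch"


# Hypothetical item measurements in inches.
supplier_record = (6.0, 4.0, 2.0)
print(compare_dimensions(supplier_record, (6.0, 4.0, 2.0)))
print(compare_dimensions(supplier_record, (4.0, 6.0, 2.0)))
print(compare_dimensions(supplier_record, (6.0, 4.0, 3.0)))
```

A check like this only tells you that the two sides disagree about which number is the height; it cannot say which side measured correctly, and it cannot detect a swap between two dimensions that happen to be equal.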

I think, though, that companies that focus on internal data accuracy will be in a much better position to have cross-enterprise accuracy as well.  The key here is that eliminating the internal challenges will reduce the occurrences of problems and help narrow the list of places to look if any do arise.  And having that good data can help make an ERP Firewall much more valuable.

Of course, this topic is something that could take thousands of pages to cover and years to write – and still miss most of the challenges, I’m sure.  But sticking to the three points above for internal data quality can go a long way towards making better decisions, improving the bottom line, and making the overall enterprise more effective.

 

Data Quality Doesn’t Get Any Respect

Halloween Costumes have some fuming.  But should stakeholders be the most upset?

David Loshin of DataFlux recently asked “is Data Quality Recession Proof?” I think the answer is “yes”.  There is no doubt that data quality is still a problem in an economic downturn. In fact, I think it is recession irrelevant.  It continues to have a bad impact and is demanding attention, but I wonder if it is getting any more attention than it has in non-recessionary times.  I believe it should because – as Charles Blyth points out in a comment on David’s blog – organizations are more efficient with greater data accuracy. So much attention has been paid to new ways to move data (see The Long Tail of B2B Standards by Steve Keifer) that we miss the need to get the data right!

Is Good Data Really Such an Alien Issue at Target?

I’d say it is time to stop focusing on XML and other means of moving data and spend more time getting the processes, culture and technology in place to assure accuracy of the data.  It does little good to move it quickly or in a new format if the data is bad. 

So, any time is a good time for data quality.  The problem is getting enough enterprise focus on the issue.  Yet when you do get corporate attention, is it for the wrong reasons?  Take Target, for instance.  They were the first to respond to the controversy regarding the “illegal alien” Halloween costumes they (along with Wal-Mart, Walgreens and others) sell.  This weekend some people started suggesting the costumes are not “politically correct” and that caused the risk-averse retailers to pull the web pages for those products from their online stores.  In Target’s case, the fact that they are (or were) selling the costumes was explained as a “data entry” error (Hey, they got first dibs on this excuse – wonder what Wal-Mart and Walgreens will say?).

If a data entry error (meaning data quality) is really the cause, was that error in the ordering of the multiple SKUs – like “We didn’t mean to enter in these SKU numbers and quantities along with distribution centers to deliver them to and the dates on which to deliver them”?  Or did they want to order them but not post them on the web site or put them in the store for sale?  In this case was the data entry error in revealing they had a horde of the products that they weren’t planning to sell?  I can see the CEO in a staff meeting:  “Let’s order these things and not sell them, thus keeping everyone from getting them because we think they are not politically correct!”

Let’s face it: Data Quality becomes a convenient scapegoat even when it probably isn’t the problem.  Yet, when it really is a problem it is too often ignored.  I know how bad the data is because in my last role I was able to analyze the quality of data being sent to retailers from their suppliers.  The data crossed numerous industry sectors so we know the problems are persistent and numerous.  We know the companies using the GXS Product Data Quality tool are experiencing better data.  However, I suspect that only the retailers are really making use of this data.  Most suppliers don’t take the cleansed data and refresh their own systems with it.  Thus, retailers are ordering from good data and it won’t line up with the data in supplier systems.  The suppliers have to know their data is bad.  Do they just not care?

In the end, a data quality program that fixes data on one end but not the other of a strategic relationship really is little better than having the same bad data on both ends.  The only difference is that retailers can wield a heavier compliance stick when they know their own data is good.  I think it is time to divest from those suppliers and their retailers.  Since supply chains with divergent data on each end are destined to be less competitive, they aren’t a good place to invest nor a good place to continue holding stocks.  Unfortunately most retail supply chains are like that. 

Getting back to the costumes, these retailers have sophisticated, automated systems.  Even for seasonal products Wal-Mart has decided against seasonal suppliers so they don’t have to manage occasional, manual business relationships.  They want automation with repetitive, reliable and accurate processes with correspondingly good data.  If the information about the costumes in Target’s systems was wrong, I could see it being a data quality problem.  But the fact they were sold in the first place?  Don’t blame it on errors in data – keystroke or otherwise.  There are enough of those without adding additional, fictitious fault.

Data quality just doesn’t get any respect.