Social Networks, Big Data, Data Quality and You!

Not only is social data Big Data, it's Quality Data, too!

For the better part of the last two decades I have advised my clients and employers to push hard on their suppliers to adopt either EDI (Electronic Data Interchange) or, in certain more recent cases, XML, instead of trading using phone, fax, email or snail mail.  The costs – both in time and in bad data – are too great to continue executing business transactions in a completely manual way.  I remember years ago, when the web first made it viable, I also started recommending that when suppliers would not trade electronically with them, my clients should push the use of web portals so they could off-load the data entry – and responsibility for bad data – onto their suppliers.  Unfortunately many companies (not my clients, mind you) abandoned electronic trading altogether when their IT shops found supporting portals was easier than supporting electronic trading…but that is a completely different story.

The thing here is the idea of having your supplier handle the entry and validation of data.  They have a vested interest in making sure the invoice information they provide you is complete and accurate, or they might not get paid.  In the same way, I see the social networks, and other companies that are aggregating information on all their users, leveraging the same model…but it is even better for them.

Every one of us must fill out web forms when signing up for almost any service online.  Information we share includes name, address, email address, age and many other personal tidbits.  The information becomes the basis of the “consumer as a product” business that these companies have adopted, where our virtual selves are sold to the highest bidder – either directly for cash or indirectly through targeted advertising and other routes.  In a March 11th column for the Huffington Post, former Federal Trade Commissioner Pamela Jones Harbour wrote “To Google, users of its products are not consumers, they are the product.”

Imagine having a “self-building” product.  It is self-aware and concerned about itself, so it naturally wants to make sure the information about it is accurate, and on a daily basis it makes itself better.  From a data quality standpoint, there are few better ways to make sure your data is accurate: make sure the products – er, those that enter the data – have a vested interest in it.  But the data quality aspect of social data is even better!  For the most part, the data collected from online searches, click-throughs, web browsing and everything else you do online – whether collected overtly or covertly – will pretty much be accurate, because it records what you actually did; it is near impossible for it not to be.

Social data, then, is a data manager's dream.  The real challenge is what to do with all the data.  With few data quality issues, data managers responsible for social data are left to work with their business counterparts to figure out the best ways to exploit the data they collect.  Which leads me to an old quote from Google’s ex-CEO Eric Schmidt when speaking with The Atlantic: “Google policy is to get right up to the creepy line and not cross it.”  With Chloe Albanesius at PCMag.com reporting that at least one recent company defector thinks that Google+ has ruined Google, perhaps they’ve stuck their nose just across that line after all.  Quality Big Data can be a scary thing.

All I can say is “Stay behind the creepy line, please!”