"The keepers of big data say they do it for the consumer's benefit. But data have a way of being used for purposes other than originally intended" - Erik Larson, 1989
A full twelve years before Doug Laney defined "Big Data" in terms of the 3 V's (velocity, variety, and volume) and nearly a quarter century before the term truly entered the collective consciousness, Larson understood that there would emerge a tension between the keepers of data and the consumers from whom the data has been derived.
Notice I am careful not to use the term 'ownership'. Whose data is this anyway? That's a topic for another article. For now, I'm keen to focus on how to think about your Big Data project so as to ensure you achieve the objectives of your investment.
So here's the list. It's not exhaustive, but it's my personal Top 5 - please feel free to comment if you believe priorities need to be elsewhere:
5. What are you trying to achieve?
It continues to amaze me how many clients ask for help with their "Big Data Initiative" and, when I ask "Why are you investing in Big Data?", tend to answer "Because Senior Management has made it a strategic initiative". OK, but why?
The truth is that any investment in Big Data that's purely experimental is unlikely to yield a positive return. Data Scientists (and Data Philosophers for that matter) can have lots of fun exploring the vast data sets your organisation might hold, but without a clear strategy as to what this discovery and experimentation exercise is designed to achieve, it will ultimately fail.
Start your Big Data Initiative with a clear objective in mind, such as:
- we want to reduce costs by discovering operational efficiencies
- we want to grow revenue by being smarter with how we offer our product or service to clients
- we need to respond to a regulatory demand in a more time-efficient or cost-effective way
And if you're tasked with a Big Data delivery, make sure you understand what the strategic objective is. If you don't, find out; and if it hasn't been set, be part of defining it.
4. What data do you have today?
It seems obvious, but many Big Data initiatives start from what new data can be gathered, without any regard to what the organisation already has.
Quite often the data an organisation already holds is in an unsuitable state for value extraction (as in it's not clean), or is buried in a business silo so that not everyone in the organisation knows it's available.
Start your Big Data initiative with an audit of what you have today, and within this look for opportunities to mine this data for value towards your objective. Only when you've exhausted this should you look to expand your data set.
3. Where else can you get data that will be useful?
Again, start close to home and move progressively outwards. In some cases you might find that you have a data set that would benefit from mild enrichment. If so, think about what you would need to add to get it into a form you can use for your next purpose.
An example here might be an app developer who sees usage log data, but doesn't know who their users are.
Partnerships can often be brokered with third party organisations to 'blend' data from multiple sources and enrich it for mutual benefit - but be careful here, you need to keep #1 below in mind if you do this.
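To make the idea of 'blending' a little more concrete, here is a minimal sketch in Python of how the app developer's usage log might be enriched with a partner's data. Everything in it is an assumption for illustration: the field names (`user_key`, `screen`, `segment`), the sample records, and the choice to match on a hashed email address rather than the raw one. Note that a plain hash is not strong anonymisation on its own; it simply avoids passing raw email addresses around.

```python
import hashlib


def pseudonymise(email: str) -> str:
    """Return a SHA-256 hex digest of a normalised email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def blend(usage_log, partner_records):
    """Attach the partner's 'segment' to each usage event via the shared key.

    Events with no matching partner record are kept and marked 'unknown'.
    """
    partner_by_key = {rec["user_key"]: rec for rec in partner_records}
    enriched = []
    for event in usage_log:
        match = partner_by_key.get(event["user_key"], {})
        enriched.append({**event, "segment": match.get("segment", "unknown")})
    return enriched


# Hypothetical sample data: the app's own usage log...
usage_log = [
    {"user_key": pseudonymise("Ann@example.com"), "screen": "checkout"},
    {"user_key": pseudonymise("bob@example.com"), "screen": "home"},
]
# ...and a partner's records, keyed the same way.
partner_records = [
    {"user_key": pseudonymise("ann@example.com "), "segment": "small business"},
]

enriched = blend(usage_log, partner_records)
```

The join only works because both parties normalise and hash the identifier in exactly the same way, which is the kind of detail a data partnership would need to agree up front.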
2. How will you protect this data from unintended consequences?
I'm particularly thinking here about how you secure the data so as to avoid getting hacked. In 2015, data is such a valuable commodity that any organisation that holds (or is perceived to hold) data of value is likely to be a prime target for hackers.
This is also an area where you need to be thinking about whether you really need all the data you hold, or whether some of it makes you liable in some way.
For example, you might hold customer credit card details for their convenience; but do you really need to do this? If you're a utility company that is collecting invoice payments on a monthly basis from a customer credit card (or bank account) then perhaps the convenience (to you, and to your customers) outweighs the risk, but if you're an online retailer of office supplies, is storing this data just exposing you to unnecessary risk?
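One way to picture "not holding what you don't need" is to strip a record down before it is ever stored. The Python sketch below is illustrative only: the record layout and the `minimise_card_record` helper are hypothetical, and keeping just the last four digits is one possible choice, not a statement of what any payment standard requires.

```python
def minimise_card_record(record: dict) -> dict:
    """Keep only enough to recognise a card on a receipt, not to charge it."""
    # Drop spaces or dashes so the last four characters are always digits.
    digits = "".join(ch for ch in record["card_number"] if ch.isdigit())
    return {
        "customer_id": record["customer_id"],
        "card_last4": digits[-4:],
    }


# Hypothetical checkout record, using a widely published test card number.
checkout_record = {
    "customer_id": "C-1001",
    "card_number": "4111 1111 1111 1111",
    "cvv": "123",
}

stored = minimise_card_record(checkout_record)
```

The full card number and security code never reach storage in this sketch, so there is simply less for an attacker to steal.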
1. How will you operate in a 'data-ethical' way?
Early adopters of Big Data & Analytics have been fortunate to operate in an environment where they can make their own rules. Sure, scandals have hit the news - but by and large the industry has self-regulated.
My own view on this is that consumer sentiment is likely to turn against any organisation that doesn't have an ethical framework for how it collects, manages, stores, and uses data.
An example of this might be a social media platform (such as the one where you found this article, perhaps?) which is free for the consumer to use, but where users' data is used to profile and segment them, and ultimately to send them finely tuned marketing messages. That's OK - but how explicitly have you communicated this to your users? Have you given them the option to opt out and 'pay' for the service another way if they don't want their data to be harvested and mined?
Regardless of whether you are the CTO of a major Investment Bank, a concerned pensioner, or a curious student - you are a consumer too! Here is an opportunity to apply the Golden Rule, my favourite being the example from Confucius:
"Never impose on others what you would not choose for yourself".