In the age of big data, which do you trust: data science or the boss’s sixth sense?
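Summary: More and more companies are adopting big-data technologies, yet a shortage of the relevant skills and an over-reliance on gut instinct are holding their big-data efforts back.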
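Even if businesses had the development skills to exploit big data (and most do not), they still prefer to make decisions on untested, taken-for-granted opinion rather than data science. Guy Cuthbert, managing director of visual analytics firm Atheon Analytics, says that even among companies that do use data, many pick it selectively to support views they already hold rather than facts the data has actually verified, so the data becomes little more than a disguise for opinion-based decision-making.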
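Speaking at a recent Actian big-data round-table event in London, Cuthbert said that data science involves producing hypotheses and testing them, an approach that most of the retail goods businesses he encounters avoid.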
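He said: “I could reel off a string of horror stories about retailers, and they all have one thing in common: these retailers believe customers behave in a particular way because that is what they were told when they joined the business. They have never really questioned those claims or looked into what actually happens in that category, in that particular part of the country, or with that dress size. There are countless examples of people accepting what they were told instead of finding out for themselves what is really going on.
“A lot of our work is trying to move organisations from being ‘opinion-driven’ to being ‘data-driven’, so that they start using facts and hypotheses, the scientific method, instead of taking things for granted,” Cuthbert said.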
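Cuthbert said he has tried to help many companies understand how their products perform, but very few of them could be called analytical. By his reckoning, probably only the top one percent, or even the top 0.1 percent, of the world’s businesses are truly data-driven.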
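Cuthbert said: “I see a huge number of businesses run on gut instinct that do not understand why decisions could be made on data. I have listened to too many executives spout all sorts of opinions with no substance behind them. So if data animators and data scientists can do anything, it is to teach business people that there are a fascinating number of facts sitting in their organisations, if they just choose to look at them.”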
However, getting businesses to debunk corporate myths and accept reality, in the form of data-based conclusions, is not easy.
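“When we present our findings, we often get hostile pushback: people flatly dispute what we show them and tell us to our faces that we are completely wrong,” Cuthbert said.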
Another problem is that even when companies try to use data scientifically, their focus is too narrow.
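“Most of the companies we work with focus on what is already known. They look at things like, ‘We expect revenues to grow six percent next year, let’s make sure we hit six percent’,” Cuthbert said.
“They don’t go looking for the 30 percent or the 120 percent opportunity. A lot of our work is surfacing that kind of opportunity and showing them patterns they simply did not know about.”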
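Unfortunately, although advances in technology make it very easy to process billions of rows of data, the analysis itself still has to rely on people.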
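“The gap between machine learning and the other computing techniques on the one hand and human thinking on the other is, for now, that machines have no inspiration,” Cuthbert said. “Inspiration comes from humans understanding how to interpret the signals in the data.”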
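Steve Shine, CEO of big-data analytics company Actian (formerly Ingres), said that until relatively recently, the development skills big data demands kept such projects the preserve of a particular set of customers with big budgets.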
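“If you have been anywhere near a Hadoop project in the past three or four years, you will realise that being able to write an efficient MapReduce program and get Hadoop to run efficiently is a pretty rare skill,” Shine said. “That skill was passionately protected by the technical community rather than spread around, but things have changed dramatically in the past 12 months. There is now broad acceptance that these new technologies need to become much easier to use.”
“We took people back to the 1980s in terms of how productive they could be in writing code to get at all that data and discover new insights.”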
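The problem now is that big-data technologies are proliferating rapidly, with various versions of Hadoop, NoSQL, and a steady stream of new ways of preparing and integrating data.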
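“No CIO gets rewarded for being able to ‘glue’ all of that together. The business doesn’t care how quickly or how well you glued it together. What it asks is, ‘How fast can you help me get to my customer churn data?’” Shine said.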
Even so, today’s technology allows businesses to discover unexpected commercial potential in the data they routinely produce.
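Shine cited a customer in the benefits and payroll business that has realised the data it processes on salary changes, leavers and joiners probably gives a more accurate picture of the macroeconomy than the information its own government holds.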
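“There are organisations that traditionally look as if they have a single business, and they are realising that if they become data-centric and can take that data, possibly combining it with other data that is available, they can deliver insights that are fundamentally more valuable than what they have today,” he said.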
Original English text:
Even if businesses had the developer skills to exploit big data — and most haven’t — they invariably prefer to make decisions using untested opinion rather than data science.
And where firms are using data, many are choosing it selectively to back up currently held views, which is just opinion-based decision-making in another guise, according to Guy Cuthbert, managing director at visual analytics firm Atheon Analytics.
Data science involves producing and testing hypotheses, an approach shunned by most of the businesses he encounters in the retail goods sector, Cuthbert told a recent Actian big-data round-table event in London.
“I could reel off a horrifying list of stories of retailers that believe that customers behave in a particular way because that’s what they were told when they joined the business,” he said.
“They’ve never really questioned whether that’s what happens in that category, or in that particular part of the country, or with that dress size. There are hundreds of examples of people doing what they were told, rather than thinking afresh about what else is going on.
“A lot of our work is trying to move organisations from this opinion-operated world into a data-driven world and start to use facts, hypotheses — science — as a method.”
Of the companies Cuthbert has tried to help understand how their products perform, very few could be called analytical. By his reckoning, probably only the top one percent or even the top 0.1 percent of the world’s businesses are truly data-driven.
“I see a huge number of opinion-operated businesses that don’t get why decisions could be made on data. I’ve listened to executives spout all sorts of opinions with no fabric or no substance behind them at all,” Cuthbert said.
“So if data animators and data scientists can do anything, it’s to try and teach the rest of our peers in businesses that there are a fascinating number of facts located in their organisation if they just choose to look at them.”
However, getting businesses to debunk corporate myths and accept factual, data-based conclusions is not easy.
“We have some really hostile presentations where we are showing people stuff that they’ll flatly dispute and tell us we’re completely wrong,” Cuthbert said.
Another problem is companies adopting too narrow a focus, even if they are trying to use data scientifically.
“Most businesses we work with focus on known knowns. They look at, ‘We expect to operate with an increase in revenues of six percent in the next year, let’s make sure we make six percent’,” Cuthbert said.
“They don’t go looking for the 30 percent or the 120 percent opportunity. A lot of our work is surfacing that kind of thing and showing patterns that they simply don’t know.”
Unfortunately, despite advances that make it easier to process billions of rows of data, analysis is still an area where human skills remain essential.
“The gap with machine learning and all the rest of the computer sciences at the moment is that as yet there is no machine inspiration,” Cuthbert said.
“The inspiration comes from humans understanding how to interpret signals in the data.”
Steve Shine, CEO of big-data analytics company Actian, formerly Ingres, said that until relatively recently, the development skills required for big data have made such projects the preserve of a certain set of customers with big budgets.
“If you’ve been anywhere close to a Hadoop project over the past three or four years, you’ll realise that it’s a fairly rarefied set of skills that can write an efficient MapReduce program to get anything efficient out of Hadoop,” Shine said.
“It was passionately protected by the community for a long period of time. That has changed radically in the past 12 months. There’s a broad acceptance that things need to get much easier in terms of how you use these new technologies.
“We took people back to the 1980s in terms of how productive you were in being able to generate code to get at all that data and discover insights.”
An issue now is the proliferation of big-data technologies, with various versions of Hadoop, NoSQL, and new ways of preparing and integrating the data.
“No CIO gets rewarded for gluing all that together. The business doesn’t care how well you glued all that together or how quickly. It says, ‘How fast can you help me get to my customer churn data’,” he said.
However, the technology now does permit businesses to discover unexpected commercial potential in the data they routinely produce.
Shine cited a customer in the benefits and payroll business, which has understood that the data it processes on salary changes and leavers and joiners is information that probably provides a more accurate view of macroeconomics than its own government possesses.
“There are organisations that traditionally look like they have one business that are realising that if they are data centric and can take that data — possibly combining it with other data that’s out there — they can deliver insights that are fundamentally more intrinsically valuable than what they have today,” he said.