Big Data: Needles and Haystacks

A CogWorld Custom Feature

By Peter Fingar


We’ve been loaded (and overloaded) with information about Big Data and, more recently, Cognitive Computing. I’ve researched and written a lot about these subjects. But recently, I reached out to a CEO who is “in the business of Big Data” to update my thinking. I talked at length with Boyd Davis, CEO at Kogentix Inc., a company that uses cognitive computing and machine learning to build systems that draw on Big Data to learn about your business, make timely recommendations, and become increasingly accurate with every iteration. Here’s my recap of that enlightening conversation.

Boyd, what’s really new about A.I. (which has been around for 50+ years) and this new buzz about “Big Data?”

Some form of AI has been around for decades, and much of the vision for AI is still years away. However, the emergence of commodity hardware, open source software, and cloud computing has created a unique opportunity for organizations to leverage AI for business value. We are at an inflection point, and now is the time to act.

The goal is to integrate cognitive computing and machine learning to build systems that learn about your business, make timely recommendations to improve KPIs, and become increasingly accurate with every iteration. The key is to integrate Artificial Intelligence with Big Data. Intelligence, artificial or otherwise, starts with data. The foundation of the needed solutions is an integrated data hub, bringing together data of disparate types – structured and unstructured, current and retrospective, real-time and batch. To meet enterprise requirements, the data must be secure, persistent, and well governed.
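The idea of a system that becomes more accurate with every iteration can be sketched with a simple online-learning loop. This is a purely illustrative toy in Python, not Kogentix’s actual technology: a recommender keeps a running mean score per item and refines it with each new feedback event, so every iteration improves the estimate.

```python
# Toy sketch of iterative learning: per-item scores refined one
# feedback event at a time. Item names and ratings are invented.

class OnlineRecommender:
    def __init__(self):
        self.scores = {}   # running mean rating per item
        self.counts = {}   # feedback events seen per item

    def learn(self, item, rating):
        """Fold one feedback event into the running mean; the
        estimate sharpens as more events arrive."""
        n = self.counts.get(item, 0) + 1
        old = self.scores.get(item, 0.0)
        self.scores[item] = old + (rating - old) / n  # incremental mean
        self.counts[item] = n

    def recommend(self):
        """Return the item with the highest learned score."""
        return max(self.scores, key=self.scores.get)

rec = OnlineRecommender()
for item, rating in [("A", 4), ("B", 2), ("A", 5), ("B", 3), ("A", 3)]:
    rec.learn(item, rating)

print(rec.recommend())   # "A" -- mean 4.0 beats B's mean 2.5
```

The same incremental-mean trick is what lets such a system improve continuously without reprocessing its entire history on each update.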

So, is this like IBM’s Watson that beat two reigning champions at Jeopardy!, going through massive amounts of data to produce the answer (the needle in the haystack) in three seconds?

Imagine hundreds of thousands of haystacks being thrown at you, some at rapidly increasing speed, others lobbed at a more casual pace. We might think we need to search for the needles, but what if there is more to mine from those haystacks? What if we could understand the value of the haystacks themselves? And most importantly, what if the needles themselves aren’t pertinent to us? Your organization needs not only to mine big data, but to re-purpose and find new ways of using your data to boost your enterprise. The goal is to build systems that work with your existing technology to unlock potential goldmines and reveal actionable insights.

That sounds very interesting, but very futuristic. When do we need to act?

AI is real today – the intersection of cloud computing, big data, mature algorithms and the broader convergence of the physical and digital world make now the right time to act. This convergence is leading to an unprecedented change in the competitive environment. For example, in connected vehicles, we see vehicle manufacturers competing with insurers competing with telecommunications service providers, all competing with a new breed of Silicon Valley startups. Success will depend on a combination of innovation, speed, and scale. Larger, more established organizations can leverage their scale, brand, and route to market advantage to win, but only if they take risks, fail fast, and innovate at the pace the market demands. Organizations are already using AI to build new data-driven services, both for internal and external consumption. The game isn’t about the most sophisticated or exotic application of deep learning or neural nets – it’s the practical delivery of real solutions that derive insights and value from physical and digital data.

So, does a business have to reach out to one of the big-name vendors to tap this new world?

Technology strategy matters, and open systems are critical. As the competitive landscape shifts to AI-driven solutions, information technology shifts from supporting the business to being the business. There is a temptation for organizations to look to proprietary or completely outsourced solutions to manage complexity – in the world of AI, this approach virtually guarantees mediocrity. Utilizing open source software, commodity hardware (or cloud IaaS), and sticking to modular, open interfaces is an approach that minimizes cost, maximizes scalability, and preserves optionality for future innovations. Of course, companies can and should rely on vendors and partners to fill critical gaps, but no single element of the solution should be irreplaceable. Practical solutions don’t require expensive, complex, and completely specialized hardware, and any organization that relies on a small number of proprietary algorithms for sustained competitive advantage will be disappointed. Many large organizations, as well as most of the Silicon Valley stalwarts, bank on open source tools like Hadoop, Spark, and Kafka. The traditional leaders in IT solutions are very capable companies – in a head-to-head matchup, they often have the talent to beat anybody. However, over the long haul, they can’t beat everybody – that is the true value of open source.
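Tools like Hadoop are built around the MapReduce pattern: map a function over input records, group the intermediate results by key, then reduce each group. As a minimal local sketch of that shape – plain Python, no cluster, and invented sample data – here is the classic word count:

```python
# MapReduce pattern in miniature: map -> shuffle (group by key) -> reduce.
# On a real Hadoop cluster these phases run in parallel across machines;
# this local sketch only shows the shape of the computation.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every record."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["needles in haystacks", "haystacks of data", "data data data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["data"])       # 4
print(counts["haystacks"])  # 2
```

Because each phase has a clean, open interface, any stage can be swapped out or scaled independently – the same modularity argument made above for open systems generally.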

Sounds very interesting. But is anyone actually doing this today?

Who is already using open-source solutions to harvest haystacks? Well, most companies don’t want to talk in depth, as that would give away the “secrets” in their secret sauce for competitive advantage. But do a Web search for “Hadoop use cases” and you will likely uncover British Telecom, Cisco, Barclays, Mastercard, Nielsen, Marks and Spencer, Siemens, Samsung, Equifax, Schiphol Airport, the Walt Disney Company, and Western Union, among others.

Is your organization ready to be added to the list?
