Delivering Big Data Advanced Analytics Clusters - Defining Your Requirements (PART 1)

By Darran Cooke, Services Delivery Manager, EMEA, Cloudera

Defining Your Requirements - Part 1

In this series, I’ll be sharing some of my experience and passing on useful advice and guidance I’ve picked up while helping customers successfully deliver Apache Hadoop clusters into their organisations.

The vast majority of this guidance will seem like common sense, but even so there are some basic principles of good product engineering, project management and service delivery that can, and often do, get lost.

Project budgets are usually the primary driver in designing any IT solution, but it's a mindset the industry is steadily trying to move away from. Attention and evaluation should instead focus on the value-generating use case and the business benefits derived from it. Certainly, in the advanced analytics world, infrastructure is seen as a commodity: the set-up and running costs are understood, but they are measured against the benefits of 'Time to Value' and 'Speed to Market.' Early and then regular validation points in your project will help to ensure you remain on course to deliver your objectives or, alternatively, will highlight a need to pivot and reassess your strategy.

It’s also important to remember that a successful advanced analytics solution may expand rapidly. This may concern new entrants in this field because infrastructure costs increase linearly as your data increases. However, the more data you have, the better informed your business decisions will be, and the better able you are to start answering questions that were previously impossible to answer without that new data. In a well-managed project, as data grows linearly, business value can grow exponentially.

Restricting your data set, or limiting it to known structured data, will limit your opportunity to discover new and valuable use cases. Advanced analytics clusters are commonly based on the principle of collecting and storing as much data as possible. This data can be structured or unstructured, new or old. Piecing it together unlocks exciting and previously unavailable insights, opening up valuable use cases that can give you an incredible market advantage over your competitors.

It’s also important to remain within proven design blueprints where possible. It’s all too common to see huge amounts of money spent trying to save money on infrastructure or software, or trying to mix unproven technology into the stack. This spend often quickly overtakes the original cost of simply upgrading, expanding or buying proven off-the-shelf software. A second important disadvantage of any bespoke architecture is its effect on the long-term supportability and potential growth of your solution. Maintaining a technology/product compatibility roadmap will be a massive overhead and creates serious operational risks in tandem.

Many customers today are keen to move away from the traditional ‘waterfall’ project delivery methodology and adopt a quicker, more agile delivery approach, with an emphasis on delivering faster and cheaper. It’s critical to consider the importance of this shift in organisational capability; it’s broader than technological change. Cultural change, minimising employee resistance to new ways of thinking and working, and building new robust and repeatable processes to maximise the effectiveness of this change are all critical. I've seen both methodologies succeed, and often one will merge into the other as teams adapt their process to fit their internal governance while staying flexible enough to continuously re-evaluate and re-adjust when needed. “Agile with a small ‘a’” is a phrase sometimes used for this, and I like the approach. Agile shouldn't be rigid. Cherry-picking relevant and appropriate elements of Agile, Waterfall, ITIL (Information Technology Infrastructure Library) and any other framework you wish, to build a model that best fits your organisation, is in my opinion the best way to successfully and continuously improve and evolve.

Whichever path you choose, it is always important to define, agree and prioritise your requirements and use cases for this solution. It is common to see both ends of the scale when considering these requirements; the secret is finding the right balance for your organisation.

Large enterprise organisations will still, in nearly all cases, develop large, complex solutions because of their security restrictions, governance processes and larger user base. Smaller organisations tend to concentrate on smaller, well-defined use cases, often with reduced emphasis on security, less governance and a very small user base.

Budgets matter in all cases. Whether you are an enterprise with a multi-million pound budget or a startup with limited funds, demonstrating value early, along any delivery path, is critical and is often the trigger for further investment.

In the second part of this series, I will set out specific focus areas that will help you and your organisation prepare, and ensure you build strong foundations on which to start your big data journey.

Read: Defining Your Requirements - Part 2