A brief backdrop
Finding business insights through data is nothing new. Organisations routinely track metrics on revenue, profitability and risk to monitor operational efficiency or to support sales and marketing. The requirements engineering in such cases generally revolves around establishing the key metrics and determining useful reporting formats that give businesses timely diagnoses and cues for possible action.
But according to an International Data Corporation report, the volume of digital data currently doubles every two years and will reach 44 trillion gigabytes by 2020. The vast amount of data generated by the customer and business activities of the digital age outstrips prevailing data warehousing technologies. This has led organisations to employ technologies that collect and organise large data assets and make them accessible for leverage – popularly known as Big Data technologies.
Although Big Data technologies provide data leverage, the initiative remains half-baked as long as the business decisions continue to rely on intuitive measures, which are susceptible to biases and hidden motives, or can be relatively static in the face of changing conditions. This has led to the quickly emerging field of Data Science – an interdisciplinary field of scientific methods, processes and systems to extract knowledge or insights from structured or unstructured data. These insights provide a decision-making edge which is not available through traditional business intelligence techniques and reporting.
Opportunity and the skill divide
The application of data science can be seen everywhere, from search engines, email spam filters, product recommenders and inventory optimisation to medical diagnosis, stock markets, fraud detection and crime prevention. Organisations with a head start in employing Machine Learning capabilities (a subset of the Data Science field) are proving to be disruptive forces to well-established organisations.
Clearly, data science holds great promise for organisations looking to transform in the digital age. In fact, many organisations with data science initiatives focus on getting data scientists on board, assuming their presence alone will enable this transformation. But according to a recent McKinsey study, the results of many ongoing initiatives have been lacklustre. In spite of accessible capabilities, investments in analytical tools and the skillset brought in by data scientists, organisations continue to derive only a fraction of the potential value from these initiatives.
The biggest hurdles in extracting this value have been organisational challenges and attracting the right talent. This is not limited to the technically oriented role of Data Scientist; one of the key areas of talent shortage is data-savvy professionals who both understand data science concepts and can bring in domain context and business knowledge.
That is why, according to another McKinsey survey, attracting and retaining business users with analytics skills has become one of the primary talent acquisition concerns for global executives.
Such a skillset can bridge the gap between the work of analytical teams and a practical view of business objectives. Enter the Business Analyst.
Business Analysts already have a unique advantage in being able to partner with both the business and technical stakeholders. This remains true in data-centric ventures, where Business Analysts will increasingly need to partner with Data Scientists, Data Analysts and Business Stakeholders to achieve objectives. In addition, a Business Analyst can be a champion in propagating data-centred culture within the organisation.
Crossing the divide
A BA role in data science projects will continue to leverage the primary skill areas such as organisational and business knowledge, strong communication skills, and an analytical attitude to problem solving. At the same time, the BA will need to be familiar with data analysis and possess a data analytic mind-set. Specifically, the Business Analyst will need to build a broad knowledge base in five areas: Business, Statistics, Machine Learning, Programming, and Math. A good visualisation of these skills, in order of relative importance, for a business-facing role in a data environment has been published in the survey report below.
The skill versus role figure above indicates that business expertise will continue to be the dominant skill for business analysts even in data projects. But this will also need to be complemented with a fundamental awareness of statistics, analytics and data mining concepts. To begin with, I will introduce some of the key terminology and concepts below. Nevertheless, Data Science is a vast field, and I would highly recommend that interested readers look at books and online resources (see References) to build a better perspective.
Broadly speaking the data-driven decision capability of an organisation can be viewed in three distinct phases of maturity, with an increasing value proposition:
- Descriptive Analytics: What Happened? This phase mainly deals with tracking historic trends and measurement against KPIs. The focus is on operational reporting for benchmarking decision making. Traditional reports and dashboards are a good example of this. For example, creating and tracking the performance of a marketing campaign among customer segments.
- Predictive Analytics: What could happen? This phase is focused on using historic data records and other data assets to produce models for future predictions. This is where Machine Learning and Big Data technologies can really come into play for sophisticated analysis and recommendations for decision making. For example, in the marketing sector, predicting customer value can help attain the maximum conversion ratio and get the best return on marketing investment.
- Prescriptive Analytics: What should we do? This phase needs a sophisticated data-driven learning environment where intelligent systems can apply optimisation and simulation to determine the best action, and execute workflows without human intervention. For example, optimising the mix of paid advertising across social media channels to maximise revenue.
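To make the three questions concrete, here is a minimal Python sketch applied to a marketing campaign. The segment names and figures are made up for illustration, and the "predictive" step is a deliberately naive stand-in (past rates carried forward) for a real predictive model.

```python
# Hypothetical campaign history: (segment, emails_sent, conversions).
history = [
    ("students", 1000, 30),
    ("families", 1000, 80),
    ("retirees", 1000, 50),
]

# Descriptive: what happened? Conversion rate per segment.
rates = {segment: conversions / sent for segment, sent, conversions in history}

# Predictive: what could happen? Naive assumption that past rates carry over
# to the audience sizes planned for the next campaign.
planned_audience = {"students": 2000, "families": 500, "retirees": 1000}
forecast = {segment: rates[segment] * size for segment, size in planned_audience.items()}

# Prescriptive: what should we do? Direct any extra emails to the segment
# with the highest expected conversion rate.
best_segment = max(rates, key=rates.get)

print(rates)
print(forecast)
print(best_segment)
```

Each step builds on the previous one, which is why the maturity of data collection matters: a prescriptive recommendation is only as good as the description and prediction beneath it.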
What distinguishes these phases is the data-driven thinking in the organisational culture, the data collection maturity, and sophistication in application of predictive and prescriptive modeling techniques. These techniques come in the form of supervised learning, unsupervised learning and reinforcement learning methods.
In practice, not a lot of organisations possess the in-house computing and technical capability and skilled resources to embark on such transformative initiatives themselves. Fortunately in the age of cloud services, accessing computing power for scale is no longer an issue, and a number of online machine learning services are also available to employ predictive or prescriptive models. However, even with these off-the-shelf services, the Business Analyst needs to be aware of the different learning methods, and the context of their application, in order to ensure development of an effective modeling solution.
About a year ago, I worked on a pilot initiative to add personalisation based on predictive analytics. In essence, the project was to drive the ranking of results for customer queries by modeling customer and commercial interests. We also relied on an online Machine Learning service to fill gaps in our modeling capabilities. As these were our early inroads into data science, the project took a proven risk-based, time-boxed approach to delivery, with planned phases and goals including eliciting the solution requirements. As the Business Analyst, I captured the business and input data requirements, defined the solution scope and helped evaluate the solution using pre-defined guidelines.
However, this approach only resulted in slipping deadlines and a solution that was ineffective at scale, and was ultimately discarded. One of the key issues was that with an end product delivery in mind we did not spend time to investigate data sufficiently and understand where the value actually lies. In addition, our approach of subjecting the project to a typical software development lifecycle methodology was not suitable to a data-driven initiative. When confronted with timelines, there is often a temptation to bypass the right methodology and jump into solutions. But this actually hindered our best intentions to solve the problem.
Data science projects are by their nature exploratory, as the embedded patterns within data are not foreseeable. There are times when even experienced analytical teams would not know if current data can meet business objectives in a desired way, making it difficult to trace data-related requirements to specific objectives. This does not mean that initial business objectives are not relevant; only that business objectives may need to be broken down while exploring data for a better view. Data science methods, when applied properly to discovery and analysis, may even lead to identifying bigger initiatives in new directions for the business stakeholders.
That is why a key lesson for our team was to work using an iterative approach conducive to these contrasts; one based on constant tuning and flexibility. I found this perspective acknowledged in a widely recognised methodology called CRISP-DM (Cross-Industry Standard Process for Data Mining) – an industry-standard framework for conducting data mining projects, developed in the late 1990s by an industry consortium that included SPSS (now part of IBM). This methodology gives a proven framework which can be used in different aspects of analytical initiatives.
We will see more on this framework below, and how it gives Business Analysts a structured way of thinking about analytical problems, even as the underlying requirement management principles remain the same.
The CRISP-DM Framework
The CRISP-DM framework defines six major phases: 1) Business Understanding, 2) Data Understanding, 3) Data Preparation, 4) Modeling, 5) Evaluation and 6) Deployment.
Understanding this framework can help put all analytical initiatives into perspective and bring out areas where creativity is important, as opposed to areas where analytical tools can be leveraged. In the sections below I describe the six phases and how Business Analysts can participate in each effectively.
Phase 1: Business Understanding
Prior to considering a data science solution to a business problem, it is imperative to understand the business context and the goals the business has in mind when looking for solutions. Business problems seldom come with a clear formulation that can be directly associated with analytical approaches.
For example, the design of an analytical model for fraud detection can vary greatly according to whether it is in banking, where the costs of fraud can be high, or in healthcare, where small billing deceptions are quite common and customers are often the initiators of fraud rather than its victims. There cannot be a one-size-fits-all solution to fraud detection and prevention. Breaking down a business problem needs a good understanding of the business, and the assumptions and constraints under which it operates. At this stage a Business Analyst’s creativity, common sense and communication skills play a big role in developing a unique problem formulation. The BA can actively engage stakeholders and analytical teams to:
- Think about business problem decomposition and use cases carefully – Breaking down objectives leads to structured discussion where priorities can be identified and the problem resolution planned
- Bring out understated perspectives from all key business stakeholders – Depending on the problem, different stakeholders will have to be engaged to determine relevant requirements and clarify questions
- Structure problems into sub-tasks to help build a solution based on various data mining techniques
All things considered, getting a clear understanding of business requirements is the necessary first step as this allows determining which data will be used to answer the question and, ultimately, directs the analytical approach needed.
Phase 2: Data Understanding
In the data understanding phase all available data assets are looked at and their value assessed in relation to the given problem. Historically collected data can be unrelated to the current business problem. But a BA can use their domain knowledge and investigative skills to study data sources and determine the most representative data in the context of the problem to be solved.
Techniques such as descriptive statistics and visualisation can be applied to the data set to assess the content and quality of the data and to provide initial insights into it. This process will be easier if Business Analysts are familiar with a data querying language such as SQL, and understand common descriptive measures of statistical dispersion such as standard deviation and range.
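As a minimal sketch of the descriptive measures mentioned above, the Python snippet below summarises a small list of order values using only the standard library. The figures and the `summarise` helper are illustrative assumptions, not data from a real project.

```python
import statistics


def summarise(values):
    """Return basic measures of centre and dispersion for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical order values; the last one is an obvious outlier worth querying.
order_values = [20, 22, 19, 21, 23, 180]

summary = summarise(order_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean sits far above the median and the standard deviation is larger than both, which is a cue to ask whether the large value is a data error or a genuinely different kind of customer.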
In a number of cases, a deeper understanding of current data can also merit additional investments in the data sources to design the right analytical models. This is where a BA can partner with Project or Product Managers to estimate the costs and benefits of each data source and do a Build versus Buy analysis, in order to develop a business case and get sponsorship.
Eliciting correct data requirements can often go hand in hand with the business understanding phase, as a better understanding of business context can lead to identifying areas where more information can be leveraged. This new information can then also be fed back into breaking down the business problem and digging beneath its surface. Also, since the requirements are dependent on data, documenting the strengths and limitations of data is a critical activity in this phase.
Phase 3: Data Preparation
Even though a tremendous amount of data is available to organisations in the digital age, it is rarely rich enough to be used straightaway for the specific business problem at hand. In a lot of instances the captured or stored data isn’t reliable, is scattered among different business systems, or may not be accessible at all. The data preparation phase is a critical step to format the data in ways which make it usable in the later phases. This could take shape in the form of cleaning data sources for duplicate or out of date records, and aggregating data to create new records or values.
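The cleaning step described above can be sketched as follows: given customer records that may be duplicated or out of date, keep only the most recent record per customer. The record layout and the `latest_per_customer` helper are hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical customer records pulled from two business systems.
records = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2017, 1, 10)},
    {"customer_id": 1, "email": "a.new@example.com", "updated": date(2017, 6, 2)},
    {"customer_id": 2, "email": "b@example.com", "updated": date(2016, 11, 30)},
    {"customer_id": 3, "email": "c@example.com", "updated": date(2017, 3, 15)},
    {"customer_id": 3, "email": "c@example.com", "updated": date(2017, 3, 15)},
]


def latest_per_customer(rows):
    """Keep one record per customer: the most recently updated one."""
    latest = {}
    for row in rows:
        current = latest.get(row["customer_id"])
        if current is None or row["updated"] > current["updated"]:
            latest[row["customer_id"]] = row
    return sorted(latest.values(), key=lambda r: r["customer_id"])


clean_records = latest_per_customer(records)
for record in clean_records:
    print(record["customer_id"], record["email"], record["updated"])
```

In practice the rule for which record "wins" is itself a business decision, and it is the kind of requirement a Business Analyst is well placed to clarify with stakeholders.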
The data-savvy BA’s understanding of data sources and the learning from the previous two phases can be useful at this stage. An important aspect of this activity is Feature Engineering, which involves identifying useful ‘features’ or data attributes that need to be leveraged for predicting outcomes in analytical models. For example, instead of looking at each customer purchase, it may prove more valuable to use the average value of a customer’s purchases, or the percentage of their transactions made by credit card, when profiling a customer for better engagement.
Feature engineering could be seen as a part of the requirement engineering activity itself, and in reality is a part of the Business Analyst’s focus from the very beginning i.e. the Business Understanding phase. This activity could be an iterative process as more value is discovered with better addition and aggregation of data, and could be revisited once the value becomes clearer in the modeling phase.
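The two example features mentioned above, average purchase value and the share of transactions made by credit card, can be derived from raw transactions in a few lines. This is a sketch under assumed data: the transaction layout, the payment method labels and the `customer_features` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, amount, payment_method).
transactions = [
    ("c1", 40.0, "credit_card"),
    ("c1", 60.0, "debit_card"),
    ("c1", 50.0, "credit_card"),
    ("c2", 10.0, "cash"),
    ("c2", 30.0, "credit_card"),
]


def customer_features(rows):
    """Aggregate raw transactions into one feature row per customer."""
    grouped = defaultdict(list)
    for customer_id, amount, method in rows:
        grouped[customer_id].append((amount, method))

    features = {}
    for customer_id, items in grouped.items():
        amounts = [amount for amount, _ in items]
        card_count = sum(1 for _, method in items if method == "credit_card")
        features[customer_id] = {
            "avg_purchase": sum(amounts) / len(amounts),
            "credit_card_share": card_count / len(items),
        }
    return features


features = customer_features(transactions)
for customer_id, row in features.items():
    print(customer_id, row)
```

The code is the easy part; deciding which aggregates are meaningful for the business problem is where domain knowledge comes in.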
Phase 4: Modeling
The Modeling phase is where the collected data, the identified features and the broken-down business objectives are brought together. At this stage, models suited to the data mining tasks are selected and applied, and their parameters are calibrated to achieve optimal solutions to the business objectives being addressed. There are several machine learning algorithms available for uncovering patterns and generating predictive models.
Generally, there is no single modelling task, and optimised solutions often need a combination of machine learning approaches. This activity in itself needs deep understanding of the underlying algorithms and their individual merits, and is therefore assigned to the role of data scientist.
However, there are two sides to data mining: building models to find patterns, and using the results of data mining. The latter is where a Business Analyst’s understanding of the various data mining approaches, including a general understanding of common learning algorithms, can prove to be of tremendous value.
For example, a predictive model built to maximise marketing outreach may suggest the best customers to provide with an offer based on maximised chance of response. However if our marketing budget is limited, we might be able to reach only a subset of customers. In such a situation, the astute business analyst may suggest a ranking model on top of predicted results to rank the customers in terms of potential revenue versus marketing costs, and to maximise the benefit and reduce the cost of reaching out. The wider understanding of business rules, objectives and constraints can result in creative evaluation by Business Analysts to form an intelligent approach that yields the best business value.
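One way to picture the ranking idea above is a simple expected-profit calculation layered on top of the model's predicted response probabilities. The sketch below is only one possible formulation: the customer figures, the flat cost per contact and the `rank_by_expected_profit` helper are assumptions made for illustration.

```python
def rank_by_expected_profit(customers, cost_per_contact, budget):
    """Rank customers by expected profit and keep as many as the budget allows.

    Expected profit = P(response) * revenue if they respond - cost of contact.
    """
    scored = []
    for customer in customers:
        expected_profit = (
            customer["p_response"] * customer["revenue"] - cost_per_contact
        )
        scored.append({**customer, "expected_profit": expected_profit})

    # Highest expected profit first; contacts expected to lose money are dropped.
    ranked = sorted(
        (c for c in scored if c["expected_profit"] > 0),
        key=lambda c: c["expected_profit"],
        reverse=True,
    )
    max_contacts = int(budget // cost_per_contact)
    return ranked[:max_contacts]


# Hypothetical model output: response probability and revenue if they respond.
customers = [
    {"id": "A", "p_response": 0.80, "revenue": 10.0},   # likely, low value
    {"id": "B", "p_response": 0.30, "revenue": 100.0},  # less likely, high value
    {"id": "C", "p_response": 0.50, "revenue": 40.0},
    {"id": "D", "p_response": 0.05, "revenue": 20.0},   # not worth contacting
]

selected = rank_by_expected_profit(customers, cost_per_contact=2.0, budget=4.0)
print([c["id"] for c in selected])
```

With a budget for only two contacts, ranking on response probability alone would pick customers A and C, whereas ranking on expected profit picks B and C. The difference comes entirely from bringing business constraints into the evaluation.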
Phase 5: Evaluation
The evaluation phase is the verification stage of the process. Solution evaluation is already a key BA competency, and in the context of our framework there are a number of ways in which the BA can be involved, such as: a) evaluating the results of the data mining activity with stakeholders, b) validating them against the business objectives, and c) reviewing the process and looking at ways of improving it in future iterations. Communication skills are key in this phase to ensure that business stakeholders understand the results of the mining and modeling activity, and a BA familiar with data visualisation is well positioned to conduct this activity.
Evaluating data mining results is also where knowledge of applied statistics comes into play. Whereas traditional requirements analysis looks for exact conformance to expected results, the results of modeling assignments need to be looked at probabilistically. A good example is again the fraud detection domain, where an analytical model might predict fraud with 90% accuracy. However, this is not nearly as promising as it sounds. If in real-world cases only 10% of customers are likely to be involved in a fraudulent transaction, a model that simply labels every customer as legitimate would also be right 90% of the time. A model with 90% accuracy may therefore just be confirming an existing trend without providing any predictive advantage! That is why knowledge of statistics can help avoid a lot of pitfalls.
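The arithmetic behind this pitfall can be sketched in a few lines of Python. The sample below is made up (100 transactions, 10 of them fraudulent), and recall is brought in here as one common complementary measure; it is an addition for illustration rather than something prescribed by the framework.

```python
def accuracy(actual, predicted):
    """Share of cases where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted):
    """Share of actual fraud cases (label 1) that were flagged as fraud."""
    caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return caught / sum(actual)


# Hypothetical sample: 100 transactions, 10 of them fraudulent (label 1).
actual = [1] * 10 + [0] * 90

# A "model" that never flags fraud at all.
always_legitimate = [0] * 100

baseline_accuracy = accuracy(actual, always_legitimate)
baseline_recall = recall(actual, always_legitimate)
print(baseline_accuracy, baseline_recall)
```

The do-nothing baseline scores 90% on accuracy while catching none of the fraud, so any candidate model has to be judged against that baseline rather than against zero.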
Business Analysts can add further value by:
- Developing a comprehensive evaluation framework to measure the performance of the deployed model in production
- Obtaining ‘sign off’ from the various business stakeholders once they are satisfied with the quality of the model’s decisions
- Documenting additional findings from the evaluation, which can provide useful information for new business directions
Phase 6: Deployment
In the deployment phase, the results of data mining techniques are made available to the end user. This involves deploying systems that can automatically run the models and access data through ‘pipelines’ established by IT processes. These efforts require background IT processes to be established and the solution to be handed over to the IT team, an area which the Business Analyst can actively support by developing resources and facilitating understanding of the solution.
Often these solutions are rolled out with A/B testing, which is a way to roll out variations of a solution to a controlled audience and judge their effectiveness in production. Business Analysts can develop various hypotheses with stakeholders and later test which variant produces the best results, such as highest customer adoption or maximum profit margin, and capture future steps for improvement.
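As an illustration of how such a comparison might be judged, the sketch below applies a standard two-proportion z-test to the conversion counts of two variants. The counts are made up, and the choice of this particular test is an assumption; real experiments may call for other methods depending on the metric and the sample size.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical results: variant A (current solution) vs variant B (new model).
rate_a, rate_b, z, p_value = two_proportion_z_test(
    conv_a=100, n_a=1000, conv_b=150, n_b=1000
)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be down to chance alone. Whether the difference is large enough to matter commercially remains a business judgement.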
Additionally, the deployment of any new solution, especially an analytical initiative touching multiple business areas, needs to go through a proper change management process. An experienced Business Analyst can participate in this step by driving adoption of new analytical tools and processes. If the initiative has potential to reach out to wider business areas in the future, the Business Analyst can be involved in capturing a change strategy with best practices and associated risks.
Closing Words on CRISP-DM
CRISP-DM is only a guiding methodology, and it is effective only as long as all participants pursue data mining as a collaborative and iterative effort. Creating a model is not the end of the project. Once a model is rolled out in production, the results often provide leverage to move towards a better model. A curious Business Analyst can keep asking relevant questions regarding the model’s performance in production and share ideas for improvements.
Further perspectives to analytical requirements
Most of this article has focused on why Business Analysts are needed in data science projects and what traditional Business Analysts can bring to them. Since the participation of Business Analysts is relatively new to this field, I mention below three other perspectives that BAs can bring when starting with requirements:
- Looking at existing solutions – One of the key philosophies of engineering is to try to see if a new problem can be broken down and resolved with existing solutions, and analytical engineering is no different. Successful implementations in other functions within or outside the organisation can provide great sources of learning, and some due diligence by business analysts can help in generating successful hypotheses. Furthermore, if the application of analytical solutions is widespread, it is better to focus on investigating only a subset of solutions to start with.
- Customer Experience – Customer experience can be understated in a data-focussed environment. However the importance of usability must never be ignored, as this could prove detrimental to customer adoption of the solution. The Business Analyst needs to ensure that requirements cover user interface aspects, and utilise standard techniques like prototyping or simulation tools. Additionally an intrusive display of personalised insights could make customers nervous about their privacy being compromised.
- Data Governance and Regulatory Considerations – Data science provides a powerful means to gain customer insights, and utilises customer and business data for personalisation. However, this must not become a way of compromising customer privacy and breaking the implicit trust the customer develops when interacting with a brand. This may not only prove detrimental to the brand, but can also lead to legal scrutiny. A business analyst must ensure that data requirements do not reveal sensitive information about the customer or the organisation itself. Mandatory data compliance regulations, such as the GDPR rules in the UK, provide a good reference for taking a view of these considerations.
In summary, the capabilities of data science provide tremendous new opportunities to businesses and are playing an increasing role as a strategic asset. Initiatives in this area require an understanding of the business and the unique data assets available in the context of the business strategy, as described in the methodology above. This methodology involves coordinating complex technical activities, such as creating an assembly of predictive modeling techniques over business data, where the role of a Business Analyst can be key. An experienced Business Analyst with data skills will be in a position to ask the right questions in such initiatives and can set the organisation up for success in the data economy.
References
- IDC study summary: uk.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
- McKinsey study: mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/The%20age%20of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.ashx
- McKinsey survey report: mckinsey.com/business-functions/digital-mckinsey/our-insights/the-need-to-lead-in-data-and-analytics
- Analyzing the Analyzers – Harlan Harris, Data Community DC
- kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
- General reading: Data Science for Business – Foster Provost & Tom Fawcett