
2.3 Tasks

2.3.1 Develop Data Analysis Plan


The data analysis plan may be formal or informal. The objective is to ensure sufficient time to plan the data analysis activities required for the initiative.

When developing the data analysis plan, the analyst determines:

  • which mathematical or statistical techniques the data scientist plans to use,
  • which statistical and algorithmic models are expected to be used (such as regression, logistic regression, decision trees/random forests, support vector machines, and neural nets),
  • which data sources will be used and how data will be linked or joined, and
  • how data will be preprocessed and cleaned.
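The elements above can be captured in a lightweight, structured form even when the plan is informal. The sketch below is purely illustrative; the Guide does not prescribe a format, and the field names and values are invented for this example.

```python
# Hypothetical structure for an informal data analysis plan.
# All field names and values are illustrative only.
data_analysis_plan = {
    "research_question": "Which customers are most likely to churn next quarter?",
    "techniques": ["descriptive statistics", "correlation analysis"],
    "candidate_models": ["logistic regression", "decision tree", "random forest"],
    "data_sources": {
        "crm_accounts": "joined to billing_history on account_id",
        "support_tickets": "joined to crm_accounts on customer_id",
    },
    "preprocessing": [
        "remove duplicate records",
        "impute missing values",
        "normalize numeric features",
    ],
}
```

A structure like this gives the business analysis professional and the data scientist a shared artifact to review and refine together.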
The business analysis professional provides insights into the plan or may draft the initial plan for review by the data scientist. The data scientist, who possesses the deep technical expertise, decides how the data analysis will be conducted. The analyst applies analysis skills by ensuring the data scientist has sufficient information about the business domain to develop an effective approach to data analysis. Analysts understand the mathematical techniques and algorithmic models in enough detail to explain the analysis approach to business stakeholders, including why a particular model may be chosen for a given research question.

If the data analysis plan is formally documented, analysts use templates to ensure consistency and guide planning decisions. Analysts use metrics and key performance indicators to assist the data scientist in determining if the outcomes from data analysis are producing the results required to address the business need. Organizational knowledge helps business analysis professionals provide the context for the data scientist's work.

Planning Business Data Analytics Approach at Various Stages

Analysts may not require a rigorous understanding of the various algorithmic models used in predictive analytics exercises, but it is helpful to understand them at a high level. A foundational understanding of these models helps analysts describe to stakeholders which models are being considered, and why.

A limited sample of different models is presented below with some of their advantages and disadvantages.

Ordinary Least Squares Regression
  Description: A linear regression model. A linear relationship is established between the predictor variables and the dependent variable by minimizing the squared errors between observed values and predictions.
  Advantages:
    • Used extensively
    • Easy to understand and explain
  Disadvantages:
    • May perform poorly due to its simple construct

ARIMA (Auto-Regressive Integrated Moving Average)
  Description: Primarily used for time-series data analysis, for example, stock movements based on moving averages and data trends.
  Advantages:
    • Can handle time-series data with trends
  Disadvantages:
    • Slowly being phased out by more accurate algorithms

Decision Trees
  Description: Variables are iteratively chosen to separate the predictions into buckets containing the maximum number of observations.
  Advantages:
    • Easy to understand and visualize
    • Decision rules can be extracted
  Disadvantages:
    • May have generalization errors (may perform poorly if future data differs significantly from the training data)

Random Forest
  Description: Combines many shallow decision trees and aggregates their results through voting.
  Advantages:
    • Works in most cases with high accuracy
  Disadvantages:
    • The result is complex to explain
    • Too general-purpose

Logistic Regression
  Description: Maximizes the probability difference between different classes.
  Advantages:
    • Well suited to binary classification
  Disadvantages:
    • Can have a high bias towards model assumptions
    • Requires preprocessing and normalization of data

KNN (K-Nearest Neighbors)
  Description: Classifies new data based on its distance to the nearest existing data points.
  Advantages:
    • General-purpose algorithm
  Disadvantages:
    • Too many modelling assumptions
    • Fails in higher dimensions

Naïve Bayes (NB)
  Description: Computes conditional probabilities from the data and predicts the outcome.
  Advantages:
    • Works well for text processing
  Disadvantages:
    • Other algorithms often outperform NB
    • The conditional independence assumption affects the posterior probability estimates

SVM (Support Vector Machine)
  Description: Maximizes the margin between two disparate classes of data.
  Advantages:
    • Good performance for image and video use cases
  Disadvantages:
    • Requires specific hyperparameter tuning expertise

Perceptron
  Description: Makes fewer model assumptions and is a building block for neural networks and deep learning.
  Advantages:
    • Easy to understand
    • Can be chained together in a neural network (NN) to produce accurate predictions
  Disadvantages:
    • Extremely complicated when used in neural networks
    • Low performance outside a neural network
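
For analysts who want a concrete feel for how several of these models are applied and compared, the sketch below uses scikit-learn (an assumption; the Guide does not prescribe any tooling) to run a like-for-like cross-validated comparison of the classification models from the list above on a synthetic dataset. The regression and time-series models (Ordinary Least Squares, ARIMA) would need a different setup and are omitted here.

```python
# Minimal sketch: compare several classification models on synthetic data.
# Assumes scikit-learn is available; the dataset stands in for real project data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic data: 1,000 observations, 20 predictor variables, binary outcome.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Scaling is included for the models that are sensitive to feature scale
# (logistic regression, KNN, SVM, perceptron); see the preprocessing and
# normalization notes above.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Perceptron": make_pipeline(StandardScaler(), Perceptron()),
}

# 5-fold cross-validated accuracy gives a rough, like-for-like comparison.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```

A comparison of this kind can support the conversation with business stakeholders about why a particular model was preferred, for example, a decision tree chosen over a random forest because its decision rules can be extracted and explained.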