2.2 Tasks

2.2.2 Determine the Data Sets

Guide to Business Data Analytics

Determining data sets involves reviewing the data expected from the data sources and establishing specifics such as data types, data dimensions, sample size, and relationships between data elements. It also involves deciding which data sets need to be collected in full and which only in part, for example, whether to use an entire spreadsheet or only specific rows within it. When the required data is not available, determining data sets also involves identifying data gaps. Data gaps occur when data does not exist or is missing because of errors such as a failure in the data collection process.
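As an illustration of identifying data gaps, the following minimal sketch counts how many records lack a usable value for each field the analysis expects. The function name `find_data_gaps`, the field names, and the sample records are all hypothetical and are not taken from the guide.

```python
from collections import Counter


def find_data_gaps(rows, expected_fields):
    """Count, per expected field, how many rows lack a usable value."""
    gaps = Counter()
    for row in rows:
        for field in expected_fields:
            value = row.get(field)
            # Treat absent keys, None, and blank strings as missing.
            if value is None or str(value).strip() == "":
                gaps[field] += 1
    return {field: gaps[field] for field in expected_fields}


# Hypothetical sample records; the "region" field is absent from the source entirely.
rows = [
    {"customer_id": "C1", "order_total": "120.50"},
    {"customer_id": "C2", "order_total": ""},
    {"customer_id": "C3", "order_total": None},
]

gaps = find_data_gaps(rows, ["customer_id", "order_total", "region"])
print(gaps)
```

A field that is missing in every record (here, `region`) suggests data that does not exist in the source, while a field that is missing in only some records may point to a collection error.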

Analysts collate and assess data by establishing relationships between different data elements and identifying data linkages between data from various sources. They may use data discovery tools or database querying to assess data availability.
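One simple way to check a data linkage between two sources is to compare the key values they share. The sketch below assumes both sources expose a common key; the names `assess_linkage`, `crm`, `billing`, and `customer_id` are hypothetical examples, not part of the guide.

```python
def assess_linkage(source_a, source_b, key):
    """Compare the key values of two sources to see how well they link."""
    keys_a = {row[key] for row in source_a}
    keys_b = {row[key] for row in source_b}
    return {
        "linked": sorted(keys_a & keys_b),     # present in both sources
        "only_in_a": sorted(keys_a - keys_b),  # no match in source B
        "only_in_b": sorted(keys_b - keys_a),  # no match in source A
    }


# Hypothetical extracts from two systems that share a customer identifier.
crm = [{"customer_id": "C1"}, {"customer_id": "C2"}, {"customer_id": "C3"}]
billing = [{"customer_id": "C2"}, {"customer_id": "C3"}, {"customer_id": "C4"}]

linkage = assess_linkage(crm, billing, "customer_id")
print(linkage)
```

A large share of unmatched keys on either side indicates that the two sources may not link reliably without further cleansing or an alternative key.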

A five Vs assessment (volume, velocity, variety, veracity, value) helps determine which data sets to consider:

  • Volume: the amount of data being produced and the size of the data sets that need to be processed.
  • Velocity: the speed at which data is generated and the frequency with which it needs to be collected and processed.
  • Variety: the range of data sources, formats, and types that need to be processed.
  • Veracity: the trustworthiness of the data, including any uncertainties and inconsistencies it contains.
  • Value: the need for any analytics exercise to be driven by real, valuable business goals.

Non-functional requirements and existing service level agreements may constrain the availability of data. For example, privacy or security considerations may render a data set unfit for use.
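The five Vs assessment described above can be recorded in a simple structure so that candidate data sets are compared on the same dimensions. The sketch below is one possible approach, not a method prescribed by the guide: the 1 to 5 rating scale (higher meaning more favorable), the equal weighting, and the data set names are all assumptions made for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class FiveVs:
    """Hypothetical ratings from 1 (least favorable) to 5 (most favorable)."""
    volume: int
    velocity: int
    variety: int
    veracity: int
    value: int

    def total(self):
        # Equal weighting is an assumption; a team may weight dimensions differently.
        return sum(astuple(self))


# Hypothetical candidate data sets and ratings.
candidates = {
    "web_logs": FiveVs(volume=2, velocity=2, variety=3, veracity=3, value=4),
    "crm_export": FiveVs(volume=4, velocity=4, variety=4, veracity=4, value=5),
}

ranked = sorted(candidates, key=lambda name: candidates[name].total(), reverse=True)
for name in ranked:
    print(name, candidates[name].total())
```

A ranking like this is only a starting point for discussion; constraints such as privacy or security can still rule out a data set regardless of its ratings.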

Analysts possess a firm understanding of the lexicon used by the different business units and are capable of drawing comparisons and relationships between different data sets that have the same meaning. Analysts also possess strong visualization skills and contribute to creating conceptual architectural diagrams that depict the data sources, data flows, and frequency of the data feeds. Such diagrams are essential when facilitating discussions about data sourcing with stakeholders and when seeking approvals.

Analysts support data scientists by analyzing the costs versus the benefits of different data sets. Ideally, the analytics team collects its own data from scratch to reduce external biases during data collection, but frequently there are not enough resources to do so. Analysts advise on the advantages and disadvantages of using different data sets from a cost, value, timing, risk, and feasibility perspective. This is especially important when the data needed for analytics must be acquired from an external third party. Certain research questions may need to be dropped when obtaining the data required to answer them is determined to be too expensive.

When determining data sets, analysts use a variety of techniques to help them work with and understand the data before building their analytical models. Data profiling is used to assess the content, structure, and quality of data. Data sampling is used when breaking a large source of data into a smaller, more manageable set of data. Sampling helps an analyst reduce the amount of data they have to work with as it provides a means to use a representative subset of the larger population. Skills such as creative thinking and conceptual thinking are useful when formulating ideas about which data to use. Business acumen helps the analyst determine which data sets may be best to use based on the current business situation.
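The two techniques named above, data profiling and data sampling, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `profile_column` and `simple_random_sample` are hypothetical helper names, the sample values are invented, and simple random sampling is only one of several sampling approaches. A random sample is representative only in expectation, so it should be checked against the full population where possible.

```python
import random


def profile_column(values):
    """Summarize the content, structure, and quality of one column."""
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        # Mixed types in one column often signal a quality problem.
        "types": sorted({type(v).__name__ for v in present}),
    }


def simple_random_sample(rows, size, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))


# Hypothetical column with missing and inconsistent entries.
ages = [34, 41, None, 34, "", "unknown", 29]
profile = profile_column(ages)
print(profile)

# Hypothetical population of 1,000 record identifiers.
population = list(range(1000))
sample = simple_random_sample(population, 50)
print(len(sample))
```

Fixing the seed makes the sample reproducible, which helps when the same subset needs to be reviewed again with stakeholders.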