2.3 Tasks

2.3.2 Prepare Data

Guide to Business Data Analytics

Preparing data involves obtaining access to the planned data sources and establishing the relationships and linkages between sources in order to create a coherent dataset. Data scientists identify how different datasets are related, consider whether the data can be linked in theory, and decide whether the linkage is feasible in practice.

Preparing data includes understanding the relationships that exist between data. For example, do two tables have a 0 to 1, 1 to 1, or 1 to many relationship? Preparing data also involves establishing the joins or linkages between sources, normalizing data to reduce redundancy, and standardizing, scaling, and converting data. Sometimes the data collected is uninterpretable and must be transformed to add value to the analytics effort. Data cleansing is the process by which data is transformed to correct or remove bad data.
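As an illustration of checking a relationship before joining, the following minimal sketch classifies how two tables relate on a shared key. The tables (`customers`, `orders`), the key name `customer_id`, and the helper names are hypothetical and are not taken from the guide.

```python
from collections import Counter


def max_matches(keys):
    """Return the largest number of times any single key value occurs."""
    counts = Counter(keys)
    return max(counts.values()) if counts else 0


def describe_relationship(left_rows, right_rows, key):
    """Classify the left-to-right relationship on a shared key column."""
    left_keys = [row[key] for row in left_rows]
    right_keys = [row[key] for row in right_rows]
    left_side = "1" if max_matches(left_keys) <= 1 else "many"
    right_side = "1" if max_matches(right_keys) <= 1 else "many"
    # Left keys with no partner on the right mean the right side is optional (0 allowed).
    unmatched = sorted(set(left_keys) - set(right_keys))
    return f"{left_side} to {right_side}", unmatched


# Hypothetical sample tables, represented as lists of dictionaries.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chi"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]

relationship, customers_without_orders = describe_relationship(
    customers, orders, "customer_id"
)
print(relationship, customers_without_orders)
```

Here each customer appears once and a customer may have several orders, so the relationship is 1 to many, and the unmatched list shows which customers would drop out of an inner join.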

Data preprocessing, scaling, normalization, imputation, and cleansing are some of the common terms used in analytics.
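Two of these terms can be made concrete with a short sketch. It assumes "scaling" means min-max rescaling to the range 0 to 1 and "standardization" means converting values to z-scores; these are common readings of the terms, not definitions given in the guide, and the `incomes` values are made up.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical variable on a large numeric scale.
incomes = [30_000, 45_000, 60_000, 90_000]

scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
print(scaled)
print(z_scores)
```

Both transformations preserve the ordering of the values; they differ in whether the result is bounded (min-max) or centered on zero with unit spread (z-scores).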
 
Data scientists identify the rules for consolidating data, perform the consolidation, and then validate the results to confirm that the business rules are being adhered to. Any mechanisms data scientists build to automate the data acquisition or preparation processes can be repurposed for use by other analytics teams.
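The consolidate-then-validate sequence can be sketched as follows. The two sources (`crm`, `billing`), the rule that a non-empty billing value overrides the CRM value, and the two business rules are all assumptions chosen for illustration.

```python
def consolidate(crm_rows, billing_rows):
    """Merge two sources on customer_id into one record per customer."""
    merged = {row["customer_id"]: dict(row) for row in crm_rows}
    for row in billing_rows:
        target = merged.setdefault(
            row["customer_id"], {"customer_id": row["customer_id"]}
        )
        for field, value in row.items():
            # Assumed consolidation rule: a non-empty billing value wins.
            if value not in (None, ""):
                target[field] = value
    return list(merged.values())


def validate(rows, rules):
    """Return (customer_id, rule_name) for every rule a record breaks."""
    violations = []
    for row in rows:
        for name, check in rules.items():
            if not check(row):
                violations.append((row["customer_id"], name))
    return violations


# Hypothetical source extracts.
crm = [
    {"customer_id": 1, "email": "ana@example.com", "country": "CA"},
    {"customer_id": 2, "email": "", "country": "US"},
]
billing = [
    {"customer_id": 1, "email": "ana.billing@example.com", "balance": 120.0},
    {"customer_id": 3, "email": "chi@example.com", "balance": -5.0},
]

# Hypothetical business rules expressed as named checks.
business_rules = {
    "email_present": lambda r: bool(r.get("email")),
    "balance_not_negative": lambda r: r.get("balance", 0) >= 0,
}

consolidated = consolidate(crm, billing)
violations = validate(consolidated, business_rules)
print(violations)
```

Keeping the rules as named checks separate from the merge logic is one way to make the mechanism reusable by other teams, since the same validation step can run against a different consolidation.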

Data scientists leverage a host of techniques when preparing data. Weighting is one technique applied to data to correct bias. Sample weights can be applied to address unequal probabilities of selection in a sample, and survey weights can be applied to address bias in surveys. Data scientists use strong technical skills and knowledge of statistics when preparing data for use in an analytics initiative.
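One simple form of weighting is post-stratification, where each respondent is weighted by the ratio of their group's population share to its sample share. The sketch below assumes that approach; the guide does not prescribe a specific weighting method, and the survey responses and population shares are invented.

```python
from collections import Counter


def post_stratification_weights(sample_groups, population_shares):
    """Weight each group by population share divided by sample share."""
    n = len(sample_groups)
    sample_counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_mean(values, weights):
    """Return the mean of values, with each value counted by its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey in which urban respondents are over-represented.
responses = [("urban", 8), ("urban", 7), ("urban", 9), ("rural", 4)]
population_shares = {"urban": 0.5, "rural": 0.5}  # assumed known from a census

groups = [group for group, _ in responses]
scores = [score for _, score in responses]

group_weights = post_stratification_weights(groups, population_shares)
row_weights = [group_weights[group] for group in groups]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, row_weights)
print(unweighted, weighted)
```

The unweighted average is pulled toward the over-represented urban group; the weighted average treats the two groups as equal halves of the population.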

When preparing data, analysts provide the business context for the data, which may or may not differ from the statistical interpretation. For example, if there are missing data elements, a data scientist may choose to impute those elements with the mean or median value to keep the distribution of the variable intact. While this may be a sound approach from a statistical point of view, it may conflict with business rules, which the analyst may be able to highlight.
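The tension between statistical imputation and business rules can be sketched as median imputation guarded by a rule check. The rule used here (closed accounts legitimately have no monthly spend, so their gaps are not imputed) and the field names are hypothetical.

```python
from statistics import median


def impute_median(rows, field, is_imputable=lambda row: True):
    """Fill missing values with the observed median where the rule allows it."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = median(observed)
    imputed, needs_review = [], []
    for row in rows:
        new_row = dict(row)
        if row[field] is None:
            if is_imputable(row):
                new_row[field] = fill
                # Flag the value so later steps can tell it was not observed.
                new_row[field + "_imputed"] = True
            else:
                # Leave the gap and hand the record to the analyst.
                needs_review.append(row)
        imputed.append(new_row)
    return imputed, needs_review


# Hypothetical account data with two missing values.
accounts = [
    {"id": 1, "status": "open", "monthly_spend": 100.0},
    {"id": 2, "status": "open", "monthly_spend": None},
    {"id": 3, "status": "closed", "monthly_spend": None},
    {"id": 4, "status": "open", "monthly_spend": 300.0},
    {"id": 5, "status": "open", "monthly_spend": 140.0},
]

prepared, review = impute_median(
    accounts, "monthly_spend", lambda row: row["status"] == "open"
)
print(prepared[1], [row["id"] for row in review])
```

Flagging imputed values and returning the records that were not imputed gives the analyst a concrete list to take to stakeholders.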

Similarly, if a portion of the data has missing information, a data scientist may choose to ignore those observations and continue the analysis because the portion may be statistically insignificant. From a business standpoint, however, further investigation may be required to determine the course of analysis. These scenarios are best handled by analysts with facilitation, collaboration, and elicitation skills, who can supplement the information through stakeholder collaboration and investigation of the recording process.
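One way to decide whether missing observations deserve investigation is to compare the overall missing rate with the rate inside each business segment. The sketch below uses invented claims data and a hypothetical `region` segment; a small overall rate can hide a gap concentrated in one segment, which may point to a problem in the recording process.

```python
from collections import defaultdict


def missing_rate_by_segment(rows, field, segment_field):
    """Return the share of missing values of field within each segment."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        segment = row[segment_field]
        totals[segment] += 1
        if row[field] is None:
            missing[segment] += 1
    return {segment: missing[segment] / totals[segment] for segment in totals}


# Hypothetical claims: the north region is complete, the south region is not.
claims = [{"region": "north", "amount": 200.0} for _ in range(16)] + [
    {"region": "south", "amount": None},
    {"region": "south", "amount": None},
    {"region": "south", "amount": 190.0},
    {"region": "south", "amount": 205.0},
]

overall = sum(row["amount"] is None for row in claims) / len(claims)
rates = missing_rate_by_segment(claims, "amount", "region")
print(overall, rates)
```

In this made-up data only one record in ten is missing overall, yet half of the south region's records are affected, so dropping them would remove much of what is known about that region.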