2.2 Tasks

2.2.3 Collect Data

Guide to Business Data Analytics

Collecting data involves the activities performed to support data professionals with data setup, preparation, and collection. How involved analysts are in data collection depends on how the organization structures its analytics team, and on the analysts' technical abilities.

In a broad sense, there are two approaches to data collection:

  • Passive Data Collection: unobtrusive data collection from users in their day-to-day transactions with the organization. This data is generated without an analytics objective in mind, and a large portion of it may already exist within the organization. Examples include point-of-sale data and browser, web, and mobile data. This type of data is often curated or transformed before it can be used to answer research questions.
  • Active Data Collection: actively seeking information from stakeholders for a specific goal. This type of data, such as surveys and self-reports, is not readily available within the organization. Analysts play a significant role in structuring the data collection initiative and applying best practices to its design. For example, when designing a survey, the analyst may apply best practices on how to formulate open- or closed-ended questions, whether to use a rating scale such as the Likert scale or paired comparisons, how many questions to ask, and how the questions should flow.
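To make the idea of a closed-ended question on a rating scale concrete, the sketch below defines a question answered on a five-point Likert scale and codes each answer as a number. It is a minimal illustration, not a prescribed survey format: the `ClosedQuestion` class, the `LIKERT_5` labels, and the 1-to-5 coding are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Hypothetical five-point Likert scale; the labels and the 1-5 coding are an
# illustrative assumption, not a standard required by this guide.
LIKERT_5 = (
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
)


@dataclass(frozen=True)
class ClosedQuestion:
    """A closed-ended survey question answered on a fixed rating scale."""
    text: str
    scale: tuple = LIKERT_5

    def encode(self, answer: str) -> int:
        """Map a response label to its 1-based position on the scale.

        Answers that are not on the scale raise ValueError, which helps
        surface data-entry problems while piloting the survey.
        """
        try:
            return self.scale.index(answer) + 1
        except ValueError:
            raise ValueError(f"{answer!r} is not an option for: {self.text}") from None


q = ClosedQuestion("The checkout process was easy to complete.")
print(q.encode("Agree"))              # 4
print(q.encode("Strongly disagree"))  # 1
```

Restricting answers to a fixed list is what makes the question closed-ended. An open-ended question would instead store free text that has to be coded later.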
Before data professionals begin collecting large amounts of data, it may be necessary to test the data collection approach by using a small number of observations. If the data collection method is a survey, this task might involve piloting the survey with a small population of participants before performing the survey with the larger population. When collecting data, analysts:
  • determine whether the data will originate from different sources,
  • identify where the data is going to be collected from (for example, a database, a spreadsheet, or other sources), and
  • understand where the data comes from, what transformations are applied to it, and where it is finally stored, in order to assess data quality. This is referred to as data lineage.
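One way to record data lineage is to capture, for each data element, its source, the ordered transformations applied to it, and its final destination. The sketch below does this with a small data class. The `LineageRecord` class, its fields, and the example source and step names are hypothetical and serve only as illustration. Real lineage is often tracked in dedicated metadata or catalog tools.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical record tracing one data element from source to storage."""
    element: str                 # the data element being traced
    source: str                  # where the data comes from
    destination: str = ""        # where the data is finally stored
    transformations: list = field(default_factory=list)  # steps applied, in order

    def add_step(self, description: str) -> "LineageRecord":
        """Append a transformation step and return self so calls can be chained."""
        self.transformations.append(description)
        return self

    def describe(self) -> str:
        """Render the path from source through each step to the destination."""
        path = " -> ".join([self.source, *self.transformations, self.destination])
        return f"{self.element}: {path}"


record = (
    LineageRecord("order_date", source="point-of-sale export", destination="analytics database")
    .add_step("trim whitespace")
    .add_step("convert to ISO 8601 date")
)
print(record.describe())
# order_date: point-of-sale export -> trim whitespace -> convert to ISO 8601 date -> analytics database
```

Keeping the steps in order matters for assessing data quality. If a value looks wrong at the destination, the record shows which transformation to inspect first.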
When data is collected from different sources, analysts determine whether the disparate sources represent the same data in the same way. For example, if data source A uses numeric codes to specify gender and data source B uses alphabetic codes, the need to reconcile these data elements across sources must be identified.
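The gender-code example can be sketched as a pair of lookup tables that translate each source's codes into one shared vocabulary. The specific codes below (`1`, `2`, `9` for source A and `F`, `M`, `U` for source B) and the `reconcile` helper are hypothetical. The actual code sets must be confirmed with each source's data owner. Any value the mapping does not recognize is set aside for review rather than guessed.

```python
# Hypothetical code sets; confirm the real codes with each source's data owner.
SOURCE_A_GENDER = {1: "female", 2: "male", 9: "unknown"}        # numeric codes
SOURCE_B_GENDER = {"F": "female", "M": "male", "U": "unknown"}  # alphabetic codes


def reconcile(records, mapping, field_name="gender"):
    """Translate a source-specific code into the shared vocabulary.

    Returns (reconciled, unmapped). Unmapped records are kept separately
    for follow-up instead of being silently dropped or guessed.
    """
    reconciled, unmapped = [], []
    for rec in records:
        code = rec.get(field_name)
        if code in mapping:
            reconciled.append({**rec, field_name: mapping[code]})
        else:
            unmapped.append(rec)
    return reconciled, unmapped


source_a = [{"id": "A1", "gender": 1}, {"id": "A2", "gender": 2}]
source_b = [{"id": "B1", "gender": "F"}, {"id": "B2", "gender": "X"}]

a_ok, a_bad = reconcile(source_a, SOURCE_A_GENDER)
b_ok, b_bad = reconcile(source_b, SOURCE_B_GENDER)
combined = a_ok + b_ok   # both sources now share "female" / "male" / "unknown"
print(b_bad)             # [{'id': 'B2', 'gender': 'X'}]
```

An unrecognized code such as `"X"` is exactly the kind of discrepancy that needs domain knowledge or a conversation with the data owner to resolve.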

The file format of the output produced by each source is also identified. Further analysis determines whether the data needs to be reformatted before it is merged into a single file. For example, will spaces need to be removed when moving data from a text file to a spreadsheet? Will data formats need to change so that data is consistent across sources? Some discrepancies cannot be identified programmatically. Different data sources may use different labels for the same type of data with the same meaning, and recognizing this requires domain knowledge. As data is collected, it is analyzed to identify potential problems with the data collection approach.
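The formatting questions above can be illustrated with a minimal sketch that reads two extracts, trims stray spaces, converts dates to one format, and merges the rows into a single CSV file. The sample extracts, the `read_rows` helper, and both date formats are hypothetical. Note in particular the assumption that `03/07/2024` means month/day/year. Deciding that for a real source is the kind of judgment that cannot be made programmatically and needs domain knowledge.

```python
import csv
import io
from datetime import datetime

# Hypothetical extracts: a pipe-delimited text export padded with spaces, and
# a spreadsheet-style CSV that writes dates as month/day/year (an assumption).
TEXT_EXPORT = "id|order_date|amount\n 101 | 2024-03-05 | 19.99 \n 102 | 2024-03-06 |5.00\n"
CSV_EXPORT = "id,order_date,amount\n201,03/07/2024,12.50\n"


def read_rows(text, delimiter, date_format):
    """Parse one source, trimming spaces and converting dates to ISO 8601."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        clean = {key.strip(): value.strip() for key, value in row.items()}
        parsed = datetime.strptime(clean["order_date"], date_format)
        clean["order_date"] = parsed.date().isoformat()
        rows.append(clean)
    return rows


merged = read_rows(TEXT_EXPORT, "|", "%Y-%m-%d") + read_rows(CSV_EXPORT, ",", "%m/%d/%Y")

# Write the merged, consistently formatted rows to a single CSV (in memory here).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "order_date", "amount"])
writer.writeheader()
writer.writerows(merged)
print(out.getvalue())
```

Passing the delimiter and date format per source keeps the source-specific knowledge in one place. A new source then needs only its own parameters rather than changes to the cleaning logic.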

When collecting data, analysts leverage techniques such as surveys and experiments. Data collection is usually performed using automated tools built into business processes. Data analysis skills help determine what data to use, how to collect it, and its relevance and relationship to what is being analyzed. Demonstrating trustworthiness and ethics helps build trust and rapport with stakeholders, whose help may be needed to gain access to data or who may participate in elicitation activities. Business acumen is necessary when testing the data collection approach and when profiling data.