
2.2 Tasks

2.2.6 A Case Study for Source Data

Guide to Business Data Analytics

Voice termination fraud is a major concern in the telecom industry: according to industry research, telecom companies lose billions of dollars to it.

.1    The Challenge

Voice termination fraud, also referred to as SIMbox fraud, often occurs when international calls are hijacked by an intermediate network party, routed via Voice over Internet Protocol (VoIP), and then injected back into the network through SIMboxes that are local to the receiving country. These practices bypass the fees owed to telecom carriers, resulting in lost revenue for the telecom industry.

Consider Alice in Country A, who is making a phone call to Bob in Country B (a different country), as depicted in the graphic below. Instead of moving through the legal least-cost path between the two countries, the call moves through a SIMbox to a fraudulent least-cost path carrier.

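[Figure: call path between Alice in Country A and Bob in Country B, legal route versus SIMbox route (BDASCaseStudySourceData2.2.jpg)]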


A SIMbox is a legal device, but it can be used to re-route cell phone calls over VoIP in order to bypass the receiving network carrier, which would otherwise have received a termination fee for providing the last-mile connectivity for those calls. This reduces the fees collected and the overall revenue of the telecom carrier, and it degrades the voice quality and fidelity of the networks.

Context

Traditional detection methods are inaccurate when detecting SIMbox fraud. SIMbox network signatures are difficult to track because they emulate genuine devices such as network repeaters or probes. In addition, the device data generated is extremely large in both volume and variety.

SIMbox fraud is especially a problem in Africa and Southeast Asia, where local call rates are cheaper than global averages. A Nigerian telecom carrier plans to use recent advancements in predictive analytics to detect and limit voice termination fraud in real time. Adaku Musa, an experienced telecom expert with the carrier, was asked to help the data analysis team in her organization develop a solution that could predict fraudulent traffic in the network in real time.

.2    Identifying Options

As an experienced telecom expert, Adaku was well aware of recent technological advancements. She assessed the situation and recommended three methods to support the objectives of this work:

Analysis Step: Identify SIMbox characteristics
Explanation: Place test calls to the carrier's own network from a foreign country through a calling card and identify whether the last leg of a call is routed through SIM cards. This can be used to discover rules that identify SIMboxes and to apply these rules in data collection and transformation for more sophisticated analysis.
Advantages:
  • Easy to apply for discovering SIMbox characteristics/rules. For example: large volumes of outgoing calls, different destinations, and a low number of incoming calls within the network.
Disadvantages:
  • Although SIMboxes can be identified with such an approach, it is hard to scale for multiple countries.
  • It is rule-based and is not real-time detection. Identification may not be highly accurate.

Analysis Step: Passive call detail records (CDR) analysis with data sampling
Explanation: Analyze CDR to create a baseline of relevant data that may be used for predicting/classifying whether a call is genuine or not. The rules discovered in the earlier stage are used to derive the right predictors and formats. For example, CDR may provide individual call duration, but the volume of outgoing calls for a SIM/subscriber is aggregate-level data, which may be a true predictor.
Advantages:
  • Sampling of data from CDR provides a quicker way to test the hypothesis based on the rules above without the need to analyze all the CDR.
Disadvantages:
  • It is not real-time detection of SIMbox fraud; however, it is used to determine the right predictor variables and allows the data science team to quickly train and test classification algorithms that can be deployed in real time.
  • Less accurate than a complete analysis of CDR due to sampling errors.

Analysis Step: Analysis of CDR utilizing big data technologies
Explanation: Analyze CDR using different big data technologies to discover additional predictor variables that may affect the classification of fraud. This step could have been performed before sampling in the previous step; however, it would have taken more time, effort, and cost to do so.
Advantages:
  • More accurate than analysis using sampling.
  • No need to analyze already established predictors, as the analysis is carried forward from the last stage.
  • Can be implemented in real time.
Disadvantages:
  • Expensive and requires technical and data sophistication.
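
The rule characteristics listed for the first method (large outgoing volume, many different destinations, few incoming calls) could be expressed as a simple rule check. The following is a minimal sketch only: the thresholds, the `SimDailyStats` type, and the function name are hypothetical and do not come from the case study, which gives no numeric cut-offs.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only. Real values would come
# from test calls and domain review; the case study does not state any.
MIN_OUTGOING_CALLS = 200         # "large volumes of outgoing calls"
MIN_DISTINCT_DESTINATIONS = 150  # "different destinations"
MAX_INCOMING_CALLS = 2           # "low number of incoming calls"


@dataclass
class SimDailyStats:
    """One day of per-SIM counts (hypothetical structure)."""
    imsi: str
    outgoing_calls: int
    distinct_destinations: int
    incoming_calls: int


def looks_like_simbox(stats: SimDailyStats) -> bool:
    """Return True only when all three rule-of-thumb characteristics hold."""
    return (
        stats.outgoing_calls >= MIN_OUTGOING_CALLS
        and stats.distinct_destinations >= MIN_DISTINCT_DESTINATIONS
        and stats.incoming_calls <= MAX_INCOMING_CALLS
    )


# Synthetic examples, not real subscriber data.
suspect = SimDailyStats("SIM-A", outgoing_calls=480,
                        distinct_destinations=455, incoming_calls=0)
regular = SimDailyStats("SIM-B", outgoing_calls=12,
                        distinct_destinations=7, incoming_calls=9)

print(looks_like_simbox(suspect))  # True
print(looks_like_simbox(regular))  # False
```

A fixed rule like this is easy to apply, but it shares the limitation noted for the first method: it is not real-time detection and may not be highly accurate, which is why the later stages use it mainly to decide which predictors to derive.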

.3    Outcomes Achieved

It is important to note that each method builds on the previous analysis in an iterative manner and provides an escalation in approach and successively more accurate information. The data itself goes through several layers of transformation. Business data analytics tools and techniques, as well as strong business knowledge, were used throughout to identify the actual predictors and rules that would be useful for predicting fraud.
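
The case study does not say how the CDR sample was drawn. One option, sketched below under that assumption, is to sample by subscriber rather than by individual call, so that per-SIM aggregates (such as total calls per day) stay complete for every sampled SIM. The record layout and the function name are hypothetical.

```python
import random


def sample_cdrs_by_subscriber(cdrs, fraction, seed=0):
    """Keep every record belonging to a random subset of IMSIs.

    Sampling whole subscribers (an assumption, not the guide's stated
    method) avoids undercounting per-SIM totals, which would happen if
    individual call records were dropped at random.
    """
    imsis = sorted({record["imsi"] for record in cdrs})
    if not imsis:
        return []
    k = max(1, round(len(imsis) * fraction))
    chosen = set(random.Random(seed).sample(imsis, k))
    return [record for record in cdrs if record["imsi"] in chosen]


# Synthetic data: 10 subscribers with 3 call records each.
cdrs = [
    {"imsi": f"SIM-{i}", "terminating_number": f"N{i}-{j}"}
    for i in range(10)
    for j in range(3)
]

sample = sample_cdrs_by_subscriber(cdrs, fraction=0.5)
sampled_imsis = {record["imsi"] for record in sample}
print(len(sampled_imsis), len(sample))  # 5 15
```

The trade-off named in the table still applies: results from the sample are quicker to obtain but are subject to sampling error compared with a complete analysis of the CDR.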

The following identifies the results of the analysis and lists the data that was used to determine the appropriate predictors for fraud analysis:

CDR Information Directly Available
Partial CDR Fields (Call Level): Description
  • Time: Date and time of the call
  • Duration: Call duration
  • Originating Number: Caller's number
  • Originating Country Code: Caller's country identifier
  • Terminating Number: Receiver's number
  • Terminating Country Code: Receiver's country code
  • IMEI: International mobile equipment ID
  • IMSI: International mobile subscriber's ID
  • LAC ID: Local area base station identifier

Transformed Information Used for Prediction
Transformed Data (SIM Level): Description
  • IMSI: International mobile subscriber's ID
  • Total # Calls/Day: Total number of calls per day
  • Total Numbers Called: Total number of unique subscribers called on a single day
  • Total Night Calls: Total number of night-time calls
  • Total Incoming: Total number of incoming calls to the subscriber
  • Average Minutes: Average call duration of each subscriber
  • Most Frequent LAC ID: Most frequent base station used for calls
  • Most Frequent Originating Country: Most frequent originating country identifier
  • Most Frequent Terminating Country: Most frequent terminating country identifier
 
In this case, the data identified in the first table was directly available. Using domain knowledge and successive data transformation approaches, the data depicted in the second table was created to support the predictive analytics outcomes. This data improves the predictive power of any SIMbox fraud detection algorithm.
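
The transformation from call-level CDR fields to SIM-level predictors could look roughly like the sketch below. It is illustrative only and rests on several assumptions not stated in the case study: the dictionary keys, the definition of "night" as 22:00 to 05:59, durations recorded in seconds, and the IMSI on each record belonging to the originating number. The IMEI field is omitted because none of the listed predictors uses it.

```python
from collections import Counter, defaultdict
from datetime import datetime

NIGHT_START_HOUR = 22  # assumption: "night" runs from 22:00 to 05:59
NIGHT_END_HOUR = 6


def is_night(ts):
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def most_frequent(calls, field):
    return Counter(c[field] for c in calls).most_common(1)[0][0]


def sim_level_features(cdrs):
    """Aggregate call-level CDR dicts into one row per (IMSI, day)."""
    # Assumption: the IMSI on a record belongs to the originating number.
    number_to_imsi = {r["originating_number"]: r["imsi"] for r in cdrs}
    groups = defaultdict(list)
    incoming = Counter()
    for r in cdrs:
        day = r["time"].date()
        groups[(r["imsi"], day)].append(r)
        callee = number_to_imsi.get(r["terminating_number"])
        if callee is not None:
            incoming[(callee, day)] += 1

    rows = []
    for imsi, day in sorted(groups):
        calls = groups[(imsi, day)]
        total_seconds = sum(c["duration_seconds"] for c in calls)
        rows.append({
            "imsi": imsi,
            "day": day,
            "total_calls": len(calls),
            "total_numbers_called": len({c["terminating_number"] for c in calls}),
            "total_night_calls": sum(is_night(c["time"]) for c in calls),
            "total_incoming": incoming[(imsi, day)],
            "average_minutes": total_seconds / len(calls) / 60,
            "most_frequent_lac_id": most_frequent(calls, "lac_id"),
            "most_frequent_originating_country":
                most_frequent(calls, "originating_country_code"),
            "most_frequent_terminating_country":
                most_frequent(calls, "terminating_country_code"),
        })
    return rows


def call(ts, secs, src, dst, imsi, lac):
    """Build one synthetic CDR record (not real subscriber data)."""
    return {"time": datetime.fromisoformat(ts), "duration_seconds": secs,
            "originating_number": src, "originating_country_code": "234",
            "terminating_number": dst, "terminating_country_code": "234",
            "imsi": imsi, "lac_id": lac}


cdrs = [
    call("2024-01-01T23:10:00", 60, "N1", "N2", "SIM-A", "LAC-7"),
    call("2024-01-01T23:40:00", 120, "N1", "N3", "SIM-A", "LAC-7"),
    call("2024-01-01T10:00:00", 180, "N2", "N1", "SIM-B", "LAC-3"),
]
features = sim_level_features(cdrs)
print(features[0]["total_calls"], features[0]["average_minutes"])  # 2 1.5
```

At production volumes the same grouping would more likely run in a big data engine than in plain Python, in line with the third method; the sketch only shows the shape of the aggregation.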

By progressing through this structured approach, analyzing the data, and utilizing appropriate business data analytics techniques, Adaku could determine the best set of predictors that her organization could use to develop the fraud classification algorithm.

.4    Key Takeaways

  • A structured approach to planning data collection and data use results in more accurate analysis and subsequent prediction of fraud.
  • Industry knowledge, business knowledge, and solution knowledge are key competencies to help identify the most relevant data. In this case, Adaku's knowledge of how SIMbox fraud takes place pointed her to the CDR as the right data source.
  • The available data may not be directly useful in analysis and undergoes additional transformation to serve its intended use. In this case, the data captured in the CDR was not sufficient in its raw form and application of business knowledge was needed to transform the data to a more business-oriented format for better analytical insights.
  • Through successive experimentation and analysis of the outcomes, more accurate methods may emerge. In this case, we saw the refinement of data and methods (from rule-based discovery to passive CDR analysis to real-time big data implementation) to achieve the desired outcomes.