2.2 Tasks
2.2.6 A Case Study for Source Data
Guide to Business Data Analytics
Voice termination fraud is a major concern in the telecom industry; according to industry research, telecom companies lose billions of dollars to it.
.1 The Challenge
Voice termination fraud, also referred to as SIMbox fraud, often occurs when international calls are hijacked by an intermediate network party, routed via Voice over Internet Protocol (VoIP), and then injected back through SIMboxes that are local to the receiving country. These practices bypass the fees owed to telecom carriers, resulting in lost revenue for the telecom industry.
Consider Alice in Country A who is making a phone call to Bob in Country B (a different country), as depicted in the graphic below. Instead of the call moving through the legal least cost path between the two countries, it moves through a SIMbox to a fraudulent least cost path carrier.

A SIMbox is a legal device, but it can be used to re-route cell phone calls over VoIP in order to bypass the receiving network carrier, which would otherwise have received a termination fee for providing the last-mile connectivity for those calls. This reduces the telecom's revenue and degrades the voice quality and fidelity of its networks.
Context
Traditional detection methods are inaccurate when detecting SIMbox fraud. SIMbox network signatures are difficult to track because they emulate genuine devices such as network repeaters or probes. In addition, the device data generated is extremely large in both volume and variety.
This is especially a problem in Africa and Southeast Asia, where local call rates are cheaper than global averages. A Nigerian telecom carrier plans to use recent advancements in predictive analytics to detect and limit voice termination fraud in real time. Adaku Musa, an experienced telecom expert with the carrier, was asked to assist her organization's data analysis team in developing a solution that could predict fraudulent traffic in the network in real time.
.2 Identifying Options
As an experienced telecom expert, Adaku was well aware of recent technological advancements. She assessed the situation and recommended three methods to support the objectives of this work:
| Analysis Steps | Explanation | Advantages | Disadvantages |
| Identify SIMbox characteristics | Place test calls to own network from a foreign country through a calling card and identify whether the last leg of a call is routed through SIM cards. The results can be used to discover rules that identify SIMboxes, and these rules can then be applied to data collection and transformation in more sophisticated analysis. | Easy to apply for discovering SIMbox characteristics/rules. For example: | |
| Passive call detail records (CDR) analysis with data sampling | Analyze CDR to create a baseline of relevant data that may be used for predicting/classifying whether a call is genuine. The rules discovered in the earlier stage are used to derive the right predictors and formats. For example, CDR may provide individual call duration, but the volume of outgoing calls for a SIM/subscriber is aggregate-level data, which may be a true predictor. | | |
| Analysis of CDR utilizing big data technologies | Analyze CDR using different big data technologies to discover additional predictor variables that may affect the classification of fraud. This step could have been performed before sampling in the previous step; however, it would have taken more time, effort, and cost to do so. | | |
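The first analysis step can be pictured as a simple check on test-call results. The sketch below is illustrative only: the `ProbeCall` record, the `looks_like_simbox` rule, and the `+234` home country code are assumptions introduced here, not details from the case study. It assumes that a test call placed from abroad which arrives showing a local caller ID was probably re-originated through a local SIM.

```python
from dataclasses import dataclass

# Assumption: the carrier's home country code (Nigeria is +234).
HOME_COUNTRY_CODE = "+234"


@dataclass
class ProbeCall:
    """One test call placed from abroad to a number on the carrier's own network."""
    dialed_from: str   # number the test call was placed from (hypothetical field)
    received_cli: str  # caller ID seen on the receiving test handset (hypothetical field)


def looks_like_simbox(call: ProbeCall, home_cc: str = HOME_COUNTRY_CODE) -> bool:
    """Flag a call placed from a foreign number that arrives with a local caller ID."""
    placed_abroad = not call.dialed_from.startswith(home_cc)
    arrived_local = call.received_cli.startswith(home_cc)
    return placed_abroad and arrived_local


def suspect_numbers(calls):
    """Return the distinct local numbers that terminated foreign test calls."""
    return sorted({c.received_cli for c in calls if looks_like_simbox(c)})
```

The numbers returned by `suspect_numbers` would then be candidates for closer inspection in the CDR, which is where rules about SIMbox behavior could be derived.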
.3 Outcomes Achieved
It is important to note that each method builds on the previous analysis in an iterative manner and provides an escalation in approach and successively more accurate information. The data itself goes through several layers of transformation. Business data analytics tools and techniques, as well as strong business knowledge, were used throughout to identify the actual predictors and rules that would be useful for predicting fraud.
The following identifies the results of the analysis and lists the data that was used to determine the appropriate predictors for fraud analysis:
| CDR Information Directly Available | | Transformed Information Used for Prediction | |
| Partial CDR Fields (Call Level) | Description | Transformed Data (SIM Level) | Description |
| Time | Date and time of the call | IMSI | International mobile subscriber's ID |
| Duration | Call duration | Total # Calls/Day | Total number of calls per day |
| Originating Number | Caller's number | Total Numbers Called | Total number of unique subscribers called on a single day |
| Originating Country Code | Caller's country identifier | Total Night Calls | Total number of night-time calls |
| Terminating Number | Receiver's number | Total Incoming | Total number of incoming calls to the subscriber |
| Terminating Country Code | Receiver's country code | Average Minutes | Average call duration of each subscriber |
| IMEI | International mobile equipment ID | Most Frequent LAC ID | Most frequent base station used for calls |
| IMSI | International mobile subscriber's ID | Most Frequent Originating Country | Most frequent originating country identifier |
| LAC ID | Local area base station identifier | Most Frequent Terminating Country | Most frequent terminating country identifier |
In this case, the call-level data in the left-hand columns was directly available from the CDR. Using domain knowledge and successive data transformations, the SIM-level data in the right-hand columns was created to support the predictive analytics outcomes. This transformed data can improve the predictive power of a SIMbox fraud detection algorithm.
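The call-level to SIM-level transformation described above amounts to grouping CDR rows by subscriber and day and then aggregating. The following is a minimal sketch, not the team's actual pipeline. The dictionary keys, the `direction` field ("out" or "in"), the duration in seconds, and the definition of night as 00:00 to 05:59 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

NIGHT_HOURS = range(0, 6)  # assumption: "night" means 00:00-05:59


def most_frequent(values):
    """Return the most common value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def sim_level_features(cdrs):
    """Aggregate call-level CDR dicts into one feature row per (IMSI, day)."""
    groups = defaultdict(list)
    for rec in cdrs:
        groups[(rec["imsi"], rec["time"].date())].append(rec)

    rows = []
    for (imsi, day), recs in sorted(groups.items()):
        # "direction" is a hypothetical field marking outgoing vs. incoming calls.
        outgoing = [r for r in recs if r["direction"] == "out"]
        rows.append({
            "imsi": imsi,
            "day": day,
            "total_calls": len(recs),
            "total_numbers_called": len({r["terminating_number"] for r in outgoing}),
            "total_night_calls": sum(r["time"].hour in NIGHT_HOURS for r in recs),
            "total_incoming": len(recs) - len(outgoing),
            "average_minutes": mean(r["duration_s"] for r in recs) / 60,
            "most_frequent_lac": most_frequent(r["lac_id"] for r in recs),
            "most_frequent_orig_cc": most_frequent(r["originating_cc"] for r in recs),
            "most_frequent_term_cc": most_frequent(r["terminating_cc"] for r in recs),
        })
    return rows
```

Each input record is expected to carry a `datetime` under `"time"`. At the data volumes the case study describes, the same grouping would more likely run on a distributed engine, but the shape of the aggregation would be similar.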
By progressing through this structured approach, analyzing the data, and utilizing appropriate business data analytics techniques, Adaku could determine the best set of predictors that her organization could use to develop the fraud classification algorithm.
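To make the idea of predictors feeding a classification algorithm concrete, here is a deliberately simple rule-based baseline. It is a sketch under stated assumptions: the thresholds are hypothetical and chosen only for illustration, and the case study does not say which algorithm or cut-off values the organization used. The rules reflect a common intuition that a SIMbox SIM places many outgoing calls to many distinct numbers and receives few or none.

```python
# Hypothetical rules over SIM-level predictors; thresholds are illustrative only
# and would in practice be learned from labeled data or tuned by the team.
RULES = {
    "total_calls": lambda v: v >= 200,           # unusually high daily call volume
    "total_numbers_called": lambda v: v >= 150,  # many distinct numbers called
    "total_incoming": lambda v: v == 0,          # SIM never receives calls
}


def rule_hits(features):
    """Return the names of the rules that a SIM-level feature row triggers."""
    return [name for name, rule in RULES.items() if rule(features[name])]


def classify(features, min_hits=2):
    """Label a SIM 'suspect' when at least min_hits rules fire, else 'genuine'."""
    return "suspect" if len(rule_hits(features)) >= min_hits else "genuine"
```

A baseline like this is mainly useful as a point of comparison: a trained classifier built on the same predictors should outperform it before it is trusted in production.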
.4 Key Takeaways
- A structured approach to planning data collection, and to deciding how that data will be used, results in more accurate analysis and subsequent prediction of fraud.
- Industry knowledge, business knowledge, and solution knowledge are key competencies to help identify the most relevant data. In this case, Adaku's knowledge of how SIMbox fraud takes place pointed her to the CDR as the right data source.
- The available data may not be directly useful for analysis and may need additional transformation to serve its intended use. In this case, the data captured in the CDR was not sufficient in its raw form, and business knowledge was needed to transform it into a more business-oriented format for better analytical insights.
- Through successive experimentation and analysis of the outcomes, more accurate methods may emerge. In this case, we saw the refinement of data and methods (from rule-based discovery to passive CDR analysis to real-time big data implementation) to achieve the desired outcomes.