3. Techniques
3.6 Data Mapping
Guide to Business Data Analytics
.1 Purpose
Data mapping is used to consolidate data from one or more sources to a destination to create a meaningful set of data with a standardized format. This ensures that data can be accessed and used for business reporting and analysis.
Data mapping is used to consolidate data from one or more sources to a destination to create a meaningful set of data with a standardized format. This ensures that data can be accessed and used for business reporting and analysis.
.2 Description
Organizations collect a lot of data from many different data points, stored in varied formats. During data sourcing, there is often a need to migrate one or more of these data sources into a single data repository where all data has a consistent format. This allows for easy access to the data and reporting across the organization. Data mapping establishes a relationship between the source and the target repository, allowing data migration to happen, despite potentially disparate data formats.
Data mapping is used to support:
Note: For data integration to be possible, at a minimum, the data models of both source and target should be the same, even if the data schemas are different.
Data mapping happens at the attribute-level. During data migration and integration, attention is given to:
The complexity of data mapping may depend on several factors, including the degree of disparity between the source and target repositories, and any existing data hierarchy.
Organizations collect a lot of data from many different data points, stored in varied formats. During data sourcing, there is often a need to migrate one or more of these data sources into a single data repository where all data has a consistent format. This allows for easy access to the data and reporting across the organization. Data mapping establishes a relationship between the source and the target repository, allowing data migration to happen, despite potentially disparate data formats.
Data mapping is used to support:
- data migration, where source data is moved to a new data repository, and
- data integration, where source data is merged with an existing data repository.
Note: For data integration to be possible, at a minimum, the data models of both source and target should be the same, even if the data schemas are different.
Data mapping happens at the attribute-level. During data migration and integration, attention is given to:
- the selected attributes that will be migrated from source to target.
- creation of new attributes in the target repository to allow migration from the source.
- data type and size of the target attribute that is defined to ensure it can accommodate the source attribute data. In some instances, it may be determined the data size of the target attribute should be the same as the source data size, while in other instances the data size for the target attribute may need to be larger so it can accommodate future expansion. However, the target data size should never be smaller than the source data size as that may result in missing data.
- new custom-defined attributes that involve some form of manipulation or calculation (for example, tax amount, product name based on product code).
The complexity of data mapping may depend on several factors, including the degree of disparity between the source and target repositories, and any existing data hierarchy.
3.6.1 Elements
.1 Source
The repository providing the original data is referred to as the source. When analyzing the source, analysts consider the:
The repository receiving the data is referred to as the target. When analyzing the target, analysts consider the:
Data mapping is often completed using a spreadsheet where target attributes are mapped to source attributes with applicable transformation logic. The following is a basic data map.
.1 Source
The repository providing the original data is referred to as the source. When analyzing the source, analysts consider the:
- format of the source (for example, delimited file, spreadsheet, database entity),
- attributes of interest or potential interest, and
- data type and data size of the attributes.
The repository receiving the data is referred to as the target. When analyzing the target, analysts consider the:
- format of the source (for example, delimited file, spreadsheet, database entity),
- new attributes that need to be created within the target to align with the source, with similar data type and data sizes,
- source attributes that need to be transformed in the target and their corresponding transformation rule. These could be based on a business rule or a business requirement (for example, the "TransactionDatetime" data attribute in the source has a datetime format of mm/dd/yyyy hh:mm:tt but in the target, only the date should be brought in for easy querying purposes. So, the TransactionDate attribute in the target will have the date format mm/dd/yyyy), and
- creation of new custom fields (for example, adding a calculation, combining attributes, or formatting content based on a logic).
Data mapping is often completed using a spreadsheet where target attributes are mapped to source attributes with applicable transformation logic. The following is a basic data map.
| Target |
Source | Transformation Rules |
|||
| Entity Name | Attribute Name | Data Type | Attribute(s) | Direct Map? | Logic |
| T_Employe e | Emp_ID | Varchar (10) | Employee.ID | Y | |
| T_Employe e | Emp_Name | Varchar (50) | Employee.FirstName, Employee.LastName | N | Concatenate (Employee.FirstName, “ “,EmployeeLastName) |
| T_Employe e | Emp_DepNo | Number (2) | Employee.Departnment Number | Y | |
| T_Employe e | Emp_StartDate | Date | Employee.StartDate | Y | Only bring the date component with format YYYY-MM-DD |
3.6.2 Usage Considerations
.1 Strengths
.1 Strengths
- Provides a meticulous approach to ensuring smooth migration and integration of data between varied data platforms.
- Provides data traceability, which is particularly beneficial when resolving data issues.
- Easy to learn and use.
- Enables creation of a standardized, business-focused data repository, which in turn allows for consistency and ease in data access and reporting.
- Helps identify any data quality issues within the source or the target.
- Requires careful attention to detail to avoid encountering data loss, redundancies, or other unexpected errors.
- Data mapping work, particularly when done manually, can be time- consuming.
- Needs updating as soon as changes are made in either source or target