The Data Transformation Challenges with EAI
Given the diverse application landscape of companies, the integration of these applications to support a robust and reliable message exchange is more relevant than ever. These integrations can be categorized into vertical and horizontal integrations. Horizontal integration reflects the different applications and business partners that are engaged in the exchange of messages. Vertical integration can be further categorized into four layers: communication, message structure, data elements, and business process. The complexity of a problem can be reduced by subdividing the problem into smaller problem spaces that are self-contained units, or in this case, layers.[1][2]
From a horizontal perspective, the complexity lies in the connectivity of message exchange between the different applications within the enterprise landscape. The scope of horizontal integration starts with business partners and narrows to the organization’s application portfolio, with the enterprise service bus acting as the central message hub for all transaction exchanges within the integration landscape. Horizontal integration is concerned with the cross-functional and cross-application business processes that need to be supported.
The four layers of vertical integration address the mechanics of integrating the different applications for the horizontal integration perspective.
Communication
The first layer is concerned with the communication adapters that are needed to send and receive messages between applications. The capabilities of communication adapters will have a direct impact on timing, recoverability, and message sequence.
Message Structure
Data structure integration differentiates between syntactical and semantic integration challenges and proposes a conflict resolution approach on the semantic level. Conflicts on the syntactic and semantic layers are resolved through transformations. The open question is to what degree these transformations can be fully or partially automated.
Data Elements
Data structure integration differentiates between syntactical and semantic integration challenges and proposes a conflict resolution approach on the semantic level. Conflicts on the syntactic and semantic layers are resolved through transformations. The open question is to what degree these transformations can be fully or partially automated.
Business Process
The fourth and final layer represents integration at the business process level. Workflow steps must be tightly integrated with the underlying integration scenarios, and testing of an integration scenario should always be driven first by the execution of the business process.
One area that still poses significant challenges is the transformation of data values from a source application to the respective target application. Normally this data is considered reference data.
As described, the data integration layer is responsible for integrating data values between sending and receiving applications. A very simple example is the common scenario of integrating currencies or units of measure. The value in the sending system can differ from the value in the receiving system. For the receiving system to process the received messages, these values must be translated into the values or language elements it understands. This transformation is currently performed through lookups against specially designed reference tables that contain the corresponding values. Storing these translation tables is less of an issue than maintaining them so that they stay current, since data values can change in both the sending and the receiving systems. The more dynamic these values are, the more important maintenance becomes.
Inaccurate content in cross-reference tables directly affects the enterprise's ability to deliver value and is a hard cost to the enterprise. Any transaction that relies on an invalid or incomplete reference association will fail and therefore incur a direct cost.
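The maintenance concern can be made concrete with a small audit routine. The following is a minimal sketch, not a prescribed method: the function name `audit_xref`, the table `UOM_XREF`, and all unit codes are hypothetical values chosen for illustration. It compares a translation table against the values currently known in the sending and receiving systems and reports where the table has drifted.

```python
def audit_xref(xref, source_values, target_values):
    """Report gaps between a cross-reference table and the current value sets."""
    source_values = set(source_values)
    target_values = set(target_values)
    mapped_sources = set(xref)
    return {
        # Source values with no translation: messages carrying them will fail.
        "unmapped_source": sorted(source_values - mapped_sources),
        # Entries whose source value no longer exists in the sending system.
        "obsolete_entries": sorted(mapped_sources - source_values),
        # Entries that translate to a value the receiver no longer accepts.
        "invalid_targets": sorted(
            key for key, value in xref.items() if value not in target_values
        ),
    }


# Hypothetical unit-of-measure table: sending-system code -> receiving-system code.
UOM_XREF = {"EA": "PCE", "KG": "KGM", "LTR": "LTR", "BX": "BOX"}

report = audit_xref(
    UOM_XREF,
    source_values=["EA", "KG", "LTR", "PAL"],
    target_values=["PCE", "KGM", "LTR"],
)
print(report)
```

Running such a check on a schedule, or whenever reference data changes in either system, is one possible way to catch stale entries before they cause failed transactions.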
There are four fundamental cross-reference patterns that can be identified to demonstrate the complexity of the data layer integration.
One-To-One
The one-to-one pattern is the simplest pattern to resolve. Here, one input value corresponds to exactly one output value. A common example is the mapping of a country name to a corresponding country code. The source value acts as a key and retrieves the value as the target value. The associated mapping will replace the source value with the target value. The most common scenario is the translation of reference data that is contained in master or transactional data.
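A one-to-one lookup can be sketched as a plain key-to-value table. The table `COUNTRY_XREF`, the function `translate_one_to_one`, and the message fields below are assumptions made for this example and do not come from any particular integration product.

```python
# Hypothetical cross-reference table: source country name -> target country code.
COUNTRY_XREF = {
    "Germany": "DE",
    "Canada": "CA",
    "United States": "US",
}


def translate_one_to_one(source_value, xref):
    """Return the single target value that corresponds to source_value.

    A missing entry raises KeyError so the gap is visible instead of an
    untranslated value being passed silently to the target application.
    """
    try:
        return xref[source_value]
    except KeyError:
        raise KeyError(f"no cross-reference entry for {source_value!r}") from None


message = {"order_id": "4711", "country": "Germany"}

# The mapping replaces the source value with the target value.
translated = {
    **message,
    "country": translate_one_to_one(message["country"], COUNTRY_XREF),
}
print(translated)
```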
Enrichment
The enrichment pattern is slightly more complex. Here, a source value will also act as a key to retrieve the associated cross-reference value. However, the target value will require additional information that the source system cannot provide. Therefore, the x-referencing must include the additional associated field values that are missing in the source message. A good example is sensor tags from a plant. If the tag represents a valve, the source structure may include the flow rate and volume. But it will not include any reference to a plant ID or asset ID that the target system is expecting. Therefore, the data values have to be enriched to satisfy the mapping requirements of the target application.
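The sensor tag example might look as follows. This is an illustrative sketch only: the tag name, the plant and asset identifiers, and the function `enrich` are hypothetical.

```python
# Hypothetical cross-reference table: sensor tag -> fields the target system
# expects but the source message cannot provide.
TAG_XREF = {
    "FV-1001": {"plant_id": "PLANT-01", "asset_id": "VALVE-0042"},
}


def enrich(message, xref, key_field="tag"):
    """Return a copy of message with the missing target fields added."""
    extra = xref.get(message[key_field])
    if extra is None:
        raise LookupError(f"no enrichment data for {message[key_field]!r}")
    # The source fields are kept; the cross-referenced fields are merged in.
    return {**message, **extra}


source_message = {"tag": "FV-1001", "flow_rate": 12.5, "volume": 300.0}
target_message = enrich(source_message, TAG_XREF)
print(target_message)
```

The source message is left unchanged; the enriched copy carries both the measured values and the plant and asset references that the target mapping needs.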
Dependency
The dependency pattern provides a conditional relationship between x-reference values. Here, the source value again acts as a key. However, the associated target value in turn acts as the key for a second, third, or nth dependent lookup that supplies further values needed by the target application. The sequence of these dependent lookups must be optimized to ensure the best performance within the overall processing of messages.
An example would be a lookup of a product, which would trigger a lookup of a reference value such as a UOM (unit of measure) to find the conversion factors needed to adjust the quantities for processing in the target application.
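The product and unit-of-measure example can be sketched as two chained lookups, where the result of the first determines the key of the second. The tables `PRODUCT_XREF` and `UOM_CONVERSION`, the product identifiers, and the function `translate_order_line` are assumptions for illustration.

```python
# First lookup (hypothetical): source product -> target product and its base unit.
PRODUCT_XREF = {
    "P-100": {"target_product": "MAT-000100", "base_uom": "KG"},
}

# Second, dependent lookup (hypothetical): (source unit, target unit) -> factor.
UOM_CONVERSION = {
    ("LB", "KG"): 0.45359237,
    ("KG", "KG"): 1.0,
}


def translate_order_line(line):
    """Translate the product, then convert the quantity into its base unit."""
    product = PRODUCT_XREF[line["product"]]
    # The key of the second lookup depends on the result of the first.
    factor = UOM_CONVERSION[(line["uom"], product["base_uom"])]
    return {
        "product": product["target_product"],
        "quantity": line["quantity"] * factor,
        "uom": product["base_uom"],
    }


result = translate_order_line({"product": "P-100", "quantity": 10, "uom": "LB"})
print(result)
```

Because each message may trigger several such lookups, keeping the tables in memory or caching frequent keys is one possible way to address the performance concern noted above.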
Granularity
The granularity pattern is the most complex pattern of the four. Here, a source value can be related to several potential target values. The goal is to identify source characteristics that sufficiently provide an inference of what the right target value should be. A common practical scenario of granularity would be products. A business partner integration provides the product identifier to the integration as part of the transaction. Internally, this product is further refined in various grades, specifications, or characteristics. However, the source message is missing these vital characteristics. The key issue to examine is how to define the product in such a way that the source message will, by default, incorporate sufficient characteristics so that at least the number of possible outcomes can be reduced.
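One way to picture the granularity problem is as a filter over candidate target values. In the sketch below, the candidate table, the characteristic names `grade` and `thickness_mm`, and the function `narrow_candidates` are all hypothetical. The more characteristics the source message carries, the fewer candidates remain.

```python
# Hypothetical table: one partner product identifier -> several internal products.
CANDIDATES = {
    "STEEL-COIL": [
        {"target": "MAT-2001", "grade": "A36", "thickness_mm": 2.0},
        {"target": "MAT-2002", "grade": "A36", "thickness_mm": 3.0},
        {"target": "MAT-2003", "grade": "A572", "thickness_mm": 2.0},
    ],
}


def narrow_candidates(product_id, characteristics):
    """Return the target values consistent with the known characteristics."""
    return [
        candidate["target"]
        for candidate in CANDIDATES.get(product_id, [])
        if all(candidate.get(name) == value for name, value in characteristics.items())
    ]


# No characteristics in the message: every candidate remains possible.
print(narrow_candidates("STEEL-COIL", {}))
# One characteristic reduces the number of possible outcomes.
print(narrow_candidates("STEEL-COIL", {"grade": "A36"}))
# Enough characteristics identify a single target value.
print(narrow_candidates("STEEL-COIL", {"grade": "A36", "thickness_mm": 3.0}))
```

A result with more than one entry signals that the message still lacks the characteristics needed for an unambiguous translation, and an empty result signals that no internal product matches.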
The x-reference patterns described above outline the most common scenarios encountered within an enterprise application integration solution. Recognizing these patterns is vital to developing the right x-reference solution and being better able to incorporate these scenarios into the integration.