Backend applications for data integration are not just supposed to fetch data from a database and serve it to the front end. They should also be able to gather data from different sources, process it as required, and store the processed data in the database.
At Zemoso, we have extensive experience in working with such applications for products across fintech, healthtech, proptech, privacy, retail, and other industries. Therefore, we have created a handbook on how to avoid some common coding mistakes while dealing with external data sources.
Consider the above diagram, where our Spring Boot application fetches data from multiple sources.
1. Database: Usual Structured Query Language (SQL) or NoSQL database
2. Identity Provider (IDP): Stores user profile information, which the application fetches after authentication
3. Rule engine: A simple server from which the application fetches its “business rules”
4. Payment data provider: When you need to integrate with a payment facilitator
5. Domain data providers: The application might be integrated with third-party applications or servers that perform domain-specific heavyweight operations
The application can get really complex as it fetches and stores data from multiple sources.
Many developers end up writing the data-fetching logic for third-party sources in the service layer. That sounds fine most of the time, but it has some major drawbacks.
1. Tight coupling: When we write integrations in the service layer, we end up tightly coupled to the integration or the data provider. For example, suppose we have a payment integration with PayPal and we fetch payment details from five different services. Due to changes in policy and pricing, you now have to integrate with a different platform, Stripe. That means you have to change the payment-fetching logic in five different classes. Plus, this logic might not even be straightforward.
2. Scattered code: As code size increases, integrations become increasingly unmanageable. It becomes difficult to track and manage the code, even in a microservice architecture.
3. Migration hell: Library updates and deprecations become a challenge when we rely solely on the Software Development Kits (SDKs) of the integrations. Consider this situation: we are using the Amazon Web Services (AWS) Java SDK to communicate with AWS and to fetch and store files in S3. Now AWS updates its SDK, adding some features and removing others. Upgrading the AWS logic scattered across files is messy and time-consuming.
Many developers avoid the above-mentioned issues by following the single responsibility principle: they extract the integrations and the data-fetching logic into separate classes. However, many code directly against a class instead of an interface. In most situations this works well, but it can cause multiple code-management issues going forward. Some common problems with using classes directly are:
1. Pseudo-hard coupling: Extracting the integration-specific code into a class loosens the coupling to some extent, but it does not remove it completely.
2. Ad hoc changes: Many developers end up making workarounds and gradually deviating from “single responsibility”, making the code harder to manage.
Many developers use the SDKs provided by the integration or the data source provider. While this is convenient in the short term, it adds to the issues outlined above in the long run.
By using the exceptions and objects provided by the SDK, we undo the effort put into avoiding the above-mentioned mistakes. Consider this example:
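As a minimal sketch of this mistake, assuming Stripe's Java SDK (the interface and method names are illustrative), such code might look like this:

import com.stripe.exception.StripeException;
import com.stripe.model.PaymentIntent;

// Anti-pattern: the "abstraction" still leaks SDK types into its signature.
public interface PaymentProvider {
    // PaymentIntent and StripeException both come from the Stripe SDK,
    // so every caller of this interface is coupled to Stripe anyway.
    PaymentIntent fetchPaymentDetails(String paymentId) throws StripeException;
}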
In the above code, we created a separate interface, but we still added a dependency on the objects (PaymentIntent) and exceptions (StripeException) provided by the integration SDK (in this case, Stripe). This defeats the effort of loosely coupling the code and managing the data from third-party providers. Also, if the SDK is changed, you’ll have to change these references across the code.
Considering we are not using GraphQL, our controllers are designed to return the JSON that serves a specific purpose on the front end.
Most organizations are moving to a resource-oriented Application Programming Interface (API) design, where each API is designed to serve a specific resource, irrespective of what the front end needs; the data is then orchestrated in intermediate layers like GraphQL. However, many still prefer to design APIs that directly serve the front end. In such cases, defining separate data transfer objects (DTOs) becomes really important.
If the DTOs and data objects are not split, you’ll have to create multiple workarounds just to keep the application running whenever you change the data source or the database schema.
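As a minimal sketch of such a split, assuming Spring Boot with Jakarta Persistence (the class and field names are illustrative):

import jakarta.persistence.Entity;
import jakarta.persistence.Id;

// Persistence model: shaped by the database schema.
@Entity
public class Book {
    @Id
    private Long id;
    private String title;
    private String author;
    private String s3ObjectKey;   // internal detail, never exposed to clients

    public String getTitle() { return title; }
    public String getAuthor() { return author; }
}

// API model (in a separate file): shaped by what the front end needs.
public record BookPreviewDto(String title, String author) {

    // Mapping in one place keeps schema changes from leaking into the API.
    public static BookPreviewDto from(Book book) {
        return new BookPreviewDto(book.getTitle(), book.getAuthor());
    }
}

With this split, renaming a column or adding an internal field only touches the entity and the mapping, not every controller that returns book data.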
Apart from the above-mentioned issues, writing test cases also becomes difficult when the integration is not structured correctly.
Here are some of the approaches we use to avoid the concerns listed above and keep our code cleaner and more manageable.
Bonus insight: The techniques below are not limited to data providers and integrations; they can be extended to various other use cases too.
Let us consider a simple use case:
1. You are building an application similar to Coursera or O'Reilly
2. Your end user pays you to subscribe to books or courses
3. Your application provides the books or courses to users based on their subscription
4. You want to show the preview of the books to guests and subscribed users
The whole application is, of course, much more complex; let’s focus on handling payments and serving books to subscribed users and guests.
Let us split the whole scenario into two smaller parts:
1. Handling payment
2. Serving the books from database and S3 repository
In the above diagram, we added a Stripe wrapper class between the Stripe SDK and the Stripe service. To understand why we’re doing this, let’s look at the Stripe SDK first.
The Stripe SDK comes with static class methods, i.e., you cannot inject a Stripe dependency into your service. Instead, you have to call the methods directly when required. This can lead to tight coupling in the code, and mocking Stripe during tests can become a nightmare.
Wrapping these static calls in a wrapper class lets us add the Stripe functionality to your service as an injected dependency, which makes writing test cases much more convenient.
Not only for tests: in case of SDK updates, you will have all your Stripe calls in one place, and managing migrations becomes easier. You can do it without touching the actual business logic in your service class.
Many SDKs, like Firebase, DataLoggers, etc., use static methods to perform some operations, and introducing wrappers on top of them can make your code much more manageable.
A closer look at Stripe SDK
// Static call into the SDK: there is no instance to inject or mock.
Charge charge = Charge.retrieve(
        "ch_random_charge",
        requestOptions
);
Stripe already has methods that internally handle data fetching and storing operations, so the wrapper can be self-sufficient and act as the data repository. Depending on the number of operations, and applying the interface segregation principle, you can further split it into multiple smaller wrappers. In the end, the goal is code with a single responsibility that is easy to manage.
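As a minimal sketch of such a wrapper, assuming the Stripe Java SDK and Spring (the class, record, and method names are illustrative):

import com.stripe.exception.StripeException;
import com.stripe.model.Charge;
import com.stripe.net.RequestOptions;
import org.springframework.stereotype.Component;

// Illustrative domain object: what the rest of the application sees.
record PaymentDetails(String id, Long amount, String currency, String status) {}

// Thin, injectable wrapper around Stripe's static API.
// Services depend on this bean instead of calling the SDK directly.
@Component
class StripePaymentRepository {

    // Assumed to be configured as a bean elsewhere (e.g., with the API key).
    private final RequestOptions requestOptions;

    StripePaymentRepository(RequestOptions requestOptions) {
        this.requestOptions = requestOptions;
    }

    PaymentDetails fetchCharge(String chargeId) {
        try {
            // The only place in the codebase that touches the static SDK call.
            Charge charge = Charge.retrieve(chargeId, requestOptions);
            return new PaymentDetails(charge.getId(), charge.getAmount(),
                    charge.getCurrency(), charge.getStatus());
        } catch (StripeException e) {
            // Translate the SDK exception into an application-level one (illustrative).
            throw new IllegalStateException("Unable to fetch charge " + chargeId, e);
        }
    }
}

In a unit test, StripePaymentRepository can be replaced with a plain mock, so the service test never touches the Stripe SDK; and if the SDK changes, only this class needs to be updated.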
In this one, we have introduced a new repository with its implementation, but there are no wrappers on the AWS SDK. As the example below shows, the AWS SDK uses the builder pattern to create the client instance, which can be externalized in a configuration class so the instance can be injected as a bean.
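As a sketch, assuming AWS SDK for Java v2 and Spring (the configuration class name and the region are illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

@Configuration
public class AwsConfig {

    // The builder call lives in exactly one place; everywhere else the
    // S3Client is injected like any other Spring bean.
    @Bean
    public S3Client s3Client() {
        return S3Client.builder()
                .region(Region.US_EAST_1)   // illustrative region
                .build();
    }
}

Any S3-backed repository can then take S3Client as a constructor dependency, which keeps the builder call in one place and makes the client easy to mock in tests.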
Why do we have a different Java Persistence API (JPA) repository, S3 repository, and an intermediate layer?
Let us break down this hypothetical problem into smaller units.
1. The JPA repository serves the book preview (name, author, description) to both subscribed and guest users.
2. The S3 repository serves the whole book to subscribed users.
3. BooksSubscriptionRepository orchestrates between the two to provide the respective content to the BooksService.
What does orchestrating data between multiple repositories mean?
Consider the above data model. There is a single data provider, and it has two different implementations. Each implementation has a different source: it can be a cloud provider, a database, a message queue, or a cache.
Each may or may not contain the same data. If they hold the same data, they might be used for serving different loads. The possibilities are endless. A service that needs data only cares about getting the data, no matter how. But if, for whatever reason, you switch the data source from one type to another, the rest of the application should not be affected. A data orchestrator internally picks the right source to provide the data in the best possible way, without affecting the logic that uses the data.
In our example, our book data repository switches between S3 and the database to serve a preview or the full book, as required by the caller. In the future, if you decide to serve the preview from S3 too, all you have to do is switch the source inside the data provider.
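A minimal sketch of such an orchestrator, assuming the repositories from the example above (the interfaces, records, and method signatures are illustrative):

// Illustrative domain types.
record BookPreview(String title, String author, String description) {}
record BookContent(BookPreview preview, byte[] fullText) {}

// Source-specific repositories (interfaces only; implementations omitted).
interface BookMetadataRepository {          // backed by JPA / the database
    BookPreview findPreview(String bookId);
}

interface BookContentRepository {           // backed by S3
    byte[] downloadFullBook(String bookId);
}

// Orchestrator: picks the right source; callers never know which one was used.
class BooksSubscriptionRepository {

    private final BookMetadataRepository metadataRepository;
    private final BookContentRepository contentRepository;

    BooksSubscriptionRepository(BookMetadataRepository metadataRepository,
                                BookContentRepository contentRepository) {
        this.metadataRepository = metadataRepository;
        this.contentRepository = contentRepository;
    }

    // Guests and subscribers both get the preview from the database.
    BookPreview getPreview(String bookId) {
        return metadataRepository.findPreview(bookId);
    }

    // Only subscribers get the full book, fetched from S3.
    BookContent getFullBook(String bookId) {
        BookPreview preview = metadataRepository.findPreview(bookId);
        byte[] fullText = contentRepository.downloadFullBook(bookId);
        return new BookContent(preview, fullText);
    }
}

If the preview later moves to S3, only the orchestrator's wiring changes; BooksService keeps calling the same two methods.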
Disclaimer
The example mentioned above is purely designed for demonstration purposes. Though it is good enough as an architecture, it might not be a good reference for integrating Stripe or AWS into your application.
A version of this article was earlier published on Medium.