Integrating through an AWS serverless ingestion pipeline

Building a serverless ingestion pipeline to decouple a front-office application from the back-office.

Setting the stage

We recently had to build a responsive front-office web application that makes back-office data available to the end-customer. It seemed like a simple web-based application that could be built with our favorite(?) single page application framework.

Discovering the requirements

However, when we looked a bit deeper, some non-functional requirements popped up that we needed to take into account:

  • Number of customers: The number of customers is orders of magnitude larger than the number of back-office employees.
  • Legacy: Back-office applications are often legacy applications, and even if they are not that old, they are typically designed to support only a limited number of users.
  • 24/7: The front-office application is used around the clock, so it is not limited to business hours as the back-office applications are.
  • Downtime: Some form of graceful degradation should exist. Preferably, the front-office application keeps working when a back-office application is not reachable.
  • Not real-time: Modifications in the back-office applications need not be available to customers in real time. However, the time needed for data to show up in the front-office application should not be too long.

The autonomous bubble pattern

Taking the above non-functional requirements into account made us move towards an autonomous bubble style architecture. The autonomous bubble is a pattern described in domain-driven design; the original article is located here. In short, it allows building a new system based on the domains of other systems, while achieving as much isolation as possible from these other (often legacy) systems.

This means that our front-office application will have its own persistent storage for the data to be shown to the end-customers. That data should be available in the persistent storage whenever a customer requests it; we want to avoid having to request data from the back-office systems in real time.

The solution proposed in the autonomous bubble pattern is the so-called synchronizing anti-corruption layer. This component takes on the responsibility of synchronizing the data between the back-office systems and the front-office application. This synchronization is typically done by detecting changes in the back-office systems, which then need to be translated and inserted into the front-office application. The more frequently we synchronize, the faster modifications in the back-office systems become visible in the front-office application.

In short, this pattern enables a simple architecture for the front-office application: a single page application with a (Java) backend exposing REST services for fetching the data stored in a database. The technical complexity of integrating the front-office with the back-office is isolated in the synchronizing anti-corruption layer.
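To give an idea of how simple that backend can stay, here is a minimal sketch assuming a Spring Boot backend. The Dossier type, the repository interface and the URL path are hypothetical examples, not our actual domain model.

// A minimal sketch of the front-office (Java) backend: it only ever reads
// from the bubble's own data store and never calls the back-office systems
// at request time. Dossier, DossierRepository and the path are hypothetical.
import java.util.List;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

record Dossier(String customerId, String title, String status) {}

interface DossierRepository {
    List<Dossier> findByCustomerId(String customerId);
}

@RestController
@RequestMapping("/api/dossiers")
public class DossierController {

    private final DossierRepository repository; // backed by the bubble's own database

    public DossierController(DossierRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/{customerId}")
    public List<Dossier> dossiersForCustomer(@PathVariable String customerId) {
        return repository.findByCustomerId(customerId);
    }
}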

Can AWS help us contain this complexity?

Setting up a synchronizing anti-corruption layer is quite complex, both from a development and from an operational perspective. This automatically triggers the reflex to look at products, components, … that we could (re-)use to make this simpler (a.k.a. build vs. buy).

AWS has a couple of serverless PaaS solutions that seem to fit the purpose. Specifically, we were looking at:

  • Amazon API Gateway
  • AWS Lambda
  • Amazon Kinesis Data Streams
  • Amazon DynamoDB
  • Amazon CloudWatch

Amazon API Gateway

Amazon API Gateway is a fully managed service that makes it easy to create, publish, maintain, monitor, and secure APIs. We can use this service to expose the REST APIs of the different back-office systems behind our own domain (avoiding CORS issues when accessed from the front-office application). We can secure those APIs using API keys and protect the back-office applications from excessive load using throttling. We could even enable caching if we wanted to.
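As an illustration, a caller inside the bubble could reach a back-office API through API Gateway as sketched below. The endpoint URL, the resource path and the environment variable holding the API key are hypothetical examples.

// A minimal sketch of calling a back-office REST API exposed behind
// Amazon API Gateway. The URL, path and BACK_OFFICE_API_KEY variable
// are made-up examples; "x-api-key" is the header API Gateway checks.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackOfficeClient {

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    public static String fetchChangesSince(String timestamp) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                // Our own domain in front of the back-office API, via API Gateway
                .uri(URI.create("https://api.example.com/back-office/changes?since=" + timestamp))
                // API Gateway validates this key and applies throttling per usage plan
                .header("x-api-key", System.getenv("BACK_OFFICE_API_KEY"))
                .GET()
                .build();

        HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON payload describing the changes
    }
}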

AWS Lambda

With AWS Lambda we can run code without having to provision or manage servers. We can leverage this service both for reading the data from the back-office systems and for writing the data to the front-office system. Using AWS Lambda gives us a serverless, monitored, auto-scaled and pay-per-use solution, letting us focus on the actual source code needed to read and write the data. AWS Lambda supports a multitude of triggers for running your code, amongst others cron-like expressions that enable us to poll the back-office systems for changes at regular intervals.
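The 'pull changes' Lambda could then look roughly like the sketch below, assuming a scheduled (cron-like) trigger. It reuses the hypothetical BackOfficeClient from the previous sketch; the stream name and partition key are also made-up examples.

// A minimal sketch of the scheduled 'pull changes' Lambda: it asks the
// back-office for recent changes and pushes them onto a Kinesis data stream.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.ScheduledEvent;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class PullChangesHandler implements RequestHandler<ScheduledEvent, Void> {

    private final KinesisClient kinesis = KinesisClient.create();

    @Override
    public Void handleRequest(ScheduledEvent event, Context context) {
        try {
            // Ask the back-office API for everything that changed since the last
            // sync (checkpoint handling with DynamoDB is sketched further down).
            String changes = BackOfficeClient.fetchChangesSince("1970-01-01T00:00:00Z");

            // Push the changes onto the Kinesis data stream; the consuming Lambda
            // picks them up and writes them to the front-office store.
            kinesis.putRecord(PutRecordRequest.builder()
                    .streamName("back-office-changes")
                    .partitionKey("back-office-system-a") // keeps ordering per source system
                    .data(SdkBytes.fromUtf8String(changes))
                    .build());
        } catch (Exception e) {
            throw new RuntimeException("Failed to pull changes from the back-office", e);
        }
        return null;
    }
}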

Amazon Kinesis Data Streams

Amazon Kinesis is a massively scalable and durable real-time data streaming service. Typical use cases are handling log data, handling event data and enabling analytics. In our case, modifications in the back-office systems act as events that need to be propagated to the front-office. By pushing these modifications onto a data stream, we can trigger the AWS Lambda functions responsible for pushing the changes to the front-office system. We can use the number of shards of the Kinesis data stream to control the concurrency of the consuming Lambdas.
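The consuming side is then a second Lambda with the Kinesis data stream as its event source. Below is a minimal sketch; FrontOfficeWriter is a hypothetical stand-in for whatever actually writes into the front-office store.

// A minimal sketch of the Lambda triggered by the Kinesis data stream.
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;

public class PushChangesHandler implements RequestHandler<KinesisEvent, Void> {

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            // Each record holds one back-office change, as produced by the poller above
            String change = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData())
                    .toString();

            // Translate the back-office representation into the front-office model
            // and store it in the bubble's own database.
            FrontOfficeWriter.write(change);
        }
        return null;
    }
}

class FrontOfficeWriter {
    // Hypothetical placeholder: in the real solution this would upsert the
    // translated change into the front-office data store.
    static void write(String change) {
        System.out.println("Would write change: " + change);
    }
}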

Amazon DynamoDB and Amazon CloudWatch

We need to keep track of the latest changes that have been synced. This enables us to resume synchronization on the next invocation of our ‘pull changes’ Lambda. For this purpose, we use Amazon DynamoDB, which is very easy to integrate with AWS Lambda: simply applying the correct permissions and using the AWS SDK in your Lambda will do the trick.
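The checkpoint logic itself can be as small as one get and one put. The sketch below assumes a hypothetical DynamoDB table named sync-state keyed on the back-office source; the table and attribute names are made-up examples.

// A minimal sketch of checkpoint handling with DynamoDB and the AWS SDK for Java.
import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class SyncCheckpoint {

    private static final String TABLE = "sync-state";
    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    // Returns the timestamp of the last successfully synced change for a back-office source.
    public String lastSyncedAt(String source) {
        Map<String, AttributeValue> item = dynamoDb.getItem(GetItemRequest.builder()
                .tableName(TABLE)
                .key(Map.of("source", AttributeValue.builder().s(source).build()))
                .build()).item();
        return item.isEmpty() ? "1970-01-01T00:00:00Z" : item.get("lastSyncedAt").s();
    }

    // Stores the new checkpoint after a successful sync run.
    public void markSyncedAt(String source, String timestamp) {
        dynamoDb.putItem(PutItemRequest.builder()
                .tableName(TABLE)
                .item(Map.of(
                        "source", AttributeValue.builder().s(source).build(),
                        "lastSyncedAt", AttributeValue.builder().s(timestamp).build()))
                .build());
    }
}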

To configure monitoring, log aggregation and alerting, we used Amazon CloudWatch. All the aforementioned AWS services integrate automatically with CloudWatch, making it easy and intuitive to set this up. With the ability to define custom dashboards and alerts (e-mail and SMS), we had all the functionality needed to manage this solution.
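Besides the metrics the services publish out of the box, the sync Lambdas can publish their own custom metrics to CloudWatch for dashboards and alarms. The sketch below is a hypothetical example; the namespace and metric name are made up.

// A minimal sketch of publishing a custom metric with the AWS SDK for Java.
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricDataRequest;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

public class SyncMetrics {

    private final CloudWatchClient cloudWatch = CloudWatchClient.create();

    // Record how many back-office changes were synced in one run.
    public void recordSyncedChanges(int count) {
        cloudWatch.putMetricData(PutMetricDataRequest.builder()
                .namespace("FrontOfficeSync")
                .metricData(MetricDatum.builder()
                        .metricName("SyncedChanges")
                        .unit(StandardUnit.COUNT)
                        .value((double) count)
                        .build())
                .build());
    }
}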

(Figure: the final solution)

Conclusion

Using AWS services, we were able to simplify the solution and focus mostly on the complexities specific to our problem domain. The actual front-office application has a simple design and was therefore easy to build, while the synchronizing anti-corruption layer was reduced to writing a few Lambda functions and configuring a couple of AWS services. All these services are designed with operations and manageability in mind, which we could use to our advantage to build a monitoring solution quickly and efficiently.

Want to know more? Feel free to contact us; we are always happy to discuss.
