
RepairApp: A serverless overview

As you may already have read on our blog, we have finished our RepairApp project. Time to dive into the more technical details and learnings from this project.

We will address multiple topics in this article: the architecture, hosting the Angular front end on S3 and CloudFront, Java versus Python for our Lambdas, infrastructure as code, and monitoring. This way you can jump straight to the part you want, or you can read it all πŸ˜„.

Architecture

First, let's look at a high-level view of the architecture and the context of the project.

Our project actually consists of two parts: SmartRe and RepairApp. One reason we chose to split it up is that they are simply two different applications with different stakeholders. Another, smaller reason is that this makes it easier to eventually swap the custom-developed machine learning component for a managed ML service from AWS.

SmartRe

The SmartRe part basically contains the back-end code for the machine learning components, which can be triggered through a REST API. To make it a little easier to understand, we took the liberty of simplifying the architecture of the project a bit.

We have an API Gateway in front of our AWS Lambdas, which provides the REST API entry points to the application.

The S3 bucket is used for storing data received from the RepairApp, which in our case were images. In our application we save the images to S3 without archiving them, but you can archive them in S3 (for example by transitioning them to a cheaper storage class) to reduce costs.
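
As an illustration, here is a minimal sketch of how images could be stored and archived with boto3. The bucket name, key prefix and 30-day transition are made-up assumptions, not our production setup:

import boto3

s3 = boto3.client("s3")
BUCKET = "repairapp-images"  # hypothetical bucket name

# Store an uploaded image in the bucket
with open("damage-photo.jpg", "rb") as image:
    s3.put_object(Bucket=BUCKET, Key="images/damage-photo.jpg", Body=image)

# Optional cost saver: move images older than 30 days to an archive storage class
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-images",
                "Filter": {"Prefix": "images/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)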

For persisting data we used DynamoDB, a document database that delivers high performance at any scale since it's fully serverless. If we're being honest, we are NOT pleased with the Java API it provides. The Python API is way better, since it's more readable and intuitive.
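
To show what we mean by readability: with boto3, a write and a read against the product table stay very compact. A small sketch with made-up attributes:

import boto3

dynamodb = boto3.resource("dynamodb")
products = dynamodb.Table("product")

# Write an item (the attributes are invented for this example)
products.put_item(Item={"id": "1234", "name": "Washing machine", "status": "REPAIRABLE"})

# Read it back by its key
item = products.get_item(Key={"id": "1234"}).get("Item")
print(item)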

In our architecture we also used Step Functions. AWS Step Functions helps you create a visual workflow to coordinate the separate components of a distributed application. We even have a Parallel state in our state machine, because we want two different flows to start together, after which we combine both results and send them back to the caller as one response.
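
To make that parallel flow concrete, here is a simplified sketch of such a state machine, expressed as an Amazon States Language definition in Python and started with boto3. The state names, function ARNs and input are placeholders, not our real resources:

import json

import boto3

# Two branches run at the same time; their outputs arrive as a list
# at the next state, where they are combined into one result.
definition = {
    "StartAt": "AnalyzeInParallel",
    "States": {
        "AnalyzeInParallel": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "FlowA",
                 "States": {"FlowA": {"Type": "Task",
                                      "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:flow-a",
                                      "End": True}}},
                {"StartAt": "FlowB",
                 "States": {"FlowB": {"Type": "Task",
                                      "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:flow-b",
                                      "End": True}}},
            ],
            "Next": "CombineResults",
        },
        "CombineResults": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:combine-results",
            "End": True,
        },
    },
}

# In our setup the definition above is deployed through our SAM/Ansible templates;
# starting an execution, for example from the API-facing Lambda, then looks like:
sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:smartre-analysis",
    input=json.dumps({"image_key": "images/damage-photo.jpg"}),
)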

SmartRe architecture diagram

RepairApp

In the RepairApp part we deployed our entire front end, an Angular application, on S3. You can find more details about this in the next chapter of this blog.

Just like in SmartRe, we used an API Gateway in front of our AWS Lambdas to provide a REST API entry point to the application.

In front of the S3 bucket and the API Gateway we put CloudFront, which provides a caching layer for our UI that we configured with a TTL of 24 hours. We also activated geo-restriction here, so that only Belgian IPs can access our CloudFront URLs.

Putting everything behind CloudFront also avoids a possible CORS problem, because both the website and the REST API now run on the same domain.

RepairApp architecture diagram

Deploy an Angular site on S3 and CloudFront

When your entire back end is serverless, it's kind of strange to have your front end running on an EC2 instance. So we looked for a serverless solution and decided to deploy our Angular SPA (Single Page Application) as a static website on S3, which ended up working great! The important keyword here is static: the site runs entirely in the client's browser. If your site needs dynamic server-side processing before or after the UI is served to a user, you will still need a server such as an EC2 instance.

In our case this is the reference architecture:

Static S3 architecture diagram

However, there is a caveat in this solution when you use the Angular routing mechanism: your SPA will not function properly when a user navigates directly to a full URL. Let me explain with an example. The URL https://<bucket-name>.s3-website-<region>.amazonaws.com/some/path will not work. Why not? S3 will try to find an object based on the given path, in this case some/path, and won't find one, since there is no file called some/path in the bucket. So how did we fix this? Whenever S3 replies with a 403 or 404, we instead return a 200 with the content of index.html. You can set this up in CloudFront.

CloudFront custom error pages configuration
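
For reference, the same idea expressed in configuration. This is only the CustomErrorResponses fragment of a CloudFront distribution config, in the shape the CloudFront API (and boto3) expects; a real distribution config contains many more settings:

# Map S3's 403/404 answers to a 200 that serves index.html, so the
# Angular router can resolve the deep link in the browser.
custom_error_responses = {
    "Quantity": 2,
    "Items": [
        {
            "ErrorCode": 403,
            "ResponsePagePath": "/index.html",
            "ResponseCode": "200",
            "ErrorCachingMinTTL": 0,
        },
        {
            "ErrorCode": 404,
            "ResponsePagePath": "/index.html",
            "ResponseCode": "200",
            "ErrorCachingMinTTL": 0,
        },
    ],
}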

We are not going to write a full tutorial on the setup here, because that's not the goal of this blog post, but you can find a tutorial on the AWS site. If you wish to know more about the setup or need assistance, don't hesitate to contact us and we will help you.

Advantages

  • High availability and durability: Amazon designs S3 for 99.99% availability and for 99.999999999% (eleven nines) durability, which means there is almost no chance of losing your data. S3 achieves this by replicating your data across multiple data centers.
  • Automatic scaling: Without making any changes to our initial setup, S3 automatically scales the underlying infrastructure to meet growing demand.
  • Fast content serving with Amazon CloudFront: If you have a globally distributed audience, CloudFront can help you deliver content very efficiently. CloudFront has 189 Points of Presence across the globe (178 Edge Locations and 11 Regional Edge Caches). Your website's content is cached at these edge locations and every visitor is served from the nearest one, which decreases latency and results in optimal response times.
  • Negligible costs: Hosting a small to medium-sized static website costs a few dollars/euros per month. In our case this meant that running the website in our test environment cost $0,01. The only reason we exceeded our free tier was that our continuous deployments generated too many PUT requests to our S3 bucket.

For example, see this sample cost listing:

  • S3 Standard Storage: 1 GB
  • PUT and other similar requests: 50.000
  • GET and other similar requests: 50.000
  • Data transfer out: 3 GB
  • Data transfer in: 3 GB

Cost per month: $0,57 when you include the free tier; without the free tier it would have cost $0,83.

Using the AWS Monthly Calculator you can easily calculate the monthly bill for hosting your static website on S3.

Java versus Python

We started out writing all our AWS Lambdas in Java. Why did we choose Java? Very simple: Kunlabora is mainly focused on Java, so we went for Java. Of course, we also have substantiated arguments for choosing Java.

Java

Java has been in service for decades (its first release was in 1995) and is, to this day, a reliable option as the backbone of your stack. AWS Lambda is no exception: Java makes a strong candidate for your serverless functions.

Some advantages of Java are:

  • Reliable and well-tested libraries: Libraries make your life easier through enhanced testability and maintainability.
  • Predictable performance: It's true that Java has a slower spin-up time, but you can easily predict the memory it needs to run your functions, and there are solutions for keeping your Lambdas running, or "hot".
  • Tooling support: As mentioned above, Java has been around for many years, so it has a wide range of mature tooling, like IntelliJ IDEA and Gradle among others.

However, we did have to write some Lambdas in Python, since we used machine learning in our project.

Python

Python applications are everywhere, and Kunlabora is no exception. We also have a lot of in-house Python knowledge, since we love Machine Learning and Python is arguably the best language for this. In the past few years, we’ve seen a lot of developers adopting Python and it seems like this trend is not stopping.

Some benefits of Python in an AWS Lambda environment are:

  • Simplicity: With Python it's easier to keep your functions and your overall setup simple.
  • Easy to learn and community support: If you are a beginner, programming languages can be pretty scary, and sometimes, when you are experienced, learning a new programming language can be even scarier. Python has a supportive community to help you, which has published more than 145,000 packages for other users to build on.
  • Unbelievable spin-up times: Python is without a doubt the absolute winner when it comes to spinning up Lambda instances; in the benchmark below it starts roughly 100 times faster than Java.
  • Needs less memory: Python doesn't need much memory to run your Lambdas and is still blazing fast.
  • Third-party modules: Python has a wide variety of modules available, for example requests and NumPy.

Benchmark

Credits for this benchmark go to Tai Nguyen Bui; you can read about it in more detail on his blog.

So, Java or Python?

Halfway into the project we noticed that the performance of Python was way better, and the question arose: why don't we write all of our Lambdas in Python? After discussing this with the team, we decided to go for Python entirely.

What are the main reasons for this?

  • Cold starts: We expect that cold starts could be an issue for our application, and Python handles this problem best.
  • Cost: Keeping the running cost as low as possible is important, and with Python we can keep our cost lower than with Java. The price of a Lambda is based on the number of invocations and the execution time multiplied by the allocated memory (billed as GB-seconds). Since Python needs less memory and less execution time, it is also cheaper to run; a rough calculation follows below.
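
As a back-of-the-envelope sketch: the numbers below are made-up examples, and the prices are the public on-demand Lambda prices at the time of writing, so double-check them against the AWS pricing page before relying on them.

# Rough Lambda cost estimate: requests + GB-seconds (free tier ignored)
invocations = 1_000_000        # invocations per month (example)
avg_duration_s = 0.2           # average execution time in seconds (example)
memory_gb = 128 / 1024         # allocated memory in GB

price_per_request = 0.20 / 1_000_000   # $ per request
price_per_gb_second = 0.0000166667     # $ per GB-second

request_cost = invocations * price_per_request
compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second

print(f"Requests: ${request_cost:.2f}, compute: ${compute_cost:.2f}, "
      f"total: ${request_cost + compute_cost:.2f}")

Less memory and a shorter duration both shrink the GB-seconds term, which is exactly where Python gives us the advantage.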

Our Lambdas are all very small, so the time it took to rewrite them from Java to Python was negligible.

Infrastructure as code

SAM CLI

AWS SAM (AWS Serverless Application Model) is an open-source framework that you can use to build serverless applications on AWS. We use this tool to build serverless applications defined by AWS SAM templates, so we don't have to set up the application manually. AWS SAM templates are an extension of AWS CloudFormation templates: any resource that you can declare in an AWS CloudFormation template can also be declared in an AWS SAM template. At Kunlabora we strongly believe in DevOps and infrastructure as code; you can read more about this in the blog post by Bruno Lannoo.

Now we can delete and redeploy our application on the fly. This also means we can deploy the application at any moment for testing and simply tear it down when we are finished.

An example of a Python Lambda in SAM could be:

  UpdateProductLambda:
    Type: AWS::Serverless::Function
    Properties:
      Description: "Update product"
      # CodeUri is templated by Ansible and points to this Lambda's source folder
      CodeUri: "{{ playbook_dir | dirname }}/smartre-lambda/product-update-lambda"
      Handler: src.handler.handle_request
      Runtime: python3.6
      Policies:
        # SAM policy template granting CRUD access to the "product" table
        - DynamoDBCrudPolicy:
            TableName: "product"
      Tracing: Active          # enable AWS X-Ray tracing
      MemorySize: 128
      Timeout: 30
      Events:
        # Expose the function through API Gateway as PUT /api/product
        ProxyApiRoot:
          Type: Api
          Properties:
            RestApiId: !Ref ApiGatewayApi
            Path: /api/product
            Method: PUT
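
The handler referenced by src.handler.handle_request is just a regular Python function. As a minimal, made-up sketch (the real Lambda contains the actual update logic), it could look like this:

# src/handler.py
import json


def handle_request(event, context):
    # API Gateway's proxy integration delivers the request body as a JSON string
    product = json.loads(event["body"])

    # ... update the product in DynamoDB here ...

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "updated", "id": product.get("id")}),
    }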

During the project some of our developers tweeted pro tips, so make sure you follow us on Twitter so that together we can share and improve our knowledge.

Ansible

Ansible is the glue that ties everything together. It makes sure we can easily run our SAM templates and deploy to multiple different environments.

An example from our Ansible script is:

- name: Package Lambda functions
  shell: >
    sam package
    --s3-bucket {{ bucket_lambda }}
    --s3-prefix {{ project_name }}
    --output-template-file {{ packaged_template_file }}
  args:
    chdir: "{{ build_dir }}"

This is used to package our Lambda functions and upload them to an S3 bucket for deployment.

We try to help the community and share our knowledge, so our colleague Tom De Keyser submitted a pull request to Ansible for working with Step Functions. You can find his pull request here; it has already been approved, by the way πŸ˜„

Monitoring

As mentioned before, the running cost of the project is important and must be kept as low as possible. This not only had an impact on our decisions, like the move from Java to Python described above, but we also wanted to monitor it. Monitoring the cost is important for our customer, so they have a view on the monthly cost: in AWS you pay per use, so you can't say upfront exactly how high your AWS bill will be that month. But it's also important during development, since good monitoring lets us easily follow up on the impact our decisions have on the running cost.

My colleague Tim De Meyer wrote a blog post about monitoring, but from the angle of availability instead of cost.

AWS offers a number of tools for this out of the box; you can find them under Billing. We focused on two of them. During development we looked at the Cost Explorer, and then mostly at the Daily Spend view, on a daily basis.

In the daily costs overview you can see the costs per day, which makes it easy to detect when a change in your code or architecture results in a higher running cost.
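
If you prefer pulling these numbers programmatically instead of clicking through the console, the Cost Explorer API offers the same daily view. A small sketch (the dates are placeholders, and note that each Cost Explorer API request itself carries a small charge):

import boto3

ce = boto3.client("ce")

# Daily unblended cost for one month (placeholder date range)
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-06-01", "End": "2019-07-01"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)

for day in response["ResultsByTime"]:
    amount = day["Total"]["UnblendedCost"]["Amount"]
    print(day["TimePeriod"]["Start"], amount)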

Now, do we really want to check AWS every day to see whether we exceeded our limit? Of course not, we want AWS to tell us automatically. We automated this by adding a budget: we set up a budget of 50 euros per month and we want to receive a notification whenever our costs reach 80% of that limit.
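
You can click this together in the Billing console, but it can also be set up from code through the AWS Budgets API. A sketch of what that could look like (the account id and e-mail address are placeholders, and the API expects the amount in the account's billing currency, USD in this sketch):

import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account id
    Budget={
        "BudgetName": "monthly-running-cost",
        "BudgetLimit": {"Amount": "50", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Notify as soon as the actual spend reaches 80% of the budget
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
            ],
        }
    ],
)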

Conclusion

We are huge fans of serverless! But during development you always have to keep this in mind, because it requires a change of mindset: serverless has an impact on your architecture and on the way you design and package your code. With the right monitoring in place you get a useful overview of the cost of your application and of the cost impact of your code and design changes. And don't forget to keep an eye on AWS Lambda's limitations.

When in doubt, go for Python for your Lambdas. Python will boost the performance of your serverless application.

When going serverless, go fully serverless and deploy your front end on S3 as well.

If you have anything to add or have a question, don't hesitate to contact us. You can find us on LinkedIn, Twitter, Facebook or Instagram.

About the Author

Kim Lauwers has been working at Kunlabora from the beginning and worked on the RepairApp / SmartRe project. Most of the time he's walking around with his head in the cloud, because he just loves cloud computing πŸ˜„ As a certified AWS developer and architect he wants to share his experience with you.
