In my article Amazon DevOps Guru for the Serverless applications - Part 10 Anomaly detection on Aurora Serverless v2 we learned that DevOps Guru was able to successfully detect anomalies with an Aurora (Serverless v2) PostgreSQL database when a Lambda function with the Java 21 managed runtime was connected to it via JDBC. We scaled our database only from 0.5 to 1 ACU and created a very high load on it by invoking the Lambda function that retrieves a product by id several hundred times concurrently for multiple minutes. We saw that DevOps Guru correctly pointed to the increased sum of database connections and the constantly high database (CPU) load. In this article I'd like to figure out whether DevOps Guru will detect the anomaly when we run the same experiment but use the Data API for Aurora Serverless v2 with the AWS SDK for Java instead of JDBC.
Let's look into our sample application and use the SAM template to create the infrastructure and deploy the application shown in the following picture:
The application creates products stored in the Aurora Serverless v2 PostgreSQL database and retrieves them by id using the Data API. The Lambda function we'll use to retrieve a product by its id is GetProductByIdViaAuroraServerlessV2DataApi, and its handler implementation is GetProductByIdViaAuroraServerlessV2DataApiHandler.
As in the previous article, we use the hey tool to perform the stress test like this:
```shell
hey -z 15m -c 300 -H "X-API-Key: XXXa6XXXX" https://XXX.execute-api.eu-central-1.amazonaws.com/prod/productsWithDataApi/1
```
In this example we invoke the API Gateway endpoint with 300 concurrent workers for 15 minutes. Behind the prod/productsWithDataApi endpoint, the Lambda function GetProductByIdViaAuroraServerlessV2DataApi is invoked, which retrieves the product with id 1 from the Aurora Serverless v2 PostgreSQL database via the Data API.
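For readers who prefer to stay in Java, the hey semantics used here (`-c` concurrent workers, each sending requests back to back, `-z` total duration, and an extra header via `-H`) can be approximated with the JDK's `java.net.http.HttpClient`. This is a minimal sketch, not a replacement for hey: the class name `HeyLikeLoad` is hypothetical, and it only counts 2xx versus other outcomes without any latency statistics.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper approximating `hey -z <duration> -c <workers> -H "X-API-Key: ..." <url>`.
final class HeyLikeLoad {
    record Result(long ok, long failed) {}

    static Result run(URI target, String apiKey, int workers, Duration duration) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(target)
                .header("X-API-Key", apiKey) // same header that is passed to hey via -H
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        Instant deadline = Instant.now().plus(duration);
        AtomicLong ok = new AtomicLong();
        AtomicLong failed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            // Each worker sends requests back to back until the deadline (like hey's -z).
            pool.submit(() -> {
                while (Instant.now().isBefore(deadline)) {
                    try {
                        HttpResponse<Void> response =
                                client.send(request, HttpResponse.BodyHandlers.discarding());
                        if (response.statusCode() / 100 == 2) ok.incrementAndGet();
                        else failed.incrementAndGet(); // any non-2xx status returned by the endpoint
                    } catch (IOException e) {
                        failed.incrementAndGet(); // network error or request timeout
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for the test duration plus the longest possible in-flight request.
            pool.awaitTermination(duration.toMillis() + 35_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(ok.get(), failed.get());
    }
}
```

Calling `HeyLikeLoad.run(URI.create("https://XXX.execute-api.eu-central-1.amazonaws.com/prod/productsWithDataApi/1"), "XXXa6XXXX", 300, Duration.ofMinutes(15))` would mirror the hey command above. A fixed thread pool keeps the concurrency at exactly `workers`, which is what `-c` controls in hey.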
In our [SAM template](https://github.com/Vadym79/AWSLambdaJavaAuroraServerlessV2DataApi/blob/master/template.yaml) we configured the Aurora database cluster to scale from a minimal capacity of 0.5 ACU to a maximal capacity of 1 ACU (which is a very small database size) under increased load, for cost-saving purposes:
```yaml
AuroraServerlessV2Cluster:
  Type: 'AWS::RDS::DBCluster'
  ...
    ServerlessV2ScalingConfiguration:
      MinCapacity: 0.5
      MaxCapacity: 1
```
Aurora (Serverless v2) sets the maximal number of available database connections proportionally to the database size (in our case, the ACU setting), and this also applies when using the Data API for Aurora Serverless v2. This is a huge difference to Aurora Serverless v1, which reaches end of support at the end of 2024 and where there was a hard quota of 1000 database connections per second. For more information, please read the documentation about Maximum connections for Aurora Serverless v2. So, with the increased number of invocations, we expect to soon reach the maximal number of available database connections and a high database (CPU) load, so that the database won't be able to respond to new Lambda function requests to retrieve a product by id (Lambda will then also run into its timeout). With that we provoke the anomaly and would like to figure out whether DevOps Guru will be able to detect it. And it was able to, kind of... The following insight was generated:
And the following aggregated anomalous metrics have been identified:
Compared to the aggregated anomalous metrics identified when using JDBC instead of the Data API, described in my article Amazon DevOps Guru for the Serverless applications - Part 10 Anomaly detection on Aurora Serverless v2, we completely miss the Aurora database anomalous metrics (database connection sum and database (CPU) load), but we correctly see the errors in Lambda, which ran into the defined timeout of 15 seconds because the database couldn't respond.
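The failure mode described above can be illustrated with a toy model: the database serves at most a fixed number of concurrent sessions (a stand-in for the ACU-dependent connection limit), a caller waits for a free session for at most its timeout (a stand-in for the 15-second Lambda timeout), and the remaining callers fail. This is a hypothetical simulation with illustrative parameters, not how Aurora or the Data API actually manage connections; for simplicity, only the wait for a session counts against the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (hypothetical, not Aurora's or the Data API's implementation):
// the database serves at most `maxConnections` requests at once; a caller that
// cannot get a session within `timeoutMillis` gives up, like a Lambda invocation
// hitting its configured timeout.
final class ConnectionLimitModel {
    record Outcome(int served, int timedOut) {}

    static Outcome simulate(int maxConnections, int concurrentRequests,
                            long queryMillis, long timeoutMillis) {
        Semaphore sessions = new Semaphore(maxConnections, true); // fair: waiting callers in FIFO order
        AtomicInteger served = new AtomicInteger();
        AtomicInteger timedOut = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService callers = Executors.newFixedThreadPool(concurrentRequests);
        for (int i = 0; i < concurrentRequests; i++) {
            callers.submit(() -> {
                try {
                    start.await(); // release all callers at the same moment
                    if (sessions.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        try {
                            Thread.sleep(queryMillis); // simulated query holding the session
                            served.incrementAndGet();
                        } finally {
                            sessions.release();
                        }
                    } else {
                        timedOut.incrementAndGet(); // no free session before the timeout
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        callers.shutdown();
        try {
            // Every caller finishes within timeoutMillis + queryMillis; add generous slack.
            callers.awaitTermination(queryMillis + timeoutMillis + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Outcome(served.get(), timedOut.get());
    }
}
```

With `simulate(2, 5, 1000, 100)`, two callers get a session and the other three give up after waiting 100 ms, because both sessions stay busy for a full second. The numbers are illustrative only, not measured Aurora limits.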
So, what's the difference? Let's explore both incidents that we reproduced on the Aurora Serverless v2 PostgreSQL cluster, with JDBC (non-Data API) and with the Data API:
In terms of ACU utilization/scaling they both look the same:
In terms of other database metrics like CPU utilization, DatabaseConnections and DBLoad(CPU), there are huge differences:
Given these metrics, and especially the very low DBLoad(CPU), DevOps Guru generated no insight for the Aurora Serverless v2 cluster in the Data API case, in contrast to the JDBC case.
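Why a metric that stays flat cannot trigger an insight can be illustrated with a deliberately simple baseline detector that flags points deviating more than `threshold` standard deviations from a trailing window. DevOps Guru uses its own ML models, so this z-score sketch (the class `ZScoreDetector` is hypothetical) only illustrates the general principle: a metric reported as a constant 0 never deviates from its baseline.

```java
import java.util.ArrayList;
import java.util.List;

// Deliberately simple z-score detector over a trailing window (hypothetical helper).
// It is NOT the algorithm DevOps Guru uses; it only shows why a flat metric cannot stand out.
final class ZScoreDetector {
    static List<Integer> anomalies(double[] series, int window, double threshold) {
        if (window < 1) throw new IllegalArgumentException("window must be >= 1");
        List<Integer> flagged = new ArrayList<>();
        for (int i = window; i < series.length; i++) {
            // Baseline: mean and standard deviation of the previous `window` points.
            double mean = 0;
            for (int j = i - window; j < i; j++) mean += series[j];
            mean /= window;
            double variance = 0;
            for (int j = i - window; j < i; j++) {
                variance += (series[j] - mean) * (series[j] - mean);
            }
            double std = Math.sqrt(variance / window);
            double deviation = Math.abs(series[i] - mean);
            // With a perfectly flat baseline, any change at all counts as an anomaly.
            boolean anomalous = std == 0 ? deviation > 0 : deviation / std > threshold;
            if (anomalous) flagged.add(i);
        }
        return flagged;
    }
}
```

A DatabaseConnections series that is reported as 0 throughout yields no flagged points, however heavily the cluster is actually loaded, while a series that jumps from a low baseline (as in the JDBC case) is flagged at the jump.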
I did a second experiment: I connected to the Aurora Serverless v2 cluster directly and wrote a script that fetches the product by id multiple hundred times using the standard (non-Data API) way, similar to what we did with the hey tool, but talking to the database directly instead of invoking API Gateway. After I put the database under this load, I started the same experiment with the hey tool as described above to see what would happen. The same insight was generated, but this time with the following anomalous metrics:
Now we at least see the additional Aurora Serverless v2 database connection sum anomalous metric, but the DBLoad(CPU) metrics are still missing.
Graphed anomalies look like this:
Of course, the experiment wasn't clean, as I ran 2 load tests one after another and partially in parallel: the first one connecting to the database directly without API Gateway and the second one using the Data API. Still, this confirmed my initial assumption that the database connection sum metric is a very important criterion for generating a DevOps Guru insight for Aurora Serverless v2 (and for RDS in general), and that it is generally not exposed when using the Data API.
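The direct (non-Data API) load script from the second experiment isn't part of the article. A minimal Java sketch of the same idea, several workers repeatedly fetching one product over plain JDBC, could look like this. The table and column names (`products`, `id`), the connection parameters and the class name `DirectJdbcLoad` are assumptions, and a PostgreSQL JDBC driver must be on the classpath for a real run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical direct JDBC load generator: at most `workers` connections are open at once.
final class DirectJdbcLoad {
    record Result(int found, int failedOrMissing) {}

    // Assumed schema: a "products" table with an "id" primary key.
    static final String QUERY = "SELECT * FROM products WHERE id = ?";

    // Opens a fresh connection per call; returns false on any SQL error or if no row exists.
    static boolean fetchOnce(String jdbcUrl, String user, String password, long productId) {
        try (Connection connection = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement statement = connection.prepareStatement(QUERY)) {
            statement.setLong(1, productId);
            try (ResultSet rs = statement.executeQuery()) {
                return rs.next();
            }
        } catch (SQLException e) {
            return false; // e.g. connection refused because the connection limit is reached
        }
    }

    static Result run(String jdbcUrl, String user, String password,
                      long productId, int workers, int callsPerWorker) {
        AtomicInteger found = new AtomicInteger();
        AtomicInteger failedOrMissing = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerWorker; i++) {
                    if (fetchOnce(jdbcUrl, user, password, productId)) found.incrementAndGet();
                    else failedOrMissing.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(found.get(), failedOrMissing.get());
    }
}
```

A real run would pass something like `jdbc:postgresql://<cluster-endpoint>:5432/<database>` as the URL, with placeholders for the values from your own stack. Unlike the Data API path, every worker here holds a genuine database session, which is what shows up in the DatabaseConnections metric.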
I have already contacted the DevOps Guru team and shared my insights with them, expecting that they will improve the service, or, first of all, that exposing database connections as a CloudWatch metric will be fixed for Aurora Serverless v2 with the Data API.
In this article we learned that DevOps Guru could successfully detect anomalies with an Aurora (Serverless v2) PostgreSQL database when a Lambda function with the Java 21 managed runtime was connected to it via the Data API, but it could only show the anomalous metrics related to the Lambda function timing out because the database didn't respond. The main reason for that seems to be that database connections aren't exposed as a CloudWatch metric (or are always displayed as 0) when using Aurora Serverless v2 with the Data API. The Aurora Serverless v2 database metric (database connection sum) was only shown during the second, artificial experiment.