Adventures with Cosmos: Alerting and 429 Status Codes

During periods of peak traffic, Cosmos DB may return a 429 Status Code – Too Many Requests, which means the collection has exceeded its provisioned throughput (RUs) and that you should retry the request after the retry-after duration specified by the server.

I want to set up alerting on both an Azure Function and the underlying Cosmos DB to send an email message when a 429 Status Code – Too Many Requests is generated.

If you are using the Cosmos DB .NET SDK Version 3.0, the SDK will by default retry the originating call up to 9 times, with a maximum cumulative wait of 30 seconds across those retries.
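To make that retry behavior concrete, here is a minimal Python sketch of a retry loop that honors the server's retry-after hint. It is not the SDK's code, just an illustration under stated assumptions: the names ThrottledError, call_with_retries and make_flaky are hypothetical, the limits mirror the 9 retries and 30 seconds mentioned above, and the sleep is a no-op so the sketch runs instantly.

```python
from typing import Callable, Dict, Tuple


class ThrottledError(Exception):
    """Stands in for a 429 response; carries the server's retry-after hint."""

    def __init__(self, retry_after_s: float) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after_s = retry_after_s


def call_with_retries(
    operation: Callable[[], str],
    max_retries: int = 9,             # retries allowed after the first call
    max_total_wait_s: float = 30.0,   # assumed cumulative wait budget
    sleep: Callable[[float], None] = lambda seconds: None,  # no-op for the sketch
) -> str:
    waited = 0.0
    retries = 0
    while True:
        try:
            return operation()
        except ThrottledError as err:
            out_of_retries = retries >= max_retries
            out_of_time = waited + err.retry_after_s > max_total_wait_s
            if out_of_retries or out_of_time:
                raise  # surface the 429 to the caller
            sleep(err.retry_after_s)
            waited += err.retry_after_s
            retries += 1


def make_flaky(failures: int, retry_after_s: float = 1.0) -> Tuple[Callable[[], str], Dict[str, int]]:
    """Build an operation that is throttled `failures` times, then succeeds."""
    state = {"left": failures, "calls": 0}

    def operation() -> str:
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise ThrottledError(retry_after_s)
        return "created"

    return operation, state
```

Passing max_retries=0 makes the very first 429 surface to the caller, which is the behavior this article relies on to produce an error quickly.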

For this article, I will make use of the Cosmos DB .NET SDK Version 3.0 and configure the retry policy to not retry a failed request.

To accomplish this, we will:

  1. Set up a Cosmos DB account, database and collection.
  2. Create an Azure Function to call Cosmos DB in order to generate a 429 Status Code.
  3. Add an alert to the Azure Function to alert us whenever a 429 Status Code is encountered.
  4. Add an alert to the Cosmos DB account to alert us whenever a 429 Status Code is encountered.

Set up a Cosmos DB account, database and collection

In the Azure Portal you will need to create a Cosmos DB account, a database called Items and a container called ToDos.

Set the Throughput of the database to 400 RU/s, which is the minimum value supported.

Create an Azure Function

In Visual Studio or Visual Studio Code, create an Azure Function project with a single function called Function1.

Copy and paste the code found at https://gist.github.com/mattruma/c3ab94dfeee72fac305a4d104e90efe9 into Function1.

You will need to add the NuGet package for Bogus.

We will use Bogus to easily generate a document large enough to cause Cosmos DB to return a 429 Status Code.

Your local.settings.json should look similar to https://gist.github.com/mattruma/494fb509ced91ef0a3ed86b23ade4e9c.

You will want to swap out the text YOUR_COSMOS_DB_CONNECTION_STRING with your Cosmos DB Connection String.

Run the Azure Function on your local machine.

Call the function from your favorite API client tool. My preferred tool is Postman.

Usually, by the second call, a 429 Status Code is returned from Cosmos DB.
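Why would the second call already be throttled? A rough way to picture it is a per-second budget of request units. The Python sketch below is a toy model only: real Cosmos DB throttling is more nuanced, the RuBudget class is hypothetical, and the 250 RU cost of writing one large document is an assumed number chosen for illustration, not a measured one.

```python
from typing import Optional


class RuBudget:
    """Toy per-second request-unit budget (not how Cosmos DB is implemented)."""

    def __init__(self, provisioned_ru_per_s: float) -> None:
        self.provisioned = provisioned_ru_per_s
        self.window: Optional[int] = None  # the whole second being tracked
        self.used = 0.0

    def request(self, cost_ru: float, now_s: float) -> int:
        """Return 201 if the write fits in this second's budget, else 429."""
        window = int(now_s)
        if window != self.window:
            # A new second started, so the budget is fresh again.
            self.window = window
            self.used = 0.0
        if self.used + cost_ru > self.provisioned:
            return 429
        self.used += cost_ru
        return 201


budget = RuBudget(provisioned_ru_per_s=400)
ASSUMED_WRITE_COST_RU = 250  # hypothetical cost of one large document

first = budget.request(ASSUMED_WRITE_COST_RU, now_s=0.0)
second = budget.request(ASSUMED_WRITE_COST_RU, now_s=0.5)  # same second
later = budget.request(ASSUMED_WRITE_COST_RU, now_s=1.2)   # next second
print(first, second, later)
```

In this model the first write fits within 400 RUs, the second one in the same second does not, and a write in the following second succeeds again.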

Add an alert to the Azure Function

Let’s add an alert to our Azure Function to send us an email whenever a 429 Status Code is encountered.

Deploy your Azure Function to Azure and make sure Application Insights is enabled.

Anytime the Azure Function receives a 429 Status Code it will log it to Application Insights. You can query Application Insights to retrieve these logs using the query below:

union traces, exceptions
| where timestamp > ago(30m)
| where customDimensions.['LogLevel'] == 'Error'
| where customDimensions.['prop__{OriginalFormat}'] == 'Response status code does not indicate success: 429 Substatus: 3200 Reason: ().'
| project timestamp, message = customDimensions.['prop__{OriginalFormat}'], logLevel = customDimensions.['LogLevel']

I refactored my query to be a little bit cleaner.

exceptions
| where timestamp > ago(30m)
| where type == 'Microsoft.Azure.Cosmos.CosmosException'
| where outerMessage == 'Response status code does not indicate success: 429 Substatus: 3200 Reason: ().'

Now that we know how these logs look in Application Insights, we can add our alert.

Before we create our alert we will need to create an Action Group in order to send emails to designated recipients.

With our Action Group created we can now move on to creating the actual alert.

In the Azure portal navigate to Function App and select the function app where you deployed your Azure Function.

From the Overview tab, select Application Insights.

From the Application Insights blade, select Alerts and then click New alert rule.

From the Create rule screen click Condition and then Custom log search.

From the Configure signal logic blade paste in the Application Insights query from above and set the Threshold value of the Alert Logic to 1.

Click Done.

From the Create rule screen, under Actions, click Select Action Group and select the Action Group we created earlier in this article.

Provide a Name for your rule (I called mine 429 Status Code - Too Many Requests) and then click Create alert rule.

Run the Azure Function from the portal a few times to generate some logs.

Keep in mind it may take up to 5 minutes to start seeing the logs.

You can click on the Monitor node to see exactly when the logs are available.

In about 10 minutes you will receive an email alerting you to the 429 Status Code error.
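Conceptually, the alert rule we just configured counts the exceptions that match the query within the time window and compares that count against the threshold. Here is a small Python sketch of that idea, run against in-memory records rather than Application Insights. The function names and the record shape are hypothetical, and the comparison operator is an assumption: the portal lets you pick the operator in the Alert Logic, and this sketch assumes "greater than or equal to".

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

THROTTLE_MESSAGE = (
    "Response status code does not indicate success: "
    "429 Substatus: 3200 Reason: ()."
)


def count_throttle_exceptions(
    records: List[Dict],
    now: datetime,
    window: timedelta = timedelta(minutes=30),
) -> int:
    """Mirror the three filters of the cleaned-up exceptions query."""
    return sum(
        1
        for record in records
        if record["timestamp"] > now - window
        and record["type"] == "Microsoft.Azure.Cosmos.CosmosException"
        and record["outerMessage"] == THROTTLE_MESSAGE
    )


def should_alert(match_count: int, threshold: int = 1) -> bool:
    # Assumption: the rule fires once the count reaches the threshold (>=).
    return match_count >= threshold


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    # Matches: recent, right type, right message.
    {"timestamp": now - timedelta(minutes=5),
     "type": "Microsoft.Azure.Cosmos.CosmosException",
     "outerMessage": THROTTLE_MESSAGE},
    # Too old to fall inside the 30 minute window.
    {"timestamp": now - timedelta(minutes=45),
     "type": "Microsoft.Azure.Cosmos.CosmosException",
     "outerMessage": THROTTLE_MESSAGE},
    # Recent, but a different exception type.
    {"timestamp": now - timedelta(minutes=1),
     "type": "System.TimeoutException",
     "outerMessage": "The operation timed out."},
]

matches = count_throttle_exceptions(records, now)
print(matches, should_alert(matches))
```

Only the first record passes all three filters, so with a threshold of 1 the sketch would raise the alert.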

Add an alert to the Cosmos DB

In the Azure Portal, navigate to your Cosmos DB account.

From the Cosmos DB account blade, under Monitoring, select Alerts, and then click New alert rule.

From the Configure signal logic blade, search for the Signal Name called Total Requests and select it.

Set DatabaseName to Items.

Set CollectionName to ToDos.

Set StatusCode to 429.

Set the Threshold value to 1.

Click Done.

Under Actions, click Select Action Group and select the Action Group we created earlier in this article.

Provide a Name for your rule (I called mine 429 Status Code - Too Many Requests) and then click Create alert rule.

Run the Azure Function from the portal a few times to generate some logs.

In about 3 to 5 minutes you will receive an email alerting you to the 429 Status Code error.

You have now set up a couple of levels of alerting to monitor 429 Status Code – Too Many Requests messages.

Some final thoughts

Why would we want to raise alerts on BOTH the Azure Function and the Cosmos DB?

Earlier in this article I mentioned the retry logic that is built into the Cosmos DB 3.0 SDK.

If the SDK receives a 429 Status Code from Cosmos DB it will automatically retry the request after a designated period of time passes, up to a maximum of 9 times.

So there is a good chance we could encounter numerous 429 Status Codes generated by Cosmos DB and none surfaced by the Azure Function, because the retry logic absorbs them.

In the long run, this is not ideal and we would want to address the 429 Status Codes with Cosmos DB, probably increasing the throughput for the database or a specific collection.
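The gap between what Cosmos DB sees and what the Azure Function sees can be illustrated with a little arithmetic. The Python sketch below is hypothetical (run_requests is not an SDK function): each entry in the input says how many times a given request would be throttled before it succeeds, and the function tallies the 429s on the server side versus the errors that actually reach the caller.

```python
from typing import List, Tuple


def run_requests(throttles_per_request: List[int], max_retries: int) -> Tuple[int, int]:
    """Return (429s seen by Cosmos DB, errors surfaced to the caller)."""
    server_429s = 0
    client_errors = 0
    attempts_allowed = 1 + max_retries  # the first call plus its retries
    for throttles in throttles_per_request:
        # Every throttled attempt counts as a 429 on the Cosmos DB side.
        server_429s += min(throttles, attempts_allowed)
        # The caller only sees an error if every allowed attempt was throttled.
        if throttles >= attempts_allowed:
            client_errors += 1
    return server_429s, client_errors


pattern = [2, 0, 3]  # assumed throttling pattern for three requests

with_retries = run_requests(pattern, max_retries=9)
without_retries = run_requests(pattern, max_retries=0)
print(with_retries, without_retries)
```

With retries enabled, this pattern produces five 429s on the Cosmos DB side and no errors in the function, which is exactly the case only the Cosmos DB alert would catch.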

Also, I am not sure about you, but to me, waiting up to 10 minutes to receive an alert from my Azure Function seems like an eternity.

If this were an error I needed to know about immediately, I would create a SendGrid output binding that sends an email as soon as each error is encountered.

Just a few things to keep in mind.

That’s it. I apologize for the longer than usual article, but hopefully it provides you some help.

Please don’t hesitate to provide feedback. I am always looking for better ways to solve problems like these.

Thanks and keep on coding!
