Avoiding deadlocks and lost records when using batch jobs in D365FO

Batch jobs are a major part of D365FO and are a great way to asynchronously process large amounts of data.

If we are writing a process in which multiple batch jobs work on the same data simultaneously, there are potential problems that we should look out for.

In this article, we will cover how to solve the problem of deadlocks and so-called lost records by utilizing just three fields and a couple of system methods.

What are deadlocks?

In D365FO, like in any concurrent computing program, there exists potential for deadlocks.

A deadlock is a situation where different units of code that are executing on the same data are all waiting for each other, without any one of them being able to proceed. This leads to an indefinite waiting state.

The same situation can arise in D365FO when multiple batch jobs are working on the same data.

For example, suppose two batch jobs each take records from the database in no particular order and start working on them:

Batch job 1 will start working on record 1, and batch job 2 will start working on record 2.

Then batch job 1 will try to start working on record 2, and batch job 2 will try to start working on record 1, but they won't be able to start working on them because they are still locked by the other batch job.

Since both batch jobs are waiting for each other to finish, they will never proceed, thus leading to a deadlock.

How to prevent deadlocks in D365FO?

Now that we understand the problem, how do we avoid it? Fortunately, the solution is quite straightforward.

The way to avoid deadlocks is to specify which data is being processed by each batch job (i.e., reserve data for each batch job) so that they don’t interfere with one another.

We can achieve this by adding a ProcessorId field whose value will be unique for each batch job run.

Let’s look at an example of how this can be implemented in the code for the Docentric Alert Summary Email Distributor batch job, which implements logic similar to that of the base SysEmailDistributor class.

You can check out more about the Docentric Alert Summary Emails feature here!

First, we need to add a GUID (Globally Unique Identifier) field to the table that is being used by the batch jobs. This field is where we will store our ProcessorId values.

We can even use an EDT (Extended Data Type) of type GUID, but for this example, we will be using the SysEmailProcessorId EDT.

To populate this field, we create a new GUID with the newGuid() method each time the batch job runs and assign that value to the field.
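As a minimal sketch of this step, the snippet below generates one processor ID per run. The class name MyQueueProcessorService and the process() method are hypothetical placeholders, not the actual Docentric code. Only newGuid() and the SysEmailProcessorId EDT come from the text above.

```xpp
// Hypothetical batch service class, used for illustration only.
internal final class MyQueueProcessorService
{
    public void process()
    {
        // newGuid() returns a different GUID on every call, so each
        // run of the batch job gets its own processor ID.
        SysEmailProcessorId currentProcessorId = newGuid();

        // ... reserve and process records using currentProcessorId ...
    }
}
```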

To select only records that are currently not being processed, it is good to have an enum field defining the record's current status.

This is why we created the DocAlertSummaryEmailStatus enum, which includes the states: Created, InProcessing, and FullyProcessed.

We should select only those records that are newly created and assign the ProcessorId to them in a pre-processing step.

Now that we have reserved the data, we can simply fetch all the records that we need to process by filtering on our ProcessorId.
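A simplified reservation step could look like the sketch below. It assumes a hypothetical table MyQueueTable with the ProcessorId and Status fields described above. The method name and parameter are also illustrative, and the real Docentric implementation may differ.

```xpp
// Reserves all newly created records for the current batch run.
// MyQueueTable is a hypothetical table with ProcessorId and Status fields.
private void reserveRecords(SysEmailProcessorId _processorId)
{
    MyQueueTable queueTable;

    ttsbegin;

    // Stamp every Created record with our processor ID and move it
    // to InProcessing in one set-based statement.
    update_recordset queueTable
        setting ProcessorId = _processorId,
                Status = DocAlertSummaryEmailStatus::InProcessing
        where queueTable.Status == DocAlertSummaryEmailStatus::Created;

    ttscommit;
}
```

The parameter is deliberately not named processorId: X++ identifiers are case-insensitive, so a distinct name keeps it visibly separate from the ProcessorId field.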

At the end of this process, records that are finished are changed to the FullyProcessed state.
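The fetch-and-complete step described above might be sketched as follows. MyQueueTable and processRecord() are hypothetical names used only for illustration. Each record is re-selected for update inside its own transaction, so a failure on one record does not roll back the others. That is a design choice of this sketch, not something the original code is known to do.

```xpp
// Processes only the records reserved by this batch run.
private void processReservedRecords(SysEmailProcessorId _processorId)
{
    MyQueueTable queueTable;    // hypothetical table, read-only loop buffer
    MyQueueTable updateBuffer;  // second buffer used for the update

    while select queueTable
        where queueTable.ProcessorId == _processorId
           && queueTable.Status == DocAlertSummaryEmailStatus::InProcessing
    {
        // Hypothetical method that does the actual work for one record.
        this.processRecord(queueTable);

        ttsbegin;

        select firstonly forupdate updateBuffer
            where updateBuffer.RecId == queueTable.RecId;

        if (updateBuffer)
        {
            updateBuffer.Status = DocAlertSummaryEmailStatus::FullyProcessed;
            updateBuffer.update();
        }

        ttscommit;
    }
}
```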

With this approach, two batch jobs running at the same time never work on the same record: each one only touches records that carry its own ProcessorId, thus successfully preventing a deadlock situation!

What are lost records?

Let’s say that while executing, one of the batch jobs crashes for some reason and thus doesn’t fully process the records it was working on.

This would mean that the status of the records it was processing at the time of the crash remains stuck on InProcessing.

New batch jobs will never pick up these records: they only reserve records in the Created state, and the stale ProcessorId left on the stuck records will never match the ProcessorId of a new run.

These records are now effectively lost records.

Preventing lost records

To prevent records from being lost due to crashes, we need a way to reset the now invalid InProcessing statuses back to Created.

For this, we need to implement a cleanup() method that resets the statuses before any records are marked for processing by the batch jobs, allowing the records to be picked up again.

But how do we identify which records need resetting? This is where a new field called SessionId is introduced. We can add this field in the same way as the GUID field, but this time it will be a simple integer field.

We can also use an EDT of type integer here. In this case, we will be using the default SessionId EDT.

To determine if the session is still active, we also need to introduce the SessionLoginDateTime field, which is of type UtcDateTime.

For this field we can use an EDT of type UtcDateTime. In this case, it will be the default SysSessionLoginDateTime EDT.

To solve our problem of lost records with these new fields, we now have to add session data to each record before processing, as part of the same pre-processing step that assigns the ProcessorId.

The cleanup() method mentioned above is where we will reset the records lost in crashed sessions.

The cleanup uses the global isSessionActive() method to check whether the session associated with a record is still one of the active sessions. If it is not, the record is not being processed by any batch job and can be safely reset.
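Extended with session data, the reservation step might look like the sketch below. MyQueueTable and the method name are hypothetical. The sketch assumes that the current session ID can be read with sessionId() and its login time with xSession.loginDateTime(). How the original code obtains these values is not shown in the text, so treat this as one possible approach.

```xpp
// Reserves newly created records and stamps them with session data.
private void reserveRecords(SysEmailProcessorId _processorId)
{
    MyQueueTable queueTable;    // hypothetical table
    SessionId currentSessionId = sessionId();
    xSession currentSession = new xSession(currentSessionId);
    SysSessionLoginDateTime currentLoginDateTime = currentSession.loginDateTime();

    ttsbegin;

    // Same reservation as before, plus the session that owns the records.
    update_recordset queueTable
        setting ProcessorId = _processorId,
                Status = DocAlertSummaryEmailStatus::InProcessing,
                SessionId = currentSessionId,
                SessionLoginDateTime = currentLoginDateTime
        where queueTable.Status == DocAlertSummaryEmailStatus::Created;

    ttscommit;
}
```

Storing the login date and time alongside the session ID matters because session IDs can be reused. The pair identifies one specific session rather than any session that happens to have the same number.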
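A cleanup along these lines could be sketched as follows. MyQueueTable is a hypothetical table name, and isSessionActive() is assumed to accept the session ID and login date and time stored on the record. Its exact signature should be checked against the Global class in your environment.

```xpp
// Resets records that were left in InProcessing by a crashed session.
private void cleanup()
{
    MyQueueTable queueTable;    // hypothetical table

    ttsbegin;

    while select forupdate queueTable
        where queueTable.Status == DocAlertSummaryEmailStatus::InProcessing
    {
        // If the session that reserved the record is gone, no batch job
        // is working on it any more, so it can be picked up again.
        if (!isSessionActive(queueTable.SessionId, queueTable.SessionLoginDateTime))
        {
            queueTable.Status = DocAlertSummaryEmailStatus::Created;
            queueTable.update();
        }
    }

    ttscommit;
}
```

Only the status is reset here. The ProcessorId and session fields are overwritten the next time the record is reserved, so clearing them is optional in this sketch.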

Thus, by using the SessionId and SessionLoginDateTime fields, we have ensured that we won't lose any records due to crashes!

If you would like to explore how we have implemented these techniques in a real scenario, you can download the Docentric Free Edition and look at the code yourself in the DocAlertSummaryEmailDistributorService class, which controls the Docentric Alert Summary Emails feature!

 
