When do we need to implement parallel processing?
After the previous post, you might have wondered how splitting a process improves performance. If I have only one update statement and I already use set-based processing for it, where exactly is the gain? Running a process in parallel is not always a good idea. Sometimes it actually makes performance worse. In the example I stated above, it would only be an overhead for the server to send 5 SQL statements instead of one. So when do I need to make the process parallel? Below are some scenarios in which you can think of introducing parallel processing.
1. My process updates or creates millions of rows in a single run.
2. There is a possibility that multiple users will run my process at the same time for the same transaction. This can happen if the same process is available in both batch and online mode. In that case, if one person runs it in batch mode and another runs it online for the same transaction, one of the processes may error out or update the tables with wrong data. It can also happen that two users run the same process with the same run control parameters at the same time.
3. The transaction data to be processed is spread across multiple tables, and I do the processing by importing the relevant data into an intermediate (temporary) table. This applies to almost every real process that does bulk processing. The data required for processing may be scattered across different tables. I then need to query each table, select the relevant data, put it into a common temporary table, and do the processing from there. In the salary example, this scenario comes up if you are increasing the salary of your employees based on different rules, such as (a) the percentage increase depends on designation, (b) the percentage increase depends on experience, (c) the percentage increase depends on performance rating, and so on.
4. You are doing row-by-row processing in your Application Engine program. There can be scenarios where you cannot do all your processing in a set-based manner. In such cases, implementing parallel processing is the best option. Since the processing time is directly tied to the number of rows, the more rows you have to process, the more time it is going to take. So divide your data into logical sets and run the process in parallel. This reduces the number of rows each individual process instance has to handle, and so the overall processing time is reduced as well.
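The idea in the last scenario, dividing the data into logical sets and running one instance per set, can be sketched roughly as below. This is only an illustration in Python, not Application Engine code: the worker threads stand in for separate process instances, and the names `partition`, `raise_salary` and `run_parallel` are hypothetical. In a real program each instance would work against the database, typically with its own temporary table instance. Python threads will not speed up pure in-memory arithmetic like this, so read it as a picture of the partitioning only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts non-overlapping chunks of roughly equal size."""
    return [rows[i::n_parts] for i in range(n_parts)]


def raise_salary(chunk, pct):
    """Row-by-row work on one chunk; stands in for one process instance."""
    updated = []
    for emp_id, salary in chunk:
        updated.append((emp_id, round(salary * (1 + pct / 100), 2)))
    return updated


def run_parallel(rows, n_parts, pct):
    """Process every chunk in its own worker and merge the results."""
    chunks = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(raise_salary, chunks, [pct] * n_parts)
        merged = [row for chunk in results for row in chunk]
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical data: ten employees, each earning 1000.
    employees = [(emp_id, 1000.0) for emp_id in range(1, 11)]
    for emp_id, new_salary in run_parallel(employees, n_parts=3, pct=10):
        print(emp_id, new_salary)
```

Each worker only sees its own chunk, so no two instances touch the same row. That is the property you need before a process can safely run in parallel.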