Average 28000 requests/month












 

Merge Batch Updates in SQL Server 2005

Article by Nigel Maneffa

Introduction

If you are upgrading a SQL2000 merge replication installation or are writing are a new application you need to be aware of a merge replication change that might affect your application either subtlety or very dramatically depending on your custom application – that is the new feature called “merge batch updates”. These are mentioned in MSDN2 as ‘Breaking Changes’ and this article will explain what is likely to break and what we can do about it!

SQL 2000 Behaviour

In SQL2000 (and SQL 7.0) all updates and inserts and deletes which the merge agent carried out were carried out on a single row-by-row basis, using replication stored procedures. So, if you applied 20 updates to a table, then when the merge agent ran it would carry out 20 single stored procedure calls at the subscriber.

New SQL 2005 Behaviour

Contrastingly, in SQL2005 the update may be one batch of 20 updates. In order to support this behaviour, there are 2 stored procedures produced for each action. EG for the insert on my one particular article I see these:

'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C'
'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C_batch'

 

The first procedure is a standard one, but the second (italicised) one has the suffix “_batch” and is the pertinent new procedure.If we examine the procedure a little as follows:

sp_helptext 'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C_batch'

We see that this is limited to 100 rows and the GUID values are all sent in to the procedure.

Why make such a change in SQL2005?

The answer is simple: performance ie running multiple updates in a single batch provides a notable increase in performance at the potential reduction of some concurrency. Some years ago I did some experimentation (this is purely anecdotal being so long ago as I did not document anything) using this kind of batch updating with a client server application on a LAN and on a single user system got about 30% increase in insert performance by using a batch count of 30 instead of 1 and this was consistent over a mixed range of table sizes. This is the advantage of one large update statement instead of say 30 single update statements. However, like most improvements there can be a price to pay. In this instance your application-level trigger code must be able to handle multi-line updates.

Example Cases where User Triggers Will Need Modifying

Almost all of the triggers I write are 'not for replication', meaning they are not invoked when the merge agent synchronizes changes. However, there is one startling exception, on a trigger that classically keeps a small summary table (not replicated) of stock from a much, much larger replicated table. This trigger is deliberately 'for replication' as it needs to be triggered to update in all cases including when stock updates come in from another remote site. This table is used to show indicative stock figures very quickly, and as the user homes in on the exact product, stock data is then taken from the master table in greater detail. This trigger is the only part of my system where I was able to dramatically improve performance with a cursor instead of a 'classic' set based solution. In fact it is the only cursor solution in a 10GB database.

99.999% of the updates made to the master table (millions of rows) are single record updates/inserts/deletes, but if support need to fix some stock records, they might update a handful of records at a time. Very occasionally we might update thousands of such records when working in a maintenance mode. During testing I found that the cursor trigger solution works OK up to say 20 records, but was a performance killer by the time you reached 1000 records. So it deliberately does not try to update the summary table if there are more than say 50 changes made to the master table in one batch (update statement), but instead makes a note in an errorlog table. The errorlog is auto scavenged and turned into emails of action required, then the trigger bombs out having done no work (and consequently leaves the summary table out of kilter). Experience shows that it would be much cheaper in this instance to run the code that recreates the summary table from scratch from the master table; this takes about one minute, which would be carried out at a very quiet time.

For the gurus out there you might be chuckling to yourself suggesting an indexed view would be a much better solution. You would be right, but the code proved too complex for SQL2000 SP2 merge replication, creating bizarre conflicts that deleted the inventory records! (I found out when I did my first live update having written all the code – motto: always use a full publisher/subscriber replication setup for final testing!!!), and then the disastrous http://support.microsoft.com/kb/816780 SP3 bug put me off indexed views for a long time. It works fine in SQL2005, of course on a cursory check.

In the meantime I spotted a potentially disastrous possibility on installing SQL2005 – what if the numbers of records the batch updater updated was too many for my puny trigger. After all, the batch update trigger can update batches of up to 100 records in one go and my trigger is limited to 50 updates to be usable. This would create errors all over the place as my custom trigger bombed out, or an even worse possibility, the number of records batched would vary by field size or the total number of fields in the table – this would catch me unawares after making an innocent field change some time in the future, stopping my trigger whenever the batch update system was invoked. It was something I had to find out. There was no info in BOL, so I created a test environment and found out the answers for myself. You might find the information useful at one stage so here it is:

Testing

The base table for testing is:

CREATE TABLE mergebatchtest(
pk int NOT NULL,
col1 char(10) NOT NULL,
rowguid uniqueidentifier ROWGUIDCOL NOT NULL
CONSTRAINT DF_mergetriggertest_guid DEFAULT (newid()) FOR rowguid)


I carried out 3 tests:
(1) If the field size changes, how does this affect the batch count?
(2) If we have more fields, how does this affect the batch count?
(3) How many pending changes are required before the new batch update functionality is invoked

Results

(1) If the field size changes, how does this affect the batch count?
 ● I found that the type and size of fields in the table made no difference to the merge batch update count.

2) If we have more fields, how does this affect the batch count?
● I added a char(10) field over and over again and noted the batch update count on the trigger. Interestingly, the batch update count DID vary by the number of fields in the table. See appendix for exact numerical details.

(3) How many pending changes are required before the new batch update functionality is invoked?
● This ended up being very simple. The batch update functionality invoked the batch stored procedures as soon as there was more than 1 pending update/insert/delete for a table (or article in Microsoft parlance). In fact with deletes it was invoked when there was only 1 pending delete on my setup (as though the overhead of running the batch delete stored procedure was low enough to always run it). On updates (and inserts) the 'single' stored procedure was run when there was one update per table, but the 'batch' version was run as soon as there were 2 updates or more pending. Sensible really.

[If you’re still following this, at the moment of my truth I found that SQL2005 doesn’t trash my trigger!!! The stock table in question has 44 fields, and this makes the batch update count around the 20 mark, so I am safe, and would have to do some very heavy pruning down to the 15 field mark to create a meltdown in my stock calculation system!]

Conclusion

Whilst bringing a notable performance increase, merge batch updates must be taken into consideration when designing your database / application. It would have been sensible for MS to have provided a command line parameter that turns off batch update functionality, for applications that cannot handle muti-row triggers, for whatever reason. I suspect such a parameter may exist, if only for internal debugging / performance testing, but it is undocumented and will probably remain so.

Appendix

The following table shows the relationship between number of fields in the table, and the batch update count.
 

No of fields in table

Batch update count

3

100

4

92

5

85

6

78

7

73

12

53

17

46

22

37

27

32

32

27

37

24

42

21

47

20

52

18

62

17

72

14

82

12

92

10

127

7

152

6

202

4

 

 

Since the numbers are undocumented, theoretically Microsoft could change these triggers during service packs, although I doubt they would unless users experienced unusual problems with the triggers, which is very unlikely.

(tech ed Paul Ibison)

 
 

Nov 2006