Merge Batch Updates in SQL Server 2005
Article by Nigel Maneffa
Introduction
If you are upgrading a SQL2000 merge replication
installation or are writing are a new application you need to be aware of a
merge replication change that might affect your application either subtlety or
very dramatically depending on your custom application – that is the new feature
called “merge batch updates”. These are mentioned in MSDN2 as ‘Breaking Changes’
and this article will explain what is likely to break and what we can do about
it!
SQL 2000 Behaviour
In SQL2000 (and SQL 7.0) all updates and inserts and
deletes which the merge agent carried out were carried out on a single
row-by-row basis, using replication stored procedures. So, if you applied 20
updates to a table, then when the merge agent ran it would carry out 20 single
stored procedure calls at the subscriber.
New SQL 2005 Behaviour
Contrastingly, in SQL2005 the update may be one batch of
20 updates. In order to support this behaviour, there are 2 stored procedures
produced for each action. EG for the insert on my one particular article I see
these:
'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C'
'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C_batch'
The first procedure is a standard one, but the second
(italicised) one has the suffix “_batch” and is the pertinent new procedure.If
we examine the procedure a little as follows:
sp_helptext
'dbo.MSmerge_ins_sp_00083C5D3D03432798D51F26F199477C_batch'
We see that this is limited to 100 rows and the GUID
values are all sent in to the procedure.
Why make such a change in SQL2005?
The answer is simple: performance ie running multiple
updates in a single batch provides a notable increase in performance at the
potential reduction of some concurrency. Some years ago I did some
experimentation (this is purely anecdotal being so long ago as I did not
document anything) using this kind of batch updating with a client server
application on a LAN and on a single user system got about 30% increase in
insert performance by using a batch count of 30 instead of 1 and this was
consistent over a mixed range of table sizes. This is the advantage of one large
update statement instead of say 30 single update statements. However, like most
improvements there can be a price to pay. In this instance your
application-level trigger code must be able to handle multi-line updates.
Example Cases where User Triggers Will Need Modifying
Almost all of the triggers I write are 'not for
replication', meaning they are not invoked when the merge agent synchronizes
changes. However, there is one startling exception, on a trigger that
classically keeps a small summary table (not replicated) of stock from a much,
much larger replicated table. This trigger is deliberately 'for replication' as
it needs to be triggered to update in all cases including when stock updates
come in from another remote site. This table is used to show indicative stock
figures very quickly, and as the user homes in on the exact product, stock data
is then taken from the master table in greater detail. This trigger is the only
part of my system where I was able to dramatically improve performance with a
cursor instead of a 'classic' set based solution. In fact it is the only cursor
solution in a 10GB database.
99.999% of the updates made to the master table (millions
of rows) are single record updates/inserts/deletes, but if support need to fix
some stock records, they might update a handful of records at a time. Very
occasionally we might update thousands of such records when working in a
maintenance mode. During testing I found that the cursor trigger solution works
OK up to say 20 records, but was a performance killer by the time you reached
1000 records. So it deliberately does not try to update the summary table if
there are more than say 50 changes made to the master table in one batch (update
statement), but instead makes a note in an errorlog table. The errorlog is auto
scavenged and turned into emails of action required, then the trigger bombs out
having done no work (and consequently leaves the summary table out of kilter).
Experience shows that it would be much cheaper in this instance to run the code
that recreates the summary table from scratch from the master table; this takes
about one minute, which would be carried out at a very quiet time.
For the gurus out there you might be chuckling to yourself
suggesting an indexed view would be a much better solution. You would be right,
but the code proved too complex for SQL2000 SP2 merge replication, creating
bizarre conflicts that deleted the inventory records! (I found out when I did my
first live update having written all the code – motto: always use a full
publisher/subscriber replication setup for final testing!!!), and then the
disastrous http://support.microsoft.com/kb/816780 SP3 bug put me off indexed
views for a long time. It works fine in SQL2005, of course on a cursory check.
In the meantime I spotted a potentially disastrous
possibility on installing SQL2005 – what if the numbers of records the batch
updater updated was too many for my puny trigger. After all, the batch update
trigger can update batches of up to 100 records in one go and my trigger is
limited to 50 updates to be usable. This would create errors all over the place
as my custom trigger bombed out, or an even worse possibility, the number of
records batched would vary by field size or the total number of fields in the
table – this would catch me unawares after making an innocent field change some
time in the future, stopping my trigger whenever the batch update system was
invoked. It was something I had to find out. There was no info in BOL, so I
created a test environment and found out the answers for myself. You might find
the information useful at one stage so here it is:
Testing
The base table for testing is:
CREATE TABLE mergebatchtest(
pk int NOT NULL,
col1 char(10) NOT NULL,
rowguid uniqueidentifier ROWGUIDCOL NOT NULL
CONSTRAINT DF_mergetriggertest_guid DEFAULT (newid()) FOR rowguid)
I carried out 3 tests:
(1) If the field size changes, how does this affect the batch count?
(2) If we have more fields, how does this affect the batch count?
(3) How many pending changes are required before the new batch update
functionality is invoked
Results
(1) If the field size changes, how does this affect the
batch count?
● I found that the type and size of fields in the table made no difference
to the merge batch update count.
2) If we have more fields, how does this affect the batch count?
● I added a char(10) field over and over again and noted the batch update count
on the trigger.
Interestingly, the batch update count DID vary by the number of fields in the
table. See appendix for exact numerical details.
(3) How many pending changes are required before the new batch update
functionality is invoked?
● This ended up being very simple. The batch update functionality invoked the
batch stored procedures as soon as there was more than 1 pending
update/insert/delete for a table (or article in Microsoft parlance). In fact
with deletes it was invoked when there was only 1 pending delete on my setup (as
though the overhead of running the batch delete stored procedure was low enough
to always run it). On updates (and inserts) the 'single' stored procedure was
run when there was one update per table, but the 'batch' version was run as soon
as there were 2 updates or more pending. Sensible really.
[If you’re still following this, at the moment of my truth I found that SQL2005
doesn’t trash my trigger!!! The stock table in question has 44 fields, and this
makes the batch update count around the 20 mark, so I am safe, and would have to
do some very heavy pruning down to the 15 field mark to create a meltdown in my
stock calculation system!]
Conclusion
Whilst bringing a notable performance increase, merge
batch updates must be taken into consideration when designing your database /
application. It would have been sensible for MS to have provided a command line
parameter that turns off batch update functionality, for applications that
cannot handle muti-row triggers, for whatever reason. I suspect such a parameter
may exist, if only for internal debugging / performance testing, but it is
undocumented and will probably remain so.
Appendix
The following table shows the relationship between number
of fields in the table, and the batch update count.
|
No of fields in table |
Batch update count |
|
3 |
100 |
|
4 |
92 |
|
5 |
85 |
|
6 |
78 |
|
7 |
73 |
|
12 |
53 |
|
17 |
46 |
|
22 |
37 |
|
27 |
32 |
|
32 |
27 |
|
37 |
24 |
|
42 |
21 |
|
47 |
20 |
|
52 |
18 |
|
62 |
17 |
|
72 |
14 |
|
82 |
12 |
|
92 |
10 |
|
127 |
7 |
|
152 |
6 |
|
202 |
4 |
|
|
|
Since the numbers are
undocumented, theoretically Microsoft could change these triggers during service
packs, although I doubt they would unless users experienced unusual problems
with the triggers, which is very unlikely.
(tech ed Paul Ibison) |