Merge Internals: How to Determine
Pending Merge Changes
Introduction
I had intended to write some merge articles for some time
now which help to explain a little more about what goes on behind the scenes.
So, this is the first in two or three articles just for merge geeks like myself
:)
Merge replication has an internal mechanism for tracking
changes to tables. This mechanism works behind the scenes on both the publisher
and the subscribers and at synchronisation time all changes are determined and
applied. Clearly this is not a simple process as only recent changes are applied
ie only those changes made since the last synchronization. So, hidden in the
metadata tables and system stored procedures there must be a means of tracking
what has changed, when it changed and where it changed. If we can understand how
this is implemented, we could go on to write queries listing which changes are
waiting to be applied before sync time. This is very useful information for us
merge replication administrators as it allows us to predict timescales for
synchronization and helps us understand how our data is changing. Such
administrative info has been readily available in transactional replication
using the system stored procedure sp_browsereplcmds and the view
MSdistribution_status, but in merge replication there has so far been no
equivalent system stored procedure or view. I had created a merge equivalent
procedure a while ago (subsequently edited by Patrick Molijn and Daniel
Kristensen) and posted
it on this site, and this article will outline the key points I came across
so you can create and tailor your own version of logging queries.
In order to automatically audit changes to published
tables, merge replication uses triggers and system tables. As this article is
focussing on change tracking, we’ll just look at the relevant tables for our use
and leave out extraneous tables. Also, I’ll need to confine the article to
non-filtered situations to make the article info more digestible!
How to find what has
changed?
To log what was happening behind the scenes I created a
table called tCompany and then merge replicated it. I don’t want this article to
become a list of tables/procedures that are impossible to follow, so I’ll show
how the setup exists in a little diagram:

Once I replicated the table “tCustomers” I looked at the
subscriber. As usual, there was a new column added to the table called rowguid
which contained a unique id for each row, and there were 3 triggers hanging off
the table each having the name in a format:
MSmerge_ins/upd/del_6B2966A1FADF41D2B40DF7F8EC131339. The strange looking number
at the end represents the internal article id for the table tCustomers and to
link this number to the table name you can look at the table sysmergearticles
(select name, artid from sysmergearticles). The triggers perform a simple
function – if a row is added or updated, the rowguid value is added to the
MSmerge_contents table, while if a row is deleted, the rowguid is inserted into
the MSmerge_tombstone table (and removed from MSmerge_contents if it exists
there). The only tricky part is that these changes have to be grouped together.
This is done by using something called a “generation” which is really just a
container, and these generations are listed in MSmerge_genhistory. Hopefully you
can already begin to guess at how some of this hangs together. If not, just
consider the inserts and updates and this’ll hopefully make it clearer:

So, we now know how to find out which rows in tCustomers
have changed and in which container they exist – we simply join the 3 tables
above using the following type of query:
select tCustomers.* from tCustomers
inner join MSmerge_contents on tCustomers.rowguid = MSmerge_contents.rowguid
inner join MSmerge_genhistory on MSmerge_contents.generation =
MSmerge_genhistory.generation
How to find just the Pending Changes?
However the query above is a fairly blunt instrument.
It’ll work OK initially, but it’ll always report on ALL changes
made to the published table. On a frequently changed table eventually it’ll
return the entire table which is obviously pretty useless. In fact what you'll
often see is it steadily increasing then going down to zero rows after the
metadata is cleaned up. Anyway, what I want to achieve is a method of finding
just the pending sync changes. For this we need to look a little
more at the generation numbers. Changes are grouped into a generation, which is
listed in the MSmerge_genhistory table. These generations are designated as
‘open’ (genstatus = 0) until the merge agent runs.
[You don’t need to follow this, but for those who are interested…..once
synchronization is started, the generation is marked as closed (genstatus = 1)
by the procedure sp_MSmakegeneration. This prevents any concurrent changes
entering this generation while the merge agent calls sp_MSenumchanges to find
all the relevant rows needed to replicate. The sp_MSmakegeneration process also
batches up the changes into additional generations having 1000 changes each so
that if the merge agent is interrupted, it just continues on the interrupted
batch and not the whole lot.].
Anyway, in the case of the merge agent, it has to distinguish between recently
closed generations and older closed generations (using sentgen and recgen from
sysmergesubscriptions). Fortunately, our case is somewhat simpler because we
check for changed data before synchronization starts ie we can
just look for open generations ie:
select tCustomers.* from tCustomers
inner join MSmerge_contents on tCustomers.rowguid = MSmerge_contents.rowguid
inner join MSmerge_genhistory on MSmerge_contents.generation =
MSmerge_genhistory.generation
where MSmerge_genhistory.genstatus = 0
To get a count of the actual changes, we could do a GROUP BY query quite simply.
However there is an additional column on the MSmerge_genhistory table that gives
us the information directly: changecount. So, changes waiting to be sent to the
publisher become:
select changecount from MSmerge_genhistory
where MSmerge_genhistory.genstatus = 0
Putting Everything together
There is a
script on this
website to list the pending merge changes which I created and then Patrick
Molijn and Daniel Kristensen edited over the last year or so. It is more complex
than those shown above to make it more widely applicable, but it is still
based on exactly the same principles. If you decide to use this script,
the extra complexity is there because the generations are incremented on a
per-article level, and also in some cases you will have a lot of articles in
your merge publication, so the article name isn’t hardcoded. The rowguid section
at the start is there because the column having the rowguid property might not
actually be called "rowguid", so this can’t be assumed and has to be looked up.
Finally there is a union statement because we want to be able to differentiate
between deletes and inserts/updates. These are all additional refinements, but
as I mentioned above the meat of the query is exactly the same as that outlined
above.
Conclusions
Hopefully you agree that this is not so complicated as
perhaps it once seemed and you can now generate your own set of auditing queries
and troubleshooting queries.
|