Average 28000 requests/month












 

Merge Internals: How to Determine Pending Merge Changes

 

Introduction

I had intended to write some merge articles for some time now which help to explain a little more about what goes on behind the scenes. So, this is the first in two or three articles just for merge geeks like myself :)

Merge replication has an internal mechanism for tracking changes to tables. This mechanism works behind the scenes on both the publisher and the subscribers and at synchronisation time all changes are determined and applied. Clearly this is not a simple process as only recent changes are applied ie only those changes made since the last synchronization. So, hidden in the metadata tables and system stored procedures there must be a means of tracking what has changed, when it changed and where it changed. If we can understand how this is implemented, we could go on to write queries listing which changes are waiting to be applied before sync time. This is very useful information for us merge replication administrators as it allows us to predict timescales for synchronization and helps us understand how our data is changing. Such administrative info has been readily available in transactional replication using the system stored procedure sp_browsereplcmds and the view MSdistribution_status, but in merge replication there has so far been no equivalent system stored procedure or view. I had created a merge equivalent procedure a while ago (subsequently edited by Patrick Molijn and Daniel Kristensen) and posted it on this site, and this article will outline the key points I came across so you can create and tailor your own version of logging queries.

In order to automatically audit changes to published tables, merge replication uses triggers and system tables. As this article is focussing on change tracking, we’ll just look at the relevant tables for our use and leave out extraneous tables. Also, I’ll need to confine the article to non-filtered situations to make the article info more digestible!
 

How to find what has changed?

To log what was happening behind the scenes I created a table called tCompany and then merge replicated it. I don’t want this article to become a list of tables/procedures that are impossible to follow, so I’ll show how the setup exists in a little diagram:

Once I replicated the table “tCustomers” I looked at the subscriber. As usual, there was a new column added to the table called rowguid which contained a unique id for each row, and there were 3 triggers hanging off the table each having the name in a format: MSmerge_ins/upd/del_6B2966A1FADF41D2B40DF7F8EC131339. The strange looking number at the end represents the internal article id for the table tCustomers and to link this number to the table name you can look at the table sysmergearticles (select name, artid from sysmergearticles). The triggers perform a simple function – if a row is added or updated, the rowguid value is added to the MSmerge_contents table, while if a row is deleted, the rowguid is inserted into the MSmerge_tombstone table (and removed from MSmerge_contents if it exists there). The only tricky part is that these changes have to be grouped together. This is done by using something called a “generation” which is really just a container, and these generations are listed in MSmerge_genhistory. Hopefully you can already begin to guess at how some of this hangs together. If not, just consider the inserts and updates and this’ll hopefully make it clearer:

 

So, we now know how to find out which rows in tCustomers have changed and in which container they exist – we simply join the 3 tables above using the following type of query:

select tCustomers.* from tCustomers
inner join MSmerge_contents on tCustomers.rowguid = MSmerge_contents.rowguid
inner join MSmerge_genhistory on MSmerge_contents.generation = MSmerge_genhistory.generation

How to find just the Pending Changes?

However the query above is a fairly blunt instrument. It’ll work OK initially, but it’ll always report on ALL changes made to the published table. On a frequently changed table eventually it’ll return the entire table which is obviously pretty useless. In fact what you'll often see is it steadily increasing then going down to zero rows after the metadata is cleaned up. Anyway, what I want to achieve is a method of finding just the pending sync changes. For this we need to look a little more at the generation numbers. Changes are grouped into a generation, which is listed in the MSmerge_genhistory table. These generations are designated as ‘open’ (genstatus = 0) until the merge agent runs.

[You don’t need to follow this, but for those who are interested…..once synchronization is started, the generation is marked as closed (genstatus = 1) by the procedure sp_MSmakegeneration. This prevents any concurrent changes entering this generation while the merge agent calls sp_MSenumchanges to find all the relevant rows needed to replicate. The sp_MSmakegeneration process also batches up the changes into additional generations having 1000 changes each so that if the merge agent is interrupted, it just continues on the interrupted batch and not the whole lot.].

Anyway, in the case of the merge agent, it has to distinguish between recently closed generations and older closed generations (using sentgen and recgen from sysmergesubscriptions). Fortunately, our case is somewhat simpler because we check for changed data before synchronization starts ie we can just look for open generations ie:

select tCustomers.* from tCustomers
inner join MSmerge_contents on tCustomers.rowguid = MSmerge_contents.rowguid
inner join MSmerge_genhistory on MSmerge_contents.generation = MSmerge_genhistory.generation
where MSmerge_genhistory.genstatus = 0


To get a count of the actual changes, we could do a GROUP BY query quite simply. However there is an additional column on the MSmerge_genhistory table that gives us the information directly: changecount. So, changes waiting to be sent to the publisher become:

select changecount from MSmerge_genhistory
where MSmerge_genhistory.genstatus = 0

Putting Everything together

There is a script on this website to list the pending merge changes which I created and then Patrick Molijn and Daniel Kristensen edited over the last year or so. It is more complex than those shown above to make it more widely applicable, but it is still based on exactly the same principles. If you decide to use this script, the extra complexity is there because the generations are incremented on a per-article level, and also in some cases you will have a lot of articles in your merge publication, so the article name isn’t hardcoded. The rowguid section at the start is there because the column having the rowguid property might not actually be called "rowguid", so this can’t be assumed and has to be looked up. Finally there is a union statement because we want to be able to differentiate between deletes and inserts/updates. These are all additional refinements, but as I mentioned above the meat of the query is exactly the same as that outlined above.

Conclusions

Hopefully you agree that this is not so complicated as perhaps it once seemed and you can now generate your own set of auditing queries and troubleshooting queries.
 

 
 

Jan 2007