Handling Identity Columns in Peer-to-Peer Transactional Replication on SQL Server 2005/2008Article by Paul Ibison IntroductionIf you’re replicating using the peer-to-peer transactional setup, at some stage you’ll want to replicate an article which has identity columns. This is not prohibited but it is not recommended according to BOL (“Use of identity columns is not recommended. When using identities, you must manually manage the ranges assigned to the tables at each participating database”). This means that out of the box there is no support for identity columns so you’ll have to find a custom solution, or else face the prospect of having identical values assigned at the various peers. If the identity column is a surrogate PK, which is often the case in the SQL Server world, then you will inevitably experience PK violations in which case the synchronization will be blocked from continuing. Not a nice prospect, so we have to ensure it doesn’t occur. So, what are the options?(1) Reseed the identity column at each peerIf you cast your mind back to the days of using merge replication on SQL Server 2000 the identity range management was manual by default. In many cases beginner Replication Administrators would not realise that the identity ranges weren’t be allocated automatically and then come across the inevitable problems on synchronization. As a fix, the solution was to reseed the identity range manually using DBCC CHECKIDENT. This is what manual identity range management amounts to really and exactly the same solution applies to peer-to-peer. With a little imagination you can assign ranges using the seeds which will never overlap, or use more sophisticated methods based on the seed and increment like e.g. odd and even allocations or even positive and negative allocations. While this works well, there are some obvious issues:
Despite these issues, this is probably the most commonly used method. (2) Use a composite keyThe PK could be composed of 2 columns: the identity column and additionally a LocationID column. This means that the identity column is allowed to have overlapping values from different peers and there is no manual reseeding required – the default seed and increment of (1,1) is fine. The combination of the Identity column and the LocationID is always unique across peers, because each peer has a different LocationID. The added benefit is that the record is immediately identifiable in terms of the originator as we simply refer to the value of the LocationID. The only downside is that the index is now wider than perhaps we want, and some DBAs much prefer to use a single column PK. (3) Use a persisted computed columnThis is similar to the solution above and I came across this from the excellent RBI Dev DBA Group. In this case we now have three columns: the Identity column, the LocationID column and a third persisted computed column that combines the two:
The PK_ID is the persisted computed column and is set as the primary key. Now we have a narrow primary key and also a guarantee of uniqueness across peers. To be honest I haven’t used persisted computed columns much and at first I thought that it wouldn't work because the replication-generated insert stored procedure wouldn't cater for the automatically-computed PK value, or that the updates to LocationID would be “confused” on synchronization with the PK itself having changed. Anyway, I tested it and it all works fine. The persisted column formula was simply the LocationID + Identity – with a large LocationID as a seed to ensure no overlap. Conceptually this is very similar to (1) with the LocationID performing a similar function to the seed of the DBCC CHECKIDENT. However IMO it is far more preferable as it is completely transparent. If you chose to implement this solution you’ll just need to allocate a different default LocationID to each peer and ensure they don’t overlap. This can be done via an additional mapping table maintained in the publication which maps HostName to LocationID, and using a scalar function as the default for the LocationID column which performs the lookup. This means that a backup from one server can be used as a starting point for another peer with no intervention at all assuming the mapping table is already populated. ConclusionsUsing identity columns in peer-to-peer replication is not straightforward. You’ll need to implement some sort of manual solution but it is possible to minimise the potential problems and make the whole solution quite transparent. |
Paul Ibison, Copyright © 2010 |