Average 28000 requests/month












 

Using the TABLEDIFF Utility

Introduction

This article details the use of the new TableDiff utility which is used to solve problems associated with non-convergence. Typically, this is related to replication, where the subscriber and publisher's data doesn't match (and once synchronised, you subsequently do a nosync initialization), however the utility could really be used in any comparable situation. We won't go into the exact causes of the non-convergence - very generally speaking these can include lax DBA protocols, internal replication errors and intentional use of the (non-updatable) subscriber - the focus here is what can this tool do and how to best use it.

NB It is possible to run TableDiff as a standalone utility if you simply copy the executable to another machine - the only requirement is that TableDiff needs the correct version of the .NET Framework installed. It will be possible in this way to use the utility to compare SQL Server 2000 databases (thx to Lee Greene for passing this on).

TABLEDIFF Arguments Listed

First of all, where is it? This tablediff.exe utility is a command-line program which exists in the 
C:\Program Files\Microsoft SQL Server\90\COM folder. 
To find the available options we use "tablediff -?" and the output is:

User-specified agent parameter values:
-? 

Replication Diff Tool Command Line Options
usage: tablediff
-- Source Options --
-sourceserver Source Host
-sourcedatabase Source Database
-sourceschema Source Schema Name
-sourcetable Source Table or View
-sourceuser Source Login
-sourcepassword Source Password
-sourcelocked Lock the source table/view durring tablediff

-- Destination Options --
Note: For multiple Destinations use a comma delimited list
-destinationserver Destination Host
-destinationdatabase Destination Database
-destinationschema Destination Schema Name
-destinationtable Destination Table or View
-destinationuser Destination Login
-destinationpassword Destination Password
-destinationlocked Lock the destination table/view durring tablediff

-- Misc Options --
-t Timeout
-c Column Level Diff
-f Generate Fix SQL (You may also specify a file name and path)
-q Quick Row Count
-et Specify a table to put the errors into
-dt Drop the error table if it exists
-o Output file
-b Number of bytes to read for blob data types (Default 1000)
-strict Strict compare of source and destination schema


Note that this is slightly different to the options shown in BOL, where "-bf" is additionally included (explanation below).

A typical execution is shown below:


TABLEDIFF Argument Details

In the above example, I was comparing 2 tables in the same database, for testing purposes. The tables are called tCustomers and tCustomersCopy. They had very simple table schemas and consisted of 2 rows which differed slightly. The actual command that was run is shown at the top, which is then formatted in the output to reiterate the selections made - a useful facility if you decide to redirect the output to a file (using '>'). After that comes a more interesting part and we discover that there are 2 differences along with the execution time. To find details of the actual differing rows, some useful TSQL to synchronize the data, output the results to a file and several other facilities, we must provide more than the default arguments, so let's examine all the options available.

Connection Arguments. The connection details (-source... and -destination... options) are mostly self-explanatory. The login options refer to SQL authentication, and for trusted security you simply omit the login names and passwords entirely, assuming you are logged in with a domain user that exists as a windows login in each SQL Server.

No options / -o option. With no additional arguments the entire output is simply written to the cmd window as above. An identical output is sent to a separate text output file if the -o argument is used.

-c option This returns the column names which are different. In the case where I have rows differing in the SName column I receive:
Err ID Col
Mismatch 2 SName

Note that although this is defined as 'Column Level Diff', it has nothing to do with differing numbers of columns or differing column orders. Different column orders is not a problem, however horizontally partitioned tables (different numbers of columns) are not permitted.


-f option generates the TSQL to synchronize the two tables.  Note that this creates a script to synchronize the subscriber with the publisher ie it works in one direction only. If you need it to be used in the other direction, you can swap the connection parameter values for the publisher and the subscriber. The filename and path is by default of the type:
"C:\Program Files\Microsoft SQL Server\90\COM\DIFFIX.632722665324687500"
This can be changed by using the -o option and specifying a filename. In either case the file output looks like that shown below:

-- Host: sql2005pc
-- Database: [pub1]
-- Table: [dbo].[tcustomerscopy]
SET IDENTITY_INSERT [dbo].[tcustomerscopy] ON
UPDATE [dbo].[tcustomerscopy] SET [SName]='Smith' WHERE [ID] = 2
SET IDENTITY_INSERT [dbo].[tcustomerscopy] OFF

You might ask h
ow the tool 'knows' which where clause to create. To understand, if you try to compare two tables consisting only of simple columns, you'll receive the error message below:
The replication table difference tool requires the comparison tables/views to have either a primary key, identity, rowguid or unique key column.

-b option. This is the number of bytes to compare for large object data type columns. Basically, there is a max of 8000 bytes and data after that isn't compared. 

-q option. This performs a simple rowcount, so even though the 2 rows are different we receive the following output:
Table [pub1].[dbo].[tCustomers] on sql2005pc and Table [pub1].[dbo].[tcustomerscopy] on sql2005pc are identical.
Table [pub1].[dbo].[tCustomers] on sql2005pc has 2 rows.
Table [pub1].[dbo].[tcustomerscopy] on sql2005pc has 2 rows.
The requested operation took 0.234375 seconds.


-et option. This allows us to put the output into a table. The table is created on the subscriber, and in the case of -f, it has the following type of structure, which is probably the lost useful aspect of this tool:

ID

MSdifftool_ErrorCode

MSdifftool_ErrorDescription

MSdifftool_FixSQL

MSdifftool_OffendingColumns

2

0  

Mismatch

UPDATE [dbo].[tcustomerscopy] SET [SName]='Smith' WHERE [ID] = 2

[SName]

-strict option Is used to disallow implicit datatype conversion mappings (such as int -> bigint, varchar(max) -> text) and therefore perform a 'strict' validation. Unlike normally, when you use the -strict option, the destination table must also have a primary key, identity, or ROWGUID column - NB this is not necessarily there in the case of transactional replication. The resulting error message is shown below:
Table [pub1].[dbo].[tCustomers] on sql2005pc and Table [pub1].[dbo].[tcustomerscopy] on sql2005pc have different schemas and cannot be compared.
The requested operation took 1.0625 seconds.


-f -bf#num option. When this is used, several files are created each with #num commands in them in order to synchronize the tables. Note that this command is not included in list of parameters shown by issuing -?. This facility could be important for huge tables with loads of mismatched rows, where the maximum filesize might be exceeded. 

-destinationlocked and -sourcelocked options

Why do we want to lock the tables, and how does TABLEDIFF work by default? To find the rows that differ, behind the scenes the tool uses the following query:
SELECT [dbo].[tCustomers].[ID],BINARY_CHECKSUM([dbo].[tCustomers].[FName],[dbo].[tCustomers].[ID],[dbo].[tCustomers].[SName]) 
as MShash_54267293 FROM [dbo].[tCustomers] WITH (READUNCOMMITTED) ORDER BY [dbo].[tCustomers].[ID] 

Once the different checksums were identified, we see this query to find one of the actual column values:
SELECT [dbo].[tCustomers].[FName],[dbo].[tCustomers].[ID],[dbo].[tCustomers].[SName] FROM [dbo].[tCustomers] WITH 
(READUNCOMMITTED) WHERE [ID] = 2

So, this is useful in that we notice it used dirty reads which explains the usefulness of the final parameters: -destinationlocked and -sourcelocked.

Comments

I think this is a pretty useful, albeit simple tool. I currently use Redgate's DataCompare which is in several ways a more sophisticated tool, but on those client sites which don't have a Redgate licence I'll definitely consider using it.

 
 

Jan 2006