2017-07-30

SqlBulkCopy with PowerShell

The challenge

A rather common task is to copy many (or all) rows from one SQL Server database table to another as smoothly and quickly as possible. Some tuning on the platform is important, but the copy itself can be done in several very different ways.

A direct table-to-table copy can be done with the .NET SqlBulkCopy class using PowerShell.

Alternatively, T-SQL statements can be used, with fewer possibilities but (maybe) better performance, as sketched below:
INSERT INTO ... SELECT ... FROM, where the target table must be created beforehand but can be placed in any filegroup.
Or SELECT INTO, where the target table is created in the primary filegroup if it does not already exist.
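A minimal sketch of both alternatives run from PowerShell, assuming the SqlServer module's Invoke-Sqlcmd cmdlet and the databases and tables defined later in this post ([business_copy] is just a hypothetical name for the table SELECT INTO would create):
# INSERT INTO ... SELECT: the target table must exist beforehand, but can be in any filegroup.
$InsertSelect = @'
INSERT INTO [target].[test].[business] ([test_str], [test_nr])
SELECT [test_str], [test_nr]
FROM [source].[test].[business];
'@
Invoke-Sqlcmd -ServerInstance '(local)\SQL2016A' -Query $InsertSelect

# SELECT INTO: creates the target table in the primary filegroup.
$SelectInto = @'
SELECT [test_str], [test_nr]
INTO [target].[test].[business_copy]
FROM [source].[test].[business];
'@
Invoke-Sqlcmd -ServerInstance '(local)\SQL2016A' -Query $SelectInto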

Copying data between a table and a file can also be done with the SQL Server utility bcp.exe.
Copying data from a file into a database table can likewise be done with the T-SQL statement BULK INSERT.
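A minimal sketch of the file-based route, assuming bcp.exe is on the path and that C:\MSSQL_export is just a hypothetical export folder:
# Export the source table to a file in native format (-n) with a trusted connection (-T).
& bcp.exe '[source].[test].[business]' out 'C:\MSSQL_export\business.dat' -S '(local)\SQL2016A' -T -n

# Load the file into the target table with BULK INSERT.
$BulkInsert = "BULK INSERT [test].[business] FROM 'C:\MSSQL_export\business.dat' WITH (DATAFILETYPE = 'native', TABLOCK);"
Invoke-Sqlcmd -ServerInstance '(local)\SQL2016A' -Database 'target' -Query $BulkInsert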

But for now I will focus on SqlBulkCopy with PowerShell.
Yan Pan wrote the great post "Use PowerShell to Copy a Table Between Two SQL Server Instances" on the Hey, Scripting Guy! blog, but I will go through the details on my own to get a deeper understanding.

Using SqlBulkCopy

There is a special case regarding identity insert: key constraints and NOT NULL are not checked by default. This can be changed using the SqlBulkCopyOptions enumeration value CheckConstraints.
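If, for example, both identity values and constraint checking are wanted, the enumeration values can be combined with -bor before creating the SqlBulkCopy object; a sketch using the same connection string variable as in the function below:
# Combine options: keep the source identity values and check constraints on insert.
$BulkCopyOptions = [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity -bor [System.Data.SqlClient.SqlBulkCopyOptions]::CheckConstraints
$SqlBulkCopy = New-Object -TypeName System.Data.SqlClient.SqlBulkCopy($CnnStrTarget, $BulkCopyOptions)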

With large amounts of data it is important to use streaming with a SqlDataReader object instead of a static DataSet object, as the DataSet object will hold all data in memory. This can really stress a server and might bring it down.
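For comparison, a sketch of the memory-bound approach that is better avoided on large tables, as every row is materialized in a DataTable before the bulk copy starts (same variable names as in the function below):
# SqlDataAdapter.Fill() loads the entire result set into memory before WriteToServer() runs.
$SqlAdapter = New-Object -TypeName System.Data.SqlClient.SqlDataAdapter("SELECT * FROM $SourceTableName;", $CnnStrSource)
$DataTable = New-Object -TypeName System.Data.DataTable
[void]$SqlAdapter.Fill($DataTable)
$SqlBulkCopy.WriteToServer($DataTable)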

There are some interesting articles in the MSDN Library and a good discussion on Stack Overflow about SqlBulkCopy specifically and SQL Server bulk operations in general.

I made the PowerShell function Copy-SqlTable to handle the copy of the data:
function Copy-SqlTable {
<#
.DESCRIPTION
  Copy single table from source database to target database on same SQL Server Database Engine instance.
#>
[CmdletBinding()]
[OutputType([void])]
Param()

Begin {
  $mywatch = [System.Diagnostics.Stopwatch]::StartNew()
  "{0:s}Z :: Copy-SqlTable()" -f [System.DateTime]::UtcNow | Write-Verbose

  [string]$ApplicationName = 'SqlBulkCopy.ps1'

  #Candidates for function parameters:
  [string]$SourceInstanceName = '(local)\SQL2016A'
  [string]$SourceDatabaseName = 'source'
  [string]$SourceTableName = '[test].[business]'

  [string]$TargetInstanceName = $SourceInstanceName
  [string]$TargetDatabaseName = 'target'
  [string]$TargetTableName = $SourceTableName
}

Process {
  'Connect to source...' | Write-Verbose
  [string]$CnnStrSource = "Data Source=$SourceInstanceName;Integrated Security=SSPI;Initial Catalog=$SourceDatabaseName;Application Name=$ApplicationName"
  "Source connection string: '$CnnStrSource'" | Write-Debug
  $SqlCnnSource = New-Object -TypeName System.Data.SqlClient.SqlConnection $CnnStrSource
  $SqlCommand = New-Object -TypeName System.Data.SqlClient.SqlCommand("SELECT * FROM $SourceTableName;", $SqlCnnSource)
  $SqlCnnSource.Open()
  [System.Data.SqlClient.SqlDataReader]$SqlReader = $SqlCommand.ExecuteReader()

  'Copy to target...' | Write-Verbose
  [string]$CnnStrTarget = "Data Source=$TargetInstanceName;Integrated Security=SSPI;Initial Catalog=$TargetDatabaseName;Application Name=$ApplicationName"
  "Target connection string: '$CnnStrTarget'" | Write-Debug
  try {
    $SqlBulkCopy = New-Object -TypeName System.Data.SqlClient.SqlBulkCopy($CnnStrTarget, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
    $SqlBulkCopy.EnableStreaming = $true
    $SqlBulkCopy.DestinationTableName = $TargetTableName
    $SqlBulkCopy.BatchSize = 1000000 # Another candidate for function parameter
    $SqlBulkCopy.BulkCopyTimeout = 0 # seconds, 0 (zero) = no timeout limit
    $SqlBulkCopy.WriteToServer($SqlReader)
  }
  catch [System.Exception] {
    $_.Exception | Write-Output
  }
  finally {
    'Copy complete. Closing...' | Write-Verbose
    $SqlReader.Close()
    $SqlCnnSource.Close()
    $SqlCnnSource.Dispose()
    $SqlBulkCopy.Close()
  }
}

End {
  $mywatch.Stop()
  [string]$Message = "Copy-SqlTable finished with success. Duration = $($mywatch.Elapsed.ToString()). [hh:mm:ss.ddd]"
  "{0:s}Z $Message" -f [System.DateTime]::UtcNow | Write-Output
}
} # Copy-SqlTable
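The connection details are hard-coded for now (they are candidates for function parameters), so the function is called without arguments; -Verbose shows the progress messages and timestamps:
Copy-SqlTable -Verbose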


Measure

Execution time is measured on each run. For the PowerShell function I used a .NET Stopwatch object. The T-SQL statements are timed by Management Studio by default.

I also kept an eye on memory and CPU usage in Windows Performance Monitor, using the Process object with all counters (*) for the processes sqlservr.exe and powershell/powershell_ise during each run.
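The same figures can be sampled from PowerShell with Get-Counter; the process instance names below are assumptions for a default setup:
# Sample CPU and memory for the SQL Server and PowerShell processes every 5 seconds, 12 times.
$Counters = '\Process(sqlservr)\% Processor Time', '\Process(sqlservr)\Working Set',
            '\Process(powershell)\% Processor Time', '\Process(powershell)\Working Set'
Get-Counter -Counter $Counters -SampleInterval 5 -MaxSamples 12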

Finally, I captured the actual execution plan for the T-SQL statements and kept an eye on the SQL Server Activity Monitor, e.g. the wait statistics. I also enabled the SQL Server Query Store on both databases at creation to have some extra figures to look at.
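The Query Store figures can afterwards be pulled out through the standard Query Store catalog views, for example (again assuming Invoke-Sqlcmd):
$QueryStoreStats = @'
SELECT TOP (10) qt.query_sql_text, rs.count_executions, rs.avg_duration, rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
'@
Invoke-Sqlcmd -ServerInstance '(local)\SQL2016A' -Database 'target' -Query $QueryStoreStats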

Create test data

The source data is generated using the batch delimiter "GO" in Management Studio or SQLCMD, with the number of rows as the count parameter value:
USE [source];
GO
SET NOCOUNT ON;
GO
INSERT INTO [test].[business] ([test_str],[test_nr])
VALUES (NEWID(), CONVERT(int, RAND()*2147483647));

GO 1000000000
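The batch can for example be run unattended from PowerShell through sqlcmd.exe; the script file name and path are just hypothetical examples:
# -E uses a trusted connection; -i points to a file containing the batch above.
& sqlcmd.exe -S '(local)\SQL2016A' -d source -E -i 'C:\MSSQL_scripts\generate_testdata.sql'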


Define SQL Server objects

The source database and table are created simply, but still with parallel I/O in mind:
CREATE DATABASE [source]
ON PRIMARY
  (NAME = N'source_data', FILENAME = N'C:\MSSQL_data\source.primary.mdf',
  SIZE = 8MB, FILEGROWTH = 10MB ),
FILEGROUP [user_data]
  ( NAME = N'user_data00',
   FILENAME = N'C:\MSSQL_data\source.user_data00.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data01',
   FILENAME = N'C:\MSSQL_data\source.user_data01.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data02',
   FILENAME = N'C:\MSSQL_data\source.user_data02.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data03',
   FILENAME = N'C:\MSSQL_data\source.user_data03.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data04',
   FILENAME = N'C:\MSSQL_data\source.user_data04.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data05',
   FILENAME = N'C:\MSSQL_data\source.user_data05.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data06',
   FILENAME = N'C:\MSSQL_data\source.user_data06.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB),
  ( NAME = N'user_data07',
   FILENAME = N'C:\MSSQL_data\source.user_data07.ndf',
   SIZE = 128MB, FILEGROWTH = 10MB)
LOG ON
  ( NAME = N'source_log',
   FILENAME = N'C:\MSSQL_translog\source_log.ldf',
   SIZE = 56MB, FILEGROWTH = 10MB);
GO

ALTER DATABASE [source] SET QUERY_STORE = ON;
ALTER DATABASE [source] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

USE [source];
GO
CREATE SCHEMA [test];
GO
CREATE TABLE [test].[business] (
  [test_id] bigint NOT NULL IDENTITY (1, 1),
  [test_str] nvarchar(256) NOT NULL,
  [test_nr] int NOT NULL
  ) ON [user_data];
GO

USE [master];
GO


The target database and table are created in a similar way, but on another drive to further optimize I/O:
CREATE DATABASE [target]
ON PRIMARY
  (NAME = N'target_data', FILENAME = N'M:\MSSQL_data\target.primary.mdf',
  SIZE = 8MB, FILEGROWTH = 8MB ) ,
FILEGROUP [user_data]
  ( NAME = N'user_data00',
   FILENAME = N'M:\MSSQL_data\target.user_data00.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data01',
   FILENAME = N'M:\MSSQL_data\target.user_data01.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data02',
   FILENAME = N'M:\MSSQL_data\target.user_data02.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data03',
   FILENAME = N'M:\MSSQL_data\target.user_data03.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data04',
   FILENAME = N'M:\MSSQL_data\target.user_data04.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data05',
   FILENAME = N'M:\MSSQL_data\target.user_data05.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data06',
   FILENAME = N'M:\MSSQL_data\target.user_data06.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB),
  ( NAME = N'user_data07',
   FILENAME = N'M:\MSSQL_data\target.user_data07.ndf',
   SIZE = 1792MB, FILEGROWTH = 32MB)
LOG ON
  ( NAME = N'target_log',
   FILENAME = N'C:\MSSQL_translog\target_log.ldf',
   SIZE = 56MB, FILEGROWTH = 16MB);
GO

ALTER DATABASE [target] SET QUERY_STORE = ON;
ALTER DATABASE [target] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

USE [target];
GO
CREATE SCHEMA [test];
GO
CREATE TABLE [test].[business] (
  [test_id] bigint NOT NULL IDENTITY (1, 1),
  [test_str] nvarchar(256) NOT NULL,
  [test_nr] int NOT NULL
) ON [user_data];
GO

USE [master];
GO


Evaluation

The first measurement is a basic run with the default batch size.

Copy-SqlTable finished with success. Duration = 00:13:28.0940294. [hh:mm:ss.ddd]; 134 308 637 rows; 12.3 GB data
Copy-SqlTable finished with success. Duration = 00:16:12.9162091. [hh:mm:ss.ddd]; BatchSize = 1 000
Copy-SqlTable finished with success. Duration = 00:11:34.3647701. [hh:mm:ss.ddd]; BatchSize = 10 000
Copy-SqlTable finished with success. Duration = 00:10:15.7085043. [hh:mm:ss.ddd]; BatchSize = 100 000
Copy-SqlTable finished with success. Duration = 00:10:00.1098163. [hh:mm:ss.ddd]; BatchSize = 1 000 000
