2015-12-04

SQL Server Errorlog parsing with PowerShell

In a recent Incident I browsed the SQL Server Errorlog, but with a daily cycle of the Errorlog there were quite a few logfiles to browse. This is a description of some ways to parse the SQL Server Errorlog for specific messages.

Ways to read Errorlog

There are several more or less common ways to read the Errorlog. I have collected some with a short description.

  • T-SQL: The stored procedures master.sys.sp_readerrorlog uses the extended stored procedure master.sys.xp_readerrorlog to search for a somewhat simple string. This can be enough in simple situations, but not enough when in need for more detailed information. Also these procedures are not documented by Microsoft, and then not supported.
  • SMO: The method Server.ReadErrorLog() reads a Errorlog file and returns an array of Errorlog lines. This is rather usefull in most cases, but sometimes I need to combine several lines from the Errorlog.
  • SQLPSX: This Codeplex project "SQL Server PowerShell Extensions" has defined the cmdlet Get-SqlErrorLog. The cmdlet is using the SMO method Server.ReadErrorLog() described above. Unfortunately it looks like the project has been dormant since 2011.
  • .NET: The method [System.IO.File]::ReadAllText() reads the entire contents of the Errorlog file into one String. This makes it convenient for combining several lines in the Errorlog in one search filter.
  • PowerShell: The cmdlet Get-Content, especially with the -Raw parameter set, gives a direct acces the the contents of a Errorlog file. I prefer to use the parameter -LiteralPath to the parameter -Path as it ensures the precise path on casing, spaces and everything else.
In general I prefer to use the PowerShell cmdlet Get-Content. Get-Content -Raw can be used to get one string instead of an array of SQL Server Errorlog lines. This makes it easy to match the entire event with one regular expression (regex) pattern.
To match across several lines in the Errorlog, the regular expression pattern includes \s to match both CR&LF and LF. Actually in this case both symbols are in the Errorlog to generate a NewLine in the presentation of the Errorlog.

When the parsing is implemented in Powershell advanced functions the input can be both one SQL Server Errorlog file or a folder holding several Errorlog files. The function will implicit traverse the files in the pipeline through the Process part of the function.
Calling a function with one Errorlog file can be like this
Get-FlushCache -ErrorLogFile 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG.1'
while calling the function on all Errorlog files in a folder can be like this
Get-ChildItem -Path 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\ERRORLOG*' |
Get-FlushCache


A PowerShell advanced function has this general structure
function Get-FlushCache {
<#
.DESCRIPTION
Find FlushCache entries in SQL Server Errorlog and get metrics from the entry.
.PARAMETER
ErrorlogFileName Name of SQL Server errorlog file with full path
.INPUTS
SQL Server Database Engine Errorlog file by full path.
.OUTPUTS
(none)
.RETURNVALUE
Collection of FlushCache events
.EXAMPLE
#>
[CmdletBinding()]
param(
  [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
  [System.IO.FileInfo]$ErrorLogFile
)
Begin {
  $FlushCaches = @() # Array to hold all FlushCache objects that are created on the matches
  $TotalWatch = [System.Diagnostics.Stopwatch]::StartNew()
}

Process {
  "Check file '$($ErrorLogFile.Name)'..." | Write-Verbose
  if ($ErrorLogFile.Exists -eq $false) {
    "File '$($ErrorLogFile.FullName)' does not exist" | Write-Error -Category ObjectNotFound
    return
  }

  "Matching pattern (File size = $($ErrorLogFile.Length) B)..." | Write-Verbose
  [String]$ErrorLog = Get-Content -LiteralPath $ErrorLogFile.FullName -Raw
  [String]$regex = ...

  $MatchWatch = [System.Diagnostics.Stopwatch]::StartNew()
  [MatchInfo]$FlushCacheMatches = $ErrorLog | Select-String -Pattern $regex -AllMatches
  $MatchWatch.Stop()
  if ($FlushCacheMatches -eq $null) {
    "No matches on pattern in '$($ErrorLogFile.Name)' (Match timer = $($MatchWatch.Elapsed.ToString()))." | Write-Verbose
  }
  else {
    "$($FlushCacheMatches.Matches.Count) matches found in '$($ErrorLogFile.Name)' (Match timer = $($MatchWatch.Elapsed.ToString()))." | Write-Verbose
    foreach($Match in $FlushCacheMatches.Matches) {
      $TimeStamp = [System.DateTime]$Match.Groups['timestamp'].Value
      $BufCount = [Int]$Match.Groups['bufs'].Value
      $WriteCount = [Int]$Match.Groups['writes'].Value
      $WriteTime = [Int]$Match.Groups['writetime'].Value
      $BufAvoidCount = [Int]$Match.Groups['bufsavoided'].Value
      $DbId = [String]$Match.Groups['db'].Value
      $AvgWritesPerSecond = [Double]$Match.Groups['avgwrites'].Value
      $AvgThroughput = [Double]$Match.Groups['avgthroughput'].Value
      $IoSaturation = [Int]$Match.Groups['iosaturation'].Value
      $ContextSwitchCount = [Int]$Match.Groups['contxtsw'].Value
      $LastTarget = [Int]$Match.Groups['lasttarget'].Value
      $AvgWriteLatency = [Int]$Match.Groups['avgwritelat'].Value

      $FlushCacheProperties = @{
        TimeStamp = $TimeStamp
        BufCount = $BufCount
        WriteCount = $WriteCount
        WriteTimeMs = $WriteTime
        BufAvoidCount = $BufAvoidCount
        DbId = $DbId
        AvgWritesPerSecond = $AvgWritesPerSecond
        AvgThroughput = $AvgThroughput
        IoSaturation = $IoSaturation
        ContextSwitchCount = $ContextSwitchCount
        LastTarget = $LastTarget
        AvgWriteLatency = $AvgWriteLatency
      }
      $FlushCache = New-Object -TypeName PSObject -Property $FlushCacheProperties
      $FlushCache.PSObject.TypeNames.Insert(0, 'SqlAdmin.FlushCache')
      $FlushCaches += $FlushCache
    }
  }
}

End {
  $TotalWatch.Stop()
  "$($FlushCaches.Count) matches in extract collection (Total watch = $($TotalWatch.Elapsed.ToString()))." | Write-Verbose
  $FlushCaches
}
} # Get-FlushCache()

The contents of the String regex is the regular expression pattern. Each pattern will define different output variables and the function must be defined with matching attributes on the custom object. Some examples on regular expression patterns are shown below.

How to match an event

When the content of a Errorlog file is read, the next step is to find the event(-s) in the file. This can in PowerShell be done somewhat direct with the cmdlet Select-String using the -Pattern parameter with a regular expression as value. The individual parts of the match can be isolated by defining matching groups in the regular expression.
Select-String return a MatchInfo object that contain a collection of Matches. This collection can be traversed with a ForEach statement on a variable that will be a Match object. The value of a match group element is in the Value property of each Group object in the Groups collection.
The group values in a match I put in a Custom Object (PSObject) and the objects are put in a collection. This collection is the basic output of the script.

FlushCache

This event is logged in the SQL Server Errolog like this example:
2013-01-13 12:13:14.37 spid16s     FlushCache: cleaned up 70601 bufs with 2731 writes in 130454 ms (avoided 13308 new dirty bufs) for db 42:0
2013-01-13 12:13:14.37 spid16s                 average writes per second:  20.93 writes/sec
            average throughput:   4.22 MB/sec, I/O saturation: 5536, context switches 12391
2013-01-13 12:13:14.37 spid16s                 last target outstanding: 498, avgWriteLatency 39

This is an example of an event that is logged across several lines in the Errorlog file.

The challenge in this case is immediately two-fold:
  1. Multiple lines in SQL Server Errorlog in one event entry.
  2. Multiple events in one SQL Server Errorlog.
The regular expression pattern used in this case is
[String]$regex = '(?\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2}).{13}' `
+ 'FlushCache: cleaned up (?<bufs>\d+) bufs ' `
+ 'with (?<writes>\d+) writes ' `
+ 'in (?<writetime>\d+) ms ' `
+ '\(avoided (?<bufsavoided>\d+) new dirty bufs\) ' `
+ 'for db (?<db>\d+:\d+)' `
+ '\s+.+average writes per second: (?<avgwrites>\d+\.\d+) writes/sec' `
+ '\s+.+average throughput: (?<avgthroughput>\d+.\d+) MB/sec, ' `
+ 'I/O saturation: (?<iosaturation>\d+), ' `
+ 'context switches (?<contxtsw>\d+)' `
+ '\s+.+last target outstanding: (?<lasttarget>\d+), ' `
+ 'avgWriteLatency (?<avgwritelat>\d+)'

The pattern string is in the script code spread over several lines with one line per regex group. This gives me the oppertunity to describe each grouping pattern.

I/O requests taking longer than 15 seconds

2013-01-13 12:13:14.39 spid17s     SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\tempdb.mdf] in database id 2. The OS file handle is 0x0000000000001178. The offset of the latest long I/O is: 0x000002dc510000
This issue is described in detail in the Microsoft Support article „Diagnostics in SQL Server help detect stalled and stuck I/O operations“.
The regular expression pattern to find these events is
[String]$regex = '(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2}).{13}' `
+ 'SQL Server has encountered (?<occurrence>\d+) occurrence\(s\) of I/O requests taking longer than 15 seconds to complete ' `
+ 'on file \[(?<file>.+)\] ' `
+ 'in database id (?<db>\d+). ' `
+ 'The OS file handle is (?<handle>0x[0-9a-f]+). ' `
+ 'The offset of the latest long I/O is: (?<offset>0x[0-9a-f]+)'


Login failed

This is something I want to look more into later. I mention the event now in a rather incomplete form as a teaser - to myself.
2013-01-13 12:13:14.64 Logon Error: 18456, Severity: 14, State: 38.
2013-01-13 12:13:14.16 Logon Login failed for user 'SQLAdmin\Albert'. Reason: Failed to open the explicitly specified database 'Adventureworks'. [CLIENT: 192.168.0.42]


2013-01-13 12:13:14.24 Logon Error: 18456, Severity: 14, State: 40.
2013-01-13 12:13:14.24 Logon Login failed for user 'Bobby'. Reason: Failed to open the database 'Adventureworks' specified in the login properties. [CLIENT: 192.168.0.66]


This error is described in the blog entry „Understanding "login failed" (Error 18456) error messages in SQL Server 2005“.

Anonymous login

When using Windows Authentication through double-hop like linked server or SharePoint and Kerberos delegation is not implemented it will fail with a entry in SQL Server Errorlog like:
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'

This situation is very well described in SQL Server Protocols Blog: "SQL Linked Server Query failed with "Login failed for user...""

Memory page size - Large Pages

This is not a error, but a Errorlog entry on memory configuration.
By default Large Pages are not set, and this is logged in Errorlog when SQL Server start:
2013-01-13 12:13:14.15 Server Using conventional memory in the memory manager.
When Large Pages is enabled and SQL Server is restarted the Errorlog has a entry like:
???
The regular expression pattern to detect Larges Pages is like
[String]$regex = 'TBD'

Discussion

The PowerShell documentation in TechNet Library is not updated. Actually some of it is wrong like for Get-Content. Use the Get-Help cmdlet.

A more general function that takes regular expression pattern as parameter value could be nice. One thing I have to figure out is how to handle the groups in a regular expression pattern.

As the PowerShell advanced functions can take a folder content as input by the pipeline, it might speed up handling several files if the patterns matching is done in parallel. This might require another structure of the function with some thread handling.

This post will be updated when I have news or better ways to parse SQL Server Errorlog.

Reference

Happy SysAdmin: „Reading large text files with Powershell

No comments: