Techno Solutions for BIG Data Analytic and SQL Server

Thursday 27 December 2012

Rowset functions in SQL Server 2012

SQL Server provides many built-in functions. Let’s see one of the types called Rowset functions. We can also create our own user defined functions in SQL Server based on the end-user requirement.

Rowset functions can be used to access remote data through OLEDB provider. It can be used like table reference as in T-SQL Statements. It does not always return the same result set all the time even though it has same input values. That’s the reason rowset function is called as non-deterministic functions.

It is categorized into 4 types.

1. OPENDATASOURCE

It is used to access data from remote OLEDB data sources.

2. OPENQUERY

It executes the given query through specified Linked server which is an OLE DB data source.

3. OPENROWSET

It is used to access data from remote OLEDB data sources.

4. OPENXML

It is used to provide a rowset view for an XML document.

We will see detailed information about Rowset functions in the upcoming articles.

Wednesday 26 December 2012

What is Write-Ahead Transaction Log in SQLSERVER?

Write-Ahead Transaction Log (WAL) makes sure that log record for modified data pages are written to the disk before actual data pages.

Let’s see data flow process in SQL Server.

1) When data modification occurs, initially it is written to the buffer cache.

2) While checkpoint occurs, first associated log record for data modification is written to the disk. Log records are written to the disk when the transaction is committed.

3) And then actual modified data pages are written to the disk from the buffer cache. This operation is called flushing the page. A page modified in the buffer cache, but not yet written the disk is called dirty page.

Use of WAL in SQLServer

It is helpful for data recovery process. We can roll back the transaction without any issue due to WAL mechanism.

If modified data page (dirty page) is flushed before the associated log record is written, the dirty page creates a modification on the disk that cannot be rolled back if the server fails before the associated log record is written to disk. SQL Server has mechanism which prevents a dirty page from being flushed before the associated log record is written.

Tuesday 18 December 2012

Database Checkpoint in SQL Server

SQL Server database engine does not directly write data page modifications in memory [buffer cache] to the disk. Instead it issues database checkpoint. Each time while database engine issues checkpoint, it writes modified data pages [dirty pages] and its transaction log information from memory to disk. And also it records checkpoint information to transaction log.

Type of Database Checkpoints

1. Automatic Checkpoint

a. It is issued automatically from the background based on the [recovery interval] defined.

b. [Recovery Interval] parameter is specified in server level.

c. Default [recovery interval] value for automatic checkpoint is 0(zero) in which target recovery interval is 1 minute.

d. Query to define the [recovery interval]

EXEC SP_CONFIGURE '[recovery interval]','seconds'

2. Indirect Checkpoint

a. It is issued from the background based on the [TARGET_RECOVERY_TIME] defined in the database.

b. [TARGET RECOVERY TIME] parameter is specified in database level.

c. Default [Target Recovery Time] value is 0(zero) which causes automatic checkpoint.

d. Query to define the [TARGET RECOVERY TIME]

ALTER DATABASE [database name] SET TARGET_RECOVERY_TIME = [target recovery time] {SECONDS | MINUTES}

Find more details about Indirect Checkpoint here.

3. Manual Checkpoint

a. It is issued when user executes T-SQL Command CHECKPOINT.

b. [Checkpoint duration] parameter is specified in database level.

c. [Checkpoint duration] parameter specifies requested amount of time in seconds for the checkpoint to complete.

d. Query to issue Manual checkpoint

CHECKPOINT [ checkpoint duration ]

4. Internal Checkpoint

a. It is issued internally by database engine during various server operations such as database backup, snapshot creation etc.

Transaction Log in SQL Server

Transaction log files are most important in SQL Server database. It records all the transactions and the database modifications made by each transaction that occurs in the database.

Transaction Log Truncation

We should closely monitor the transaction log files growth to prevent it from filling-up. Periodical truncation of transaction log files is useful for freeing used space in log files by deleting inactive virtual log files from logical transaction log of the SQL Server database. SQL Server also automatically truncates the transaction log files on below event.

1. After check point occurs, under the simple recovery model.

2. If checkpoint has occurred since the previous backup, then truncation occurs after the transaction log backup, under the Full or Bulk-logged recovery model. Otherwise it is copy-only transaction log backup.

SQL Server cannot truncate the transaction log records as long as it remains active. We can identify the reason for the transaction log truncation delays from the below query.

SELECT

name,

log_reuse_wait,

log_reuse_wait_desc

FROM

sys.databases

Find below the different factors that are defined through log_reuse_wait and log_resue_wait_desc, for the log truncation delays.

0 - NOTHING [Currently there are one or more reusable transaction log files].

1 - CHECKPOINT [No checkpoint has occurred since the last log truncation].

2 - LOG_BACKUP [Log backup is required before the log truncation. It is applicable only for Full or Bul-Logged recovery models].

3 – ACTIVE_BACKUP_OR_RESTORE [Data backup or restore is in progress].

4 – ACTIVE_Transaction [Transaction is still active – it might be because of long running transaction or deferred transaction. Deferred transaction is nothing but a active transaction whose rollback is blocked due to some unavailable resources].

5 – DATABASE_MIRRORING [Database mirroring is paused or mirroring is under high performance mode].

6 – REPLICATION [During transactional replication, publisher’s transaction is still undelivered to the distributor].

7 – DATABASE_SNAPSHOT_CREATION [A database snapshot is being created].

8 – LOG_SCAN [Log scan is occurring].

9 – AVAILABILITY_REPLICA [Secondary replica of availability group is applying transaction log records to its secondary database].

10 – For internal use only.

11 – For internal use only.

12 – For internal use only.

13 – OLDEST_PAGE [If Indirect Checkpoint is configured in a database, the oldest page on the database might be older than the checkpoint LSN. In this case, the oldest page can delay log truncation].

Log truncation does not reduce the size of the physical transaction log file. We should go for database shrink to reduce the physical transaction log file size.

Importance of Transaction log

1. It plays vital role during recovery process which may be either individual transaction recovery or point-in-time database recovery.

2. It supports in high availability solutions such as Database Mirroring, Log shipping, Transaction replication and AlwaysOn Availability group.

Monday 17 December 2012

Parallel Plan Execution in SQL Server

Parallel plan execution is useful to increase the task performance, as one task is performed by more than one processor in parallel. Let’s see how parallel execution process is implemented in SQL Server.

SQL Server that runs on a computer which has more than one microprocessor or CPUs, and then we can define that how many system processors can be used to execute single SQL query statement

Max Degree of Parallelism – option is used to define the number of processor usage for parallel plan execution. And MAXDOP Query hint can be used to override the Max Degree of parallelism execution.

Setting Maximum Degree of Parallelism - option to 0[default value] allows SQL Server to use all the available processors up to 64 processors. To defeat the parallel plan execution, set max degree of parallelism to 1. To restrict the maximum number of processors used by a single query execution then set the value to a number greater than 1.

We can either use SQL Server Management Studio or T-SQL Script to configure parallel plan execution in SQL Server.

Steps to configure through SQL Server Management Studio

1. In Object Explorer, right-click a server and select Properties.

2. Select the Advanced node.

3. In the Max Degree of Parallelism box, select the maximum number of processors to use in parallel plan execution.

Refer below screen shot for the reference.

Steps to configure through T-SQL Script

Let’s use below script to configure through T-SQL.

USE [master];

EXEC sp_configure 'show advanced options', 1;

RECONFIGURE WITH OVERRIDE;

EXEC sp_configure 'max degree of parallelism', 8;

RECONFIGURE WITH OVERRIDE;

Refer below the screen shot for the test results.