Data Engineering with Avishkar: 03/01/2010

March 26, 2010

TempDB in SQL server 2005

Tempdb holds all temporary tables and temporary stored procedures. Tempdb will automatically increase and decrease as the data engine performs actions. Using a lot of temp tables and other operations can cause the log to grow,
but if you stopped the service and restarted it, then tempdb is created with
the last size set up.

Scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;

 if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;

 DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;

 DBCC DBREINDEX or similar DBCC commands with 'Sort in tempdb' option set will also potentially fill up tempdb;

 large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;

 any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;

 Queries which show a relatively high value in the WRITES data column of a Profiler trace. Seeing SPOOL operators in the query plan is another sign that SQL Server is performing large numbers of writes to tempdb. The SPOOL operator means that SQL Server is "spooling" an interim result set to a secondary location (tempdb) so that it can do something with that data later in the plan.

 use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.

Identifying tempdb bottleneck using Perfmon counters:

Use following performance counters to monitor tempdb issue in SQL Server.
 Temp Tables Creation Rate (SQL Server 2005): The number of temporary tables or variables that are created per second. The value depends on the workload. If your workload creates many temporary tables and the temporary table cache cannot be used, you may see high values. In such cases investigate why the temporary table cache cannot be used; this might be by design in the application. For example, if there is an explicit DDL after the temporary table is created, the table cannot be cached. In a perfect scenario, if there were 100% temporary table cache hits, this counter would be 0
 Temp Tables For Destruction (SQL Server 2005): The number of temporary tables or variables waiting to be destroyed by the cleanup system thread. The number of temporary tables or variables waiting to be destroyed by the cleanup system thread should be 0. Temporary spikes are expected, but should be rare. Spikes can happen when the temporary table cache or the procedure cache is being cleaned up.

Fixing tempdb full issue:

If tempdb is full, it gives one of the following error:
 1101 or 1105: A session has to allocate more space in tempdb in order to continue
 3959: The version store is full.
 3967: The version store has been forced to shrink because tempdb is full.
 3958 or 3966: A transaction is unable to find a required version record in tempdb.
Restarting SQL Server will re-create tempdb from scratch, and it will return to its usually allocated size. In and of itself, this solution is only effective in the very short term; assumedly, the application and/or T-SQL code which caused tempdb to grow once will likely cause it to grow again.

To shrink tempdb, you can consider using DBCC ShrinkDatabase, DBCC ShrinkFile (for the data or the log file).

Shrinking files is not a recommended practice, since these files will probably grow again. In addition, shrink operations cause data fragmentation. Performing shrink operation on tempdb has the following limitations:
• Shrink operations do not shrink the version store or internal objects. This means that you will not recover space if the internal object or version store page needs to be moved.

Identifying session which might be causing issue and fixing it by killing that session:

If you can't shrink the log, it might be due to an uncommitted transaction. See if you have any long-running transactions with the following command:

DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')

Check the oldest transaction (if it returns any), and see who the SPID is (there will be a line starting with 'SPID (Server Process ID) : '). Use that in the following:

DBCC INPUTBUFFER SPID

This will tell you at least a portion of the last SQL command executed by this SPID, and will help you determine if you want to end this process with:

KILL SPID number

Batch larger heavily-logged operations (especially deletes) that *might* overflow into tempdb into reasonable 'chunks' of rows, especially when joins are involved.

March 22, 2010

Finding Column Dependencies in SQL server 2005

If you need to make small changes to an existing database object, then you will need to do some impact assessment to determine what SQL Server Objects will be affected by your small change.We need to determine column dependencies within database as well as these dependencies across different databases.

Using the SYSCOMMENTS table:
The actual code for views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints and stored procedures are stored in the syscomments table. The column TEXT in the syscomments table contains the actual code for all these objects. Knowing this allows you to write some simple T-SQL code that can scan the syscomments table looking for an actual table column name or database object name that you are planning on changing. By scanning the syscomments table, you will be able to narrow the focus of your impact analysis down to only those objects that might be affected by your change.

e.g
select name
from syscomments c
join sysobjects o on c.id = o.id
where TEXT like '%Titles%' and TEXT like '%Title_Id%' -- we are determining in which objects Title_Id column of Titles table is used.

This query looks for syscomments rows that contain both the string "Title_Id " and "titles" in the same record. With this query, I find all syscomments records that contain both a reference to the table and column that I planned on changing. This query joins the syscomments and sysobjects table so that the object name can be displayed in the results.

Using the sp_depends Stored Procedure:
SQL Sever maintains a database system table named sysdepends. SQL Server uses this table to store object dependencies. This table only contains information about objects that are dependent upon other objects. It does not contain detailed dependency information, such as which attributes of an object are referenced by other objects. Even so, you can use the information in the sysdepends table to help narrow down your impact analysis when you are making a schema change.

To access the sysdepends table information SQL Server provides the sp_depends system stored procedure.
e.g. EXEC sp_depends 'Titles'

Lets say , it return objects dbo.titleview , GetAllTitles

This will return all objects that are dependent on the table name "Titles."

Now run below query to get all objects which use column Title_id.

select distinct object_name(id) from sys.syscomments
where TEXT like '%Titles%' and TEXT like '%Title_Id%'
and object_name(id) in (
'GetAllTitles',
'titleview'
')

Not All Code is Stored in SQL Server
Now keep in mind the syscomments and sp_depends methods may not find all your code that is affected by a schema change. These two methods are only able to scan objects that are stored in SQL Server (views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints, and stored procedures). If you have written your application code in such a way that allows you to issue T-SQL SELECT, UPDATE, INSERT and DELETE statements in code blocks that are not stored in SQL Server, then you will need to use other methods for scanning this code. Hopefully where you store the code for your application outside SQL Server is in a repository or file structure of some kind where you can do a global find or search of some kind.

When you are making a database schema change, it is important to identify all the objects that might be affected by the change. Without doing a complete impact assessment, you run the risk of causing problems when implementing your schema change. Therefore having some easy, and automated method to help identify the objects that will be impacted by your schema change is critical, and a time saver. The next time you are doing some impact analysis associated with a schema change to your database, you might consider using one or all the methods I discussed to help narrow down the objects impacted by your proposed change.

Log File in SQL Server 2005

In SQL Server 2005, each database contains at least one data file and one transaction log file. SQL Server stores the data physically in the data file. The transaction log file stores the details of all the modifications that you perform on your SQL Server database and the details of the transactions that performed each modification. Because the transactional integrity is considered a fundamental and intrinsic characteristic of SQL Server, logging the details of the transactions cannot be turned off in SQL Server.

Recovery Model
Recovery model is the way you want the transactions to be registered into the log file. Since log files include all the transactions in the database, they can be used for recovery. There are 3 types of recovery, here they are, and what they do.

Simple recovery model:

The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount of information in the transaction log. SQL Server truncates the transaction log each time the database reaches a transaction checkpoint, leaving no log entries for disaster recovery purposes.

In databases using the simple recovery model, you may restore full or differential backups only.

Full recovery model:

In the event of a database failure, you have the most flexibility restoring databases using the full recovery model. In addition to preserving data modifications stored in the transaction log, the full recovery model allows you to restore a database to a specific point in time.

The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery model. The only difference is in the way it handles bulk data modification operations. The bulk-logged model records these operations in the transaction log using a technical known as minimal logging. This saves significantly on processing time, but prevents you from using the point-in-time restore option.

Issues due to large transaction log file:

Sometimes, the transaction log file may become very large and you may run out of disk space. When a transaction log file grows until the log file uses all the available disk space and cannot expand any more, you can no longer perform any data modification operations on your database. Additionally, SQL Server may mark your database as suspect because of the lack of space for the transaction log expansion.

To recover from a situation where the transaction logs grow to an unacceptable limit, you must reduce the size of the transaction logs. To do this, you must truncate the inactive transactions in your transaction log and shrink the transaction log file.

When the transaction logs grow to an unacceptable limit, you must immediately back up your transaction log file. While the backup of your transaction log files is created, SQL Server automatically truncates the inactive part of the transaction log. The inactive part of the transaction log file contains the completed transactions, and therefore, the transaction log file is no longer used by SQL Server during the recovery process. SQL Server reuses this truncated, inactive space in the transaction log instead of permitting the transaction log to continue to grow and to use more space.

The backup operation or the Truncate method does not reduce the log file size. To reduce the size of the transaction log file, you must shrink the transaction log file. To shrink a transaction log file to the requested size and to remove the unused pages, you must use the DBCC SHRINKFILE operation. The DBCC SHRINKFILE Transact-SQL statement can only shrink the inactive part inside the log file.

If your database's log file size reaches its limit, you can truncate and then shrink it to its minimum size by using the following commands:

USE DatabaseName
GO
DBCC SHRINKFILE(, 1)
BACKUP LOG WITH TRUNCATE_ONLY
DBCC SHRINKFILE(, 1)

If you've limit your log file size, and it gives an error when it has reached its maximum limit, you can do the following to tackle the issue:
1: Set your database's recovery mode to "SIMPLE". By default, SQL Server 2005 sets a database's recovery mode to "FULL". Now, before changing the recovery mode, you should know what you're willing to recede. When a database's recovery mode is "FULL", it means when you restore or when the database gets corrupted, SQL Server will restore it to a point right before it got corrupted! On the other hand, if you use "SIMPLE" recovery mode, it will simply restore the database to the last point the backup was made. For instance, you took a database's backup at, let's say, 10 am, and your database got corrupted at 5 pm the same day. With FULL recovery mode, you'll be able to restore your batabase to a state it was in right before 5 pm. With SIMPLE recovery mode, you'll be able to restore your database to a state it was in at 10 am!

2: Take periodic backup of your database's log file! When you take its backup, the committed transactions in the log file go in overwrite mode.

When you have your recovery mode to full, you are supposed to make two types of backups:
1.Full backup every ones in a while.
2.Transaction log backups according to transaction log volume to keep the log file small and being able to do a point in time recovery , If you skip this not only will the log file keep on growing, but the only restore you can do will be a restore from the full backup. And in that case you loose all the work from that moment on. If you backup the transaction log as well, you can do a roll forward and restore up until the last transaction log backup.

Schedule a job to take the backup of your log file and schedule it according to transaction log volume.

If you restore a database, a new database is created for you, and a new logfile. As SQL Server is a write ahead RDBMS, every transaction first goes to the log, and only then to disk. So a database in SQL Server will always contain a data file and a log file. Right after the restore, your database will be as at the time of the backup, the logfile will be almost empty. Only when you do a 'special' restore, some info from the logfile can be used. Under normal circumstances, all open transactions that are in the logfile are rolled back after a restore. So actually, there is some logfile data in a full backup ... but 'backup database' statement can never 'empty' the logfile for you !

Eliminate the Log File Completely
Sometimes, we just do not need the big log file. For example, I have a 40GB log file. I am sure I do not need this log file and want to get rid of it completely to free up the hard drive space. The logic is:

Backup DB
Detach DB
Rename Log file
Attach DB
New log file will be recreated
Delete Renamed Log file.

Let’s say, the database name is testDev. In the SQL Server Management Studio,

Highlight the database-> Tasks->Detach..-> Click OK
Go to log file folder -> rename the testDev_log.ldf to be like testDev_log-aa.ldf
Highlight Databases->Attach…-> Click Add -> add the database testDev, highlight the log file and click the ‘Remove’ button. This means you only attach testDev.mdf
After this is done, you can verify the contents of the attached database and then delete the log file
This way we can safely delete the log file and free up the space.

Data Engineering with Avishkar

March 26, 2010

TempDB in SQL server 2005

March 22, 2010

Finding Column Dependencies in SQL server 2005

Log File in SQL Server 2005

Stop Memorizing SQL Part 3 : A Practical SQL Cheat Sheet for Beginners

Search This Blog