April 23, 2009

Schtasks: To create, delete, query, change, run, and end scheduled tasks on a local or remote computer

Schtasks is used to create, delete, query, change, run, and end scheduled tasks on a local or remote computer. Running Schtasks.exe without arguments displays the status and next run time for each registered task.

Creating a Task
The following syntax is used to create a task on the local or remote computer.
schtasks /Create
[/S system [/U username [/P [password]]]]
[/RU username [/RP [password]] /SC schedule [/MO modifier] [/D day]
[/M months] [/I idletime] /TN taskname /TR taskrun [/ST starttime]
[/RI interval] [ {/ET endtime | /DU duration} [/K]
[/XML xmlfile] [/V1]] [/SD startdate] [/ED enddate] [/IT] [/Z] [/F]
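
For example, a command along these lines creates a daily task (the task name, program path, and start time below are illustrative placeholders, not from the original article):
schtasks /Create /SC DAILY /TN "NightlyBackup" /TR "C:\Scripts\backup.cmd" /ST 23:00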

Deleting a Task
The following syntax is used to delete one or more scheduled tasks.
schtasks /Delete
[/S system [/U username [/P [password]]]]
[/TN taskname] [/F]

Running a Task
The following syntax is used to immediately run a scheduled task.
schtasks /Run
[/S system [/U username [/P [password]]]]
/TN taskname

Ending a Running Task
The following syntax is used to stop a running scheduled task.
Note: To stop a remote task from running, ensure that the remote computer has the File and Printer Sharing and Remote Scheduled Tasks Management firewall exceptions enabled.
schtasks /End
[/S system [/U username [/P [password]]]]
/TN taskname

Querying for Task Information
The following syntax is used to display the scheduled tasks from the local or remote computer.
schtasks /Query
[/S system [/U username [/P [password]]]]
[/FO format | /XML] [/NH] [/V] [/TN taskname] [/?]

Changing a Task
The following syntax is used to change how the program runs, or change the user account and password used by a scheduled task.
schtasks /Change
[/S system [/U username [/P [password]]]] /TN taskname
{ [/RU runasuser] [/RP runaspassword] [/TR taskrun] [/ST starttime]
[/RI interval] [ {/ET endtime | /DU duration} [/K] ]
[/SD startdate] [/ED enddate] [/ENABLE | /DISABLE] [/IT] [/Z] }
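
For example, a command along these lines moves a task's start time and re-enables it (again with a placeholder task name):
schtasks /Change /TN "NightlyBackup" /ST 01:30 /ENABLE
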
http://technet.microsoft.com/en-us/library/bb490996.aspx

April 15, 2009

Cascading deletes in SQL Server

SQL Server supports cascading deletes via foreign key constraints created with the ON DELETE CASCADE option. In the following example, after creating the objects and inserting some data, we delete a USR_ID from the parent table (TUSERS). Querying the child table (TUSER_PHONE) a second time shows that the cascading delete worked:

-- Parent table
CREATE TABLE TUSERS
(
    USR_ID int
    ,CONSTRAINT [PK_Temp_Users1] PRIMARY KEY CLUSTERED ([USR_ID])
)

-- Child table
CREATE TABLE TUSER_PHONE
(
    USR_ID int
    ,CONSTRAINT [PK_Temp_Users2] PRIMARY KEY CLUSTERED ([USR_ID])
)

-- Foreign key with the cascading delete rule
ALTER TABLE [dbo].TUSER_PHONE WITH CHECK ADD
    CONSTRAINT [FK_Temp_UsersPhone_Users] FOREIGN KEY([USR_ID])
    REFERENCES [dbo].[TUSERS] ([USR_ID])
    ON DELETE CASCADE
GO
-- Seed both tables, delete one parent row, then check the child table
INSERT INTO TUSERS
SELECT 1 UNION SELECT 2 UNION SELECT 3
INSERT INTO TUSER_PHONE
SELECT 1 UNION SELECT 2 UNION SELECT 3

SELECT * FROM TUSER_PHONE      -- three rows
DELETE TUSERS WHERE USR_ID = 2 -- cascades to TUSER_PHONE
SELECT * FROM TUSER_PHONE      -- two rows: USR_ID 2 is gone
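
To confirm that the foreign key carries the cascade rule, you can query sys.foreign_keys (the constraint name matches the example above):

SELECT name, delete_referential_action_desc
FROM sys.foreign_keys
WHERE name = 'FK_Temp_UsersPhone_Users'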

How to find a record in all tables in a database

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================
-- Author: Avishkar
-- Create date: 04/15/2009
-- Description: Finds the given string in all character columns of all user tables and returns the matches
-- Execute as : Exec SearchStringInAllTables 'avi'
-- =============================================
ALTER PROCEDURE [dbo].[SearchStringInAllTables]
(
    @SearchStr nvarchar(100)
)
AS
BEGIN
    -- Holds one row per matching column value
    CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))

    SET NOCOUNT ON

    DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
    SET @TableName = ''
    -- Wrap the search string in wildcards and quote it for use in dynamic SQL
    SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')

    -- Outer loop: walk every user table in name order
    WHILE @TableName IS NOT NULL
    BEGIN
        SET @ColumnName = ''
        SET @TableName =
        (
            SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE'
                AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
                AND OBJECTPROPERTY(
                        OBJECT_ID(
                            QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
                        ), 'IsMSShipped'
                    ) = 0
        )

        -- Inner loop: walk every character column of the current table
        WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
        BEGIN
            SET @ColumnName =
            (
                SELECT MIN(QUOTENAME(COLUMN_NAME))
                FROM INFORMATION_SCHEMA.COLUMNS
                WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
                    AND TABLE_NAME = PARSENAME(@TableName, 1)
                    AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
                    AND QUOTENAME(COLUMN_NAME) > @ColumnName
            )

            IF @ColumnName IS NOT NULL
            BEGIN
                -- Search the current column with dynamic SQL and collect any matches
                INSERT INTO #Results
                EXEC
                (
                    'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
                    FROM ' + @TableName + ' (NOLOCK) ' +
                    ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
                )
            END
        END
    END

    SELECT ColumnName, ColumnValue FROM #Results
END

SQL Server - Fragmentation

Even though SQL Server is very good at more or less automatically maintaining most statistics, physical data structures, and configuration settings (especially compared to some of its competitors), there are still situations where you need to do a little manual work to help it. One of those areas is fragmentation.

Different types of fragmentation
Fragmentation occurs as data is modified. There are two major kinds of fragmentation to distinguish: disk fragmentation and SQL Server fragmentation.

Logical disk fragmentation:
The database files can be logically fragmented in the file system just like any other operating system files. This occurs when the file system cannot create the file in one logically contiguous space. The result is that the disk head must move back and forth over the disk when reading from the file. To remove this type of fragmentation you can use the defragmentation tools included in Windows or some third-party application. Remember, though, that SQL Server must be stopped during the defragmentation process, because once it is started the database files are in use by SQL Server, even if no user is actually using that specific database, and files in use cannot be defragmented.

Avoiding fragmentation of database files:
To keep fragmentation in the database files to a minimum without defragmenting them, avoid relying on the autogrow setting in SQL Server. Preallocate enough size for them from the start, and when growth is still needed, increase the size in large chunks at once. Also make sure you keep the database files on separate disks, away from other files, so that logically contiguous space is available for them. Separating the data files from the log file is helpful and also important for performance in other ways, since the log file is written to sequentially.
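
As a sketch of that advice, the following presizes a data file and sets a fixed growth increment (the database and logical file names are placeholders):

ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_Data, SIZE = 20GB, FILEGROWTH = 1GB)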

SQL Server index fragmentation:
The other major type of fragmentation, and the one that is most interesting to us, is SQL Server fragmentation. This means that the data stored inside the database files is fragmented at the index level, regardless of whether the database file itself is logically fragmented. Put briefly, it prevents SQL Server from using indexes in an optimal way. Index fragmentation can be either internal or external. Note that external fragmentation is not the same thing as logical disk fragmentation, though the name might imply it. Each of these types is described in detail below.

Internal fragmentation:
Pages that have a lot of free space are said to be internally fragmented. This is caused by rows being removed by DELETE statements, or by pages being split and left only about half full. Empty space on pages means there are fewer rows per page, which in turn means more page reads. Read-intensive systems can see a significant degradation in performance from these extra reads. Even for applications defined as OLTP (OnLine Transaction Processing) systems, i.e. write-intensive systems, the ratio between reads and writes is usually more than 80/20.
Sometimes necessary and wanted:
However, page splits are very expensive operations. Sometimes trading extra page reads during read operations for fewer page splits during write operations is the right choice. For systems that really are write-intensive, or perhaps for specific operations that generate many writes, this might be necessary. There is no universally correct advice here, as it differs completely between applications and systems.

External fragmentation:
When tables grow as new data is inserted into them, the ideal situation is that pages get allocated to an extent, then when that extent is filled up a new extent is allocated and filled with pages, and so on. That way all the pages are always located adjacent to each other, and reading eight pages will, in the best case, only require reading one extent from disk.

Reality, however, is seldom ideal. New rows, when inserted, are bound to cause page splits sooner or later, as will updates that cause variable-sized rows to grow too large to fit on the original page. When a new page is created from a page split, it will be allocated in the same extent as the original page if there is room for it. If the extent is already full, which will normally be the case in a production system, a new extent is allocated to the index and the new page is placed there. The new page will not be contiguous to the original page, so when following the page chain using the previous and next pointers in the page headers, SQL Server will need to perform an extra extent switch. In a worst-case scenario, reading eight pages in order might require seven extent switches, and eight extents must be read from disk.

Ordered scans are affected:
When reading individual rows external fragmentation will not affect the performance at all, and the same goes for unordered scans. The reason for the latter is that unordered scans can use the IAM pages to find which extents need to be fetched in a very efficient way. It is only for scans that need to fetch the pages in logical order that the page chain is used and external fragmentation might become a degrading factor for performance.

Viewing fragmentation information:
To view how much fragmentation an index has you use DBCC SHOWCONTIG. It can show information for all indexes on a table or just a single index at a time. Using the option TABLERESULTS you get extra columns in the output that describe statistics about the index. By default DBCC SHOWCONTIG scans the page chain at the leaf level, but it is possible to scan all levels of an index.
When reviewing the output from DBCC SHOWCONTIG you should pay special attention to the following metrics:
· Avg. Page Density (full): Shows how full the pages are on average. A high percentage means the pages are almost full, while a low one indicates a lot of free space. This value should be compared to the fill factor specified when the index was created to decide whether or not the index is internally fragmented.
· Scan Density: Shows the ratio between the Best Count of extents that should be necessary to read when scanning all the pages of the index, and the Actual Count of extents that were read. This percentage should be as close to 100% as possible. Defining an acceptable level is difficult, but anything under 75% definitely indicates external fragmentation.
· Logical Scan Fragmentation: Shows the ratio of pages that are out of logical order. The value should be as close to 0% as possible and anything over 10% indicates external fragmentation.

Two things to know about the output of DBCC SHOWCONTIG and the numbers for Scan Density, Logical Scan Fragmentation and Extent Scan Fragmentation:
1) If your indexes are small, these values do not matter much at all. Why? Because SQL Server has two types of extents: uniform extents and mixed extents. Uniform extents are owned by a single object, whereas mixed extents are shared by several smaller objects. In the case of a uniform extent, all 8 pages of the extent (an extent = 8 pages = 64 KB) are allocated to one single object. That is why, for smaller objects, you will see that the scan density is low and the extent scan fragmentation is high, since the object may be using, say, a single page in each of N extents. If the objects are large, SQL Server will be using uniform extents, and you will see a scan density closer to 100% if there is no fragmentation.
2) Another thing to note is that the extent scan fragmentation value shown in the output of DBCC SHOWCONTIG does not represent the true value if there are multiple files in the filegroup that the index belongs to. This is also documented in Books Online (BOL).
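
A minimal sketch of the command, checking all indexes on a single table and returning the results as a table (the table name is a placeholder):

DBCC SHOWCONTIG ('dbo.Orders') WITH TABLERESULTS, ALL_INDEXES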

Defragging:
There are four different ways to defragment an index (see the T-SQL sketches after this list).
  • Drop and recreate index. Pros: Completely rebuilds the index. Reorders and compacts the index pages, and removes unnecessary pages. Cons: The index is not available during the process, which means that queries will suffer. The process blocks all queries accessing the index and is itself blocked by other processes using the index, unable to do its work until those processes are finished. This option is especially bad for clustered indexes, as dropping a clustered index means that all non-clustered indexes must be rebuilt (to use the RID as their bookmark), and then recreating the clustered index will once again force a rebuild of all non-clustered indexes.
  • CREATE INDEX ... WITH DROP_EXISTING. Pros: Rebuilds the index in one step, which is good for clustered indexes, as it means that non-clustered indexes do not need to be rebuilt. Can be used to recreate (and thereby defragment) indexes created by constraints, if the index definition matches the requirements of the constraint. Cons: Potential blocking problems with other processes, in the same way as for drop and recreate above.
  • DBCC DBREINDEX. Pros: Does the best job of removing internal and external fragmentation, since it physically assigns new pages to the index and moves data to them. Can rebuild all indexes on a table with one statement. Recreates indexes for constraints without forcing you to know their requirements. Cons: The same blocking issues as the two options above. Runs in a transaction, so all work that has been done is rolled back if the command is stopped.
  • DBCC INDEXDEFRAG. Pros: Reorders the leaf pages in index order – enhancing performance for scans especially – and compacts them using the fill factor specified when the index was created. Empty pages are removed. This is an online operation, meaning it does not take long-term locks and thereby neither blocks other processes nor gets blocked by them. Work that has already been done remains done if the command is cancelled. Cons: The fact that this is an online operation can also have a negative impact, as it means that DBCC INDEXDEFRAG simply skips pages that are being used by other processes. Very fragmented indexes will take a long time to defragment, so they should probably be rebuilt instead if possible. Generates a lot of log activity.
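
As referenced above, here are hedged T-SQL sketches of the four approaches, using placeholder table and index names (dbo.Orders, IX_Orders_CustomerID) rather than objects from the original post:

-- 1) Drop and recreate the index
DROP INDEX IX_Orders_CustomerID ON dbo.Orders
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID)

-- 2) Rebuild in one step with DROP_EXISTING
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID) WITH DROP_EXISTING

-- 3) Rebuild all indexes on the table, here with a fill factor of 90
DBCC DBREINDEX ('dbo.Orders', '', 90)

-- 4) Online defragmentation of a single index in the current database (database id 0 = current)
DBCC INDEXDEFRAG (0, 'dbo.Orders', 'IX_Orders_CustomerID')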

April 14, 2009

SQL Server 2005- Indexes

If you have a primary key on the table, a clustered index will be created for it by default. For the other columns that are widely used in queries, particularly in the WHERE clause, you can create non-clustered indexes. But the set of indexes should be balanced, and you need to optimise it by testing. More indexes mean slower inserts and higher maintenance costs.

You need information about several things before deciding on the indexes. Some of them are:
Volume of data in the table (important),
Type of queries that are going to hit the table (important),
Purpose of the data, connectivity, concurrent connections, server hardware, and so on.
There are other considerations as well, but offhand I can think of these. Of these, the first two are the most important, and index design is influenced the most by them.
A clustered index keeps the actual data in the leaf pages of the index in sorted order. In many cases the clustered index also functions as the primary key/unique index, so you would want your most "selective" columns to be part of the clustered index.

Best practices are:
* Frequently used tables should have a clustered index.
* There is no need to create an index on tables with few rows, or on columns with many duplicates (like country, gender etc), with the exception that if such a column is filtered on often it may still be very beneficial to index it even if there are lots of duplicates. The index may need to be covering for SQL Server to use it.
* Avoid duplicate index sets like (col1, col2, col3) and (col1, col3, col2). To count as duplicates they must have the same columns in the same order; there are queries that could effectively use the first of those indexes but not the second, and vice versa.
* Create indexes on columns which are frequently used in WHERE clauses and JOINs.

Guidelines for selecting the clustered index key:
* Columns that contain a large number of unique values.
* Queries that return a range of values.
* Columns that are accessed sequentially.
* Queries that return large result sets.
* Columns that are not updated frequently.

Guidelines for selecting a non-clustered index key:
* Queries that do not return large result sets.
* Columns frequently involved in WHERE clauses with an exact match.
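
As a small illustration of these guidelines, the following hedged sketch uses placeholder names: the clustered index lives on the selective primary key, and a non-clustered index covers a column that is filtered on frequently.

CREATE TABLE dbo.Orders
(
    OrderID int NOT NULL CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,   -- clustered index on the key
    CustomerID int NOT NULL,
    OrderDate datetime NOT NULL
)

-- Non-clustered index for frequent WHERE/JOIN filters on CustomerID
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID)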

SQL Server 2005 Blocking

Database queries should be able to execute concurrently without errors and within acceptable wait times. When they don't, and when the queries behave correctly when executed in isolation, you will need to investigate the causes of the blocking. Generally, blocking is caused when a SQL Server process is waiting for a resource that another process has yet to release. These waits are most often caused by requests for locks on user resources. SQL Server 2005 adds some important new tools to this toolkit. These tools include:
Enhanced System Monitor (Perfmon) counters
DMVs: sys.dm_os_wait_stats, sys.dm_os_waiting_tasks and sys.dm_tran_locks
The Blocked Process Report in SQL Trace
The SQLDiag utility
In System Monitor, the Processes Blocked counter in the SQLServer:General Statistics object shows the number of blocked processes. The Lock Waits counter from the SQLServer:Wait Statistics object can be added to determine the count and duration of the waiting that is occurring. The Processes Blocked counter gives an idea of the scale of the problem, but only provides a summary, so further drill-down is required. DMVs such as sys.dm_os_waiting_tasks and sys.dm_tran_locks give accurate and detailed blocking information.
The sys.dm_os_waiting_tasks DMV returns a list of all waiting tasks, along with the blocking task if known. There are a number of advantages to using this DMV over sp_who2 or the sysprocesses view for detecting blocking problems:
The DMV only shows those processes that are waiting.
sys.dm_os_waiting_tasks returns information at the task level, which is more granular than the session level, and information about the blocker is also shown.
sys.dm_os_waiting_tasks returns the duration of the wait, enabling filtering to show only those waits that are long enough to cause concern.
The sys.dm_os_waiting_tasks DMV returns all waiting tasks, some of which may be unrelated to blocking and be due to I/O or memory contention. To refine your focus to only lock-based blocking, join it to the sys.dm_tran_locks DMV.
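
A minimal sketch of that join, listing tasks that are blocked on locks together with their blockers (the selected columns are just one reasonable choice):

SELECT
    wt.session_id,
    wt.blocking_session_id,
    wt.wait_duration_ms,
    wt.wait_type,
    tl.resource_type,
    tl.request_mode
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_tran_locks AS tl
    ON wt.resource_address = tl.lock_owner_address
WHERE wt.blocking_session_id IS NOT NULL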

SQL SERVER - Isolation levels

Whenever a statement is executed, or a data modification is made, it runs under the influence of an isolation level. The isolation level directly impacts the performance of SQL Server, along with the integrity of the data.

Essentially, an isolation level determines the degree to which a data modification is isolated from another data modification, or a database modification is isolated from a statement (such as a SELECT statement), or how a statement is isolated from another statement.

More specifically, isolation levels are used to control the following:
Whether any locks are taken when data is read, and what kind of locks are being requested.
How long read locks are kept.
And last, whether a read operation acting on a row that is being modified (1) blocks until the row is unlocked, (2) retrieves the committed version of the row from before the start of the modification, or (3) reads the uncommitted data written by the modification.
Traditionally, SQL Server has supported four isolation levels:
Read Uncommitted: This is the lowest isolation level. It only isolates transactions and activities to ensure that physically corrupt data is never read. It allows dirty reads, nonrepeatable reads, and phantom reads.
Read Committed: This isolation level does not permit dirty reads, but does allow nonrepeatable reads and phantom reads. This is the default isolation level for SQL Server, and is used for each connection to SQL Server unless one of the other isolation levels has manually been set for a connection.
Repeatable Read: This isolation level does not permit dirty reads or nonrepeatable reads, but does allow phantom reads.
Serializable Read: This is the highest isolation level and ensures that all transactions and statements are completely isolated from each other. It does not allow dirty reads, nonrepeatable reads, or phantom reads.
Read Uncommitted provides the best concurrency, but the least data integrity. On the other extreme, Serializable Read provides the worst concurrency, but the highest level of data integrity. As you know, the higher the concurrency, the better SQL Server performs, and the lower the concurrency, the worse SQL Server performs. As a DBA, it is your job to balance the needs between data integrity and concurrency by selecting an appropriate isolation level.
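
A connection chooses one of these levels with SET TRANSACTION ISOLATION LEVEL; for example (a sketch with a placeholder table name):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
    SELECT * FROM dbo.Orders WHERE OrderID = 1
    -- under REPEATABLE READ the shared locks from this read are held until the transaction ends
COMMIT TRAN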

In SQL Server 2005, two new isolation levels are introduced, both of which use row versioning. They include:
READ_COMMITTED_SNAPSHOT (statement level)
ALLOW_SNAPSHOT_ISOLATION (transaction level)
The purpose of these new isolation levels is to give DBAs or developers more granularity in their choice of isolation levels, with the goal of improving read concurrency. When read concurrency is increased, there are fewer locks to be managed by SQL Server. This results in less locking resources required by SQL Server, along with less blocking. This in turn boosts SQL Server's performance.
When a row versioning-based isolation level (which includes the two new ones we are now discussing) is enabled at the database level, the database engine maintains versions of each row that is modified (for an entire database). Whenever a transaction modifies any row, an image of the row before the modification is copied into a page of what is called the version store. The version store is located in the tempdb database and is used for temporary storage of versioned rows for all of the databases on a single SQL Server instance.
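
Both options are switched on at the database level; a hedged sketch with a placeholder database name (turning READ_COMMITTED_SNAPSHOT on normally requires that no other connections are active in the database):

ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON

-- With ALLOW_SNAPSHOT_ISOLATION enabled, a session can then opt in per transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT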

April 08, 2009

Deadlock

A deadlock is different from blocking, which is what I think you mean when you say that " ... The update SPs are fairly simple and should update fast enough to avoid a deadlock [block] ..."
A block is not a deadlock; rather, a deadlock is a special kind of stalled block. A deadlock doesn't time out: SQL Server detects deadlocks, chooses a victim and kills it, allowing the other transaction to proceed - hopefully before it times out.
So what is a deadlock? It is a mutual thing, where generally two processes each hold a resource and want a resource that the other has (and, like kids, won't budge!).
What causes deadlocks is (generally) insert/update code that accesses the tables in different orders.
If all transactions update tables in the same order, then you cannot have deadlock. One transaction may need to wait for the other, but they won't both be waiting on each other.
To see a little more, run these traces:
DBCC TRACEON (3604)
DBCC TRACEON (1204)
You will need to do this each time the server restarts.
Now in the SQL Error Log, you have far more details from both processes.
C:\Program Files\Microsoft SQL Server\MSSQL\LOG

Deadlocks aren't caused by people/systems accessing the same data at the same time. They're caused by situations where processes cannot resolve locks.
For example, process A has to lock table X, and then lock table Y, while process B has to lock table Y, and then table X. If A has a lock on X and wants a lock on Y, but B has a lock on Y and wants a lock on X, neither one can finish, thus a "deadlock" occurs, and SQL kills one of them.
The main way to solve that is to make sure that processes access and lock data in the same sequence as each other. No deadlocks that way.
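
A minimal sketch of that rule, with placeholder table names: if every transaction touches TableX before TableY, two sessions may block each other briefly, but they cannot deadlock on this pair of tables.

BEGIN TRAN
    UPDATE dbo.TableX SET Amount = Amount + 1 WHERE ID = 3
    UPDATE dbo.TableY SET Amount = Amount + 1 WHERE ID = 3
COMMIT TRAN
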
I'd definitely analyze the deadlocks, find what's causing them, and fix that, rather than just taking the "nolock shotgun" and hoping you hit something with it. Using nolock as a solution for deadlocks is just asking for dirty reads and junked up data. Lots of people are taught to use nolock for this kind of thing, but aren't made aware of the fact that it can result in serious data problems.
For example, user A takes a look at row 3 of table X. Meanwhile, process B updates that row, but user A never sees the update, because of nolock and such. So user A updates the row, overwriting what process B did. Now, is that row of data correct? Process B's update is gone, as if it had never been done, and user A may have wanted a different update than the one he did, if he knew the data was being changed by process B. It can make a big mess.
