January 22, 2010

Pros and Cons of Stored Procedures

* Stored procedures generally result in improved performance because the database can optimize the data access plan used by the procedure and cache it for subsequent reuse.
* Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
* Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
* Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.
* Stored procedures can reduce network traffic, because SQL statements can be executed in batches rather than sending multiple requests from the client.


The advantages of using stored procedures are:

1) Network bandwidth: Assume that the application server(s) and the database server(s) are separate machines. Since the source code is stored on the database server, the application only needs to send the procedure name and its parameters, which reduces the amount of data sent to the server. With embedded SQL, or SQL generated dynamically through an ORM, the full text of the commands must be transmitted; under heavy concurrent load with many users, this can quickly become a performance bottleneck. This can be mitigated in part by judicious use of views as well.

2) Abstraction layer: Stored procedures help separate the business logic from the data-manipulation logic. Since the interface to the application remains the same, changes made internally to the stored procedures/packages do not affect the application, which in turn makes deploying changes easy.

3) Simplified batching of commands. Since stored procedures/packages are meant to perform a "unit of work", this procedural logic can be simple to maintain and offers additional advantages, such as access to the rich functionality that the database engines provide. SQL is a set-based language, and set-based procedural logic is the easiest and most performant way of dealing with the data. With every new release of Oracle, SQL Server, or DB2 LUW, new features are introduced in PL/SQL, T-SQL, and/or SQL PL that make it very easy to handle different types of requirements in the DB-layer code.

4) Increased data security and integrity: One can secure the tables against direct access and grant privileges only on the stored procedures/packages.

5) Using bind variables (parameterized queries) also mitigates SQL injection issues.
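
As an illustration of point 5, here is a minimal sketch of the difference between concatenating user input into SQL and binding it as a parameter. SQLite (via Python's sqlite3 module) stands in for the database engine here, and the users table and its contents are made up for the example:

```python
import sqlite3

# Hypothetical single-table schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO users (name) VALUES ('bob')")

# Malicious input that would break a string-concatenated query.
user_input = "alice' OR '1'='1"

# Unsafe: string concatenation; the injected predicate matches every row.
unsafe = conn.execute(
    "SELECT count(*) FROM users WHERE name = '" + user_input + "'"
).fetchone()[0]

# Safe: the bind variable is treated as a literal value, never as SQL text.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (user_input,)
).fetchone()[0]
```

The concatenated query lets the injected OR '1'='1' predicate match every row, while the bound parameter is treated purely as a literal string that matches nothing.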

6) Code instrumentation and tracing logic can be built in very easily using stored procedures. This is one thing that we implemented for one of our clients recently. We created a table listing the DB code in use in the schema; this table had a trace_flag column that could take one of five values:

0 (no code instrumentation),

1 (log the sequence of events),

2 (log the sequence of events and the time taken by those SQL statements),

3 (log the sequence of events, the time taken, and the execution plan captured at that point in time, since the execution plan under load can easily differ from the plan you see when running the statement separately), and

4 (start an engine trace; for example, a 10046 level 12 trace in the case of Oracle).

Using this logic, code instrumentation and troubleshooting of production issues became very easy. One could then run reports against the logged data and present them to the end user or the support personnel. Code instrumentation can be done in the application tier as well using the same guidelines (or using logging frameworks like the MSFT logging application block in .Net), and a good programmer will always instrument their code. For DB code, however, this instrumentation is much easier to implement.
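
The trace_flag scheme above can be sketched roughly as follows. This is only an illustration: SQLite stands in for the real engine, the table, column, and procedure names (db_code_trace, event_log, purge_orders) are hypothetical, and flags 3 and 4 (execution plans, engine-level traces) are engine-specific and omitted:

```python
import sqlite3
import time

# Hypothetical instrumentation tables; names are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_code_trace (proc_name TEXT PRIMARY KEY, trace_flag INTEGER)")
conn.execute("CREATE TABLE event_log (proc_name TEXT, event TEXT, elapsed_ms REAL)")
conn.execute("INSERT INTO db_code_trace VALUES ('purge_orders', 2)")  # 2 = events + timings

def run_step(proc_name, step_name, step_fn):
    """Run one step of a DB procedure, logging according to its trace_flag."""
    flag = conn.execute(
        "SELECT trace_flag FROM db_code_trace WHERE proc_name = ?",
        (proc_name,)).fetchone()[0]
    start = time.perf_counter()
    result = step_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if flag >= 1:
        # Flag 1 logs the event only; flag 2 adds the elapsed time.
        conn.execute("INSERT INTO event_log VALUES (?, ?, ?)",
                     (proc_name, step_name, elapsed_ms if flag >= 2 else None))
    return result

run_step("purge_orders", "count rows", lambda: conn.execute("SELECT 1").fetchone())
rows = conn.execute("SELECT proc_name, event FROM event_log").fetchall()
```

In the real implementation the lookup and the logging insert would live inside the stored procedure itself, so support staff can change a trace_flag value and immediately get more (or less) detail without redeploying anything.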

Cons of using stored procedures:

1) If your application has to run on multiple RDBMS (for example, you are a vendor and need to ship a product that runs on Oracle, SQL Server, and DB2 LUW to expand your customer base), then you have to write code and apply fixes in three different code bases. Not only that, you need proper staffing to ensure the code written is optimal, since the locking and concurrency mechanisms are very different between these RDBMS. The languages used by the "big 3" are very different as well.

2) We have seen client shops that offload all of their CRUD operations onto the DB tier. As a result they end up with one-liner stored procedures: with, say, 200 objects, you now have 200 * 4 (select/insert/update/delete) stored procedures, or one procedure per object with a flag parameter to indicate the operation, in which case the procedure must use dynamic SQL to handle the conditional parameter logic. Maintenance becomes a nightmare in this case. Also, developers/DBAs sometimes get carried away with this approach and forget that SQL is a set-based language. One example: using this scheme, a client shop was purging data by repeatedly calling the delete procedure with one ID (surrogate key) value at a time, when they could have purged and archived the data using set-based logic. Procedures are supposed to do a unit of work; having one-liner procedures and invoking that many calls in a batch yields no benefit. In addition, each one-liner call unnecessarily incurs the cost of checking permissions and looking up the plan for that procedure, albeit a very minuscule cost.
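
The difference between the row-at-a-time purge described above and a set-based purge can be sketched like this (SQLite stands in for the engine, and the orders table and its contents are invented for the example):

```python
import sqlite3

# Hypothetical orders table: half the rows are 'archived' and due for purging.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, 'archived' if i % 2 == 0 else 'active') for i in range(1000)])

# Row-at-a-time purge: one DELETE call per surrogate-key value, so 500
# separate statement executions (plus 500 permission/plan checks in a
# real engine calling a one-liner procedure each time).
for (order_id,) in conn.execute(
        "SELECT id FROM orders WHERE status = 'archived'").fetchall():
    conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))

# Refill with another batch of archived rows, then purge them set-based:
# a single statement the engine can optimize as one unit of work.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, 'archived') for i in range(1000, 1500)])
deleted = conn.execute("DELETE FROM orders WHERE status = 'archived'").rowcount
remaining = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
```

Both approaches remove the same rows; the set-based statement just does it in one round trip and gives the optimizer the whole job at once.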

3) String parsing is not one of the fortes of the DB languages; that code is better suited to the application tier, unless you start using CLR or Java code in the database.

So, the bottom line is that one needs to carefully evaluate which portion(s) of the application really belong in stored procedures/packages. In applications that work on large volumes of data and do bulk data processing, it is advisable to keep that logic in stored procedures/packages on the DB layer, so that one can take advantage of the set-based nature of SQL.

One can also use an ORM (Object Relational Mapper) like Hibernate as the basis of the data access strategy, and then extend it to call database stored procedures where desired, use externalized SQL statements, have the ORM prepare dynamic SQL statements, etc. One just has to make the right decision based on the application and its requirements.

Another thing to remember: when people point out that an advantage of stored procedures is that the code is always pre-compiled, that is not always true; there are scenarios that can lead to re-compilation. Also, if proper bind variables are used for queries built through an ORM, they serve the same purpose (i.e., they have the same advantage as a parsed/compiled stored procedure query), since the plan for that SQL statement is parsed and compiled. Depending on many factors (cache sizes, concurrent load, statistics updates, etc.), that plan may or may not still be available the next time the call is made.

A good mix of ORM plus DB code is usually the best compromise to get the best of both worlds. Deciding what goes where is the key, and though there are guidelines for that, it can vary from application to application depending on the requirements.

Thanks for the article to Coding Horror: programming and human factors, by Jeff Atwood.
For more information:
Please visit http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html

January 06, 2010

Maximum Capacity Specifications for SQL Server

The following tables specify the maximum sizes and numbers of various objects defined in SQL Server components.

Database Engine Objects
The following table specifies the maximum sizes and numbers of various objects defined in SQL Server databases or referenced in Transact-SQL statements.

SQL Server Database Engine object: maximum sizes/numbers. The values below apply to both 32-bit and 64-bit SQL Server unless noted otherwise.

Batch size [1]: 65,536 * network packet size
Bytes per short string column: 8,000
Bytes per GROUP BY, ORDER BY: 8,060
Bytes per index key [2]: 900
Bytes per foreign key: 900
Bytes per primary key: 900
Bytes per row [8]: 8,060
Bytes in source text of a stored procedure: lesser of batch size or 250 MB
Bytes per varchar(max), varbinary(max), xml, text, or image column: 2^31 - 1
Characters per ntext or nvarchar(max) column: 2^30 - 1
Clustered indexes per table: 1
Columns in GROUP BY, ORDER BY: limited only by number of bytes
Columns or expressions in a GROUP BY WITH CUBE or WITH ROLLUP statement: 10
Columns per index key [7]: 16
Columns per foreign key: 16
Columns per primary key: 16
Columns per nonwide table: 1,024
Columns per wide table: 30,000
Columns per SELECT statement: 4,096
Columns per INSERT statement: 4,096
Connections per client: maximum value of configured connections
Database size: 524,272 terabytes
Databases per instance of SQL Server: 32,767
Filegroups per database: 32,767
Files per database: 32,767
File size (data): 16 terabytes
File size (log): 2 terabytes
Foreign key table references per table [4]: 253
Identifier length (in characters): 128
Instances per computer: 32-bit: 50 instances on a stand-alone server for all SQL Server editions except Workgroup (Workgroup supports a maximum of 16 instances per computer), and 25 instances on a failover cluster. 64-bit: 50 instances on a stand-alone server, and 25 instances on a failover cluster.
Length of a string containing SQL statements (batch size) [1]: 65,536 * network packet size
Locks per connection: maximum locks per server
Locks per instance of SQL Server [5]: up to 2,147,483,647 (32-bit); limited only by memory (64-bit)
Nested stored procedure levels [6]: 32
Nested subqueries: 32
Nested trigger levels: 32
Nonclustered indexes per table: 999
Number of distinct expressions in the GROUP BY clause when CUBE, ROLLUP, GROUPING SETS, WITH CUBE, or WITH ROLLUP is present: 32
Number of grouping sets generated by operators in the GROUP BY clause: 4,096
Parameters per stored procedure: 2,100
Parameters per user-defined function: 2,100
REFERENCES per table: 253
Rows per table: limited by available storage
Tables per database [3]: limited by number of objects in a database
Partitions per partitioned table or index: 1,000
Statistics on non-indexed columns: 30,000
Tables per SELECT statement: limited only by available resources
Triggers per table [3]: limited by number of objects in a database
Columns per UPDATE statement (wide tables): 4,096
User connections: 32,767
XML indexes: 249


[1] Network packet size is the size of the tabular data stream (TDS) packets used to communicate between applications and the relational Database Engine. The default packet size is 4 KB, and is controlled by the network packet size configuration option.

[2] The maximum number of bytes in any index key cannot exceed 900 in SQL Server. You can define a key using variable-length columns whose maximum sizes add up to more than 900, provided no row is ever inserted with more than 900 bytes of data in those columns. In SQL Server, you can include nonkey columns in a nonclustered index to avoid the maximum index key size of 900 bytes.

[3] Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of all objects in a database cannot exceed 2,147,483,647.

[4] Although a table can contain an unlimited number of FOREIGN KEY constraints, the recommended maximum is 253. Depending on the hardware configuration hosting SQL Server, specifying additional FOREIGN KEY constraints may be expensive for the query optimizer to process.

[5] This value is for static lock allocation. Dynamic locks are limited only by memory.

[6] If a stored procedure accesses more than 8 databases, or more than 2 databases in interleaving, you will receive an error.

[7] If the table contains one or more XML indexes, the clustering key of the user table is limited to 15 columns, because the XML column is added to the clustering key of the primary XML index. In SQL Server, you can include nonkey columns in a nonclustered index to avoid the limitation of a maximum of 16 key columns. For more information, see Index with Included Columns.

[8] SQL Server supports row-overflow storage, which enables variable-length columns to be pushed off-row. Only a 24-byte root is stored in the main record for variable-length columns pushed out of row; because of this, the effective row limit is higher than in previous releases of SQL Server. For more information, see the "Row-Overflow Data Exceeding 8 KB" topic in SQL Server Books Online.

Maximum in Oracle:

Database file size: operating system dependent; limited by the maximum operating system file size, typically 2^22 (4M) blocks.

Bigfile tablespaces: 2^32 (4G) blocks per tablespace.
Smallfile (traditional) tablespaces: 2^22 (4M) blocks per datafile.

GROUP BY clause maximum length: the GROUP BY expression and all of the nondistinct aggregate functions (for example, SUM, AVG) must fit within a single database block.

Indexes: unlimited per table; the total size of the indexed columns is limited to 75% of the database block size minus some overhead.

Columns: 1,000 per table maximum (keeping the column count under 255 helps performance); 32 columns per index (or clustered index); 30 columns per bitmapped index.

Constraints: unlimited per column.

Subqueries: unlimited levels of subqueries in the FROM clause of the top-level query; 255 subqueries in the WHERE clause.

Partitions: maximum length of a linear partitioning key is 4 KB minus overhead; a partition key can have at most 16 columns; at most 64K - 1 (65,535) partitions per table or index.

Rows: unlimited per table.

Tables: 32 per clustered table; unlimited per database.

January 05, 2010

Comparing MS SQL Server vs. Oracle 10g

SQL Server:

Pros:
Easy installation, self-tuning capabilities, graphical administration, integrated business intelligence, and an extensive collection of help wizards.
The total cost of ownership (TCO) of SQL Server 2005 is lower than that of Oracle.
Specialized indexes on computed columns.
Indexed views.

Cons:
Locking and concurrency: by default SQL Server has no multi-version consistency model, which means that "writers block readers and readers block writers" to ensure data integrity.

Performance and tuning:
a. The DBA has no "real" control over sort and cache memory allocation. Memory allocation is decided only globally in the server properties memory folder, and it applies to ALL memory, not to CACHING, SORTING, etc. individually.
b. All pages (blocks) are always 8 KB and all extents are always 8 pages (64 KB). This means you have no way to specify larger extents to ensure contiguous space for large objects.
c. SQL Server has no range partitioning of large tables and indexes. In Oracle, a large 100 GB table can be seamlessly partitioned at the database level into range partitions.

With SQL Server 2005, INSERT, UPDATE, and DELETE statements are executed serially (MERGE is not supported).

Oracle:
Pros :
You can use Oracle on multiple platforms. Whereas Microsoft created SQL Server to run on the Microsoft platform only, Oracle is available on multiple platforms, including Windows, Unix, and now Linux, which is the foundation of Oracle's Real Application Clusters (RAC) strategy.

Locking and concurrency: "readers don't block writers and writers don't block readers." This is possible without compromising data integrity because Oracle will dynamically re-create a read-consistent image for a reader of any requested data that has been changed but not yet committed. In other words, the reader will see the data as it was before the writer began changing it (until the writer commits).

Function-based indexes.
Oracle will execute INSERT, UPDATE, DELETE, and MERGE statements in parallel when accessing both partitioned and non-partitioned database objects.

Cons:
Implementing something similar to the MS SQL identity property with an Oracle sequence requires referencing the sequence name in the application or creating a trigger for each table/sequence pair.
Cost is higher.
Requires a skilled DBA.

CONCURRENCY MODEL

The two products take different approaches to concurrency control. The main differences are summarized below:

Multi-version read consistency: Oracle Database 10g: always enabled. SQL Server 2005: not by default; must be enabled.
Non-escalating row-level locking: Oracle Database 10g: yes. SQL Server 2005: locks escalate.

SQL Server 2005 introduces two new isolation levels:
read committed with snapshots (statement-level read consistency)
snapshot isolation (transaction-level read consistency)
These isolation levels correspond to Oracle's READ COMMITTED and SERIALIZABLE isolation levels, respectively.
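
The snapshot behavior that these isolation levels (and Oracle's multi-version read consistency) provide can be illustrated with SQLite's WAL mode as a stand-in engine: a reader that has started a transaction keeps seeing its snapshot even while a writer commits a change, i.e. "writers don't block readers". The accounts table and file name below are invented for the sketch:

```python
import os
import sqlite3
import tempfile

# SQLite in WAL mode stands in for multi-version / snapshot reads.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")  # the first read below pins the reader's snapshot
before = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

# The writer updates and commits while the reader's transaction is still
# open; in WAL mode the open read transaction does not block the writer.
writer.execute("UPDATE accounts SET balance = 50 WHERE id = 1")

# The reader still sees the value as of its snapshot ...
during = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
reader.execute("COMMIT")

# ... and only a new transaction sees the committed change.
after = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

Under the default (lock-based) read committed level in SQL Server 2005, the reader would instead block until the writer finished, which is exactly the "writers block readers" behavior described above.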
