Data Engineering with Avishkar: 04/01/2008

April 21, 2008

DBA- Database Administrator

A DBA is a Database Administrator, and this is the most common job that you find a database specialist doing. There are Development DBAs and Production DBAs. A Development DBA usually works closely with a team of developers and gets more involved in design decisions, giving advice on performance and writing good SQL. That can be satisfying at a human level because you are part of a team and you share the satisfaction of the team's accomplishments. A Production DBA (on the other hand) is responsible for maintaining Databases within an organisation, so it is a very difficult and demanding job. He or she, often gets involved when all the design decisions have been made, and has simply to keep things up and running. Therefore, of course, it is also a rewarding job, both financially and in terms of job satisfaction. But it's a more 'lonely' job than being a Development DBA.
Production DBA", with skills covering all the "core" aspects of managing and safeguarding a company's databases and data, including such as backups and restores, securing data, performance tuning and troubleshooting, monitoring, auditing and so on. Alongside the Production DBA, is the Development DBA, who tends to look after the development and test databases, and is expert in all aspects of T-SQL.

Production DBA", with skills covering all the "core" aspects of managing and safeguarding a company's databases and data, including such as backups and restores, securing data, performance tuning and troubleshooting, monitoring, auditing and so on. Alongside the Production DBA, is the Development DBA, who tends to look after the development and test databases, and is expert in all aspects of T-SQL.
Exceptional DBA.
there are many other areas in which the Exceptional DBA can and should specialize. Some of these niches, such as "database design specialist", are well-established, but others, such as "reporting / business intelligence specialist", or "high availability specialist" are newer and in increasing demand.
I would suggest that any Exceptional DBA needs:
Sound knowledge of all "core" aspects of a DBA's job
Inside-out knowledge of one chosen specialist area
Adaptability.
There are many SQL Server DBAs with an intimate knowledge of how SQL Server works, the operating system, hardware and so on. They have the skills to keep a production system running without a hitch.
It is no longer enough.
As a DBA, you must communicate effectively. Communication does not just mean talking to your team members and boss. It includes writing effective emails, status reports, check lists, documentation, presentations and database diagrams. Even that is no longer enough; you must ensure that the company, as a whole, understands what you are doing and how your work contributes to the business.
A "good" DBA should have the ability, knowledge, and tenacity to quickly find out those things (s)he may not know and apply them correctly.
"exceptional" DBA must also know the business climate and how business requirements may or may not affect the server.

April 17, 2008

Database Fundamentals

1. What is a database?
The term "database" is used to describe both the collection of data and the software tool used to manage the data (usually called Database Management System).

2. What is RDBMS ?

RDBMS stands for Relational Database Management System. RDBMS data is structured in database tables, fields and records. Each RDBMS table consists of database table rows. Each database table row consists of one or more database table fields.

RDBMS store the data into collection of tables, which might be related by common fields (database table columns). RDBMS also provide relational operators to manipulate the data stored into the database tables. Most RDBMS use SQL as database query language.

Edgar Codd introduced the relational database model. Many modern DBMS do not conform to the Codd’s definition of a RDBMS, but nonetheless they are still considered to be RDBMS.

The most popular RDBMS are MS SQL Server, DB2, Oracle and MySQL.

3. What is SQL?
SQL stands for Structured Query Language. SQL language is used to create, transform and retrieve information from RDBMS (Relational Database Management Systems). SQL is pronounced SEQUEL. SQL was developed during the early 70’s at IBM.

4. Codd's laws :
The concept of relational databases was first described by Edgar Frank Codd (almost exclusively referenced as E. F. Codd in technical literature) in the IBM research report RJ599, dated August 19th, 1969.1 However, the article that is usually considered the cornerstone of this technology is "A Relational Model of Data for Large Shared Data Banks," published in Communications of the ACM(Vol. 13, No. 6, June 1970, pp. 377-87). Only the first part of the article is available online.
Additional articles by E. F. Codd throughout the 1970s and 80s are still considered gospel for relational database implementations. His famous "Twelve Rules for Relational Databases"2 were published in two Computerworld articles "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" on October 14, 1985, and October 21, 1985, respectively. He has since expanded on the 12 rules, and they now number 333, as published in his book "The Relational Model for Database Management, Version 2" (Addison -Wesley, 1990).
In 1985 Codd produced the following set of 'rules' by which systems should be judged:
4.1. Information.
All information is represented at the logical level and in exactly one way - by values in tables.
4.2. Guaranteed access.
Each datum (atomic value) in a relational database is guaranteed to be logically accessible through a combination of table-name, primary key value and column-name.
4.3. Systematic treatment of null values.
Null values (distinct from the empty character string or a string of blank characters, and distinct from zero or any other number) are supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
4.4. Database description.
The database description is represented at the logical level in the same way as ordinary data, so that authorised users can apply the same relational language to its interrogation as they apply to the regular data.
4.5. Comprehensive sub-language.
A relational system may support several languages and various modes of terminal use (for example the fill-in-the-blanks mode). However there must be at least one language whose statements are expressible through some well defined syntax as character strings, and that is comprehensive in supporting all the following items:
1. Data definition
2. View definition
3. Data manipulation (interactive and by program)
4. Integrity constraints
5. Authorisation
6. Transaction boundaries (begin, commit and rollback)
4.6. View updating.
All theoretically-updatable views are also updatable by the system. (A view is theoretically updatable if there is a time-independent algorithm for unambiguously determining a single series of changes to the base tables, having as their effect precisely the requested changes in the view.)
4.7. Insert and update.
The capability of handling a base table or a derived table as a single operand applies not only to retrieval of data but also to insertion, updating, and deletion. (This allows the system to optimise its execution sequence by determining the best access paths. It may be important in obtaining efficient handling of transactions across a distributed database, avoiding the communications costs of transmitting separate requests for each record obtained from remote sites.)
4.8. Physical data independence.
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods. (There must be a clear distinction between logical and physical design levels.)
4.9. Logical data independence.
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables. (This rule permits logical database design to be changed dynamically, e.g. by splitting or joining base tables in ways which do not entail loss of information.)
4.10. Integrity independence.
Integrity constraints must be definable in the relational data sub-language and storable in the catalogue, not in the applications program. Certain integrity constraints hold for every relational database, further application-specific rules may be added. The general rules relate to:
1. Entity integrity : no component of a primary key may have a null value.
2. Referential integrity : for each distinct non-null 'foreign key' value in the database, there must exist a matching primary key value from the same domain.
4.11. Distribution independence.
A relational DBMS has distributional independence - i.e. if a distributed database is used it must be possible to execute all relational operations upon it without knowing or being constrained by the physical locations of data. This must apply both when distribution is originally introduced, and when data is redistributed.
4.12. Non-subversion.
A low-level (single record-at-a-time) language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level (multiple record-at-a-time) language.

April 15, 2008

Difference between MS SQL 2005 & Oracle 10g

Difference between SQL Server 2005 & Oracle 10g

1. The logical components

1.a SQL SERVER 2005:

The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:
• Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
• Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.
A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

1.b Database Files
SQL Server 2005 databases have three types of files:
• Primary data files

The primary data file is the starting point of the database and points to the other files in the database. Every database has one primary data file.
• Secondary data files

Secondary data files make up all the data files, other than the primary data file. Some databases may not have any secondary data files, while others have several secondary data files.
• Log files

Log files hold all the log information that is used to recover the database. There must be at least one log file for each database, although there can be more than one.

1.c Database Filegroups
Database objects and files can be grouped together in filegroups for allocation and administration purposes. There are two types of filegroups:
Primary
The primary filegroup contains the primary data file and any other files not specifically assigned to another filegroup. All pages for the system tables are allocated in the primary filegroup.
User-defined
User-defined filegroups are any filegroups that are specified by using the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.
Log files are never part of a filegroup. Log space is managed separately from data space.
No file can be a member of more than one filegroup. Tables, indexes, and large object data can be associated with a specified filegroup. In this case, all their pages will be allocated in that filegroup, or the tables and indexes can be partitioned

T-SQL vs PL/SQL
The dialect of SQL supported by Microsoft SQL Server 2000 is called Transact-SQL (T-SQL). The dialect of SQL supported by Oracle 9i Database is called PL/SQL. PL/SQL is more powerful language than T-SQL. This is the brief comparison of PL/SQL and T-SQL:
Feature PL/SQL T-SQL
Indexes B-Tree indexes,
Bitmap indexes,
Partitioned indexes,
Function-based indexes,
Domain indexes B-Tree indexes
Tables Relational tables,
Object tables,
Temporary tables,
Partitioned tables,
External tables,
Index organized tables Relational tables,
Temporary tables
Triggers BEFORE triggers,
AFTER triggers,
INSTEAD OF triggers,
Database Event triggers AFTER triggers,
INSTEAD OF triggers
Procedures PL/SQL statements,
Java methods,
third-generation language
(3GL) routines T-SQL statements
Arrays Supported Not Supported

1. SINGLE PLATFORM DEPENDANCY
SQL Server is only operable on the Windows platform, and this is a major limitation for it to be an enterprise solution. Oracle is available on multiple platforms such as Windows, all flavors of Unix from vendors such as IBM, Sun, Digital, HP, Sequent, etc. and VAX- VMS as well as MVS. The multi-platform nature of Oracle makes it a true enterprise solution.
2. LOCKING / CONCURRENCY
SQL Server has no multi-version consistency model which means that "writers block readers and readers block writers" to ensure data integrity. In contrast, with Oracle the rule is "readers don’t block writers and writers don’t block readers". This is possible without compromising data integrity because Oracle will dynamically re-create a read-consistent image for a reader of any requested data that has been changed but not yet committed.
In other words, the reader will see the data as it was before the writer began changing it (until the writer commits). SQL Server's locking scheme is much simpler (less mature) and will result in a lot ff delays/waits in a heavy OLTP environment.
Also, SQL Server will escalate row locks to page level locks when too many rows on a page are locked. This locks rows which are uninvolved in any updates for no good reason.
3. POTENTIAL OF LONG UNCOMMITED TRANSACTIONS HALTING DATABASE ACTIVITY
In SQL Server 2000, a long uncommitted transaction can stop other transactions which queue behind it in the single transaction log, and this can stop all activity on the database, whereas in Oracle, if there is a long uncommitted transaction, only the transaction itself will stop when it runs out of rollback space, because of the use of different rollback segments for transactions.
Oracle allocates transactions randomly to any of its multiple rollback segments and areas inside that rollback segment. When the transaction is committed, that space is released for other transactions, however Sql server allocates transactions sequentially to its single transaction log, the space occupied by committed transactions is not released to new transactions until the recycling of the transaction log is complete (in a circular round-robin manner).
This means if there is an uncommitted transaction in the middle, and the transaction log cannot grow by increasing the file size, no new transactions will be allowed. This is the potential of a single uncommitted transaction to halt database activity.
4. PERFORMANCE and TUNING
a. No control of sorting (memory allocation) in SQL Server. Oracle can fully control the sort area size and allows it to be set by the DBA.
b. No control over SQL Caching (memory allocation) in SQL Server. This is controllable in Oracle.
c. No control over storage/space management to prevent fragmentation in SQL Server. All pages (blocks) are always 8k and all extents are always 8 pages (64k). This means you have no way to specify larger extents to ensure contiguous space for large objects. In Oracle, this is fully configurable.
d. No range partioning of large tables and indexes in SQL Server, whereas in Oracle a large table (eg. 100 GB or more) can be seamlessly partitioned at the database level into range partitions, for eg. an invoice table can be partitioned into monthly partitions.
Such partitioned tables and partitioned indexes give performance and maintenance benefits and are transparent to the application.
e. No Log miner facility in SQL Server. Oracle 8i and 9i supply a Log Miner which enables inspection of archived redo logs. This comes free with the database. But in the case of SQL Server, external products from other companies have to be purchased to do this important DBA task.
f. A SQL Server DBA claimed that fully qualifying the name of an object in SQL Server code would lead to performance gains of 7% to 10%. There are no dictionary performance problems like that in Oracle. Oracle would have some gains if it fully qualified all names - say 0.01 percent.
This actually shows the difference in the internal database technology between Oracle and Microsoft and implies that the technology of resolving object names via the dictionary is more advanced in the case of Oracle, ie. Oracle seems to better access its internal dictionary and resolve names, unlike SQL server.
5. MISSING OBJECT TYPES IN SQL SERVER
a. No public or private synonyms b. No independent sequences c. No packages ie. collection of procedures and functions.
6. PROGRAMMING
a. Significant extensions to the ANSI SQL-92 standard in SQL Server, which means converting applications to a different database later will be a challenge (code re-write).
b. Sql Server has no inbuilt JAVA database engine as in Oracle. In Oracle, Java classes can be loaded and executed in the database itself, thus adding the database's security and scalability to Java applications.
c. In SQL Server, stored Procedures are not compiled until executed (overhead). In Oracle, packages and procs/functions are compiled before execution.
In Oracle 9i it is also possible to translate Pl/Sql into C code and then compile/link the code, which gives very good performance gains for numeric intensive operations. SQL Server has no such ability.
d. In SQL server, there is no ability to read/write from external files from a stored procedure. Oracle has this ability.
e. SQL Server uses cryptic commands for database administration like:
exec sp_addrolemember N'db_datareader', N'davidb' go
This is to add the user davidb to the role db_datareader.
On the other hand, Oracle uses standard English-like SQL commands to do the same:
grant db_datareader to davidb;
This one statement does all, in simple English, what the cryptic SQL Server command does.
f. Oracle SQL and PL/SQL syntax is more powerful and can do things more intuitively than Microsoft Transact-SQL. Try to sum up a column by each month, and show the totals for the month, in SQL Server you do it in T-SQL by grouping on strings, in Oracle it is possible to do this grouping by the trunc(,'month') function. This method in Oracle is more intuitive, it understands the dates, the method in SQL Server does not.
g. In SQL Server, you cannot issue a "create or replace" for either procedures or views, in Oracle you can. This one facility simplifies code writing, since in Sql Server the procedure or view must be dropped first if present and then recreated ie. two commands, in Oracle there is no need - a single command "create or replace" is enough.
h. In Oracle, a procedure/function/package/view is marked as invalid if a dependant object changes. In SQL Server there is no concept of an invalid procedure, it will run but give unexpected results. The former is more suitable for change control and preventing unexpected errors.
i. A recompile reuses the code that is in the Oracle database, the actual command is "alter procedure compile". This is applicable to procedures/functions/packages/views. This concept of recompiling is not there in SQL Server where you have to resubmit the whole code if you want to recompile a procedure.
j. Triggers in Oracle do not allow transactional control ie. commit/rollback/savepoint statements. Whereas, triggers in SQL Server allow commits/rollbacks, which is potentially dangerous and can cause problems with transactions which fire the trigger.
Triggers in SQL Server also can start transactions of their own which is not very good and shows lack of maturity of the language.
7. STANDBY DIFFERENCES
SQL Server and Oracle have differences regarding standby databases. A standby is a database set up on a second server and to which logs are applied ie. all database changes, so that the standby can be activated and used in the case of a failover.
a. In the case of Sql server, when there is a failover, the "master" and "msdb" databases have to be restored from backup or copied over from the primary to the standby and then the standby is activated after all logs are applied. In Oracle, there is no need for a restore to be done, the standby can be activated at any time after all logs are applied.
This difference exists because of the fact that in SQL server, new users/roles added to the primary are not carried over to the standby (these users/roles go in the master/msdb) and backups have to be done continuously of the master and msdb, these backups are then restored when the time comes for a failover.
In the case of Oracle, users/roles when created in the primary are automatically carried over to the standby. So when the failover time arrives, all that is needed is to activate the standby.
b. In the case of Sql Server, if the standby is opened as read only, to switch it back to standby again, a restore from backup has to be done. In the case of Oracle, from 8i version onwards, if a standby database is opened as read only, it can be reopened as a standby without restoring from backup.
c. The time delay to apply logs between the primary and the standby can be varied, but it can never be 0 minutes in the case of SQL Server.
In the case of Oracle, in 9i it is possible to have logs applied simultaneously to the primary as well as standby, using Net8 protocol. This means zero data loss in the case of a failover whereas SQL Server's log shipping cannot avoid data loss d during the time gap.
d. SQL Server's log shipping mechanism also happens at the OS level, whereas Oracle's mechanism can take place directly at the Net8 level where logs are automatically applied to standbys without any scripts or OS batch files, this mechanism in Oracle is called managed standby.
e.One deficiency of Oracle in the standby was that datafiles, if created on the primary, had to be manually created on the standby whereas SQL Server does this automatically. However, in 9i, this deficiency is fixed and data files are created automatically at the standby.
f. Another deficiency of Oracle in the standby is that direct loads, if using the unrecoverable facility to bypass redo logging, require the data files of the primary database to be manually copied across to the standby. This is not fixed in 9i. SQL Server's version of log shipping and direct loads do not require this copying across.
8. CLUSTER TECHNOLOGY
In clustering technology, in the case of SQL Server, two nodes cannot work on the same database, they "share nothing". At the best, to utilize the power of both nodes, the application must be manually spit up and redistributed between the hosts, working on different sets of data, and it is not possible to seamlessly scale upwards by adding another node to the cluster in the case of SQL Server.
Most cluster configurations in Sql Server use the power of only one node, leaving the other node to take over only if there is a problem with the first node.
In the case of Oracle Parallel server, it is possible to have two or more instances of the database on different nodes acting on the SAME data in active-active configurations. Lock management is handled by the Oracle Parallel server. With the new version of Parallel Server in Oracle 9i, renamed as the Oracle real application cluster (9i RAC), there is diskless contention handling of read-read, read- write, write-read, and write-write contention between the instances.
This diskless contention handling is called Cache Fusion and it means for the first time, any application can be placed in a cluster without any changes, and it scales upwards by just adding another machine to the cluster.
Microsoft has nothing like this clustering technology of Oracle, which can best be described as "light years ahead".
9. REPLICATION DIFFERENCES
In Microsoft SQL Server's version of simple replication ie, publisher-subscriber using transactional replication, even if only one table is being replicated, the entire transaction log is checked by the log reader agent and transactional changes applied to the subscribers.
In Oracle's simple replication, changes to a single table are stored in a snapshot log and copied across, there is no need to check all the archive logs.
10. SECURITY
As of 2002, Oracle has 14 independent security evaluations; Microsoft SQL Server has only one.
11. INFORMATION POOL
SQL Server and Internet articles of the magazine are only available with paid subscription. Whereas, Oracle has given its magazine free for many years, all articles are free on the internet, and the Oracle Technical network (OTN) is also free on the internet.
12. USABILITY
Some people say Microsoft SQL Server tools, like Enterprise manager, are easy to use. Oracle Enterprise Manager is a huge tool and seems daunting to inexperienced people. This is true to an extent, however ease of use cannot be compared with the many features in Oracle, and its industrial-level strength, and its many technical advantages.
13. TCP Benchmarks
March 2002 Benchmarks from TPC.org show that Oracle 9i is seen in the majority of top benchmarks in "non-clustered" TPC-C for performance (OLTP), whereas SQL Server is seen in the majority of entries for "clustered" TPC-C for performance (OLTP).
This gives the strange impression that SQL server is faster in a cluster than in a non-cluster, which is misleading. The fact is that this result is due to the use of "federated databases" in clusters by Microsoft in which pieces of the application are broken up and placed on separate active-active servers, each working on separate pieces of the application.
While excellent for theoretical benchmarks, this is not a practical approach in the real life IT world because it requires massive changes to any application, and also ongoing changes to the application when new servers are added to the cluster (each server has a view that sees the data in the other servers, adding a new server would mean rewriting the views for all tables on all servers) and would be rejected by any practical headed manager.
Using this impractical approach of federated databases in clusters, the impression is that Sql-server leads in clustered performance, but the practical reality is otherwise. This is seen in the way SQL Server is not to be seen in the non-clustered benchmarks.
Also, Oracle leads the way for Tpc benchmarks for Decision Support Systems with 1,000 Gb and 3,000 Gb sizes (TPC-H by performance per scale), whereas SQL Server is only seen to a small extent in the 300GB range.
14. XML Support
To provide a more native support for XML, the first release of Oracle9i Database introduced the XMLType datatype and associated XML specific behavior. In addition, built in XML generation and aggregation operators greatly increase the throughput of XML processing.
With the second release of Oracle9i, Oracle significantly adds to XML support in the database server. This fully absorbs the W3C XML data model into the Oracle database, and provides new standard access methods for navigating and querying XML - creating a native integrated XML database within the Oracle RDBMS.
The key XDB technologies can be grouped into two major classes - XMLType that provides a native XML storage and retrieval capability strongly integrated with SQL, and an XML Repository that provides foldering, access control, versioning etc. for XML resources. The integration of a native XML capability within the database brings a number of benefits.
In summation, Oracle9i Release 2's XDB functionality is a high- performance XML storage and retrieval technology available with the Oracle9i Release 2 database. It fully absorbs the W3C XML data model into the Oracle Database, and provides new standard access methods for navigating and querying XML. With XDB, you get all the advantages of relational database technology and XML technology at the same time.
In contrast to this, Microsoft SQL Server 2000 only has limited ways to read and write xml from its tables.
SUMMARY
SQL Server is clearly positioned between MS-ACCESS and ORACLE in terms of functionality, performance, and scalability. It makes a work group level solution (small number of users with small amount of data), perhaps at the departmental level.
Oracle is much more advanced and has more to offer for larger applications with both OLTP and Data Warehouse applications. Its new clustering features are ideal for Application Service Providers (ASPs) on the Internet who can now start with a cluster of 2 small servers and grow by just adding a server when they need to. Besides, Oracle's multi-platform capability makes it the most convincing argument for an enterprise.

1. Oracle runs on many platforms, SQL on Windows only
2. Oracle includes IFS (Internet File System), Java integration, SQL is more of a pure database
3. Oracle requires client install and setup (Not difficult, but very UNIX-like for Windows users)
4. SQL is #1 in Cost/Performance and overall Performance, although Oracle will refute that
5. Replication is much easier in SQL (I have been at clients where even the Oracle consultant couldn't get it working w/oracle)
6. Failover support in SQL is much, much easier
7. JDBC support is much better in Oracle, although Microsoft is working on it
8. ODBC support in both
9. SQL is ANSI-SQL '92 compliant, making it easier to convert to another ANSI compliant database, theoretically anyway (truth is every database has proprietary extensions). Oracle is generally more proprietary and their main goal is to keep their customers locked-in.
10. SQL natively supports ODBC, OLEDB, XML, XML Query, XML updates. Oracle natively supports proprietary connections, JDBC. Not sure about XML support though. 11. SQL Server is much easier to administrate, with GUI and command- line tools. Most of Oracle is command-line (Back in SQL 6.5 days I had a customer who was so proud that after a day's worth of work he had managed to script his database. I showed him how it was a 3 click operation in SQL ;-)
12. Oracle requires add-ons for transaction monitors, failover, etc. SQL has COM+, uses NT clustering and generally has everything built-in
13. SQL Analysis Services is included (A very powerful OLAP server). For Oracle it is a separate purchase.
14. Locking and concurrency SQL Server has no multi-version consistency model, which means that "writers block readers and readers block writers" to ensure data integrity. In contrast, with Oracle, the rule is "readers don't block writers and writers don't block readers." This is possible without compromising data integrity because Oracle will dynamically re-create a read-consistent image for a reader of any requested data that has been changed but not yet committed. In other words, the reader will see the data as it was before the writer began changing it (until the writer commits). SQL Server's locking scheme is much simpler (less mature) and will result in a lot of delays/waits in a heavy OLTP environment. Also, SQL Server will escalate row locks to page level locks when too many rows on a page are locked. This locks rows which are uninvolved in any updates for no good reason.

Performance and tuning

a. In SQL Server, the DBA has no "real" control over sorting and cache memory allocation. The memory allocation is decided only globally in the server properties memory folder, and that applies for ALL memory and not CACHING, SORTING, etc.
b. All pages (blocks) are always 8k and all extents are always 8 pages (64k). This means you have no way to specify larger extents to ensure contiguous space for large objects.
c. No range partioning of large tables and indexes. In Oracle, a large 100 GB table can be seamlessly partitioned at the database level into range partitions. For example, an invoice table can be partitioned into monthly partitions. Such partitioned tables and partitioned indexes give performance and maintenance benefits and are transparent to the application.
d. There is no partitioning in SQL Server.
e. There are no bitmap indexes in SQL Server.
f. There are no reverse key indexes in SQL Server.
g. There are no function-based indexes in SQL Server.
h. There is no star query optimization in SQL Server.

Object types
Here are some object types missing in SQL Server that exist in Oracle. a. You cannot declare public or private synonyms. b. There is no such thing as independent sequence objects. c. There are no packages; i.e., collections of procedures and functions. d. No "before" event triggers (only "after" event triggers) and no row triggers (only statement)

- Stored procedure plans are indeed retained in a compiled state, it is blatantly false to say that they are recompiled every time - Statistics are much easier to manage in SQL, but you can get far more fine-grained control in Oracle - Generally speaking, SQL query optimizer is far far more reliable in SQL; query hints are used rarely, if ever. That said, every database will have at least some optimizer glitches, and it is up to the database developer to discover and fix them (usually with hints) - Generally testing performance is easy in SQL, with easily accessible query plans and statistics. These can be obtained from Oracle but are much more challenging.

very different concurrency model between SQL and Oracle. Oracle operates against point-in-time(log) snapshots of the data (using SCNs), but acquires those on demand. Therefore readers never block. In SQL, you have more choices of isolation/concurrency, but the more aggressive settings will have writers block readers, and the most aggressive can have readers block writers. In classic RDBMS this is not too bad - but in complex deep relations such as OORDBMS-over-RDBMS or other object-graph storage solutions, deadlocks can really hurt your project unless you can scale some queries back to Read Committed isolation level.

Oracle will not index null values (for compound indexes, that means all values null produces no entry).
SQL will index all rows.
On Oracle this can be an advantage for large tables where columns are often null but you want to find the non-null values quickly; but is a terrible disadvantage if you need to find null values (it produces a table scan).
On SQL sometimes you wish you could ignore nulls (but I believe they dropped that option long ago); but you almost never get surprised by those nasty table scans.

The following table addresses the most frequently asked questions (FAQ), related to migrating from Oracle to SQL Server. This is list is by no means complete and will be updated periodically.
Oracle SQL Server More Information
%TYPE data type No equivalent The %TYPE data type of Oracle, lets you create a variable and have that variable's data type be defined by a table or view column or a PL/SQL package variable.

There is no equivalent for Oracle's %TYPE datatype in T-SQL, but it can be simulated (not very conveniently though) using User Defined Data types (UDT). Here is an example:

EXEC sp_addtype 'MyType', 'smallint', NULL

CREATE TABLE MyTable (i MyType)

CREATE PROC MyProc
AS
BEGIN
DECLARE @i MyType
END
BEFORE triggers INSTEAD OF triggers Use INSTEAD OF trigger in SQL Server as an equivalent to Oracle's BEFORE trigger.

For more information on INSTEAD OF triggers, see SQL Server Books Online
DECODE() function CASE expression DECODE can be conveniently simulated using the T-SQL CASE expression. Here's an example:

SELECT Sport,
CASE Sport
WHEN 'Cricket' THEN 'England'
WHEN 'Hockey' THEN 'India'
WHEN 'Base Ball' THEN 'America'
ELSE NULL
END AS 'Originating Country'
FROM Sports
DESCRIBE sp_help or sp_columns There are a lot of alternatives for Oracle's DESCRIBE, in SQL Server. You could use the system stored procedure sp_help for detailed information about a table's columns and other properties.

If sp_help is providing you with too much information, then try the ODBC catalog stored procedure, sp_columns.

There are a bunch of other useful sp_help* stored procedures available in SQL Server. You can find more information about those in SQL Server Books Online.

If none of those procedures are suitable for your requirements, then you could query the system view INFORMATION_SCHEMA.COLUMNS, to get the desired information. You could wrap your code inside a stored procedure named DESCRIBE, if you wish.

As a last resort, you could even query system tables like sysobjects and syscolumns, but this is not a recommended approach.
DUAL table No equivalent There is no DUAL table in SQL Server. In fact, you don't need one in SQL Server, as you can have a SELECT statement without a FROM clause.

For example, consider the following SELECT statement in Oracle:

SELECT 'Something'
FROM DUAL

In SQL Server, the same result can be obtained by the following command:

SELECT 'Something'

If you are porting some code from Oracle into SQL Server and if you don't want to remove all references to DUAL table, then just create a DUAL table in your database using the following commands:

CREATE TABLE DUAL
(
DUMMY varchar(1)
)

INSERT INTO DUAL (DUMMY) VALUES ('X')
INTERSECT operator Not supported Use EXISTS clause to generate the same result.

The following example illustrates the simulation of Oracle's INTERSECT operator:

SELECT OrderID, OrderDate
FROM Orders O
WHERE EXISTS
(
SELECT 1
FROM RefundsTable R
WHERE O.OrderID = R.OrderID
)
MINUS operator Not supported Use NOT EXISTS clause in your SELECT statement to generate the same result.

The following example illustrates the simulation of Oracle's MINUS operator:

SELECT OrderID, OrderDate
FROM Orders O
WHERE NOT EXISTS
(
SELECT 1
FROM RefundsTable R
WHERE O.OrderID = R.OrderID
)
Nested tables Not supported Oracle 8i and prior versions didn't support this feature and is introduced in Oracle 9i. This feature basically enables you to store a table, within a column. It is like having an array of records in your database columns.

SQL Server has no concept of nested tables.

As a workaround, You could store your sub-tables or child tables in the form of XML inside a char, nchar, varchar, nvarchar, text or ntext type column, and parse it as needed, at runtime. See OPENXML, sp_xml_preparedocument, sp_xml_removedocument in SQL Server 2000 Books Online.

Another possible workaround would be to store comma separated values (CSV).

Note that this is against the basic rules of normalization. Columns are nomore atomic, with nested tables.

From a design point of view, best thing to do would be, to create different tables for representing different entities and link them with primary and foreign key relationships. This will also make searching easier.
Oracle SQL Server :-)
Packages Not supported No equivalent in SQL Server for Oracle's Packages and Package variables concept
PL/SQL T-SQL Every database product implements and extends the standard SQL. Oracle's implementation of SQL is called PL/SQL, while Microsoft's is called T-SQL (Transact-SQL)
Row level security No equivalent Though there is no inbuilt support in SQL Server for row level permissions, you can implement it using view and system functions.

For more information and a working example, read this article:

Implementing row level permissions/security in SQL Server databases

rownum pseudo column No equivalent Though there is no rownum or rowid in SQL Server, there are several ways in which a row number can be generated.

For some examples, read this article:

Q186133 INF: How to Dynamically Number Rows in a Select Statement

SELECT...FOR UPDATE UPDLOCK hint Use the locking hint UPDLOCK in your SELECT statement.

See SQL Server Books Online for more information.
Sequences IDENTITY It is much simpler in SQL Server to generate a sequence value for a primary key or a non-key column.

You just need to turn on the IDENTITY property of column. IDENTITY property can be set to columns of the following data types: tinyint, smallint, int, bigint, decimal, numeric

Try this example to see how it works:

CREATE TABLE foo
(
i int IDENTITY(1, 1)
j int
)

INSERT INTO foo (j) VALUES (911)
INSERT INTO foo (j) VALUES (999)
SQL *Plus Query Analyzer For connecting to SQL Server and executing queries and modifying data, use the built-in Query Analyzer. It is much more powerful and friendlier than Oracle's SQL *Plus
START WITH...CONNECT BY clause No equivalent Though there's no direct equivalent in T-SQL for Oracle's START WITH...CONNECT BY, there are several ways and efficient techniques for processing and querying hierarcical data.

For more information, read this article:

Working with hierarchical data in SQL Server databases

Synonym Views You can simulate Oracle Synonyms in SQL Server using Views. For example, the following creates a view that returns the OrderID and OrderDate from Orders table.

CREATE VIEW vOrders
AS
SELECT OrderID, OrderDate
FROM Orders

Now you can select data from the view, using the following SELECT statement:

SELECT * FROM vOrders

The following example will create a view which is equivalent to Oracles TABS synonym or USER_TABLES data dictionary view (For simplicity, I am only including the table name in the view definition):

CREATE VIEW TABS
AS
SELECT name AS table_name
FROM sysobjects
WHERE type = 'U'

Now you can execute the following SELECT statement to return all user table names:

SELECT table_name
FROM TABS

JOIN in Oracle 10g

JOIN in Oracle 10g:

JOIN: Oracle 10g As your database grows so will the need to get information from more than one table.

A join is nothing more than writing a query that takes a set of rows from one or more tables and combines them together. The tables that will participate in the join are listed in the FROM clause of the SQL query and then, depending on the syntax we choose, rows from each table are linked by either the WHERE clause or the JOIN clause.

A join is a query that combines rows from two or more tables, views, or materialized views. Oracle performs a join whenever multiple tables appear in the query's FROM clause. The query's select list can select any columns from any of these tables. If any two of these tables have a column name in common, you must qualify all references to these columns throughout the query with table names to avoid ambiguity.

Equijoins or Inner Join:

Table_A and Table_B each have one column named LETTER.

Table_A.letter Table_B.letter

A A

An equijoin is a join with a join condition containing an equality operator ( = ).

An equijoin combines rows that have equivalent values for the specified columns.

The following SQL joins Table_A against Table_B where they share the same letter value.

SQL >SELECT Table_A.letter, Table_B.letter 2

FROM Table_A, Table_B 3

WHERE Table_A.letter = Table_B.letter;

LETTER LETTER

---------- ----------

A A

Self Joins:

A self join is a special form of equijoin or INNER JOIN where a table is joined against itself. This means that the table must exists two times in the FROM clause of the SQL query. A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition. To perform a self join, Oracle combines and returns rows of the table that satisfy the join condition.

Left Outer Joins: Often times we need to return rows from one table even if there are no matching rows that are produced through a join condition. For this situation, we use outer joins. A left outer join is where the table, on the left of a FROM clause is required to return all of its rows regardless of having matching rows from the table it is being joined on. So in the following SQL we want to join Table_A to Table_B and show where they are equal on the column LETTER as well as return rows from Table_A that do not have a match on Table_B. The old method in Oracle was to supply the plus sign in parentheses (+) next to all columns in the WHERE clause that may not have values that will match against the table being joined to. Here is the old method in Oracle to produce a left outer join.

SQL >SELECT Table_A.letter, Table_B.letter 2

FROM Table_A, Table_B 3

WHERE Table_A.letter = Table_B.letter(+);

LETTER LETTER

---------- ----------

A A

Right Outer Joins:

A right outer join is just the opposite of a left outer join. It states that you would like all rows from the right table in the FROM clause to be returned regardless of having a true match defined in the WHERE clause against the left side table in the FROM clause. Here is the old method in Oracle for producing the right outer join.

SQL >SELECT Table_A.letter, Table_B.letter 2

FROM Table_A, Table_B 3

WHERE Table_A.letter(+) = Table_B.letter;

LETTER LETTER

---------- ----------

A A

Full Outer Joins:

The full outer join is a special condition. With Oracle's old method of supplying (+) notation, there is no method to explicitly state in a SQL statement that you want to return both left and right sides of a query regardless of having a match. To accomplish this you must write a left outer join SQL statement and UNION it with a right outer join SQL statement. Doing that produces the following SQL and results.

SQL >SELECT Table_A.letter, Table_B.letter

FROM Table_A, Table_B

WHERE Table_A.letter = Table_B.letter(+)

UNION

SELECT Table_A.letter, Table_B.letter

FROM Table_A, Table_B

WHERE Table_A.letter(+) = Table_B.letter;

LETTER LETTER

---------- ----------

A A

B C

Cartesian Products:

A cartesian join is a query where there is actually no join criteria between the table in the query. What is returned is a cartesian product where each row from a table is matched against every row in the other table. So in our example where Table_A has two rows and Table_B has two rows there will be 2 X 2 or 4 rows returned. The old method was to exclude the WHERE clause. SQL >

SELECT Table_A.letter, Table_B.letter 2

FROM Table_A, Table_B;

LETTER LETTER

---------- ----------

A A

A C

B A

B C

April 14, 2008

Oracle scripts

-- Create table
create table Owner1.Table_Name1
(
Col11 NUMBER(5) not null,
Col12 NUMBER(3),
Col13 VARCHAR2(10)
);
create table Owner1.Table_Name2
(
Col21 NUMBER(5) not null,
Col22 NUMBER(3),
Col23 VARCHAR2(10)
);
-- Add comments to the table
comment on table Owner1.Table_Name1 is ‘Test Table1.';

-- Create/Recreate primary, unique and foreign key constraints
alter table Owner1.Table_Name1
add constraint PK_Col11 primary key (Col11);
alter table Owner1.Table_Name1
add constraint FK_ Col12 foreign key (Col12)
references Owner1.Table_Name2 (Col22)
deferrable;

-- Create/Recreate check constraints
alter table Owner1.Table_Name1
add constraint NN_ Table_Name1_ Col11
check ("Col11" IS NOT NULL);

-- Grant/Revoke object privileges
grant select, references on Owner1.Table_Name1 to Owner2 with grant option;
grant select on Owner2.Table_Name2 to PUBLIC;

-- create or replace function
create or replace function F_Function(istr1 varchar2,
inum1 number)
return varchar2 is
-- return number is
-- Declare variables
num1 number;
str1 varchar(30);
str2 varchar(30);
begin
str1 := istr1;
If
/* Code1 */
return str1 ;
else
/* Code1 */
end if;
return str2;
end F_Function;
-- CREATE OR REPLACE PROCEDURE
CREATE OR REPLACE PROCEDURE P_PROC1as
-- Cursor to get REQUIRED INFO FROM BASE TABLES
Cursor C_CURSOR
is select t1.col1
from table1 t1, table2 t2
where t1.col1=t2.col2;
val number;
begin
OPEN C_CURSOR ;
loop
-- To get next value
fetch C_CURSOR into val;
EXIT WHEN C_CURSOR%NOTFOUND;
insert into t2(col2)
select col1
from t1
where t1.col1=val;
end loop;
CLOSE C_CURSOR;
end;

Data Engineering with Avishkar

April 21, 2008

DBA- Database Administrator

April 17, 2008

Database Fundamentals

April 15, 2008

Difference between MS SQL 2005 & Oracle 10g

JOIN in Oracle 10g

April 14, 2008

Oracle scripts

Stop Memorizing SQL Part 3 : A Practical SQL Cheat Sheet for Beginners

Search This Blog