April 11, 2011

Database Design Essentials

These are the components that are absolutely necessary for a successful database project. There are 10 essentials:

Understand the database’s purpose:
The first essential is to discover and understand the database’s purpose before doing anything else. Saying that the database’s purpose is “to store records” isn’t good enough. You need to understand the business reason for the database’s existence before it goes into development, much less production. That business reason needs to map not only to the technical specifications but also to the business goals and the business processes that the database will support.

Get the right tool:
A good computer-aided software engineering (CASE) tool can help you quickly create different types of models, such as functional decompositions, system and context diagrams, business process and data flow diagrams, and entity relationship models. Each of these helps you communicate the visual components of the proposed system to your colleagues.

Gather the requirements for the database:
You need to understand the requirements well enough to be able to create visual representations of data, processes, and systems.

Be conscientious when modeling the data:
You need to perform data modeling at multiple levels. You should create the following three models:

Conceptual data model (CDM). The CDM captures the semantics of the information domain that you’re modeling and defines the scope of the database. Entity classes (the distinct sets of data the organization tracks) represent things of interest to the organization, and the relationships between entity pairs assert business rules and regulations. The CDM expresses both the as-is and to-be states and is often created during the Discovery stage.
Logical data model (LDM). The LDM describes the structure of the information domain. It includes criteria such as attribute set details, data types, key status, and individual attribute nullability.
Physical data model (PDM). The PDM defines how the data will be physically stored. You need to map the LDM to a specific database management system (DBMS) platform.
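To make the LDM-to-PDM step concrete, here is a minimal Python sketch that assumes SQLite as the target DBMS platform. The `Attribute` and `Entity` classes, the `SQLITE_TYPES` mapping, and the `customer` entity are hypothetical illustrations, not the output of any CASE tool. A real mapping would also carry relationships, indexes, and platform-specific storage options.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    logical_type: str         # platform-neutral type recorded in the LDM
    nullable: bool = True
    primary_key: bool = False


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


# Hypothetical mapping from LDM types to SQLite column types.
SQLITE_TYPES = {"integer": "INTEGER", "text": "TEXT", "decimal": "NUMERIC", "date": "TEXT"}


def to_physical_ddl(entity: Entity) -> str:
    """Render one logical entity as a SQLite CREATE TABLE statement."""
    columns = []
    for attr in entity.attributes:
        column = f"{attr.name} {SQLITE_TYPES[attr.logical_type]}"
        if attr.primary_key:
            column += " PRIMARY KEY"
        elif not attr.nullable:
            column += " NOT NULL"
        columns.append(column)
    return f"CREATE TABLE {entity.name} (" + ", ".join(columns) + ")"


customer = Entity("customer", [
    Attribute("customer_id", "integer", nullable=False, primary_key=True),
    Attribute("name", "text", nullable=False),
    Attribute("credit_limit", "decimal"),
])

ddl = to_physical_ddl(customer)
print(ddl)
# CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, credit_limit NUMERIC)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the physical model now exists on the chosen platform
```

Keeping the logical types separate from the platform mapping means the same LDM could be mapped to a different DBMS by swapping only the type table.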

Enforce relationships:
A central goal of relational database design is to identify and enforce the relationships between entity pairs so that data integrity, as defined in the business rules and regulations and represented in the data models, is maintained.
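As a sketch of declarative referential integrity, the following Python example uses the standard `sqlite3` module as a stand-in for a full DBMS. This is an assumption for illustration, and the table names are hypothetical. The foreign key on `orders.customer_id` lets the database itself reject data that would break the customer/order relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders (order_id, customer_id) VALUES (10, 1)")  # parent row exists


def is_rejected(sql):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# An order for a customer that does not exist breaks the relationship.
print(is_rejected("INSERT INTO orders (order_id, customer_id) VALUES (11, 99)"))  # True
# Deleting a customer who still has orders would orphan them.
print(is_rejected("DELETE FROM customer WHERE customer_id = 1"))  # True
```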

Use the appropriate data types:
Choosing data types is one of several techniques for enforcing data integrity when you build the schema:
Defining appropriate data types and lengths, nullability, defaults, and check constraints when creating tables
Using user-defined types, user-defined functions, and stored procedures that evaluate data when it’s being entered or used
Using Declarative Referential Integrity (DRI) or triggers to enforce relationships between pairs of tables
Using lookup tables to restrict data values when they’re entered or edited
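The following sketch applies several of these techniques at once. It again uses Python's built-in `sqlite3` module as a stand-in DBMS. The `order_status` lookup table, the `orders` table, and the `accepted` helper are hypothetical. SQLite has no user-defined types or stored procedures, so those items from the list are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the lookup-table reference
conn.executescript("""
-- Lookup table: the only allowed status codes.
CREATE TABLE order_status (
    status_code TEXT PRIMARY KEY
);
INSERT INTO order_status (status_code) VALUES ('NEW'), ('SHIPPED'), ('CANCELLED');

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    quantity    INTEGER NOT NULL CHECK (quantity > 0),  -- nullability + check constraint
    status_code TEXT NOT NULL DEFAULT 'NEW'             -- default value
                REFERENCES order_status (status_code)   -- restricted to the lookup values
);
""")


def accepted(sql, params=()):
    """Run one statement and report whether the constraints allowed it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("INSERT INTO orders (order_id, quantity) VALUES (?, ?)", (1, 5)))     # True, status defaults to 'NEW'
print(accepted("INSERT INTO orders (order_id, quantity) VALUES (?, ?)", (2, 0)))     # False, CHECK fails
print(accepted("INSERT INTO orders (order_id, quantity, status_code) VALUES (?, ?, ?)",
               (3, 1, "LOST")))                                                      # False, not in lookup table
print(accepted("INSERT INTO orders (order_id, quantity) VALUES (?, ?)", (4, None)))  # False, NOT NULL
```

Because the rules live in the table definitions, every application that writes to `orders` gets the same checks.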

Include indexes when modeling:
From the gathered requirements, you should be able to determine which columns will be used for filtering (in the WHERE clause) and for sorting (in the ORDER BY clause). Those columns are candidates for indexes and should be identified in the model.
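Here is a minimal sketch of that reasoning. It uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for a SQL Server execution plan, which is an assumption, and the table, data, and index name are hypothetical. A composite index on the filter column followed by the sort column lets the engine seek to the matching rows and read them already in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, order_date) VALUES (?, ?)",
    [(i % 100, f"2011-04-{(i % 28) + 1:02d}") for i in range(1000)],
)

# customer_id is the filter column, order_date the sort column.
query = "SELECT order_id, order_date FROM orders WHERE customer_id = ? ORDER BY order_date"


def plan(sql):
    """Return the detail text of each row of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]


before = plan(query)  # full table scan plus a temporary B-tree to sort the result
conn.execute("CREATE INDEX ix_orders_customer_date ON orders (customer_id, order_date)")
after = plan(query)   # index search on customer_id; rows come back already in order_date order

print(before)
print(after)
```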


Standardize the naming convention.

Store the code that touches the data on the server:
The best way to avoid inconsistent application of business rules and SQL injection is to store the code that touches a database’s data on the server, as a stored procedure or a CLR object (.NET code hosted through SQL Server’s CLR integration).
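SQLite has no stored procedures, so this Python sketch shows the mechanism that makes a stored-procedure call safe: passing values as parameters instead of concatenating them into the SQL text. The `users` table and both functions are hypothetical. A stored procedure that builds dynamic SQL by concatenation internally is still vulnerable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
INSERT INTO users (username) VALUES ('alice'), ('bob');
""")


def find_user_unsafe(name):
    # Concatenation: the input becomes part of the statement text itself.
    return conn.execute(f"SELECT user_id FROM users WHERE username = '{name}'").fetchall()


def find_user_safe(name):
    # Parameter: the value is sent separately and is never parsed as SQL.
    return conn.execute("SELECT user_id FROM users WHERE username = ?", (name,)).fetchall()


malicious = "x' OR '1'='1"
print(find_user_unsafe(malicious))  # [(1,), (2,)]: the injected OR matches every row
print(find_user_safe(malicious))    # []: no user has that literal name
```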

Document the work.

April 05, 2011

Performance Counters for SQL Server DBAs

Available MBytes

Definition

Available MBytes reports free, unallocated RAM: the amount of physical memory, in MB, that is available to processes running on the computer.

Interpretation

This counter displays only the most recent value; it is not an average.
A value below 20 to 25 percent of installed RAM indicates insufficient memory.
A value below 100 MB indicates that the system is severely starved for memory and is paging.
Fluctuations of 100 MB or more can indicate that someone is logged on to the server remotely.
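The rules above can be expressed as a small check. This Python sketch is hypothetical and does not read the counter itself. A single reading is only the latest value, so the sketch assumes you have already collected several Available MBytes samples, for example with Performance Monitor or `typeperf`. It uses 20 percent as the default threshold; pass `low_pct=25` for the stricter rule.

```python
def assess_available_mbytes(samples_mb, installed_ram_mb, low_pct=20.0):
    """Apply the Available MBytes rules of thumb to a series of samples.

    The counter shows only its latest value, so the caller is assumed to have
    polled it several times and passed the readings in, oldest first.
    """
    findings = []
    latest = samples_mb[-1]
    if latest < 100:
        findings.append("critical: under 100 MB free, the system is starved for memory")
    elif latest < installed_ram_mb * low_pct / 100:
        findings.append(f"warning: under {low_pct:g}% of installed RAM is free")
    if max(samples_mb) - min(samples_mb) >= 100:
        findings.append("note: swings of 100 MB or more, check for remote logons")
    return findings or ["ok"]


# Hypothetical readings from a server with 16 GB (16384 MB) of RAM.
print(assess_available_mbytes([6000, 5950, 6010], installed_ram_mb=16384))  # ['ok']
print(assess_available_mbytes([3000, 2800, 2900], installed_ram_mb=16384))
```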

Pages/sec

Definition

Pages/sec is the number of pages read from the disk or written to the disk to resolve memory references to pages that were not in memory at the time of the reference.

Interpretation

1. This is the sum of two counters: Pages Input/sec and Pages Output/sec.

2. The threshold is normally 20 pages/sec, although you should investigate activity on the server before concluding that paging is the problem.

3. Spikes in pages/sec are normal. They can be caused by backups, by large files or data being written to disk, and by the period after a reboot.

4. SQL Server should be configured to manage its memory dynamically (the "Dynamically configure SQL Server memory" option). The maximum server memory setting should be as high as possible while leaving enough room for the operating system. Ideally, SQL Server should also be the only application on the server.

5. A high Available MBytes value and low Paging File % Usage combined with high Pages/sec may not indicate a problem; the system may simply be reading a memory-mapped file sequentially.

6. Also investigate Page Faults/sec, which is the combined rate of hard and soft page faults. Soft faults are resolved without disk access, so a high value on its own does not prove disk paging. If other programs share the computer with SQL Server, compare the per-process Page Faults/sec counter (Process object) to see which program is causing the paging.
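The same approach works for Pages/sec. In this hypothetical Python sketch, the caller supplies the counter readings. The 20 pages/sec threshold comes from the text. The 10 percent `low_paging_file_pct` and the 20 percent free-memory cutoff are assumptions, because the text gives no numbers for "high Available MBytes" or "low Paging File % Usage".

```python
def assess_pages_per_sec(pages_input, pages_output, available_mb, installed_ram_mb,
                         paging_file_pct, threshold=20.0, low_paging_file_pct=10.0):
    """Apply the Pages/sec rules of thumb to one set of counter readings.

    low_paging_file_pct and the 20 percent free-memory cutoff below are
    hypothetical values; the text does not give numbers for them.
    """
    pages_per_sec = pages_input + pages_output  # Pages/sec = Pages Input/sec + Pages Output/sec
    if pages_per_sec <= threshold:
        return pages_per_sec, "ok"
    memory_plentiful = available_mb >= 0.20 * installed_ram_mb
    if memory_plentiful and paging_file_pct < low_paging_file_pct:
        return pages_per_sec, "high but possibly benign: e.g. a sequential read of a memory-mapped file"
    return pages_per_sec, "investigate server activity before concluding paging is the problem"


print(assess_pages_per_sec(5, 3, available_mb=4000, installed_ram_mb=8192, paging_file_pct=2))  # (8, 'ok')
print(assess_pages_per_sec(300, 0, available_mb=6000, installed_ram_mb=8192, paging_file_pct=2))
print(assess_pages_per_sec(40, 30, available_mb=150, installed_ram_mb=8192, paging_file_pct=60))
```

A single reading cannot separate a normal spike (a backup, a reboot) from sustained paging, so in practice you would apply this to many samples over time.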

How to Identify the SQL Server Start/Restart Date and Time

1. When was the SQL Server service started?

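-- SQL Server Agent start time (Agent can be restarted independently of the Database Engine)
SELECT agent_start_date FROM msdb.dbo.syssessions WHERE session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)

-- Login time of system session 1, which starts with the Database Engine
SELECT login_time FROM sys.dm_exec_sessions WHERE session_id = 1

-- Earliest login time among all processes
SELECT MIN(login_time) FROM master..sysprocesses

-- SQL Server 2008 and later
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

-- Login time of system process 1 (sysprocesses is a backward-compatibility view)
SELECT login_time FROM master..sysprocesses WHERE spid = 1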

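2. When was the tempdb database re-created?
tempdb is re-created every time SQL Server starts, so its creation date also approximates the service start time.
SELECT create_date AS [SQL Server Started at] FROM sys.databases WHERE [name] = 'tempdb'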

GO

Creating DataFrames from CSV in Apache Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Example").getOrCreate()
sc = spark.sparkContext
Sp...