Showing posts with label Data Warehouse. Show all posts
Showing posts with label Data Warehouse. Show all posts

November 30, 2024

Secure a Microsoft Fabric data warehouse

 Data warehouse in Microsoft Fabric is a comprehensive platform for data and analytics, featuring advanced query processing and full transactional T-SQL capabilities for easy data management and analysis.

Microsoft Fabric data warehouse is a complete platform for data, analytics, and AI (Artificial Intelligence). It refers to the process of storing, organizing, and managing large volumes of structured and semi-structured data.

In a warehouse, administrators have access to a suite of technologies aimed at safeguarding sensitive information. These security measures are capable of securing or masking data from users or roles without proper authorization, ensuring data protection across both Warehouse and SQL analytics endpoints. This ensures a smooth and secure user experience, with no need for alterations to the existing applications.

Microsoft Fabric following security features allows for sophisticated security mechanism at the warehouse level:

Workspaces roles – Designed to provide different levels of access and control within the workspace. You can assign users to the various workspace roles such as Admin, Member, Contributor, and Viewer. 

Item permissions – Individual warehouses can have item permissions assigned to facilitate the sharing of the Warehouse for downstream use.

Data protection security – For more precise control, you can use T-SQL to grant specific permissions to users. Warehouse supports a range of data protection features that enable administrators to shield sensitive data from unauthorized access. This includes object-level security for database objects, column-level security for table columns, row-level security for table rows using WHERE clause filters, and dynamic data masking to obscure sensitive data like email addresses. These features ensure data protection across Warehouses and SQL analytics endpoints without necessitating changes to applications.

Dynamic data masking offers several key benefits that enhance the security and manageability of your data like the actual data remains intact and secure, while nonprivileged users only see a masked version of the data.


-- For Email

ALTER TABLE Customers

ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');


-- For PhoneNumber

ALTER TABLE Customers

ALTER COLUMN PhoneNumber ADD MASKED WITH (FUNCTION = 'partial(3,"XXX-XXX-",4)');


-- For CreditCardNumber

ALTER TABLE Customers

ALTER COLUMN CreditCardNumber ADD MASKED WITH (FUNCTION = 'partial(4,"XXXX-XXXX-XXXX-",4)');

The Masked data will looks like as: 

CustomerName: John Doe

Email: j*****@contoso.com

PhoneNumber: XXX-XXX-7890

CreditCardNumber: XXXX-XXXX-XXXX-3456

As you can see, the sensitive data is hidden from the nonprivileged user, enhancing the security of your data. 


Row-level security:

This feature that provides granular control over access to rows in a table based on group membership or execution context.

For example, in an e-commerce platform, you can ensure that sellers only have access to order rows that are related to their own products. This way, each seller can manage their orders independently, while maintaining the privacy of other sellers’ order information.

Column-level security:

It provides granular control and allows you to restrict column access in order to protect sensitive data and access specific pieces of data, enhancing the overall security of your data warehouse.

Column-level security can help ensure that sensitive information is only accessible to those who are authorized to see it

January 13, 2016

Data Warehouse in 2016

Data Warehouse in 2016

Data Warehouse Vendors in 2016 will distinguish innovations and feature enhancements in the areas of:
  • Integration with in-memory architectures to enable real-time analytics
  • Integration with Hadoop to support larger ingestion and transformation
  • Leveraging native data compression capabilities to secure sensitive data
  • Ability to simplify integration via data virtualization
  • Enabling in-database analytics to support sophisticated requirements
  • Analytic data platforms - Real time,ready-to-use tools—native SQL, integration with the R programming language, and data mining algorithms
  • Modern data types : Mobile devices, social media traffic, networked sensors (i.e. the Internet of Things)

Data Warehouse - Copmarision Matrix

DW ProductIBM - DB2 Data Warehouse Oracle - ExadataTeradata
Company IBM has four major businesses:
hardware, software, services, and
financing. Data warehousing is part
of the data management business,
which is part of the software business.
Oracle has three businesses: database,
applications, and consulting.
Its database business is the
largest by far and currently represents
80 percent of Oracle’s new
license revenues.
NCR has four businesses: data
warehousing, financial self-service,
retail store automation, and customer
services. The Teradata Division
is responsible for the data
warehousing business
Product URLwww.ibm.com/DataWarehousing‎https://www.oracle.com/database/data-warehouse/index.htmlwww.teradata.com/
STRENGTHS- Rich and flexible data partitioning capabilities
-strong analytic functionality in OLAP and data
mining
- Market presence
- Data Models
- Hardware bundle
- Partner network
- Strong services arm (IBM GSA)
- Intelligent Storage Grid
- Hybrid Columnar Compression
- Smart Flash Cache
- Massively-parallel, partitioned,
- Shared-nothing database server
architecture
- its simple and highlyautomated
physical data warehouse
implementation
-set of indexing approaches that enable
fast access to data
- scalable hybrid-storage capabilities
- Teradata has buddied up with all enterprise Hadoop distro providers, enabled new analytic workloads to be added to Teradata systems (JSON, geospatial, 3D geospatial and others) and more.
WEAKNESSES- complex physical
implementation
- lack of integration
with multidimensional
- High cost of ownership
- DB2 on the open systems platforms continues to suffer from locking problems.
• Closed systems; can’t easily ride cost curves associated with commodity hardware.
• Expensive fault tolerant solution compared with Exadata

- its incomplete visual tools for build and manage functionality
- Proprietary hardware
- Costly to maintain
and upgrade
- Limited skilled implementation expertise
Deployment Platforms IBM AIX
Microsoft Windows
Linux
Sun Solaris
IBM AIX
Hewlett-Packard HP-UX
Linux
Microsoft Windows
Sun Solaris
NCR SVR4 UNIX MP-RAS
Microsoft Windows
Server ArchitectureServer platform with a single processor
Single database partition on a
server platform with a multiple
processors
Multiple partition configurations
• Shared-nothing
• Multiple server platforms
• Server interconnect
• Any number of readers and
writers
Single server platform
Distributed database
Real Application Cluster (RAC)
• Shared, partitioned data
• Multiple server platforms
• Server interconnect
• Any number of readers and
writers
Single and multiple node organization
where a node is a hardware
and software platform specialized
and dedicated to data warehousing
Teradata Warehouse is a sharednothing
architecture in both its single
and multiple node configurations
Data Type Support SQL types:
• Numeric
• Binary
• Character
• Date time
DATALINK
XML
Large objects (max 2 gigabytes)
User-defined types (distinct—
renamed SQL types, structured—
object oriented, reference—
hierarchies of built-in types)
Oracle built-in data types (SQL
types):
• Numeric
• Binary
• Character
• Date time
Large objects (max 2 gigabytes)
User-defined types (object-oriented
types, object identifier types, arrays,
nested tables)
Oracle-supplied types
• Spatial
• Media
• Text
• XML
SQL types:
• Numeric
• Binary
• Character
• Date time
Large objects
Physical Design
Recommendation
Neutral on the physical design
of data warehouses.
Neutral on the physical design
of data warehouses.
Teradata is neutral on the physical
design of data warehouses but
recommends a physical design of
third normal form for data warehouses
to maximize flexibility.
Teradata further recommends that
denormalized structures be implemented
as views or redundant
structures (logical data marts or
special purpose tables).
Physical ImplementationManualManual
Template-based via templates and
Database Configuration Assistant
(DBCA) tool
Automated via Oracle managed
files
Automated
Custom Transformations May be written in:
SQL
Java
C++
May be written in:
SQL
PL/SQL
May be written in:
SQL
C++
Summary Table Support Materialized query tables automate
the creation and management of
summary tables. A materialized
query table stores the results of a
query in a table
Materialized views automate the
creation and management of
summary tables. A materialized
view stores the results of a query in
a table
The OLAP transformations of
Teradata Warehouse Miner can
create and manage summary tables.
SQL Extensions CUBE and ROLLUP in SELECT
Functions
• Aggregate
• Numeric
• Statistical
• Correlation
• Random number generation
• Regression
• Date time
User-defined
CUBE and ROLLUP in SELECT
Functions
• Ranking
• Window aggregate
• Reporting aggregate
• Lag/lead
• Linear regression
• Inverse percentile
• Hypothetical rank and distribution
• First/last
• Numeric
• Date time
User-defined
QUALIFY, SAMPLE, and WITH in
SELECT
Functions and operators
• Aggregate
• Numeric
• Date time
• OLAP
OLAP DB2 provides OLAP build and
manage capabilities, relational
OLAP on DB2 tables, and multidimensional
and hybrid OLAP on a
combination of DB2 tables and
external multidimensional structures.
DB2 OLAP Server is a separately-
priced and -packaged product
that is an external, but tightly
integrated, multidimensional OLAP
facility that IBM OEMs from Hyperion
Solutions.
Oracle OLAP is a separatelypackaged
and -priced product that
provides OLAP functionality
Provides relational OLAP on Teradata
Warehouse tables
Data Mining DB2 Intelligent Miner is bundled
with DB2 Data Warehouse Enterprise
Edition.
Oracle Data Mining is a separately-
priced and -packaged product.
Teradata Warehouse Miner is a
separately-packaged and -priced
product that is tightly integrated
with Teradata Warehouse.

Data Warehouse- Teradata


Company - DW ProductTeradata
Company NCR has four businesses: data
warehousing, financial self-service,
retail store automation, and customer
services. The Teradata Division
is responsible for the data
warehousing business
Product URLwww.teradata.com/
STRENGTHS- Massively-parallel, partitioned,
- Shared-nothing database server
architecture
- its simple and highlyautomated
physical data warehouse
implementation
-set of indexing approaches that enable
fast access to data
- scalable hybrid-storage capabilities
- Teradata has buddied up with all enterprise Hadoop distro providers, enabled new analytic workloads to be added to Teradata systems (JSON, geospatial, 3D geospatial and others) and more.
WEAKNESSES- its incomplete visual tools for build and manage functionality
- Proprietary hardware
- Costly to maintain
and upgrade
- Limited skilled implementation expertise
Deployment Platforms NCR SVR4 UNIX MP-RAS
Microsoft Windows
Server ArchitectureSingle and multiple node organization
where a node is a hardware
and software platform specialized
and dedicated to data warehousing
Teradata Warehouse is a sharednothing
architecture in both its single
and multiple node configurations
Data Type Support SQL types:
• Numeric
• Binary
• Character
• Date time
Large objects
Physical Design
Recommendation
Teradata is neutral on the physical
design of data warehouses but
recommends a physical design of
third normal form for data warehouses
to maximize flexibility.
Teradata further recommends that
denormalized structures be implemented
as views or redundant
structures (logical data marts or
special purpose tables).
Physical ImplementationAutomated
Custom Transformations May be written in:
SQL
C++
Summary Table Support The OLAP transformations of
Teradata Warehouse Miner can
create and manage summary tables.
SQL Extensions QUALIFY, SAMPLE, and WITH in
SELECT
Functions and operators
• Aggregate
• Numeric
• Date time
• OLAP
OLAP Provides relational OLAP on Teradata
Warehouse tables
Data Mining Teradata Warehouse Miner is a
separately-packaged and -priced
product that is tightly integrated
with Teradata Warehouse.

Oracle Data Warehouse - Exadata

-->
Company - DW ProductOracle - Exadata
Company Oracle has three businesses: database,
applications, and consulting.
Its database business is the
largest by far and currently represents
80 percent of Oracle’s new
license revenues.
Product URLhttps://www.oracle.com/database/data-warehouse/index.html
STRENGTHS- Intelligent Storage Grid
- Hybrid Columnar Compression
- Smart Flash Cache
WEAKNESSES
Deployment Platforms IBM AIX
Hewlett-Packard HP-UX
Linux
Microsoft Windows
Sun Solaris
Server ArchitectureSingle server platform
Distributed database
Real Application Cluster (RAC)
• Shared, partitioned data
• Multiple server platforms
• Server interconnect
• Any number of readers and
writers
Data Type Support Oracle built-in data types (SQL
types):
• Numeric
• Binary
• Character
• Date time
Large objects (max 2 gigabytes)
User-defined types (object-oriented
types, object identifier types, arrays,
nested tables)
Oracle-supplied types
• Spatial
• Media
• Text
• XML
Physical Design
Recommendation
Neutral on the physical design
of data warehouses.
Physical ImplementationManual
Template-based via templates and
Database Configuration Assistant
(DBCA) tool
Automated via Oracle managed
files
Custom Transformations May be written in:
SQL
PL/SQL
Summary Table Support Materialized views automate the
creation and management of
summary tables. A materialized
view stores the results of a query in
a table
SQL Extensions CUBE and ROLLUP in SELECT
Functions
• Ranking
• Window aggregate
• Reporting aggregate
• Lag/lead
• Linear regression
• Inverse percentile
• Hypothetical rank and distribution
• First/last
• Numeric
• Date time
User-defined
OLAP Oracle OLAP is a separatelypackaged
and -priced product that
provides OLAP functionality
Data Mining Oracle Data Mining is a separately-
priced and -packaged product.

IBM - DB2 Data Warehouse


Company - DW ProductIBM - DB2 Data Warehouse
Company IBM has four major businesses:
hardware, software, services, and
financing. Data warehousing is part
of the data management business,
which is part of the software business.
Product URLwww.ibm.com/DataWarehousing‎
STRENGTHS- Rich and flexible data partitioning capabilities
-strong analytic functionality in OLAP and data
mining
- Market presence
- Data Models
- Hardware bundle
- Partner network
- Strong services arm (IBM GSA)
WEAKNESSES- complex physical
implementation
- lack of integration
with multidimensional
- High cost of ownership
- DB2 on the open systems platforms continues to suffer from locking problems.
• Closed systems; can’t easily ride cost curves associated with commodity hardware.
• Expensive fault tolerant solution compared with Exadata
Deployment Platforms IBM AIX
Microsoft Windows
Linux
Sun Solaris
Server ArchitectureServer platform with a single processor
Single database partition on a
server platform with a multiple
processors
Multiple partition configurations
• Shared-nothing
• Multiple server platforms
• Server interconnect
• Any number of readers and
writers
Data Type Support SQL types:
• Numeric
• Binary
• Character
• Date time
DATALINK
XML
Large objects (max 2 gigabytes)
User-defined types (distinct—
renamed SQL types, structured—
object oriented, reference—
hierarchies of built-in types)
Physical Design
Recommendation
Neutral on the physical design
of data warehouses.
Physical ImplementationManual
Custom Transformations May be written in:
SQL
Java
C++
Summary Table Support Materialized query tables automate
the creation and management of
summary tables. A materialized
query table stores the results of a
query in a table
SQL Extensions CUBE and ROLLUP in SELECT
Functions
• Aggregate
• Numeric
• Statistical
• Correlation
• Random number generation
• Regression
• Date time
User-defined
OLAP DB2 provides OLAP build and
manage capabilities, relational
OLAP on DB2 tables, and multidimensional
and hybrid OLAP on a
combination of DB2 tables and
external multidimensional structures.
DB2 OLAP Server is a separately-
priced and -packaged product
that is an external, but tightly
integrated, multidimensional OLAP
facility that IBM OEMs from Hyperion
Solutions.
Data Mining DB2 Intelligent Miner is bundled
with DB2 Data Warehouse Enterprise
Edition.

Fashion Catalog Similarity Search using Datastax AstraDB Vector Database

DataStax Astra DB's vector database capabilities can be leveraged to build an efficient fashion catalog similarity search, enabling user...