Hive is an open source, peta-byte scale date warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
Hive architecture:
Hive organizes data in tables and partitions. A good partitioning scheme allows Hive to prune data while processing a query and that has a direct impact on how fast a result of the query can be produced. Behind the scenes, Hive stores partitions and tables into directories in Hadoop File System (HDFS).
Hive comprises of the following major components:
-Metastore: To store the meta data.
-Query compiler and execution engine: To convert SQL queries to a sequence of map/reduce jobs that are then executed on Hadoop.
- SerDe and ObjectInspectors: Programmable interfaces and implementations of common data formats and types.
-UDF and UDAF: Programmable interfaces and implementations for user defined functions (scalar and aggregate functions).
-Clients: Command line client similar to Mysql command line and a web UI.
Data Flow into Hadoop Cloud:
For more Information:
http://www.vldb.org/pvldb/2/vldb09-938.pdf
This blog is useful for Database, Business Intelligence, Bigdata and Data Science professionals.
Subscribe to:
Post Comments (Atom)
Secure a Microsoft Fabric data warehouse
Data warehouse in Microsoft Fabric is a comprehensive platform for data and analytics, featuring advanced query processing and full transa...
-
Database Mirroring and index maintenance in AWS Database Mirroring, the database has to be in the full recovery mode, so all index rebuilds...
-
This post contains examples of a number of the most commonly performed DBA activities. They are all listed in alphabetical order by the comm...
-
Move distribution database to another drive: There are scenarios in SQL server Replication that we need to move datafiles and logfiles of di...
No comments:
Post a Comment