Hive architecture:
Hive organizes data into tables and partitions. A good partitioning scheme lets Hive prune data while processing a query, skipping partitions that cannot match, which directly affects how quickly the query result is produced. Behind the scenes, Hive stores tables and partitions as directories in the Hadoop Distributed File System (HDFS).
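
To make the directory layout and pruning idea concrete, here is a minimal Python sketch. It is only an illustration, not Hive's own code: the table name page_views, the partition keys dt and country, and the helper functions are hypothetical, and the warehouse root is assumed to be /user/hive/warehouse.

```python
from pathlib import PurePosixPath

# Assumed warehouse root; the actual location depends on the Hive configuration.
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_path(table, spec):
    """Build the directory for one partition: one key=value level per partition key."""
    path = WAREHOUSE / table
    for key, value in spec.items():
        path = path / f"{key}={value}"
    return path


def prune(partitions, predicate):
    """Keep only the partitions whose key values match every equality in the predicate."""
    return [
        p for p in partitions
        if all(p.get(key) == value for key, value in predicate.items())
    ]


# Hypothetical partitions of a hypothetical page_views table.
partitions = [
    {"dt": "2009-03-20", "country": "US"},
    {"dt": "2009-03-20", "country": "CA"},
    {"dt": "2009-03-21", "country": "US"},
]

# A query filtering on dt = '2009-03-20' only needs to read two of the three directories.
selected = prune(partitions, {"dt": "2009-03-20"})
paths = [str(partition_path("page_views", p)) for p in selected]
for path in paths:
    print(path)
```

The point of the sketch is that the filter is resolved against partition metadata before any data is read, so the third directory is never touched.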

Hive comprises the following major components:
- Metastore: Stores the metadata.
- Query compiler and execution engine: Converts SQL queries into a sequence of map/reduce jobs that are then executed on Hadoop.
- SerDe and ObjectInspectors: Programmable interfaces and implementations for common data formats and types.
- UDF and UDAF: Programmable interfaces and implementations for user-defined functions (scalar and aggregate functions).
- Clients: A command-line client similar to the MySQL command line, and a web UI.
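
As a rough illustration of what the query compiler's output amounts to, the sketch below shows how a query such as SELECT country, COUNT(*) FROM page_views GROUP BY country could be expressed as a map step, a shuffle, and a reduce step. This is a conceptual, in-memory model in Python, not the plan Hive actually generates; the table, the column, and the function names are all hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit (group key, 1) for every input row."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs):
    """Shuffle: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values of each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical rows of a hypothetical page_views table.
rows = [{"country": "US"}, {"country": "CA"}, {"country": "US"}]

result = reduce_phase(shuffle(map_phase(rows)))
print(result)
```

On Hadoop the map and reduce steps run as distributed tasks and the shuffle is handled by the framework; here everything runs in one process purely to show the shape of the computation.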
Data Flow into Hadoop Cloud:

For more information:
http://www.vldb.org/pvldb/2/vldb09-938.pdf