Hive is data warehouse software built on top of Hadoop for querying, summarizing, and analyzing data. It provides an SQL-like interface for querying data stored in the databases and file systems that integrate with Hadoop. Hive provides the necessary SQL abstraction, through SQL-like queries written in HiveQL, so that queries do not have to be implemented against the lower-level API. Since most data warehousing applications work with SQL-based query languages, Hive makes it easier to port SQL-based applications to Hadoop.
Architecture of Hive:
The architecture of Hive is shown below. Let us discuss each component in detail.
User Interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, Hive HDInsight, and the Hive command line.
Meta Store: Hive stores the schema, or metadata, of tables, databases, and columns, along with their HDFS mapping, in a database server. It also holds partition metadata, which helps the driver track the progress of the various data sets distributed over the cluster. The data is stored in RDBMS format.
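To make the role of the Meta Store more concrete, here is a minimal Python sketch of the kind of record it keeps. This is only an illustration, not how Hive implements it: the `TableMeta` class, the `register_table` and `add_partition` helpers, and the `/warehouse/...` paths are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical metastore record for one table (illustrative only)."""
    database: str
    name: str
    columns: dict   # column name -> type name
    location: str   # HDFS directory that holds the table's files
    partitions: dict = field(default_factory=dict)  # partition spec -> directory


# The "metastore" here is just a lookup from (database, table) to metadata.
metastore = {}


def register_table(meta):
    """Store the table's schema and HDFS mapping under its qualified name."""
    metastore[(meta.database, meta.name)] = meta


def add_partition(database, table, spec):
    """Record a partition and derive its directory under the table location."""
    meta = metastore[(database, table)]
    meta.partitions[spec] = f"{meta.location}/{spec}"
    return meta.partitions[spec]


register_table(TableMeta(
    database="sales",
    name="orders",
    columns={"order_id": "int", "amount": "double"},
    location="/warehouse/sales.db/orders",  # assumed path, for illustration
))
partition_dir = add_partition("sales", "orders", "dt=2024-01-01")
print(partition_dir)
```

The point of the sketch is that the metadata lives apart from the data files: the record only says where the files are and how to interpret them.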
HiveQL Process Engine: HiveQL is similar to SQL and is used for querying the schema information in the Meta Store. It is a replacement for the traditional approach of writing a MapReduce program: instead of writing the MapReduce program in Java, we write a query for the MapReduce job and let Hive process it.
Execution Engine: This is the part that joins the HiveQL process engine and MapReduce. It processes the query and generates the same results that MapReduce would.
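As a rough illustration of what a query saves the user from writing by hand, consider a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` (the `employees` table and its columns are hypothetical). The Python sketch below imitates the map, shuffle, and reduce steps that such a grouping corresponds to. It is a simplified in-memory model, not Hive's actual execution plan.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table.
rows = [
    {"name": "asha", "dept": "sales"},
    {"name": "ben", "dept": "ops"},
    {"name": "chen", "dept": "sales"},
]


def map_phase(records):
    """Emit a (key, 1) pair per record, keyed by the GROUP BY column."""
    for record in records:
        yield record["dept"], 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

With HiveQL, the one-line query replaces all three functions.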
HDFS or HBase: These are the data storage techniques used to store data in the file system.
Comparison of Hive with Traditional Databases:
Although it is based on SQL, HiveQL does not strictly follow the full SQL-92 standard. It offers extensions that are not in SQL, including multi-table inserts and CREATE TABLE AS SELECT, but it offers only basic support for indexes. It lacks support for materialized views and transactions. It supports INSERT and UPDATE. Hive's storage and querying operations closely resemble those of traditional databases, but although HiveQL is an SQL dialect, Hive differs from them in how a schema is applied to a table.
In traditional databases, the table typically enforces the schema when data is loaded into it. This lets the database ensure that the data entered follows the representation specified in the table definition. This design is called schema on write. Hive does not verify the data against the table schema on write. Instead, it performs the checks at run time, when the data is read. This is called schema on read. With schema on write, quality checks are performed against the data at load time to ensure that it is not corrupt, and early detection of corrupt data ensures early exception handling.
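The difference between the two designs can be sketched in a few lines of Python. This is a toy model under stated assumptions: the two-column `SCHEMA`, the comma-separated input, and the function names are all hypothetical, and turning an unparsable field into `None` at read time is a simplification chosen for this sketch.

```python
# Hypothetical two-column table: column name -> Python type used for casting.
SCHEMA = {"id": int, "amount": float}


def parse_row(line):
    """Cast each comma-separated field to its declared type, or raise ValueError."""
    values = line.split(",")
    if len(values) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA.items(), values)}


def load_schema_on_write(lines):
    """Schema on write: validate every row at load time; a bad row aborts the load."""
    return [parse_row(line) for line in lines]


def load_schema_on_read(lines):
    """Schema on read: store the raw lines untouched; nothing is checked yet."""
    return list(lines)


def read_schema_on_read(stored):
    """Apply the schema only when the data is read (assumption: bad fields become None)."""
    rows = []
    for line in stored:
        values = line.split(",")
        row = {}
        for index, (name, cast) in enumerate(SCHEMA.items()):
            try:
                row[name] = cast(values[index])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows


raw_lines = ["1,9.5", "two,3.0"]  # the second line does not match the schema

try:
    load_schema_on_write(raw_lines)
    write_outcome = "loaded"
except ValueError:
    write_outcome = "rejected at load time"

stored = load_schema_on_read(raw_lines)  # always succeeds
rows = read_schema_on_read(stored)       # the problem surfaces only here
print(write_outcome, rows)
```

The trade-off shown here is the one described above: schema on write catches the bad row early, while schema on read makes the load itself trivial and defers the check to query time.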
Characteristics of Hive:
In Hive, tables and databases are created first, and then data is loaded into the tables.
Hive is designed for dealing with structured data and offers usability features such as UDFs (user-defined functions), which the MapReduce framework does not have.
Hive can partition data using directory structures to improve performance on certain queries.
Most interactions take place over the command line interface, where Hive queries are written in the Hive Query Language (HQL).
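The point about partitioning with directory structures can be illustrated with a small Python sketch. It assumes a `key=value` directory naming scheme and a single `data.csv` file per partition. The helper names and the sample rows are hypothetical, and the local temporary directory stands in for a table directory in HDFS.

```python
import os
import tempfile


def write_partitioned(base_dir, rows, partition_key):
    """Write each row into a sub-directory named after its partition value."""
    for row in rows:
        part_dir = os.path.join(base_dir, f"{partition_key}={row[partition_key]}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "data.csv"), "a", encoding="utf-8") as handle:
            handle.write(f"{row['id']},{row['amount']}\n")


def read_partition(base_dir, partition_key, value):
    """Read only the directory for one partition value, skipping all others."""
    part_dir = os.path.join(base_dir, f"{partition_key}={value}")
    records = []
    files_opened = 0
    if os.path.isdir(part_dir):
        for file_name in sorted(os.listdir(part_dir)):
            files_opened += 1
            with open(os.path.join(part_dir, file_name), encoding="utf-8") as handle:
                records.extend(line.strip().split(",") for line in handle)
    return records, files_opened


rows = [
    {"id": 1, "amount": 10.0, "country": "IN"},
    {"id": 2, "amount": 20.0, "country": "US"},
    {"id": 3, "amount": 30.0, "country": "IN"},
]

with tempfile.TemporaryDirectory() as base_dir:
    write_partitioned(base_dir, rows, "country")
    partitions = sorted(os.listdir(base_dir))
    india_records, files_opened = read_partition(base_dir, "country", "IN")

print(partitions, india_records, files_opened)
```

A query that filters on the partition column only has to touch the matching directory, which is why partitioning helps certain queries and not others.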