Big Data Interview Questions and Answers: HBase
1. When should we use HBase?
HBase is a good fit when a table has to hold billions of rows and millions of columns and the application needs random, real-time read/write access to it.
2. What is the difference between HBase and HDFS?
HDFS is a distributed file system for storing and managing large data sets across clusters. HBase is built on top of HDFS and provides fast record lookups (and updates) for large tables.
3. Why use HBase?
High-capacity storage system.
Distributed design that caters to large tables.
High performance and availability.
Supports random, real-time CRUD operations.
5. How many operational commands are there in HBase?
HBase has five main operational commands: Get, Put, Delete, Scan, and Increment.
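As a rough illustration of what the operational commands do, the sketch below models them on a plain in-memory table. It assumes the five types are Get, Put, Delete, Scan, and Increment (the usual answer to this question). The ToyTable class and its method names are hypothetical and are not the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table (hypothetical, not the HBase client API)."""

    def __init__(self):
        # row key -> {"family:qualifier" -> value}
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        # Put: write (or overwrite) one cell.
        self.rows[row][column] = value

    def get(self, row):
        # Get: read all cells of a single row.
        return dict(self.rows.get(row, {}))

    def delete(self, row, column):
        # Delete: remove one cell; a missing cell is ignored.
        self.rows.get(row, {}).pop(column, None)

    def scan(self, start=None, stop=None):
        # Scan: iterate rows in sorted row-key order; stop is exclusive.
        for key in sorted(self.rows):
            if not self.rows[key]:
                continue  # skip rows whose cells were all deleted
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key, dict(self.rows[key])

    def increment(self, row, column, amount=1):
        # Increment: add to a counter cell and return the new value.
        new_value = self.rows[row].get(column, 0) + amount
        self.rows[row][column] = new_value
        return new_value


table = ToyTable()
table.put("user1", "info:name", "Ada")
table.put("user2", "info:name", "Grace")
table.increment("user1", "stats:visits")
table.increment("user1", "stats:visits", 2)
table.delete("user2", "info:name")
print(table.get("user1"))
print([key for key, _ in table.scan()])
```

In this toy model a delete removes the cell immediately; real HBase deletes work through tombstone markers instead.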
6) What are column families in HBase?
Column families are the basic unit of physical storage in HBase. Features such as compression are applied per column family.
7) What is the use of the row key?
The row key logically groups cells: all cells with the same row key are co-located on the same server.
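One way to picture the co-location is a lookup over sorted key ranges, sketched below. The region start keys and server names are made-up values for illustration, and locate is a hypothetical helper, not an HBase function. The point is only that the server is chosen from the row key alone, so every cell of a row lands in the same place.

```python
import bisect

# Hypothetical region start keys, sorted; the first region starts at the empty key.
REGION_START_KEYS = ["", "g", "n", "t"]
# Hypothetical servers, one per region, in the same order.
REGION_SERVERS = ["server-a", "server-b", "server-c", "server-d"]


def locate(row_key):
    """Return the server hosting the key range that contains row_key."""
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]


cells = [
    ("martin", "info:name"),
    ("martin", "info:email"),
    ("martin", "stats:visits"),
    ("alice", "info:name"),
]
for row_key, column in cells:
    # The column plays no part in the lookup, only the row key does.
    print(row_key, column, "->", locate(row_key))
```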
8) Explain deletion in HBase. What are the three types of tombstone markers in HBase?
When you delete a cell in HBase, the data is not actually removed. Instead, a tombstone marker is set, which makes the deleted cells invisible. The deleted cells are physically removed during compactions.
There are three types of tombstone markers:
Version delete marker: marks a single version of a column for deletion.
Column delete marker: marks all versions of a column for deletion.
Family delete marker: marks all columns of a column family for deletion.
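The three marker types can be sketched as a masking rule applied at read time. This is a toy model, not HBase code: the Cell and Tombstone classes are hypothetical, and the model assumes that column and family markers hide versions with a timestamp at or below the marker's own timestamp, while a version marker hides only the exact timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


@dataclass(frozen=True)
class Tombstone:
    kind: str        # "version", "column", or "family"
    row: str
    family: str
    qualifier: str   # ignored for "family" markers
    timestamp: int


def is_masked(cell, tombstone):
    """Return True if the tombstone hides the cell from readers."""
    if (cell.row, cell.family) != (tombstone.row, tombstone.family):
        return False
    if tombstone.kind == "family":
        # Assumption: hides every column of the family up to the marker time.
        return cell.timestamp <= tombstone.timestamp
    if cell.qualifier != tombstone.qualifier:
        return False
    if tombstone.kind == "column":
        # Assumption: hides all versions of the column up to the marker time.
        return cell.timestamp <= tombstone.timestamp
    # "version": hides exactly one version.
    return cell.timestamp == tombstone.timestamp


def visible(cells, tombstones):
    """Cells are still stored; tombstones only filter what a read returns."""
    return [c for c in cells if not any(is_masked(c, t) for t in tombstones)]


cells = [
    Cell("r1", "cf", "a", 1, "a1"),
    Cell("r1", "cf", "a", 2, "a2"),
    Cell("r1", "cf", "b", 1, "b1"),
]
print([c.value for c in visible(cells, [Tombstone("version", "r1", "cf", "a", 2)])])
print([c.value for c in visible(cells, [Tombstone("column", "r1", "cf", "a", 2)])])
print([c.value for c in visible(cells, [Tombstone("family", "r1", "cf", "", 2)])])
```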
9) How does HBase actually delete a row?
In HBase, writes are first held in RAM and then flushed to disk. These disk files are immutable except during compaction. Major compactions remove the delete markers (and the data they mask), while minor compactions do not.
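The difference between the two compaction types can be sketched with a toy model of immutable files. Everything here is a simplification for illustration: entries are (row, timestamp, kind, value) tuples, a delete entry is assumed to mask puts of the same row at or before its timestamp, and the function names are hypothetical. A minor compaction rewrites only some files, so it has to keep the marker in case the masked data lives in a file it did not touch.

```python
def merge(files):
    """Merge files into one sorted list, keeping every entry."""
    return sorted(entry for f in files for entry in f)


def minor_compact(files, picked):
    """Rewrite only the picked files as one file; delete markers are kept."""
    merged = merge([files[i] for i in picked])
    rest = [f for i, f in enumerate(files) if i not in picked]
    return rest + [merged]


def major_compact(files):
    """Rewrite all files as one, dropping masked puts and the markers."""
    entries = merge(files)
    deletes = {}
    for row, ts, kind, _ in entries:
        if kind == "delete":
            deletes[row] = max(ts, deletes.get(row, ts))
    kept = [
        e for e in entries
        if e[2] == "put" and not (e[0] in deletes and e[1] <= deletes[e[0]])
    ]
    return [kept]


def read(files, row):
    """Return the newest visible value for row, or None if it is deleted."""
    entries = [e for e in merge(files) if e[0] == row]
    delete_ts = max((e[1] for e in entries if e[2] == "delete"), default=None)
    puts = [
        e for e in entries
        if e[2] == "put" and (delete_ts is None or e[1] > delete_ts)
    ]
    return puts[-1][3] if puts else None


files = [
    [("r1", 1, "put", "old")],    # flushed first
    [("r2", 2, "put", "keep")],   # flushed second
    [("r1", 3, "delete", "")],    # the delete arrives as a new file
]
after_minor = minor_compact(files, picked={1, 2})
after_major = major_compact(after_minor)
print(after_minor)   # the marker and the masked put are both still on disk
print(after_major)   # only the live data remains
```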
10) What happens if you alter the block size of a column family on an already populated database?
When you alter the block size of a column family, new files are written with the new block size as they are flushed, while existing data keeps the old block size and continues to be read correctly. Compaction rewrites old data with the new block size, so after the next major compaction all data should use the new block size.
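The sequence described above can be traced with a small toy model. ToyColumnFamily and its methods are hypothetical names, and the byte values are arbitrary examples. The sketch only tracks which block size each file was written with.

```python
class ToyColumnFamily:
    """Tracks the block size each store file was written with (toy model)."""

    def __init__(self, block_size):
        self.block_size = block_size   # current setting, in bytes
        self.store_files = []          # each file remembers its own block size

    def flush(self, name):
        # A newly flushed file is written with the current setting.
        self.store_files.append({"name": name, "block_size": self.block_size})

    def alter_block_size(self, block_size):
        # Changing the setting does not touch files already on disk.
        self.block_size = block_size

    def major_compact(self):
        # All files are rewritten into one, using the current setting.
        self.store_files = [{"name": "compacted", "block_size": self.block_size}]

    def block_sizes(self):
        return [f["block_size"] for f in self.store_files]


cf = ToyColumnFamily(block_size=65536)
cf.flush("file-1")
cf.alter_block_size(131072)
cf.flush("file-2")
print(cf.block_sizes())   # old and new sizes coexist
cf.major_compact()
print(cf.block_sizes())   # everything rewritten with the new size
```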