Below is a list of Hadoop interview questions and answers compiled by experts at JanBask Training to help job seekers.
Question: What is Hadoop and how does it work?
Answer: When “Big Data” emerged as a problem, Apache Hadoop evolved as a solution to it. Apache Hadoop is a framework that provides various services and tools to store and process Big Data. It helps in analysing Big Data and making business decisions from it, which can’t be done efficiently and effectively using traditional systems.
Question: What is the use of Hadoop?
Answer: With Hadoop, the user can run applications on systems that have thousands of nodes spread across countless terabytes of data. Rapid data processing and transfer among nodes keeps operation uninterrupted even when a node fails, preventing system failure.
Question: On what concept does the Hadoop framework work?
Answer: The Hadoop framework works on the following two core components:
1) HDFS – Hadoop Distributed File System is the Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks, and it operates on a master-slave architecture.
2) Hadoop MapReduce – This is a Java-based programming paradigm of the Hadoop framework that provides scalability across various Hadoop clusters. MapReduce distributes the workload into various tasks that can run in parallel.
Hadoop jobs perform two separate tasks: the map job and the reduce job. The map job breaks down the data sets into key-value pairs or tuples. The reduce job then takes the output of the map job and combines the data tuples into a smaller set of tuples. The reduce job is always performed after the map job has finished.
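The map and reduce steps described above can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop's Java API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step stands in for the work the framework does between the two phases.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: break each input line into (word, 1) key-value pairs."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Group all values that share a key before they reach the reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one smaller result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    # The reduce step only runs once the map output is complete.
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and only the grouped pairs would travel over the network to the reducers.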
Question: What are the basic features of Hadoop?
Answer: Written in Java, the Hadoop framework has the capability of solving problems involving Big Data analysis. Its programming model is based on Google MapReduce and its infrastructure is based on Google’s Big Data and distributed file systems. Hadoop is scalable, and more nodes can be added to it.
Question: What is the best hardware configuration to run Hadoop?
Answer: The best configuration for executing Hadoop jobs is dual-core machines or dual processors with 4GB or 8GB RAM that use ECC memory. Hadoop benefits greatly from ECC memory, even though it is not low-end. ECC memory is recommended for running Hadoop because most Hadoop users have experienced various checksum errors when using non-ECC memory. However, the hardware configuration also depends on the workflow requirements and can change accordingly.
Question: What is a block and block scanner in HDFS?
Answer: Block – The minimum amount of data that can be read or written is generally referred to as a “block” in HDFS. The default size of a block in HDFS is 64MB in Hadoop 1.x (128MB in Hadoop 2.x and later).
Block Scanner – The Block Scanner tracks the list of blocks present on a DataNode and verifies them to find any kind of checksum errors. Block Scanners use a throttling mechanism to reserve disk bandwidth on the DataNode.
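The idea of fixed-size blocks that are later verified by checksum can be sketched as follows. This is a toy model, not the HDFS implementation: the helper names are hypothetical, the block size is a few bytes instead of megabytes, CRC32 from Python's zlib module stands in for whatever checksum a real DataNode uses, and throttling is left out.

```python
import zlib


def split_into_blocks(data: bytes, block_size: int):
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def checksum_blocks(blocks):
    """Record a CRC32 checksum for each block at write time."""
    return [zlib.crc32(block) for block in blocks]


def scan_blocks(blocks, expected_checksums):
    """Return the indexes of blocks whose checksum no longer matches."""
    return [
        index
        for index, (block, expected) in enumerate(zip(blocks, expected_checksums))
        if zlib.crc32(block) != expected
    ]


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)  # toy block size
    checksums = checksum_blocks(blocks)
    blocks[1] = b"eXgh"  # simulate on-disk corruption of the second block
    print(len(blocks), scan_blocks(blocks, checksums))
```

A scanner like this would report the corrupted block so that a healthy replica from another node can replace it.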
Question: Explain the difference between NameNode, Backup Node and Checkpoint Node.
Answer: NameNode: The NameNode is at the heart of the HDFS file system and manages the metadata, i.e. the data of the files is not stored on the NameNode; rather, it holds the directory tree of all the files present in the HDFS file system on a Hadoop cluster. The NameNode uses two files for the namespace:
fsimage file – It keeps track of the latest checkpoint of the namespace.
edits file – It is a log of changes that have been made to the namespace since the checkpoint.
Checkpoint Node: The Checkpoint Node keeps track of the latest checkpoint in a directory that has the same structure as the NameNode’s directory. The Checkpoint Node creates checkpoints for the namespace at regular intervals by downloading the edits and fsimage files from the NameNode and merging them locally. The new image is then uploaded back to the active NameNode.
Backup Node: The Backup Node also provides checkpointing functionality like that of the Checkpoint Node, but it also maintains its own up-to-date in-memory copy of the file system namespace that is in sync with the active NameNode.
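The checkpointing step, merging the edits log into the fsimage, can be pictured with a small sketch. It assumes a drastically simplified namespace (a set of file paths) and an edits log of ("create" | "delete", path) tuples; the real on-disk formats are different, and the names apply_edits and checkpoint are hypothetical.

```python
def apply_edits(fsimage, edits):
    """Replay logged namespace changes on top of a checkpoint image."""
    namespace = set(fsimage)  # copy, so the old checkpoint stays untouched
    for operation, path in edits:
        if operation == "create":
            namespace.add(path)
        elif operation == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return namespace


def checkpoint(fsimage, edits):
    """Merge image and edits; return the new image and an empty edits log."""
    return apply_edits(fsimage, edits), []


if __name__ == "__main__":
    image = {"/data/a.txt"}
    log = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]
    new_image, new_log = checkpoint(image, log)
    print(sorted(new_image), new_log)
```

In this picture a Checkpoint Node runs the merge periodically and hands the new image back, while a Backup Node would apply each edit to its in-memory namespace as it arrives.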
Question: Compare HDFS with Network Attached Storage (NAS).
Answer: For this question, first explain NAS and HDFS, and then compare their features as follows:
Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network, providing data access to a heterogeneous group of clients. NAS can be either hardware or software that provides services for storing and accessing files. Hadoop Distributed File System (HDFS), by contrast, is a distributed file system that stores data using commodity hardware.
In HDFS, data blocks are distributed across all the machines in a cluster, whereas in NAS data is stored on dedicated hardware.
HDFS is designed to work with the MapReduce paradigm, where computation is moved to the data. NAS is not suitable for MapReduce since data is stored separately from the computations.
HDFS uses commodity hardware, which is cost-effective, whereas a NAS is a high-end storage device that comes at a high cost.
Question: What happens when two clients try to access the same file in HDFS?
Answer: HDFS supports exclusive writes only.
When the first client contacts the “NameNode” to open the file for writing, the “NameNode” grants a lease to the client to create this file. When the second client tries to open the same file for writing, the “NameNode” will notice that the lease for the file is already granted to another client, and will reject the open request for the second client.
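The lease behaviour described above can be modelled in a few lines. The LeaseManager class and its methods are hypothetical names for this sketch and do not mirror the NameNode's actual code; lease expiry and renewal are left out.

```python
class LeaseManager:
    """Toy model of exclusive write leases, one writer per file path."""

    def __init__(self):
        self._leases = {}  # path -> id of the client holding the write lease

    def open_for_write(self, path, client):
        """Grant the lease if the file is free; reject a second writer."""
        holder = self._leases.get(path)
        if holder is not None and holder != client:
            return False
        self._leases[path] = client
        return True

    def close(self, path, client):
        """Release the lease, but only if this client actually holds it."""
        if self._leases.get(path) == client:
            del self._leases[path]


if __name__ == "__main__":
    manager = LeaseManager()
    print(manager.open_for_write("/logs/app.log", "client-1"))
    print(manager.open_for_write("/logs/app.log", "client-2"))
```

Once the first client closes the file, the lease is released and another client can open the same path for writing.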
Question: How does the NameNode tackle DataNode failures?
Answer: The NameNode periodically receives a heartbeat from each DataNode in the cluster, which implies that the DataNode is functioning properly.
A block report contains a list of all the blocks on a DataNode. If a DataNode fails to send a heartbeat message, it is marked dead after a specific period.
The NameNode replicates the blocks of the dead node to another DataNode using the replicas created earlier.
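A rough sketch of this failure handling is shown below. HeartbeatMonitor, its timeout parameter and its methods are assumptions made for illustration, not NameNode internals; time is passed in explicitly so the example stays deterministic, and a block report here only adds blocks rather than replacing the node's previous list.

```python
class HeartbeatMonitor:
    """Toy model of heartbeat tracking and re-replication planning."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # datanode -> time of its last heartbeat
        self.block_map = {}  # block id -> set of datanodes holding a replica

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def block_report(self, node, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(node)

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return {n for n, seen in self.last_seen.items() if now - seen > self.timeout}

    def blocks_to_replicate(self, now):
        """Map each block on a dead node to the live nodes still holding it."""
        dead = self.dead_nodes(now)
        plan = {}
        for block_id, holders in self.block_map.items():
            if holders & dead:
                plan[block_id] = sorted(holders - dead)
        return plan


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=10)
    monitor.heartbeat("dn1", now=0)
    monitor.heartbeat("dn2", now=0)
    monitor.block_report("dn1", ["b1", "b2"])
    monitor.block_report("dn2", ["b1"])
    monitor.heartbeat("dn2", now=20)  # dn1 stays silent
    print(monitor.dead_nodes(now=25), monitor.blocks_to_replicate(now=25))
```

A block whose list of surviving holders is empty has no remaining replica to copy from, which is why HDFS keeps several replicas of every block on different nodes.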
Question: Can NameNode and DataNode be commodity hardware?
Answer: The smart answer to this question would be: DataNodes are commodity hardware like personal computers and laptops, as they store data and are required in large numbers. But from your experience you can tell that the NameNode is the master node and it stores metadata about all the blocks stored in HDFS. It requires a large amount of memory (RAM), so the NameNode needs to be a high-end machine with good memory capacity.