Apache Hive is a data warehouse infrastructure built on the Hadoop framework that is well suited to data summarization, data analysis, and data querying. The platform is mainly used to manage large datasets that reside in a distributed storage system.
You may be surprised to know that Hive was originally developed at Facebook before it became an open-source Apache project in the Hadoop ecosystem. Hive was designed to impose structure on large datasets and to query that structured data with a SQL-like language called HQL (Hive Query Language).
Hive has become very popular because its tables resemble those of relational databases. If you know how to work with SQL, then working with Hive will be a cakewalk for you. HQL is widely used around the world.
HQL is a simple SQL-like query language used to manage and query large datasets in enterprises that work with high data volumes almost every day. It is easy to work with HQL if you know how to use SQL. Experienced Hive programmers can also plug in custom MapReduce scripts to perform more sophisticated data analysis.
Hive provides a data warehouse infrastructure and a declarative, SQL-like language, whereas Pig is a procedural data-flow language aimed at exploring and transforming very large datasets. For this reason, Hive is often preferred over Pig by users who are already comfortable with SQL.
So far, we have discussed Hive basics and why Hive is so popular among organizations. Now, we will focus on Hive commands in HQL, with examples. These are frequently used commands that every Hive programmer should know, whether a beginner or experienced. Let us go through each command in detail so that you can quickly get started.
DDL (Data Definition Language) is used to build or modify tables and other objects stored in the database. Examples of DDL statements include CREATE, DROP, SHOW, TRUNCATE, DESCRIBE, and ALTER.
The first step when working with databases is to create a new database. If you are not sure how to create one in Hive, let us help you. Open the Hive shell and run the command CREATE DATABASE <database name> to create a new database. The optional IF NOT EXISTS clause prevents an error if a database with that name already exists.
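As a minimal sketch of the CREATE DATABASE command, the statements below create a database with and without the optional clauses. The database name retail_db, the HDFS path, and the property values are hypothetical and chosen only for illustration.

```sql
-- Simplest form: create a database (retail_db is a hypothetical name)
CREATE DATABASE IF NOT EXISTS retail_db;

-- Fuller form with optional clauses; the path and property are assumptions
CREATE DATABASE IF NOT EXISTS retail_archive
COMMENT 'Example database for this walkthrough'
LOCATION '/user/hive/warehouse/retail_archive.db'
WITH DBPROPERTIES ('creator' = 'example_user');
```

If LOCATION is omitted, Hive places the database under its configured warehouse directory.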
As the name suggests, the DROP command deletes a database that was created earlier. By default, Hive drops a database in RESTRICT mode, which means the command fails if the database still contains tables. To drop a database together with its tables, specify CASCADE instead of RESTRICT.
If you try to drop a database that does not exist, Hive returns an error. Adding the IF EXISTS clause to the DROP command suppresses that error.
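The two modes of DROP DATABASE can be sketched as follows. The database name retail_db is hypothetical, and the two statements are alternatives rather than a sequence to run together.

```sql
-- RESTRICT is the default: this fails if retail_db still contains tables
DROP DATABASE IF EXISTS retail_db RESTRICT;

-- CASCADE drops the tables in retail_db first, then the database itself
DROP DATABASE IF EXISTS retail_db CASCADE;
```

IF EXISTS keeps the statement from raising an error when the database is missing, which is convenient in cleanup scripts.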
The DESCRIBE command is used to check the metadata associated with a database, such as its comment, its location in HDFS, and its owner. Adding the EXTENDED keyword also shows the database properties. The command is useful when you need to confirm where a database is stored or how it is configured.
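A short sketch of DESCRIBE at the database level, again assuming a hypothetical database named retail_db:

```sql
-- Shows the name, comment, HDFS location, and owner of the database
DESCRIBE DATABASE retail_db;

-- EXTENDED additionally lists the DBPROPERTIES set on the database
DESCRIBE DATABASE EXTENDED retail_db;
```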
If you want to change the metadata associated with a database, ALTER is the right choice. You can use ALTER DATABASE <database name> SET DBPROPERTIES (...) to change database properties, and ALTER DATABASE <database name> SET OWNER [USER|ROLE] <name> to change the owner of the database to a different user or role.
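The following sketch shows both uses of ALTER DATABASE. The database name, property key, user name, and role name are all hypothetical, and SET OWNER assumes a Hive version that supports it (0.13 or later).

```sql
-- Add or update a database property (key and value are assumptions)
ALTER DATABASE retail_db SET DBPROPERTIES ('edited-by' = 'example_user');

-- Transfer ownership to a user
ALTER DATABASE retail_db SET OWNER USER example_user;

-- Or transfer ownership to a role
ALTER DATABASE retail_db SET OWNER ROLE example_role;
```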
To see which databases exist in the current Hive instance, use the SHOW DATABASES command. It lists all databases currently defined, and an optional LIKE clause filters the list by a name pattern.
The USE command sets the current database for the session, so that the queries that follow are executed against that database. The general syntax is USE <database name>.
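A minimal sketch of SHOW DATABASES, with and without a filter. The pattern 'retail*' is a hypothetical example.

```sql
-- List every database known to Hive
SHOW DATABASES;

-- List only databases whose names start with "retail"
SHOW DATABASES LIKE 'retail*';
```

Note that the pattern uses * as the wildcard rather than the % used by LIKE in ordinary SQL queries.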
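Switching the session to a particular database with USE can be sketched like this. The database name retail_db is hypothetical, and current_database() assumes Hive 0.13 or later.

```sql
-- Make retail_db the current database for this session
USE retail_db;

-- Confirm which database the session is using
SELECT current_database();

-- Switch back to the built-in default database
USE default;
```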
So far, we have discussed DDL commands for databases: how to create a database, how to delete a database, how to list the databases that exist, how to use a database, and how to alter a database. You should now be familiar with these basic commands, so it is time to start working with tables in Hive by using DDL commands. They are simple when used in the same way as discussed in this blog, so let us see how they work.
The CREATE TABLE command creates a table in an existing database to store data in the form of rows and columns. For example, if you want to create a table named "Employee", the important fields could be the name, address, phone number, email id, occupation, and so on. You can also add a location to the table to specify where its data should be stored within HDFS. In Hive, you also have the flexibility to copy the schema of an existing table without its data by using CREATE TABLE ... LIKE. In other words, only the structure is copied to the new table, and data can be added later as needed. This speeds up table creation and saves the programmer effort.
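The Employee example above might look like the sketch below. The column types, the comma delimiter, the HDFS path, and the name employee_copy are assumptions made for illustration; the LOCATION clause is optional and can be left out to use the default warehouse directory.

```sql
-- Create the table with the fields mentioned above (types are assumed)
CREATE TABLE IF NOT EXISTS employee (
  name         STRING,
  address      STRING,
  phone_number STRING,
  email_id     STRING,
  occupation   STRING
)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/retail_db.db/employee';  -- hypothetical path

-- Copy only the schema of employee into a new, empty table
CREATE TABLE IF NOT EXISTS employee_copy LIKE employee;
```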
The DROP TABLE command deletes a table. For a managed table, it removes both the metadata and the data; for an external table, only the metadata is removed and the files stay in HDFS. If Trash is configured, the dropped data is moved to Trash and can be recovered in an emergency. If you want to delete the data permanently, add the PURGE option to the DROP command so that the data skips Trash altogether.
The TRUNCATE command deletes all the rows stored in a table while keeping the table structure in place. When you use TRUNCATE, treat the deletion as permanent, because depending on the Hive version and Trash configuration the data may not be recoverable afterwards. The general syntax is TRUNCATE TABLE <table name>.
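The difference between a plain DROP and a DROP with PURGE can be sketched as follows. The table names employee_copy and employee are hypothetical.

```sql
-- Drop the table; its data goes to Trash if Trash is configured
DROP TABLE IF EXISTS employee_copy;

-- Drop the table and bypass Trash, so the data cannot be restored from it
DROP TABLE IF EXISTS employee PURGE;
```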
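A minimal sketch of TRUNCATE, assuming a hypothetical managed table named employee (Hive generally rejects TRUNCATE on external tables):

```sql
-- Remove every row from employee; the table definition remains
TRUNCATE TABLE employee;

-- The table still exists and can be described or reloaded afterwards
DESCRIBE employee;
```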
With the ALTER TABLE command, you can change the structure or metadata of a table, for example by renaming it, adding or renaming columns, or setting table properties. In addition, the DESCRIBE command shows the metadata associated with a table, and the SHOW TABLES command lists the tables available in a particular database.
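The statements below sketch a few common ALTER TABLE operations, followed by DESCRIBE and SHOW TABLES. The table names employee and staff, the database retail_db, the added column, and the property value are hypothetical.

```sql
-- Rename the table
ALTER TABLE employee RENAME TO staff;

-- Add a new column (hypothetical) to the end of the schema
ALTER TABLE staff ADD COLUMNS (department STRING COMMENT 'Assumed extra column');

-- Rename an existing column, keeping its type
ALTER TABLE staff CHANGE phone_number phone STRING;

-- Set a table property
ALTER TABLE staff SET TBLPROPERTIES ('comment' = 'Renamed from employee');

-- Inspect the table: columns only, then full metadata
DESCRIBE staff;
DESCRIBE FORMATTED staff;

-- List tables in the current database, then in a named database by pattern
SHOW TABLES;
SHOW TABLES IN retail_db 'st*';
```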
That's all for the day! We have covered the basic DDL commands that let you create and manage databases and tables in Hive. To learn more about Hive commands in HQL with examples, consider joining the JanBask Training Hadoop training and certification program.