What is Amazon Athena?

Data is the new currency of business success. The future will belong to the businesses and people who have the nerve to act as disruptors and lead data-driven change in their fields. Data will ultimately write the success stories of businesses, and that success will depend on how proactively a business model can foresee emerging realities and challenges and turn them to its advantage. A business cannot grow unless it can read between the lines of its data and come up with compelling insights for all its stakeholders. This is the scope of data analytics, which seems like a major technological disruption today but will soon become the norm.

Data analytics has gained significance in recent years because of the rapid growth in data, a trend that will continue to make ripples in the business world for years to come. Many predictions suggest that data volumes will grow hugely by 2025. Data analytics is thus vital to business growth, and as many as 90 percent of professionals already rely on insights to make more informed business decisions. Businesses that rely on analytics stand a clearer chance of success than those that do not. For all its obvious advantages, data analytics is an intricate process, and many attempts have been made to make it simpler and more comprehensible. Many tools are available, and Amazon Athena, part of Amazon Web Services (AWS), is one of them.

What is Amazon Athena?

Amazon launched Athena in 2016 as one of its serverless services. It is a data analytics tool mainly used for running complex queries in a short time. Because it is serverless, there is nothing to set up: no infrastructure to manage and no data warehouse to build. It makes it easy to analyze data in Amazon S3 using standard SQL. The Athena query engine unlocks the data held in S3 storage with no maintenance required. Querying can start as soon as you create a table that points to the data in S3.

Since there is no infrastructure, the customer pays only for the queries that are run. Athena scales automatically to give quick results even for huge datasets and complex queries. It helps analyze structured, semi-structured, and even unstructured data stored in Amazon S3. Many dynamic queries can be created over these datasets using Athena. Athena also works with AWS Glue, which gives you a better way to manage the metadata for data stored in S3. With AWS CloudFormation and Athena you can use named queries, which let you name a query and then call it by that name. This interactive service can be very useful for data scientists and developers who want a quick look into a table without the hassle of running the entire query. Athena can also be used to fetch data from S3 and load it into other data stores via the Athena JDBC driver, for log storage, analysis, and other data warehousing tasks.

List of Athena Features

Athena is one of the many services provided by Amazon, and several of its features make it well suited to data analysis. Athena is backed by Presto, an open-source distributed SQL query engine. Presto is built for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. Table definitions are written as DDL (Data Definition Language) statements in Apache Hive syntax, which facilitates reading, writing, and managing huge distributed datasets. Hive supports SQL-like syntax and also allows for concepts such as external tables and data partitioning. All metadata, such as table definitions and column names, is stored in the Athena metadata store. Athena also supports complex joins, nested queries, and window functions, as well as complex data types such as arrays and structs. Data can be partitioned by any key, including custom date and time keys.
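
As a rough illustration of the Hive-style DDL described above, the following Python sketch renders a CREATE EXTERNAL TABLE statement as text. The helper name build_create_table_ddl, the database, table, and column names, and the S3 location are all hypothetical; the sketch only builds a string and does not contact AWS.

```python
def build_create_table_ddl(database, table, columns, partition_keys, location,
                           stored_as="PARQUET"):
    """Render a Hive-style CREATE EXTERNAL TABLE statement as text.

    columns and partition_keys are lists of (name, hive_type) pairs.
    All names passed in are illustrative; nothing is sent to AWS.
    """
    column_sql = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (",
        column_sql,
        ")",
    ]
    if partition_keys:
        partition_sql = ", ".join(
            f"{name} {hive_type}" for name, hive_type in partition_keys
        )
        lines.append(f"PARTITIONED BY ({partition_sql})")
    lines.append(f"STORED AS {stored_as}")
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines)


# Hypothetical table over web logs, with an array column, a struct column,
# and a custom date partition key.
example_ddl = build_create_table_ddl(
    database="analytics_demo",
    table="web_logs",
    columns=[
        ("request_id", "string"),
        ("tags", "array<string>"),
        ("client", "struct<ip:string,agent:string>"),
    ],
    partition_keys=[("dt", "string")],
    location="s3://example-bucket/web-logs/",
)
print(example_ddl)
```

The resulting statement could then be pasted into the Athena query editor or submitted through an API. Because the table is external, dropping it would leave the underlying S3 objects in place.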

Athena can read data stored as S3 objects in various file formats:

  • CSV, raw logs, text files
  • Apache weblogs
  • JSON
  • Compressed files
  • Apache Parquet or Apache ORC, which are columnar formats

When a query runs, Athena streams the data from Amazon S3 and returns results much as a conventional SQL database would. Queries are submitted either through APIs or the AWS Console. The AWS Console shows the running time of each query and the amount of data scanned in bytes. With Athena, worries about scaling, performance, and maintenance are ruled out, as sufficient resources are available for fast, interactive query performance. Queries automatically run in parallel, even over petabytes of data, so most results are returned within seconds. All this is possible because Athena uses warm compute pools across multiple Availability Zones.
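
The API route mentioned above can be sketched in Python as follows. The method names (start_query_execution, get_query_execution) and response keys follow the boto3 Athena client as commonly documented, but treat them as assumptions to verify against the current SDK reference. The client is passed in as a parameter, and a hypothetical StubAthenaClient stands in for a real one so that the sketch runs without AWS access; the database name and S3 output location are made up.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql, database, output_location,
              poll_seconds=1.0, max_polls=60):
    """Start a query, poll until it finishes, and return state and statistics."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        execution = client.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            stats = execution.get("Statistics", {})
            return {
                "query_id": query_id,
                "state": state,
                # Same figures the console reports: data scanned and run time.
                "bytes_scanned": stats.get("DataScannedInBytes"),
                "engine_millis": stats.get("EngineExecutionTimeInMillis"),
            }
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")


class StubAthenaClient:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.polls = 0

    def start_query_execution(self, **kwargs):
        return {"QueryExecutionId": "stub-query-1"}

    def get_query_execution(self, QueryExecutionId):
        self.polls += 1
        # Report RUNNING twice, then SUCCEEDED, to exercise the polling loop.
        state = "RUNNING" if self.polls < 3 else "SUCCEEDED"
        return {"QueryExecution": {
            "Status": {"State": state},
            "Statistics": {"DataScannedInBytes": 2048,
                           "EngineExecutionTimeInMillis": 120},
        }}


result = run_query(StubAthenaClient(), "SELECT 1", "analytics_demo",
                   "s3://example-bucket/athena-results/", poll_seconds=0)
print(result)
```

With real credentials, the stub would be replaced by a client created with boto3.client("athena"); passing the client in keeps the polling logic easy to test in isolation.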

Athena can be accessed in any of three ways:

  • AWS Console
  • AWS CLI
  • Athena with JDBC

Why is Athena most suitable for Data Analytics?

Athena is one of the many services provided by Amazon. It has many features which make it highly suitable for Data Analytics. Here are the most prominent ones:

  • Ease of Implementation: Athena needs no installation; it can be accessed directly from the AWS Console or the AWS CLI.
  • No Server: Athena is serverless, so the user does not have to worry about infrastructure, configuration, scaling, or failures. Athena handles all of this itself.
  • Pay Per Query: Athena charges only for the queries that are run, based on the amount of data scanned per query. A lot can be saved by compressing the data and storing the dataset in a suitable format.
  • Speed: Athena is a very fast tool for data analytics. A complex query is broken into simpler parts that run in parallel, and the partial results are combined to produce the output.
  • Security: Athena gives you full control over access to the dataset through AWS Identity and Access Management (IAM) policies. As the data is stored in S3 buckets, IAM policies help users manage who can access it.
  • Available: Because it is backed by AWS, Athena is highly available, and users can execute queries around the clock.
  • Integrated: Athena integrates with AWS Glue, which helps users create a more unified metadata repository, with better versioning of data, views, and tables.

On top of all these features, Athena is cost-effective. It has a very simple pricing structure: you pay only for the queries that you run, at a rate of $5 per TB of data scanned. DDL statements such as CREATE, ALTER, and DROP, statements used for managing partitions, and failed queries are completely free, with no hidden charges. If you cancel a query midway, you are charged only for the data scanned up to that point. Costs can also be brought down by compressing data, using columnar formats, and partitioning. With these techniques, Athena has to scan far less data in Amazon S3. Computation has no separate charge, so the total cost estimate is based purely on the amount of data your queries scan.
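
The pricing arithmetic above can be sketched in a few lines of Python. The $5 per TB rate is the one quoted in this article; treating a TB as 1024^4 bytes is an assumption, the function name estimate_scan_cost and the dataset sizes are hypothetical, and the sketch ignores any rounding or per-query minimum that the actual pricing may apply.

```python
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0     # rate quoted in the article


def estimate_scan_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost in USD of a query that scans bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical comparison: a query that scans a full 3 TB raw dataset versus
# the same query over compressed, columnar data where only 10% is read.
raw_cost = estimate_scan_cost(3 * BYTES_PER_TB)
columnar_cost = estimate_scan_cost(3 * BYTES_PER_TB // 10)
print(f"raw: ${raw_cost:.2f}, columnar: ${columnar_cost:.2f}")
```

Since cost is proportional to bytes scanned, any technique that cuts the scanned volume by a given factor cuts the estimated cost by the same factor.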

Supported Business Tools

Athena integrates easily with many Business Intelligence tools, such as Looker, Tableau, Mode Analytics, and Amazon QuickSight, for advanced reports and visualizations. This is especially relevant for businesses that want the simplicity of using Athena for spot or ad hoc data analysis.

How to Tune the Performance of Athena?

To get the most out of Athena, consider how you structure your data. Here are a few tips that will help maximize Athena performance:

  • Data Partitioning: Partitioning divides a table into parts and keeps related data together based on column values such as date, country, or region. Partitions act as virtual columns that are defined when the table is created. Filtering on them reduces the amount of data scanned by a query, which boosts performance.
  • Bucketing Data: Bucketing is another way to divide data within a single partition. When you bucket, you specify one or more columns whose values determine which rows are grouped together, and those rows are distributed across a number of buckets. When a query specifies a value for the bucketed column, only the bucket holding that value needs to be read, which can dramatically reduce the number of rows read.
  • Compress the Files: Compressing files can significantly increase query speed, as long as the files are of optimal size or splittable. Smaller files reduce the network traffic from Amazon S3 to Athena. Splittable files also let Athena’s execution engine divide the reading of a file among many readers to increase parallelism. The higher the compression ratio of the algorithm, the more CPU is required to compress and decompress the data.
  • Optimization of File Sizes: Queries run more efficiently when data can be read in parallel and blocks of data are read sequentially. Splittable file formats help with parallelism regardless of file size. However, if files are too small, the execution engine spends more time on overhead: opening Amazon S3 files, listing directories, fetching object metadata, setting up data transfers, reading file headers, and reading compression dictionaries. If files are large and not splittable, query processing has to wait until a single reader has finished reading the whole file.
  • Optimization of Columnar Data Store Generation: Apache Parquet and Apache ORC are the most popular columnar data stores. They store data efficiently using column-wise compression, different encodings, and compression based on data type. They are also splittable. Better compression ratios generally mean fewer bytes read from Amazon S3, which leads to better query performance.
  • Optimization of ORDER BY: The ORDER BY clause returns the results of a query in sorted order. If you are using it to look at the top or bottom N values, add a LIMIT clause. This significantly reduces the cost of the sort by pushing the sorting and limiting down to individual workers.
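
To make the partitioning tip concrete, here is a small Python simulation of partition pruning. It is only a model: the partition sizes, bucket name, and the helper names partition_path and bytes_to_scan are hypothetical, and the key=value directory layout shown is the Hive-style convention rather than something Athena requires in every setup.

```python
def partition_path(base, values):
    """Build a Hive-style key=value prefix for one partition."""
    suffix = "/".join(f"{key}={value}" for key, value in values.items())
    return f"{base.rstrip('/')}/{suffix}/"


def bytes_to_scan(partitions, filters=None):
    """Sum the bytes of partitions whose values match every filter.

    With no filters, every partition is read (a full scan).
    """
    filters = filters or {}
    return sum(
        part["bytes"]
        for part in partitions
        if all(part["values"].get(col) == want for col, want in filters.items())
    )


# Hypothetical table partitioned by date and country; sizes are made up.
partitions = [
    {"values": {"dt": "2019-11-01", "country": "US"}, "bytes": 400},
    {"values": {"dt": "2019-11-01", "country": "IN"}, "bytes": 300},
    {"values": {"dt": "2019-11-02", "country": "US"}, "bytes": 500},
    {"values": {"dt": "2019-11-02", "country": "IN"}, "bytes": 200},
]

full_scan = bytes_to_scan(partitions)
one_day = bytes_to_scan(partitions, {"dt": "2019-11-02"})
one_slice = bytes_to_scan(partitions, {"dt": "2019-11-02", "country": "IN"})

print(partition_path("s3://example-bucket/web-logs", partitions[3]["values"]))
print(full_scan, one_day, one_slice)
```

Each additional filter on a partition column narrows the set of prefixes that have to be read, which is why a WHERE clause on the partition keys lowers both run time and the amount of data billed.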

Underlying Technology behind Athena

Amazon Athena uses Presto with standard SQL support and works with many standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena can handle complex analysis involving large joins, window functions, and arrays. Because Athena uses Amazon S3 as its underlying data store, it is highly available and durable, with data stored redundantly across multiple facilities and multiple devices in each facility.

Conclusion

Amazon Athena is one of the most promising data analytics tools from AWS. As an interactive service built on Amazon S3, it makes data analysis easier, and because it is serverless, there is no infrastructure to manage. It is truly a technology of tomorrow. For more information on such topics, you may refer to www.janbasktraining.com.


    Janbask Training

    JanBask Training is a leading global online training provider that teaches through live sessions. The live classes take a blended approach, combining hands-on experience with theoretical knowledge, and are led by certified professionals.

