January 08, 2020

Amazon Athena


Overview of Athena
Amazon Athena is an interactive query service that developers and data analysts use to analyze data stored in Amazon S3. Athena’s serverless architecture lowers operational costs: users don’t need to provision, scale, or manage any servers.

Amazon Athena users analyze data with standard SQL. Because Athena is serverless, there is no infrastructure to oversee, and you pay only for the queries you run. You don’t even need to load your data into Athena; just point to your data in Amazon S3, define the schema, and begin querying.

To get started, just log in to the Athena console, define your schema, and start querying. Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet, and Avro. While Amazon Athena is ideal for quick, ad hoc querying and integrates with Amazon QuickSight for easy visualization, it can also handle complex analysis, including large joins, window functions, and arrays.
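Queries can also be submitted programmatically. Below is a minimal sketch using the boto3 Athena client; the database name (sales_db), table (orders), and results bucket (s3://my-athena-results/) are hypothetical placeholders, not part of this post.

    import time
    import boto3

    athena = boto3.client("athena")

    # Submit a standard SQL query; Athena writes the results to the S3 location you choose.
    response = athena.start_query_execution(
        QueryString="SELECT region, SUM(amount) AS total FROM orders GROUP BY region",
        QueryExecutionContext={"Database": "sales_db"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query finishes; there are no servers to manage, and you pay per query.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    # Print the result rows (the first row holds the column headers).
    if state == "SUCCEEDED":
        for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])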



Some Athena Facts
  • Athena supports only EXTERNAL tables; when you drop a table in Athena, only the table metadata is removed and the data remains in Amazon S3 (see the sketch after this list)
  • Athena uses an approach known as schema-on-read: the schema you define is applied to your data at query time, rather than when the data is loaded
  • Athena does not modify your data in Amazon S3
  • Athena uses Apache Hive to define tables and create databases, which are essentially logical namespaces of tables
  • Athena can only query the latest version of data on a versioned Amazon S3 bucket, and cannot query previous versions of the data
  • Athena does not support querying the data in the GLACIER storage class
  • Athena performs full table scans instead of using indexes
  • Athena is not ACID-compliant; it does not support transactional statements such as UPDATE or DELETE
  • Athena is case-insensitive and converts table names and column names to lower case
  • Athena table, view, database, and column names cannot contain special characters, other than underscore (_)
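To make the first two points above concrete, here is a minimal sketch of the schema-on-read workflow, with hypothetical table columns and S3 paths. The CREATE EXTERNAL TABLE statement only registers metadata over CSV files that already sit in S3; nothing is loaded or copied, and DROP TABLE later removes that metadata while the files stay in place.

    import boto3

    athena = boto3.client("athena")

    # Hive-style DDL: registers a schema over existing CSV files; no data is loaded.
    # 's3://my-data-bucket/orders/' is a hypothetical location of the CSV files.
    ddl = """
    CREATE EXTERNAL TABLE IF NOT EXISTS orders (
        order_id STRING,
        region   STRING,
        amount   DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-data-bucket/orders/'
    """

    athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )

    # Dropping the table removes only the table metadata; the CSV files remain in S3.
    athena.start_query_execution(
        QueryString="DROP TABLE orders",
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )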

