#blog (Sean Ferigan): Use Apache Spark and Hive on Amazon EMR with the AWS Glue Data Catalog

Wednesday, 16 August 2017

Use Apache Spark and Hive on Amazon EMR with the AWS Glue Data Catalog

You can now use the AWS Glue Data Catalog with Apache Spark and Apache Hive on Amazon EMR. The AWS Glue Data Catalog is a managed metadata repository that is integrated with Amazon EMR, Amazon Athena, Amazon Redshift Spectrum, and AWS Glue ETL jobs. It also provides automatic schema discovery and schema version history. You can use the AWS Glue Data Catalog to store external table metadata for Hive and Spark instead of using an on-cluster or self-managed Hive Metastore. This makes it easier to keep the metadata for external tables stored on Amazon S3 outside of your cluster.
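As a rough illustration of what enabling this might look like, the sketch below builds the cluster configuration that points Hive and Spark at the Glue Data Catalog. The `hive-site` and `spark-hive-site` classifications and the client factory class are taken from the Amazon EMR documentation, not from this announcement; the helper name `glue_catalog_configurations` and its parameters are hypothetical and exist only for this example.

```python
import json

# Metastore client factory class that the Amazon EMR documentation lists for
# the AWS Glue Data Catalog (an assumption here: the announcement above does
# not name it).
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations(enable_hive=True, enable_spark=True):
    """Build an EMR configuration list that selects the Glue Data Catalog.

    Hypothetical helper for illustration. It returns plain dictionaries and
    makes no AWS calls.
    """
    properties = {"hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY}
    configurations = []
    if enable_hive:
        # Hive reads its metastore settings from the hive-site classification.
        configurations.append(
            {"Classification": "hive-site", "Properties": dict(properties)}
        )
    if enable_spark:
        # Spark SQL reads the same property from spark-hive-site.
        configurations.append(
            {"Classification": "spark-hive-site", "Properties": dict(properties)}
        )
    return configurations


if __name__ == "__main__":
    # Print the JSON so it can be reviewed or saved to a file.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

The resulting list is in the shape that EMR accepts as cluster configurations when a cluster is created, for example as a JSON file or as the `Configurations` argument of the boto3 `run_job_flow` call. Treat it as a starting point and check it against the current EMR release guide before relying on it.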

from What's New http://ift.tt/2i0z2KY
