Query Linux Foundation Delta Lake tables

Linux Foundation Delta Lake is a table format for big data analytics. You can use Amazon Athena to read Delta Lake tables stored in Amazon S3 directly without having to generate manifest files or run the MSCK REPAIR statement.

The Delta Lake format stores the minimum and maximum values of each column for every data file. Athena uses these statistics to skip files whose value ranges cannot satisfy a query's predicates, so those files are eliminated from consideration before they are read.
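
As a rough illustration of the kind of query that can benefit from file skipping, consider a range predicate on a single column. The table name sales_delta and the columns order_id, order_date, and amount are hypothetical, and which files are actually skipped depends on the statistics recorded for the table.

```sql
-- Hypothetical table and columns, for illustration only.
-- Data files whose recorded min/max range for order_date does not
-- overlap January 2024 are candidates for skipping.
SELECT order_id, amount
FROM sales_delta
WHERE order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```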

Considerations and limitations

Delta Lake support in Athena has the following considerations and limitations:

  • Tables with AWS Glue catalog only – Delta Lake is natively supported only for tables registered with AWS Glue. If you have a Delta Lake table that is registered with another metastore, you can still keep it and treat it as your primary metastore. Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables.

  • V3 engine only – Delta Lake queries are supported only on Athena engine version 3. You must ensure that the workgroup you create is configured to use Athena engine version 3.

  • Delta Lake reader version – Delta Lake reader protocol up to version 3 is supported.

  • Deletion vector support – Athena supports reading from Delta Lake tables with deletion vectors. Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. For more information, see What are deletion vectors? in the Delta Lake documentation.

  • Column mapping and timestampNtz – Delta column mapping, which allows Delta table columns and the underlying Parquet file columns to use different names, and timestamp without time zone (timestampNtz) are supported.

  • No time travel support – There is no support for queries that use Delta Lake’s time travel capabilities.

  • Read only – Write DML statements like UPDATE, INSERT, or DELETE are not supported.

  • Lake Formation support – Lake Formation integration is available for Delta Lake tables with their schema in sync with AWS Glue. For more information, see Using AWS Lake Formation with Amazon Athena and Set up permissions for a Delta Lake table in the AWS Lake Formation Developer Guide.

  • Limited DDL support – The following DDL statements are supported: CREATE EXTERNAL TABLE, SHOW COLUMNS, SHOW TBLPROPERTIES, SHOW PARTITIONS, SHOW CREATE TABLE, and DESCRIBE. For information on using the CREATE EXTERNAL TABLE statement, see the Get started section.

  • Skipping S3 Glacier objects not supported – If objects in the Linux Foundation Delta Lake table are in an Amazon S3 Glacier storage class, setting the read_restored_glacier_objects table property to false has no effect.

    For example, suppose you issue the following command:

    ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')

    For Iceberg and Delta Lake tables, the command produces the error Unsupported table property key: read_restored_glacier_objects. For Hudi tables, the ALTER TABLE command does not produce an error, but Amazon S3 Glacier objects are still not skipped. Running SELECT queries after the ALTER TABLE command continues to return all objects.
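
The supported DDL statements listed above can be sketched as follows. The database name, table name, and bucket path are placeholders. The table_type value DELTA and the absence of column definitions reflect the form Athena documents for registering Delta Lake tables, where the schema is taken from the Delta Lake transaction log; treat the details as assumptions to confirm against the Get started section.

```sql
-- Placeholder names and path, for illustration only.
-- Run each statement as a separate query.

-- Register an existing Delta Lake table; only the location is required.
CREATE EXTERNAL TABLE my_db.my_delta_table
LOCATION 's3://amzn-s3-demo-bucket/path/to/delta-table/'
TBLPROPERTIES ('table_type' = 'DELTA');

-- Inspect the registered table with the supported read-only DDL.
SHOW COLUMNS IN my_db.my_delta_table;
SHOW TBLPROPERTIES my_db.my_delta_table;
SHOW PARTITIONS my_db.my_delta_table;
SHOW CREATE TABLE my_db.my_delta_table;
DESCRIBE my_db.my_delta_table;
```

Statements outside this list, including write DML such as UPDATE, INSERT, or DELETE, are not supported on Delta Lake tables.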