Monday 11 May 2015

Quickly Filter Data in Amazon Redshift using Interleaved Sorting

You can now use Interleaved Sort Keys to quickly filter data without the need for indices or projections in Amazon Redshift. A table with interleaved keys arranges your data so each sort key column has equal importance. While Compound Sort Keys are more performant if you filter on the leading sort key columns, interleaved sort keys provide fast filtering no matter which sort key columns you specify in your WHERE clause. To create an interleaved sort, simply define your sort keys as INTERLEAVED in your CREATE TABLE statement.

The performance benefit of interleaved sorting increases with table size, and is most effective with highly selective queries that filter on multiple columns. For example, assume your table contains 1,000,000 blocks (1 TB per column) with an interleaved sort key of both customer ID and product ID. You will scan 1,000 blocks when you filter on a specific customer or a specific product, a 1000x increase in query speed compared to the unsorted case. If you filter on both customer and product, you will only need to scan a single block.

For more information, please see our AWS Blog Post on Interleaved Sorting and review our documentation on Best Practices for Designing Tables.



from What's New from Amazon Web Services http://ift.tt/1zWlgfq

No comments:

Post a Comment