AWS launches Lake Formation for building data lakes
August 14, 2019
Amazon Web Services (AWS) has announced the general availability of AWS Lake Formation, a fully managed service that makes it easier for users to build, secure and manage data lakes.
Lake Formation simplifies and automates many of the complex manual steps usually required to create a data lake, including collecting, cleaning and cataloguing data, and securely making those data available for analytics.
Users can bring their data into a data lake from various sources using pre-defined templates, automatically classify and prepare the data, and centrally define granular data access policies to govern access by the different groups within an organisation. They can then analyse these data using their choice of AWS analytics and machine learning services, including Redshift, Athena and Glue, with EMR, QuickSight and SageMaker following in the next few months.
There is no additional charge for using AWS Lake Formation; customers pay only for the underlying AWS services they use.
Users want to perform analytics and machine learning across all their data, regardless of the format or where the data lives. A data lake removes data silos and allows data to reside in a central place so users can more easily apply different types of analytics and machine learning across all their data.
Amazon Simple Storage Service (Amazon S3) has become a very popular place to build data lakes because of its scale, cost-effectiveness, durability and easy integration with AWS’s analytics and machine learning services. However, even with those benefits, building and managing a data lake can still be a complex and time-consuming process.
Users need to provision and configure storage, move data from disparate sources into the data lake, and extract the schema and add metadata tags so that the data can be found through a searchable data catalogue. They must also clean and prepare the data – including partitioning, indexing and transforming it – to optimise the performance and cost of running analytics on it.
They then have to set up data access roles and enforce security policies across their storage and each of their analytics engines, updating those policies whenever permissions change or new end users are added. Finally, they need to make the data securely available to their data analysts, who can then analyse and process it using any of the available analytics engines. These steps involve a lot of manual work and, as a result, setting up a data lake can take most users several months.
AWS Lake Formation simplifies the process and removes the heavy lifting from setting up a data lake. It automates manual, time-consuming steps, such as provisioning and configuring storage, crawling the data to extract schema and metadata tags, optimising the partitioning of the data, and transforming the data into formats such as Apache Parquet and ORC that are suitable for analytics.
Lake Formation cleans and de-duplicates data using machine learning to improve data consistency and quality. To simplify data access and security, it provides a single, centralised place to set up and manage data access policies, governance and auditing across Amazon S3 and multiple analytics engines.
To reduce the time analysts and data scientists spend hunting down the right data set for their needs, Lake Formation provides a central, searchable catalogue, which describes the available data sets and their appropriate business use. Users can now easily access data from a single place and integrate with their choice of AWS analytics and machine learning services. With Lake Formation, they can set up and begin using a data lake in days instead of months.
“Our customers tell us that Amazon S3 is the ideal place to house their data lakes, which is why AWS hosts more data lakes than anyone else, with tens of thousands and growing every day,” said Raju Gulabani, vice president at AWS. “They’ve also told us that they want it to be easier and faster to set up and manage their data lakes. That’s why we built AWS Lake Formation, so customers can spend more time learning from their data and innovating, rather than wrestling those data into functioning data lakes. AWS Lake Formation is available today and we’re excited to see how customers use it as one of the building blocks for growing and transforming their businesses and customer experiences.”
Lake Formation is available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo) and Europe (Ireland) regions, with additional regions coming soon.
Anand Desikan, director of cloud and data services at Panasonic Avionics, a supplier of in-flight entertainment and communication systems, said: “We wanted to create a data platform with the ability to manage the security settings for all the different applications in our environment. With AWS Lake Formation, we can now define policies once and enforce them in the same way, everywhere, for multiple services we use, including AWS Glue and Amazon Athena. The enhanced level of control gives us secure access to data and metadata for columns and tables, not just for bulk objects, which is an important part of our data security and governance standard.”
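As a rough illustration of the column-level control described above, the following Python sketch builds the kind of request that the Lake Formation GrantPermissions API accepts for a SELECT grant limited to specific columns. It is a minimal sketch, not taken from the announcement: the helper build_column_grant, the role ARN, and the database, table and column names are all hypothetical, and the code only assembles the request without contacting AWS.

```python
from typing import Any


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: list[str]) -> dict[str, Any]:
    """Build keyword arguments for a column-level SELECT grant.

    Hypothetical helper: it only assembles a dictionary in the shape
    used by the Lake Formation GrantPermissions request.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        # Who receives the permission (an IAM user or role ARN).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # What is being shared: named columns of one catalogue table.
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        # Read-only access to the listed columns.
        "Permissions": ["SELECT"],
    }


# All identifiers below are made up for illustration.
request = build_column_grant(
    "arn:aws:iam::123456789012:role/analyst",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)
print(request)

# With boto3 installed and AWS credentials configured, the request
# could then be sent as:
#   boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call means the policy can be reviewed or tested before anything is applied to a real data lake.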