Data automation is built on design patterns: a central repository of patterns that encapsulate architectural standards and best practices for data design, data management, data integration, and data usage.
Data automation brings advantages such as source data exploration, data modeling, data mart generation, ETL/ELT optimization, test automation, metadata management, managed deployment, scheduling, change impact analysis, and easier maintenance and modification of data pipelines.
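The pattern-repository idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the pattern names, templates, and metadata fields are all assumptions, not any specific product's API): a central repository maps a design-pattern name to an SQL template, and pipeline code is generated from metadata rather than written by hand.

```python
# Hypothetical central repository of design patterns: each pattern
# encapsulates a reusable loading strategy as a parameterized SQL template.
PATTERN_REPOSITORY = {
    # Truncate-and-load: full refresh, e.g. for a small dimension table.
    "full_load": (
        "TRUNCATE TABLE {target}; "
        "INSERT INTO {target} SELECT {columns} FROM {source};"
    ),
    # Incremental append keyed on a watermark column.
    "incremental": (
        "INSERT INTO {target} SELECT {columns} FROM {source} "
        "WHERE {watermark} > (SELECT MAX({watermark}) FROM {target});"
    ),
}

def generate_pipeline_sql(pattern: str, metadata: dict) -> str:
    """Render the SQL for one pipeline step from its pattern and metadata."""
    template = PATTERN_REPOSITORY[pattern]
    return template.format(**metadata)

# Generating a step from metadata instead of hand-writing it:
sql = generate_pipeline_sql(
    "incremental",
    {"target": "dw.orders", "source": "stg.orders",
     "columns": "order_id, amount", "watermark": "updated_at"},
)
print(sql)
```

Because every pipeline step is derived from the same repository, concerns like change impact analysis and maintenance reduce to editing the patterns and metadata, not hundreds of hand-written scripts.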
Modern data marts evolved into so-called data lakes: repositories that store data in its natural format and accommodate data in many schemas and structural forms, usually as object blobs or files. The idea of a data lake is to have a single store for all data in the enterprise, ranging from raw data (an exact copy of the source system data) to transformed data used for tasks such as reporting, visualization, analytics, and machine learning. A data lake holds structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and even binary data (images, audio, video), creating a centralized store that accommodates all forms of data.
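The landing of data in its natural format can be sketched as follows. This is a minimal illustration under assumed conventions (the `raw` zone layout, the extension-to-category mapping, and the function name are all hypothetical): files arriving from a source system are stored byte-for-byte in a raw zone, routed only by source and broad data category, with no transformation on ingestion.

```python
from pathlib import Path
import tempfile

# Assumed mapping from file extension to the broad categories a lake
# accommodates; anything unrecognized is treated as structured extracts.
FORMAT_CATEGORY = {
    ".csv": "semi-structured", ".log": "semi-structured",
    ".xml": "semi-structured", ".json": "semi-structured",
    ".eml": "unstructured", ".pdf": "unstructured",
    ".jpg": "binary", ".wav": "binary", ".mp4": "binary",
}

def land_in_raw_zone(lake_root: Path, source: str,
                     filename: str, payload: bytes) -> Path:
    """Store the payload unchanged under raw/<source>/<category>/."""
    category = FORMAT_CATEGORY.get(Path(filename).suffix.lower(), "structured")
    target = lake_root / "raw" / source / category / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # exact copy of the source system data
    return target

# Ingest a JSON export from a (hypothetical) CRM source system:
lake = Path(tempfile.mkdtemp())
stored = land_in_raw_zone(lake, "crm", "contacts.json", b'{"id": 1}')
print(stored.relative_to(lake).as_posix())  # raw/crm/semi-structured/contacts.json
```

Keeping the raw copy untouched is what lets later consumers (reporting, analytics, machine learning) each apply their own transformations without losing the original data.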