Few years ago the vast majority of data warehouse systems were manually operated, but the market for ETL software has steadily grown and the majority of practitioners now use ETL tools in place of hand-coded systems.
In this article I am trying to explain why to use ETL. Below are key benefits which can answer this question.
1. Visual flow
The single greatest advantage of an ETL tool is that it provides a visual flow of the system’s logic (if the tool is flow based). Each ETL tool presents these flows differently, but even the least-appealing of these ETL tools compare favourably to custom systems consisting of plain SQL, stored procedures and system scripts, and perhaps a handful of other technologies.
2. Structured Design
ETL tools are designed for the specific problem of data integration: populating a data warehouse or integrating data from multiple sources, or even just moving the data. With maintainability and extensibility in design, they provide in many cases a metadata-driven structure to the developers. This is particularly a big advantage for teams building their first data warehouse.
3. Operational resilience
Many of the home-grown data warehouses we have evaluated are rather fragile: they have many emergent operational problems. ETL tools provide functionality and standards for operating and monitoring the system in production. It is certainly possible to design and build a well instrumented hand-coded ETL application. Nonetheless, it’s easier for a data warehouse / business intelligence team to build on the features of an ETL tool to build a resilient ETL system.
4. Data-lineage and impact analysis
We would like to be able to right-click on a number in a report and see exactly how it was calculated, where the data was stored in the data warehouse, how it was transformed, when the data was most recently refreshed, and from what source system(s) the numbers were extracted. Impact analysis is the flip side of lineage: we’d like to look at a table or column in the source system and know which ETL procedures, tables, cubes, and user reports might be affected if a structural change is needed. In the absence of ETL standards that hand-coded systems could conform to dependency on other vendors.
5. Advanced data profiling and cleansing
Most data warehouses are structurally complex, with many data sources and targets. At the same time, requirements for transformation are often fairly simple, consisting primarily of lookups and substitutions. If you have a complex transformation requirement, for example if you need to de-duplicate your customer list, you should buy on additional module on top of the ETL solution (data profiling / data cleansing). At the very least, ETL tools provide a richer set of cleansing functions than are available in SQL.
6. Performance
You might be surprised that performance is listed as one of the last under the advantages of the ETL tools. It’s possible to build a high-performance data warehouse whether you use an ETL tool or not. It’s also possible to build an absolute dog of an data warehouse whether you use an ETL tool or not. We’ve never been able to test whether an excellent hand-coded data warehouse outperforms an excellent tool-based data warehouse; we believe the answer is that it’s situational. But the structure imposed by an ETL platform makes it easier for an (novice) ETL developer to build a high-quality system. Furthermore many ETL tools provide performance enhancing technologies, such as Massively Parallel Processing, Symmetric Multi-Processing and Cluster Awareness.
7. Big Data
A lot of ETL tools are now capable of combining structured data with unstructured data in one mapping. In addition they can handle very large amounts of data, that do not necessarily have to be stored in data warehouses. Now Hadoop-connectors or similar interfaces to big data sources are provided by almost 40% of the ETL tools nowadays. And the support for Big Data is growing continually.
GrassDew has four main business streams – Consulting Services, Software Solutions, Security Services and Knowledge Services. Our primary focus is on various software development and maintenance services.
To know more about our services, email us at shekhar.pawar@grassdew.com