Data Aggregation

Written by ModuleQ | Oct 7, 2024 2:56:02 PM

From Raw to Ready: The Art of Data Aggregation

What is Data Aggregation?

Data aggregation is the process of combining data from disparate sources that may be stored in different locations. Today, this is often done in databases built specifically to handle the challenges of aggregation and to support data science work.

Think of a chef who heads to the farmer's market to gather ingredients before she composes her dish. She may need to visit different stores and pick up different classes of food (fruits, vegetables, poultry, and dairy), then store everything in her walk-in cooler before she begins preparing the meal.

Similarly, data can take numerous forms. It can be structured or unstructured, arrive at different periodicities (by the second versus by the day), and come in different types (numerical, text, images). Before data can be analyzed, it needs to be aggregated, and before it can be aggregated, these differences in format need to be normalized.
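To make the normalize-then-aggregate idea concrete, here is a minimal sketch in Python. The two sources, their field layouts, and the names normalize_to_daily and aggregate are all hypothetical, chosen only for illustration. One source reports by the second with values stored as text, the other reports by the day, and both are brought to a common daily form before being combined.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

# Hypothetical per-second sensor readings (ISO timestamps, values stored as text).
per_second = [
    ("2024-10-07T09:00:00", "10.0"),
    ("2024-10-07T09:00:01", "12.0"),
    ("2024-10-08T09:00:00", "20.0"),
]

# Hypothetical daily records from a second source (already one row per day).
per_day = [
    ("2024-10-07", 3),
    ("2024-10-08", 5),
]


def normalize_to_daily(readings):
    """Convert text values to floats and average them per calendar day."""
    buckets = defaultdict(list)
    for timestamp, raw_value in readings:
        day = datetime.fromisoformat(timestamp).date()
        buckets[day].append(float(raw_value))
    return {day: mean(values) for day, values in buckets.items()}


def aggregate(readings, daily_records):
    """Combine both sources into a single record per day."""
    daily_avg = normalize_to_daily(readings)
    combined = {}
    for day_text, count in daily_records:
        day = date.fromisoformat(day_text)
        combined[day] = {"avg_reading": daily_avg.get(day), "order_count": count}
    return combined


result = aggregate(per_second, per_day)
```

Averaging is only one way to bring per-second data down to a daily periodicity. Depending on the analysis, a sum, a maximum, or the last value of the day might be the more appropriate choice.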

For businesses, an intentional approach to data aggregation helps build a solid foundation, streamlining the analytics and business functions that depend on clean, well-maintained data.

Data Aggregation as defined by: 

PagerDuty: the process of compiling typically [large] amounts of information from a given database and organizing it into a more consumable and comprehensive medium.

IBM: Data aggregation is the process where raw data is gathered and expressed in a summary form for statistical analysis.
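As a rough illustration of what "expressed in a summary form" can mean in practice, the sketch below reduces raw rows to one summary record per group. The sample data, the field names, and the summarize function are hypothetical and are not taken from either definition above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw rows: (region, sale amount).
raw_sales = [
    ("east", 100.0),
    ("east", 50.0),
    ("west", 30.0),
]


def summarize(rows):
    """Reduce raw rows to a count, total, and mean per region."""
    groups = defaultdict(list)
    for region, amount in rows:
        groups[region].append(amount)
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
        }
        for region, amounts in groups.items()
    }


summary = summarize(raw_sales)
```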

Additional Reading: 

A Survey of Distributed Data Aggregation Algorithms - Jesus et al. (2014)