As the amount of data grows exponentially, data architecture and the best practices for managing it are becoming increasingly important. Businesses demand faster insights for essential decision-making, so they need speed, flexibility, and innovation from their data.
According to IBM, data architecture describes how data is managed from collection through to transformation, distribution, and consumption. Database systems can be broadly categorised as relational database management systems, which typically hold structured data, and non-relational NoSQL databases, which are often used for semi-structured and unstructured data.
The goal of a data architecture is to deliver relevant data to the people who need it, when they need it, and to help them make sense of it. Data architecture should start from a data vision that translates business requirements into technical requirements. It should always be designed with the end goal in mind, which is to deliver business value.
What are the best practices of data architecture?
1) Avoid data silos and ensure data consistency
Data silos make data governance impossible to manage on an organisation-wide scale. When teams across the organisation duplicate effort by transforming the same data independently, resources are wasted and the business may get an incomplete view of itself. Data quality also suffers. Data from disparate sources should be homogenised and consolidated in the data warehouse as much as possible.
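To make "homogenised and consolidated" more concrete, here is a minimal sketch, assuming two hypothetical departmental extracts (a CRM and a billing system) that describe the same customers with different field names, casing, and date formats. Each source gets a small mapping function onto one shared schema, and the results are merged into a single record per customer. The field names, the sample rows, and the rule that the first source wins on conflicts are all illustrative assumptions, not a prescribed warehouse design.

```python
from datetime import date, datetime

# Hypothetical extracts from two departmental systems describing the same
# customers with different field names, casing and date formats.
crm_rows = [
    {"CustomerID": "C001", "Email": "Ana@Example.com ", "SignupDate": "2023-01-15"},
    {"CustomerID": "C002", "Email": "bo@example.com", "SignupDate": "2023-02-01"},
]
billing_rows = [
    {"cust_id": "c001", "email_address": "ana@example.com", "created": "15/01/2023"},
    {"cust_id": "c003", "email_address": "cy@example.com", "created": "03/03/2023"},
]


def from_crm(row: dict) -> dict:
    """Map a CRM row onto the shared (hypothetical) warehouse schema."""
    return {
        "customer_id": row["CustomerID"].strip().upper(),
        "email": row["Email"].strip().lower(),
        "signup_date": date.fromisoformat(row["SignupDate"]),
    }


def from_billing(row: dict) -> dict:
    """Map a billing row onto the same schema (dates arrive as DD/MM/YYYY)."""
    return {
        "customer_id": row["cust_id"].strip().upper(),
        "email": row["email_address"].strip().lower(),
        "signup_date": datetime.strptime(row["created"], "%d/%m/%Y").date(),
    }


def consolidate(*batches: list[dict]) -> list[dict]:
    """Merge homogenised batches, keeping one record per customer_id.

    When the same customer appears in several batches, the first batch wins.
    """
    merged: dict[str, dict] = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(record["customer_id"], record)
    return sorted(merged.values(), key=lambda r: r["customer_id"])


customers = consolidate(
    [from_crm(r) for r in crm_rows],
    [from_billing(r) for r in billing_rows],
)
for c in customers:
    print(c["customer_id"], c["email"], c["signup_date"])
```

Because customer C001 is normalised to the same key and email in both sources, it is stored once instead of twice, which is exactly the duplicate that a siloed setup would keep.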
For example, when the definition of a business metric changes, the logic must be updated in every transformation process that computes it. If the logic is not updated in even one of the silos, metrics reporting becomes misaligned. Each silo also incurs its own daily processing cost for the same metrics, and storing duplicate data increases storage costs.
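The misalignment described above can be illustrated with a short sketch. It assumes a hypothetical "revenue" metric whose definition changed to exclude refunds: the siloed version keeps two copies of the logic, one of which was never updated, while the centralised version has a single shared definition that every report calls. The data, team names, and function names are made up for illustration.

```python
# Hypothetical order data shared by two departments.
orders = [
    {"customer": "A", "amount": 120.0, "status": "paid"},
    {"customer": "B", "amount": 80.0, "status": "refunded"},
    {"customer": "C", "amount": 50.0, "status": "paid"},
]


# Siloed approach: each team keeps its own copy of the revenue logic.
# Finance updated its copy to exclude refunds; Marketing did not.
def finance_revenue(rows):
    return sum(r["amount"] for r in rows if r["status"] != "refunded")


def marketing_revenue(rows):
    return sum(r["amount"] for r in rows)  # stale definition, still counts refunds


# Centralised approach: one shared definition that every report calls,
# so a change to the metric is made once and applies everywhere.
def net_revenue(rows):
    """Single source of truth for the hypothetical net revenue metric."""
    return sum(r["amount"] for r in rows if r["status"] != "refunded")


print(finance_revenue(orders), marketing_revenue(orders))  # 170.0 250.0
print(net_revenue(orders))  # 170.0
```

The two siloed reports disagree by the value of the refunded order, while the shared definition gives every consumer the same number.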
In the bigger picture, data silos can impede regulatory compliance and open the door to the misuse of private or sensitive data. They also discourage collaboration.
Despite these disadvantages, data silos are still common. This is often due to an organisational culture in which departments are used to working on their own, with their own processes and challenges. Yet bringing data together generates value far greater than the sum of its parts.
2) Use anomaly detection to see issues right away
Proactively ensuring data quality is a critical step. Common data anomalies include stale data (poor freshness), missing data, and duplicate data. It is important to spot these anomalies as soon as possible.
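The three common anomalies above (freshness, missing data, duplicates) can be checked with simple rules after each load. The sketch below assumes a batch of rows held as Python dicts with a timezone-aware load timestamp; the function name `check_table`, its parameters, and the 24-hour freshness limit are hypothetical choices, and in practice such checks usually run inside a scheduler or a data-quality tool rather than as a standalone script.

```python
from datetime import datetime, timedelta, timezone


def check_table(rows, key, required, loaded_at_field, max_age, now):
    """Return a list of human-readable issues found in a batch of rows.

    rows            -- list of dicts, one per record
    key             -- column that should be unique
    required        -- columns that must not be None or empty
    loaded_at_field -- column holding a timezone-aware load timestamp
    max_age         -- timedelta; data older than this counts as stale
    now             -- current time, passed in so the check is testable
    """
    if not rows:
        return ["no rows loaded"]
    issues = []

    # Freshness: the newest record should be recent enough.
    newest = max(r[loaded_at_field] for r in rows)
    if now - newest > max_age:
        issues.append(f"stale data: newest load at {newest.isoformat()}")

    # Missing values in required columns.
    for col in required:
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        if missing:
            issues.append(f"{missing} row(s) missing '{col}'")

    # Duplicate keys.
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        seen.add(r[key])
    if dupes:
        issues.append(f"duplicate {key}: {sorted(dupes)}")
    return issues


# Hypothetical batch: loaded 30 hours ago, one empty email, one repeated order_id.
now = datetime(2023, 6, 23, 9, 0, tzinfo=timezone.utc)
loaded = now - timedelta(hours=30)
batch = [
    {"order_id": 1, "email": "a@example.com", "loaded_at": loaded},
    {"order_id": 2, "email": "", "loaded_at": loaded},
    {"order_id": 2, "email": "b@example.com", "loaded_at": loaded},
]
issues = check_table(batch, "order_id", ["email"], "loaded_at", timedelta(hours=24), now)
for issue in issues:
    print(issue)
```

Returning a list of issues, rather than raising on the first one, lets a single run report every problem so it can be routed to an alert at once.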
For example, when data is missing, business users may make incorrect or suboptimal decisions. Anomalies can also have direct costs, such as sending the same CRM email to one person too many times, or the time lost cleaning data manually.
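Rule-based checks only catch problems someone anticipated. One simple statistical complement, swapped in here as an illustration rather than something the article prescribes, is to compare today's row count with recent history using a z-score: a partial load that silently drops rows shows up as an unusually low count. The function name, the threshold of 3 standard deviations, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def volume_anomaly(history, today, threshold=3.0):
    """Flag today's row count if it lies more than `threshold` standard
    deviations from the mean of recent daily counts (a z-score test).

    Returns (is_anomaly, z_score).
    """
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        changed = today != mu
        return changed, (float("inf") if changed else 0.0)
    z = (today - mu) / sigma
    return abs(z) > threshold, z


# Hypothetical daily row counts for an orders table over the last week.
last_7_days = [10_120, 9_980, 10_250, 10_040, 9_900, 10_180, 10_060]
print(volume_anomaly(last_7_days, 10_100))  # a normal day
print(volume_anomaly(last_7_days, 6_200))   # partial load: likely missing data
```

A fixed threshold on a short window is deliberately crude; tables with weekly seasonality would need a longer, day-of-week-aware history to avoid false alarms.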
Frequent issues can also erode users' trust in the data, making them reluctant to use it. This hampers data adoption efforts, especially in an organisation whose data practice is still maturing. Business users are also often under time pressure, for example when they need figures for a meeting or to make a decision, and the non-availability of accurate data at such moments is frustrating.
In a decentralised data organisation, i.e. one where the analytics function sits within each department rather than within the central data team, detecting anomalies right away becomes even more important. This is because an anomaly has an 'amplified' effect: it affects the time and work of many analysts and data scientists simultaneously. Each of them may spend time troubleshooting the root cause of an odd analysis or model run, which is non-value-added work.
3) Establish metadata or a data dictionary for structured data
Metadata, often captured in a data dictionary, provides basic information for cataloguing and identifying data. In other words, it is "data about the data". Metadata is useful because it ensures that the meaning, relevance and quality of data are the same for all users. It helps with data governance because it forces the organisation to think about data structure and design. By giving an overview of how data is stored, it can also help resolve duplicate data issues, for example by making duplicates easier to identify.
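As a small illustration of how a data dictionary eases the identification of duplicates, the sketch below assumes a minimal dictionary in which each entry records a table, a column, a data type, and an agreed business definition. Grouping columns by definition surfaces columns that likely hold the same data under different names. The `ColumnEntry` class, the table names, and the definitions are hypothetical, and real catalogues would match definitions more loosely than exact text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEntry:
    """One row of a minimal, hypothetical data dictionary."""
    table: str
    column: str
    data_type: str
    definition: str  # agreed business meaning of the column


dictionary = [
    ColumnEntry("sales.orders", "cust_id", "VARCHAR(10)", "Unique customer identifier"),
    ColumnEntry("crm.contacts", "customer_id", "VARCHAR(10)", "Unique customer identifier"),
    ColumnEntry("sales.orders", "order_total", "DECIMAL(12,2)", "Order value incl. tax"),
    ColumnEntry("finance.invoices", "gross_amount", "DECIMAL(12,2)", "Order value incl. tax"),
    ColumnEntry("sales.orders", "order_date", "DATE", "Date the order was placed"),
]


def possible_duplicates(entries):
    """Group columns (in any table) that share the same business definition.

    Such columns are candidates for consolidation, or at least for a
    documented, shared source of truth.
    """
    by_definition = defaultdict(list)
    for e in entries:
        by_definition[e.definition.strip().lower()].append(f"{e.table}.{e.column}")
    return {d: cols for d, cols in by_definition.items() if len(cols) > 1}


for definition, columns in possible_duplicates(dictionary).items():
    print(f"{definition!r}: {columns}")
```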
"A metadata architecture is the beating heart of any effective BI implementation" - Astera
It also makes data easier to discover and use. It empowers data users to manage and control data without going into the code itself (Astera, 2021). Users also gain clarity on data lineage and relationships, that is, how data is transformed and where it comes from.
Broadly, Tutorialspoint describes three categories of metadata:
Business metadata - Contains data ownership information, business definitions, and changing policies
Technical metadata - Includes database system names, table and column names and sizes, data types, and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices
Operational metadata - Includes currency of data and data lineage. Currency of data means whether the data is active, archived, or purged. Lineage of data means the history of how the data was migrated and the transformations applied to it
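One way to see how the three categories fit together is to model a single data asset's metadata record, with one section per category. The sketch below is an assumption about structure, not a standard schema: the class names, the example owner "Sales Operations", the retention policy, and the lineage steps are all hypothetical. The only rule it enforces is the one stated above, that currency is active, archived, or purged.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    owner: str
    definition: str
    policies: list[str] = field(default_factory=list)


@dataclass
class TechnicalMetadata:
    system: str
    table: str
    columns: dict[str, str]  # column name -> data type
    primary_key: list[str]
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced column


@dataclass
class OperationalMetadata:
    currency: str  # "active", "archived" or "purged"
    lineage: list[str] = field(default_factory=list)  # ordered history of moves and transformations

    def __post_init__(self):
        allowed = {"active", "archived", "purged"}
        if self.currency not in allowed:
            raise ValueError(f"currency must be one of {sorted(allowed)}")


@dataclass
class DataAsset:
    name: str
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


# Hypothetical metadata record for an orders table.
orders = DataAsset(
    name="orders",
    business=BusinessMetadata(
        owner="Sales Operations",
        definition="One row per customer order",
        policies=["Retain for 7 years"],
    ),
    technical=TechnicalMetadata(
        system="warehouse",
        table="sales.orders",
        columns={"order_id": "BIGINT", "cust_id": "VARCHAR(10)", "order_total": "DECIMAL(12,2)"},
        primary_key=["order_id"],
        foreign_keys={"cust_id": "crm.contacts.customer_id"},
    ),
    operational=OperationalMetadata(
        currency="active",
        lineage=["extracted from ERP", "deduplicated on order_id", "loaded to sales.orders"],
    ),
)
print(orders.business.owner, orders.technical.primary_key, orders.operational.currency)
```

Keeping the categories as separate sections makes it easy to give them different owners: business metadata is usually maintained by data owners, while technical and operational metadata can often be harvested automatically.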
By looking at the metadata, you not only understand what the data asset is but also where it came from, how it has been changed, and where it is used (Atlan, 2023).
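Answering "where it came from" and "where it is used" amounts to walking a lineage graph in two directions. The sketch below assumes lineage is stored as hypothetical (source, target) edges between tables and uses a breadth-first traversal to find everything upstream of a report and everything downstream of a raw extract. All table names are made up; metadata catalogues typically build this graph automatically from pipeline definitions or query logs.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means target is derived from source.
edges = [
    ("erp.orders_raw", "staging.orders"),
    ("crm.contacts_raw", "staging.customers"),
    ("staging.orders", "warehouse.fact_orders"),
    ("staging.customers", "warehouse.dim_customer"),
    ("warehouse.fact_orders", "report.monthly_revenue"),
    ("warehouse.dim_customer", "report.monthly_revenue"),
]

downstream = defaultdict(set)  # node -> nodes derived from it
upstream = defaultdict(set)    # node -> nodes it is derived from
for src, dst in edges:
    downstream[src].add(dst)
    upstream[dst].add(src)


def walk(graph, start):
    """Breadth-first traversal returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Where did the monthly revenue report come from?
print(sorted(walk(upstream, "report.monthly_revenue")))
# Where is the raw ERP extract used?
print(sorted(walk(downstream, "erp.orders_raw")))
```

The downstream walk is also a quick impact analysis: before changing `erp.orders_raw`, it lists every table and report that could be affected.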
Good data architecture is important, and data should be treated like a product. Doing so helps build trust, accountability, and transparency around company data, which is why data architecture must be looked at holistically. It sets a good foundation for the future and enables the organisation to become insight-driven. Building data architecture should be an iterative process, automated where possible with AI tools.
How do you manage your company's data architecture for BI and analytics? Share by leaving us a comment. If you need more information or help with analytics, contact us. We want to be an extension of our clients. Subscribe to our newsletter for regular updates.
References
Astera, Introduction to the Metadata-Driven Data Architecture, https://www.astera.com/type/blog/introduction-to-metadata-architecture, published 16 February 2021
Atlan, Metadata Management and Data Lineage: How Their Synergy Enhances Data Understanding and Data Governance, https://atlan.com/metadata-management-and-data-lineage/, published 30 May 2023
IBM, What is a data architecture, https://www.ibm.com/topics/data-architecture, accessed on 23 June 2023
Towards Data Science, Fundamentals of Data Architecture to Help Data Scientists Understand Architectural Diagrams Better, https://towardsdatascience.com/fundamentals-of-data-architecture-to-help-data-scientists-understand-architectural-diagrams-better-7bd26de41c66, published 11 September 2020
Tutorialspoint, https://www.tutorialspoint.com/dwh/dwh_metadata_concepts.htm, accessed on 20 June 2023