Data Explosion: Is Your Business Ready For Growth?

We find ourselves at the onset of a data explosion. As core business applications continue to iterate and multiply, the data that flows through them will grow exponentially. However, in an economy where the ability to manage the quality, quantity, and accessibility of data creates a competitive advantage, many companies are finding themselves coming up short.

In the past, businesses managed data growth by adding hardware, increasing staff, or cobbling together various quick fixes that kept them one step ahead. Today, tighter budgets, reduced staff and increasing demands for performance, availability and customer service require aggressive and sophisticated methods for managing data growth.

By using data modeling to analyze existing systems, organizations are better equipped to stay on top of their data requirements. Applying a model-driven approach to data management strategies helps businesses detect performance degradation, create strategies for separating operational and archival data, and leverage collaborative workflow processes. These three areas are key to a company’s ability to manage its in-house data explosion and maintain its competitive edge.

Modeling tools have traditionally been used to build new data structures. The ability to analyze existing systems has been an under-utilized feature of data-modeling tools. This is no longer the case as the data explosion is now highlighting the need to assess existing systems.

With the spike in data quantity, operational data – the lifeblood of an organization – is growing exponentially. Those incremental bits and bytes of data are just that – bits and bytes – but they add up to terabyte after terabyte. As a consequence, businesses, particularly those interested in capturing and archiving critical user patterns, are facing three key data management challenges: performance degradation, separation of operational and archival data, and the need for collaboration.

Performance degradation is a natural consequence of the explosive growth in data. To minimize this risk, DBAs, database developers and/or performance managers need a means for determining duplication patterns, periodic storage and capacity growth, potential bottlenecks, and so forth. Modeling tools help by providing a visual means for quickly pinpointing areas that cause performance degradation.

For example, many systems have hundreds of tables. If the systems are disparate, data professionals invariably reinvent the same wheel over and over, and end up with multiple tables containing the same data. Database 1 has an object called A, and database 2 has an object called B, but the objects are the same. With a model-driven approach, the user can quickly identify patterns of duplication, that is, pinpoint the same objects across systems, and then take steps to consolidate tables, thereby improving performance and reducing the overall amount of storage consumed.

Another benefit of eliminating duplication is the assurance that the data remains synchronized. When the same data is maintained in multiple tables, the copy in table 1 may be refreshed on a different schedule, and may be of different quality, than the copy in table 2. By looking at patterns, data professionals can determine how different systems can effectively point to the same unique set of data.
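As a rough illustration of the duplication check described above, the following Python sketch groups tables whose columns match once names and types are normalized. The schema dictionary, the table names and the function find_duplicate_tables are all hypothetical, and a real modeling tool would compare far more than column signatures.

```python
from collections import defaultdict


def find_duplicate_tables(schemas):
    """Group tables whose (column name, column type) sets are identical.

    `schemas` maps database name -> {table name -> {column name -> type}}.
    This structure is an assumption made for the example.
    """
    by_signature = defaultdict(list)
    for db_name, tables in schemas.items():
        for table_name, columns in tables.items():
            # Normalize case so that "ID INTEGER" and "id integer" compare equal.
            signature = frozenset(
                (col.lower(), col_type.lower()) for col, col_type in columns.items()
            )
            by_signature[signature].append((db_name, table_name))
    # Keep only signatures shared by more than one table.
    return [sorted(group) for group in by_signature.values() if len(group) > 1]


# Hypothetical schemas: object A in database 1 and object B in database 2
# hold the same data under different names.
schemas = {
    "database_1": {
        "A": {"id": "integer", "name": "text", "email": "text"},
        "orders": {"id": "integer", "total": "real"},
    },
    "database_2": {
        "B": {"ID": "INTEGER", "Name": "TEXT", "Email": "TEXT"},
    },
}

duplicates = find_duplicate_tables(schemas)
print(duplicates)
```

Matching on exact column signatures is deliberately strict; candidates found this way would still need a human review before any tables are consolidated.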

Reviewing system storage and capacity is another aspect of determining performance degradation. Taking an Oracle system as an example, a user can review the data model and quickly determine the current storage and capacity. More importantly, they can see whether the tablespace files are functioning according to specifications, whether the files can scale to meet expanding needs, whether the min/max extents are set properly, and so forth.
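To make the capacity question concrete, here is a minimal Python sketch that estimates how many reporting periods remain before a storage limit is reached. It assumes simple linear growth between the first and last sample; the function name, the sample sizes and the 250 GB limit are invented for illustration and do not come from any particular database.

```python
import math


def periods_until_full(sizes_gb, capacity_gb):
    """Estimate the number of periods left before capacity is reached.

    Assumes linear growth, averaged between the first and last sample.
    Returns None when the samples show no growth.
    """
    if len(sizes_gb) < 2:
        raise ValueError("need at least two size samples")
    growth_per_period = (sizes_gb[-1] - sizes_gb[0]) / (len(sizes_gb) - 1)
    if growth_per_period <= 0:
        return None
    remaining_gb = capacity_gb - sizes_gb[-1]
    return max(0, math.ceil(remaining_gb / growth_per_period))


# Hypothetical monthly size samples for one storage area, in gigabytes.
monthly_sizes_gb = [100, 120, 140, 160]
periods_left = periods_until_full(monthly_sizes_gb, capacity_gb=250)
print(periods_left)
```

Exponential growth would exhaust the space sooner than this linear estimate suggests, so the result is best read as an upper bound on the time available.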

Finally, data professionals can also evaluate traditional problem areas like performance optimization. For example, are current systems indexed correctly? Are indexes optimized? Are they partitioned to manage the necessary space? The explosion of data has heightened the need for proper indexes. One reason for this is that, without a suitable index, a query can fall back to a full table scan, and the cost of that scan grows with the size of the table. A proper index lets the database locate the required rows directly and keeps query time low.

Modeling tools help data professionals determine if standard indexes are applied across tables. They can also help pinpoint hotspots. The user looks at the data model, locates the hotspot, and then checks to see if it is properly indexed. If it is not, they can quickly take action to resolve the problem and improve performance.
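The index check described above can be sketched in a few lines of Python. The example uses an in-memory SQLite database purely because it ships with Python; the tables, the index and the choice of customer_id as the "hot" column are assumptions made for the example, and the helper names are hypothetical.

```python
import sqlite3


def indexed_columns(conn, table):
    """Return the columns that lead at least one index on `table`."""
    leading = set()
    for row in conn.execute(f"PRAGMA index_list('{table}')"):
        index_name = row[1]  # index_list rows: (seq, name, unique, ...)
        info = conn.execute(f"PRAGMA index_info('{index_name}')").fetchall()
        if info:
            # index_info rows: (seqno, cid, name); seqno 0 is the leading column.
            first = min(info, key=lambda r: r[0])
            leading.add(first[2])
    return leading


def missing_indexes(conn, hot_columns):
    """List tables whose hot column is not the leading column of any index."""
    return sorted(
        table
        for table, column in hot_columns.items()
        if column not in indexed_columns(conn, table)
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    """
)

# Hypothetical hotspots: the column each table is most often filtered on.
hot_columns = {"orders": "customer_id", "invoices": "customer_id"}
missing = missing_indexes(conn, hot_columns)
print(missing)
```

Here only the invoices table is reported, since orders already has an index led by customer_id. Other database systems expose the same information through their own catalog views rather than PRAGMA statements.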

The data explosion starts at the operational level. To remain competitive, the operational databases must stay optimized to meet service-level requirements. At the same time, businesses need to collect data for analysis and reporting purposes. However, running operational systems and supporting analysis and reporting are two distinct functions. To support both, businesses are adopting data warehousing initiatives that involve the separation of operational and archival data.

One of the greatest challenges of data warehousing initiatives is ensuring that the data in the operational databases and the archival databases remain synchronized. To ensure this, more and more businesses are using modeling tools to analyze their current systems and design their archival data warehouses. This helps businesses build out real-time operational systems that match the archival data warehouse system, ensuring that the real-time tables correspond to those in the warehouse tables. In addition, by creating a mirrored data warehouse, businesses can mark when to move data – either by setting a point in time or a capacity threshold – and use an extraction, transformation and loading (ETL) tool to move the data to the data warehouse.
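The two triggers mentioned above, a point in time and a capacity threshold, can be illustrated with a short Python sketch. The row format, the cutoff date and the row limit are hypothetical; in practice an ETL tool would apply rules like these rather than hand-written code.

```python
from datetime import date


def rows_to_archive(rows, cutoff=None, max_rows=None):
    """Select row ids to move to the warehouse.

    `rows` is a list of (row_id, created_on) tuples.
    - cutoff: rows created before this date are selected.
    - max_rows: if the table holds more rows than this, the oldest
      rows beyond the limit are selected.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    selected = set()
    if cutoff is not None:
        selected.update(row_id for row_id, created in ordered if created < cutoff)
    if max_rows is not None and len(ordered) > max_rows:
        excess = len(ordered) - max_rows
        selected.update(row_id for row_id, _ in ordered[:excess])
    return sorted(selected)


# Hypothetical operational rows.
rows = [
    (1, date(2023, 1, 10)),
    (2, date(2023, 6, 1)),
    (3, date(2024, 2, 15)),
    (4, date(2024, 3, 1)),
]

by_time = rows_to_archive(rows, cutoff=date(2024, 1, 1))
by_capacity = rows_to_archive(rows, max_rows=1)
print(by_time, by_capacity)
```

With these values the time rule selects rows 1 and 2, while the capacity rule selects rows 1, 2 and 3, leaving only the newest row in the operational table.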

Once data has been moved, the question becomes, ‘What happens to the data in the operational databases?’ To maintain optimal performance, it must be offloaded. Generally, data models are not built with offloading data in mind. However, the volume of data collected on a regular basis now requires that it be periodically offloaded. Before taking this action, data professionals need to determine relationships and dependencies so that data integrity is maintained during the offload. A model-driven approach is key when determining what data to offload. Modeling tools provide the means to determine all the relationships and dependencies so that the data professional can offload a complete, consistent set of data.
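One way to picture the dependency analysis is as an ordering problem: rows in tables that reference a parent table have to be removed before the parent rows themselves. The Python sketch below works this out with the standard library's graphlib module (Python 3.9 or later). The table names and foreign-key map are hypothetical, and the function offload_order is only an illustration of the idea, not how any particular modeling tool works.

```python
from graphlib import TopologicalSorter


def offload_order(references, root):
    """Order in which to remove tables when offloading `root`.

    `references` maps each table to the set of tables it references
    through foreign keys. Dependent (child) tables come first.
    """
    # Invert the map: parent -> tables that reference it.
    dependents = {}
    for child, parents in references.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(child)

    # Collect the root and everything that depends on it, directly or not.
    affected, stack = set(), [root]
    while stack:
        table = stack.pop()
        if table not in affected:
            affected.add(table)
            stack.extend(dependents.get(table, ()))

    # TopologicalSorter emits a node after its predecessors, so each
    # table's dependents are declared as its predecessors.
    graph = {table: dependents.get(table, set()) for table in affected}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical foreign-key relationships.
references = {
    "order_items": {"orders", "products"},
    "orders": {"customers"},
    "customers": set(),
    "products": set(),
}

order = offload_order(references, "customers")
print(order)
```

For the customers table this yields order_items, then orders, then customers; products is untouched because it does not depend on customers. Loading the same data into a warehouse would typically use the reverse order, parents first.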

The increasing complexity of managing data requires that teams of multidisciplinary professionals concurrently work on the data models. Today, collaboration is no longer a ‘nice-to-have.’ It is an integral part of the business workflow. The use of a collaboration server offers sophisticated features that increase the productivity and reduce the complexity of large data-model management within teams of designers.

Businesses are taking advantage of collaboration servers to promote team-based modeling, which allows greater administrative control over the creation, monitoring, and administration of user security within the collaborative model server. In addition, repository administrators can use the change control interface to review, accept, or reject changes that are proposed for check-in to the collaboration server. Further, the collaboration server gives teams the ability to communicate these designs to a wider audience in the enterprise.

Data modeling is no longer simply a mechanism to create new data structures – it is an integral part of analyzing and deconstructing information. Businesses have embraced a model-driven approach for analyzing their existing systems in order to face today’s challenges. A model-driven approach simplifies the analysis of what the current state of data is and where it needs to be, and helps to implement an effective transformation.

Modeling tools provide businesses with a visual means to quickly pinpoint areas that cause performance degradation. They offer an effective way to implement sound data warehousing initiatives. And disparate teams of experts, from data architects to DBAs, can use the collaboration server to work together more easily and deliver projects more quickly and with greater confidence.

Most existing software solutions do not offer businesses an easy method for analyzing the current state of their systems and understanding the impact of the extraordinary growth in data. More and more companies depend on modeling tools and collaboration servers to empower their diverse teams in their battles against data explosion.