Modern hard drives store an incredible amount of data in a small space, and are still the default choice for high-capacity (though not highest-performance) storage. While hard drives have been around since the 1950s, the current 3.5″ form factor (actually 4″ wide) appeared in the early 1980s. Since then, the capacity of a 3.5″ drive has increased by a factor of about 10^6 (from 10 MB to about 10 TB), sequential throughput by a factor of about 10^3, and access times have improved by a factor of about 10^1. Although the basic concept of spinning magnetic disks accessed by a movable stack of disk heads has not changed, hard drives have become much more complex to enable the increased density and performance.

The ability to analyze the past state of data is increasingly important for applications that provide auditing and other forms of fact-checking. Auditors require easy access to data at times of interest related to after-the-fact, real-world discoveries. Retrospective snapshot systems that support computations over datastore snapshots allow applications to provide past-state analysis conveniently, using simple datastores like Berkeley DB or SQLite. Current snapshot systems, however, offer no adequate support for computations that analyze multiple snapshots.

This dissertation presents the design, implementation, and evaluation of the Retrospective Query Language (RQL), a simple declarative extension to SQL that allows users to specify and run cross-snapshot computations conveniently in a snapshot system. To achieve this, we propose a small number of simple mechanisms defined in terms of relational constructs familiar to programmers. We explain how they translate into SQL computations in a snapshot system and show how to express several common analysis patterns with illustrative examples. The RQL implementation uses the SQLite UDF framework in the Berkeley DB datastore and the Retro page-level incremental snapshot system. Retro creates and provides access to copy-on-write snapshots.
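To make the idea of a cross-snapshot computation concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not RQL syntax and does not use Retro: each snapshot is simulated by a separate in-memory database, and the `accounts` table, the `per_snapshot` table, and the helper names are all hypothetical. The sketch runs the same SQL query over every snapshot, tags each row with its snapshot id, and then aggregates across snapshots with ordinary SQL.

```python
import sqlite3


def make_snapshot(rows):
    """Build an in-memory database standing in for one snapshot (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    return conn


# Three hypothetical snapshots of the same table, keyed by snapshot id.
snapshots = {
    1: make_snapshot([(1, 100), (2, 50)]),
    2: make_snapshot([(1, 100), (2, 75)]),
    3: make_snapshot([(1, 40), (2, 75)]),
}


def run_over_snapshots(snapshots, query):
    """Run the same query on every snapshot and tag each row with its snapshot id."""
    results = sqlite3.connect(":memory:")
    results.execute(
        "CREATE TABLE per_snapshot (snap_id INTEGER, id INTEGER, balance INTEGER)"
    )
    for snap_id, conn in sorted(snapshots.items()):
        for row in conn.execute(query):
            results.execute(
                "INSERT INTO per_snapshot VALUES (?, ?, ?)", (snap_id, *row)
            )
    return results


results = run_over_snapshots(snapshots, "SELECT id, balance FROM accounts")

# Cross-snapshot question: what is the lowest balance each account ever had?
lowest = results.execute(
    "SELECT id, MIN(balance) FROM per_snapshot GROUP BY id ORDER BY id"
).fetchall()
print(lowest)
```

Collecting per-snapshot results into a single relation is what lets the final step stay in plain SQL; a real snapshot system would avoid materializing full copies of each snapshot as this toy does.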
Cross-snapshot computations running over page-level incremental snapshots raise interesting performance issues that have not been studied before. We present the first study defining a performance envelope for cross-snapshot computations over page-level incremental snapshots.

Auditing queries can include wasteful redundancies when they repeat the same computations on data of interest that does not change between snapshots. The SQL optimizer does not eliminate such redundancies, which are common in many workloads. We have designed and implemented RID, the first run-time optimization framework that detects and eliminates duplicate computations in SQL programs running over page-level copy-on-write snapshots. Complementary to the SQL optimizer, RID reduces the cost of audit programs substantially. It has a novel, very efficient software structure. Where other techniques both detect and eliminate redundancies at the language level, RID detects redundancies in the snapshot system at low cost, taking advantage of the snapshot metadata, and eliminates them at the language level by reusing results and exploiting computation semantics.

The low-level detection is fast but conservative. When the data of interest remains unchanged between snapshots but unrelated data on the same page changes, RID does not detect the duplicate computation, which is a false negative. We developed an analytical model explaining how application workload parameters affect false negatives. Measurements of RID in RQL validate the model. The results show that when audit queries run over infrequently modified data, RID provides substantial benefit despite false negatives.
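The page-level detection described above, including its false negatives, can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not RID's actual design: the `Snapshot` class, the `modified` set standing in for copy-on-write metadata, and the `audit` function are all hypothetical. The query's result is reused whenever none of the pages it reads were modified since the previous snapshot, and recomputed otherwise.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    pages: dict          # page id -> {key: value} records stored on that page
    modified: frozenset  # ids of pages copied (changed) since the previous snapshot


def audit(snapshots, pages_of_interest, query):
    """Run query on each snapshot, reusing the previous result when no page
    the query reads was modified. Returns (results, number of real runs)."""
    results, computed = [], 0
    for i, snap in enumerate(snapshots):
        if i > 0 and not (snap.modified & pages_of_interest):
            results.append(results[-1])   # duplicate detected: reuse the result
        else:
            results.append(query(snap))   # page changed (or first snapshot)
            computed += 1
    return results, computed


# Page 0 holds accounts "a" and "b"; page 1 holds account "c" (hypothetical layout).
snapshots = [
    Snapshot({0: {"a": 100, "b": 5}, 1: {"c": 1}}, frozenset({0, 1})),
    Snapshot({0: {"a": 100, "b": 5}, 1: {"c": 2}}, frozenset({1})),  # only "c" changed
    Snapshot({0: {"a": 100, "b": 6}, 1: {"c": 2}}, frozenset({0})),  # "b" changed
    Snapshot({0: {"a": 70, "b": 6}, 1: {"c": 2}}, frozenset({0})),   # "a" changed
]


def balance_of_a(snap):
    """The audit query: it reads only page 0."""
    return snap.pages[0]["a"]


results, computed = audit(snapshots, frozenset({0}), balance_of_a)
print(results, computed)
```

In this run the second snapshot reuses the first result because only page 1 changed. The third snapshot is recomputed even though the balance of "a" is the same, because "b" shares its page: that is the false negative. Only the fourth recomputation is strictly necessary, so the toy performs three query runs where two would suffice.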