A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP

Aouiche, Kamel and Lemire, Daniel (2007). "A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP", in ACM 10th International Workshop on Data Warehousing and OLAP (DOLAP 2007, Lisbon, Portugal)


Abstract

A data warehouse cannot materialize all possible views, so we must quickly, accurately, and reliably estimate the size of views to determine the best candidates for materialization. Many available techniques for view-size estimation make particular statistical assumptions, and their error can be large. By comparison, unassuming probabilistic techniques are slower, but they estimate very large view sizes accurately and reliably while using little memory. We compare five unassuming hashing-based view-size estimation techniques, including Stochastic Probabilistic Counting and LogLog Probabilistic Counting. Our experiments show that only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide universally tight estimates irrespective of the size of the view; of those, only Adaptive Counting remains consistently fast as we increase the memory budget.
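The size of a view is the number of distinct tuples in the projection onto its GROUP BY attributes, and the hashing-based estimators compared above approximate that count in bounded memory. As an illustration only, the following is a minimal sketch of a Gibbons-Tirthapura-style adaptive-sampling estimator; it is not the authors' implementation. The function names, the `capacity` parameter, and the use of BLAKE2b as a stand-in for a random hash function are assumptions made for this example.

```python
import hashlib


def hash64(value):
    """Map a tuple to a 64-bit integer (assumption: BLAKE2b stands in for a random hash)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def trailing_zeros(x, width=64):
    """Number of trailing zero bits of x; a zero hash counts as `width` zeros."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def gibbons_tirthapura_estimate(rows, attributes, capacity=256):
    """Estimate the number of distinct projections of `rows` onto `attributes`.

    A hash is kept only if it has at least `level` trailing zeros, so each
    distinct tuple is sampled with probability 2**-level. Whenever the sample
    exceeds `capacity`, the level is raised and the sample is filtered again.
    """
    level = 0
    sample = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)  # the tuple of the projected view
        h = hash64(key)
        if trailing_zeros(h) >= level:
            sample.add(h)
            while len(sample) > capacity:
                level += 1
                sample = {s for s in sample if trailing_zeros(s) >= level}
    # Scale the sample size back up by the inverse sampling probability.
    return len(sample) * (2 ** level)
```

For example, `gibbons_tirthapura_estimate(rows, ("a", "b"))` returns the exact distinct count (barring hash collisions) as long as the sample never overflows. Once it does, each increase of the sampling level roughly halves the retained hashes, so memory use stays bounded by `capacity` regardless of the view size.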

Type: Conference paper or presentation
Keywords or subjects: OLAP, materialized views, view-size estimation, data warehouse, random hashing
Affiliated unit: Télé-université > UER Science et Technologie
Deposited by: Daniel Lemire
Deposit date: 27 August 2007
Last modified: 20 Apr. 2009 14:28
URL: http://archipel.uqam.ca/id/eprint/373
