Università della Svizzera italiana Facoltà di scienze economiche

DIRECTIONS IN DATA SCIENCE: Kerrie Mengersen 1 June 2016 at 11:00 Room 251

A Principled Experimental Design Approach to Big Data Analysis

Kerrie Mengersen

Professor of Statistics
Science and Engineering Faculty, Mathematical Sciences
Queensland University of Technology
Brisbane, Australia

  • 1 June 2016
  • 11:00 - 12:00
  • Room 251

Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, complexity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appeal to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers answers equivalent to those obtained from analyses of the full dataset. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, such as divide-and-conquer strategies, by determining efficient sub-samples.