When does simulated data match real data?

Title: When does simulated data match real data?
Publication Type: Conference Papers
Year of Publication: 2011
Authors: Stonedahl F, Anderson D, Rand W
Conference Name: Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
Date Published: 2011
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-0690-4
Keywords: agent-based modeling, business, calibration, genetic algorithms, information search, network analysis

Agent-based models can replicate real-world patterns, but finding the parameter settings that achieve the best match can be difficult. To validate a model, a real-world dataset is often divided into a training set (used to calibrate the parameters) and a test set (used to validate the calibrated model). The match between the simulated data and the training or test data is quantified by an error measure. In the context of evolutionary computation techniques, the error measure also serves as the fitness function, and thus shapes the dynamics of the evolutionary search. We examine the effect of five different error measures on both a toy problem and a real-world problem: matching a model to empirical online news consumption behavior. We calibrate with each error measure separately on the training dataset, and then evaluate the results under all five error measures on both the training and test datasets. We show that some error measures serve as better fitness functions than others; in fact, calibrating with one error measure can yield a better score on a different measure than calibrating with that other measure directly. For the toy problem, Pearson's correlation dominated all other measures, but no single error measure was Pareto-dominant for the real-world problem.
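The calibrate-with-one-measure, evaluate-on-all design described above can be illustrated with a minimal sketch. Everything here is hypothetical: a one-parameter power-law "simulator" stands in for the paper's agent-based models, a grid search stands in for the genetic algorithm, and only three of the five error measures (MSE, MAE, and 1 minus Pearson's correlation) are shown.

```python
import random

def simulate(p, xs):
    # Hypothetical toy model: output grows as x**p; the single
    # parameter p is what calibration must recover.
    return [x ** p for x in xs]

def mse(sim, obs):
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)

def mae(sim, obs):
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)

def one_minus_pearson(sim, obs):
    # 1 - r, so that lower is better for all three measures.
    n = len(obs)
    ms, mo = sum(sim) / n, sum(obs) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    sd_s = sum((s - ms) ** 2 for s in sim) ** 0.5
    sd_o = sum((o - mo) ** 2 for o in obs) ** 0.5
    return 1 - cov / (sd_s * sd_o)

# "Real" training data: a hidden true parameter plus noise.
random.seed(0)
xs = list(range(1, 11))
observed = [x ** 2.0 + random.gauss(0, 5) for x in xs]

def calibrate(error_measure, candidates):
    # Grid search used in place of the paper's genetic algorithm;
    # the error measure plays the role of the fitness function.
    return min(candidates, key=lambda p: error_measure(simulate(p, xs), observed))

measures = [("MSE", mse), ("MAE", mae), ("1-Pearson", one_minus_pearson)]
candidates = [i / 100 for i in range(100, 301)]  # p in [1.00, 3.00]

# Calibrate with each measure, then score the result under every measure.
for cal_name, cal_measure in measures:
    p_best = calibrate(cal_measure, candidates)
    scores = {name: round(m(simulate(p_best, xs), observed), 3)
              for name, m in measures}
    print(f"calibrated with {cal_name}: p = {p_best}, scores = {scores}")
```

The cross-evaluation table this prints is the kind of comparison the abstract describes: the parameter chosen under one measure may score better on a second measure than the parameter chosen under that second measure directly (here, correlation ignores scale, so it can prefer a different exponent than the magnitude-based measures).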