Scaling Learning Analytics: The Practical Application Of Synthetic Data

Alan Berg
University of Amsterdam, Netherlands

Stefan Mol
University of Amsterdam, Netherlands

Gábor Kismihók
University of Amsterdam, Netherlands

Niall Sclater
Sclater Digital Ltd, United Kingdom


This case study draws on experience gained from running a two-day data hackathon on large-scale Learning Analytics (LA) infrastructure at the LAK16 conference. The main conclusion is that there will be a significant demand for realistic synthetic data to support the development of large-scale infrastructures. Synthetic data overcomes ethical barriers to sharing large data sets between different (parts of) organizations. Properly simulated synthetic data can be leveraged to fine-tune algorithms deployed within the field of Learning Analytics. This data-driven approach lowers the risk of accidental disclosure and bypasses limitations rightfully imposed by the legal and/or ethical constraints associated with real student data. The application of synthetic data to performance testing allows universities to develop highly scalable infrastructure in parallel with developing central data governance practices. This short paper explores the conformance testing of Learning Record Stores (LRS – secure locations to store and query student digital traces), discusses the implications for universities of a specific set of xAPI recipes (Berg, Scheffel, Drachsler, Ternier, & Specht, 2016), and generalizes practices for accelerating large-scale deployments of LA infrastructure. The authors argue that by applying a standardized set of synthetic data based on a peer-reviewed synthetic data generator, universities will find it easier to develop reliable recipes for digital learner traces. Consistent data storage across university boundaries will subsequently enable the benchmarking of algorithms that consume student digital traces and support the generation of predictive validity evidence across university boundaries. Thus, universities can compare the value of their algorithms relative to those of other universities and apply algorithms consistently when students transfer.
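
To make the idea of synthetic digital traces concrete, the following is a minimal sketch of how xAPI-style statements for fictitious students might be generated for load or conformance testing of an LRS. It is not the generator used at the hackathon. The function name `synthetic_statements`, the `example.org` identifiers, the start date, and the uniform random choice of student, verb, and activity are all assumptions made for illustration; only the verb IRIs are taken from the ADL vocabulary.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

# Verb IRIs from the ADL vocabulary; everything else below is hypothetical.
VERBS = {
    "attempted": "http://adlnet.gov/expapi/verbs/attempted",
    "completed": "http://adlnet.gov/expapi/verbs/completed",
    "experienced": "http://adlnet.gov/expapi/verbs/experienced",
}

HOME_PAGE = "https://example.org/university"  # hypothetical identity provider
ACTIVITY_BASE = "https://example.org/activities/"  # hypothetical activity IRIs


def synthetic_statements(n_students, n_statements, seed=0):
    """Return a reproducible list of xAPI-style statements for fake students."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)  # arbitrary start date
    statements = []
    for _ in range(n_statements):
        student = rng.randrange(n_students)
        verb = rng.choice(sorted(VERBS))
        activity = rng.randrange(10)
        # Spread timestamps uniformly over a two-week window (an assumption).
        when = start + timedelta(seconds=rng.randrange(14 * 24 * 3600))
        statements.append({
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "actor": {
                "objectType": "Agent",
                "account": {
                    "homePage": HOME_PAGE,
                    "name": f"synthetic-student-{student:05d}",
                },
            },
            "verb": {"id": VERBS[verb], "display": {"en-US": verb}},
            "object": {
                "objectType": "Activity",
                "id": f"{ACTIVITY_BASE}{activity}",
            },
            "timestamp": when.isoformat(),
        })
    return statements


if __name__ == "__main__":
    # Print a small sample; a load test would send many of these to an LRS.
    print(json.dumps(synthetic_statements(3, 2), indent=2))
```

Because no real student is involved, a batch produced this way can be shared between institutions. A realistic generator would replace the uniform random choices with a model of student behaviour.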
