Measurement


Key points

  • There is limited interdisciplinary collaboration at the intersection of open data, use practices, and impact assessment methods concerning algorithmic systems.
  • There have been recent changes in how rankings are produced regarding open government data, but there is no consistency in the units of analysis, the evidence considered, or the methods used for aggregating indicators.
  • The concept of “high-value” data intended for public benefit is gaining attention; however, there are multiple, possibly conflicting, definitions for what constitutes the value of data.

Danny Lämmerhirt

Waag FutureLab

Danny Lämmerhirt is a sociologist of technology and a media scholar who studies data infrastructures and their role in making public problems visible, debatable, and governable. He is interested in participatory models for data production, interpretation, and goal setting around public problems, impact assessment methods for data sharing and use, and practice-theoretical studies of digital data in use. He currently works as a research associate at TU Dortmund. Previously, he worked on the Global Open Data Index and co-chaired the measurement and accountability working group at the Open Data Charter.

Introduction

As data becomes increasingly important in public policy and public life, the ways of measuring its use and capacity gain significance as well. Measuring data involves questions of fairness, redistribution, value, and values. Different measurement methods are shaped by political debates and norms regarding what should be measured. Factors taken into account include data's uses and users (will the publication of data result in a high number of app developments?), its qualities (usefulness, reliability, accuracy, completeness, and timeliness), and the data resources available from government (what data does a public body hold, and can it be opened up?).

Measurements of data always rest upon certain values and conventions of what counts as “good”,1 2 3 but measurements are also always performative. It is through measurement that stakeholders determine whether data, its intermediaries, production contexts, or downstream uses are "valuable". Enabling an assessment of the worth of data, of government agendas to support data publication to create future value, or of the potential risks of using data raises the question: What concepts and methods are currently used to measure data, and how have measurements evolved?

This chapter documents the conceptual and methodological development of open data measurement over the past five years, combining a review of global open data ranking methods with a review4 of academic and grey literature on open data measurement. The first part reviews how “readiness”, “publication”, “use”, and “impact” are being measured - dimensions introduced by global indices to conceptualize progress toward open data, and the central organizing principle of the chapter on measurement in the original State of Open Data report in 2019.5 The second part explores newer conceptual developments and methods for measuring open government data, focusing on the notions of “value” and a “practice-based” understanding of open data.

Methodological Progress in Academic Literature

Academic literature indicates that between 2018 and 2023, the measurement of open data underwent some incremental development, adding to existing bodies of research. Over the past five years, scholars have continued to develop measurements of readiness by assessing various properties of open data, such as quality6 (completeness, metadata quality),7 as well as the performance of open data portals.8 This line of research represents a data-centric kind of measurement, assessing the technical features of data without necessarily considering its use. The literature also reflects a concern for data literacy, the factors inhibiting or promoting data use, and use case typologies. These typologies are often grouped by outcome (e.g. to facilitate anti-corruption work or to improve public sector efficiency), or more rarely, by activities (e.g. ordering information, raising attention to otherwise unnoticed problems, or correcting errors in open datasets). These topics continue a longstanding concern in academic literature with enabling the conditions for open data reuse, though generally without focusing on actual use practices. To consider how data is used in concrete situations, Ruijer et al. have proposed a “practice lens” on open data.9

Regarding the measurement of impact from open data, scholars have continued to develop models to connect data provision and use. For instance, in 2019, Ham and her co-authors assessed use cases from open data portals to understand “patterns” of increasing data use.10 Other scholars have considered the impact of open data in certain environments, such as cities,11 or the use of open data as a means of contestation that expands ideals of deliberative democracy.12 13 A paper by Yoon and Copeland from 2019 focuses on how local community data intermediaries use open data to address problems, arguing that more work is needed to support data intermediaries in their data work.14 This line of research continues to explore the impact of open data through data intermediaries, their institutional and organizational setups, data brokerage mechanisms, and role as facilitators of data reuse.15 

The Evolution of Country Rankings and Indices 

Country rankings of open government data continue to benchmark progress toward open data readiness, implementation, use, and impact, and there have been significant changes since the first edition of the State of Open Data. Two of the first international measurement tools, the Open Data Barometer (ODB) and the Global Open Data Index (GODI), have been discontinued because their funders, the International Development Research Centre (in the case of ODB), and the William and Flora Hewlett Foundation (in the case of GODI) stopped funding them. However, the International Development Research Centre has moved funding from the Open Data Barometer toward the new Global Data Barometer (GDB), a measurement project run by the Data for Development Network and Iniciativa Latinoamericana por los Datos Abiertos (ILDA) to measure “the extent to which data is governed, shared, and used for the public good”. Instead of focusing solely on open data as “public good” to be made “open by default”, the GDB assesses how countries create, govern, and circulate different types of data (open, closed, shared) to advance the “public good”. The GDB derives its notion of the public good from the globally agreed Sustainable Development Goals and translates them into different “modules” (e.g. “health” or “climate action”) against which four “pillars” (“availability”, “capabilities”, “governance”, “use and impact” of data) are measured, and the results are aggregated as a composite index of 0-100 points. 
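The GDB's aggregation of pillar scores into a single 0-100 composite can be illustrated with a short sketch. This is an illustration only: the pillar names come from the text above, but the equal default weights and the function itself are assumptions, not the Barometer's actual formula:

```python
# Illustrative sketch (not the GDB's published formula): combining
# pillar scores, each on a 0-100 scale, into one composite index value.
# Equal weighting is an assumption made for this example.
def composite_score(pillar_scores, weights=None):
    """Weighted average of pillar scores (each 0-100) into a 0-100 composite."""
    pillars = list(pillar_scores)
    if weights is None:
        weights = {p: 1.0 for p in pillars}  # assumed equal weights
    total_weight = sum(weights[p] for p in pillars)
    return sum(pillar_scores[p] * weights[p] for p in pillars) / total_weight

# Hypothetical pillar scores for one country.
scores = {"availability": 60, "capabilities": 45,
          "governance": 70, "use and impact": 25}
print(round(composite_score(scores), 1))  # → 50.0
```

The design question such a composite raises is the one the chapter returns to repeatedly: the choice of weights and of what evidence feeds each pillar is itself a value judgment, not a neutral technical step.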

Other regional and international rankings have undergone changes to their methodology. The changes to the OECD’s OURdata Index could not be assessed because its latest results report had not been published at the time of producing this update; however, a policy paper based on the 2019 edition, published in 2020, suggests that the index has remained mostly the same and still focuses on indicators of “data availability”, “data accessibility”, and “government support for data reuse”.16 These indicators incorporate several aspects, such as “stakeholder engagement over data quality” or whether national governments promote data literacy programs and monitor impact from open data.

The Open Data Inventory (ODIN) by Open Data Watch continues to focus on the publication of open data by national statistics offices according to criteria based on coverage and openness. Since 2018, ODIN has undergone several methodological changes, introducing new data categories, such as Food Security or Sex (including sexual violence), renaming data categories to broaden or specify them, changing the geographic granularity (e.g. no longer scoring some data categories at certain administrative levels), adding requirements for providing additional metadata, and accepting SDG indicators as substitute indicators for ODIN indicators. The latest methodology revision took place in 2022.17

The European Open Data Maturity Assessment (EODMA), conducted by the European Commission, aims to help the national open government data teams of European member states prioritize high-quality open data publication, support and monitor data reuse, develop portal features, and create more inclusive and participatory governance structures. To do so, it updated its four measurement indices, “open data policy”, “open data portal design”, “impact”, and “data quality”, in 2022 by developing a more granular questionnaire. EODMA suggests that national open data teams monitor specific areas of impact (e.g. government) more closely than others (e.g. social impact). The most notable methodology change concerns the disambiguation of “data use” and “impact”. According to EODMA’s latest results report, European member states are aware of the conceptual difference between data reuse and understanding impact, but face challenges in developing methods to understand the impact of data reuse.18 EODMA currently uses a three-answer approach to questions (yes, no, I don’t know) to assess the impact of opening up data along multiple themes (e.g. to support accountability) and forms of impact (e.g. on government, society, or the economy). It also asks national governments to provide supporting evidence regarding “use”, such as the log files of data portals, feedback mechanisms for users, surveys, or processes to “systematically gather reuse cases”. Regarding impact, EODMA assesses the existence of “various reuse examples”. A look at the raw data provided by each country suggests that European open data teams provide different kinds of evidence (e.g. the existence of public policies, data portals, or third-party applications), with different levels of methodological reliability (e.g. written by academic scholars, crowdsourced, or assembled by open data teams in public sector bodies) and different conceptual and methodological starting points (e.g. providing secondary reviews of original literature, commissioning qualitative case studies, or posting links to a third-party app).
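The three-answer questionnaire described above can be made concrete with a small sketch. The scoring rule (only "yes" answers count toward the score) and all question identifiers below are hypothetical illustrations, not EODMA's actual method:

```python
# Hypothetical sketch of a three-answer ("yes" / "no" / "I don't know")
# questionnaire tally. The rule that only "yes" earns credit is an
# assumption for illustration, not EODMA's published scoring method.
from collections import Counter

def impact_maturity(answers):
    """answers: dict mapping question id -> 'yes' | 'no' | 'unknown'.
    Returns the share of 'yes' answers as a 0-100 percentage."""
    counts = Counter(answers.values())
    total = sum(counts.values())
    return 100 * counts["yes"] / total if total else 0.0

# Illustrative responses for hypothetical impact-theme questions.
responses = {"gov_efficiency": "yes", "accountability": "yes",
             "social_inclusion": "no", "environment": "unknown"}
print(impact_maturity(responses))  # → 50.0
```

Treating "I don't know" the same as "no" is itself a methodological choice; a real assessment would need to decide whether uncertainty should be scored, flagged, or excluded from the denominator.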

Overall, the conceptual and methodological updates to global and regional data-measuring indices still leave room for open data teams and researchers to use different kinds of evidence. This has the advantage that indices can compile evidence attuned to different country contexts, as well as to different perspectives on what counts as a “good” use or impact. It enables indices to take into account a country’s available resources to produce evidence on open data, and it may involve actors directly responsible for the provision of open data in the production of evidence on open data use and impact. However, it also risks conflating different conceptual understandings and methods for measuring and assessing the “reuse”, “impact”, or “public good” associated with open government data. This same observation was included in the first edition of the State of Open Data, and it still applies to various indices. It is also noteworthy that the indicators used by transnational indices continue to show the topical overlap identified in the original chapter on measurement with regard to datasets (e.g. types of datasets considered, data quality), stakeholder engagement (e.g. promotion of data literacy programs), reuse, and impact monitoring; however, the underlying units of analysis, what constitutes acceptable evidence to support an indicator, and how indicators are aggregated continue to differ across the rankings.

Measuring Data’s Value and Data Practices

Discussions around the measurement of open data have reflected a growing concern in recent years over the need to better understand the value, governance, and practices of opening up and using data: what should be the goal of opening up data, and how should concepts of value, governance, and practice inform measurement and assessment?

A notable development is an increasing concern around “high-value” datasets. Different jurisdictions have developed different definitions of high value. In the 2018 recast of the PSI Directive, the European Commission, for instance, defined high-value datasets as “documents whose re-use is associated with considerable benefits for society, the environment and the economy”. Several studies by the European Commission (e.g. on the perspective of data providers19) and member states (e.g. Denmark’s concept of “basic data”20) have sought to further determine high value, illustrating many different understandings of the concept and the different goals of releasing data across the European region. For instance, Denmark defines high value based on whether the data improves the efficiency of public sector bodies. The Netherlands determines high value based on legal duty, transparency requirements, cost reduction, and the potential for reuse. According to the European Commission’s report on the perspective of data providers, public officials base their opinion on whether open data is of high value on aspects like download statistics, technical formatting criteria (e.g. interoperability), and reuse vis-à-vis certain policy areas and target audiences. However, the report remarks on a general lack of awareness of what kinds of impact or value should be measured beyond the technical aspects or download statistics of data. As the report concludes, a provider perspective is not enough, as the criteria providers use to assess value, such as data quality or number of downloads, do not assess its potential for reuse.

By comparison, the Indian government, in a 2020 report by the Committee of Experts on Non-Personal Data Governance Framework, defines a high-value dataset as one “that is beneficial to the community at large and shared as a public good”.21 India’s definition centres around a “public good purpose” for the community that is the defining criterion on whether data can be shared publicly, instead of on a theoretical capacity for potential use. 

The European Union and India are not the only jurisdictions that have proposed definitions of high value data (others include Australia and Canada); however, the European Union and India are useful examples of how to conceptualise high value, albeit from two different starting points: the a priori definition and the subsequent provision of data versus the provision of data based on the planned downstream reuse by a community. Both approaches demonstrate a growing interest in aligning data provision and reuse around the notions of value or benefit in consideration of the interests of particular user groups, yet existing concepts and methods to measure and assess open data use and its benefits may not be adequate. This may be due to the fact that many studies of high value data (similar to the impact of open government data) lack an empirical engagement with how different social groups interpret, assess, and create meaning from open data in different situations. While it is acknowledged that data is not intrinsically valuable and that its value must be realized in practice, existing measurement and assessment approaches lack a sound conceptualization of such practices. 

A 2021 meta-review of open data rankings and academic progress models confirms this observation. As Zuiderwijk and co-authors find, open data use, participation, and user engagement are underdeveloped aspects of open data measurements.22 Open data benchmarks mainly consider the possibility of open data use, participation, or user engagement, rather than qualifying actual use or user engagement. This is problematic because it fails to consider how people interact with data, how they ascribe meaning to it, and how they integrate it into their daily practices - precisely the interactions that would provide insights into users, user engagement, and the value created from reusing data.

Social scientists have proposed a so-called “practice-theoretical perspective” that differs from the vaguer concept of data “use”.23 24 25 26 Based on social scientific understandings of practice as distributed across human and non-human agents, a practice-theoretical lens on data looks at how people engage with and ascribe meaning to data through the use of technologies, methods, organizations, regulations, and other aspects that matter in concrete situations. For practice-theorists, the value of data and how it is being used, to what end and effect, can only emerge from a close-up study of such situations.27 28 29 For these authors, a more granular description of the practices that data enables (e.g. filtering, sorting, making visible, comparing, predicting) is a necessary intermediate step to understand how practices with data lead to certain outcomes, rather than focusing on the broad-brush purposes of data use. The practice-theoretical lens suggests a useful middle path between assessing high value a priori from the provider side and determining it through unspecified communities and broad use purposes. However, although it is already applied in fields like communication studies, science studies, and the sociology of health, this practice-based lens has yet to be applied to the methodologies behind the measurement and assessment of open government data.

Conclusion and Recommendations

The measurement of open data has evolved in recent years. Notable is a shift in the discourse from measuring impact toward defining and measuring the value of data. Rather than seeing open data as a resource to be unlocked and evaluated later downstream, actors have started to focus on the beneficial use purposes of data. Perhaps the most significant development is a departure by some countries from the limited proactive provision of certain types of data toward a provision based more on use purposes and the communities who plan to put data to use. In practice, however, the notions of impact and value remain vague beyond the acknowledgment that value is situation-dependent. We might be even more radical and suggest that the value of data is never a given, but always the outcome of valuations of and with data. This shift toward valuing things opens up new avenues for measuring data - as something to be valued and evaluated (e.g. data quality), but also as something that is used to create value.

This shift toward valuation has not yet translated into more systematic or innovative ways of understanding reuse, users, or the value of data use, but a promising avenue has been proposed by practice-theoretical concepts of how people use data and involve it in acts of value creation. Rather than counting use cases, this view takes as its unit of analysis how data is used in practice and in specific situations. Such a view also requires adequate methodologies, such as qualitative, action-based, or participant observation methods. Such methods could also be included in the development of national measurement activities.

To develop measurements and assessments of open data further, the following recommendations should be taken into account: 

More systematic comparisons of measurement methods should be encouraged. There continue to be many different ways of assessing aspects of open data - this applies to indicators and the concepts measured, as well as the kinds of evidence used to measure them - which risks the ongoing conflation of the means (e.g. apps or their use) with the ends (e.g. societal effects based on app use). Measurements must systematically assess recognised units of analysis (e.g. data downloads or an app), determine what evidence should count as valid, and reflect more deeply on the intended phenomenon to be measured. Actors who develop measurement and assessment concepts and methods can support greater methodological reflexivity by actively sharing and publicly debating what constitutes suitable evidence and lessons learned. A fruitful avenue may be to attempt more cross-pollination with participatory methods for impact assessments of data sharing (e.g. those piloted by data access committees in the UK30).

There is a wide variety of disciplines and communities engaged in studying the measurement of data use (for instance, research on the valuation of data - see the National Data Guardian for Health and Social Care in England (2022)31 or Fiske et al. (2022)32) or in debates on developing assessments of algorithmic systems.33 Cases like the large-scale use of openly licensed image data for the training of facial recognition algorithms demonstrate the need to connect aspects of open data publication and reuse with debates on consent, the supervision of algorithmic systems, and their impact in specific settings.34 Measurements and assessments of open government data should therefore enter into a more intensive dialogue with these fields in order to establish a shared knowledge base on the benefits and risks of open data use.

Participation, inclusion, and user engagement should not only be a metric but a key principle for the development of measurements and assessments. There is broad awareness that user participation and reuse result in benefits from data. This should translate into including a wider range of actors, such as civil society organizations who can articulate interests in certain data types, as well as new disciplines and their conceptual contributions to debates on measurement. Since methods do not simply reflect different aspects or viewpoints on the same topic, but actively shape how we understand open data, it is important to encourage diversity and collaboration among people from different disciplines and with different expertise. 

The production of evidence should be actively supported and funded, and quality criteria for suitable evidence should be defined to ensure a minimum set of requirements. For instance, to produce qualitative case studies, national and local open data teams could support or partner with researchers doing qualitative research, action research, or participant observation. This should also improve the reuse of existing evidence. 


  1. Daly et al., Good Data, 2019, https://networkcultures.org/blog/publication/tod-29-good-data/
  2. Lämmerhirt, Chrzanowski, and van der Waal, What data counts in Europe? Towards a public debate on Europe’s high value data and the PSI Directive, 2019, https://blog.okfn.org/2019/01/16/what-data-counts-in-europe-towards-a-public-debate-on-europes-high-value-data-and-the-psi-directive/
  3. Desrosières, Measurement and its Uses: Harmonization and Quality in Social Statistics, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1751-5823.2000.tb00320.x
  4. For the keyword-based literature review, the Web of Science database was used to search for academic journal articles referencing the terms [open data AND use], [open data AND impact], and [open data AND measurement] in their titles, abstracts, or text body. Studies from the last five years were considered, and their abstracts were analyzed according to the following aspects: 1) unit of analysis, 2) data sources used for the assessment (e.g. document analysis of secondary studies, interviews), and 3) analysis method.
  5. https://stateofopendata.od4d.net/chapters/issues/measurement.html
  6. Ehrlinger, Lisa and Wöß, Wolfram, A Survey of Data Quality Measurement and Monitoring Tools, Frontiers in Big Data, Vol. 5, 2022, https://www.frontiersin.org/articles/10.3389/fdata.2022.850611
  7. Nogueras-Iso, J., Lacasta, J., Ureña-Cámara, M. A., and Ariza-López, F. J., "Quality of Metadata in Open Data Portals," IEEE Access, Vol. 9, pp. 60364-60382, 2021, https://ieeexplore.ieee.org/document/9405650
  8. Abella, A., Ortiz-de-Urbina-Criado, M., & De-Pablos-Heredero, C. (2022). Criteria for the identification of ineffective open data portals: pretender open data portals. Profesional De La Información, 31(1). https://doi.org/10.3145/epi.2022.ene.11
  9. Ruijer, E., Grimmelikhuijsen, S., van den Berg, J., & Meijer, A. (2020). Open data work: understanding open data usage from a practice lens. International Review of Administrative Sciences, 86(1), 3–19. https://doi.org/10.1177/0020852317753068
  10. Ham, J., Koo, Y., and Lee, J.-N. (2019), "Provision and usage of open government data: strategic transformation paths", Industrial Management & Data Systems, Vol. 119 No. 8, pp. 1841-1858. https://doi.org/10.1108/IMDS-04-2019-0218
  11. Fátima Trindade Neves, Miguel de Castro Neto, Manuela Aparicio (2020)
  12. Amanda Meng, Carl DiSalvo, Lokman Tsui & Michael Best (2019) The social impact of open government data in Hong Kong: Umbrella Movement protests and adversarial politics, The Information Society, 35:4, 216-228, DOI: 10.1080/01972243.2019.1613464
  13. Gray, Jonathan, Towards a Genealogy of Open Data (2014). Paper given at the General Conference of the European Consortium for Political Research, Glasgow, 3-6 September 2014. Available at SSRN: https://ssrn.com/abstract=2605828 or http://dx.doi.org/10.2139/ssrn.2605828
  14. Yoon, A. and Copeland, A. (2019), "Understanding social impact of data on local communities", Aslib Journal of Information Management, Vol. 71 No. 4, pp. 558-567. https://doi.org/10.1108/AJIM-12-2018-0310
  15. van Schalkwyk, François; Chattapadhyay, Sumandro; Caňares, Michael; Andrason, Alexander, Open Data Intermediaries in Developing Countries, 2015, https://idl-bnc-idrc.dspacedirect.org/handle/10625/56288
  16. Rivera Perez, Jacob Arturo, Emilsson, Cecilia, & Ubaldi, Barbara, OECD Open, Useful and Re-usable data (OURdata) Index: 2019, OECD Policy Papers on Public Governance No. 1, March 2020, https://www.oecd.org/gov/digital-government/policy-paper-ourdata-index-2019.htm
  17. ODIN 2022/23 Methodology Guide, https://docs.google.com/document/d/1q1h0_z0TUGayO-qN9o3ablmo_qVdSGgPgU_Ptq5xrdU/edit
  18. Open Data in Europe 2022, https://data.europa.eu/en/publications/open-data-maturity/2022
  19. Huyer, Esther and Blank, Marit, High-value datasets, 2020, https://op.europa.eu/es/publication-detail/-/publication/5b20f52a-db7e-11ea-adf7-01aa75ed71a1/language-en
  20. Good basic data for everyone - A driver for growth and efficiency (2012)
  21. Expert Committee Report on Non-Personal Data Governance Framework, 2020, https://prsindia.org/policy/report-summaries/non-personal-data-governance-framework
  22. Zuiderwijk, Anneke, Pirannejad, Ali, and Susha, Iryna, Comparing open data benchmarks: Which metrics and methodologies determine countries’ positions in the ranking lists?,
  23. Ruijer, E., Grimmelikhuijsen, S., van den Berg, J., & Meijer, A. (2020). Open data work: understanding open data usage from a practice lens. International Review of Administrative Sciences, 86(1), 3–19. https://doi.org/10.1177/0020852317753068
  24. Lämmerhirt, Micheli and Schade (forthcoming), https://www.cambridge.org/core/journals/data-and-policy/special-collections/practices-of-data-driven-innovation-in-the-european-public-sector
  25. Burkhardt, Marcus et al., Interrogating Datafication: Towards a Praxeology of Data (2022), https://www.transcript-publishing.com/978-3-8376-5561-2/interrogating-datafication/
  26. Van Maanen, G. (2023). Studying open government data: Acknowledging practices and politics. Data & Policy, 5, E3. doi:10.1017/dap.2022.40
  27. Christin, Angèle, What Data Can Do: A Typology of Mechanisms, International Journal of Communication (2020), https://ijoc.org/index.php/ijoc/article/view/12220
  28. Leonelli and Tempini, Data Journeys in the Sciences,
  29. Fiske, A., Degelsegger-Márquez, A., Marsteurer, B. et al. Value-creation in the health data domain: a typology of what health data help us do. BioSocieties (2022). https://doi.org/10.1057/s41292-022-00276-6
  30. Groves, Lara, Algorithmic impact assessment: a case study in healthcare (2022), https://www.adalovelaceinstitute.org/report/algorithmic-impact-assessment-case-study-healthcare/
  31. https://www.gov.uk/government/publications/what-do-we-mean-by-public-benefit-evaluating-public-benefit-when-health-and-adult-social-care-data-is-used-for-purposes-beyond-individual-care
  32. Fiske, A., Degelsegger-Márquez, A., Marsteurer, B., and Prainsack, B. Value-creation in the health data domain: a typology of what health data help us do. BioSocieties. 2022. doi: 10.1057/s41292-022-00276-6. PMID: 35432575; PMCID: PMC9002030.
  33. Watkins, Elizabeth, Moss, Emanuel, Metcalf, Jacob, Singh, Ranjit, and Elish, Madeleine Clare, Governing Algorithmic Systems with Impact Assessments: Six Observations (2021). AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021. Available at SSRN: https://ssrn.com/abstract=3846300
  34. Solon, Olivia, Facial recognition's 'dirty little secret': Millions of online photos scraped without consent, NBC News (2019), https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921