
Renewing Open Data Action

What does the future hold for action on open data? Is advocacy for open data relevant anymore? Or should we abandon openness in favour of other frameworks that more directly address questions around data and power? The articles in this collection ultimately make the case that openness remains an important ideal: a value worth holding onto and working toward. They demonstrate that open data practitioners have amassed a depth of critical learning around putting openness into practice in complex settings. Yet, the need remains for a renewal of the political project at the heart of open data advocacy: creating a world in which both corporate and state data power can be effectively called to account, and in which the opportunities to use data to deliver insight and innovation are accessible to all, rather than increasingly enclosed.

The renewal called for is not some long-term distant project but is one that is urgent and timely. As we look around us, we perceive a moment when debates over data (often framed at this particular moment in terms of artificial intelligence) have once again taken centre stage in many countries and global fora. Yet the language of openness is often absent or co-opted in these debates. Today, all stakeholders should be aware of the imbalances of power that the digital transformation has brought to the world, bringing with it digital inequalities and a stagnation of practice. For many, open data was conceived as a tool to challenge imbalances of power, and though progress has been made in places, in too many others the situation today has become worse. There is then a pressing need for more strategic advocacy, coupled with global, regional, local, and thematic coordination to make sure the values and visions that brought together many open data advocates in the past are not lost as new data futures are charted.

Facing rapid shifts in political and public attitudes toward data in general, and new governance spaces and fora rapidly emerging to shape the future of our national and global digital infrastructures, the open data community cannot afford to remain dispersed and spread so thin. In many ways, the structure of the State of Open Data reflects the ‘sectoral perspective’ that open data advocacy took in the middle of the last decade. While this has enabled significant wins, it has also left advocacy for openness scattered. However, taken together, the essays in this revised edition of the State of Open Data can offer a renewed departure point, and knowledge base, to help seize the moment for advocacy and mobilisation.

There is, in our view, an urgent task to recalibrate openness and open data to truly serve as an accelerator of sustainable development. Drawing on common threads from across the articles in this updated State of Open Data collection, we put forward three shifts that we believe are needed to advance a renewed open data agenda and to help the advocates, activists, and architects of openness to engage with the required changes. These shifts are not entirely new directions. They reflect existing trends within open data movements and paths many people are already travelling. Our goal here is to put the spotlight on these shifts and encourage a renewed and strengthened focus as we head toward the middle of the decade.

Shift 1: From ad-hoc open data to open data as the backbone of digital public infrastructure

The early years of this century witnessed unprecedented advances in projects, policies, and initiatives to drive greater transparency, accountability, and citizen participation. Information and communication technologies, and their ability to rapidly and cheaply scale, combined with affordable access to hardware and connectivity, all coupled with energetic democratic and participatory politics, appeared to herald an era of sunlight in government affairs and innovation in the delivery of public services.

Direct communication between governments and citizens, mediated through technology and data, seemed able to support a better understanding of important public interest issues, such as tax expenditure, citizen security, and public health. A handful of private companies, many based in Silicon Valley, flourished in parallel and often assisted citizen-led initiatives to hold States more accountable. An increase of space for collaboration, activism, and even dissent, transcended borders. The template was set for new forms of civic engagement focused on solving local and global problems using accessible technologies and principles of open access and voluntary collaboration. Examples included activism around software freedoms, free culture, the open data movement, and the Wikipedia collaboration network.

Yet, in all the excitement about the potential that might be unlocked through open data and open technology, data and its power should have been analysed more thoroughly. In practice, its value was diffuse, its materiality abstract, and its governance an unresolved debate that never truly moved beyond discussions of privacy. The result has been almost two decades of predominantly ad-hoc open data initiatives - with data that could and should be central to public decision-making often maintained as a ‘public’ resource only through poorly resourced portals, volunteer supported projects, and weak informal agreements. As open data has transitioned from a novel and exciting idea to old news, the maintenance and development of public open data infrastructure has only eroded.

Meanwhile, governments at all levels have become more and more data-driven, with both public and private datasets intertwined with electoral politics, policy-making, and crisis responses, as evident in the pandemic and climate crisis management. All the time that open data has been making slow progress, those Silicon Valley-based firms that were early open advocates have doubled down on investment in vast private datasets and data infrastructure and on seeing these embedded as dependencies within government administrations. The rise of machine learning and generative AI in particular could see public decision-making reliant on proprietary and private data infrastructures in ways that were unimaginable when the first open data advocates were calling for the release of powerful state datasets.

It can, of course, be tempting for cash-strapped states to turn to private sources and to turn away from improving their own data quality and openness. Yet, this would be exactly the wrong response. As pointed out in chapters across this collection, it is sustained investment in open data infrastructures that will deliver public value.

When we consider transport infrastructure, it is not uncommon to find that even a small town might be investing more in its road network than the whole country invests in its public data infrastructure. Maintaining the technical standards, software platforms, data collection, quality control, and community management around key public data need not be prohibitively expensive, but too often, data stewards are faced with inadequate budgets or legacy funding models that prevent data being made open and accessible.

A systemic approach needs to incorporate data in the public sector cycles, integrate different data sources across institutions and sectors, and establish the basis for future, integrated intelligent systems. Investment in the technical data infrastructure also needs to be matched by strategic investment in skills and capabilities across the economy to maintain and use data more effectively within the legal and ethical frameworks required to support meaningful data governance.

When the global financial crisis hit in the early years of open data, there was a narrative shift from talking about open data as a tool of empowerment to an emphasis on open data as a means of outside-innovation and cost-savings for governments. But a decade of selling open data cheap has led to most missing out on the value that modest but sustained investment can bring, while at the same time, the private capture of data, and state reliance on proprietary services, has grown.

This infrastructural focus is complementary to, and compatible with, the ‘third wave of open data’ advocated by Verhulst et al., which supports purpose-directed release of specific datasets and the fostering of collaboration and partnership around re-use for public good. At the same time, it draws attention to the need to prioritise public infrastructures that are interoperable, scalable, adaptable to different types of data needs, and updatable to accommodate changes in those needs.

Led by India, which presided over the G20 in 2023, discussions of digital public infrastructure (DPI) have risen up the international agenda. Commitments to DPI were explicitly mentioned in the leaders' declaration, as well as in a high-level discussion facilitated by the UNDP and the ITU at the 78th UN General Assembly. Contributions to the G20 debate have gone as far as to describe open data as the backbone of digital public infrastructure. A renewed mainstreaming of open data initiatives at the state level within the framework of DPI might lead to updated commitments to sustainable data investment and to allocating more resources and skills around open data by governments in the delivery of public tasks.

Unfortunately, with the exception of the newly created Digital Public Goods Alliance, the participation of the open data community remains marginal to the evolving discussions around DPI, where the main actors are states, international organisations, and companies. And it is uncertain whether the commitment to openness will extend to other components of DPI that need to be invested in and deployed by governments.

As the chapters in this collection make evident, we must continue to argue that open data is critical to ensuring that digital services and systems are reliable and accountable, therefore, making it worthwhile to invest in its operations, functionality, and maintenance, as well as in the organisations and communities involved in contributing to or maintaining it and the regular users who access it.

The question, of course, is how do we deliver this shift? While it is within the purview of an individual data steward or organisation to purposefully publish their datasets, creating interoperable national and global data infrastructures is a much larger task. It seems that the Digital Public Infrastructure and the One Future Alliance fund announced by the G20 to support the deployment of DPI in low and middle-income countries could be an opportunity to achieve interoperability, scale, and sustainability of open data if the opportunity is seized effectively; however, the means to realise this potential, we contend, is found in the second shift required.

Shift 2: From supporting implementation to strategic advocacy and oversight

In the first edition of the State of Open Data, Christopher Wilson argued that civil society had been a driving force in the open data agenda, often playing a complex role as promoters, facilitators, consultants, and critics of open data initiatives. Although the range of organisations focussed on open data has thinned a little, this trend has continued with non-governmental organisations and individuals focussed on open data taking on an even broader range of roles, including capacity building and data governance, and working at multiple points on the data spectrum from shared to open.

If we are serious about delivering robust public open data infrastructures, open data expertise cannot be spread so thin. Instead, the lessons learned from more than a decade of open data practice need to be turned into renewed and refreshed policy advocacy, with civil society (and reforming public servants and firms) playing a more focused role in demanding that the data infrastructures upon which policy and services are based are publicly governed and open by default. In short, civil society action is needed to keep the ecosystem loyal to the mission of providing public value, both through how data is made available and through how communities of users are engaged and supported.

This advocacy needs to be paired with a primary focus on accountability. In 2024, we recognise that there are many places where this is critical: whether to respect indigenous data rights, to navigate legitimate privacy concerns, to mitigate potential harms from data misuse, or to prevent the exploitation of resources in ways that ultimately undermine the public good. After all, we should recognise that just because the bits and bytes of digital data have a low marginal cost does not mean that their use is not at times costly (think about the costs of training machine learning models, which can be prohibitive to many potential users), nor that it may not have costly externalities and effects.

The point of ‘open by default’, however, is that deviations from the default need to be justified and explained rather than the other way around. This calls for oversight to make sure that the default and deviations from it are not incorrectly or unfairly applied. As a number of the essays in this collection hint, it is easy for openness to drop down the agenda when governments are focused on their own internal data analytic needs or on getting access to new private sector data flows. Equally, significant private interests are at play seeking to create new enclosed data infrastructures. Without effective outside scrutiny, states may quickly find themselves locked in once again to proprietary data ecosystems where value is captured for private or political gain rather than shared for the public good.

As discussed in the previous section, the recent commitments to digital public infrastructures, and the funds allocated to them by the most powerful nations to assist others, represent a critical opportunity to implement this strategic oversight with the UN system as an ally. Other agencies have affirmed that open data is fundamental to achieving the sustainable development goals with an eye on incorporating open data as the backbone of their digital public infrastructure initiatives in the years to come. The private sector is also involved, providing technologies and other resources for the creation or reuse of digital public goods.

Being present in the rooms where decisions are made is vital. Where open data activists previously sought access to datasets themselves, we argue that they should now be seeking access to the places where data-related decisions are made, shining a light on the who and how of decisions about data infrastructure, and asking for a seat at the table to advocate for the public interest within systems of data governance. This is the greater leverage point for change. Accelerating action in this area will require funders to support not just data use projects but to invest in the long-term capacity-building and in-depth labour required for grassroots organisations to engage with ongoing working groups, standard-setting processes, or governance boards.

When we talk about oversight and accountability, our focus should not just be on governments. It is more important than ever to consider which parts of the data collected in the private sector should be considered part of public infrastructure, and how. Just like transport networks, which in practice are often provided through a mix of public investment and private enterprise, there can be conditions placed on private actors that catalyse a selective shift away from private-sector data being shared only as a philanthropic or voluntary gesture to data being released according to public mandate and public interest principles. The European Union project to create sectoral data spaces, which can facilitate the exchange or open sharing of data derived from both public and private organisations, offers one route toward this. It is also an example of the kinds of data infrastructure that will require civil society oversight if open data principles are to be applied in ways that maximise public value. There is long overdue work to be done that is focussed on the private sector and its commitment to open its data. Furthermore, our communities should actively advocate for the mandatory release of data in open formats when the data is urgently needed for, or strategic to, public interest situations, such as pandemics, natural disasters, or measures to mitigate the impact of climate change. Such sectoral shifts can only happen through legislation or strict policies if their impact is to be lasting.

Academia also has a critical role to play in safeguarding open and accountable data infrastructures and ecosystems: both as a part of civil society and by playing its own unique role. Although few articles in the State of Open Data collection touch directly on scientific data, it is vital that mandates for open access research data continue to develop and become embedded as a norm, and that academics champion the value of open data, including in collaborations with private data sources. Academia also has an important role in providing infrastructure, support, and archival capacity for community-generated data, acting as a custodian of critical non-governmental datasets that can support better policy-making and social action.

For some organisations, focussing on advocacy and oversight on questions of how datasets are created and shared will be a return to old practice. For others, particularly traditionally sector focussed organisations, it may be a new departure. The case for making this shift involves taking seriously the threats to openness and the idea that it would be wrong to drop ‘open by default’, even if we now recognise more cases where the final access arrangements around data need to be more nuanced. Reconnecting open data with wider movements for open knowledge and other forms of openness is an important part of allowing shared learning across contexts, and more unified campaigning, to defend and extend the principle that in the modern world, the value that is to be found in data, information, and knowledge should be shared with all.

In focussing on the advocacy and oversight roles of civil society, we do not mean to ignore the strategic role that startups, social enterprise, and open-source collaborations can play in shaping data ecosystems. Interventions to build tools, develop services, or create products that facilitate data sharing and enable more people and communities to access the value in specific datasets or kinds of data are vitally important. However, on their own, these actions rarely create the kinds of scalable open infrastructures we need to build future societies and economies on the basis of open knowledge, rather than on new enclosure practices and the corporate capture of our data and knowledge commons. We hope that where enterprises have found sustainable business models built on open data, they will re-invest some of the dividends into the campaigning and governance labour required to keep the commons healthy. We also hope that such initiatives, especially the local ones, can be leveraged to contribute to national commitments to build digital infrastructures.

Shift 3: From open to open and inclusive

As a movement fostered by borderless online communication and input from well-connected communities, open data has always been more or less a global movement, with advocacy and experiments in North America and Europe joined by educated and internationally connected leadership from Latin America, Africa, and Asia. Development sector interest and funding helped to bring missing regions on board. However, for all the global diversity of the open data movement, open data discourse was once predominantly driven by technically or legally trained elites, focussing on securing access to data free from technical or legal restrictions in order for the data to flow. Far less attention was paid to openness as space-making - opening the door for anyone to come in. The initial goal was to reduce legal and technical barriers, so everyone could participate and create their digital ventures and public interest projects without the cost and complexities of the technologies and legal structures present at the time. As the internet of creation faded away, it became urgent to open the door not only to creators but to the people impacted by the technologies being deployed, the data being collected, and the promises of digitisation unfulfilled. Now we need to invite diverse communities into co-creation of our data future and into the oversight and advocacy described above.

As Mor Rubenstein argues in the gender equity article for this collection, “If people are missing from the creation of data infrastructure tools, the data will tend to be more biased… and reinforce existing power dynamics”. We need to always think about who is at the table, regardless of whether that table is in the boardroom where data collection decisions are made or at a hack-day where new re-uses of data are imagined and developed. Mor draws particular attention to the need to foster more inclusive pathways into the roles that end up defining data, noting that: “not many women have the privilege to study how standards or data governance frameworks work.”

Accelerating the shift to a more inclusive open data community is essential if the two shifts above are to help deliver equitable sustainable development. Investments in data infrastructure need to be designed with multiple perspectives considered, and marginalised groups have a critical role to play in the oversight and scrutiny of how data is collected, managed, and used. Where past open data advocacy was often framed in terms of property rights (or, more accurately, in terms used to challenge exclusive property rights), calling for data to be treated more as a commons, we need to shift to a focus on substantive human rights, exploring how access to, and the use of, open data can help support individual and collective self-determination and the delivery of public goods.

In an era when the strengths, weaknesses, and biases of AI systems are conditioned by the data that is available for them to train on, we need to actively consider how to create a more inclusive information ecosystem. Rather than focus on ‘low-hanging fruit’ and easy-to-publish datasets and fields, we should pay attention to data that represents a variety of human experience and perspectives and to the data publication and use that has the best chance of distributing value more widely and equitably.

We outlined in the introduction that, in many ways, the open data field has already been on this path, moving from a focus on datasets to a focus on ecosystems and stakeholder engagement. Our call is for continued and accelerated action in this direction, emphasising participatory approaches that can enable wider groups to influence the direction of data infrastructure without necessarily requiring in-depth technical knowledge or extensive resources. A transformative approach to open data cannot be about a transfer of power from existing data holders to just another group of empowered individuals. Instead, it needs to involve the ongoing distribution and sharing of that power through all stages of the open data lifecycle.

Concluding remarks (bringing together the three shifts)

Without the open data and open knowledge movements, our current data and technology landscape might look very different. Innovations and data-analytics capabilities now embedded within government service delivery would have been slower to emerge and would have had a considerably more corporate character. AI developers may have faced bigger challenges in securing training data for their models, and progress toward development goals would be even more challenging to monitor.

Some might argue that open data movements have run their course. New narratives are needed to shape the future of data. While we don’t disagree with the need for a new agenda, we don’t believe that we are done with the idea of open data yet. Sharing out the value of data as widely as possible, and critically building shared, open, and generative data infrastructures, remains as important as ever. Open data is not just a call to publish bits and bytes: it is an invitation to see, debate, and shape the data that shapes us. And it is also the basis, the backbone, of a robust digital infrastructure capable of providing public goods equitably and at scale.

The next five years will be key not only to achieving the SDGs but also to meeting the Paris Agreement goals on climate change. There is broad consensus that data will play a critical role in both accelerating and scaling solutions with lasting impact, from climate crisis monitors to entire educational platforms. We need to make sure we have an open by design approach embedded in shaping these emerging digital infrastructures of the future.

Only by embedding openness into our digital infrastructures will it be possible to achieve impact over the long-term - maintaining spaces for debate, innovation, and continual development. But for that to happen, it is necessary to have active and united advocacy: for practitioners who have been solving specific problems with an open approach to renew a much more collaborative approach and inform the new international digital cooperation models that are emerging. Ultimately it is up to the open data communities represented across this collection of essays, united in their diversity, to seize this political moment and bring about a transformative and more equitable future.

About the authors

  • Renata Avila is an international human rights and technology lawyer and openness advocate. She is associated with the Center for Internet and Society at CNRS, France and a Network affiliate of the Stanford Institute of Human-Centered Artificial Intelligence. Currently, she is the Open Knowledge Foundation CEO and participates on several organisations' boards, including Open Future, Common Action Forum and the Just Net Coalition. She co-founded the <A+> Alliance for Inclusive Algorithms and the Progressive International. She regularly contributes to different publications in English and Spanish.

  • Tim Davies is a researcher, practitioner and facilitator focussing on participatory governance of, and with, data and technology. Tim was co-editor of the first edition of The State of Open Data, and has worked extensively on the development of data standards for transparency, accountability and collaboration. He is currently Director of Research and Practice for Connected by Data, the campaign for communities to have a powerful voice in data governance.