Digging into Mandatory Disclosure Data: Highlights from ONE’s DataDive with DataKind UK

Kate Vang, Data Scientist, The ONE Campaign
Joseph Kraus, Director, Transparency & Accountability, The ONE Campaign

Published 18 December 2017
Updated 11 January 2018

Over the weekend of November 25-26, ONE had the unique privilege of partnering with dozens of data scientists at DataKind UK’s Autumn DataDive. After years of advocating with our partners for extractive companies to publish information about their payments to governments, we finally had a large set of data on these financial flows at a granular, project-by-project level. This data is important because it helps citizens demand that their country’s natural resource wealth go towards things like education, health, infrastructure and poverty eradication.

Our challenge for the weekend was to dive into this new mandatory disclosure data, alongside voluntary payment data from the Extractive Industries Transparency Initiative (EITI), and answer a set of complex questions with the help of our expert volunteer data wranglers. This wouldn’t have been possible without the heavy lifting of the data team from the Natural Resource Governance Institute (NRGI), who scraped thousands of pages of PDF documents to build a tidy data set of mandatory disclosure payment information. Part of our objective for the weekend was to put this groundbreaking data to use, and to explore ways of making it actionable for researchers and advocates.

After 19 hours of intensive work, presentations, camaraderie and pizza-eating, here’s a look at what we learned.

The Data

The data set we analysed contained details on $292bn of payments made by 499 companies, related to projects located in 135 different countries. Most of the payments were made in 2015 and 2016, with a smaller number dating from 2014 and 2017. $44bn (15%) of the payments reported were made to governments in Africa, with the majority going to Angola ($17bn) and Nigeria ($15bn). The payments relate to approximately 3,400 different extractives projects around the world, although companies vary in how they define a “project”. This was just a snapshot, however: the data is regularly updated by NRGI as companies make more disclosures and as new PDFs are ingested.

Debunking Industry Claims with Data

As we push for this data to be made available by all companies around the world, we often run into resistance and push-back from the oil, gas and mining industries. What we found in the data sheds new light on long-running debates.

One industry group, the American Petroleum Institute (API), has staunchly opposed efforts to require this payment information to be published in the US, in part because it claims that doing so would be too burdensome. But our volunteers found that several API member companies are already publishing this information in other jurisdictions. In fact, we found that API members – such as Shell, Chevron, BP, and others – have already disclosed at least $145 billion of payments to governments, many through subsidiaries. This represents nearly half of the total payments reported so far. That undermines the API’s assertions that publishing these reports under US law would be burdensome, since many members are already doing so anyway, as required by EU and Canadian laws.

Our analysis of the data also fully debunks another evidence-free claim advanced by the API, namely, that four countries (Angola, Cameroon, China, and Qatar) would prohibit them from disclosing payments and punish them if they did. Guess what? Five of the largest API members we identified in the data have collectively reported payments of nearly $20 billion to those countries, without experiencing any negative effects. The data undermines their claims that publishing this information was prohibited or would cause them harm.

Some opponents of this data also claim that publishing the information would put them at a competitive disadvantage to state-owned competitors, on the assumption that state-owned companies would not need to report their payments to their own or other governments. However, the data shows that this is simply not the case: state-owned companies account for two in five of the total payments reported to date. These include several large state-owned companies from countries like Russia (e.g. Gazprom and Rosneft) and China (e.g. CNOOC) that are hardly models of transparency, as well as Norway (Statoil) (see Figure 1).

Figure 1: Many of the companies with the largest reported payments are state-owned.

Visualising Payment Flows

A team of volunteers also set out to visualise the data in ways that would make it more actionable for activists and journalists. In doing this, we found that the mix of payment types varied widely across companies and recipient governments, and that the mix of payments made to African governments differed from the mix made to non-African governments.

One team focused on payments to Africa and visualised the payments from the top 20 companies in an interactive Sankey diagram (Figure 2). Doing so revealed that production entitlements were the largest payment type made by these companies to African governments, and that the majority of payments unsurprisingly went to Angola and Nigeria, the continent’s two largest oil producing countries.

Figure 2: A Sankey diagram showing payments made to African governments from the 20 largest paying companies.
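
For readers who want to try something similar, most Sankey tools take a list of (source, target, value) links as input. The snippet below is a minimal sketch of how payment records could be rolled up into such links. The field names, company names and amounts are all made up for illustration, and the company-to-payment-type-to-country layout is an assumption rather than a description of how the volunteers built Figure 2.

```python
from collections import defaultdict

# Made-up records for illustration only; the field names are hypothetical
# and do not reflect the actual columns of the mandatory disclosure data.
payments = [
    {"company": "Company A", "payment_type": "Production entitlements",
     "country": "Angola", "value_usd": 120.0},
    {"company": "Company A", "payment_type": "Taxes",
     "country": "Nigeria", "value_usd": 30.0},
    {"company": "Company B", "payment_type": "Production entitlements",
     "country": "Nigeria", "value_usd": 80.0},
    {"company": "Company B", "payment_type": "Production entitlements",
     "country": "Angola", "value_usd": 20.0},
]


def sankey_links(records):
    """Sum payment values into company -> type and type -> country links."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["company"], r["payment_type"])] += r["value_usd"]
        totals[(r["payment_type"], r["country"])] += r["value_usd"]
    # Sort by (source, target) so the output order is stable.
    return [(s, t, v) for (s, t), v in sorted(totals.items())]


links = sankey_links(payments)
for source, target, value in links:
    print(f"{source} -> {target}: {value:,.0f}")
```

The resulting list can be handed to whichever charting library is preferred; only the aggregation step is shown here.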

When we plotted the same companies’ payments to governments outside of Africa, the picture looked different: taxes represented a larger share of the payment mix (see Figure 3). This reveals an interesting issue that merits further exploration. Production entitlements often flow to state-owned entities in the form of in-kind payments (e.g. barrels of oil). While this can be a legitimate arrangement, state-owned entities can be notoriously opaque, particularly in Africa, where several such companies have come under scrutiny in recent years for misplacing or mismanaging billions in revenues. The revelation that these types of payments are more extensively used in countries like Angola and Nigeria, where state-owned oil companies are particularly secretive and scandal-prone, highlights the importance of more closely examining these types of payments to ensure that they are handled appropriately.

Figure 3: A Sankey diagram that shows the same companies as the previous, but now reflecting the payments they made to governments outside of Africa.

Maps also featured at the DataDive as the data experts attempted to link project-level data to individual concessions using OpenOil’s Concession Map. In time, this could be a great way for activists to explore this new payment data. However, more work will be needed to clean the project names before we can reliably link them to individual concessions.

The volunteers also tried using machine learning techniques such as clustering to identify patterns in the types of payments that companies make to governments. Clear patterns emerged (see Figure 4), so we think that this approach could eventually become a tool to help researchers to spot “red flags” in the data.

Figure 4: An example of clustering analysis of the payments data.
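
The post does not say which clustering algorithm the volunteers used, so the following is only a sketch of the general idea: describe each company by its payment mix (the share of its total paid as each payment type) and group companies with similar mixes. It uses a plain k-means written with the standard library. The payment types, company names and amounts are hypothetical.

```python
import math

PAYMENT_TYPES = ["taxes", "royalties", "production_entitlements"]

# Hypothetical totals per company (made-up numbers, USD millions).
company_totals = {
    "Company A": {"taxes": 80, "royalties": 15, "production_entitlements": 5},
    "Company B": {"taxes": 10, "royalties": 10, "production_entitlements": 80},
    "Company C": {"taxes": 70, "royalties": 25, "production_entitlements": 5},
    "Company D": {"taxes": 5, "royalties": 15, "production_entitlements": 80},
}


def payment_mix(totals):
    """Convert absolute amounts into shares that sum to 1."""
    total = sum(totals.get(t, 0) for t in PAYMENT_TYPES)
    return [totals.get(t, 0) / total for t in PAYMENT_TYPES]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = list(company_totals)
mixes = [payment_mix(company_totals[n]) for n in names]
clusters = dict(zip(names, kmeans(mixes, k=2)))
print(clusters)
```

In this toy example the tax-heavy companies end up in one cluster and the entitlement-heavy companies in the other. With real data, a company whose mix sits far from every cluster would be a natural candidate for a closer look.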

In-Kind Payments

Another team of volunteers worked with Alex Malden of NRGI to explore in-kind payments. This partially included the production entitlements described earlier, but also meant analysing the free-text notes and annotations that companies use to describe the payments data.

For example, ENI’s 2016 Report on Payments to Governments shows taxes and royalties paid to Libya’s National Oil Corporation (see Figure 5). Footnotes on these payments explain that at least part of these payments were made as direct transfers of oil instead of cash.

Figure 5: Excerpt from ENI’s 2016 Report on Payments to Governments (PDF) showing notes about in-kind payments.

Over the weekend, volunteers developed a provisional methodology to flag line items in the data that refer to in-kind payments. Using this methodology, they estimated that roughly 27% of all payments reported in the mandatory disclosures are made in-kind. This equates to roughly $80 billion of value, a huge number that highlights the urgent need for more transparency on the volumes and transfer pricing of non-cash payments. We also found that the share of in-kind payments varied significantly from one receiving government to the next. Refining this methodology further will allow investigators to target their investigations at the areas most susceptible to corruption.
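
The volunteers’ actual rules are not described in this post. As a rough illustration of what flagging line items from free-text notes can look like, here is a minimal sketch that searches each note for a few keywords and then computes the flagged share by value. The keyword list, field names and sample rows are all assumptions made up for the example.

```python
import re

# Hypothetical keyword list; a real methodology would need a broader,
# carefully reviewed set of terms and manual checks of the matches.
IN_KIND_PATTERN = re.compile(r"\bin[- ]kind\b|\bbarrels?\b|\bboe\b", re.IGNORECASE)


def is_in_kind(note):
    """Return True if the free-text note mentions an in-kind payment."""
    return bool(IN_KIND_PATTERN.search(note or ""))


# Made-up line items for illustration only.
rows = [
    {"value_usd": 100.0, "notes": "Royalties paid in kind: 2,000 barrels of oil"},
    {"value_usd": 300.0, "notes": "Corporate income tax, paid in cash"},
    {"value_usd": 50.0, "notes": "In-kind payment valued at market price"},
    {"value_usd": 50.0, "notes": None},
]

flagged = [r for r in rows if is_in_kind(r["notes"])]
in_kind_value = sum(r["value_usd"] for r in flagged)
total_value = sum(r["value_usd"] for r in rows)
in_kind_share = in_kind_value / total_value
print(f"{len(flagged)} of {len(rows)} rows flagged, {in_kind_share:.0%} by value")
```

Keyword matching of this kind will produce both false positives and misses, which is one reason the methodology above is described as provisional.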

Linking the Data

While this new data tells us a great deal, we think its real potential will be realised when it is combined with other available data, such as data from the Extractive Industries Transparency Initiative (EITI), commodity data, budget data, financial statements, corporate ownership data, contracts, and more. So part of our exploratory work at the DataDive involved trying to build methodologies to link the information from the mandatory disclosures to these other data sources. This proved to be difficult, but in the process we learned a lot about the specific challenges we face and the next steps to overcome them as a community.

One team focused on the EITI data, with the aim of linking individual companies between it and the mandatory disclosures data. Why did we home in on this data? In short, we think that finding a way to combine the two could result in a more comprehensive picture of extractives payments. EITI member countries submit annual reports that detail the payments their governments receive from extractive companies. In essence, the information provided through this process is similar to what companies are supposed to report in the mandatory disclosures, particularly going forward, since EITI countries will soon begin reporting project-level data. But since neither EITI nor the mandatory disclosures are yet implemented globally, the two data sets each reflect a different, overlapping patchwork of countries and companies. Linking them together would enable us to compare two different accounts of the same underlying system.

Using text matching tools on company names, the team was able to find 35 companies from the mandatory disclosures in the EITI data. While this number was small, the matched companies accounted for over 40% of the total financial flows in the mandatory disclosures. These overlaps can now be analysed in further detail to check for validity and consistency.
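
The post does not detail the text matching tools the team used. One common first step, sketched below under that caveat, is to normalise company names (lower-casing, removing punctuation and dropping legal-form suffixes) before looking for exact matches between the two lists. The suffix list and all company names here are made up for illustration.

```python
import re

# Hypothetical list of legal-form suffixes; a real list would be much longer.
LEGAL_SUFFIXES = {
    "plc", "ltd", "limited", "inc", "corp", "corporation",
    "sa", "spa", "asa", "nv", "ag", "llc",
}


def normalise(name):
    """Lower-case, strip punctuation and drop trailing legal-form words."""
    text = name.lower().replace(".", "")      # "S.p.A." -> "spa"
    text = re.sub(r"[^a-z0-9 ]", " ", text)   # other punctuation -> space
    tokens = text.split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Made-up names standing in for the two data sets.
mandatory_names = ["Example Oil PLC", "Sample Mining Ltd.", "Other Energy Inc"]
eiti_names = ["EXAMPLE OIL plc", "Sample Mining Limited", "Unrelated Gas S.A."]

# Index one list by normalised name, then look up the other list.
eiti_index = {normalise(n): n for n in eiti_names}
matches = {
    n: eiti_index[normalise(n)]
    for n in mandatory_names
    if normalise(n) in eiti_index
}
print(matches)
```

Note that this sketch discards non-ASCII characters, so names with accents would need extra handling in practice.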

But we also saw very clearly what we were missing: a tidy dataset of company ownership information. The entities reporting the mandatory disclosures were predominantly large parent companies, while the entities reported in the EITI data were usually smaller, local operating subsidiaries. We used text analysis of the company names to link some of these together, but we know this method left a lot of connections undiscovered. The next step would be to locate information from the larger parent companies about their subsidiaries and build a comprehensive ownership dataset, which could then be used to decisively connect the two data sets. We look forward to continuing this work with OpenOwnership and the wider community of partners.

Linking companies solely through text matching proved to be a messy and time-consuming process. So a team also worked on connecting the EITI data to OpenCorporates, which maintains a vast data set of corporate entity information organised with unique corporate IDs. At first we were able to make exact matches on 230 names, which represented c.23% of the total flows reported in the EITI data. After a lot of cleaning and fuzzy matching, the volunteers found matches for 600 names, which correspond to c.28% of the financial flows. We would love to share our code and lessons learned from this work with others who are keen to help take it forward. This work also highlighted the value of incorporating Legal Entity Identifiers in all company reporting: our research would have been much easier had we been able to identify and link unique corporate entities directly.
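
To give a flavour of what fuzzy matching involves, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library as a stand-in. It is not the volunteers’ code and does not call OpenCorporates; the company names and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate name,
    or None if nothing scores at or above the threshold."""
    scored = [
        (SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    if not scored:
        return None
    score, candidate = max(scored)
    return (candidate, score) if score >= threshold else None


# Made-up names standing in for a register of corporate entities.
register = ["Example Petroleum (Nigeria)", "Sample Mining"]

print(best_match("Example Petroleum Nigeria", register))
print(best_match("Totally Different Co", register))
```

The threshold is a trade-off: set it lower and more names are matched but more of the matches are wrong, so any fuzzy matches on real data would still need manual review.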

What’s Next

As a community, we are still at the beginning of a journey to maximise the potential of data about governments’ natural resource revenues. But the DataDive was an energetic, whistle-stop tour of a groundbreaking data set: we left with a deeper understanding of what the data means, and inspired by the new questions and possibilities that volunteers unlocked. In the coming weeks we will publish further detailed documentation of the work done at the DataDive, along with links to code. Please contact Kate Vang or Joseph Kraus with questions, contributions, or to discuss anything in more detail.

All of us at ONE give huge thanks to DataKind UK, to NRGI and to all of the DataDive volunteers. This project would not have been possible without the incredible volunteer Data Ambassadors: Victoria Bauer, Stephen Gaw and Nick Jewell. And a special thanks to the Elsevier Foundation and University College London for sponsoring the event.

This post was revised on January 11th, 2018 with updated figures.