Cosmin Marginean

February 19, 2023

Using BODS RDF to link Beneficial Ownership records with other datasets



BODS RDF allows compliance use cases, such as finding ultimate beneficial owners (UBOs) or ultimate parents, to be expressed more naturally in a graph language. Another motivation behind the proposal, however, was to create a language for linking this data to other sources, so that questions can be answered across multiple domains and datasets.

Linking beneficial ownership records with other sources is a use-case-specific endeavour, and very much dependent on the quality of the third-party datasets. In this exercise we're focusing less on the use case, and more on the data-linking capability itself, but the principles apply to many other problems.

For our purposes, we chose the Free Company Data Product from Companies House, which provides basic company information that should be easy to link with the Open Ownership register records.


Converting the Free Company Data Product to RDF

The CSV contains over 50 fields, but for our needs here we've selected a small subset of well-structured data points that can be used to build some interesting SPARQL queries:
  • URI - UK company identifier which dereferences in various formats, including RDF. This will be the subject in the triples we'll be processing.
  • Accounts.NextDueDate - The date when the next accounts filing is due for a company.
  • Mortgages.NumMortOutstanding - Number of outstanding mortgages for a company.
  • SICCode.SicText_[1..4] - The SIC activity classification for a company, in the form "68310 - Real estate agencies". We'll be extracting both the text and the numeric code for this field.
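To illustrate the last field, here is a minimal sketch of how the code and the text could be separated. It assumes every populated SicText value follows the "<digits> - <description>" shape shown above and that a CSV row is available as a dict keyed by column name; the function names are ours and are not taken from the KBODS code.

```python
import re

# Assumed shape of a SicText value: "<digits> - <description>".
SIC_PATTERN = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def parse_sic_text(value):
    """Split '68310 - Real estate agencies' into ('68310', 'Real estate agencies').

    Returns None for empty values or values that don't match the assumed shape.
    """
    match = SIC_PATTERN.match(value or "")
    if match is None:
        return None
    return match.group(1), match.group(2)


def sic_entries(row):
    """Collect the parsed entries from the four SicText columns of a CSV row (a dict)."""
    entries = []
    for i in range(1, 5):
        parsed = parse_sic_text(row.get(f"SICCode.SicText_{i}", ""))
        if parsed is not None:
            entries.append(parsed)
    return entries
```

The code is kept as a string here so that any leading zeros survive until the value is written out as a typed literal.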

With these fields we're going to produce a small set of RDF triples for each company. We've been quite liberal with the naming, but we tried, where possible, to use the Companies House Linked Data Services vocabulary.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ch:  <http://data.companieshouse.gov.uk/doc/company/> .
@prefix cht: <http://www.companieshouse.gov.uk/terms/> .

ch:11347643
    cht:SICCode             "68310"^^xsd:int ;
    cht:SicText             "68310 - Real estate agencies" ;
    cht:AccountsNextDueDate "2023-12-31"^^xsd:date ;
    cht:NumMortOutstanding  "0"^^xsd:int .
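As a rough sketch of the conversion, the function below renders one CSV row as a Turtle snippet like the one above. It rests on a few assumptions that should be checked against the real file: the row is a dict keyed by the column names listed earlier, dates arrive as DD/MM/YYYY, and only the first SIC column is used. The sample row in the usage note is made up from the values in this article.

```python
from datetime import datetime

PREFIXES = (
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix cht: <http://www.companieshouse.gov.uk/terms/> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def company_turtle(row):
    """Render one CSV row (a dict) as a block of Turtle triples."""
    sic_text = row["SICCode.SicText_1"].strip()
    sic_code = sic_text.split(" - ", 1)[0]
    # Assumption: the CSV uses DD/MM/YYYY, while xsd:date needs YYYY-MM-DD.
    due = datetime.strptime(row["Accounts.NextDueDate"], "%d/%m/%Y").date().isoformat()
    mortgages = int(row["Mortgages.NumMortOutstanding"])
    return (
        f'<{row["URI"]}>\n'
        f'    cht:SICCode             "{sic_code}"^^xsd:int ;\n'
        f'    cht:SicText             "{escape_literal(sic_text)}" ;\n'
        f'    cht:AccountsNextDueDate "{due}"^^xsd:date ;\n'
        f'    cht:NumMortOutstanding  "{mortgages}"^^xsd:int .\n'
    )
```

Calling company_turtle with a row such as {"URI": "http://data.companieshouse.gov.uk/doc/company/11347643", "SICCode.SicText_1": "68310 - Real estate agencies", "Accounts.NextDueDate": "31/12/2023", "Mortgages.NumMortOutstanding": "0"} and prepending PREFIXES gives a document equivalent to the example above, with the subject written as a full URI instead of a ch: prefixed name.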



Linking the Open Ownership register data with company URIs

The company URI above has a straightforward format, so it can easily be produced from a BODS JSON record using the entry in the "identifiers" section whose "scheme" is "GB-COH".

{
  "statementID": "openownership-register-11398213650416229415",
  "statementType": "entityStatement",
  "identifiers": [
    {
      "scheme": "GB-COH",
      "schemeName": "Companies House",
      "id": "11347643"
    }
  ]
  ...
}
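A minimal sketch of that lookup, assuming the statement has already been parsed into a Python dict; the function and constant names are ours, chosen for illustration.

```python
import json

# Base of the company URIs used in the RDF example earlier in this article.
COMPANY_URI_BASE = "http://data.companieshouse.gov.uk/doc/company/"


def company_uri(statement):
    """Return the Companies House URI for a BODS entity statement, or None.

    Only identifiers whose scheme is GB-COH are considered.
    """
    for identifier in statement.get("identifiers", []):
        if identifier.get("scheme") == "GB-COH" and identifier.get("id"):
            return COMPANY_URI_BASE + identifier["id"]
    return None


statement = json.loads("""
{
  "statementID": "openownership-register-11398213650416229415",
  "statementType": "entityStatement",
  "identifiers": [
    {"scheme": "GB-COH", "schemeName": "Companies House", "id": "11347643"}
  ]
}
""")

print(company_uri(statement))
# prints http://data.companieshouse.gov.uk/doc/company/11347643
```

Statements without a GB-COH identifier simply yield None, so they can be skipped when generating links.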

It's important to note that http://business.data.gov.uk/id/company/11347643 redirects to http://data.companieshouse.gov.uk/doc/company/11347643, so it's safe to treat the two as equivalent ("sameAs") URIs. To future-proof the design against either form turning up in other datasets, we can reference both URIs from the BODS statement.

bodsr:openownership-register-11398213650416229415
    owl:sameAs <http://data.companieshouse.gov.uk/doc/company/11347643>;
    owl:sameAs <http://business.data.gov.uk/id/company/11347643> .
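Producing these links is mostly string formatting. The sketch below assumes the statement ID and the company number have already been extracted, and that the owl: and bodsr: prefixes are declared elsewhere in the output file (the bodsr: namespace isn't spelled out in this article, so it's left undeclared here). The function name is ours.

```python
# The two URI forms discussed above, treated as equivalent.
COMPANY_URI_BASES = (
    "http://data.companieshouse.gov.uk/doc/company/",
    "http://business.data.gov.uk/id/company/",
)


def same_as_turtle(statement_id, company_number):
    """Render owl:sameAs links from a BODS statement to both company URI forms."""
    lines = [f"bodsr:{statement_id}"]
    last = len(COMPANY_URI_BASES) - 1
    for i, base in enumerate(COMPANY_URI_BASES):
        # Turtle separates predicate-object pairs with ';' and ends the subject with '.'.
        terminator = " ." if i == last else " ;"
        lines.append(f"    owl:sameAs <{base}{company_number}>{terminator}")
    return "\n".join(lines) + "\n"
```

Keeping the base URIs in one tuple means a further equivalent form, should one appear, only needs a single extra entry.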


Combining the datasets

For this proof of concept, we'd then need three datasets:
  1. The data from the Free Company Data Product as RDF, processed as described above.
  2. The BODS RDF version of the Open Ownership register.
  3. The owl:sameAs references for linking BODS statements to UK company URIs.

The first is available here (110MB), and the last two are available for download on the BODS RDF module page. You'll also find some code to produce the first dataset in the kbods-experimental module under the KBODS project.
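To see why the owl:sameAs references matter, here is a toy, in-memory version of the join between the first and third datasets, written in plain Python instead of SPARQL. The triples are simplified to tuples of prefixed names, and the second company and statement are hypothetical values added for contrast. A graph store does the same kind of matching, at scale and across all three datasets.

```python
OWL_SAME_AS = "owl:sameAs"
NUM_MORT_OUTSTANDING = "cht:NumMortOutstanding"

# Dataset 1 (simplified): company facts from the Free Company Data Product.
company_triples = [
    ("ch:11347643", NUM_MORT_OUTSTANDING, 0),
    ("ch:00000001", NUM_MORT_OUTSTANDING, 2),  # hypothetical company
]

# Dataset 3 (simplified): links from BODS statements to company URIs.
same_as_triples = [
    ("bodsr:openownership-register-11398213650416229415", OWL_SAME_AS, "ch:11347643"),
    ("bodsr:hypothetical-statement", OWL_SAME_AS, "ch:00000001"),  # hypothetical
]


def statements_with_outstanding_mortgages(same_as, companies):
    """Return BODS statement IDs whose linked company has outstanding mortgages."""
    mortgages = {
        subject: value
        for subject, predicate, value in companies
        if predicate == NUM_MORT_OUTSTANDING
    }
    return [
        statement
        for statement, predicate, company in same_as
        if predicate == OWL_SAME_AS and mortgages.get(company, 0) > 0
    ]


print(statements_with_outstanding_mortgages(same_as_triples, company_triples))
# prints ['bodsr:hypothetical-statement']
```

From the matching statements, the BODS RDF dataset can then be followed to the ownership records themselves.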


Queries

After importing these into a graph store, we can try the queries below, along with any others we can imagine within this domain.

Mortgages

Accounts

SIC Codes

These examples only scratch the surface of what's possible when linking beneficial ownership records with other sources, but I hope they stimulate our imagination to create more suitable solutions for some complex compliance and intelligence use cases.