BODS RDF lets a range of compliance use cases, such as finding ultimate beneficial owners (UBOs) and ultimate parents, be expressed more naturally using a graph language. However, another motivation behind the proposal was to create a language for linking this data to other sources, so that questions can be answered across multiple domains and datasets.
Linking beneficial ownership records with other sources is a use-case-specific endeavour, and very much dependent on the quality of the third-party datasets. In this exercise we're focusing less on the use case, and more on the data-linking capability itself, but the principles apply to many other problems.
For our purposes, we chose the Free Company Data Product from Companies House, which provides basic company information that should be easy to link with the Open Ownership register records.
Converting the Free Company Data Product to RDF
The Free Company Data Product is a CSV file with over 50 fields, but for our needs here, we've selected a small subset of data points which are well-structured and can be used to build some interesting SPARQL queries.
- URI - UK company identifier which dereferences in various formats, including RDF. This will be the subject in the triples we'll be processing.
- Accounts.NextDueDate - The date when the next accounts filing is due for a company.
- Mortgages.NumMortOutstanding - Number of outstanding mortgages for a company.
- SICCode.SicText_[1..4] - The SIC activity classification for a company, in the form "68310 - Real estate agencies". We'll be extracting both the text and the numeric code for this field.
With these fields we're going to produce a small set of RDF triples for each company. We've been quite liberal with the naming, but we tried, where possible, to use the Companies House Linked Data Services vocabulary.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ch: <http://data.companieshouse.gov.uk/doc/company/> .
@prefix cht: <http://www.companieshouse.gov.uk/terms/> .

ch:11347643
    cht:SICCode "68310"^^xsd:int ;
    cht:SicText "68310 - Real estate agencies" ;
    cht:AccountsNextDueDate "2023-12-31"^^xsd:date ;
    cht:NumMortOutstanding "0"^^xsd:int .
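The conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the kbods-experimental module: the function names are hypothetical, the CSV header names are taken from the field list above, and we assume the due date arrives as DD/MM/YYYY and needs converting to the ISO form that xsd:date expects.

```python
import csv
import io
from datetime import datetime

PREFIXES = (
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix cht: <http://www.companieshouse.gov.uk/terms/> .\n"
)


def turtle_string(value):
    """Quote a Python string as a Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(row):
    """Turn one CSV row (a dict) into a Turtle block for a single company."""
    pairs = []
    for n in range(1, 5):
        sic = (row.get(f"SICCode.SicText_{n}") or "").strip()
        # "68310 - Real estate agencies" -> "68310"; non-numeric values are skipped.
        code = sic.split(" - ", 1)[0]
        if code.isdigit():
            pairs.append(("cht:SICCode", f'"{code}"^^xsd:int'))
            pairs.append(("cht:SicText", turtle_string(sic)))
    due = (row.get("Accounts.NextDueDate") or "").strip()
    if due:
        # Assumption: the CSV stores dates as DD/MM/YYYY.
        iso = datetime.strptime(due, "%d/%m/%Y").date().isoformat()
        pairs.append(("cht:AccountsNextDueDate", f'"{iso}"^^xsd:date'))
    mortgages = (row.get("Mortgages.NumMortOutstanding") or "").strip()
    if mortgages.isdigit():
        pairs.append(("cht:NumMortOutstanding", f'"{int(mortgages)}"^^xsd:int'))
    if not pairs:
        return ""
    body = " ;\n".join(f"    {predicate} {obj}" for predicate, obj in pairs)
    # The URI column is used as the subject, written as a full IRI.
    return f"<{row['URI'].strip()}>\n{body} .\n"


# A one-row sample standing in for the real (much wider) CSV file.
sample = io.StringIO(
    "URI,Accounts.NextDueDate,Mortgages.NumMortOutstanding,"
    "SICCode.SicText_1,SICCode.SicText_2\n"
    "http://data.companieshouse.gov.uk/doc/company/11347643,"
    "31/12/2023,0,68310 - Real estate agencies,\n"
)
rows = list(csv.DictReader(sample))
turtle = PREFIXES + "\n" + row_to_turtle(rows[0])
print(turtle)
```

For the full file, the same function could be applied row by row while streaming the output to disk, so the whole CSV never has to be held in memory.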
Linking the Open Ownership register data with company URIs
The company URI above has a straightforward format, so it can be easily produced from a BODS JSON record using the data in the "identifiers" section for the "GB-COH" scheme.
{
  "statementID": "openownership-register-11398213650416229415",
  "statementType": "entityStatement",
  "identifiers": [
    {
      "scheme": "GB-COH",
      "schemeName": "Companies House",
      "id": "11347643"
    }
  ]
  ...
}
It's important to note that http://business.data.gov.uk/id/company/11347643 redirects to http://data.companieshouse.gov.uk/doc/company/11347643, so it's safe to assume these are themselves "sameAs" URIs. In order to future-proof the design for this potential variation, we can reference both URIs from the BODS statement.
bodsr:openownership-register-11398213650416229415
    owl:sameAs <http://data.companieshouse.gov.uk/doc/company/11347643> ;
    owl:sameAs <http://business.data.gov.uk/id/company/11347643> .
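As a rough sketch of how these links could be generated, the Python below reads a BODS entity statement and emits one owl:sameAs triple per URI form. The function name is hypothetical, and we assume the bodsr: and owl: prefixes are declared elsewhere in the output file.

```python
import json

# The two URI forms discussed above; one redirects to the other.
COMPANY_URI_BASES = (
    "http://data.companieshouse.gov.uk/doc/company/",
    "http://business.data.gov.uk/id/company/",
)


def same_as_triples(statement):
    """Return Turtle lines linking a BODS entity statement to UK company URIs."""
    if statement.get("statementType") != "entityStatement":
        return []
    lines = []
    for identifier in statement.get("identifiers", []):
        # Only Companies House identifiers map onto these URIs.
        if identifier.get("scheme") != "GB-COH" or not identifier.get("id"):
            continue
        subject = f"bodsr:{statement['statementID']}"
        for base in COMPANY_URI_BASES:
            lines.append(f"{subject} owl:sameAs <{base}{identifier['id']}> .")
    return lines


record = json.loads("""
{
  "statementID": "openownership-register-11398213650416229415",
  "statementType": "entityStatement",
  "identifiers": [
    {"scheme": "GB-COH", "schemeName": "Companies House", "id": "11347643"}
  ]
}
""")

triples = same_as_triples(record)
for line in triples:
    print(line)
```

Statements without a GB-COH identifier simply produce no links, which keeps non-UK entities out of the sameAs dataset.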
Combining the datasets
For this proof of concept, we'd then need three datasets:
- The data from the Free Company Data Product as RDF, processed as described above.
- The BODS RDF version of the Open Ownership register.
- The owl:sameAs references for linking BODS statements to UK company URIs.
The first is available here (110MB), and the last two are available for download on the BODS RDF module page. You'll also find some code to produce the first dataset in the kbods-experimental module under the KBODS project.
Queries
After importing these into a graph store, we can try the queries below, along with any others we can imagine within this domain.
Mortgages
- Outstanding mortgages for a UBO's companies. Lists the companies that an individual controls directly or indirectly, along with their outstanding mortgage count. It also returns the share percentages where applicable.
- Sibling companies with outstanding mortgages. Lists the companies with outstanding mortgages and which are controlled by the same parents who control the target.
Accounts
- Accounts due date for all the companies in a group. Lists the companies controlled by an entity (directly or indirectly) and the date when the next accounts filing is due for each of them.
- Companies in a group which have overdue accounts.
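To make the logic behind the overdue-accounts query concrete, here is a small in-memory sketch in Python. It is not the SPARQL query itself: the company names, the CONTROLS mapping and the dates are invented sample data, and the traversal stands in for what a graph store would do when following control relationships transitively.

```python
from datetime import date

# Hypothetical sample data: controller -> companies it controls directly.
CONTROLS = {
    "parent": ["co-a", "co-b"],
    "co-a": ["co-c"],
    "co-c": ["co-a"],  # a circular holding, to show the traversal terminates
}

# Hypothetical next accounts due dates, as loaded from the company data.
NEXT_DUE = {
    "co-a": date(2023, 12, 31),
    "co-b": date(2024, 6, 30),
    "co-c": date(2023, 9, 30),
}


def group_members(root, controls):
    """Collect every company controlled by root, directly or indirectly."""
    seen = set()
    stack = list(controls.get(root, []))
    while stack:
        company = stack.pop()
        if company in seen or company == root:
            continue
        seen.add(company)
        stack.extend(controls.get(company, []))
    return seen


def overdue_accounts(root, controls, next_due, today):
    """Return group companies whose next accounts due date is before today."""
    return sorted(
        company
        for company in group_members(root, controls)
        if company in next_due and next_due[company] < today
    )


overdue = overdue_accounts("parent", CONTROLS, NEXT_DUE, date(2024, 1, 15))
print(overdue)  # ['co-a', 'co-c']
```

Dropping the date filter and returning each member with its due date gives the first query in this list.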
SIC Codes
- SIC information for each company in a group.
- Companies by SIC code for the whole corporate group. A query that's symmetrical to the previous one, with companies grouped by SIC codes.
- Sibling companies with the same SIC code. Lists companies that are controlled by the same parents as the target and share at least one SIC code with it.
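The sibling query can be illustrated the same way, with plain Python over hand-written data instead of SPARQL. The names and mappings below are hypothetical, and we assume "sibling" means sharing at least one direct parent with the target.

```python
# Hypothetical sample data: company -> set of its direct parents.
CONTROLLED_BY = {
    "target": {"p1", "p2"},
    "sib-1": {"p1"},
    "sib-2": {"p2"},
    "other": {"p3"},
}

# Hypothetical SIC codes per company, as extracted from the SicText fields.
SIC_CODES = {
    "target": {"68310", "68209"},
    "sib-1": {"68310"},
    "sib-2": {"41100"},
    "other": {"68310"},
}


def siblings_with_shared_sic(target, controlled_by, sic_codes):
    """Map each sibling of target to the SIC codes it shares with target."""
    parents = controlled_by.get(target, set())
    target_codes = sic_codes.get(target, set())
    result = {}
    for company, company_parents in controlled_by.items():
        # A sibling is any other company with at least one parent in common.
        if company == target or not (parents & company_parents):
            continue
        shared = target_codes & sic_codes.get(company, set())
        if shared:
            result[company] = sorted(shared)
    return result


matches = siblings_with_shared_sic("target", CONTROLLED_BY, SIC_CODES)
print(matches)  # {'sib-1': ['68310']}
```

Here "sib-2" is a sibling but shares no SIC code, and "other" shares a code but has no parent in common, so only "sib-1" is returned.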
These examples only scratch the surface of what's possible when linking beneficial ownership records with other sources. I hope they stimulate our imagination to create better solutions for some complex compliance and intelligence use cases.