has a nice quote from this 2005 paper on the histories of computing:
But in the end, computation is about rewriting strings of symbols.
The transformations themselves are strictly syntactical, or structural. They may have a semantics in the sense that certain symbols or sequences of symbols are transformed in certain ways, but even that semantics is syntactically defined. Any meaning the symbols may have is acquired and expressed at the interface between a computation and the world in which it is embedded. The symbols and their combinations express representations of the world, which have meaning to us, not to the computer. It is a matter of representations in and representations out. What characterises the representations is that they are operative. We can manipulate them, and they in turn can trigger actions in the world. What we can make computers do depends on how we can represent in the symbols of computation portions of the world of interest to us and how we can translate the resulting transformed representation into desired actions. We represent in a variety of forms a Boeing 777 – its shape, structure, flight dynamics, controls. Our representations not only direct the design of the aircraft and the machining and assembly of its components, but they then interactively direct the control surfaces of the aircraft in flight. That is what I mean by ‘operative representation’.
Now what has this to do with the scientific literature? When we think about what machine learning does, it uses statistical models to find patterns and correlations across high-dimensional data sets. When we see ML systems operating as if they had some intrinsic understanding of a domain, we are being fooled; what we are actually seeing is that the domain has predictable characteristics that were previously hidden from us.
A mental model in which we imagine that we can take the research literature as the symbolic input to some computation may well not be sufficient to make progress, because the literature is deeply flawed in terms of its completeness. In some sense we get around this by having people act as the computing units who operate on the literature, but we are now in an environment where the scale of the research corpus is beginning to be tractable to tooling. Given all of that, what is the correct structure for creating thin layers between the literature, the compute layer, and the operating layer, so that the gaps between our symbols and their meaning do not lead to gaps in our ability to operate better in the world? Beyond that, what are the incentivisation structures that we need to put in place to keep those layers working? I don't know.