I was looking at some Google Analytics data on referrer traffic for BMJ. This is a sample of traffic from roughly one to two thousand referrer URLs.
There are a lot of near overlaps in this data, e.g. different country versions of Google Scholar.
I had Claude build me a Streamlit tool that let me set up a small local ML pipeline for a URL classifier, so that I can group these URLs by category. It took less than 30 minutes from idea to classification.
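To give a feel for the grouping step, here is a minimal sketch of how referrer URLs can be collapsed into categories. This is not the ML pipeline from the Streamlit tool. It is a simple rule-based stand-in that matches on the hostname, and the rules, category labels, and example hostnames are all assumptions made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical rules: (substring to look for in the hostname, category label).
# Illustrative assumptions only, not the categories used in the actual report.
# Order matters: the more specific Google Scholar rule comes before the
# general Google rule.
CATEGORY_RULES = [
    ("scholar.google.", "Google Scholar"),
    ("chatgpt.com", "AI assistant"),
    ("perplexity.ai", "AI assistant"),
    ("google.", "Google Search"),
]


def categorize(referrer_url: str) -> str:
    """Return the first matching category for a referrer URL, or 'Other'."""
    # hostname is None when the URL has no scheme, so fall back to "".
    host = (urlparse(referrer_url).hostname or "").lower()
    for needle, category in CATEGORY_RULES:
        if needle in host:
            return category
    return "Other"


def count_by_category(referrer_urls):
    """Count how many distinct referrer URLs fall into each category."""
    return Counter(categorize(url) for url in set(referrer_urls))
```

With rules like these, `https://scholar.google.co.uk/...` and `https://scholar.google.de/...` both land in the same "Google Scholar" bucket, which is the kind of near overlap that needs collapsing. A trained classifier earns its keep on the long tail of referrers that simple substring rules miss.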
Below is a relative report that shows what we are currently seeing.
FYI, the "Relative Engagement" score is really a measure of how many different URLs make up that category of URL. It's the inverse of the "URL" distribution score.
What we are seeing is the emergence of referrer traffic from sources like GPT, Perplexity, and others.
About Ian Mulvany
Hi, I'm Ian - I work on academic publishing systems. You can find out more about me at mulvany.net. I'm always interested in engaging with folk on these topics. If you have made your way here, don't hesitate to reach out if there is anything you want to share, discuss, or ask for help with!