Interesting moves by Wiley to integrate with an AI future

Two really interesting announcements from Wiley this week:

Integration with Perplexity - https://www.perplexity.ai/hub/blog/perplexity-partners-with-wiley-to-power-educational-ai-search

"Students can access assigned Wiley curriculum materials through their institution's Enterprise Pro subscription, eliminating the need to switch between platforms. When studying, they can ask questions about textbook content and receive explanations and get comparative real-time examples from the live web."

This is technically impressive because it integrates entitlements information with, presumably, some level of RAG-style hosted indexing across the entire Wiley corpus. This is a few steps away from being able to attribute LLM responses to specific corpora. Entitlements are not technically hard, but there is often a lot of accumulated technical debt in these systems. I wonder whether the Atypon knowledge graph played any role in making this integration easier? If any Wiley folk want to reach out, I'd love to find out more.
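To make the entitlements point concrete, here is a toy sketch of what entitlement-aware retrieval could look like. I have no knowledge of how Wiley or Perplexity actually implement this: the `Chunk` type, the `title_id` field, the word-overlap scoring and the sample data are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrievable passage. Every field here is hypothetical."""
    title_id: str  # identifier of the textbook the passage comes from
    text: str


def score(query: str, chunk: Chunk) -> int:
    """Toy relevance score: number of query words that appear in the chunk."""
    words = set(query.lower().split())
    return len(words & set(chunk.text.lower().split()))


def retrieve(query: str, index: list[Chunk], entitled: set[str], k: int = 3) -> list[Chunk]:
    """Return up to k matching chunks, drawn only from entitled titles."""
    # Entitlement filter comes first, so unlicensed content never reaches the model.
    allowed = [c for c in index if c.title_id in entitled]
    ranked = sorted(allowed, key=lambda c: score(query, c), reverse=True)
    return [c for c in ranked[:k] if score(query, c) > 0]


# Made-up sample index and entitlement set (assumptions, not real Wiley data).
index = [
    Chunk("chem-101", "A buffer solution resists changes in pH"),
    Chunk("bio-201", "Enzymes lower activation energy and are sensitive to pH"),
]
hits = retrieve("what is pH", index, entitled={"chem-101"})
```

Both passages mention pH, but only the one from the entitled title comes back. A real system would use a proper search index rather than word overlap, but the filter-then-rank shape is the part I find interesting.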

It also suggests that any internal concerns within Wiley about these tools fabricating information in use have been overcome.

On the choice of LLM provider, it's somewhat notable that this is coming from Perplexity. I never use Perplexity, and it's a hard market to operate in, with such heavily capitalised organisations. That capitalisation makes it really hard to know what the real cost of customer acquisition is, as companies can pay way over the odds to drive growth, and it also inflates the narrative around value creation. So I wonder whether Perplexity are trying to grow market share by targeting knowledge markets with strong incumbents, in order to piggyback on existing content moats?

The second announcement: https://newsroom.wiley.com/press-releases/press-release-details/2025/Wiley-Announces-Collaboration-With-Amazon-Web-Services-AWS-to-Integrate-Scientific-Content-Into-Life-Sciences-AI-Agents/default.aspx

AWS are releasing a toolkit, and the code is open source. Here is the prompt for the agent that searches PubMed:

You are a medical research assistant AI specialized in summarizing internal and external evidence related to cancer biomarkers.
Your primary task is to interpret user queries, gather internal and external evidence, and provide relevant medical insights based on the results.
Use only the appropriate tools as required by the specific question. Follow these instructions carefully:
1. When querying PubMed: a. Summarize the findings of each relevant study with citations to the specific pubmed web link of the study b. The json output will include 'Link', 'Title', 'Summary'. c. Always return the Title and Link (for example, 'https://pubmed.ncbi.nlm.nih.gov/') of each study in your response.
2. For internal evidence, make use of the knowledge base to retrieve relevant information. Always provide citations to specific content chunks.
3. When providing your response: a. Start with a brief summary of your understanding of the user's query. b. Explain the steps you're taking to address the query. Ask for clarifications from the user if required. c. Separate the responses generated from internal evidence (knowledge base) and external evidence (PubMed api). d. Conclude with a concise summary of the findings and their potential implications for medical research.

Here is the prompt for the Wiley agent:

"
2.1. Instructions

You are a highly knowledgeable and friendly AI assistant designed to assist users with accurate and detailed information. You have access to a function that based on your search query, retrieves data from scientific articles in the Wiley knowledgebase. When responding to user queries, follow these guidelines:

Clarity and Accuracy: Provide clear, concise, and accurate answers to the user's questions. Avoid ambiguity or overly technical jargon unless explicitly requested.

Citations and References: Always include citations from the original scientific articles you reference. Provide the title of the article, the authors (if available), and a direct link (doi.org) to the source.

Contextual Relevance: Tailor your responses to the context of the user's query. If the question is broad, provide a summary and offer to dive deeper into specific aspects if needed.

Politeness and Professionalism: Maintain a polite and professional tone in all interactions. Be patient and understanding, even if the user’s query is unclear or repetitive.

Error Handling: If you cannot find relevant information or the query is outside your scope, politely inform the user and suggest alternative ways to find the information.

Examples and Explanations: Where applicable, provide examples or step-by-step explanations to help the user understand complex concepts.

Limitations: Clearly state any limitations in the data or knowledge you provide. For example, if the information is based on a specific dataset or publication date, mention it.

Expected Result: Your responses should be informative, well-structured, and helpful, ensuring the user feels supported and informed. Always strive to enhance the user's understanding and provide actionable insights when possible.

Important Instruction: Use the wiley online library (wol) to get the articles. It will return high quality article excerpts based on the query. Make sure to add the hyperlink to the https://doi.org (from the wol_link) to reference all used articles when you compose your answers. It is imperative to include the doi.org hyperlinks in your final response.

Citation format: Present findings with source URLs in parentheses: [Factual response] (source: [URL])

"

Really interesting to see the instruction to prefix links with doi.org!
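As a small illustration of what that instruction is asking the model to produce, here is a sketch that normalises a DOI into a doi.org link and wraps it in the citation format from the prompt. The function names and the sample DOI are my own, not anything from the AWS or Wiley code.

```python
# Prefixes a DOI is commonly written with; stripped before rebuilding the link.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_url(doi: str) -> str:
    """Turn a bare DOI, a 'doi:' string or a DOI URL into an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def cite(statement: str, doi: str) -> str:
    """Format a statement as '[Factual response] (source: [URL])'."""
    return f"{statement} (source: {doi_url(doi)})"


# 10.1234/example is a made-up DOI used purely for illustration.
line = cite("Biomarker X is associated with outcome Y.", "doi:10.1234/example")
```

In the agent itself this formatting is left to the model via the prompt; doing it in code would be the more deterministic option.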

I downloaded the codebase and navigated to the Wiley agent code, and then asked Claude Code the following question:

"Staying within this directory and subdirectories, examine this code and write up a short
paragraph that explains how this agent interfaces with content from Wiley."

It replied:

"This agent interfaces with Wiley Online Library through a serverless architecture using Amazon
Bedrock Agents. It processes user queries by sending them to a dedicated API Gateway endpoint
(https://51xu00806d.execute-api.us-east-1.amazonaws.com/api) that specifically handles Wiley
content searches. The agent constructs simple URL-encoded GET requests with the user's question
as a parameter. When Wiley's API responds, the agent receives JSON containing relevant article
excerpts, citations, and direct links to the original content on Wiley Online Library. The
agent then formats this information with proper academic citations including DOI links, making
it suitable for research applications. The implementation uses Claude 3 Sonnet (Nova Pro) as
its foundation model with content filtering guardrails."

A little more prompting got me this response:

"There are limited details about how Wiley structures content or serves responses on their API.
From examining the code, I can tell you:

1. The API interface is simple - it's a GET endpoint that accepts a single "question" parameter
2. The response contains two key elements:
- "text": Contains article excerpts and citations matching the query
- "wol_link": Contains links to the original Wiley Online Library content (DOI links)

The agent documentation specifies that the response contains "high quality article excerpts
based on the query" but doesn't reveal the underlying retrieval system. There's no information
about whether Wiley is using embeddings, vector search, keyword matching, or other technologies
to serve these responses.

The code simply forwards queries to a specific API Gateway endpoint
(https://51xu00806d.execute-api.us-east-1.amazonaws.com/api) and receives the structured JSON
response. The actual content processing and retrieval logic happens on Wiley's side behind this
API."
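Based only on that description, a client for this interface could be sketched roughly as below. I have not called the endpoint, so the parameter name `question` and the response fields `text` and `wol_link` are taken from Claude Code's summary rather than verified. Whether `wol_link` is a single string or a list, and whether the endpoint needs authentication, are things I don't know; the function names are mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as reported in the summary above; not verified by me.
API_URL = "https://51xu00806d.execute-api.us-east-1.amazonaws.com/api"


def build_url(question: str, base: str = API_URL) -> str:
    """Build the URL-encoded GET request with a single 'question' parameter."""
    return f"{base}?{urlencode({'question': question})}"


def parse_response(body: str) -> tuple[str, str]:
    """Pull the two reported fields out of the JSON body.

    Assumes 'wol_link' is a single string, which is a guess.
    """
    data = json.loads(body)
    return data["text"], data["wol_link"]


def ask(question: str, base: str = API_URL, timeout: float = 30.0) -> tuple[str, str]:
    """Send the query and return (excerpts, link). Performs a network call."""
    with urlopen(build_url(question, base), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))


# Offline example using a made-up response body.
url = build_url("BRCA1 biomarkers")
text, link = parse_response(
    '{"text": "An excerpt.", "wol_link": "https://doi.org/10.1234/example"}'
)
```

What strikes me is how thin the contract is: one string in, excerpts and links out, with all of the retrieval logic hidden behind the API.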

When Wiley announced Wiley Partner Solutions I was slightly pessimistic, and indeed I don't know how that business unit is doing overall. I am impressed with their technical capability, though, and I really like these two examples of what they are doing to reinvent pathways to their knowledge. I _think_ in both cases it is knowledge that is primarily returned. I wonder how we might imagine beyond that. Anyway, these were super interesting to read about.

The AWS toolkit that they have created also includes the following agents:

00-Notebook-environment
01-Biomarker-database-analyst
02-Clinical-evidence-researcher
03-Medical-imaging-expert
04-Statistician
05-Biological-pathways-analyst
06-Omics-signatures-enrichment
07-Pathology-Agent
08-Protein-Design-Agent
09-Radiology-Report-Agent
10-SEC-10-K-agent
11-Tavily-web-search-agent
12-JSL-analyze-medical-reports
13-JSL-medical-reasoning
14-USPTO-search
15-clinical-study-research-agent
16-Clinical-trial-protocol-generator-agent
17-variant-interpreter-agent
18-Wiley-online-library-agent

This is super interesting.