Module 2 · Expert Track · 22 min read · AI for Research and Academia

Literature Discovery and Synthesis

The literature review is the most time-intensive part of most research projects, and it is the part that AI tools have disrupted most significantly. The disruption is real and valuable — but it introduces new failure modes that can be more insidious than the old ones. AI-assisted literature work is faster and broader in scope, but if the workflow is poorly designed, it is also capable of producing syntheses that sound authoritative while systematically misrepresenting the literature they claim to summarize.

The pre-AI literature problem

Traditional literature review has well-understood failure modes: selection bias in what researchers choose to read, anchoring on a small number of highly cited papers at the expense of heterodox views, language bias toward English-language publications, publication bias in what journals publish, and simple time constraints that make comprehensive coverage impossible in large literatures. A conscientious researcher doing a systematic review in a large field would spend months on the literature alone.

These problems were serious but at least visible. Researchers knew their searches were incomplete. They could describe their search strategy and acknowledge its limitations. Peer reviewers in a field could identify obvious gaps. The incompleteness was acknowledged and structured.

The risk with AI-assisted literature synthesis is a new kind of problem: syntheses that appear comprehensive and authoritative but are generated by models whose relationship to the actual literature is opaque. When a researcher asks an LLM to "summarize the literature on X," the resulting summary may sound complete, cite familiar names, and describe plausible findings — while being substantially confabulated. The danger is not just that specific citations may be fabricated (though they often are), but that the characterization of what the literature says, where debates lie, and what has been established may be systematically wrong in ways that are difficult to detect without deep domain expertise.

AI-powered literature discovery tools: an honest evaluation

The category of AI tools that actually searches academic databases is fundamentally different from general-purpose LLMs, and understanding this distinction is the foundation of responsible AI-assisted literature work. These tools retrieve real papers; they do not generate descriptions of imaginary ones. Their results can be traced back to actual records, which a general LLM's output cannot, but they have their own important limitations.

Elicit: the systematic review assistant

Of the tools discussed here, Elicit is the one built most directly for the literature synthesis task. When you ask Elicit a research question, it searches Semantic Scholar's database of over 200 million papers and returns papers it judges relevant. More importantly, it then extracts structured information from each paper's text (the research question addressed, methodology, population studied, key findings, and limitations) and presents it in a structured table that makes cross-paper comparison tractable.

For systematic and scoping reviews, Elicit significantly reduces the time required to screen and characterize large volumes of papers. The structured extraction allows researchers to identify patterns across papers (similar methodologies reaching different conclusions, for instance) that would take much longer to surface through manual reading. Elicit also supports literature synthesis: you can ask it to write a synthesis paragraph on a subset of papers, which you then verify and revise.

Elicit's limitations matter. Its coverage is strong for peer-reviewed papers in English but weaker for conference proceedings, non-English literature, dissertations, preprints, and very recent publications. For interdisciplinary topics, it may miss papers from peripheral disciplines that use different terminology. It is a tool for discovery and first-pass characterization — not a substitute for reading the papers you ultimately cite.

Semantic Scholar: citation network analysis

Semantic Scholar, developed by the Allen Institute for AI, is a free academic search engine with AI-powered features for exploring citation networks. Its most distinctive capabilities are TLDR, a one-sentence AI-generated summary of each paper, and citation context analysis, which shows not just that Paper B cites Paper A, but in what context and with what framing. This allows researchers to understand quickly why papers are cited, which is often more informative than knowing that they are cited.

Semantic Scholar's Recommended Papers feature surfaces papers that are conceptually related to ones you have identified as relevant, using embedding-based similarity rather than pure citation co-occurrence. This helps discover papers that are relevant but use different terminology or come from adjacent disciplines — one of the persistent blind spots of keyword-based search.
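
To make the idea of embedding-based similarity concrete, here is a minimal Python sketch that ranks papers by cosine similarity to a seed paper. The titles and three-dimensional vectors are invented for illustration, and nothing here reflects how Semantic Scholar actually computes or ranks its embeddings; it only shows why two papers with no shared keywords can still score as close neighbors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real ones have hundreds of dimensions).
papers = {
    "Teacher burnout in secondary schools": [0.9, 0.1, 0.2],
    "Occupational exhaustion among educators": [0.85, 0.15, 0.25],
    "Soil nitrogen cycling in wetlands": [0.05, 0.9, 0.3],
}

def rank_by_similarity(seed_title, embeddings):
    """Return (title, score) pairs for every other paper, most similar first."""
    seed = embeddings[seed_title]
    others = [
        (title, cosine_similarity(seed, vector))
        for title, vector in embeddings.items()
        if title != seed_title
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_similarity("Teacher burnout in secondary schools", papers)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

In this toy data the "occupational exhaustion" paper ranks first even though its title shares no words with the seed, which is the kind of match a keyword search would miss.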

ResearchRabbit: the citation network visualizer

ResearchRabbit is purpose-built for citation network exploration. You seed it with a set of papers you already know are relevant, and it maps their citation relationships — what they cite, what cites them, and what papers are commonly cited alongside them. This visual exploration is particularly valuable for identifying foundational papers you may have missed and for understanding the intellectual history of a field.

The "similar work" and "earlier work" navigation is intuitive and surfaces discoveries that keyword-based search misses. ResearchRabbit integrates with Zotero for reference management, making it relatively easy to incorporate into existing workflows.
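
The notion of papers that are "commonly cited alongside" your seeds can be sketched as a simple co-citation count. The citation data below is hypothetical, and ResearchRabbit's actual method is not described in this module, so read this as an illustration of the general idea rather than a description of the tool.

```python
from collections import Counter

# Hypothetical data: each citing paper maps to the set of papers it references.
references = {
    "citing_1": {"seed_A", "seed_B", "paper_X"},
    "citing_2": {"seed_A", "paper_X", "paper_Y"},
    "citing_3": {"seed_B", "paper_X"},
    "citing_4": {"paper_Y", "paper_Z"},
}

def co_cited_with(seeds, references):
    """Count how often each non-seed paper shares a reference list with a seed."""
    counts = Counter()
    for cited in references.values():
        if cited & seeds:                 # this reference list includes a seed
            counts.update(cited - seeds)  # credit every non-seed paper in it
    return counts

candidates = co_cited_with({"seed_A", "seed_B"}, references)
print(candidates.most_common())
```

Papers with high counts are candidates worth reading, not automatic inclusions; a paper can be co-cited often because it is a standard benchmark rather than because it is relevant to your question.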

Connected Papers: the dependency map

Connected Papers generates a visual graph from a single seed paper. The graph is not a literal citation tree: papers are positioned by similarity, based on overlap in what they cite and what cites them, so related clusters are visually proximate and two papers can sit close together without citing each other directly. It is most useful for understanding the broader scholarly conversation around a specific paper and for identifying the major threads of work that converge on a topic. It is less useful for large-scale discovery than ResearchRabbit, but excellent for understanding the context of individual papers you have already identified.

A practical discovery workflow

A productive AI-assisted discovery workflow often combines tools: use Elicit or Semantic Scholar for initial keyword discovery; use ResearchRabbit to explore the citation network from your seed papers; use Connected Papers to understand the specific context of your most important sources. At each stage, you are expanding your net while maintaining personal contact with what you are including — reading abstracts, checking methodology, deciding inclusion yourself.
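
When you combine tools this way, the same paper will often surface more than once. One way to keep track, sketched below in Python, is to merge each tool's export by DOI while recording which tools found each paper. The export format, function names, and DOIs are all assumptions made for illustration; real exports differ by tool, and some records will have no DOI at all.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes so duplicates match."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def merge_candidates(exports):
    """Merge per-tool DOI lists into {doi: sorted list of tools that found it}."""
    found_by = {}
    for tool, dois in exports.items():
        for doi in dois:
            found_by.setdefault(normalize_doi(doi), set()).add(tool)
    return {doi: sorted(tools) for doi, tools in found_by.items()}

# Hypothetical exports; these DOIs are placeholders, not real papers.
exports = {
    "elicit": ["10.1000/example.1", "https://doi.org/10.1000/EXAMPLE.2"],
    "researchrabbit": ["doi:10.1000/example.2", "10.1000/example.3"],
}

merged = merge_candidates(exports)
for doi, tools in sorted(merged.items()):
    print(doi, tools)
```

Keeping the provenance column is a deliberate choice: it lets you report later which tool contributed which papers, and it shows you how much the tools overlap for your topic.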

The common mistake is treating any of these tools as a complete solution and skipping the step of personally evaluating each paper you ultimately rely on. The tools accelerate discovery; they do not replace judgment about what is actually relevant and reliable.

Systematic review augmentation

Systematic reviews follow a defined protocol — pre-specified search strategy, eligibility criteria, dual screening, quality assessment, data extraction, and synthesis — designed to minimize bias and maximize reproducibility. AI intersects with this process at multiple points, but introducing AI also introduces questions about methodology that must be addressed explicitly in the review itself.

For the screening stage — reading titles and abstracts to decide which papers meet inclusion criteria — AI tools have shown genuine utility. Machine learning models trained on human screening decisions can prioritize the order in which papers are presented for screening, allowing human screeners to reach a recall threshold with fewer papers reviewed. Tools like Rayyan, Covidence, and Abstrackr now include AI-assisted screening features that accelerate the screening process while maintaining the human judgment required by systematic review methodology.
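
The prioritization idea can be illustrated with a deliberately simple Python sketch: learn per-word weights from abstracts a human has already screened, then sort the unscreened abstracts so likely inclusions appear first. This is a naive word log-odds heuristic chosen for brevity, not the method used by Rayyan, Covidence, or Abstrackr, and the abstracts are invented. Note that it only reorders the queue; a human still makes every inclusion decision.

```python
import math
from collections import Counter

def tokenize(text):
    """Split on whitespace, strip basic punctuation, lowercase."""
    return [w.strip(".,;:()").lower() for w in text.split() if w.strip(".,;:()")]

def train(labeled):
    """Learn smoothed per-word log-odds of inclusion from (abstract, included) pairs."""
    inc, exc = Counter(), Counter()
    for text, included in labeled:
        (inc if included else exc).update(set(tokenize(text)))
    n_inc = sum(1 for _, included in labeled if included)
    n_exc = len(labeled) - n_inc
    return {
        w: math.log((inc[w] + 1) / (n_inc + 2)) - math.log((exc[w] + 1) / (n_exc + 2))
        for w in set(inc) | set(exc)
    }

def prioritize(unscreened, weights):
    """Sort unscreened abstracts so likely inclusions reach the human first."""
    def score(text):
        return sum(weights.get(w, 0.0) for w in set(tokenize(text)))
    return sorted(unscreened, key=score, reverse=True)

# Hypothetical human screening decisions and unscreened records.
labeled = [
    ("randomized trial of mindfulness for teacher burnout", True),
    ("mindfulness intervention reduces burnout in nurses", True),
    ("soil nitrogen cycling in wetlands", False),
    ("review of wetland soil chemistry", False),
]
unscreened = ["wetland nitrogen survey", "burnout and mindfulness in physicians"]

weights = train(labeled)
queue = prioritize(unscreened, weights)
print(queue)
```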

For data extraction — pulling specific quantitative and qualitative information from included papers — AI shows more mixed performance. Structured extraction with careful verification against original text is a reasonable workflow for some fields and some types of data. Unstructured extraction from full-text papers is currently less reliable and requires more intensive verification. The PRISMA guidelines for systematic reviews do not yet have consensus standards for AI use disclosure, but methodological transparency about AI tools used is good practice regardless.

Methodological transparency

If you use AI tools in any stage of a systematic or scoping review, this must be disclosed in the methods section with enough specificity that another researcher could understand what role AI played and could assess whether it might have introduced systematic bias. The question is not whether to disclose, but how to describe your AI-assisted workflow precisely and honestly. Vague statements like "AI was used to assist with the review" are insufficient.
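
One way to force that specificity is to record AI use as structured data while you work and generate the methods text from it. The Python sketch below is an assumption about what such a record might contain; the field names and the tool name "ExampleScreener" are hypothetical, and no reporting guideline is being quoted here.

```python
from dataclasses import dataclass

@dataclass
class AIUseRecord:
    stage: str        # review stage, e.g. "title and abstract screening"
    tool: str         # tool name (hypothetical in the example below)
    version: str      # version string or access date
    role: str         # what the tool actually did
    human_check: str  # how humans verified or overrode the output

    def methods_sentence(self):
        """Render one methods-section sentence from the record."""
        return (
            f"During {self.stage}, {self.tool} ({self.version}) was used to "
            f"{self.role}; {self.human_check}."
        )

record = AIUseRecord(
    stage="title and abstract screening",
    tool="ExampleScreener",
    version="accessed 2025-01-10",
    role="rank unscreened records by predicted relevance",
    human_check="two reviewers independently screened every record regardless of rank",
)
sentence = record.methods_sentence()
print(sentence)
```

If you cannot fill in a field, that is usually a sign the workflow itself is underspecified, which is worth discovering before a reviewer does.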

Identifying gaps in literature

One of the most valuable things AI can do in literature work is help identify what has not been studied. Gap identification is cognitively difficult — it requires holding a mental map of what exists and noticing absences, which is hard to do systematically across a large literature. AI tools can help in two ways.

First, structured synthesis tools like Elicit can surface the distribution of methodological approaches, populations studied, and outcome measures across a literature, making it relatively easy to see which combinations are underrepresented. If most studies of an intervention use adult samples in Western countries, the gap in pediatric populations and non-Western settings becomes visible through the table.
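
As a sketch of how a gap becomes visible through the table, the Python snippet below cross-tabulates two extracted fields and lists the combinations with no studies. The study records are invented, and the field names are assumptions about what an extraction table might contain.

```python
from collections import Counter
from itertools import product

# Hypothetical extraction table: one row per included study.
studies = [
    {"population": "adult", "region": "Western"},
    {"population": "adult", "region": "Western"},
    {"population": "adult", "region": "non-Western"},
    {"population": "pediatric", "region": "Western"},
]

def coverage_table(studies, row_key, col_key):
    """Count studies for every combination of the observed row and column values."""
    counts = Counter((s[row_key], s[col_key]) for s in studies)
    rows = sorted({s[row_key] for s in studies})
    cols = sorted({s[col_key] for s in studies})
    return {(r, c): counts[(r, c)] for r, c in product(rows, cols)}

table = coverage_table(studies, "population", "region")
gaps = [cell for cell, n in table.items() if n == 0]

for cell, n in table.items():
    print(cell, n)
print("Unstudied combinations:", gaps)
```

This only finds empty cells among categories that appear at least once. A population or setting that no study has examined at all will not show up in the table, so the list of categories itself still needs your domain judgment.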

Second, you can engage general-purpose LLMs in a dialogue about gaps — but with important caveats. Asking Claude or ChatGPT "what are the gaps in the literature on X?" will produce a plausible-sounding answer, but that answer is generated from the model's training data and may not accurately reflect the current state of the field. The useful workflow is to first develop your own map of the literature through actual searching, then use AI to help you articulate the gaps you have identified and to probe whether you have considered the full range of gap types (population gaps, methodological gaps, theoretical gaps, outcome gaps, etc.).

Synthesizing large bodies of work

Once you have identified a set of relevant papers, synthesizing them — moving from a list of findings to a coherent narrative about what the literature says — is one of the most intellectually demanding tasks in research. AI can assist, but the nature of appropriate assistance is specific.

The most effective AI-assisted synthesis workflows use AI to help organize and structure synthesis that the researcher has already developed from reading primary sources. You read the papers, take notes, identify themes, and develop an understanding of the literature. Then you use AI to help you articulate that synthesis more clearly — to find the right language for a concept, to structure a section logically, to identify whether your narrative has gaps. The AI is working from your understanding, not generating understanding independently.

A less effective workflow — but one that is tempting given its speed — is to paste a collection of abstracts or excerpts into an AI tool and ask it to synthesize them. The resulting synthesis may be fluent and organized, but it is synthesizing summaries rather than the papers themselves, and it may introduce interpretations or framings that are not faithful to the original sources. If you use this approach, treat the output as a rough draft that requires careful verification against every claim.

Detecting contradictions across papers

One specific synthesis task where AI assistance is genuinely valuable is contradiction detection. When you are working with a large literature, identifying papers that reach contradictory conclusions on the same question is important but cognitively demanding at scale. You can use AI tools — particularly those with large context windows like Claude — to read a set of papers and identify where they disagree, either in findings or in interpretation of shared findings.

This requires careful prompting. Rather than asking for a synthesis, ask specifically: "identify any places where these papers reach different conclusions about X, or where one paper's findings challenge another's interpretation." This focused task is one where AI performs more reliably, because the task is comparative and bounded rather than generative and open-ended.
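
A small helper can keep this prompt consistent across batches of papers. The Python sketch below assembles labeled paper texts followed by the focused instruction; the function name and the labeling scheme are illustrative assumptions, and the extra requests to quote passages and to report when nothing is found are additions of this sketch rather than part of the prompt quoted above. It builds a string only and does not call any model API.

```python
def build_contradiction_prompt(topic, papers):
    """Assemble a bounded, comparative prompt from (label, text) pairs."""
    sections = [f"[{label}]\n{text.strip()}" for label, text in papers]
    instruction = (
        f"Identify any places where these papers reach different conclusions "
        f"about {topic}, or where one paper's findings challenge another's "
        "interpretation. For each disagreement, name the papers by label and "
        "quote the relevant passages. If you find none, say so."
    )
    return "\n\n".join(sections + [instruction])

# Hypothetical placeholders standing in for full paper texts.
papers = [
    ("Paper A", "Text of the first paper goes here."),
    ("Paper B", "Text of the second paper goes here."),
]
prompt = build_contradiction_prompt("the effect of the intervention", papers)
print(prompt)
```

Asking for quoted passages makes the output checkable: each claimed contradiction can be verified against the source text instead of being taken on trust.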

Read before you synthesize
Every paper you cite should be a paper you have personally read — at minimum the abstract, introduction, and conclusions. AI synthesis tools are discovery accelerators, not reading substitutes. The intellectual responsibility for the literature review you produce is yours, and it requires personal engagement with the primary sources.
Search multiple databases, not just one
No single database covers all relevant literature. A comprehensive search strategy typically includes PubMed or MEDLINE (for biomedical literature), Web of Science or Scopus (for broad coverage), Semantic Scholar (for AI-assisted discovery), and field-specific databases (PsycINFO, ERIC, SSRN, arXiv, etc.). AI tools are not substitutes for multi-database searching.
Document your search strategy
A reproducible literature review requires documentation of search terms, databases searched, date of search, and inclusion/exclusion criteria. If you used AI tools, document which ones and how. This is required for systematic reviews and good practice for any literature-based research.
Update near submission
A literature search conducted 12 months before submission may miss significant recent publications. Run a final update search — targeting the same databases with the same strategy — shortly before submission to ensure the review reflects the current state of the field.
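
The documentation and update practices above fit naturally into a small search log. The Python sketch below records each search as structured data and compares an update search against the original to list only the newly retrieved records. The field names, query string, dates, and record IDs are hypothetical; adapt them to whatever your databases export.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class SearchLogEntry:
    database: str
    query: str
    date_run: str                                   # ISO date of the search
    result_ids: list = field(default_factory=list)  # record IDs or DOIs

def new_since(previous, update):
    """Return IDs retrieved by the update search but not the earlier one, in order."""
    seen = set(previous.result_ids)
    return [rid for rid in update.result_ids if rid not in seen]

# Hypothetical initial search and pre-submission update with the same strategy.
initial = SearchLogEntry(
    database="Scopus",
    query='"teacher burnout" AND mindfulness',
    date_run="2024-01-15",
    result_ids=["rec-1", "rec-2", "rec-3"],
)
update = SearchLogEntry(
    database="Scopus",
    query=initial.query,
    date_run="2025-01-10",
    result_ids=["rec-1", "rec-2", "rec-3", "rec-4"],
)

to_screen = new_since(initial, update)
print("New records to screen:", to_screen)
print(json.dumps(asdict(update), indent=2))  # archive alongside the review
```

Reusing the stored query string for the update, rather than retyping it, is what keeps the two searches comparable.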

Citation networks and intellectual genealogy

Understanding how a field has developed — which papers built on which, where paradigm shifts occurred, and which contributions are genuinely foundational versus highly cited for incidental reasons — is a kind of scholarly knowledge that takes years to develop through experience. Citation network tools can accelerate this understanding significantly.

The key insight from citation network analysis is that citation frequency is a poor proxy for intellectual importance. Some papers are heavily cited because they are genuinely foundational; others because they provide a convenient benchmark or were published in a high-visibility venue at the right moment. Papers with profound impact in a field are sometimes under-cited by modern standards because they predated the era of dense citation practice, or because their ideas were so thoroughly absorbed that subsequent work no longer cites them explicitly.

ResearchRabbit and Connected Papers help you trace intellectual genealogies in both directions: backward to foundational work that established the framework your literature operates within, and forward to see how influential papers have been applied, extended, and sometimes contradicted by subsequent work. This genealogical knowledge is one of the most important things a thorough literature review provides — and it is something AI tools can substantially accelerate without replacing the judgment required to interpret what you find.

What good AI-assisted literature work looks like

A researcher working at the frontier uses AI tools to extend the reach of their literature work, not to replace the critical engagement that makes a literature review valuable. They use Elicit to surface papers they might have missed and to extract structured data across a corpus; they use ResearchRabbit to explore citation networks and discover seminal work; they use a general-purpose LLM to help articulate syntheses they have already developed from reading; and they personally read every paper they cite. The resulting literature review is both broader and deeper than what the same researcher could have produced without AI — and every claim in it is grounded in their own engagement with primary sources.