Automating Literature Research

FutureHouse

Winter 2024

FutureHouse Structure

  • Non-profit
  • Funded primarily by Eric Schmidt
  • Based in San Francisco
  • 20 employees

Science is changing independently of AI


Sources: arXiv.org; doi:10.6084/m9.figshare.17064419.v3

Intellectual bottlenecks are growing


📝 Increasing paper count ($\approx$5M per year)

🧬 Larger data sets from cheaper experiments (genome at $200 per person, $1 / GB of sequencing)

🔍 Papers are becoming less disruptive (96% decline in biology)


Park, M., Leahey, E. & Funk, R.J. Papers and patents are becoming less disruptive over time. Nature 613, 138–144 (2023). https://doi.org/10.1038/s41586-022-05543-x

Mission


Automate Scientific Discovery

PaperQA: an agent for literature research


Language agents achieve superhuman synthesis of scientific knowledge

Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, Andrew D. White. arXiv:2409.13740 (2024)

Better at answering questions than PhD biology experts

Better than competing models

Better than human-written Wikipedia articles

Can detect whether a claim is contradicted anywhere in the literature

PaperQA is not ChatGPT

PaperQA is AI built on top of literature, rather than replacing literature.


We do not train LLMs on article text!

Applications

WikiCrow

  1. Goal: Wikipedia articles for all 19,255 protein-coding genes
  2. Succeeded on 17,269
  3. Wikipedia already had 3,639, so a gain of 13,630 articles
  4. Completed in 48 hours
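
The WikiCrow figures above can be checked with a few lines of arithmetic. This is a minimal sketch that reuses only the numbers on the slide. The variable names are illustrative, and two points are assumptions rather than statements from the slide: that the 48 hours is the total run time, and that the 3,639 genes already on Wikipedia are all among the 17,269 successes (which is what the slide's subtraction implies).

```python
# Figures copied from the WikiCrow list above.
TOTAL_GENES = 19_255          # protein-coding genes targeted
ARTICLES_WRITTEN = 17_269     # genes for which an article was produced
ALREADY_ON_WIKIPEDIA = 3_639  # genes that already had a Wikipedia article
RUN_HOURS = 48                # assumption: total wall-clock time of the run

# Net new coverage, assuming every existing article is among the successes.
gain = ARTICLES_WRITTEN - ALREADY_ON_WIKIPEDIA

# Fraction of targeted genes that ended up with an article.
success_rate = ARTICLES_WRITTEN / TOTAL_GENES

# Average throughput under the run-time assumption above.
articles_per_hour = ARTICLES_WRITTEN / RUN_HOURS

print(f"Net new articles: {gain:,}")
print(f"Success rate: {success_rate:.1%}")
print(f"Throughput: {articles_per_hour:.0f} articles/hour")
```

The subtraction reproduces the 13,630 gain stated on the slide. The success rate and throughput are derived here and do not appear in the original.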

Partnerships with FutureHouse

Interested in:

  1. Access to your corpus, so we can provide links to your content
  2. PaperQA for your corpus or topics
  3. Exploring research questions about interacting with literature