Paperpile runs on data at scale, with a literature database of 250M+ academic
papers and a growing body of user data accumulated over more than a decade.
You'll work across the systems that ingest, process, store, and serve this
data reliably: building pipelines, optimizing search, handling PDFs at scale,
and exposing clean APIs.
Requirements
- Strong backend engineering background with experience building and operating data-heavy systems in production.
- Experience deploying and operating services on AWS.
- Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.
- Familiarity with Node.js and TypeScript. It’s fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.
- High standards for data quality. You think carefully about correctness, deduplication, and consistency.
- Solid understanding of full-text search systems, including indexing strategy, relevance tuning, and query optimization.
- Proficient in building reliable REST APIs.
More useful experience
- Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…).
- Experience with PDF processing pipelines (extraction, transformation, storage, and delivery at scale).
- Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
- Experience with large-scale web crawling and scraping.
Salary and compensation
$70,000 — $110,000/year
Benefits
🌎 Distributed team
🏖 Paid time off
🏬 Coworking budget
📚 Learning budget