Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format. This ensures that data can be handled consistently regardless of the source. All document loaders implement the BaseLoader interface.

Interface

Each document loader may define its own parameters, but they share a common API:
  • load(): Loads all documents at once.
  • loadAndSplit(): Loads all documents at once and splits them into smaller documents.
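
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

// Each loader takes its own constructor arguments.
const loader = new CSVLoader(
  ...  // <-- Integration-specific parameters here
);
// load() resolves to an array of Document objects.
const data = await loader.load();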
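
To make the shape of the shared API concrete, here is a minimal sketch that uses local stand-in types rather than LangChain's own classes. `LoadedDoc`, `InMemoryLineLoader`, and `splitBySize` are hypothetical names introduced for illustration, and passing a numeric `chunkSize` to `loadAndSplit()` is an assumption of this sketch (the real method works with a text splitter instead).

```typescript
// Local stand-ins for illustration only; these are not LangChain's own types.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Naive fixed-size splitter standing in for a real text splitter.
function splitBySize(doc: LoadedDoc, chunkSize: number): LoadedDoc[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: LoadedDoc[] = [];
  for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      metadata: { ...doc.metadata, chunkStart: start },
    });
  }
  return chunks;
}

// Hypothetical loader: every non-empty line of an in-memory string becomes a document.
class InMemoryLineLoader {
  private text: string;
  private source: string;

  constructor(text: string, source: string) {
    this.text = text;
    this.source = source;
  }

  // Loads all documents at once.
  async load(): Promise<LoadedDoc[]> {
    const docs: LoadedDoc[] = [];
    const lines = this.text.split(/\r?\n/);
    for (let i = 0; i < lines.length; i++) {
      const line = lines[i].trim();
      if (line.length > 0) {
        docs.push({
          pageContent: line,
          metadata: { source: this.source, line: i + 1 },
        });
      }
    }
    return docs;
  }

  // Loads all documents, then splits each one into smaller documents.
  async loadAndSplit(chunkSize: number): Promise<LoadedDoc[]> {
    const docs = await this.load();
    const pieces: LoadedDoc[] = [];
    for (const doc of docs) {
      pieces.push(...splitBySize(doc, chunkSize));
    }
    return pieces;
  }
}
```

Because both methods return the same document shape, downstream code does not need to know which source the data came from.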

By category

LangChain.js groups document loaders into two categories:
  • File loaders, which load data into LangChain formats from your local filesystem.
  • Web loaders, which load data from remote sources.

File loaders

If you’d like to contribute an integration, see Contributing integrations.

PDFs

| Document Loader | Description | Package/API |
| --- | --- | --- |
| PDFLoader | Load and parse PDF files using pdf-parse | Package |

Common file types

| Document Loader | Description | Package/API |
| --- | --- | --- |
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
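
As an illustration of what "configurable column extraction" can mean for CSV data, the sketch below turns each row into one document, using either a single named column or all columns as the content. It is a simplified stand-in and not the CSVLoader implementation: `csvToDocs` and `CsvDoc` are hypothetical names, the parser does not handle quoted fields, and the `key: value` output format is an illustrative choice.

```typescript
// Local stand-in type for illustration; not LangChain's Document class.
interface CsvDoc {
  pageContent: string;
  metadata: { line: number };
}

// Naive CSV handling: splits on commas and does not support quoted fields.
function csvToDocs(csv: string, column?: string): CsvDoc[] {
  const lines = csv.split(/\r?\n/).filter((l) => l.trim().length > 0);
  if (lines.length === 0) return [];

  const headers = lines[0].split(",").map((h) => h.trim());
  const columnIndex = column === undefined ? -1 : headers.indexOf(column);
  if (column !== undefined && columnIndex === -1) {
    throw new Error(`Column "${column}" not found in header row`);
  }

  return lines.slice(1).map((line, i) => {
    const cells = line.split(",").map((c) => c.trim());
    const pageContent =
      column !== undefined
        ? cells[columnIndex] ?? "" // only the requested column
        : headers.map((h, j) => `${h}: ${cells[j] ?? ""}`).join("\n"); // every column
    return { pageContent, metadata: { line: i + 1 } };
  });
}
```

Selecting one column keeps each document focused on the text worth embedding, while the all-columns form preserves the full row.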

Specialized file loaders

| Document Loader | Description | Package/API |
| --- | --- | --- |
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
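
The idea behind "custom loader mappings" can be sketched as a lookup from file extension to a factory that builds the right loader for each path. The code below is a simplified, self-contained model and not DirectoryLoader itself: `LoaderFactory`, `extensionOf`, and `assignLoaders` are hypothetical names, a "loader" is reduced to a string label, and case-insensitive extension matching is a choice made for this sketch.

```typescript
// Stand-in: a real factory would return a loader instance, not a string label.
type LoaderFactory = (path: string) => string;

// Returns the lowercased extension including the dot, or "" if there is none.
function extensionOf(path: string): string {
  const fileName = path.slice(path.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  // dot <= 0 covers both "no dot" and dotfiles such as ".gitignore".
  return dot <= 0 ? "" : fileName.slice(dot).toLowerCase();
}

// Pairs each path with the factory registered for its extension.
function assignLoaders(
  paths: string[],
  mappings: Record<string, LoaderFactory>
): { loaded: string[]; skipped: string[] } {
  const loaded: string[] = [];
  const skipped: string[] = [];
  for (const path of paths) {
    const ext = extensionOf(path);
    if (Object.prototype.hasOwnProperty.call(mappings, ext)) {
      loaded.push(mappings[ext](path));
    } else {
      skipped.push(path); // no loader registered for this extension
    }
  }
  return { loaded, skipped };
}

const plan = assignLoaders(["data/a.csv", "notes/b.TXT", "img/c.png"], {
  ".csv": (p) => `csv:${p}`,
  ".txt": (p) => `text:${p}`,
});
```

Keeping the mapping as plain data makes it easy to add a new file type without touching the traversal logic.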

Web loaders

Webpages

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | ✅ | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | ❌ | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | ❌ | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | ✅ | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | ✅ | API |
| RecursiveUrlLoader | Recursively load webpages following links | ❌ | Package |
| Sitemap | Load all pages from a sitemap.xml | ✅ | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | ✅ | API |
| WebPDFLoader | Load PDF files in web environments | ✅ | Package |
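
Recursive loading that follows links can be pictured as a depth-limited, breadth-first traversal with a visited set. The sketch below is a generic illustration under stated assumptions, not the RecursiveUrlLoader source: `crawl` and `LinkFetcher` are hypothetical names, and link extraction is injected as a function so the example runs without network access.

```typescript
// Injected dependency: given a URL, resolve to the links found on that page.
type LinkFetcher = (url: string) => Promise<string[]>;

// Breadth-first traversal that follows links at most maxDepth hops from startUrl.
async function crawl(
  startUrl: string,
  maxDepth: number,
  fetchLinks: LinkFetcher
): Promise<string[]> {
  const visited = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      const links = await fetchLinks(url);
      for (const link of links) {
        if (!visited.has(link)) {
          visited.add(link); // mark before queuing so cycles are never revisited
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}

// In-memory link graph standing in for real pages.
const linkGraph: Record<string, string[]> = {
  a: ["b", "c"],
  b: ["d"],
  c: ["a"],
  d: ["e"],
};
const fakeFetch: LinkFetcher = (url) => Promise.resolve(linkGraph[url] || []);
```

With this graph, `crawl("a", 2, fakeFetch)` reaches `a`, `b`, `c`, and `d` but stops before `e`, which is three hops away. The visited set is what keeps the `c` to `a` cycle from looping forever.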

Cloud providers

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| S3 | Load files from AWS S3 buckets | ❌ | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | ❌ | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | ❌ | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | ❌ | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | ✅ | Package |

Productivity tools

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Notion API | Load Notion pages and databases via API | ✅ | API |
| Figma | Load Figma file data | ✅ | API |
| Confluence | Load pages from Confluence spaces | ❌ | API |
| GitHub | Load files from GitHub repositories | ✅ | API |
| GitBook | Load GitBook documentation pages | ✅ | Package |
| Jira | Load issues from Jira projects | ❌ | API |
| Airtable | Load records from Airtable bases | ✅ | API |
| Taskade | Load Taskade project data | ✅ | API |

Search & data APIs

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | ✅ | API |
| SerpApi | Load web search results from SerpApi | ✅ | API |
| Apify Dataset | Load scraped data from Apify platform | ✅ | API |

Audio & video

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| YouTube | Load YouTube video transcripts | ✅ | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | ✅ | API |
| Soniox | Transcribe multilingual audio files with optional translation using Soniox API | ✅ | API |
| Sonix | Transcribe audio files using Sonix API | ❌ | API |

Other

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Couchbase | Load documents from Couchbase database using SQL++ queries | ✅ | Package |
| LangSmith | Load datasets and traces from LangSmith | ✅ | API |
| Hacker News | Load Hacker News threads and comments | ✅ | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | ✅ | Package |
| College Confidential | Load college information from College Confidential | ✅ | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | ✅ | API |
