Extract and redact PII from PDF documents
LogScrub extracts text content from PDF files and applies the same PII detection it uses for log files. View the document in a live preview, identify sensitive data, and export the sanitized text.
LogScrub's output is plain text, not a PDF, to ensure that sensitive data is completely removed. If you need a redacted PDF with its original formatting preserved, you may need a dedicated PDF editing tool.
Before:
ACME Corporation
123 Business Park Drive
San Francisco, CA 94105

Dear Mr. John Smith,

Your account #4532-8876-2234 has been activated. Please contact our support team at support@acme-corp.com or call +1 (415) 555-0123 for assistance.

Your customer ID: CUST-2024-00892

After:
ACME Corporation
[ADDRESS-1]
[CITY-1], [STATE-1] [ZIP-1]

Dear Mr. [NAME-1],

Your account #[ACCOUNT-1] has been activated. Please contact our support team at [EMAIL-1] or call [PHONE-1] for assistance.

Your customer ID: [ID-1]
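This page doesn't show LogScrub's own detection rules. As a rough illustration of how pattern-based detection of the email address and phone number in the letter example might work, here is a minimal Python sketch. The `Finding` type, the `DETECTORS` table, and `detect` are hypothetical names, and both regular expressions are simplified assumptions (the phone pattern only covers North American formats).

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    label: str
    value: str
    start: int
    end: int


# Hypothetical detectors for two PII types; a real rule set would be larger.
DETECTORS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Assumed North American format, e.g. "+1 (415) 555-0123" or "415-555-0123".
    "PHONE": re.compile(r"(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}"),
}


def detect(text: str) -> list[Finding]:
    """Run every detector over the text and return findings in document order."""
    findings = [
        Finding(label, m.group(0), m.start(), m.end())
        for label, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    sample = ("Please contact our support team at support@acme-corp.com "
              "or call +1 (415) 555-0123 for assistance.")
    for f in detect(sample):
        print(f.label, f.value)
    # EMAIL support@acme-corp.com
    # PHONE +1 (415) 555-0123
```

Returning positions as well as values keeps detection separate from replacement, so a later step can decide which placeholder each finding gets.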
Before:
INVOICE #INV-2024-1234
Bill To:
Jane Doe
456 Oak Street, Apt 7B
New York, NY 10001
Payment Method: Visa ending 4242
Transaction ID: txn_1ABC2DEF3GHI

After:
INVOICE #[INVOICE-1]
Bill To:
[NAME-1]
[ADDRESS-1]
[CITY-1], [STATE-1] [ZIP-1]
Payment Method: Visa ending [CC-LAST4-1]
Transaction ID: [TXN-1]
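Some values in the invoice example, such as the last four card digits and the transaction ID, are recognizable mainly by the label in front of them. One way to handle that, sketched here as an assumption rather than LogScrub's actual rule set, is a context-anchored pattern: the regex matches the label plus the value, but only the value is replaced, so "Visa ending" and "Transaction ID:" survive in the output. `RULES` and `redact_in_context` are hypothetical names.

```python
import re

# Hypothetical context-anchored rules. Group "ctx" is the label kept in the
# output; group "v" is the sensitive value that gets replaced.
RULES = [
    ("INVOICE", re.compile(r"(?P<ctx>INVOICE #)(?P<v>[A-Z0-9-]+)")),
    ("CC-LAST4", re.compile(r"(?P<ctx>\b(?:Visa|Mastercard|Amex) ending )(?P<v>\d{4})\b")),
    ("TXN", re.compile(r"(?P<ctx>Transaction ID: )(?P<v>\w+)")),
]


def redact_in_context(text: str) -> str:
    for label, pattern in RULES:
        seen: dict[str, str] = {}  # value -> placeholder, tracked per label

        def replace(m: re.Match) -> str:
            value = m.group("v")
            if value not in seen:
                seen[value] = f"[{label}-{len(seen) + 1}]"
            # Keep the context label, swap only the value.
            return m.group("ctx") + seen[value]

        text = pattern.sub(replace, text)
    return text


if __name__ == "__main__":
    print(redact_in_context("Payment Method: Visa ending 4242 Transaction ID: txn_1ABC2DEF3GHI"))
    # Payment Method: Visa ending [CC-LAST4-1] Transaction ID: [TXN-1]
```

Anchoring on the label reduces false positives: a bare four-digit number elsewhere in the text is left alone unless it follows a card-type phrase.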
View the original PDF side by side with the extracted text, and step through the pages to check exactly what was extracted.
Process entire documents with page-by-page text extraction. Each page's content is clearly separated.
Same values get the same replacement throughout the document. If "John Smith" appears on page 1 and page 5, both become [NAME-1].
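Consistent replacement amounts to keeping one lookup table for the whole document rather than one per page. Here is a minimal sketch, assuming a hypothetical `Pseudonymizer` class and using a fixed list of names as a stand-in for a real name detector:

```python
import re


class Pseudonymizer:
    """Hands out one stable placeholder per (label, value) for a whole document."""

    def __init__(self) -> None:
        self._mapping: dict[tuple[str, str], str] = {}
        self._counts: dict[str, int] = {}

    def placeholder(self, label: str, value: str) -> str:
        key = (label, value)
        if key not in self._mapping:
            # First time this value is seen: assign the next number for its label.
            self._counts[label] = self._counts.get(label, 0) + 1
            self._mapping[key] = f"[{label}-{self._counts[label]}]"
        return self._mapping[key]


# Stand-in detector for this sketch only: a fixed list of known names.
NAME_RE = re.compile(r"\b(?:John Smith|Jane Doe)\b")


def redact_pages(pages: list[str]) -> list[str]:
    """Redact every page with ONE shared Pseudonymizer so numbering spans the document."""
    p = Pseudonymizer()
    return [NAME_RE.sub(lambda m: p.placeholder("NAME", m.group(0)), page) for page in pages]


if __name__ == "__main__":
    pages = ["Dear Mr. John Smith,", "", "", "", "Copy to Jane Doe. Signed, John Smith"]
    for number, text in enumerate(redact_pages(pages), start=1):
        print(number, text)
```

Because the table is shared, "John Smith" on page 5 maps back to [NAME-1], while "Jane Doe", first seen on page 5, becomes [NAME-2]. Creating a new `Pseudonymizer` per page would instead restart the numbering on every page.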
LogScrub also analyzes the extracted text and suggests additional patterns it detects, which you can choose to add.
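This page doesn't say how LogScrub decides which patterns to suggest. One plausible heuristic, shown here purely as an assumption, is to generalize ID-like tokens such as `CUST-2024-00892` into a shape-based regex and propose any shape that recurs. `shape` and `suggest_patterns` are hypothetical names, and the `min_count` threshold is arbitrary.

```python
import re
from collections import Counter

# ID-like tokens: alphanumeric chunks joined by "-" or "_", e.g. CUST-2024-00892.
TOKEN_RE = re.compile(r"\b[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)+\b")
# Runs of one character class, or a single separator.
RUN_RE = re.compile(r"[A-Z]+|[a-z]+|[0-9]+|[-_]")


def shape(token: str) -> str:
    """Turn a token into a regex by replacing each character run with a sized class."""
    parts = []
    for run in RUN_RE.finditer(token):
        s = run.group(0)
        if s in "-_":
            parts.append(s)  # separators are literal outside a character class
        elif s.isdigit():
            parts.append(rf"\d{{{len(s)}}}")
        elif s.isupper():
            parts.append(f"[A-Z]{{{len(s)}}}")
        else:
            parts.append(f"[a-z]{{{len(s)}}}")
    return "".join(parts)


def suggest_patterns(text: str, min_count: int = 2) -> list[str]:
    """Return shapes seen at least min_count times, most frequent first."""
    counts = Counter(shape(m.group(0)) for m in TOKEN_RE.finditer(text))
    return [pattern for pattern, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    print(shape("CUST-2024-00892"))
    # [A-Z]{4}-\d{4}-\d{5}
```

Requiring a shape to recur keeps one-off hyphenated words from flooding the suggestions; the user still reviews each proposed pattern before adding it.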
Text cannot be extracted from image-only PDFs (documents scanned without OCR). If your PDF contains only images, run OCR on it first with a tool such as Adobe Acrobat or Tesseract.
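Before running OCR, it can help to know which pages actually lack text. Assuming you already have per-page text from some extractor (not shown), a page whose extracted text is empty is a likely sign that it is image-only. The function names and the `min_chars` threshold below are hypothetical.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 1) -> list[int]:
    """Return 1-based numbers of pages whose extracted text has fewer than
    min_chars non-whitespace characters, i.e. pages that are likely image-only."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len("".join(text.split())) < min_chars
    ]


def is_image_only(page_texts: list[str]) -> bool:
    """True when the PDF has pages but none of them yielded any text."""
    return bool(page_texts) and len(pages_needing_ocr(page_texts)) == len(page_texts)


if __name__ == "__main__":
    print(pages_needing_ocr(["Invoice text", " \n ", ""]))
    # [2, 3]
```

Raising `min_chars` lets you also treat near-empty pages as candidates for OCR.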
Drop your PDF file into LogScrub to extract and sanitize text.