LogScrub Help

Why Use LogScrub?

Every day, developers and IT professionals need to share logs for debugging, support tickets, bug reports, and collaboration. But logs often contain sensitive information that shouldn't be shared.

Common Scenarios

Sharing Logs with AI Assistants

AI tools like ChatGPT, Claude, and Copilot are incredibly useful for debugging and analyzing logs. But pasting raw logs means sending customer emails, IP addresses, API keys, and other sensitive data to third-party services.

Solution: Scrub your logs first. The AI can still understand error patterns, stack traces, and timing issues without seeing real customer data.

Filing Bug Reports & Support Tickets

When reporting issues to software vendors or open-source projects, you often need to include logs. These logs may contain your company's internal hostnames, user data, or credentials that were accidentally logged.

Compliance & Data Protection

Regulations like GDPR, HIPAA, and CCPA restrict how personal data can be shared and processed. Sanitizing logs before sharing helps maintain compliance and reduces your data exposure footprint.

Why Client-Side Processing Matters

Many log sanitization tools require uploading your logs to a server. This defeats the purpose — you're sharing sensitive data with yet another third party.

LogScrub processes everything in your browser using WebAssembly. Your logs never leave your device. You can even use it offline or on an air-gapped machine.

Preserving Log Usefulness

The goal isn't just to remove data — it's to keep logs useful for their intended purpose:

  • Consistency Mode ensures the same email always becomes [EMAIL-1], so you can still trace a user's journey
  • Time Shift lets you anonymize timestamps while preserving the relative timing between events
  • Log Crop trims log files to a specific time window, so you only share the relevant portion
  • Selective Rules let you keep non-sensitive data like UUIDs or timestamps when needed for debugging

Quick Start

LogScrub helps you remove personally identifiable information (PII) and sensitive data from log files before sharing them.

Basic Workflow

  1. Paste or upload your log content into the "Original" pane
  2. Review detection rules by clicking the Rulesets button (toggle rules on/off, review matches)
  3. Click "Scrub" or press ⌘/Ctrl + Enter
  4. Copy or download the scrubbed output

Before (Original)
2024-01-15 10:23:45 INFO User login email: john.doe@example.com ip: 192.168.1.105 session: sid=abc123def456

After (Scrubbed)
2024-01-15 10:23:45 INFO User login email: [EMAIL-1] ip: [IPV4-1] session: [SESSION_ID-1]

How It Works

LogScrub uses pattern matching to identify sensitive data in your text. Each detection rule has a regular expression (regex) that matches specific data formats.

Processing Pipeline

  1. Pattern Matching — Each enabled rule's regex is applied to find matches
  2. Replacement — Matched text is replaced according to your chosen strategy
  3. Consistency — When enabled, identical values get identical replacements
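
The pipeline above can be sketched in a few lines. This is an illustrative Python sketch, not LogScrub's actual implementation (which runs as Rust/WebAssembly in the browser); it applies the Email and IPv4 patterns from the reference tables below with Label-style replacements and a consistency map:

```python
import re

# Two rules borrowed from the detection rules reference below.
RULES = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "IPV4": re.compile(r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
                       r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"),
}

def scrub(text: str) -> str:
    seen: dict[str, str] = {}      # consistency: original value -> replacement
    counters: dict[str, int] = {}  # per-rule label counters
    for label, pattern in RULES.items():
        def repl(m, label=label):
            value = m.group(0)
            if value not in seen:
                counters[label] = counters.get(label, 0) + 1
                seen[value] = f"[{label}-{counters[label]}]"
            return seen[value]
        text = pattern.sub(repl, text)
    return text

print(scrub("User login email: john.doe@example.com ip: 203.0.113.7"))
# User login email: [EMAIL-1] ip: [IPV4-1]
```
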

Tip

Use the "Analyze" feature to preview what will be detected before sanitizing. This also suggests disabled rules that would match content in your text.

Important

Only enable rules for data types you expect to find in your content. Enabling all rules will likely cause false positives — for example, the "US Zip Code" pattern matches any 5-digit number. Be selective to get accurate results.

Replacement Strategies

Choose how detected PII should be replaced:

Label

Replaces with a descriptive label and counter.

john@example.com → [EMAIL-1]

Fake

Replaces with realistic fake data. Preserves structural prefixes (e.g. ICCID 89, BTC bc1).

john@example.com → maria.wilson@example.org

Fake (Country)

Fake data that preserves country-specific prefixes like phone country codes and TLDs.

+447508804412 → +447291635804

Redact

Replaces with blocks matching the original length.

john@example.com → ████████████████

Template

Custom replacement format using variables.

{TYPE}, {n}, {len}
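
The three simplest strategies can be sketched as small functions. The template variable semantics here are assumptions for illustration ({TYPE} = rule label, {n} = per-rule counter, {len} = length of the original value):

```python
def label(type_: str, n: int) -> str:
    # Label strategy: descriptive label plus counter.
    return f"[{type_}-{n}]"

def redact(value: str) -> str:
    # Redact strategy: one block character per original character.
    return "█" * len(value)

def template(fmt: str, type_: str, n: int, value: str) -> str:
    # Template strategy: substitute the assumed variables into a custom format.
    return (fmt.replace("{TYPE}", type_)
               .replace("{n}", str(n))
               .replace("{len}", str(len(value))))

value = "john@example.com"
print(label("EMAIL", 1))                          # [EMAIL-1]
print(redact(value))                              # 16 block characters
print(template("<{TYPE}:{n}:{len}>", "EMAIL", 1, value))  # <EMAIL:1:16>
```
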

Fake Data Generation

The Fake strategy uses a Rust-based data generation library to create realistic-looking replacements. This makes your scrubbed output look natural while still protecting sensitive information. Fake data is generated deterministically — the same input always produces the same fake output, ensuring consistency.

PII Type | Fake Data Generated | Example
Email | Realistic email addresses | maria.wilson@example.org
Person Names (ML) | Full names from name database | James Rodriguez
Locations (ML) | City names | Portland
Organizations (ML) | Company names | Acme Industries
IPv4 | Valid IPv4 addresses | 142.58.201.33
IPv6 | Valid IPv6 addresses | 2001:db8:85a3::8a2e:370:7334
MAC Address | Valid MAC addresses | 4A:3B:2C:1D:5E:6F
Phone Numbers | Formatted phone numbers | (555) 842-9173
Hostname | Realistic hostnames | server42.internal.net
URL | Valid URLs with paths | https://demo.io/api/4821
UUID | Valid UUID v4 format | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d
Credit Card | Luhn-valid card numbers | 4111-1111-1111-1234
SSN | Valid format (not real) | 284-17-5932
IBAN | Plausible IBAN format | DE89370400440532013000
UK NHS Number | Check-digit valid format | 485 293 7164
UK NINO | Valid NI number format | AB123456C
UK Postcode | Valid UK postcode format | SW1A 2AA
US Zip Code | 5-digit zip codes | 90210
GPS Coordinates | Valid lat/long pairs | 51.5074, -0.1278
File Paths | Realistic file paths | /home/user/data.json
Dates | Valid dates (randomized) | 2019-07-15
Hashes (MD5/SHA) | Random hex strings | 5d41402abc4b2a76...
API Keys | Format-matching tokens | sk_test_Abc123xyz...
Crypto Addresses | Valid format addresses | 1A1zP1eP5QGefi2DMPTfTL...

Deterministic Generation

Fake data is generated using a seeded random number generator based on the original value. This means the same input (e.g., john@example.com) will always produce the same fake output across multiple runs. Combined with Consistency Mode, this ensures your scrubbed data maintains referential integrity.
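
The seeding idea can be illustrated in a few lines. This is a simplified sketch, not the Rust library LogScrub uses; the name and domain lists are made up for the example:

```python
import hashlib
import random

FIRST = ["maria", "james", "aisha", "tom"]
LAST = ["wilson", "rodriguez", "chen", "okafor"]
DOMAINS = ["example.org", "example.net"]

def fake_email(original: str) -> str:
    # Seed the RNG from a hash of the original value so the same
    # input always yields the same fake output across runs.
    seed = int.from_bytes(hashlib.sha256(original.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return f"{rng.choice(FIRST)}.{rng.choice(LAST)}@{rng.choice(DOMAINS)}"

a = fake_email("john@example.com")
b = fake_email("john@example.com")
assert a == b  # deterministic: same input, same fake output
```
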

Fake (Country) — Preserve Country Prefixes

The Fake (Country) strategy extends the Fake strategy by preserving country-specific prefixes. This is useful when you need to keep geographic context (e.g. which country a phone number belongs to) while still anonymizing the rest of the data.

PII Type | What's Preserved | Example
International Phone (+) | + and country code digits | +447508804412 → +447291635804
International Phone (no +) | Country code digits | 447508804412 → 447291635804
US Phone | +1 or 1 prefix | +1-555-234-5678 → +1-832-671-9042
UK Phone | Leading 0 | 07508804412 → 08291635804
ICCID | 89 + country code (5 digits total) | 8944200011231044047 → 8944273829156308291
IBAN | First 2 letters (country code) | GB82WEST12345698765432 → GB47XKRJ83920147562918
Email | Domain TLD | user@company.co.uk → jsmith@inbox.co.uk
Hostname | TLD | server.example.co.uk → web42.co.uk
URL | Domain TLD | https://app.example.de/api → https://demo.de/users/4821
MAC Address | OUI (first 3 octets) | AA:BB:CC:11:22:33 → AA:BB:CC:7F:3A:E2
Credit Card | BIN (first 6 digits) | 4111111111111111 → 4111118294736150

Fake vs Fake (Country)

The base Fake strategy preserves invariant structural prefixes (e.g. 89 on all ICCIDs, bc1 on Bech32 Bitcoin addresses) since these are the same regardless of country. Fake (Country) goes further by also preserving country-specific prefixes like phone country codes, IBAN country letters, and domain TLDs. For all PII types not listed above, Fake (Country) behaves identically to Fake.
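
The prefix-preserving idea can be sketched for international phone numbers. This is an illustration only: it assumes a fixed two-digit country code, whereas real E.164 codes are one to three digits and LogScrub resolves them from country tables:

```python
import hashlib
import random

def fake_intl_phone(original: str, cc_len: int = 2) -> str:
    # Keep '+' and the country code; regenerate the remaining digits
    # deterministically from a hash of the original number.
    assert original.startswith("+")
    prefix = original[: 1 + cc_len]          # e.g. '+44'
    rest = original[1 + cc_len:]
    seed = int(hashlib.sha256(original.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    digits = "".join(rng.choice("0123456789") for _ in rest)
    return prefix + digits

out = fake_intl_phone("+447508804412")
print(out)  # keeps '+44' and the original length; other digits randomized
```
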

Consistency Mode

When enabled, the same input value always produces the same replacement. This preserves relationships in your data:

With Consistency Mode
User [EMAIL-1] logged in
User [EMAIL-1] viewed page
User [EMAIL-2] logged in

Without Consistency Mode
User [EMAIL-1] logged in
User [EMAIL-2] viewed page
User [EMAIL-3] logged in
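
The difference between the two modes boils down to whether a value-to-replacement cache is consulted. A minimal Python sketch (email labels only, not LogScrub's actual code):

```python
def make_labeler(consistent: bool):
    seen: dict[str, str] = {}   # value -> replacement (used only when consistent)
    counter = [0]
    def label(value: str) -> str:
        if consistent and value in seen:
            return seen[value]          # reuse: same input, same label
        counter[0] += 1
        repl = f"[EMAIL-{counter[0]}]"
        seen[value] = repl
        return repl
    return label

on = make_labeler(True)
print(on("a@x.com"), on("a@x.com"), on("b@x.com"))
# [EMAIL-1] [EMAIL-1] [EMAIL-2]

off = make_labeler(False)
print(off("a@x.com"), off("a@x.com"), off("b@x.com"))
# [EMAIL-1] [EMAIL-2] [EMAIL-3]
```
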

Detection Rules Reference

LogScrub includes 95+ built-in detection rules organized by category.

Contact Information

Rule | Example Match | Default
Email
Show pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
user@example.com Enabled
Email Message-ID
Show pattern <[A-Za-z0-9!#$%&'*+/=?^_`.{|}~-]+@[A-Za-z0-9.-]+>
<abc123@mail.example.com> Disabled
Phone (US)
Show pattern \b(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b
(555) 123-4567 Enabled
Phone (UK)
Show pattern \b(?:0[1-9][0-9]{8,9}|0[1-9][0-9]{2,4}[\s-][0-9]{3,4}[\s-]?[0-9]{3,4})\b
020 7946 0958 Enabled
Phone (Intl)
Show pattern \+[1-9][0-9]{1,3}[\s-]?[0-9]{6,14}\b
+44 7911 123456 Enabled
Phone (Intl, No +)
Show pattern \b[1-9][0-9]{9,14}\b
Validated against ~100 E.164 country codes with per-country digit length checks
447508804412 Disabled

Network

Rule | Example Match | Default
IPv4
Show pattern \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
192.168.1.1 Enabled
IPv6
Show pattern (?i)\[(?:[0-9a-f]{1,4}:){7}[0-9a-f]{1,4}\]... (full IPv6 pattern with bracketed, bare, link-local, and IPv4-mapped variants)
2001:0db8:85a3::8a2e:0370:7334 Enabled
MAC Address
Show pattern (?i)\b(?:[0-9A-F]{2}[:-]){5}[0-9A-F]{2}\b
00:1A:2B:3C:4D:5E Enabled
Hostname
Show pattern \b(?![a-zA-Z0-9-]+\.(?:txt|log|json|xml|...)\b)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,12}\b
Excludes common file extensions (.txt, .log, .json, .xml, .csv, etc.) to reduce false positives
api.example.com Enabled
URL
Show pattern https?://[^\s<>\[\]{}|\\^`\x00-\x1f\x7f]+
https://example.com/path Enabled
Private IP Preservation (RFC1918)

When enabling IPv4 or IPv6 scrubbing, you'll be asked whether to preserve private/internal IP addresses. Private IPs are not routable on the internet, so they're generally safe to share.

Preserved IPv4 ranges:

  • 10.0.0.0/8 - Class A private network (10.0.0.0 – 10.255.255.255)
  • 172.16.0.0/12 - Class B private network (172.16.0.0 – 172.31.255.255)
  • 192.168.0.0/16 - Class C private network (192.168.0.0 – 192.168.255.255)
  • 127.0.0.0/8 - Loopback (127.0.0.1, localhost)
  • 169.254.0.0/16 - Link-local (APIPA)

Preserved IPv6 ranges:

  • fe80::/10 - Link-local addresses
  • fc00::/7 - Unique local addresses (fd00::/8 and fc00::/8)
  • ::1 - Loopback
  • ::ffff:x.x.x.x - IPv4-mapped addresses (checks IPv4 portion)

You can toggle this setting via the "Preserve Private IPs" checkbox in the Settings panel.
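
The preservation check can be approximated with Python's standard ipaddress module. A sketch (not LogScrub's actual Rust code) covering the ranges listed above:

```python
import ipaddress

def preserve_ip(token: str) -> bool:
    """Return True if the address falls in a preserved (non-routable) range."""
    try:
        addr = ipaddress.ip_address(token)
    except ValueError:
        return False
    # IPv4-mapped IPv6 (::ffff:x.x.x.x): check the embedded IPv4 portion.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    # Covers RFC1918 ranges, fc00::/7, loopback, and link-local.
    return addr.is_private or addr.is_loopback or addr.is_link_local

print(preserve_ip("10.1.2.3"))    # True  (RFC1918)
print(preserve_ip("8.8.8.8"))     # False (public)
print(preserve_ip("fe80::1"))     # True  (link-local)
```
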

Identity (US)

Rule | Example Match | Default
SSN
Show pattern \b[0-9]{3}-[0-9]{2}-[0-9]{4}\b
Validates area/group/serial ranges (no 000, 666, or 900-999 area codes)
123-45-6789 Enabled
US ITIN
Show pattern \b9[0-9]{2}[- ]?(5[0-9]|6[0-5]|7[0-9]|8[0-8]|9[0-24-9])[- ]?[0-9]{4}\b
912-54-1234 Disabled
Passport
Show pattern (?i)\b(?:passport[:\s#]*)?[A-Z]{1,2}[0-9]{6,9}\b
AB1234567 Disabled
Driver's License
Show pattern (?i)\b(?:d\.?l\.?|driver'?s?\s*(?:license|lic))[:\s#]*[A-Z0-9]{5,15}\b
DL: D1234567 Disabled

Identity (UK & International)

Rule | Example Match | Default
UK NHS Number
Show pattern \b([0-9]{3})[- ]?([0-9]{3})[- ]?([0-9]{4})\b
Mod-11 checksum validation
450 557 7104 Disabled
UK National Insurance
Show pattern (?i)\b[A-Z]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[A-D]\b
Validates prefix letters (excludes D, F, I, Q, U, V in first; D, F, I, O, Q, U, V in second)
AB 12 34 56 C Disabled
AU Tax File Number
Show pattern \b[0-9]{3}\s?[0-9]{3}\s?[0-9]{3}\b
Weighted checksum validation
123 456 789 Disabled
India PAN
Show pattern \b[A-Z]{3}[ABCFGHLJPT][A-Z][0-9]{4}[A-Z]\b
ABCPD1234E Disabled
Singapore NRIC
Show pattern (?i)\b[STFGM][0-9]{7}[A-Z]\b
Mod-11 with letter check table
S1234567A Disabled
Spanish NIF/DNI
Show pattern \b[0-9]{8}[A-Z]\b
Mod-23 checksum validation
12345678Z Disabled
Spanish NIE
Show pattern (?i)\b[XYZ][0-9]{7}[A-Z]\b
Mod-23 checksum (X/Y/Z prefix mapped to 0/1/2)
X1234567L Disabled
Canadian SIN
Show pattern \b[0-9]{3}[- ]?[0-9]{3}[- ]?[0-9]{3}\b
Luhn checksum validation
123-456-789 Disabled
VIN
Show pattern \b[A-HJ-NPR-Z0-9]{17}\b
Check-digit validation at position 9, transliteration + weighted sum
1HGBH41JXMN109186 Disabled
ICCID (SIM Card)
Show pattern \b89[0-9]{16,20}\b
Must start with 89 (telecom MII) + Luhn checksum validation
8944200011231044047 Disabled

Note: NHS Number, AU TFN, Singapore NRIC, Canadian SIN, ICCID, Spanish NIF/NIE, and VIN include checksum or check-digit validation to reduce false positives.
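
Several of these validators (ICCID, Canadian SIN, and the credit card rule in the next section) use the Luhn algorithm. A standard implementation for reference:

```python
def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 when the doubled value exceeds 9, and require the
    total to be divisible by 10."""
    nums = [int(c) for c in digits if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(nums)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (classic Visa test number)
print(luhn_valid("4111111111111112"))  # False
```
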

Financial

Rule | Example Match | Default
Credit Card
Show pattern \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})\b
Luhn checksum validation
4111111111111111 Enabled
IBAN
Show pattern \b[A-Z]{2}[0-9]{2}[A-Z0-9]{4}[0-9]{7}(?:[A-Z0-9]?){0,16}\b
Mod-97 checksum validation (ISO 7064)
GB82WEST12345698765432 Enabled
Bitcoin Address
Show pattern \b(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,62}\b
Base58 format check
1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 Enabled
Ethereum Address
Show pattern \b0x[a-fA-F0-9]{40}\b
Hex format check (40 hex characters after 0x prefix)
0x742d35Cc6634C0532925a3b844Bc... Enabled
Money/Currency
Show pattern (?:[$£€¥₹₩₽¢฿₪₴₦₡₱₲₵₸₺₼₾][0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?|[0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?\s*(?:USD|EUR|GBP|JPY|CNY|INR|...))
$10.99, £1,000.00, 100 EUR Disabled
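
The IBAN rule's mod-97 validation (ISO 7064) is a standard algorithm: move the first four characters to the end, map letters to numbers (A=10 ... Z=35), and check the resulting integer modulo 97. A sketch:

```python
def iban_valid(iban: str) -> bool:
    """ISO 7064 mod-97 check for IBANs."""
    s = iban.replace(" ", "").upper()
    # Move country code + check digits to the end.
    rearranged = s[4:] + s[:4]
    # Map letters to numbers: A=10 ... Z=35 (int(c, 36) does exactly this).
    digits = "".join(str(int(c, 36)) for c in rearranged)
    return int(digits) % 97 == 1

print(iban_valid("GB82WEST12345698765432"))  # True
print(iban_valid("GB00WEST12345698765432"))  # False (wrong check digits)
```
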

Tokens & API Keys

Rule | Example Match | Default
JWT
Show pattern eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
eyJhbGciOiJIUzI1NiIsInR5cCI6... Enabled
Bearer Token
Show pattern (?i)bearer\s+[a-z0-9_-]+\.[a-z0-9_-]+\.?[a-z0-9_-]*
Bearer abc123.def456 Enabled
AWS Access Key
Show pattern \b(?:AKIA|ABIA|ACCA|ASIA)[0-9A-Z]{16}\b
AKIAIOSFODNN7EXAMPLE Enabled
AWS Secret Key
Show pattern (?i)(?:aws.?secret|secret.?access)[^a-z0-9]*['"]?([a-z0-9/+=]{40})['"]?
aws_secret_access_key = wJalrXUt... Enabled
Stripe Key
Show pattern \b(?:sk|pk)_(?:test|live)_[0-9a-zA-Z]{24,}\b
sk_test_4eC39HqLyjWDarjtT1zdp7dc Enabled
GCP API Key
Show pattern \bAIza[0-9A-Za-z_-]{35}\b
AIzaSyDaGmWKa4JsXZ-HjGw7ISLn... Enabled
GitHub Token
Show pattern \b(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}\b
ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx Enabled
OpenAI API Key
Show pattern \bsk-(?:proj-)?[a-zA-Z0-9]{32,64}\b
sk-proj-abc123def456ghi789... Enabled
Anthropic API Key
Show pattern \bsk-ant-[a-zA-Z0-9_-]{32,64}\b
sk-ant-api03-abc123def456... Enabled
X AI API Key
Show pattern \bxai-[a-zA-Z0-9]{32,64}\b
xai-abc123def456ghi789... Enabled
Cerebras API Key
Show pattern \bcsk-[a-zA-Z0-9]{40,50}\b
csk-abc123def456ghi789... Enabled
Slack Token
Show pattern \bxox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*\b
xoxb-123456789012-123456789012-abc... Enabled
NPM Token
Show pattern \bnpm_[A-Za-z0-9]{36}\b
npm_abc123def456ghi789jkl012... Enabled
SendGrid Key
Show pattern \bSG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43}\b
SG.abc123def456.ghi789jkl012mno345... Enabled
Twilio Key
Show pattern \b(?:AC|SK)[a-f0-9]{32}\b
SK1234567890abcdef1234567890abcdef Enabled
Database URL
Show pattern (?i)(?:mongodb|postgres|postgresql|mysql|redis|amqp|mssql)://[^\s]+
postgres://user:pass@host:5432/db Enabled

Secrets

Rule | Example Match | Default
Generic Secret
Show pattern (?i)(?:password|passwd|pwd|secret|token|api[_-]?key|apikey|auth[_-]?token|access[_-]?token)\s*[:=]\s*['"]?([^\s'"]{8,})['"]?
password=MyS3cr3tP@ss! Enabled
High Entropy Secret
Show pattern ['"][A-Za-z0-9!@#$%^&*_+\-]{8,64}['"]
Shannon entropy > 3.5 bits/char, requires mixed character types
xK9#mP2$vL7@nQ4! Disabled
Private Key
Show pattern -----BEGIN (?:RSA |DSA |EC |OPENSSH |PGP )?PRIVATE KEY-----
-----BEGIN RSA PRIVATE KEY----- Enabled
Basic Auth
Show pattern (?i)basic\s+[a-z0-9+/]+=*
Basic dXNlcjpwYXNzd29yZA== Enabled
URL Credentials
Show pattern (?i)(?:https?|ftp)://[^/:@\s"']+:[^@\s"']+@[^\s/"']+
https://user:pass@example.com Enabled
Session ID
Show pattern (?i)(?:session[_-]?id|sid|jsessionid|phpsessid|aspsessionid)[=:\s]*[a-z0-9_-]{16,}
JSESSIONID=ABC123DEF456789XYZ Enabled
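
The High Entropy Secret rule's "bits per character" threshold is Shannon entropy over the character distribution (the rule additionally requires mixed character types, which this sketch omits):

```python
import math
from collections import Counter

def entropy_bits_per_char(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_char("xK9#mP2$vL7@nQ4!"))  # 4.0 (16 distinct chars)
print(entropy_bits_per_char("aaaaaaaa"))          # 0.0 (no variation)
```

A string clears the rule's threshold when this value exceeds 3.5.
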

Location

Rule | Example Match | Default
GPS Coordinates
Show pattern -?(?:[1-8]?[0-9](?:\.[0-9]{4,})?|90(?:\.0+)?)\s*,\s*-?(?:1[0-7][0-9]|[1-9]?[0-9])(?:\.[0-9]{4,})?
51.5074, -0.1278 Disabled
UK Postcode
Show pattern (?i)\b[A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}\b
SW1A 1AA Disabled
US Zip Code
Show pattern \b[0-9]{5}(?:-[0-9]{4})?\b
90210 Disabled

Date & Time

Rule | Example Match | Default
Date (ISO)
Show pattern \b(?:19|20)[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])\b
2024-01-15 Disabled
Date (MM/DD/YY)
Show pattern \b(?:0?[1-9]|1[0-2])[/-](?:0?[1-9]|[12][0-9]|3[01])[/-](?:19|20)?[0-9]{2}\b
01/15/24, 12/31/2024 Disabled
Date (DD/MM/YY)
Show pattern \b(?:0?[1-9]|[12][0-9]|3[01])[/-](?:0?[1-9]|1[0-2])[/-](?:19|20)?[0-9]{2}\b
15/01/24, 31/12/2024 Disabled
Time
Show pattern \b(?:[01]?[0-9]|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:\s*[AaPp][Mm])?\b
14:30:00, 2:30 PM Disabled
DateTime (ISO)
Show pattern \b(?:19|20)[0-9]{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])[T\s](?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:?[0-9]{2})?\b
2024-01-15T10:30:00Z Disabled
DateTime (CLF)
Show pattern \[?\d{1,2}/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/\d{4}:\d{2}:\d{2}:\d{2}\s*[+-]?\d{4}\]?
[15/Jan/2024:10:30:00 +0000] Disabled
Unix Timestamp
Show pattern \b1[0-9]{9}(?:[0-9]{3})?\b
1705312200 Disabled

Date/time rules are disabled by default as they often match non-sensitive data.

SQL

Rule | Example Match | Default
SQL Tables
Show pattern (?i)(?:FROM|JOIN|INTO|UPDATE|TABLE)\s+(`[^`]+`|\[[^\]]+\]|"[^"]+"|[a-zA-Z_][a-zA-Z0-9_]*)
FROM users, INSERT INTO orders Disabled
SQL Strings
Show pattern '(?:[^'\\]|\\.)*'
'John Doe', "example@email.com" Disabled
SQL Identifiers
Show pattern (?i)(?:SELECT|WHERE|AND|OR|ON|SET|ORDER\s+BY|GROUP\s+BY|HAVING|AS|,)\s*(`[^`]+`|\[[^\]]+\])|(`[^`]+`|\[[^\]]+\])\.\s*(`[^`]+`|\[[^\]]+\])
column_name, table.field Disabled

SQL rules help scrub sensitive data from database queries and logs. Disabled by default to avoid false positives.

SQL Dump File Support

LogScrub processes SQL dump files from PostgreSQL (pg_dump), MySQL (mysqldump), SQLite (.dump), and other databases. Upload .sql files directly or paste SQL content.

When processing SQL dumps, LogScrub detects PII in:

  • INSERT statements — String literals in VALUES clauses
  • UPDATE statements — SET clause values
  • Comments — Both -- and /* */ style comments
  • String literals — Single and double quoted strings throughout

SQL structure is preserved — table names, column names, SQL keywords, and syntax remain untouched while only values are anonymized. This keeps your scrubbed dump valid and executable.
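
The values-only approach can be illustrated with a simplified sketch. This is not LogScrub's parser (which also handles UPDATE statements, comments, and double-quoted strings); it only replaces single-quoted literals after a VALUES keyword:

```python
import re

# Single-quoted SQL string literal, allowing backslash escapes.
LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'")

def scrub_insert(stmt: str) -> str:
    # Split at VALUES so identifiers and keywords before it stay untouched.
    head, sep, values = stmt.partition("VALUES")
    if not sep:
        return stmt
    return head + sep + LITERAL.sub("'[SCRUBBED]'", values)

stmt = "INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com');"
print(scrub_insert(stmt))
# INSERT INTO users (name, email) VALUES ('[SCRUBBED]', '[SCRUBBED]');
```
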

Learn more about SQL dump anonymization →

Exim (Mail Server)

Rule | Example Match | Default
Exim Subject
Show pattern T="(?:[^"\\]|\\.)*"
T="Meeting reminder" Disabled
Exim Sender
Show pattern F=<[^>]+>
F=<user@example.com> Disabled
Exim Auth
Show pattern (?i)A=[a-z_]+(?::[^\s]+)?
A=login:user Disabled
Exim User
Show pattern U=[^\s]+
U=mailuser Disabled
Exim DN
Show pattern DN=[^\s]+
DN=cn=user,dc=example,dc=com Disabled

Rules for Exim mail server logs. These follow Exim's well-documented log format with field prefixes (T=, F=, A=, U=, DN=).

Postfix (Mail Server)

Rule | Example Match | Default
Postfix From
Show pattern from=<[^>]*>
from=<user@example.com> Disabled
Postfix To
Show pattern to=<[^>]+>
to=<recipient@example.com> Disabled
Postfix Relay
Show pattern relay=[^\s,]+(?:\[[^\]]+\])?
relay=mail.example.com[192.168.1.1] Disabled
Postfix SASL User
Show pattern sasl_username=[^\s,]+
sasl_username=admin Disabled

Rules for Postfix mail server logs. Postfix is one of the most widely used MTAs on Linux systems.

Dovecot (IMAP/POP3)

Rule | Example Match | Default
Dovecot User
Show pattern user=<[^>]+>
user=<mailuser> Disabled
Dovecot Remote IP
Show pattern rip=[0-9a-fA-F.:]+
rip=192.168.1.100 Disabled
Dovecot Local IP
Show pattern lip=[0-9a-fA-F.:]+
lip=10.0.0.1 Disabled

Rules for Dovecot IMAP/POP3 server logs. Captures login events with username and IP addresses.

Sendmail (Mail Server)

Rule | Example Match | Default
Sendmail From
Show pattern from=<[^>]*>,
from=<user@domain.com>, Disabled
Sendmail Relay
Show pattern relay=[^\s,\[\]]+(?:\[[^\]]+\])?
relay=mail.example.com Disabled
Sendmail MsgID
Show pattern msgid=<[^>]+>
msgid=<abc123@host> Disabled

Rules for Sendmail mail server logs. One of the oldest Unix MTAs with a well-established log format.

SIP/VoIP

Rule | Example Match | Default
SIP Username
Show pattern (?i)username="[^"]+"
username="john.doe" Disabled
SIP Realm
Show pattern (?i)realm="[^"]+"
realm="sip.example.com" Disabled
SIP Nonce
Show pattern (?i)nonce="[^"]+"
nonce="abc123def456" Disabled
SIP Response
Show pattern (?i)response="[a-f0-9]+"
response="9f8e7d6c5b4a3210" Disabled
SIP From Name
Show pattern (?i)^From:\s*"[^"]*"
From: "John Doe" <sip:john@example.com> Disabled
SIP To Name
Show pattern (?i)^To:\s*"[^"]*"
To: "Jane Smith" <sip:jane@example.com> Disabled
SIP Contact
Show pattern (?i)^Contact:\s*<?sip:[^>]+>?
Contact: <sip:john@192.168.1.100:5060> Disabled
SIP URI
Show pattern sips?:[^\s<>@]+@[^\s<>;]+
sip:user@domain.com:5060 Disabled
SIP Call-ID
Show pattern (?i)^Call-ID:\s*[^\s]+
Call-ID: abc123@192.168.1.100 Disabled
SIP Branch
Show pattern (?i)branch=z9hG4bK[a-zA-Z0-9]+
branch=z9hG4bK-abc123 Disabled
SIP User-Agent
Show pattern (?i)^User-Agent:\s*[^\r\n]+
User-Agent: Oasis SIP Phone Disabled
SIP Via
Show pattern (?i)^Via:\s*SIP/2\.0/[^\r\n]+
Via: SIP/2.0/UDP 192.168.1.100:5060 Disabled

Rules for SIP (Session Initiation Protocol) traces used in VoIP systems. Useful for scrubbing packet captures or debug logs from phone systems.

Hashes

Rule | Example Match | Default
MD5 Hash
Show pattern (?i)\b[a-f0-9]{32}\b
d41d8cd98f00b204e9800998ecf8427e Disabled
SHA1 Hash
Show pattern (?i)\b[a-f0-9]{40}\b
da39a3ee5e6b4b0d3255bfef95601890afd80709 Disabled
SHA256 Hash
Show pattern (?i)\b[a-f0-9]{64}\b
e3b0c44298fc1c149afbf4c8996fb924...27ae41e4649b934ca495991b7852b855 Disabled

Cryptographic hash detection. Disabled by default as hashes are often legitimate identifiers, but can be enabled if you need to scrub file checksums or content hashes.

Other

Rule | Example Match | Default
UUID
Show pattern (?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b
550e8400-e29b-41d4-a716-446655440000 Enabled
File Path (Unix)
Show pattern (?:/(?:home|Users)/[a-zA-Z0-9_-]+(?:/[a-zA-Z0-9._-]+)+|/tmp(?:/[a-zA-Z0-9._-]+)+)
/home/john/documents/file.txt Disabled
File Path (Windows)
Show pattern (?i)[a-z]:\\(?:Users|Documents and Settings)\\[^\s\\]+(?:\\[^\s\\]+)*
C:\Users\John\Documents\file.txt Disabled
Docker Container ID
Show pattern (?i)\b[a-f0-9]{12}\b
abc123def456 Disabled
URL Parameters
Show pattern Captures query string key-value pairs from URLs
?user=john&token=abc123 Disabled

Custom Rules

Create your own detection rules using regular expressions for patterns specific to your organization.

Adding a Custom Rule

  1. Click the "+ Regex" button in the Rulesets panel (Custom Rules tab)
  2. Enter a descriptive name for the rule
  3. Enter your regex pattern (JavaScript syntax)
  4. Click "Add Rule"

Example: Company Employee IDs

Name: Employee ID

Pattern: EMP-[0-9]{6}

Matches: EMP-123456, EMP-000001

Example: Internal Hostnames

Name: Internal Servers

Pattern: \b[a-z]+-(?:prod|staging|dev)-[0-9]+\.internal\.company\.com\b

Matches: api-prod-01.internal.company.com
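
If you prefer to sanity-check a pattern outside the built-in tester, the two example rules above can be exercised with any regex engine. A quick Python check:

```python
import re

# The two custom-rule patterns from the examples above.
emp = re.compile(r"EMP-[0-9]{6}")
hosts = re.compile(r"\b[a-z]+-(?:prod|staging|dev)-[0-9]+\.internal\.company\.com\b")

assert emp.fullmatch("EMP-123456")
assert emp.fullmatch("EMP-000001")
assert not emp.fullmatch("EMP-12345")   # too few digits: no match
assert hosts.search("connecting to api-prod-01.internal.company.com ...")
print("both patterns behave as expected")
```
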

Tip

Test your regex pattern using the "View Pattern" button (⚙) which includes a pattern tester.

Plain Text Patterns

For exact text matches that don't require regex (like specific hostnames or identifiers), use Plain Text patterns.

Adding a Plain Text Pattern

  1. Click the "+ Text" button in the Rulesets panel (Custom Rules tab)
  2. Enter a label for the pattern
  3. Enter the exact text to match
  4. Click "Add Pattern"

When to Use Plain Text vs Regex

Use Plain Text | Use Regex
Specific hostnames: db-master.prod.internal | Pattern-based hostnames: db-\d+\.prod\.internal
Known usernames: admin_system | Username patterns: user_[a-z]+_[0-9]+
Fixed identifiers: ACME-CORP | Variable IDs: ACME-[A-Z]{3}-[0-9]+

Presets

Presets let you save and quickly switch between different rule configurations.

Built-in Presets

  • Minimal — Only critical PII (emails, SSN, credit cards)
  • Standard — Common PII without dates/paths
  • Paranoid — Everything enabled for maximum redaction
  • Dev/Debug — Focus on secrets, tokens, and credentials
  • GDPR — EU personal data (includes UK postcodes, IBAN)

Custom Presets

Save your current rule configuration as a preset:

  1. Configure your rules as desired
  2. Click "Presets" to expand the presets panel
  3. Enter a name and click "Save"

Import/Export

Share configurations between computers or team members using JSON export/import.

Multi-File Upload

Process multiple log files at once with batch operations. Upload files individually or as a ZIP archive, analyze and scrub them all together, and export the results as a single ZIP download.

Uploading Multiple Files

There are several ways to upload multiple files:

  • File picker — Click "Upload" and select multiple files (hold Ctrl/Cmd to select multiple)
  • Drag and drop — Drag multiple files onto the editor area
  • ZIP archive — Upload a .zip file containing text files (they will be automatically extracted)

Supported file types: .log, .txt, .json, .xml, .csv, .sql, .zip

Files Tab

When you upload multiple files, a "Files" tab appears:

  • File list — Shows all uploaded files with their names, sizes, and status
  • Status badges — Pending, Analyzing, Analyzed, Processing, Done, or Error
  • Detection count — Shows how many PII items were found in each file
  • File selection — Click a file to view it in the editor
  • Remove files — Click the X button to remove individual files, or "Clear All" to start over

Batch Operations

Process all files at once using the batch buttons at the top of the Files tab:

  • Analyze All — Run detection on all files to see what PII will be found (preview mode)
  • Scrub All — Process all files with your current rule settings
  • Export ZIP — Download all scrubbed files as a single ZIP archive

A progress bar shows which file is currently being processed during batch operations.

File Navigation

When in multi-file mode, a navigation bar appears above the editor showing:

  • Current file name
  • Position indicator (e.g., "2 of 5")
  • Previous/Next buttons to quickly switch between files

Combined Statistics

The Statistics view includes a toggle to show stats for either:

  • Current File — Detection counts for the currently selected file
  • All Files — Combined totals across all uploaded files

Cross-File Consistency

When Consistency Mode is enabled, the same PII value will receive the same replacement across ALL files in the batch. For example, if john@example.com appears in multiple files, it will always be replaced with [EMAIL-1] in every file.

Tip

For large batches, consider using "Analyze All" first to review what will be detected before running "Scrub All".

Limits

Maximum 50 files per batch. Maximum 100MB total combined size.

Organizing Rules

Customize the order of rule categories and individual rules to match your workflow.

Reordering Categories

Drag categories to change their display order:

  1. Grab the drag handle (⋮⋮) to the left of a category name
  2. Drag the category up or down to your preferred position
  3. Release to drop it in place

Reordering Rules Within a Category

  1. Expand the category by clicking its name
  2. Grab the drag handle (⋮⋮) to the left of a rule
  3. Drag the rule up or down within the category
  4. Release to set its new position

Persistence

Your custom ordering is automatically saved to your browser's local storage and will be restored on your next visit. When you save a preset, the ordering is included.

Tip

Put frequently-used categories at the top for quick access. For example, if you primarily work with network logs, drag the "Network" category to the top.

Time Shift

Timestamps in logs can reveal when incidents occurred, work patterns, or timezone information. Time Shift lets you anonymize temporal data while preserving the relative timing between events.

Note

Time Shift only appears when LogScrub detects timestamps in your input.

Supported Timestamp Formats

  • ISO 8601 — 2024-01-15T10:30:00Z
  • ISO Date — 2024-01-15
  • Apache error log — [Sat Aug 12 04:05:51 2006]
  • Apache access log — [17/May/2015:10:05:03 +0000]
  • Syslog — Jan 15 10:30:00
  • US Format — 01/15/2024 10:30:00

Offset Mode

Shift all timestamps by a fixed amount. Useful when you want to obscure the actual date/time while keeping the duration between events accurate.

Original
2024-01-15T10:30:00Z Request started
2024-01-15T10:30:05Z Database query
2024-01-15T10:30:07Z Response sent

Shifted by -48 hours
2024-01-13T10:30:00Z Request started
2024-01-13T10:30:05Z Database query
2024-01-13T10:30:07Z Response sent
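
The offset transformation itself is simple: parse each timestamp, add a fixed delta, and reformat. A sketch for ISO 8601 timestamps only (LogScrub handles the other formats listed above as well):

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
FMT = "%Y-%m-%dT%H:%M:%SZ"

def shift_line(line: str, offset: timedelta) -> str:
    # Shift every ISO 8601 timestamp on the line by the same offset,
    # so the relative timing between events is preserved.
    def repl(m):
        t = datetime.strptime(m.group(0), FMT) + offset
        return t.strftime(FMT)
    return STAMP.sub(repl, line)

print(shift_line("2024-01-15T10:30:00Z Request started", timedelta(hours=-48)))
# 2024-01-13T10:30:00Z Request started
```
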

Start From Mode

Set the first timestamp to a specific date/time. All subsequent timestamps shift by the same amount, preserving relative timing.

Scope: Line Start vs All Timestamps

Logs often contain two types of timestamps:

  • Log timestamps — At the start of each line
  • Content dates — Within the log message itself (DOBs, expiry dates, etc.)

Scope | Best For
Line Start | Shifting log timing while sanitizing content dates with detection rules
All Timestamps | When all dates should shift together (e.g., test data generation)

Tip

Use "Line Start" scope with date/time detection rules enabled. Log timestamps get shifted while content dates like DOBs get scrubbed to [DATE-1].

Log Crop

When working with large log files that span many hours or days, you often only need a specific time window. The Crop tool lets you trim a log file to a precise time range, keeping only the lines you need.

Note

The Crop button appears above the Original pane when LogScrub detects timestamps in your input. It uses the same timestamp formats supported by Time Shift.

How to Use

  1. Paste or upload a log file with timestamps
  2. Click the Crop link above the Original pane
  3. Review the detected time range, duration, and line count
  4. Select the time window you want to keep
  5. Click Crop to trim the log

Custom Range

Set a specific start and end time to keep. Any lines with timestamps outside this range are removed. Lines without timestamps (such as stack traces or continuation lines) are kept if they follow a line within the selected range.
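
The continuation-line behavior can be sketched as follows. This illustrative version handles only ISO-style line-start timestamps; untimestamped lines inherit the keep/drop decision of the previous timestamped line:

```python
import re
from datetime import datetime

STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")

def crop(lines, start, end):
    kept, in_range = [], False
    for line in lines:
        m = STAMP.match(line)
        if m:
            t = datetime.fromisoformat(m.group(1))
            in_range = start <= t <= end
        # Lines without a timestamp (stack traces, continuations)
        # keep the decision from the previous timestamped line.
        if in_range:
            kept.append(line)
    return kept

log = [
    "2024-01-15T10:00:00 boot",
    "2024-01-15T10:30:00 error",
    "  Traceback (continuation line)",
    "2024-01-15T11:00:00 shutdown",
]
window = crop(log, datetime(2024, 1, 15, 10, 15), datetime(2024, 1, 15, 10, 45))
print(window)
# ['2024-01-15T10:30:00 error', '  Traceback (continuation line)']
```
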

Start + Duration

Select a start time and then choose a duration preset. This is useful when you know the approximate start of an incident and want to capture a fixed window of time.

Available presets: 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, 8 hours, 12 hours, and 24 hours.

Tip

Cropping replaces the original text. If you need to go back, use your browser's undo or re-paste the original log. Crop before scrubbing for faster processing on large files.

Analyze Mode

Preview detections before scrubbing to fine-tune your rules.

Using Analyze

  1. Paste your content
  2. Click "Analyze" above the Original pane
  3. Review highlighted matches (shown in red)
  4. Check the suggestions for disabled rules that would match
  5. Adjust rules as needed, then Scrub

Smart Suggestions

After analysis, LogScrub suggests disabled rules that found matches in your text. This helps you discover rules you might want to enable.

Log Format Detection

When you run Analyze, LogScrub automatically detects common log formats and offers to load a suitable preset. This saves time by enabling the most relevant rules for your log type.

Log Format | Detection Method | Suggested Preset
Apache/Nginx | Common Log Format (CLF) with HTTP methods | nginx / Apache
AWS CloudTrail/CloudWatch | AWS ARNs, eventSource, eventName patterns | AWS CloudWatch
SSH/Auth Logs | sshd, sudo, pam_unix, failed password messages | Auth / SSH Logs
Email Headers | Multiple Received: headers | Opens Email Routing visualization
SIP/VoIP Traces | SIP protocol headers (Via, From, To, Call-ID) | SIP / VoIP

When a log format is detected, a colored banner appears above the editor with a button to load the recommended preset. You can dismiss the banner if you prefer to configure rules manually.

ML Name Detection

LogScrub includes optional machine learning-based detection for identifying person names, locations, and organizations that pattern-based rules might miss.

Privacy First: All ML processing happens entirely in your browser. Your data never leaves your device. The model is downloaded once and cached locally in your browser's storage.

How It Works

ML Name Detection uses a pre-trained Named Entity Recognition (NER) model to identify entities in text:

  • PER (Persons) - Names of people (e.g., "John Smith", "Dr. Sarah Johnson")
  • LOC (Locations) - Place names (e.g., "London", "Silicon Valley")
  • ORG (Organizations) - Company and organization names (e.g., "Microsoft", "NHS")

Technology

  • Library: Transformers.js by Hugging Face
  • Model: BERT-based NER models (DistilBERT or BERT Base)
  • Format: ONNX for efficient browser execution via WebAssembly
  • Caching: Models are cached in IndexedDB after first download

Enabling ML Detection

  1. Click the Settings button in the toolbar
  2. Select a model under ML Name Detection (DistilBERT recommended for balance of speed/accuracy)
  3. Click Download Model (only required once — cached models auto-load on startup)
  4. Once "Ready" appears, click Run ML Analysis or it will run automatically during Analyze

Available Models

Model | Size | Speed | Accuracy
DistilBERT NER | ~250 MB | Fast | Good
BERT Base NER | ~420 MB | Slower | Best
BERT Base NER (uncased) | ~420 MB | Slower | Best (case-insensitive)

ML Detection Rules

When ML detection is enabled, three additional rules appear in the ML Detection category:

  • Person Names (ML) - Names identified by the ML model
  • Locations (ML) - Place names identified by the ML model
  • Organizations (ML) - Company/org names identified by the ML model

These rules are automatically enabled when you turn on ML Name Detection, but you can disable individual entity types if needed.

When to Use ML Detection

ML detection is most useful when:

  • Text contains person names not in email or username formats
  • You need to detect organization names
  • You need to identify location names
  • Pattern-based rules are missing names in free-form text

Note: ML models can produce false positives (detecting non-names as names) or false negatives (missing actual names). Always review results, especially for unfamiliar text patterns.

Syntax Validation

LogScrub automatically validates the syntax of structured file formats when you paste or upload content. This helps catch malformed files before processing.

Supported Formats

  • JSON - Objects, arrays, and nested structures
  • XML - Including SVG, HTML, GPX, and other XML-based formats
  • CSV - Validates consistent column counts across rows
  • YAML - Configuration files and structured data
  • TOML - Configuration files (e.g., Cargo.toml, pyproject.toml)

Format Detection

The format is detected automatically by:

  1. File extension - When you upload a file, the extension determines the format
  2. Content analysis - For pasted content, LogScrub examines the structure (e.g., starts with { or [ for JSON, < for XML)
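A minimal sketch of content-based format sniffing, using the leading-character and column-count heuristics described above (a simplification of whatever LogScrub actually does):

```python
def sniff_format(text):
    """Guess a structured format from the content itself."""
    s = text.lstrip()
    if s.startswith(("{", "[")):
        return "JSON"
    if s.startswith("<"):
        return "XML"
    # CSV heuristic: every non-empty line has the same comma count (> 0)
    lines = [l for l in s.splitlines() if l]
    if len(lines) > 1:
        counts = {l.count(",") for l in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return "CSV"
    return None

print(sniff_format('{"a": 1}'))       # JSON
print(sniff_format("<svg></svg>"))    # XML
print(sniff_format("a,b\n1,2\n3,4"))  # CSV
```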

Validation Results

After analysis completes:

  • Valid syntax - A green checkmark appears next to the "Original" heading. Hover over it to see the detected format (e.g., "Valid JSON syntax").
  • Invalid syntax - A red error banner appears showing the format, line number, column number, and error message. Click the line number to scroll directly to the error location.

Tip

Syntax validation runs in WebAssembly for speed. Even large files are validated almost instantly.

Context-Aware Detection

Beyond regex patterns, LogScrub can detect potential secrets by analyzing JSON structures and identifying suspicious key names. This is especially useful for structured logs that contain key-value pairs.

How It Works

When you run Analyze, LogScrub automatically:

  1. Detects JSON content in your text (pure JSON, NDJSON, or embedded JSON in log lines)
  2. Parses the JSON and walks through all key-value pairs
  3. Flags values associated with suspicious keys like "password", "token", "api_key", etc.
  4. Shows findings in the Context-Aware tab within Smart Suggestions

Supported JSON Formats

Format | Example
Pure JSON | {"database": {"password": "secret123"}}
JSON Lines (NDJSON) | Multiple JSON objects, one per line (common in logs)
Embedded JSON | 2024-01-15 INFO Request: {"user": "john", "token": "abc123"}

Suspicious Keys

LogScrub looks for keys that commonly contain sensitive data:

High Confidence

Exact matches: password, passwd, pwd, secret, token, api_key, apikey, access_key, private_key, client_secret, auth_token, access_token, refresh_token, bearer, jwt, ssh_key, passphrase, credential, credentials

Medium Confidence

Pattern matches: keys ending in _key, _token, _secret, or _password; keys containing auth or cred; keys ending in pass
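Putting the two confidence tiers together, the key classification and JSON walk might look like this sketch (the key lists mirror the ones above; the traversal details are assumptions, not LogScrub's actual code):

```python
HIGH_CONFIDENCE = {
    "password", "passwd", "pwd", "secret", "token", "api_key", "apikey",
    "access_key", "private_key", "client_secret", "auth_token",
    "access_token", "refresh_token", "bearer", "jwt", "ssh_key",
    "passphrase", "credential", "credentials",
}

def classify_key(key):
    """Return 'high', 'medium', or None for a JSON key name."""
    k = key.lower()
    if k in HIGH_CONFIDENCE:
        return "high"
    if (k.endswith(("_key", "_token", "_secret", "_password", "pass"))
            or "auth" in k or "cred" in k):
        return "medium"
    return None

def walk(obj, path=""):
    """Recursively flag string values stored under suspicious keys."""
    findings = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            p = f"{path}.{key}" if path else key
            level = classify_key(key)
            if level and isinstance(value, str):
                findings.append((p, level, value))
            findings.extend(walk(value, p))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            findings.extend(walk(item, f"{path}[{i}]"))
    return findings

doc = {"config": {"db_password": "MyS3cr3tP@ss!", "host": "db1"}}
print(walk(doc))  # [('config.db_password', 'medium', 'MyS3cr3tP@ss!')]
```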

Using Context-Aware Findings

After analysis, check the Context-Aware tab in Smart Suggestions. Each finding shows:

  • Key name — The suspicious key that triggered detection
  • Confidence level — High (exact match) or Medium (pattern match)
  • JSON path — Full path like config.database.password
  • Value preview — A sample of the detected value

Click "Add to scrub" to create a plain-text pattern that will redact that specific value. This is useful when you find a secret that wasn't caught by the regex-based rules.

JSON Log Entry
{"config": {"db_password": "MyS3cr3tP@ss!"}}

Context-Aware Detection
Key: db_password
Path: config.db_password
Confidence: Medium (_password suffix)
Value: MyS3cr3tP@ss!

Complementing Regex Detection

Context-aware detection complements (not replaces) the regex-based rules:

  • Regex rules — Detect secrets by their format (JWT structure, API key prefixes, etc.)
  • Context-aware — Detect secrets by their semantic context (what key they're assigned to)

Using both approaches together provides more comprehensive coverage for secrets that might not have a recognizable format but are stored under suspicious key names.

Tip

If you frequently work with JSON logs that contain secrets in predictable key names, consider adding custom plain-text patterns for those specific values after discovering them via context-aware detection.

Email Header Analysis

When LogScrub detects email headers (multiple Received: headers), it offers to visualize the email's routing path through mail servers.

Email Routing Visualization

Click "View Email Routing" in the blue banner to open a visual diagram showing:

  • Server chain — Each mail server the message passed through, from origin to destination
  • Timestamps — When each server received the message
  • Transit time — Duration between each hop (shown on the connector lines)
  • TLS encryption — A padlock icon indicates encrypted transmission, with TLS version and cipher details on hover
  • Protocol — The mail protocol used (SMTP, ESMTP, LMTPS, etc.)

What Gets Parsed

Header | Information Extracted
Received: | Server hostnames, IP addresses, timestamps, TLS info, protocol
Date: | Original send time (used as the origin timestamp)
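Parsing Received: headers can be sketched with Python's standard library. Per RFC 5322 the timestamp follows the final semicolon, and servers prepend these headers as the message travels, so they read newest-first. The hostnames below are made up:

```python
from email.utils import parsedate_to_datetime

headers = [
    "Received: from mx2.example.net by inbox.example.net; Mon, 15 Jan 2024 10:00:05 +0000",
    "Received: from sender.example.org by mx2.example.net; Mon, 15 Jan 2024 10:00:02 +0000",
]

hops = []
for h in headers:
    # Timestamp sits after the last semicolon; "from <host>" names the sender
    ts = parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
    server = h.split("from ", 1)[1].split()[0]
    hops.append((server, ts))

# Received: headers are prepended, so reverse to read origin -> destination
hops.reverse()
for (a, t1), (b, t2) in zip(hops, hops[1:]):
    print(f"{a} -> {b}: {(t2 - t1).total_seconds():.0f}s")
# sender.example.org -> mx2.example.net: 3s
```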

Understanding the Diagram

The routing diagram reads from top to bottom:

  1. Origin — The sending server (extracted from the first Received: header's "from" field)
  2. Intermediate servers — Mail relays, spam filters, or corporate gateways
  3. Destination — The final receiving server

TLS Indicator

A padlock icon on the connector line indicates the transmission was encrypted. Hover over it to see the TLS version (e.g., TLSv1.3) and cipher suite used.

Use Cases

  • Debugging email delivery — Identify where delays occur in the delivery chain
  • Security analysis — Verify TLS encryption was used throughout the route
  • Spam investigation — Trace the origin of suspicious emails
  • Compliance — Document that email was transmitted securely

Spam Report Parsing

LogScrub detects and parses spam filter reports from SpamAssassin and rspamd, displaying them in an easy-to-read sortable table.

Supported Formats

Filter | Header | Format
SpamAssassin | X-Spam-Report | Table with pts, rule name, and description columns
rspamd | X-Spam-Report | Symbol: RULE_NAME(score) format with Action field

Report View Features

  • Sortable table — Click column headers to sort by rule name or score
  • Color-coded rules — Green for ham (negative scores), red for spam (positive scores)
  • Total score — Combined score with visual indicator
  • Action taken — Shows what the filter decided (no action, add header, reject, etc.)
  • Rule breakdown — Count of ham, neutral, and spam rules

Multi-line Headers

Spam reports are often split across multiple lines in email headers. LogScrub automatically handles header continuation lines (lines starting with whitespace) and reassembles the complete report.
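Header unfolding is simple to sketch: any line that begins with whitespace continues the previous header (RFC 5322 folding). The report text below is illustrative:

```python
def unfold_headers(raw):
    """Reassemble folded headers: a line starting with whitespace
    continues the previous header line."""
    headers = []
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") and headers:
            headers[-1] += " " + line.strip()
        else:
            headers.append(line)
    return headers

raw = (
    "X-Spam-Report: 5.2 points\n"
    "\t* 2.5 BAYES_99 Bayes spam probability is 99 to 100%\n"
    "\t* 2.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist\n"
    "Subject: hello\n"
)
unfolded = unfold_headers(raw)
print(unfolded)  # two headers: the reassembled report and the Subject
```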

Multiple Reports

Some mail systems run multiple spam filters. When both SpamAssassin and rspamd reports are present (e.g., X-Spam-Report and X-Spam-Report-Secondary), LogScrub detects both and displays them with tabs to switch between the two reports for comparison.

Tip

When an amber banner appears saying "spam reports detected", click "View Reports" to see the parsed rules. If multiple reports are present, use the tabs at the top to switch between them.

GPX Route Transposition

GPX files contain GPS track data from fitness devices, cycling computers, and navigation apps. When sharing routes for debugging or analysis, you may want to hide your actual location while preserving all other statistics.

How It Works

When LogScrub detects a GPX file (by extension or content), a green banner offers to transpose the route to a different continent. The transposition:

  • Shifts all coordinates — Moves the entire route to a new location
  • Preserves route shape — All turns, distances, and geometry remain identical
  • Keeps timestamps — Duration and timing data unchanged
  • Retains elevation — All elevation data preserved
  • Includes waypoints — Waypoints and route points also transposed

Destination Regions

Choose from six destination regions:

Region | Target Area
Europe | Paris, France
North America | New York, USA
South America | São Paulo, Brazil
Asia | Tokyo, Japan
Oceania | Sydney, Australia
Africa | Cape Town, South Africa
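The transposition amounts to shifting every point by the offset from the route's center to the target center, which preserves relative geometry. A sketch follows; the target coordinates are illustrative, not necessarily LogScrub's anchors, and a plain lat/lon shift slightly distorts east-west distances when the latitude changes:

```python
# Illustrative target centers, not LogScrub's documented anchor points
TARGETS = {"Europe": (48.8566, 2.3522), "Asia": (35.6762, 139.6503)}

def transpose(points, region):
    """Shift every (lat, lon) so the route's center lands on the
    target; relative offsets (the route shape) are preserved."""
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    t_lat, t_lon = TARGETS[region]
    return [(lat + (t_lat - lat_c), lon + (t_lon - lon_c))
            for lat, lon in points]

route = [(51.50, -0.12), (51.51, -0.11), (51.52, -0.12)]  # around London
moved = transpose(route, "Europe")
# Relative offsets between consecutive points are unchanged
print(round(moved[1][0] - moved[0][0], 6))  # 0.01
```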

Route Statistics

Before transposition, the modal displays:

  • Track name and detected region
  • Number of GPS points
  • Total duration
  • Elevation range
  • Center coordinates

Use Case

Perfect for sharing cycling or running routes in bug reports without revealing where you live or work. The route shape and performance data remain valid for debugging.

Shifting Timestamps

GPX files also contain timestamps for each point. Use the Time Shift feature (in the toolbar) to shift all timestamps by a fixed offset or to a new start date. This adds another layer of privacy by obscuring when the activity occurred.
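A fixed-offset time shift can be sketched as a regex substitution over ISO timestamps; intervals between events survive because every stamp moves by the same amount (an illustration of the idea, not LogScrub's implementation):

```python
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def shift_timestamps(text, offset):
    """Shift every ISO timestamp in the text by a fixed offset."""
    def repl(m):
        ts = datetime.fromisoformat(m.group(0))
        return (ts + offset).isoformat()
    return STAMP.sub(repl, text)

log = "2024-01-15T10:00:00 start\n2024-01-15T10:00:07 done"
shifted = shift_timestamps(log, timedelta(days=-30))
print(shifted)
# 2023-12-16T10:00:00 start
# 2023-12-16T10:00:07 done
```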

More GPX/FIT Tools

For more advanced GPX and FIT file manipulation (merging, splitting, editing, converting), visit skeffling.net/gpxfit.

Audit Reports

Generate detailed reports of all detected PII for compliance documentation.

Report Contents

  • Timestamp and source file name
  • Summary of detection counts by type
  • List of unique detected values (up to 100 per type)

Export Formats

  • Text (.txt) — Plain text, easy to read
  • JSON (.json) — Machine-readable, for integration
  • HTML (.html) — Formatted report for sharing

Access audit reports by clicking the detection count badge, then "Download Audit Report".

RTF Export with Highlighting

For a visual export of your scrubbed output, use the RTF download button (shown in green). This generates a Rich Text Format file with all replacements highlighted in green, matching the diff view appearance.

  • Visual review — Easily spot all changes at a glance
  • Shareable — RTF opens in Word, LibreOffice, TextEdit, and most word processors
  • Printable — Great for compliance documentation or review meetings

The RTF export uses a light green background with dark green text for replacements, making it easy to see what was scrubbed while keeping the document readable.

Reverse Lookup

When sharing scrubbed logs with others for analysis, you may receive feedback like "the IP on line 100 is causing the issue." LogScrub provides several ways to map scrubbed values back to their originals.

Hover Tooltips

Hover your mouse over any replacement in the scrubbed output (e.g., [IP-1]). A tooltip will show the original value, type, and line numbers.

Mapping Table

For a complete overview, open the Stats panel and click the Mapping tab. This shows a searchable table with every unique replacement.

Export Mapping Dictionary

  1. Open Stats → Mapping tab
  2. Click Export
  3. Choose JSON (for scripts) or CSV (for spreadsheets)

JSON Format Example

{
  "[EMAIL-1]": {
    "original": "john.doe@example.com",
    "type": "email",
    "count": 3,
    "lines": [12, 45, 89]
  }
}
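A script consuming the exported JSON mapping only needs a dictionary lookup. A minimal sketch:

```python
import json

# Exported mapping in the format shown above
mapping = json.loads("""
{
  "[EMAIL-1]": {"original": "john.doe@example.com",
                "type": "email", "count": 3, "lines": [12, 45, 89]}
}
""")

def lookup(token):
    """Resolve a scrubbed token back to its original value."""
    entry = mapping.get(token)
    return entry["original"] if entry else None

print(lookup("[EMAIL-1]"))  # john.doe@example.com
```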

Tip

Enable Consistency Mode before scrubbing to ensure the same original value always gets the same replacement label.

AI Explain

When sharing scrubbed logs with AI assistants like ChatGPT, Claude, or Copilot, they need context to understand the redaction format. The AI Explain feature generates a ready-to-use explanation.

How to Use

  1. Scrub your log as usual
  2. Click the "AI Explain" button in the scrubbed output toolbar
  3. Copy the generated explanation
  4. Paste it into your AI chat before pasting the scrubbed log

What's Included

The generated explanation includes:

  • Replacement strategies — Describes each strategy used (Label, Fake, Fake (Country), Redact, Template)
  • Consistency mode status — Tells the AI whether same tokens mean same values
  • Detected types table — Lists each PII type found with its replacement strategy and example format
  • Interpretation guidelines — Instructions for how the AI should reference replacements

Example Output

## Log Redaction Context

This log has been sanitized using LogScrub...

### Detected & Replaced Data Types

| Type | Strategy | Example Replacement | Count |
|------|----------|---------------------|-------|
| Email | Label | `[EMAIL-1]` | 5 |
| IPv4 | Fake | `142.58.201.33` | 12 |
| Hostname | Template | `<HOSTNAME-1>` | 3 |

Tip

The explanation reflects each rule's individual replacement strategy. If you use different strategies for different types (e.g., Label for emails, Fake for IPs), the AI will know exactly what each replacement format means.

Documents & Spreadsheets

LogScrub can scrub PII from document and spreadsheet files, not just plain text. All processing happens client-side in your browser using WebAssembly.

Supported Formats

Format | Extension | Scrubbing | Preview
PDF | .pdf | True redaction (text removed) | Full page rendering
Word Document | .docx | Full text replacement | Formatted preview
Excel Spreadsheet | .xlsx | Full text replacement | Table view
OpenDocument Text | .odt | Full text replacement | Basic text preview
OpenDocument Spreadsheet | .ods | Full text replacement | Table view

How Document Processing Works

PDF Files

PDFs are rendered using MuPDF (WebAssembly). Due to the complexity of PDF format, scrubbing uses redaction only — detected PII is covered with black boxes rather than replaced with labels or fake data. This ensures the document structure remains intact.

True Redaction

The underlying text is permanently removed from the PDF, not just visually covered. The redacted text cannot be recovered by selecting, copying, or using PDF extraction tools.

Tip

The PDF preview shows page-by-page rendering with match counts per page, helping you verify all sensitive data was found.

Word Documents (.docx)

DOCX files are ZIP archives containing XML. LogScrub extracts the document content, applies your scrubbing rules, and repackages the file. The preview uses the docx-preview library to show formatted content including bold, italic, tables, and images.

Excel Spreadsheets (.xlsx)

XLSX files are processed using excelize-wasm. Cell contents are scrubbed while preserving the spreadsheet structure, formula references, and formatting. The preview shows a table view of each sheet.

Note

Legacy .xls format (Excel 97-2003) is not supported. Please save as .xlsx format first.

OpenDocument Files (.odt, .ods)

LibreOffice/OpenOffice formats are ZIP archives with XML content, similar to Microsoft Office formats. LogScrub extracts and scrubs the content XML while preserving document structure.

Document Preview Features

  • Split view — See original and scrubbed documents side by side
  • ScrollSync — Original and scrubbed previews scroll together
  • Resizable panels — Drag the resize handle to adjust preview height
  • Match highlighting — PDF preview shows match count per page

Document Metadata

Documents often contain metadata that may include sensitive information:

  • Author name and company
  • Creation and modification dates
  • Application used to create the document
  • Revision history and editing time

When you upload a document, LogScrub automatically checks for metadata. If found, you'll see a dialog showing all detected metadata fields and can choose to:

  • Remove Metadata — Strip all metadata from the downloaded file
  • Keep Metadata — Preserve original metadata in the downloaded file

Tip

The metadata choice you make at upload time applies when you download. If you upload a new file, you'll be asked again.

Downloading Scrubbed Documents

After scrubbing, click Download to save the sanitized document. The output file preserves the original format — a scrubbed .docx remains a .docx that can be opened in Word.

Keyboard Shortcuts

Shortcut | Action
⌘/Ctrl + Enter | Scrub the input text
⌘/Ctrl + S | Download scrubbed output
⌘/Ctrl + G | Go to line number
Escape | Cancel processing / Close dialogs

Working with Network Captures

LogScrub can scrub network packet captures (PCAP files), but they must first be converted to text format.

Converting PCAP to Text

Using tcpdump

tcpdump -r capture.pcap -tttt > capture.txt

Using tshark (Wireshark CLI)

tshark -r capture.pcap > capture.txt

Using Wireshark

  1. Open the PCAP file in Wireshark
  2. Go to File → Export Packet Dissections → As Plain Text
  3. Save the text file, then upload to LogScrub

What Gets Detected

Data Type | Detection Rule | Example
IP addresses with ports | IPv4, IPv6 | 192.168.1.1:8080
MAC addresses | MAC Address | 00:1A:2B:3C:4D:5E
Hostnames in DNS/HTTP | Hostname, URL | api.example.com
HTTP headers | Various token rules | Auth tokens, cookies

File Size & Performance

LogScrub runs entirely in your browser, so performance depends on your device's capabilities.

File Size | Performance
< 10 MB | Fast, smooth processing
10–50 MB | Works well, may take a few seconds
50–100 MB | Slower processing, may take 10–30 seconds
> 100 MB | May cause browser memory issues

Tips for Large Files

  • Close other browser tabs to free up memory
  • Use a desktop browser rather than mobile
  • Consider splitting extremely large log files
  • The virtual scrolling feature helps keep the UI responsive
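If you do split a very large file before uploading, a sketch like this keeps line boundaries intact (the 100,000-line default is an arbitrary choice, not a LogScrub limit):

```python
def split_log(text, max_lines=100_000):
    """Yield chunks of at most max_lines complete lines so each
    piece can be scrubbed separately."""
    lines = text.splitlines(keepends=True)
    for i in range(0, len(lines), max_lines):
        yield "".join(lines[i:i + max_lines])

chunks = list(split_log("a\nb\nc\nd\n", max_lines=2))
print(chunks)  # ['a\nb\n', 'c\nd\n']
```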

Privacy & Security

Important

LogScrub runs 100% in your browser. Your data is never uploaded to any server.

How We Protect Your Data

  • Client-side processing — All scrubbing happens in your browser using WebAssembly
  • No server communication — Your text never leaves your device
  • No analytics on content — We don't track what you scrub
  • Local storage only — Presets and settings are stored in your browser

Disclaimer

LogScrub does not guarantee detection of all PII. Pattern-based detection has inherent limitations. Always review your scrubbed output before sharing.

Best Practices

  • Always review the scrubbed output before sharing
  • Use "Analyze" first to understand what will be detected
  • Enable additional rules for sensitive data types not detected by default
  • Add custom patterns for organization-specific identifiers
  • Use the "Changed only" filter to quickly review modified lines