Why Use LogScrub?
Every day, developers and IT professionals need to share logs for debugging, support tickets, bug reports, and collaboration. But logs often contain sensitive information that shouldn't be shared.
Common Scenarios
Sharing Logs with AI Assistants
AI tools like ChatGPT, Claude, and Copilot are incredibly useful for debugging and analyzing logs. But pasting raw logs means sending customer emails, IP addresses, API keys, and other sensitive data to third-party services.
Solution: Scrub your logs first. The AI can still understand error patterns, stack traces, and timing issues without seeing real customer data.
Filing Bug Reports & Support Tickets
When reporting issues to software vendors or open-source projects, you often need to include logs. These logs may contain your company's internal hostnames, user data, or credentials that were accidentally logged.
Compliance & Data Protection
Regulations like GDPR, HIPAA, and CCPA restrict how personal data can be shared and processed. Sanitizing logs before sharing helps maintain compliance and reduces your data exposure footprint.
Why Client-Side Processing Matters
Many log sanitization tools require uploading your logs to a server. This defeats the purpose — you're sharing sensitive data with yet another third party.
LogScrub processes everything in your browser using WebAssembly. Your logs never leave your device. You can even use it offline or on an air-gapped machine.
Preserving Log Usefulness
The goal isn't just to remove data — it's to keep logs useful for their intended purpose:
- Consistency Mode ensures the same email always becomes [EMAIL-1], so you can still trace a user's journey
- Time Shift lets you anonymize timestamps while preserving the relative timing between events
- Log Crop trims log files to a specific time window, so you only share the relevant portion
- Selective Rules let you keep non-sensitive data like UUIDs or timestamps when needed for debugging
Quick Start
LogScrub helps you remove personally identifiable information (PII) and sensitive data from log files before sharing them.
Basic Workflow
- Paste or upload your log content into the "Original" pane
- Review detection rules by clicking the Rulesets button (toggle rules on/off, review matches)
- Click "Scrub" or press ⌘/Ctrl + Enter
- Copy or download the scrubbed output
How It Works
LogScrub uses pattern matching to identify sensitive data in your text. Each detection rule has a regular expression (regex) that matches specific data formats.
Processing Pipeline
- Pattern Matching — Each enabled rule's regex is applied to find matches
- Replacement — Matched text is replaced according to your chosen strategy
- Consistency — When enabled, identical values get identical replacements
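The pipeline above can be sketched in a few lines of Python. This is an illustration, not LogScrub's implementation; the two rules and the label format are simplified stand-ins for the real, configurable rule set:

```python
import re

# Two stand-in rules; LogScrub ships 95+ configurable rules.
RULES = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
        r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
    ),
}

def scrub(text: str) -> str:
    seen = {}      # consistency: identical values get identical replacements
    counters = {}  # per-type counter for [TYPE-n] labels
    for label, pattern in RULES.items():
        def replace(match, label=label):
            value = match.group(0)
            if value not in seen:
                counters[label] = counters.get(label, 0) + 1
                seen[value] = f"[{label}-{counters[label]}]"
            return seen[value]
        text = pattern.sub(replace, text)
    return text

print(scrub("user bob@example.com from 10.1.2.3; bob@example.com again"))
# user [EMAIL-1] from [IPV4-1]; [EMAIL-1] again
```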
Use the "Analyze" feature to preview what will be detected before sanitizing. This also suggests disabled rules that would match content in your text.
Only enable rules for data types you expect to find in your content. Enabling all rules will likely cause false positives — for example, the "US Zip Code" pattern matches any 5-digit number. Be selective to get accurate results.
Replacement Strategies
Choose how detected PII should be replaced:
Label
Replaces with a descriptive label and counter.
john@example.com → [EMAIL-1]
Fake
Replaces with realistic fake data. Preserves structural prefixes (e.g. ICCID 89, BTC bc1).
john@example.com → maria.wilson@example.org
Fake (Country)
Fake data that preserves country-specific prefixes like phone country codes and TLDs.
+447508804412 → +447291635804
Redact
Replaces with blocks matching the original length.
john@example.com → ████████████████
Template
Custom replacement format using variables.
{TYPE}, {n}, {len}
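A sketch of how template expansion might work. The variable semantics are inferred from their names and are an assumption, not confirmed behavior: {TYPE} as the rule label, {n} as the per-type counter, {len} as the length of the original matched text.

```python
def apply_template(template: str, pii_type: str, n: int, original: str) -> str:
    # Assumed semantics: {TYPE} = rule label, {n} = per-type counter,
    # {len} = length of the original matched text.
    return (template
            .replace("{TYPE}", pii_type)
            .replace("{n}", str(n))
            .replace("{len}", str(len(original))))

print(apply_template("<{TYPE}:{n}:{len}>", "EMAIL", 1, "john@example.com"))
# <EMAIL:1:16>
```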
Fake Data Generation
The Fake strategy uses a Rust-based data generation library to create realistic-looking replacements. This makes your scrubbed output look natural while still protecting sensitive information. Fake data is generated deterministically — the same input always produces the same fake output, ensuring consistency.
| PII Type | Fake Data Generated | Example |
|---|---|---|
| Email | Realistic email addresses | maria.wilson@example.org |
| Person Names (ML) | Full names from name database | James Rodriguez |
| Locations (ML) | City names | Portland |
| Organizations (ML) | Company names | Acme Industries |
| IPv4 | Valid IPv4 addresses | 142.58.201.33 |
| IPv6 | Valid IPv6 addresses | 2001:db8:85a3::8a2e:370:7334 |
| MAC Address | Valid MAC addresses | 4A:3B:2C:1D:5E:6F |
| Phone Numbers | Formatted phone numbers | (555) 842-9173 |
| Hostname | Realistic hostnames | server42.internal.net |
| URL | Valid URLs with paths | https://demo.io/api/4821 |
| UUID | Valid UUID v4 format | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d |
| Credit Card | Luhn-valid card numbers | 4111-1111-1111-1234 |
| SSN | Valid format (not real) | 284-17-5932 |
| IBAN | Plausible IBAN format | DE89370400440532013000 |
| UK NHS Number | Check-digit valid format | 485 293 7164 |
| UK NINO | Valid NI number format | AB123456C |
| UK Postcode | Valid UK postcode format | SW1A 2AA |
| US Zip Code | 5-digit zip codes | 90210 |
| GPS Coordinates | Valid lat/long pairs | 51.5074, -0.1278 |
| File Paths | Realistic file paths | /home/user/data.json |
| Dates | Valid dates (randomized) | 2019-07-15 |
| Hashes (MD5/SHA) | Random hex strings | 5d41402abc4b2a76... |
| API Keys | Format-matching tokens | sk_test_Abc123xyz... |
| Crypto Addresses | Valid format addresses | 1A1zP1eP5QGefi2DMPTfTL... |
Fake data is generated using a seeded random number generator based on the original value. This means the same input (e.g., john@example.com) will always produce the same fake output across multiple runs. Combined with Consistency Mode, this ensures your scrubbed data maintains referential integrity.
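The seeding idea can be sketched in Python. LogScrub's actual generator is Rust compiled to WebAssembly, and the name lists below are invented for illustration:

```python
import hashlib
import random

# Hypothetical name lists; the real generator draws from larger databases.
FIRST = ["maria", "james", "sofia", "liam"]
LAST = ["wilson", "rodriguez", "chen", "patel"]

def fake_email(original: str) -> str:
    # Seed the RNG from a hash of the original value, so the same input
    # always yields the same fake output across runs.
    seed = int.from_bytes(hashlib.sha256(original.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return f"{rng.choice(FIRST)}.{rng.choice(LAST)}@example.org"

# Deterministic: repeated calls with the same input agree.
assert fake_email("john@example.com") == fake_email("john@example.com")
```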
Fake (Country) — Preserve Country Prefixes
The Fake (Country) strategy extends the Fake strategy by preserving country-specific prefixes. This is useful when you need to keep geographic context (e.g. which country a phone number belongs to) while still anonymizing the rest of the data.
| PII Type | What's Preserved | Example |
|---|---|---|
| International Phone (+) | + and country code digits | +447508804412 → +447291635804 |
| International Phone (no +) | Country code digits | 447508804412 → 447291635804 |
| US Phone | +1 or 1 prefix | +1-555-234-5678 → +1-832-671-9042 |
| UK Phone | Leading 0 | 07508804412 → 08291635804 |
| ICCID | 89 + country code (5 digits total) | 8944200011231044047 → 8944273829156308291 |
| IBAN | First 2 letters (country code) | GB82WEST12345698765432 → GB47XKRJ83920147562918 |
| Email | Domain TLD | user@company.co.uk → jsmith@inbox.co.uk |
| Hostname | TLD | server.example.co.uk → web42.co.uk |
| URL | Domain TLD | https://app.example.de/api → https://demo.de/users/4821 |
| MAC Address | OUI (first 3 octets) | AA:BB:CC:11:22:33 → AA:BB:CC:7F:3A:E2 |
| Credit Card | BIN (first 6 digits) | 4111111111111111 → 4111118294736150 |
The base Fake strategy preserves invariant structural prefixes (e.g. 89 on all ICCIDs, bc1 on Bech32 Bitcoin addresses) since these are the same regardless of country. Fake (Country) goes further by also preserving country-specific prefixes like phone country codes, IBAN country letters, and domain TLDs. For all PII types not listed above, Fake (Country) behaves identically to Fake.
Consistency Mode
When enabled, the same input value always produces the same replacement, which preserves relationships in your data: if john@example.com appears ten times across your log, every occurrence becomes the same [EMAIL-1], so you can still follow one user's activity through the events.
Detection Rules Reference
LogScrub includes 95+ built-in detection rules organized by category.
Contact Information
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Email | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | user@example.com | Enabled |
| Email Message-ID | `` <[A-Za-z0-9!#$%&'*+/=?^_`.{\|}~-]+@[A-Za-z0-9.-]+> `` | <abc123@mail.example.com> | Disabled |
| Phone (US) | `\b(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b` | (555) 123-4567 | Enabled |
| Phone (UK) | `\b(?:0[1-9][0-9]{8,9}\|0[1-9][0-9]{2,4}[\s-][0-9]{3,4}[\s-]?[0-9]{3,4})\b` | 020 7946 0958 | Enabled |
| Phone (Intl) | `\+[1-9][0-9]{1,3}[\s-]?[0-9]{6,14}\b` | +44 7911 123456 | Enabled |
| Phone (Intl, No +) | `\b[1-9][0-9]{9,14}\b` (validated against ~100 E.164 country codes with per-country digit length checks) | 447508804412 | Disabled |
Network
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| IPv4 | `\b(?:(?:25[0-5]\|2[0-4][0-9]\|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]\|2[0-4][0-9]\|[01]?[0-9][0-9]?)\b` | 192.168.1.1 | Enabled |
| IPv6 | `(?i)\[(?:[0-9a-f]{1,4}:){7}[0-9a-f]{1,4}\]`... (full IPv6 pattern with bracketed, bare, link-local, and IPv4-mapped variants) | 2001:0db8:85a3::8a2e:0370:7334 | Enabled |
| MAC Address | `(?i)\b(?:[0-9A-F]{2}[:-]){5}[0-9A-F]{2}\b` | 00:1A:2B:3C:4D:5E | Enabled |
| Hostname | `\b(?![a-zA-Z0-9-]+\.(?:txt\|log\|json\|xml\|...)\b)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,12}\b` (excludes common file extensions such as .txt, .log, .json, .xml, .csv to reduce false positives) | api.example.com | Enabled |
| URL | `` https?://[^\s<>\[\]{}\|\\^`\x00-\x1f\x7f]+ `` | https://example.com/path | Enabled |
When enabling IPv4 or IPv6 scrubbing, you'll be asked whether to preserve private/internal IP addresses. Private IPs are not routable on the internet, so they're generally safe to share.
Preserved IPv4 ranges:
- `10.0.0.0/8` — Class A private network (10.0.0.0 – 10.255.255.255)
- `172.16.0.0/12` — Class B private network (172.16.0.0 – 172.31.255.255)
- `192.168.0.0/16` — Class C private network (192.168.0.0 – 192.168.255.255)
- `127.0.0.0/8` — Loopback (127.0.0.1, localhost)
- `169.254.0.0/16` — Link-local (APIPA)
Preserved IPv6 ranges:
- `fe80::/10` — Link-local addresses
- `fc00::/7` — Unique local addresses (fd00::/8 and fc00::/8)
- `::1` — Loopback
- `::ffff:x.x.x.x` — IPv4-mapped addresses (checks IPv4 portion)
You can toggle this setting via the "Preserve Private IPs" checkbox in the Settings panel.
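The preserve-private-IPs decision is easy to express with Python's `ipaddress` module; a sketch using the ranges listed above (not LogScrub's actual code):

```python
import ipaddress

# The preserved ranges from the lists above.
PRESERVED_V4 = [ipaddress.ip_network(n) for n in
                ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                 "127.0.0.0/8", "169.254.0.0/16"]]
PRESERVED_V6 = [ipaddress.ip_network(n) for n in
                ["fe80::/10", "fc00::/7", "::1/128"]]

def is_preserved(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped:
        # ::ffff:x.x.x.x is checked against the IPv4 ranges.
        ip = ip.ipv4_mapped
    nets = PRESERVED_V4 if ip.version == 4 else PRESERVED_V6
    return any(ip in net for net in nets)

print(is_preserved("192.168.1.1"))  # True: private range, safe to keep
print(is_preserved("8.8.8.8"))      # False: public, should be scrubbed
```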
Identity (US)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SSN | `\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b` (validates area/group/serial ranges; no 000, 666, or 900-999 area codes) | 123-45-6789 | Enabled |
| US ITIN | `\b9[0-9]{2}[- ]?(5[0-9]\|6[0-5]\|7[0-9]\|8[0-8]\|9[0-24-9])[- ]?[0-9]{4}\b` | 912-54-1234 | Disabled |
| Passport | `(?i)\b(?:passport[:\s#]*)?[A-Z]{1,2}[0-9]{6,9}\b` | AB1234567 | Disabled |
| Driver's License | `(?i)\b(?:d\.?l\.?\|driver'?s?\s*(?:license\|lic))[:\s#]*[A-Z0-9]{5,15}\b` | DL: D1234567 | Disabled |
Identity (UK & International)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| UK NHS Number | `\b([0-9]{3})[- ]?([0-9]{3})[- ]?([0-9]{4})\b` (Mod-11 checksum validation) | 450 557 7104 | Disabled |
| UK National Insurance | `(?i)\b[A-Z]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[A-D]\b` (validates prefix letters; excludes D, F, I, Q, U, V in first and D, F, I, O, Q, U, V in second) | AB 12 34 56 C | Disabled |
| AU Tax File Number | `\b[0-9]{3}\s?[0-9]{3}\s?[0-9]{3}\b` (weighted checksum validation) | 123 456 789 | Disabled |
| India PAN | `\b[A-Z]{3}[ABCFGHLJPT][A-Z][0-9]{4}[A-Z]\b` | ABCPD1234E | Disabled |
| Singapore NRIC | `(?i)\b[STFGM][0-9]{7}[A-Z]\b` (Mod-11 with letter check table) | S1234567A | Disabled |
| Spanish NIF/DNI | `\b[0-9]{8}[A-Z]\b` (Mod-23 checksum validation) | 12345678Z | Disabled |
| Spanish NIE | `(?i)\b[XYZ][0-9]{7}[A-Z]\b` (Mod-23 checksum; X/Y/Z prefix mapped to 0/1/2) | X1234567L | Disabled |
| Canadian SIN | `\b[0-9]{3}[- ]?[0-9]{3}[- ]?[0-9]{3}\b` (Luhn checksum validation) | 123-456-789 | Disabled |
| VIN | `\b[A-HJ-NPR-Z0-9]{17}\b` (check-digit validation at position 9, transliteration + weighted sum) | 1HGBH41JXMN109186 | Disabled |
| ICCID (SIM Card) | `\b89[0-9]{16,20}\b` (must start with 89, the telecom MII, + Luhn checksum validation) | 8944200011231044047 | Disabled |
Note: NHS Number, AU TFN, Singapore NRIC, Canadian SIN, ICCID, Spanish NIF/NIE, and VIN include checksum or check-digit validation to reduce false positives.
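Several of these validators (Canadian SIN, ICCID, and the credit card rule in the Financial category) share the Luhn algorithm. A minimal implementation, for illustration only:

```python
def luhn_valid(digits: str) -> bool:
    # Double every second digit from the right, subtract 9 from results
    # over 9, and require the total to be divisible by 10.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True: the classic Visa test number
print(luhn_valid("4111111111111112"))  # False: last digit altered
```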
Financial
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Credit Card | `\b(?:4[0-9]{12}(?:[0-9]{3})?\|5[1-5][0-9]{14}\|3[47][0-9]{13}\|6(?:011\|5[0-9]{2})[0-9]{12})\b` (Luhn checksum validation) | 4111111111111111 | Enabled |
| IBAN | `\b[A-Z]{2}[0-9]{2}[A-Z0-9]{4}[0-9]{7}(?:[A-Z0-9]?){0,16}\b` (Mod-97 checksum validation, ISO 7064) | GB82WEST12345698765432 | Enabled |
| Bitcoin Address | `\b(?:bc1\|[13])[a-zA-HJ-NP-Z0-9]{25,62}\b` (Base58 format check) | 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 | Enabled |
| Ethereum Address | `\b0x[a-fA-F0-9]{40}\b` (hex format check: 40 hex characters after the 0x prefix) | 0x742d35Cc6634C0532925a3b844Bc... | Enabled |
| Money/Currency | `(?:[$£€¥₹₩₽¢฿₪₴₦₡₱₲₵₸₺₼₾][0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?\|[0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?\s*(?:USD\|EUR\|GBP\|JPY\|CNY\|INR\|...))` | $10.99, £1,000.00, 100 EUR | Disabled |
Tokens & API Keys
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| JWT | `eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+` | eyJhbGciOiJIUzI1NiIsInR5cCI6... | Enabled |
| Bearer Token | `(?i)bearer\s+[a-z0-9_-]+\.[a-z0-9_-]+\.?[a-z0-9_-]*` | Bearer abc123.def456 | Enabled |
| AWS Access Key | `\b(?:AKIA\|ABIA\|ACCA\|ASIA)[0-9A-Z]{16}\b` | AKIAIOSFODNN7EXAMPLE | Enabled |
| AWS Secret Key | `(?i)(?:aws.?secret\|secret.?access)[^a-z0-9]*['"]?([a-z0-9/+=]{40})['"]?` | aws_secret_access_key = wJalrXUt... | Enabled |
| Stripe Key | `\b(?:sk\|pk)_(?:test\|live)_[0-9a-zA-Z]{24,}\b` | sk_test_4eC39HqLyjWDarjtT1zdp7dc | Enabled |
| GCP API Key | `\bAIza[0-9A-Za-z_-]{35}\b` | AIzaSyDaGmWKa4JsXZ-HjGw7ISLn... | Enabled |
| GitHub Token | `\b(?:ghp\|gho\|ghu\|ghs\|ghr)_[A-Za-z0-9_]{36,}\b` | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx | Enabled |
| OpenAI API Key | `\bsk-(?:proj-)?[a-zA-Z0-9]{32,64}\b` | sk-proj-abc123def456ghi789... | Enabled |
| Anthropic API Key | `\bsk-ant-[a-zA-Z0-9_-]{32,64}\b` | sk-ant-api03-abc123def456... | Enabled |
| X AI API Key | `\bxai-[a-zA-Z0-9]{32,64}\b` | xai-abc123def456ghi789... | Enabled |
| Cerebras API Key | `\bcsk-[a-zA-Z0-9]{40,50}\b` | csk-abc123def456ghi789... | Enabled |
| Slack Token | `\bxox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*\b` | xoxb-123456789012-123456789012-abc... | Enabled |
| NPM Token | `\bnpm_[A-Za-z0-9]{36}\b` | npm_abc123def456ghi789jkl012... | Enabled |
| SendGrid Key | `\bSG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43}\b` | SG.abc123def456.ghi789jkl012mno345... | Enabled |
| Twilio Key | `\b(?:AC\|SK)[a-f0-9]{32}\b` | SK1234567890abcdef1234567890abcdef | Enabled |
| Database URL | `(?i)(?:mongodb\|postgres\|postgresql\|mysql\|redis\|amqp\|mssql)://[^\s]+` | postgres://user:pass@host:5432/db | Enabled |
Secrets
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Generic Secret | `(?i)(?:password\|passwd\|pwd\|secret\|token\|api[_-]?key\|apikey\|auth[_-]?token\|access[_-]?token)\s*[:=]\s*['"]?([^\s'"]{8,})['"]?` | password=MyS3cr3tP@ss! | Enabled |
| High Entropy Secret | `['"][A-Za-z0-9!@#$%^&*_+\-]{8,64}['"]` (Shannon entropy > 3.5 bits/char, requires mixed character types) | xK9#mP2$vL7@nQ4! | Disabled |
| Private Key | `-----BEGIN (?:RSA \|DSA \|EC \|OPENSSH \|PGP )?PRIVATE KEY-----` | -----BEGIN RSA PRIVATE KEY----- | Enabled |
| Basic Auth | `(?i)basic\s+[a-z0-9+/]+=*` | Basic dXNlcjpwYXNzd29yZA== | Enabled |
| URL Credentials | `(?i)(?:https?\|ftp)://[^/:@\s"']+:[^@\s"']+@[^\s/"']+` | https://user:pass@example.com | Enabled |
| Session ID | `(?i)(?:session[_-]?id\|sid\|jsessionid\|phpsessid\|aspsessionid)[=:\s]*[a-z0-9_-]{16,}` | JSESSIONID=ABC123DEF456789XYZ | Enabled |
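The entropy threshold used by the High Entropy Secret rule (above 3.5 bits per character) can be computed as in this sketch:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # Average bits per character of the string's empirical distribution.
    counts = Counter(s)
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("aaaaaaaa"))          # 0.0 (no variety, not flagged)
print(shannon_entropy("xK9#mP2$vL7@nQ4!"))  # 4.0 (16 distinct chars, above 3.5)
```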
Location
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| GPS Coordinates | `-?(?:[1-8]?[0-9](?:\.[0-9]{4,})?\|90(?:\.0+)?)\s*,\s*-?(?:1[0-7][0-9]\|[1-9]?[0-9])(?:\.[0-9]{4,})?` | 51.5074, -0.1278 | Disabled |
| UK Postcode | `(?i)\b[A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}\b` | SW1A 1AA | Disabled |
| US Zip Code | `\b[0-9]{5}(?:-[0-9]{4})?\b` | 90210 | Disabled |
Date & Time
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Date (ISO) | `\b(?:19\|20)[0-9]{2}-(?:0[1-9]\|1[0-2])-(?:0[1-9]\|[12][0-9]\|3[01])\b` | 2024-01-15 | Disabled |
| Date (MM/DD/YY) | `\b(?:0?[1-9]\|1[0-2])[/-](?:0?[1-9]\|[12][0-9]\|3[01])[/-](?:19\|20)?[0-9]{2}\b` | 01/15/24, 12/31/2024 | Disabled |
| Date (DD/MM/YY) | `\b(?:0?[1-9]\|[12][0-9]\|3[01])[/-](?:0?[1-9]\|1[0-2])[/-](?:19\|20)?[0-9]{2}\b` | 15/01/24, 31/12/2024 | Disabled |
| Time | `\b(?:[01]?[0-9]\|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:\s*[AaPp][Mm])?\b` | 14:30:00, 2:30 PM | Disabled |
| DateTime (ISO) | `\b(?:19\|20)[0-9]{2}-(?:0[1-9]\|1[0-2])-(?:0[1-9]\|[12][0-9]\|3[01])[T\s](?:[01][0-9]\|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\.[0-9]+)?(?:Z\|[+-][0-9]{2}:?[0-9]{2})?\b` | 2024-01-15T10:30:00Z | Disabled |
| DateTime (CLF) | `\[?\d{1,2}/(?:Jan\|Feb\|Mar\|Apr\|May\|Jun\|Jul\|Aug\|Sep\|Oct\|Nov\|Dec)/\d{4}:\d{2}:\d{2}:\d{2}\s*[+-]?\d{4}\]?` | [15/Jan/2024:10:30:00 +0000] | Disabled |
| Unix Timestamp | `\b1[0-9]{9}(?:[0-9]{3})?\b` | 1705312200 | Disabled |
Date/time rules are disabled by default as they often match non-sensitive data.
SQL
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SQL Tables | `` (?i)(?:FROM\|JOIN\|INTO\|UPDATE\|TABLE)\s+(`[^`]+`\|\[[^\]]+\]\|"[^"]+"\|[a-zA-Z_][a-zA-Z0-9_]*) `` | FROM users, INSERT INTO orders | Disabled |
| SQL Strings | `'(?:[^'\\]\|\\.)*'` | 'John Doe', "example@email.com" | Disabled |
| SQL Identifiers | `` (?i)(?:SELECT\|WHERE\|AND\|OR\|ON\|SET\|ORDER\s+BY\|GROUP\s+BY\|HAVING\|AS\|,)\s*(`[^`]+`\|\[[^\]]+\])\|(`[^`]+`\|\[[^\]]+\])\.\s*(`[^`]+`\|\[[^\]]+\]) `` | column_name, table.field | Disabled |
SQL rules help scrub sensitive data from database queries and logs. Disabled by default to avoid false positives.
SQL Dump File Support
LogScrub processes SQL dump files from PostgreSQL (pg_dump), MySQL (mysqldump), SQLite (.dump), and other databases. Upload .sql files directly or paste SQL content.
When processing SQL dumps, LogScrub detects PII in:
- INSERT statements — String literals in VALUES clauses
- UPDATE statements — SET clause values
- Comments — Both `--` and `/* */` style comments
- String literals — Single and double quoted strings throughout
SQL structure is preserved — table names, column names, SQL keywords, and syntax remain untouched while only values are anonymized. This keeps your scrubbed dump valid and executable.
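A toy version of this behavior, applying the same single-quoted-string pattern from the SQL rules so that only values are replaced while keywords and identifiers stay intact (a sketch, not LogScrub's parser):

```python
import re

# Single-quoted SQL string literals, with backslash escapes allowed.
SQL_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")

stmt = "INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com');"
print(SQL_STRING.sub("'[REDACTED]'", stmt))
# INSERT INTO users (name, email) VALUES ('[REDACTED]', '[REDACTED]');
```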
Exim (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Exim Subject | `T="(?:[^"\\]\|\\.)*"` | T="Meeting reminder" | Disabled |
| Exim Sender | `F=<[^>]+>` | F=<user@example.com> | Disabled |
| Exim Auth | `(?i)A=[a-z_]+(?::[^\s]+)?` | A=login:user | Disabled |
| Exim User | `U=[^\s]+` | U=mailuser | Disabled |
| Exim DN | `DN=[^\s]+` | DN=cn=user,dc=example,dc=com | Disabled |
Rules for Exim mail server logs. These follow Exim's well-documented log format with field prefixes (T=, F=, A=, U=, DN=).
Postfix (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Postfix From | `from=<[^>]*>` | from=<user@example.com> | Disabled |
| Postfix To | `to=<[^>]+>` | to=<recipient@example.com> | Disabled |
| Postfix Relay | `relay=[^\s,]+(?:\[[^\]]+\])?` | relay=mail.example.com[192.168.1.1] | Disabled |
| Postfix SASL User | `sasl_username=[^\s,]+` | sasl_username=admin | Disabled |
Rules for Postfix mail server logs. Postfix is one of the most widely used MTAs on Linux systems.
Dovecot (IMAP/POP3)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Dovecot User | `user=<[^>]+>` | user=<mailuser> | Disabled |
| Dovecot Remote IP | `rip=[0-9a-fA-F.:]+` | rip=192.168.1.100 | Disabled |
| Dovecot Local IP | `lip=[0-9a-fA-F.:]+` | lip=10.0.0.1 | Disabled |
Rules for Dovecot IMAP/POP3 server logs. Captures login events with username and IP addresses.
Sendmail (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Sendmail From | `from=<[^>]*>,` | from=<user@domain.com>, | Disabled |
| Sendmail Relay | `relay=[^\s,\[\]]+(?:\[[^\]]+\])?` | relay=mail.example.com | Disabled |
| Sendmail MsgID | `msgid=<[^>]+>` | msgid=<abc123@host> | Disabled |
Rules for Sendmail mail server logs. One of the oldest Unix MTAs with a well-established log format.
SIP/VoIP
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SIP Username | `(?i)username="[^"]+"` | username="john.doe" | Disabled |
| SIP Realm | `(?i)realm="[^"]+"` | realm="sip.example.com" | Disabled |
| SIP Nonce | `(?i)nonce="[^"]+"` | nonce="abc123def456" | Disabled |
| SIP Response | `(?i)response="[a-f0-9]+"` | response="9f8e7d6c5b4a3210" | Disabled |
| SIP From Name | `(?i)^From:\s*"[^"]*"` | From: "John Doe" <sip:john@example.com> | Disabled |
| SIP To Name | `(?i)^To:\s*"[^"]*"` | To: "Jane Smith" <sip:jane@example.com> | Disabled |
| SIP Contact | `(?i)^Contact:\s*<?sip:[^>]+>?` | Contact: <sip:john@192.168.1.100:5060> | Disabled |
| SIP URI | `sips?:[^\s<>@]+@[^\s<>;]+` | sip:user@domain.com:5060 | Disabled |
| SIP Call-ID | `(?i)^Call-ID:\s*[^\s]+` | Call-ID: abc123@192.168.1.100 | Disabled |
| SIP Branch | `(?i)branch=z9hG4bK[a-zA-Z0-9]+` | branch=z9hG4bK-abc123 | Disabled |
| SIP User-Agent | `(?i)^User-Agent:\s*[^\r\n]+` | User-Agent: Oasis SIP Phone | Disabled |
| SIP Via | `(?i)^Via:\s*SIP/2\.0/[^\r\n]+` | Via: SIP/2.0/UDP 192.168.1.100:5060 | Disabled |
Rules for SIP (Session Initiation Protocol) traces used in VoIP systems. Useful for scrubbing packet captures or debug logs from phone systems.
Hashes
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| MD5 Hash | `(?i)\b[a-f0-9]{32}\b` | d41d8cd98f00b204e9800998ecf8427e | Disabled |
| SHA1 Hash | `(?i)\b[a-f0-9]{40}\b` | da39a3ee5e6b4b0d3255bfef95601890afd80709 | Disabled |
| SHA256 Hash | `(?i)\b[a-f0-9]{64}\b` | e3b0c44298fc1c149afbf4c8996fb924...27ae41e4649b934ca495991b7852b855 | Disabled |
Cryptographic hash detection. Disabled by default as hashes are often legitimate identifiers, but can be enabled if you need to scrub file checksums or content hashes.
Other
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| UUID | `(?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b` | 550e8400-e29b-41d4-a716-446655440000 | Enabled |
| File Path (Unix) | `(?:/(?:home\|Users)/[a-zA-Z0-9_-]+(?:/[a-zA-Z0-9._-]+)+\|/tmp(?:/[a-zA-Z0-9._-]+)+)` | /home/john/documents/file.txt | Disabled |
| File Path (Windows) | `(?i)[a-z]:\\(?:Users\|Documents and Settings)\\[^\s\\]+(?:\\[^\s\\]+)*` | C:\Users\John\Documents\file.txt | Disabled |
| Docker Container ID | `(?i)\b[a-f0-9]{12}\b` | abc123def456 | Disabled |
| URL Parameters | Captures query string key-value pairs from URLs | ?user=john&token=abc123 | Disabled |
Custom Rules
Create your own detection rules using regular expressions for patterns specific to your organization.
Adding a Custom Rule
- Click the "+ Regex" button in the Rulesets panel (Custom Rules tab)
- Enter a descriptive name for the rule
- Enter your regex pattern (JavaScript syntax)
- Click "Add Rule"
Example: Company Employee IDs
Name: Employee ID
Pattern: EMP-[0-9]{6}
Matches: EMP-123456, EMP-000001
Example: Internal Hostnames
Name: Internal Servers
Pattern: \b[a-z]+-(?:prod|staging|dev)-[0-9]+\.internal\.company\.com\b
Matches: api-prod-01.internal.company.com
Test your regex pattern using the "View Pattern" button (⚙) which includes a pattern tester.
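You can also sanity-check a custom pattern outside the browser. Here the Employee ID example from above is tested with Python's `re` module; LogScrub itself uses JavaScript regex syntax, which agrees with Python for a simple pattern like this:

```python
import re

# The "Employee ID" rule from the example above, checked against a sample line.
pattern = re.compile(r"EMP-[0-9]{6}")
line = "2024-01-15 login ok for EMP-123456 (badge EMP-000001)"
print(pattern.findall(line))
# ['EMP-123456', 'EMP-000001']
```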
Plain Text Patterns
For exact text matches that don't require regex (like specific hostnames or identifiers), use Plain Text patterns.
Adding a Plain Text Pattern
- Click the "+ Text" button in the Rulesets panel (Custom Rules tab)
- Enter a label for the pattern
- Enter the exact text to match
- Click "Add Pattern"
When to Use Plain Text vs Regex
| Use Plain Text | Use Regex |
|---|---|
| Specific hostnames: `db-master.prod.internal` | Pattern-based hostnames: `db-\d+\.prod\.internal` |
| Known usernames: `admin_system` | Username patterns: `user_[a-z]+_[0-9]+` |
| Fixed identifiers: `ACME-CORP` | Variable IDs: `ACME-[A-Z]{3}-[0-9]+` |
Presets
Presets let you save and quickly switch between different rule configurations.
Built-in Presets
- Minimal — Only critical PII (emails, SSN, credit cards)
- Standard — Common PII without dates/paths
- Paranoid — Everything enabled for maximum redaction
- Dev/Debug — Focus on secrets, tokens, and credentials
- GDPR — EU personal data (includes UK postcodes, IBAN)
Custom Presets
Save your current rule configuration as a preset:
- Configure your rules as desired
- Click "Presets" to expand the presets panel
- Enter a name and click "Save"
Import/Export
Share configurations between computers or team members using JSON export/import.
Multi-File Upload
Process multiple log files at once with batch operations. Upload files individually or as a ZIP archive, analyze and scrub them all together, and export the results as a single ZIP download.
Uploading Multiple Files
There are several ways to upload multiple files:
- File picker — Click "Upload" and select multiple files (hold Ctrl/Cmd to select multiple)
- Drag and drop — Drag multiple files onto the editor area
- ZIP archive — Upload a .zip file containing text files (they will be automatically extracted)
Supported file types: .log, .txt, .json, .xml, .csv, .sql, .zip
Files Tab
When you upload multiple files, a "Files" tab appears:
- File list — Shows all uploaded files with their names, sizes, and status
- Status badges — Pending, Analyzing, Analyzed, Processing, Done, or Error
- Detection count — Shows how many PII items were found in each file
- File selection — Click a file to view it in the editor
- Remove files — Click the X button to remove individual files, or "Clear All" to start over
Batch Operations
Process all files at once using the batch buttons at the top of the Files tab:
- Analyze All — Run detection on all files to see what PII will be found (preview mode)
- Scrub All — Process all files with your current rule settings
- Export ZIP — Download all scrubbed files as a single ZIP archive
A progress bar shows which file is currently being processed during batch operations.
File Navigation
When in multi-file mode, a navigation bar appears above the editor showing:
- Current file name
- Position indicator (e.g., "2 of 5")
- Previous/Next buttons to quickly switch between files
Combined Statistics
The Statistics view includes a toggle to show stats for either:
- Current File — Detection counts for the currently selected file
- All Files — Combined totals across all uploaded files
Cross-File Consistency
When Consistency Mode is enabled, the same PII value will receive the same replacement across ALL files in the batch. For example, if john@example.com appears in multiple files, it will always be replaced with [EMAIL-1] in every file.
For large batches, consider using "Analyze All" first to review what will be detected before running "Scrub All".
Maximum 50 files per batch. Maximum 100MB total combined size.
Organizing Rules
Customize the order of rule categories and individual rules to match your workflow.
Reordering Categories
Drag categories to change their display order:
- Grab the drag handle (⋮⋮) to the left of a category name
- Drag the category up or down to your preferred position
- Release to drop it in place
Reordering Rules Within a Category
- Expand the category by clicking its name
- Grab the drag handle (⋮) to the left of a rule
- Drag the rule up or down within the category
- Release to set its new position
Persistence
Your custom ordering is automatically saved to your browser's local storage and will be restored on your next visit. When you save a preset, the ordering is included.
Put frequently-used categories at the top for quick access. For example, if you primarily work with network logs, drag the "Network" category to the top.
Time Shift
Timestamps in logs can reveal when incidents occurred, work patterns, or timezone information. Time Shift lets you anonymize temporal data while preserving the relative timing between events.
Time Shift only appears when LogScrub detects timestamps in your input.
Supported Timestamp Formats
- ISO 8601 — `2024-01-15T10:30:00Z`
- ISO Date — `2024-01-15`
- Apache error log — `[Sat Aug 12 04:05:51 2006]`
- Apache access log — `[17/May/2015:10:05:03 +0000]`
- Syslog — `Jan 15 10:30:00`
- US Format — `01/15/2024 10:30:00`
Offset Mode
Shift all timestamps by a fixed amount. Useful when you want to obscure the actual date/time while keeping the duration between events accurate.
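Offset Mode for a single ISO 8601 format can be sketched as follows. LogScrub handles several more formats and runs in WebAssembly; this is only an illustration of the fixed-offset idea:

```python
import re
from datetime import datetime, timedelta

ISO = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def shift(line: str, offset: timedelta) -> str:
    # Rewrite every ISO 8601 timestamp on the line by the same fixed offset;
    # durations between events are unchanged because all shifts are equal.
    def repl(m):
        t = datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")
        return (t + offset).strftime("%Y-%m-%dT%H:%M:%S")
    return ISO.sub(repl, line)

print(shift("2024-01-15T10:30:00Z error: timeout", timedelta(days=-30)))
# 2023-12-16T10:30:00Z error: timeout
```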
Start From Mode
Set the first timestamp to a specific date/time. All subsequent timestamps shift by the same amount, preserving relative timing.
Scope: Line Start vs All Timestamps
Logs often contain two types of timestamps:
- Log timestamps — At the start of each line
- Content dates — Within the log message itself (DOBs, expiry dates, etc.)
| Scope | Best For |
|---|---|
| Line Start | Shifting log timing while sanitizing content dates with detection rules |
| All Timestamps | When all dates should shift together (e.g., test data generation) |
Use "Line Start" scope with date/time detection rules enabled. Log timestamps get shifted while content dates like DOBs get scrubbed to [DATE-1].
Log Crop
When working with large log files that span many hours or days, you often only need a specific time window. The Crop tool lets you trim a log file to a precise time range, keeping only the lines you need.
The Crop button appears above the Original pane when LogScrub detects timestamps in your input. It uses the same timestamp formats supported by Time Shift.
How to Use
- Paste or upload a log file with timestamps
- Click the Crop link above the Original pane
- Review the detected time range, duration, and line count
- Select the time window you want to keep
- Click Crop to trim the log
Custom Range
Set a specific start and end time to keep. Any lines with timestamps outside this range are removed. Lines without timestamps (such as stack traces or continuation lines) are kept if they follow a line within the selected range.
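The keep/drop logic for untimestamped continuation lines can be sketched like this (an illustration assuming ISO 8601 line-start timestamps, not LogScrub's code):

```python
import re
from datetime import datetime

ISO = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def parse_ts(line):
    m = ISO.match(line)
    return datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S") if m else None

def crop(lines, start, end):
    # Lines without a timestamp (stack traces, continuations) follow the
    # fate of the most recent timestamped line before them.
    out, keeping = [], False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keeping = start <= ts <= end
        if keeping:
            out.append(line)
    return out

log = [
    "2024-01-15T09:00:00 boot",
    "2024-01-15T10:05:00 error",
    "  at handler.js:42",          # continuation line, kept with its parent
    "2024-01-15T12:00:00 shutdown",
]
print(crop(log, datetime(2024, 1, 15, 10, 0), datetime(2024, 1, 15, 11, 0)))
# ['2024-01-15T10:05:00 error', '  at handler.js:42']
```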
Start + Duration
Select a start time and then choose a duration preset. This is useful when you know the approximate start of an incident and want to capture a fixed window of time.
Available presets: 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, 8 hours, 12 hours, and 24 hours.
Cropping replaces the original text. If you need to go back, use your browser's undo or re-paste the original log. Crop before scrubbing for faster processing on large files.
Analyze Mode
Preview detections before scrubbing to fine-tune your rules.
Using Analyze
- Paste your content
- Click "Analyze" above the Original pane
- Review highlighted matches (shown in red)
- Check the suggestions for disabled rules that would match
- Adjust rules as needed, then Scrub
Smart Suggestions
After analysis, LogScrub suggests disabled rules that found matches in your text. This helps you discover rules you might want to enable.
Log Format Detection
When you run Analyze, LogScrub automatically detects common log formats and offers to load a suitable preset. This saves time by enabling the most relevant rules for your log type.
| Log Format | Detection Method | Suggested Preset |
|---|---|---|
| Apache/Nginx | Common Log Format (CLF) with HTTP methods | nginx / Apache |
| AWS CloudTrail/CloudWatch | AWS ARNs, eventSource, eventName patterns | AWS CloudWatch |
| SSH/Auth Logs | sshd, sudo, pam_unix, failed password messages | Auth / SSH Logs |
| Email Headers | Multiple Received: headers | Opens Email Routing visualization |
| SIP/VoIP Traces | SIP protocol headers (Via, From, To, Call-ID) | SIP / VoIP |
When a log format is detected, a colored banner appears above the editor with a button to load the recommended preset. You can dismiss the banner if you prefer to configure rules manually.
ML Name Detection
LogScrub includes optional machine learning-based detection for identifying person names, locations, and organizations that pattern-based rules might miss.
How It Works
ML Name Detection uses a pre-trained Named Entity Recognition (NER) model to identify entities in text:
- PER (Persons) - Names of people (e.g., "John Smith", "Dr. Sarah Johnson")
- LOC (Locations) - Place names (e.g., "London", "Silicon Valley")
- ORG (Organizations) - Company and organization names (e.g., "Microsoft", "NHS")
Technology
- Library: Transformers.js by Hugging Face
- Model: BERT-based NER models (DistilBERT or BERT Base)
- Format: ONNX for efficient browser execution via WebAssembly
- Caching: Models are cached in IndexedDB after first download
Enabling ML Detection
- Click the Settings button in the toolbar
- Select a model under ML Name Detection (DistilBERT recommended for balance of speed/accuracy)
- Click Download Model (only required once — cached models auto-load on startup)
- Once "Ready" appears, click Run ML Analysis or it will run automatically during Analyze
Available Models
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| DistilBERT NER | ~250 MB | Fast | Good |
| BERT Base NER | ~420 MB | Slower | Best |
| BERT Base NER (uncased) | ~420 MB | Slower | Best (case-insensitive) |
ML Detection Rules
When ML detection is enabled, three additional rules appear in the ML Detection category:
- Person Names (ML) - Names identified by the ML model
- Locations (ML) - Place names identified by the ML model
- Organizations (ML) - Company/org names identified by the ML model
These rules are automatically enabled when you turn on ML Name Detection, but you can disable individual entity types if needed.
When to Use ML Detection
ML detection is most useful when:
- Text contains person names not in email or username formats
- You need to detect organization names
- Location names need to be identified
- Pattern-based rules are missing names in free-form text
Syntax Validation
LogScrub automatically validates the syntax of structured file formats when you paste or upload content. This helps catch malformed files before processing.
Supported Formats
- JSON - Objects, arrays, and nested structures
- XML - Including SVG, HTML, GPX, and other XML-based formats
- CSV - Validates consistent column counts across rows
- YAML - Configuration files and structured data
- TOML - Configuration files (e.g., Cargo.toml, pyproject.toml)
Format Detection
The format is detected automatically by:
- File extension - When you upload a file, the extension determines the format
- Content analysis - For pasted content, LogScrub examines the structure (e.g., starts with `{` or `[` for JSON, `<` for XML)
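As a rough illustration of content-based sniffing (a simplified sketch, not LogScrub's actual detector):

```python
def sniff_format(text: str) -> str:
    """Guess a structured format from the first non-whitespace
    character, as described above. Illustrative only."""
    s = text.lstrip()
    if s.startswith(("{", "[")):
        return "json"
    if s.startswith("<"):
        return "xml"
    return "unknown"
```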
Validation Results
After analysis completes:
- Valid syntax - A green checkmark appears next to the "Original" heading. Hover over it to see the detected format (e.g., "Valid JSON syntax").
- Invalid syntax - A red error banner appears showing the format, line number, column number, and error message. Click the line number to scroll directly to the error location.
Syntax validation runs in WebAssembly for speed. Even large files are validated almost instantly.
Context-Aware Detection
Beyond regex patterns, LogScrub can detect potential secrets by analyzing JSON structures and identifying suspicious key names. This is especially useful for structured logs that contain key-value pairs.
How It Works
When you run Analyze, LogScrub automatically:
- Detects JSON content in your text (pure JSON, NDJSON, or embedded JSON in log lines)
- Parses the JSON and walks through all key-value pairs
- Flags values associated with suspicious keys like "password", "token", "api_key", etc.
- Shows findings in the Context-Aware tab within Smart Suggestions
Supported JSON Formats
| Format | Example |
|---|---|
| Pure JSON | {"database": {"password": "secret123"}} |
| JSON Lines (NDJSON) | Multiple JSON objects, one per line (common in logs) |
| Embedded JSON | 2024-01-15 INFO Request: {"user": "john", "token": "abc123"} |
Suspicious Keys
LogScrub looks for keys that commonly contain sensitive data:
High Confidence
Exact matches: password, passwd, pwd, secret, token, api_key, apikey, access_key, private_key, client_secret, auth_token, access_token, refresh_token, bearer, jwt, ssh_key, passphrase, credential, credentials
Medium Confidence
Pattern matches: Keys ending in _key, _token, _secret, _password; keys containing auth, cred, or ending in pass
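The key-scanning idea can be sketched as a recursive walk over parsed JSON. The key lists below are abbreviated versions of the ones above, and this is an illustration, not LogScrub's code:

```python
import json
import re

HIGH = {"password", "passwd", "pwd", "secret", "token", "api_key", "apikey"}
MEDIUM = re.compile(r"(_key|_token|_secret|_password|pass)$|auth|cred")

def scan(obj, path=""):
    """Walk a parsed JSON structure and flag values stored under
    suspicious key names, returning (json_path, confidence) pairs."""
    findings = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            kl = k.lower()
            if kl in HIGH:
                findings.append((p, "high"))
            elif MEDIUM.search(kl):
                findings.append((p, "medium"))
            findings += scan(v, p)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            findings += scan(v, f"{path}[{i}]")
    return findings
```

For example, scanning `{"database": {"password": "secret123"}, "ssl_key": "x"}` yields a high-confidence hit at `database.password` and a medium-confidence hit at `ssl_key`.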
Using Context-Aware Findings
After analysis, check the Context-Aware tab in Smart Suggestions. Each finding shows:
- Key name — The suspicious key that triggered detection
- Confidence level — High (exact match) or Medium (pattern match)
- JSON path — Full path like `config.database.password`
- Value preview — A sample of the detected value
Click "Add to scrub" to create a plain-text pattern that will redact that specific value. This is useful when you find a secret that wasn't caught by the regex-based rules.
Complementing Regex Detection
Context-aware detection complements (not replaces) the regex-based rules:
- Regex rules — Detect secrets by their format (JWT structure, API key prefixes, etc.)
- Context-aware — Detect secrets by their semantic context (what key they're assigned to)
Using both approaches together provides more comprehensive coverage for secrets that might not have a recognizable format but are stored under suspicious key names.
If you frequently work with JSON logs that contain secrets in predictable key names, consider adding custom plain-text patterns for those specific values after discovering them via context-aware detection.
Email Header Analysis
When LogScrub detects email headers (multiple Received: headers), it offers to visualize the email's routing path through mail servers.
Email Routing Visualization
Click "View Email Routing" in the blue banner to open a visual diagram showing:
- Server chain — Each mail server the message passed through, from origin to destination
- Timestamps — When each server received the message
- Transit time — Duration between each hop (shown on the connector lines)
- TLS encryption — A padlock icon indicates encrypted transmission, with TLS version and cipher details on hover
- Protocol — The mail protocol used (SMTP, ESMTP, LMTPS, etc.)
What Gets Parsed
| Header | Information Extracted |
|---|---|
| Received: | Server hostnames, IP addresses, timestamps, TLS info, protocol |
| Date: | Original send time (used as the origin timestamp) |
Understanding the Diagram
The routing diagram reads from top to bottom:
- Origin — The sending server (extracted from the first Received: header's "from" field)
- Intermediate servers — Mail relays, spam filters, or corporate gateways
- Destination — The final receiving server
A padlock icon on the connector line indicates the transmission was encrypted. Hover over it to see the TLS version (e.g., TLSv1.3) and cipher suite used.
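The transit times shown on the connector lines can be reproduced with Python's standard email utilities. A simplified sketch that assumes each Received: value ends in a `; date` clause (real headers are often messier):

```python
from email.utils import parsedate_to_datetime

def transit_times(received_headers):
    """Given Received: header values in the order they appear in a
    raw message (newest hop first), return per-hop transit seconds
    from origin to destination."""
    # The last Received: header is the oldest hop, so reverse.
    stamps = [parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
              for h in reversed(received_headers)]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```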
Use Cases
- Debugging email delivery — Identify where delays occur in the delivery chain
- Security analysis — Verify TLS encryption was used throughout the route
- Spam investigation — Trace the origin of suspicious emails
- Compliance — Document that email was transmitted securely
Spam Report Parsing
LogScrub detects and parses spam filter reports from SpamAssassin and rspamd, displaying them in an easy-to-read sortable table.
Supported Formats
| Filter | Header | Format |
|---|---|---|
| SpamAssassin | X-Spam-Report | Table with pts, rule name, and description columns |
| rspamd | X-Spam-Report | Symbol: RULE_NAME(score) format with Action field |
Report View Features
- Sortable table — Click column headers to sort by rule name or score
- Color-coded rules — Green for ham (negative scores), red for spam (positive scores)
- Total score — Combined score with visual indicator
- Action taken — Shows what the filter decided (no action, add header, reject, etc.)
- Rule breakdown — Count of ham, neutral, and spam rules
Multi-line Headers
Spam reports are often split across multiple lines in email headers. LogScrub automatically handles header continuation lines (lines starting with whitespace) and reassembles the complete report.
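Header unfolding can be sketched in a few lines (an illustration of the standard RFC 5322 folding rule, not LogScrub's parser):

```python
def unfold(header_lines):
    """Reassemble folded headers: a line starting with whitespace
    continues the previous header line."""
    out = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and out:
            out[-1] += " " + line.strip()
        else:
            out.append(line.rstrip())
    return out
```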
Multiple Reports
Some mail systems run multiple spam filters. When both SpamAssassin and rspamd reports are present (e.g., X-Spam-Report and X-Spam-Report-Secondary), LogScrub detects both and displays them with tabs to switch between the two reports for comparison.
When an amber banner appears saying "spam reports detected", click "View Reports" to see the parsed rules. If multiple reports are present, use the tabs at the top to switch between them.
GPX Route Transposition
GPX files contain GPS track data from fitness devices, cycling computers, and navigation apps. When sharing routes for debugging or analysis, you may want to hide your actual location while preserving all other statistics.
How It Works
When LogScrub detects a GPX file (by extension or content), a green banner offers to transpose the route to a different continent. The transposition:
- Shifts all coordinates — Moves the entire route to a new location
- Preserves route shape — All turns, distances, and geometry remain identical
- Keeps timestamps — Duration and timing data unchanged
- Retains elevation — All elevation data preserved
- Includes waypoints — Waypoints and route points also transposed
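Conceptually, transposition is a constant offset applied to every coordinate. A minimal sketch, assuming a simple flat shift (the real implementation may handle projection effects differently):

```python
def transpose(points, target_lat, target_lon):
    """Shift a track of (lat, lon) points so its center lands on
    the target while preserving the relative geometry."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    clat = sum(lats) / len(lats)
    clon = sum(lons) / len(lons)
    dlat, dlon = target_lat - clat, target_lon - clon
    return [(lat + dlat, lon + dlon) for lat, lon in points]
```

Because every point receives the same offset, distances between points, turns, and the overall route shape are unchanged.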
Destination Regions
Choose from six destination regions:
| Region | Target Area |
|---|---|
| Europe | Paris, France |
| North America | New York, USA |
| South America | São Paulo, Brazil |
| Asia | Tokyo, Japan |
| Oceania | Sydney, Australia |
| Africa | Cape Town, South Africa |
Route Statistics
Before transposition, the modal displays:
- Track name and detected region
- Number of GPS points
- Total duration
- Elevation range
- Center coordinates
Perfect for sharing cycling or running routes in bug reports without revealing where you live or work. The route shape and performance data remain valid for debugging.
Shifting Timestamps
GPX files also contain timestamps for each point. Use the Time Shift feature (in the toolbar) to shift all timestamps by a fixed offset or to a new start date. This adds another layer of privacy by obscuring when the activity occurred.
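A fixed-offset shift of the kind Time Shift applies can be sketched as a regex substitution over ISO-8601 timestamps (illustrative only; it ignores timezones and fractional seconds):

```python
import re
from datetime import datetime, timedelta

ISO = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def shift_times(text, offset):
    """Shift every ISO-8601 timestamp in the text by a fixed
    offset, preserving the relative timing between points."""
    def repl(m):
        t = datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")
        return (t + offset).strftime("%Y-%m-%dT%H:%M:%S")
    return ISO.sub(repl, text)
```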
More GPX/FIT Tools
For more advanced GPX and FIT file manipulation (merging, splitting, editing, converting), visit skeffling.net/gpxfit.
Audit Reports
Generate detailed reports of all detected PII for compliance documentation.
Report Contents
- Timestamp and source file name
- Summary of detection counts by type
- List of unique detected values (up to 100 per type)
Export Formats
- Text (.txt) — Plain text, easy to read
- JSON (.json) — Machine-readable, for integration
- HTML (.html) — Formatted report for sharing
Access audit reports by clicking the detection count badge, then "Download Audit Report".
RTF Export with Highlighting
For a visual export of your scrubbed output, use the RTF download button (shown in green). This generates a Rich Text Format file with all replacements highlighted in green, matching the diff view appearance.
- Visual review — Easily spot all changes at a glance
- Shareable — RTF opens in Word, LibreOffice, TextEdit, and most word processors
- Printable — Great for compliance documentation or review meetings
The RTF export uses a light green background with dark green text for replacements, making it easy to see what was scrubbed while keeping the document readable.
Reverse Lookup
When sharing scrubbed logs with others for analysis, you may receive feedback like "the IP on line 100 is causing the issue." LogScrub provides several ways to map scrubbed values back to their originals.
Hover Tooltips
Hover your mouse over any replacement in the scrubbed output (e.g., [IP-1]). A tooltip will show the original value, type, and line numbers.
Mapping Table
For a complete overview, open the Stats panel and click the Mapping tab. This shows a searchable table with every unique replacement.
Export Mapping Dictionary
- Open Stats → Mapping tab
- Click Export
- Choose JSON (for scripts) or CSV (for spreadsheets)
JSON Format Example
```json
{
  "[EMAIL-1]": {
    "original": "john.doe@example.com",
    "type": "email",
    "count": 3,
    "lines": [12, 45, 89]
  }
}
```
Enable Consistency Mode before scrubbing to ensure the same original value always gets the same replacement label.
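Scripts consuming the exported JSON can resolve labels back to their originals. A minimal sketch using the format above (the `mapping` literal is the documentation example, not real data):

```python
def reverse_lookup(mapping, label):
    """Look up the original value behind a replacement label,
    using the exported mapping dictionary format."""
    entry = mapping.get(label)
    return entry["original"] if entry else None

# Example entry in the exported mapping format.
mapping = {
    "[EMAIL-1]": {"original": "john.doe@example.com",
                  "type": "email", "count": 3, "lines": [12, 45, 89]}
}
```

So when a colleague says "the user behind [EMAIL-1] is affected," a quick lookup recovers the real address on your side only.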
AI Explain
When sharing scrubbed logs with AI assistants like ChatGPT, Claude, or Copilot, they need context to understand the redaction format. The AI Explain feature generates a ready-to-use explanation.
How to Use
- Scrub your log as usual
- Click the "AI Explain" button in the scrubbed output toolbar
- Copy the generated explanation
- Paste it into your AI chat before pasting the scrubbed log
What's Included
The generated explanation includes:
- Replacement strategies — Describes each strategy used (Label, Fake, Fake (Country), Redact, Template)
- Consistency mode status — Tells the AI whether same tokens mean same values
- Detected types table — Lists each PII type found with its replacement strategy and example format
- Interpretation guidelines — Instructions for how the AI should reference replacements
Example Output
## Log Redaction Context
This log has been sanitized using LogScrub...
### Detected & Replaced Data Types
| Type | Strategy | Example Replacement | Count |
|------|----------|---------------------|-------|
| Email | Label | `[EMAIL-1]` | 5 |
| IPv4 | Fake | `142.58.201.33` | 12 |
| Hostname | Template | `<HOSTNAME-1>` | 3 |
The explanation reflects each rule's individual replacement strategy. If you use different strategies for different types (e.g., Label for emails, Fake for IPs), the AI will know exactly what each replacement format means.
Documents & Spreadsheets
LogScrub can scrub PII from document and spreadsheet files, not just plain text. All processing happens client-side in your browser using WebAssembly.
Supported Formats
| Format | Extension | Scrubbing | Preview |
|---|---|---|---|
| PDF | .pdf | True redaction (text removed) | Full page rendering |
| Word Document | .docx | Full text replacement | Formatted preview |
| Excel Spreadsheet | .xlsx | Full text replacement | Table view |
| OpenDocument Text | .odt | Full text replacement | Basic text preview |
| OpenDocument Spreadsheet | .ods | Full text replacement | Table view |
How Document Processing Works
PDF Files
PDFs are rendered using MuPDF (WebAssembly). Due to the complexity of PDF format, scrubbing uses redaction only — detected PII is covered with black boxes rather than replaced with labels or fake data. This ensures the document structure remains intact.
The underlying text is permanently removed from the PDF, not just visually covered. The redacted text cannot be recovered by selecting, copying, or using PDF extraction tools.
The PDF preview shows page-by-page rendering with match counts per page, helping you verify all sensitive data was found.
Word Documents (.docx)
DOCX files are ZIP archives containing XML. LogScrub extracts the document content, applies your scrubbing rules, and repackages the file. The preview uses the docx-preview library to show formatted content including bold, italic, tables, and images.
Excel Spreadsheets (.xlsx)
XLSX files are processed using excelize-wasm. Cell contents are scrubbed while preserving the spreadsheet structure, formula references, and formatting. The preview shows a table view of each sheet.
Legacy .xls format (Excel 97-2003) is not supported. Please save as .xlsx format first.
OpenDocument Files (.odt, .ods)
LibreOffice/OpenOffice formats are ZIP archives with XML content, similar to Microsoft Office formats. LogScrub extracts and scrubs the content XML while preserving document structure.
Document Preview Features
- Split view — See original and scrubbed documents side by side
- ScrollSync — Original and scrubbed previews scroll together
- Resizable panels — Drag the resize handle to adjust preview height
- Match highlighting — PDF preview shows match count per page
Document Metadata
Documents often contain metadata that may include sensitive information:
- Author name and company
- Creation and modification dates
- Application used to create the document
- Revision history and editing time
When you upload a document, LogScrub automatically checks for metadata. If found, you'll see a dialog showing all detected metadata fields and can choose to:
- Remove Metadata — Strip all metadata from the downloaded file
- Keep Metadata — Preserve original metadata in the downloaded file
The metadata choice you make at upload time applies when you download. If you upload a new file, you'll be asked again.
Downloading Scrubbed Documents
After scrubbing, click Download to save the sanitized document. The output file preserves the original format — a scrubbed .docx remains a .docx that can be opened in Word.
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| ⌘/Ctrl + Enter | Scrub the input text |
| ⌘/Ctrl + S | Download scrubbed output |
| ⌘/Ctrl + G | Go to line number |
| Escape | Cancel processing / Close dialogs |
Working with Network Captures
LogScrub can scrub network packet captures (PCAP files), but they must first be converted to text format.
Converting PCAP to Text
Using tcpdump
tcpdump -r capture.pcap -tttt > capture.txt
Using tshark (Wireshark CLI)
tshark -r capture.pcap > capture.txt
Using Wireshark
- Open the PCAP file in Wireshark
- Go to File → Export Packet Dissections → As Plain Text
- Save the text file, then upload to LogScrub
What Gets Detected
| Data Type | Detection Rule | Example |
|---|---|---|
| IP addresses with ports | IPv4, IPv6 | 192.168.1.1:8080 |
| MAC addresses | MAC Address | 00:1A:2B:3C:4D:5E |
| Hostnames in DNS/HTTP | Hostname, URL | api.example.com |
| HTTP headers | Various token rules | Auth tokens, cookies |
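The kinds of patterns involved can be illustrated with two simplified regexes (sketches only — LogScrub's actual rules are more robust):

```python
import re

# Simplified illustrations, not LogScrub's production patterns.
IP_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")

line = "10:00:01 IP 192.168.1.1:8080 > 10.0.0.5: Flags [S], eth 00:1a:2b:3c:4d:5e"
```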
File Size & Performance
LogScrub runs entirely in your browser, so performance depends on your device's capabilities.
| File Size | Performance |
|---|---|
| < 10 MB | Fast, smooth processing |
| 10–50 MB | Works well, may take a few seconds |
| 50–100 MB | Slower processing, may take 10–30 seconds |
| > 100 MB | May cause browser memory issues |
Tips for Large Files
- Close other browser tabs to free up memory
- Use a desktop browser rather than mobile
- Consider splitting extremely large log files
- The virtual scrolling feature helps keep the UI responsive
Privacy & Security
LogScrub runs 100% in your browser. Your data is never uploaded to any server.
How We Protect Your Data
- Client-side processing — All scrubbing happens in your browser using WebAssembly
- No server communication — Your text never leaves your device
- No analytics on content — We don't track what you scrub
- Local storage only — Presets and settings are stored in your browser
LogScrub does not guarantee detection of all PII. Pattern-based detection has inherent limitations. Always review your scrubbed output before sharing.
Best Practices
- Always review the scrubbed output before sharing
- Use "Analyze" first to understand what will be detected
- Enable additional rules for sensitive data types not detected by default
- Add custom patterns for organization-specific identifiers
- Use the "Changed only" filter to quickly review modified lines