Why Use LogScrub?
Every day, developers and IT professionals need to share logs for debugging, support tickets, bug reports, and collaboration. But logs often contain sensitive information that shouldn't be shared.
Common Scenarios
Sharing Logs with AI Assistants
AI tools like ChatGPT, Claude, and Copilot are incredibly useful for debugging and analyzing logs. But pasting raw logs means sending customer emails, IP addresses, API keys, and other sensitive data to third-party services.
Solution: Scrub your logs first. The AI can still understand error patterns, stack traces, and timing issues without seeing real customer data.
Filing Bug Reports & Support Tickets
When reporting issues to software vendors or open-source projects, you often need to include logs. These logs may contain your company's internal hostnames, user data, or credentials that were accidentally logged.
Compliance & Data Protection
Regulations like GDPR, HIPAA, and CCPA restrict how personal data can be shared and processed. Sanitizing logs before sharing helps maintain compliance and reduces your data exposure footprint.
Why Client-Side Processing Matters
Many log sanitization tools require uploading your logs to a server. This defeats the purpose — you're sharing sensitive data with yet another third party.
LogScrub processes everything in your browser using WebAssembly. Your logs never leave your device. You can even use it offline or on an air-gapped machine.
Preserving Log Usefulness
The goal isn't just to remove data — it's to keep logs useful for their intended purpose:
- Consistency Mode ensures the same email always becomes [EMAIL-1], so you can still trace a user's journey
- Time Shift lets you anonymize timestamps while preserving the relative timing between events
- Log Crop trims log files to a specific time window, so you only share the relevant portion
- Selective Rules let you keep non-sensitive data like UUIDs or timestamps when needed for debugging
Quick Start
LogScrub helps you remove personally identifiable information (PII) and sensitive data from log files before sharing them.
Basic Workflow
- Paste or upload your log content into the "Original" pane
- Review detection rules by clicking the Rulesets button (toggle rules on/off, review matches)
- Click "Scrub" or press ⌘/Ctrl + Enter
- Copy or download the scrubbed output
How It Works
LogScrub uses pattern matching to identify sensitive data in your text. Each detection rule has a regular expression (regex) that matches specific data formats.
Processing Pipeline
- Pattern Matching — Each enabled rule's regex is applied to find matches
- Replacement — Matched text is replaced according to your chosen strategy
- Consistency — When enabled, identical values get identical replacements
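The pipeline above can be sketched in a few lines of Python. This is an illustration, not LogScrub's implementation; the two rules and the label format are simplified stand-ins for the real, configurable rule set:

```python
import re

# Two stand-in rules; LogScrub ships 95+ configurable rules.
RULES = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
        r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
    ),
}

def scrub(text: str) -> str:
    seen = {}      # consistency: identical values get identical replacements
    counters = {}  # per-type counter for [TYPE-n] labels
    for label, pattern in RULES.items():
        def replace(match, label=label):
            value = match.group(0)
            if value not in seen:
                counters[label] = counters.get(label, 0) + 1
                seen[value] = f"[{label}-{counters[label]}]"
            return seen[value]
        text = pattern.sub(replace, text)
    return text

print(scrub("user bob@example.com from 10.1.2.3; bob@example.com again"))
# user [EMAIL-1] from [IPV4-1]; [EMAIL-1] again
```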
Use the "Analyze" feature to preview what will be detected before sanitizing. This also suggests disabled rules that would match content in your text.
Only enable rules for data types you expect to find in your content. Enabling all rules will likely cause false positives — for example, the "US Zip Code" pattern matches any 5-digit number. Be selective to get accurate results.
Replacement Strategies
Choose how detected PII should be replaced:
Label
Replaces with a descriptive label and counter.
john@example.com → [EMAIL-1]
Fake
Replaces with realistic fake data. Preserves structural prefixes (e.g. ICCID 89, BTC bc1).
john@example.com → maria.wilson@example.org
Fake (Country)
Fake data that preserves country-specific prefixes like phone country codes and TLDs.
+447508804412 → +447291635804
Redact
Replaces with blocks matching the original length.
john@example.com → ████████████████
Template
Custom replacement format using variables.
{TYPE}, {n}, {len}
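A sketch of how template expansion might work. The variable semantics are inferred from their names and are an assumption, not confirmed behavior: {TYPE} as the rule label, {n} as the per-type counter, {len} as the length of the original matched text.

```python
def apply_template(template: str, pii_type: str, n: int, original: str) -> str:
    # Assumed semantics: {TYPE} = rule label, {n} = per-type counter,
    # {len} = length of the original matched text.
    return (template
            .replace("{TYPE}", pii_type)
            .replace("{n}", str(n))
            .replace("{len}", str(len(original))))

print(apply_template("<{TYPE}:{n}:{len}>", "EMAIL", 1, "john@example.com"))
# <EMAIL:1:16>
```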
Fake Data Generation
The Fake strategy uses a Rust-based data generation library to create realistic-looking replacements. This makes your scrubbed output look natural while still protecting sensitive information. Fake data is generated deterministically — the same input always produces the same fake output, ensuring consistency.
| PII Type | Fake Data Generated | Example |
|---|---|---|
| Email | Realistic email addresses | maria.wilson@example.org |
| Person Names (ML) | Full names from name database | James Rodriguez |
| Locations (ML) | City names | Portland |
| Organizations (ML) | Company names | Acme Industries |
| IPv4 | Valid IPv4 addresses | 142.58.201.33 |
| IPv6 | Valid IPv6 addresses | 2001:db8:85a3::8a2e:370:7334 |
| MAC Address | Valid MAC addresses | 4A:3B:2C:1D:5E:6F |
| Phone Numbers | Formatted phone numbers | (555) 842-9173 |
| Hostname | Realistic hostnames | server42.internal.net |
| URL | Valid URLs with paths | https://demo.io/api/4821 |
| UUID | Valid UUID v4 format | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d |
| Credit Card | Luhn-valid card numbers | 4111-1111-1111-1234 |
| SSN | Valid format (not real) | 284-17-5932 |
| IBAN | Plausible IBAN format | DE89370400440532013000 |
| UK NHS Number | Check-digit valid format | 485 293 7164 |
| UK NINO | Valid NI number format | AB123456C |
| UK Postcode | Valid UK postcode format | SW1A 2AA |
| US Zip Code | 5-digit zip codes | 90210 |
| GPS Coordinates | Valid lat/long pairs | 51.5074, -0.1278 |
| File Paths | Realistic file paths | /home/user/data.json |
| Dates | Valid dates (randomized) | 2019-07-15 |
| Hashes (MD5/SHA) | Random hex strings | 5d41402abc4b2a76... |
| API Keys | Format-matching tokens | sk_test_Abc123xyz... |
| Crypto Addresses | Valid format addresses | 1A1zP1eP5QGefi2DMPTfTL... |
Fake data is generated using a seeded random number generator based on the original value. This means the same input (e.g., john@example.com) will always produce the same fake output across multiple runs. Combined with Consistency Mode, this ensures your scrubbed data maintains referential integrity.
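The seeding idea can be sketched in Python. LogScrub's actual generator is Rust compiled to WebAssembly, and the name lists below are invented for illustration:

```python
import hashlib
import random

# Hypothetical name lists; the real generator draws from larger databases.
FIRST = ["maria", "james", "sofia", "liam"]
LAST = ["wilson", "rodriguez", "chen", "patel"]

def fake_email(original: str) -> str:
    # Seed the RNG from a hash of the original value, so the same input
    # always yields the same fake output across runs.
    seed = int.from_bytes(hashlib.sha256(original.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return f"{rng.choice(FIRST)}.{rng.choice(LAST)}@example.org"

# Deterministic: repeated calls with the same input agree.
assert fake_email("john@example.com") == fake_email("john@example.com")
```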
Fake (Country) — Preserve Country Prefixes
The Fake (Country) strategy extends the Fake strategy by preserving country-specific prefixes. This is useful when you need to keep geographic context (e.g. which country a phone number belongs to) while still anonymizing the rest of the data.
| PII Type | What's Preserved | Example |
|---|---|---|
| International Phone (+) | + and country code digits | +447508804412 → +447291635804 |
| International Phone (no +) | Country code digits | 447508804412 → 447291635804 |
| US Phone | +1 or 1 prefix | +1-555-234-5678 → +1-832-671-9042 |
| UK Phone | Leading 0 | 07508804412 → 08291635804 |
| ICCID | 89 + country code (5 digits total) | 8944200011231044047 → 8944273829156308291 |
| IBAN | First 2 letters (country code) | GB82WEST12345698765432 → GB47XKRJ83920147562918 |
| Email | Domain TLD | user@company.co.uk → jsmith@inbox.co.uk |
| Hostname | TLD | server.example.co.uk → web42.co.uk |
| URL | Domain TLD | https://app.example.de/api → https://demo.de/users/4821 |
| MAC Address | OUI (first 3 octets) | AA:BB:CC:11:22:33 → AA:BB:CC:7F:3A:E2 |
| Credit Card | BIN (first 6 digits) | 4111111111111111 → 4111118294736150 |
The base Fake strategy preserves invariant structural prefixes (e.g. 89 on all ICCIDs, bc1 on Bech32 Bitcoin addresses) since these are the same regardless of country. Fake (Country) goes further by also preserving country-specific prefixes like phone country codes, IBAN country letters, and domain TLDs. For all PII types not listed above, Fake (Country) behaves identically to Fake.
Consistency Mode
When enabled, the same input value always produces the same replacement, which preserves relationships in your data: if john@example.com appears ten times across your log, every occurrence becomes the same [EMAIL-1], so you can still follow one user's activity through the events.
Detection Rules Reference
LogScrub includes 95+ built-in detection rules organized by category.
Contact Information
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Email | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | user@example.com | Enabled |
| Email Message-ID | `` <[A-Za-z0-9!#$%&'*+/=?^_`.{\|}~-]+@[A-Za-z0-9.-]+> `` | <abc123@mail.example.com> | Disabled |
| Phone (US) | `\b(?:\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b` | (555) 123-4567 | Enabled |
| Phone (UK) | `\b(?:0[1-9][0-9]{8,9}\|0[1-9][0-9]{2,4}[\s-][0-9]{3,4}[\s-]?[0-9]{3,4})\b` | 020 7946 0958 | Enabled |
| Phone (Intl) | `\+[1-9][0-9]{1,3}[\s-]?[0-9]{6,14}\b` | +44 7911 123456 | Enabled |
| Phone (Intl, No +) | `\b[1-9][0-9]{9,14}\b` (validated against ~100 E.164 country codes with per-country digit length checks) | 447508804412 | Disabled |
Network
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| IPv4 | `\b(?:(?:25[0-5]\|2[0-4][0-9]\|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]\|2[0-4][0-9]\|[01]?[0-9][0-9]?)\b` | 192.168.1.1 | Enabled |
| IPv6 | `(?i)\[(?:[0-9a-f]{1,4}:){7}[0-9a-f]{1,4}\]`... (full IPv6 pattern with bracketed, bare, link-local, and IPv4-mapped variants) | 2001:0db8:85a3::8a2e:0370:7334 | Enabled |
| MAC Address | `(?i)\b(?:[0-9A-F]{2}[:-]){5}[0-9A-F]{2}\b` | 00:1A:2B:3C:4D:5E | Enabled |
| Hostname | `\b(?![a-zA-Z0-9-]+\.(?:txt\|log\|json\|xml\|...)\b)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,12}\b` (excludes common file extensions such as .txt, .log, .json, .xml, .csv to reduce false positives) | api.example.com | Enabled |
| URL | `` https?://[^\s<>\[\]{}\|\\^`\x00-\x1f\x7f]+ `` | https://example.com/path | Enabled |
When enabling IPv4 or IPv6 scrubbing, you'll be asked whether to preserve private/internal IP addresses. Private IPs are not routable on the internet, so they're generally safe to share.
Preserved IPv4 ranges:
- `10.0.0.0/8` — Class A private network (10.0.0.0 – 10.255.255.255)
- `172.16.0.0/12` — Class B private network (172.16.0.0 – 172.31.255.255)
- `192.168.0.0/16` — Class C private network (192.168.0.0 – 192.168.255.255)
- `127.0.0.0/8` — Loopback (127.0.0.1, localhost)
- `169.254.0.0/16` — Link-local (APIPA)
Preserved IPv6 ranges:
- `fe80::/10` — Link-local addresses
- `fc00::/7` — Unique local addresses (fd00::/8 and fc00::/8)
- `::1` — Loopback
- `::ffff:x.x.x.x` — IPv4-mapped addresses (checks IPv4 portion)
You can toggle this setting via the "Preserve Private IPs" checkbox in the Settings panel.
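The preserve-private-IPs decision is easy to express with Python's `ipaddress` module; a sketch using the ranges listed above (not LogScrub's actual code):

```python
import ipaddress

# The preserved ranges from the lists above.
PRESERVED_V4 = [ipaddress.ip_network(n) for n in
                ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                 "127.0.0.0/8", "169.254.0.0/16"]]
PRESERVED_V6 = [ipaddress.ip_network(n) for n in
                ["fe80::/10", "fc00::/7", "::1/128"]]

def is_preserved(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped:
        # ::ffff:x.x.x.x is checked against the IPv4 ranges.
        ip = ip.ipv4_mapped
    nets = PRESERVED_V4 if ip.version == 4 else PRESERVED_V6
    return any(ip in net for net in nets)

print(is_preserved("192.168.1.1"))  # True: private range, safe to keep
print(is_preserved("8.8.8.8"))      # False: public, should be scrubbed
```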
Identity (US)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SSN | `\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b` (validates area/group/serial ranges; no 000, 666, or 900-999 area codes) | 123-45-6789 | Enabled |
| US ITIN | `\b9[0-9]{2}[- ]?(5[0-9]\|6[0-5]\|7[0-9]\|8[0-8]\|9[0-24-9])[- ]?[0-9]{4}\b` | 912-54-1234 | Disabled |
| Passport | `(?i)\b(?:passport[:\s#]*)?[A-Z]{1,2}[0-9]{6,9}\b` | AB1234567 | Disabled |
| Driver's License | `(?i)\b(?:d\.?l\.?\|driver'?s?\s*(?:license\|lic))[:\s#]*[A-Z0-9]{5,15}\b` | DL: D1234567 | Disabled |
Identity (UK & International)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| UK NHS Number | `\b([0-9]{3})[- ]?([0-9]{3})[- ]?([0-9]{4})\b` (Mod-11 checksum validation) | 450 557 7104 | Disabled |
| UK National Insurance | `(?i)\b[A-Z]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[0-9]{2}\s?[A-D]\b` (validates prefix letters; excludes D, F, I, Q, U, V in first and D, F, I, O, Q, U, V in second) | AB 12 34 56 C | Disabled |
| AU Tax File Number | `\b[0-9]{3}\s?[0-9]{3}\s?[0-9]{3}\b` (weighted checksum validation) | 123 456 789 | Disabled |
| India PAN | `\b[A-Z]{3}[ABCFGHLJPT][A-Z][0-9]{4}[A-Z]\b` | ABCPD1234E | Disabled |
| Singapore NRIC | `(?i)\b[STFGM][0-9]{7}[A-Z]\b` (Mod-11 with letter check table) | S1234567A | Disabled |
| Spanish NIF/DNI | `\b[0-9]{8}[A-Z]\b` (Mod-23 checksum validation) | 12345678Z | Disabled |
| Spanish NIE | `(?i)\b[XYZ][0-9]{7}[A-Z]\b` (Mod-23 checksum; X/Y/Z prefix mapped to 0/1/2) | X1234567L | Disabled |
| Canadian SIN | `\b[0-9]{3}[- ]?[0-9]{3}[- ]?[0-9]{3}\b` (Luhn checksum validation) | 123-456-789 | Disabled |
| VIN | `\b[A-HJ-NPR-Z0-9]{17}\b` (check-digit validation at position 9, transliteration + weighted sum) | 1HGBH41JXMN109186 | Disabled |
| ICCID (SIM Card) | `\b89[0-9]{16,20}\b` (must start with 89, the telecom MII, + Luhn checksum validation) | 8944200011231044047 | Disabled |
Note: NHS Number, AU TFN, Singapore NRIC, Canadian SIN, ICCID, Spanish NIF/NIE, and VIN include checksum or check-digit validation to reduce false positives.
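Several of these validators (Canadian SIN, ICCID, and the credit card rule in the Financial category) share the Luhn algorithm. A minimal implementation, for illustration only:

```python
def luhn_valid(digits: str) -> bool:
    # Double every second digit from the right, subtract 9 from results
    # over 9, and require the total to be divisible by 10.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True: the classic Visa test number
print(luhn_valid("4111111111111112"))  # False: last digit altered
```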
Financial
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Credit Card | `\b(?:4[0-9]{12}(?:[0-9]{3})?\|5[1-5][0-9]{14}\|3[47][0-9]{13}\|6(?:011\|5[0-9]{2})[0-9]{12})\b` (Luhn checksum validation) | 4111111111111111 | Enabled |
| IBAN | `\b[A-Z]{2}[0-9]{2}[A-Z0-9]{4}[0-9]{7}(?:[A-Z0-9]?){0,16}\b` (Mod-97 checksum validation, ISO 7064) | GB82WEST12345698765432 | Enabled |
| Bitcoin Address | `\b(?:bc1\|[13])[a-zA-HJ-NP-Z0-9]{25,62}\b` (Base58 format check) | 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 | Enabled |
| Ethereum Address | `\b0x[a-fA-F0-9]{40}\b` (hex format check: 40 hex characters after the 0x prefix) | 0x742d35Cc6634C0532925a3b844Bc... | Enabled |
| Money/Currency | `(?:[$£€¥₹₩₽¢฿₪₴₦₡₱₲₵₸₺₼₾][0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?\|[0-9]{1,3}(?:[,.\s][0-9]{2,3})*(?:[.,][0-9]{1,2})?\s*(?:USD\|EUR\|GBP\|JPY\|CNY\|INR\|...))` | $10.99, £1,000.00, 100 EUR | Disabled |
Tokens & API Keys
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| JWT | `eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+` | eyJhbGciOiJIUzI1NiIsInR5cCI6... | Enabled |
| Bearer Token | `(?i)bearer\s+[a-z0-9_-]+\.[a-z0-9_-]+\.?[a-z0-9_-]*` | Bearer abc123.def456 | Enabled |
| AWS Access Key | `\b(?:AKIA\|ABIA\|ACCA\|ASIA)[0-9A-Z]{16}\b` | AKIAIOSFODNN7EXAMPLE | Enabled |
| AWS Secret Key | `(?i)(?:aws.?secret\|secret.?access)[^a-z0-9]*['"]?([a-z0-9/+=]{40})['"]?` | aws_secret_access_key = wJalrXUt... | Enabled |
| Stripe Key | `\b(?:sk\|pk)_(?:test\|live)_[0-9a-zA-Z]{24,}\b` | sk_test_4eC39HqLyjWDarjtT1zdp7dc | Enabled |
| GCP API Key | `\bAIza[0-9A-Za-z_-]{35}\b` | AIzaSyDaGmWKa4JsXZ-HjGw7ISLn... | Enabled |
| GitHub Token | `\b(?:ghp\|gho\|ghu\|ghs\|ghr)_[A-Za-z0-9_]{36,}\b` | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx | Enabled |
| OpenAI API Key | `\bsk-(?:proj-)?[a-zA-Z0-9]{32,64}\b` | sk-proj-abc123def456ghi789... | Enabled |
| Anthropic API Key | `\bsk-ant-[a-zA-Z0-9_-]{32,64}\b` | sk-ant-api03-abc123def456... | Enabled |
| X AI API Key | `\bxai-[a-zA-Z0-9]{32,64}\b` | xai-abc123def456ghi789... | Enabled |
| Cerebras API Key | `\bcsk-[a-zA-Z0-9]{40,50}\b` | csk-abc123def456ghi789... | Enabled |
| Slack Token | `\bxox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*\b` | xoxb-123456789012-123456789012-abc... | Enabled |
| NPM Token | `\bnpm_[A-Za-z0-9]{36}\b` | npm_abc123def456ghi789jkl012... | Enabled |
| SendGrid Key | `\bSG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43}\b` | SG.abc123def456.ghi789jkl012mno345... | Enabled |
| Twilio Key | `\b(?:AC\|SK)[a-f0-9]{32}\b` | SK1234567890abcdef1234567890abcdef | Enabled |
| Database URL | `(?i)(?:mongodb\|postgres\|postgresql\|mysql\|redis\|amqp\|mssql)://[^\s]+` | postgres://user:pass@host:5432/db | Enabled |
Secrets
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Generic Secret | `(?i)(?:password\|passwd\|pwd\|secret\|token\|api[_-]?key\|apikey\|auth[_-]?token\|access[_-]?token)\s*[:=]\s*['"]?([^\s'"]{8,})['"]?` | password=MyS3cr3tP@ss! | Enabled |
| High Entropy Secret | `['"][A-Za-z0-9!@#$%^&*_+\-]{8,64}['"]` (Shannon entropy > 3.5 bits/char, requires mixed character types) | xK9#mP2$vL7@nQ4! | Disabled |
| Private Key | `-----BEGIN (?:RSA \|DSA \|EC \|OPENSSH \|PGP )?PRIVATE KEY-----` | -----BEGIN RSA PRIVATE KEY----- | Enabled |
| Basic Auth | `(?i)basic\s+[a-z0-9+/]+=*` | Basic dXNlcjpwYXNzd29yZA== | Enabled |
| URL Credentials | `(?i)(?:https?\|ftp)://[^/:@\s"']+:[^@\s"']+@[^\s/"']+` | https://user:pass@example.com | Enabled |
| Session ID | `(?i)(?:session[_-]?id\|sid\|jsessionid\|phpsessid\|aspsessionid)[=:\s]*[a-z0-9_-]{16,}` | JSESSIONID=ABC123DEF456789XYZ | Enabled |
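The entropy threshold used by the High Entropy Secret rule (above 3.5 bits per character) can be computed as in this sketch:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # Average bits per character of the string's empirical distribution.
    counts = Counter(s)
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("aaaaaaaa"))          # 0.0 (no variety, not flagged)
print(shannon_entropy("xK9#mP2$vL7@nQ4!"))  # 4.0 (16 distinct chars, above 3.5)
```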
Location
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| GPS Coordinates | `-?(?:[1-8]?[0-9](?:\.[0-9]{4,})?\|90(?:\.0+)?)\s*,\s*-?(?:1[0-7][0-9]\|[1-9]?[0-9])(?:\.[0-9]{4,})?` | 51.5074, -0.1278 | Disabled |
| UK Postcode | `(?i)\b[A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2}\b` | SW1A 1AA | Disabled |
| US Zip Code | `\b[0-9]{5}(?:-[0-9]{4})?\b` | 90210 | Disabled |
Date & Time
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Date (ISO) | `\b(?:19\|20)[0-9]{2}-(?:0[1-9]\|1[0-2])-(?:0[1-9]\|[12][0-9]\|3[01])\b` | 2024-01-15 | Disabled |
| Date (MM/DD/YY) | `\b(?:0?[1-9]\|1[0-2])[/-](?:0?[1-9]\|[12][0-9]\|3[01])[/-](?:19\|20)?[0-9]{2}\b` | 01/15/24, 12/31/2024 | Disabled |
| Date (DD/MM/YY) | `\b(?:0?[1-9]\|[12][0-9]\|3[01])[/-](?:0?[1-9]\|1[0-2])[/-](?:19\|20)?[0-9]{2}\b` | 15/01/24, 31/12/2024 | Disabled |
| Time | `\b(?:[01]?[0-9]\|2[0-3]):[0-5][0-9](?::[0-5][0-9])?(?:\s*[AaPp][Mm])?\b` | 14:30:00, 2:30 PM | Disabled |
| DateTime (ISO) | `\b(?:19\|20)[0-9]{2}-(?:0[1-9]\|1[0-2])-(?:0[1-9]\|[12][0-9]\|3[01])[T\s](?:[01][0-9]\|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\.[0-9]+)?(?:Z\|[+-][0-9]{2}:?[0-9]{2})?\b` | 2024-01-15T10:30:00Z | Disabled |
| DateTime (CLF) | `\[?\d{1,2}/(?:Jan\|Feb\|Mar\|Apr\|May\|Jun\|Jul\|Aug\|Sep\|Oct\|Nov\|Dec)/\d{4}:\d{2}:\d{2}:\d{2}\s*[+-]?\d{4}\]?` | [15/Jan/2024:10:30:00 +0000] | Disabled |
| Unix Timestamp | `\b1[0-9]{9}(?:[0-9]{3})?\b` | 1705312200 | Disabled |
Date/time rules are disabled by default as they often match non-sensitive data.
SQL
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SQL Tables | `` (?i)(?:FROM\|JOIN\|INTO\|UPDATE\|TABLE)\s+(`[^`]+`\|\[[^\]]+\]\|"[^"]+"\|[a-zA-Z_][a-zA-Z0-9_]*) `` | FROM users, INSERT INTO orders | Disabled |
| SQL Strings | `'(?:[^'\\]\|\\.)*'` | 'John Doe', "example@email.com" | Disabled |
| SQL Identifiers | `` (?i)(?:SELECT\|WHERE\|AND\|OR\|ON\|SET\|ORDER\s+BY\|GROUP\s+BY\|HAVING\|AS\|,)\s*(`[^`]+`\|\[[^\]]+\])\|(`[^`]+`\|\[[^\]]+\])\.\s*(`[^`]+`\|\[[^\]]+\]) `` | column_name, table.field | Disabled |
SQL rules help scrub sensitive data from database queries and logs. Disabled by default to avoid false positives.
SQL Dump File Support
LogScrub processes SQL dump files from PostgreSQL (pg_dump), MySQL (mysqldump), SQLite (.dump), and other databases. Upload .sql files directly or paste SQL content.
When processing SQL dumps, LogScrub detects PII in:
- INSERT statements — String literals in VALUES clauses
- UPDATE statements — SET clause values
- Comments — Both `--` and `/* */` style comments
- String literals — Single and double quoted strings throughout
SQL structure is preserved — table names, column names, SQL keywords, and syntax remain untouched while only values are anonymized. This keeps your scrubbed dump valid and executable.
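A toy version of this behavior, applying the same single-quoted-string pattern from the SQL rules so that only values are replaced while keywords and identifiers stay intact (a sketch, not LogScrub's parser):

```python
import re

# Single-quoted SQL string literals, with backslash escapes allowed.
SQL_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")

stmt = "INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com');"
print(SQL_STRING.sub("'[REDACTED]'", stmt))
# INSERT INTO users (name, email) VALUES ('[REDACTED]', '[REDACTED]');
```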
Exim (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Exim Subject | `T="(?:[^"\\]\|\\.)*"` | T="Meeting reminder" | Disabled |
| Exim Sender | `F=<[^>]+>` | F=<user@example.com> | Disabled |
| Exim Auth | `(?i)A=[a-z_]+(?::[^\s]+)?` | A=login:user | Disabled |
| Exim User | `U=[^\s]+` | U=mailuser | Disabled |
| Exim DN | `DN=[^\s]+` | DN=cn=user,dc=example,dc=com | Disabled |
Rules for Exim mail server logs. These follow Exim's well-documented log format with field prefixes (T=, F=, A=, U=, DN=).
Postfix (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Postfix From | `from=<[^>]*>` | from=<user@example.com> | Disabled |
| Postfix To | `to=<[^>]+>` | to=<recipient@example.com> | Disabled |
| Postfix Relay | `relay=[^\s,]+(?:\[[^\]]+\])?` | relay=mail.example.com[192.168.1.1] | Disabled |
| Postfix SASL User | `sasl_username=[^\s,]+` | sasl_username=admin | Disabled |
Rules for Postfix mail server logs. Postfix is one of the most widely used MTAs on Linux systems.
Dovecot (IMAP/POP3)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Dovecot User | `user=<[^>]+>` | user=<mailuser> | Disabled |
| Dovecot Remote IP | `rip=[0-9a-fA-F.:]+` | rip=192.168.1.100 | Disabled |
| Dovecot Local IP | `lip=[0-9a-fA-F.:]+` | lip=10.0.0.1 | Disabled |
Rules for Dovecot IMAP/POP3 server logs. Captures login events with username and IP addresses.
Sendmail (Mail Server)
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| Sendmail From | `from=<[^>]*>,` | from=<user@domain.com>, | Disabled |
| Sendmail Relay | `relay=[^\s,\[\]]+(?:\[[^\]]+\])?` | relay=mail.example.com | Disabled |
| Sendmail MsgID | `msgid=<[^>]+>` | msgid=<abc123@host> | Disabled |
Rules for Sendmail mail server logs. One of the oldest Unix MTAs with a well-established log format.
SIP/VoIP
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| SIP Username | `(?i)username="[^"]+"` | username="john.doe" | Disabled |
| SIP Realm | `(?i)realm="[^"]+"` | realm="sip.example.com" | Disabled |
| SIP Nonce | `(?i)nonce="[^"]+"` | nonce="abc123def456" | Disabled |
| SIP Response | `(?i)response="[a-f0-9]+"` | response="9f8e7d6c5b4a3210" | Disabled |
| SIP From Name | `(?i)^From:\s*"[^"]*"` | From: "John Doe" <sip:john@example.com> | Disabled |
| SIP To Name | `(?i)^To:\s*"[^"]*"` | To: "Jane Smith" <sip:jane@example.com> | Disabled |
| SIP Contact | `(?i)^Contact:\s*<?sip:[^>]+>?` | Contact: <sip:john@192.168.1.100:5060> | Disabled |
| SIP URI | `sips?:[^\s<>@]+@[^\s<>;]+` | sip:user@domain.com:5060 | Disabled |
| SIP Call-ID | `(?i)^Call-ID:\s*[^\s]+` | Call-ID: abc123@192.168.1.100 | Disabled |
| SIP Branch | `(?i)branch=z9hG4bK[a-zA-Z0-9]+` | branch=z9hG4bK-abc123 | Disabled |
| SIP User-Agent | `(?i)^User-Agent:\s*[^\r\n]+` | User-Agent: Oasis SIP Phone | Disabled |
| SIP Via | `(?i)^Via:\s*SIP/2\.0/[^\r\n]+` | Via: SIP/2.0/UDP 192.168.1.100:5060 | Disabled |
Rules for SIP (Session Initiation Protocol) traces used in VoIP systems. Useful for scrubbing packet captures or debug logs from phone systems.
Hashes
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| MD5 Hash | `(?i)\b[a-f0-9]{32}\b` | d41d8cd98f00b204e9800998ecf8427e | Disabled |
| SHA1 Hash | `(?i)\b[a-f0-9]{40}\b` | da39a3ee5e6b4b0d3255bfef95601890afd80709 | Disabled |
| SHA256 Hash | `(?i)\b[a-f0-9]{64}\b` | e3b0c44298fc1c149afbf4c8996fb924...27ae41e4649b934ca495991b7852b855 | Disabled |
Cryptographic hash detection. Disabled by default as hashes are often legitimate identifiers, but can be enabled if you need to scrub file checksums or content hashes.
Other
| Rule | Pattern | Example Match | Default |
|---|---|---|---|
| UUID | `(?i)\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b` | 550e8400-e29b-41d4-a716-446655440000 | Enabled |
| File Path (Unix) | `(?:/(?:home\|Users)/[a-zA-Z0-9_-]+(?:/[a-zA-Z0-9._-]+)+\|/tmp(?:/[a-zA-Z0-9._-]+)+)` | /home/john/documents/file.txt | Disabled |
| File Path (Windows) | `(?i)[a-z]:\\(?:Users\|Documents and Settings)\\[^\s\\]+(?:\\[^\s\\]+)*` | C:\Users\John\Documents\file.txt | Disabled |
| Docker Container ID | `(?i)\b[a-f0-9]{12}\b` | abc123def456 | Disabled |
| URL Parameters | Captures query string key-value pairs from URLs | ?user=john&token=abc123 | Disabled |
Custom Rules
Create your own detection rules using regular expressions for patterns specific to your organization.
Adding a Custom Rule
- Click the "+ Regex" button in the Rulesets panel (Custom Rules tab)
- Enter a descriptive name for the rule
- Enter your regex pattern (JavaScript syntax)
- Click "Add Rule"
Example: Company Employee IDs
Name: Employee ID
Pattern: EMP-[0-9]{6}
Matches: EMP-123456, EMP-000001
Example: Internal Hostnames
Name: Internal Servers
Pattern: \b[a-z]+-(?:prod|staging|dev)-[0-9]+\.internal\.company\.com\b
Matches: api-prod-01.internal.company.com
Test your regex pattern using the "View Pattern" button (⚙) which includes a pattern tester.
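You can also sanity-check a custom pattern outside the browser. Here the Employee ID example from above is tested with Python's `re` module; LogScrub itself uses JavaScript regex syntax, which agrees with Python for a simple pattern like this:

```python
import re

# The "Employee ID" rule from the example above, checked against a sample line.
pattern = re.compile(r"EMP-[0-9]{6}")
line = "2024-01-15 login ok for EMP-123456 (badge EMP-000001)"
print(pattern.findall(line))
# ['EMP-123456', 'EMP-000001']
```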
Plain Text Patterns
For exact text matches that don't require regex (like specific hostnames or identifiers), use Plain Text patterns.
Adding a Plain Text Pattern
- Click the "+ Text" button in the Rulesets panel (Custom Rules tab)
- Enter a label for the pattern
- Enter the exact text to match
- Click "Add Pattern"
When to Use Plain Text vs Regex
| Use Plain Text | Use Regex |
|---|---|
| Specific hostnames: `db-master.prod.internal` | Pattern-based hostnames: `db-\d+\.prod\.internal` |
| Known usernames: `admin_system` | Username patterns: `user_[a-z]+_[0-9]+` |
| Fixed identifiers: `ACME-CORP` | Variable IDs: `ACME-[A-Z]{3}-[0-9]+` |
Presets
Presets let you save and quickly switch between different rule configurations.
Built-in Presets
- Minimal — Only critical PII (emails, SSN, credit cards)
- Standard — Common PII without dates/paths
- Paranoid — Everything enabled for maximum redaction
- Dev/Debug — Focus on secrets, tokens, and credentials
- GDPR — EU personal data (includes UK postcodes, IBAN)
Custom Presets
Save your current rule configuration as a preset:
- Configure your rules as desired
- Click "Presets" to expand the presets panel
- Enter a name and click "Save"
Import/Export
Share configurations between computers or team members using JSON export/import.
Multi-File Upload
Process multiple log files at once with batch operations. Upload files individually or as a ZIP archive, analyze and scrub them all together, and export the results as a single ZIP download.
Uploading Multiple Files
There are several ways to upload multiple files:
- File picker — Click "Upload" and select multiple files (hold Ctrl/Cmd to select multiple)
- Drag and drop — Drag multiple files onto the editor area
- ZIP archive — Upload a .zip file containing text files (they will be automatically extracted)
Supported file types: .log, .txt, .json, .xml, .csv, .sql, .zip
Files Tab
When you upload multiple files, a "Files" tab appears:
- File list — Shows all uploaded files with their names, sizes, and status
- Status badges — Pending, Analyzing, Analyzed, Processing, Done, or Error
- Detection count — Shows how many PII items were found in each file
- File selection — Click a file to view it in the editor
- Remove files — Click the X button to remove individual files, or "Clear All" to start over
Batch Operations
Process all files at once using the batch buttons at the top of the Files tab:
- Analyze All — Run detection on all files to see what PII will be found (preview mode)
- Scrub All — Process all files with your current rule settings
- Export ZIP — Download all scrubbed files as a single ZIP archive
A progress bar shows which file is currently being processed during batch operations.
File Navigation
When in multi-file mode, a navigation bar appears above the editor showing:
- Current file name
- Position indicator (e.g., "2 of 5")
- Previous/Next buttons to quickly switch between files
Combined Statistics
The Statistics view includes a toggle to show stats for either:
- Current File — Detection counts for the currently selected file
- All Files — Combined totals across all uploaded files
Cross-File Consistency
When Consistency Mode is enabled, the same PII value will receive the same replacement across ALL files in the batch. For example, if john@example.com appears in multiple files, it will always be replaced with [EMAIL-1] in every file.
For large batches, consider using "Analyze All" first to review what will be detected before running "Scrub All".
Maximum 50 files per batch. Maximum 100MB total combined size.
Organizing Rules
Customize the order of rule categories and individual rules to match your workflow.
Reordering Categories
Drag categories to change their display order:
- Grab the drag handle (⋮⋮) to the left of a category name
- Drag the category up or down to your preferred position
- Release to drop it in place
Reordering Rules Within a Category
- Expand the category by clicking its name
- Grab the drag handle (⋮) to the left of a rule
- Drag the rule up or down within the category
- Release to set its new position
Persistence
Your custom ordering is automatically saved to your browser's local storage and will be restored on your next visit. When you save a preset, the ordering is included.
Put frequently-used categories at the top for quick access. For example, if you primarily work with network logs, drag the "Network" category to the top.
Time Shift
Timestamps in logs can reveal when incidents occurred, work patterns, or timezone information. Time Shift lets you anonymize temporal data while preserving the relative timing between events.
Time Shift only appears when LogScrub detects timestamps in your input.
Supported Timestamp Formats
- ISO 8601 — `2024-01-15T10:30:00Z`
- ISO Date — `2024-01-15`
- Apache error log — `[Sat Aug 12 04:05:51 2006]`
- Apache access log — `[17/May/2015:10:05:03 +0000]`
- Syslog — `Jan 15 10:30:00`
- US Format — `01/15/2024 10:30:00`
Offset Mode
Shift all timestamps by a fixed amount. Useful when you want to obscure the actual date/time while keeping the duration between events accurate.
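Offset Mode for a single ISO 8601 format can be sketched as follows. LogScrub handles several more formats and runs in WebAssembly; this is only an illustration of the fixed-offset idea:

```python
import re
from datetime import datetime, timedelta

ISO = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def shift(line: str, offset: timedelta) -> str:
    # Rewrite every ISO 8601 timestamp on the line by the same fixed offset;
    # durations between events are unchanged because all shifts are equal.
    def repl(m):
        t = datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")
        return (t + offset).strftime("%Y-%m-%dT%H:%M:%S")
    return ISO.sub(repl, line)

print(shift("2024-01-15T10:30:00Z error: timeout", timedelta(days=-30)))
# 2023-12-16T10:30:00Z error: timeout
```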
Start From Mode
Set the first timestamp to a specific date/time. All subsequent timestamps shift by the same amount, preserving relative timing.
Scope: Line Start vs All Timestamps
Logs often contain two types of timestamps:
- Log timestamps — At the start of each line
- Content dates — Within the log message itself (DOBs, expiry dates, etc.)
| Scope | Best For |
|---|---|
| Line Start | Shifting log timing while sanitizing content dates with detection rules |
| All Timestamps | When all dates should shift together (e.g., test data generation) |
Use "Line Start" scope with date/time detection rules enabled. Log timestamps get shifted while content dates like DOBs get scrubbed to [DATE-1].
Log Crop
When working with large log files that span many hours or days, you often only need a specific time window. The Crop tool lets you trim a log file to a precise time range, keeping only the lines you need.
The Crop button appears above the Original pane when LogScrub detects timestamps in your input. It uses the same timestamp formats supported by Time Shift.
How to Use
- Paste or upload a log file with timestamps
- Click the Crop link above the Original pane
- Review the detected time range, duration, and line count
- Select the time window you want to keep
- Click Crop to trim the log
Custom Range
Set a specific start and end time to keep. Any lines with timestamps outside this range are removed. Lines without timestamps (such as stack traces or continuation lines) are kept if they follow a line within the selected range.
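The keep/drop logic for untimestamped continuation lines can be sketched like this (an illustration assuming ISO 8601 line-start timestamps, not LogScrub's code):

```python
import re
from datetime import datetime

ISO = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def parse_ts(line):
    m = ISO.match(line)
    return datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S") if m else None

def crop(lines, start, end):
    # Lines without a timestamp (stack traces, continuations) follow the
    # fate of the most recent timestamped line before them.
    out, keeping = [], False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keeping = start <= ts <= end
        if keeping:
            out.append(line)
    return out

log = [
    "2024-01-15T09:00:00 boot",
    "2024-01-15T10:05:00 error",
    "  at handler.js:42",          # continuation line, kept with its parent
    "2024-01-15T12:00:00 shutdown",
]
print(crop(log, datetime(2024, 1, 15, 10, 0), datetime(2024, 1, 15, 11, 0)))
# ['2024-01-15T10:05:00 error', '  at handler.js:42']
```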
Start + Duration
Select a start time and then choose a duration preset. This is useful when you know the approximate start of an incident and want to capture a fixed window of time.
Available presets: 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, 8 hours, 12 hours, and 24 hours.
Cropping replaces the original text. If you need to go back, use your browser's undo or re-paste the original log. Crop before scrubbing for faster processing on large files.
Analyze Mode
Preview detections before scrubbing to fine-tune your rules.
Using Analyze
- Paste your content
- Click "Analyze" above the Original pane
- Review highlighted matches (shown in red)
- Check the suggestions for disabled rules that would match
- Adjust rules as needed, then Scrub
Smart Suggestions
After analysis, LogScrub suggests disabled rules that found matches in your text. This helps you discover rules you might want to enable.
Log Format Detection
When you run Analyze, LogScrub automatically detects common log formats and offers to load a suitable preset. This saves time by enabling the most relevant rules for your log type.
| Log Format | Detection Method | Suggested Preset |
|---|---|---|
| Apache/Nginx | Common Log Format (CLF) with HTTP methods | nginx / Apache |
| AWS CloudTrail/CloudWatch | AWS ARNs, eventSource, eventName patterns | AWS CloudWatch |
| SSH/Auth Logs | sshd, sudo, pam_unix, failed password messages | Auth / SSH Logs |
| Email Headers | Multiple Received: headers | Opens Email Routing visualization |
| SIP/VoIP Traces | SIP protocol headers (Via, From, To, Call-ID) | SIP / VoIP |
When a log format is detected, a colored banner appears above the editor with a button to load the recommended preset. You can dismiss the banner if you prefer to configure rules manually.
ML Name Detection
LogScrub includes optional machine learning-based detection for identifying person names, locations, and organizations that pattern-based rules might miss.
How It Works
ML Name Detection uses a pre-trained Named Entity Recognition (NER) model to identify entities in text:
- PER (Persons) - Names of people (e.g., "John Smith", "Dr. Sarah Johnson")
- LOC (Locations) - Place names (e.g., "London", "Silicon Valley")
- ORG (Organizations) - Company and organization names (e.g., "Microsoft", "NHS")
Technology
- Library: Transformers.js by Hugging Face
- Model: BERT-based NER models (DistilBERT or BERT Base)
- Format: ONNX for efficient browser execution via WebAssembly
- Caching: Models are cached in IndexedDB after first download
Enabling ML Detection
- Click the Settings button in the toolbar
- Select a model under ML Name Detection (DistilBERT recommended for balance of speed/accuracy)
- Click Download Model (only required once — cached models auto-load on startup)
- Once "Ready" appears, click Run ML Analysis or it will run automatically during Analyze
Available Models
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| DistilBERT NER | ~250 MB | Fast | Good |
| BERT Base NER | ~420 MB | Slower | Best |
| BERT Base NER (uncased) | ~420 MB | Slower | Best (case-insensitive) |
ML Detection Rules
When ML detection is enabled, three additional rules appear in the ML Detection category:
- Person Names (ML) - Names identified by the ML model
- Locations (ML) - Place names identified by the ML model
- Organizations (ML) - Company/org names identified by the ML model
These rules are automatically enabled when you turn on ML Name Detection, but you can disable individual entity types if needed.
When to Use ML Detection
ML detection is most useful when:
- Text contains person names not in email or username formats
- You need to detect organization names
- Location names need to be identified
- Pattern-based rules are missing names in free-form text
Syntax Validation
LogScrub automatically validates the syntax of structured file formats when you paste or upload content. This helps catch malformed files before processing.
Supported Formats
- JSON - Objects, arrays, and nested structures
- XML - Including SVG, HTML, GPX, and other XML-based formats
- CSV - Validates consistent column counts across rows
- YAML - Configuration files and structured data
- TOML - Configuration files (e.g., Cargo.toml, pyproject.toml)
Format Detection
The format is detected automatically by:
- File extension - When you upload a file, the extension determines the format
- Content analysis - For pasted content, LogScrub examines the structure (e.g., starts with `{` or `[` for JSON, `<` for XML)
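As a rough illustration of content-based sniffing (a simplified sketch, not LogScrub's actual detector):

```python
def sniff_format(text: str) -> str:
    """Guess a structured format from the first non-whitespace
    character, as described above. Illustrative only."""
    s = text.lstrip()
    if s.startswith(("{", "[")):
        return "json"
    if s.startswith("<"):
        return "xml"
    return "unknown"
```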
Validation Results
After analysis completes:
- Valid syntax - A green checkmark appears next to the "Original" heading. Hover over it to see the detected format (e.g., "Valid JSON syntax").
- Invalid syntax - A red error banner appears showing the format, line number, column number, and error message. Click the line number to scroll directly to the error location.
Syntax validation runs in WebAssembly for speed. Even large files are validated almost instantly.
Context-Aware Detection
Beyond regex patterns, LogScrub can detect potential secrets by analyzing JSON structures and identifying suspicious key names. This is especially useful for structured logs that contain key-value pairs.
How It Works
When you run Analyze, LogScrub automatically:
- Detects JSON content in your text (pure JSON, NDJSON, or embedded JSON in log lines)
- Parses the JSON and walks through all key-value pairs
- Flags values associated with suspicious keys like "password", "token", "api_key", etc.
- Shows findings in the Context-Aware tab within Smart Suggestions
Supported JSON Formats
| Format | Example |
|---|---|
| Pure JSON | {"database": {"password": "secret123"}} |
| JSON Lines (NDJSON) | Multiple JSON objects, one per line (common in logs) |
| Embedded JSON | 2024-01-15 INFO Request: {"user": "john", "token": "abc123"} |
Suspicious Keys
LogScrub looks for keys that commonly contain sensitive data:
High Confidence
Exact matches: password, passwd, pwd, secret, token, api_key, apikey, access_key, private_key, client_secret, auth_token, access_token, refresh_token, bearer, jwt, ssh_key, passphrase, credential, credentials
Medium Confidence
Pattern matches: Keys ending in _key, _token, _secret, _password; keys containing auth, cred, or ending in pass
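The key-scanning idea can be sketched as a recursive walk over parsed JSON. The key lists below are abbreviated versions of the ones above, and this is an illustration, not LogScrub's code:

```python
import json
import re

HIGH = {"password", "passwd", "pwd", "secret", "token", "api_key", "apikey"}
MEDIUM = re.compile(r"(_key|_token|_secret|_password|pass)$|auth|cred")

def scan(obj, path=""):
    """Walk a parsed JSON structure and flag values stored under
    suspicious key names, returning (json_path, confidence) pairs."""
    findings = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            kl = k.lower()
            if kl in HIGH:
                findings.append((p, "high"))
            elif MEDIUM.search(kl):
                findings.append((p, "medium"))
            findings += scan(v, p)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            findings += scan(v, f"{path}[{i}]")
    return findings
```

For example, scanning `{"database": {"password": "secret123"}, "ssl_key": "x"}` yields a high-confidence hit at `database.password` and a medium-confidence hit at `ssl_key`.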
Using Context-Aware Findings
After analysis, check the Context-Aware tab in Smart Suggestions. Each finding shows:
- Key name — The suspicious key that triggered detection
- Confidence level — High (exact match) or Medium (pattern match)
- JSON path — Full path like `config.database.password`
- Value preview — A sample of the detected value
Click "Add to scrub" to create a plain-text pattern that will redact that specific value. This is useful when you find a secret that wasn't caught by the regex-based rules.
Complementing Regex Detection
Context-aware detection complements (not replaces) the regex-based rules:
- Regex rules — Detect secrets by their format (JWT structure, API key prefixes, etc.)
- Context-aware — Detect secrets by their semantic context (what key they're assigned to)
Using both approaches together provides more comprehensive coverage for secrets that might not have a recognizable format but are stored under suspicious key names.
If you frequently work with JSON logs that contain secrets in predictable key names, consider adding custom plain-text patterns for those specific values after discovering them via context-aware detection.
Email Header Analysis
When LogScrub detects email headers (multiple Received: headers), it offers to visualize the email's routing path through mail servers.
Email Routing Visualization
Click "View Email Routing" in the blue banner to open a visual diagram showing:
- Server chain — Each mail server the message passed through, from origin to destination
- Timestamps — When each server received the message
- Transit time — Duration between each hop (shown on the connector lines)
- TLS encryption — A padlock icon indicates encrypted transmission, with TLS version and cipher details on hover
- Protocol — The mail protocol used (SMTP, ESMTP, LMTPS, etc.)
What Gets Parsed
| Header | Information Extracted |
|---|---|
| Received: | Server hostnames, IP addresses, timestamps, TLS info, protocol |
| Date: | Original send time (used as the origin timestamp) |
Understanding the Diagram
The routing diagram reads from top to bottom:
- Origin — The sending server (extracted from the first Received: header's "from" field)
- Intermediate servers — Mail relays, spam filters, or corporate gateways
- Destination — The final receiving server
A padlock icon on the connector line indicates the transmission was encrypted. Hover over it to see the TLS version (e.g., TLSv1.3) and cipher suite used.
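The transit times shown on the connector lines can be reproduced with Python's standard email utilities. A simplified sketch that assumes each Received: value ends in a `; date` clause (real headers are often messier):

```python
from email.utils import parsedate_to_datetime

def transit_times(received_headers):
    """Given Received: header values in the order they appear in a
    raw message (newest hop first), return per-hop transit seconds
    from origin to destination."""
    # The last Received: header is the oldest hop, so reverse.
    stamps = [parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
              for h in reversed(received_headers)]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```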
Use Cases
- Debugging email delivery — Identify where delays occur in the delivery chain
- Security analysis — Verify TLS encryption was used throughout the route
- Spam investigation — Trace the origin of suspicious emails
- Compliance — Document that email was transmitted securely
Spam Report Parsing
LogScrub detects and parses spam filter reports from SpamAssassin and rspamd, displaying them in an easy-to-read sortable table.
Supported Formats
| Filter | Header | Format |
|---|---|---|
| SpamAssassin | X-Spam-Report | Table with pts, rule name, and description columns |
| rspamd | X-Spam-Report | Symbol: RULE_NAME(score) format with Action field |
Report View Features
- Sortable table — Click column headers to sort by rule name or score
- Color-coded rules — Green for ham (negative scores), red for spam (positive scores)
- Total score — Combined score with visual indicator
- Action taken — Shows what the filter decided (no action, add header, reject, etc.)
- Rule breakdown — Count of ham, neutral, and spam rules
Multi-line Headers
Spam reports are often split across multiple lines in email headers. LogScrub automatically handles header continuation lines (lines starting with whitespace) and reassembles the complete report.
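Header unfolding can be sketched in a few lines (an illustration of the standard RFC 5322 folding rule, not LogScrub's parser):

```python
def unfold(header_lines):
    """Reassemble folded headers: a line starting with whitespace
    continues the previous header line."""
    out = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and out:
            out[-1] += " " + line.strip()
        else:
            out.append(line.rstrip())
    return out
```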
Multiple Reports
Some mail systems run multiple spam filters. When both SpamAssassin and rspamd reports are present (e.g., X-Spam-Report and X-Spam-Report-Secondary), LogScrub detects both and displays them with tabs to switch between the two reports for comparison.
When an amber banner appears saying "spam reports detected", click "View Reports" to see the parsed rules. If multiple reports are present, use the tabs at the top to switch between them.
GPX Route Transposition
GPX files contain GPS track data from fitness devices, cycling computers, and navigation apps. When sharing routes for debugging or analysis, you may want to hide your actual location while preserving all other statistics.
How It Works
When LogScrub detects a GPX file (by extension or content), a green banner offers to transpose the route to a different continent. The transposition:
- Shifts all coordinates — Moves the entire route to a new location
- Preserves route shape — All turns, distances, and geometry remain identical
- Keeps timestamps — Duration and timing data unchanged
- Retains elevation — All elevation data preserved
- Includes waypoints — Waypoints and route points also transposed
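Conceptually, transposition is a constant offset applied to every coordinate. A minimal sketch, assuming a simple flat shift (the real implementation may handle projection effects differently):

```python
def transpose(points, target_lat, target_lon):
    """Shift a track of (lat, lon) points so its center lands on
    the target while preserving the relative geometry."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    clat = sum(lats) / len(lats)
    clon = sum(lons) / len(lons)
    dlat, dlon = target_lat - clat, target_lon - clon
    return [(lat + dlat, lon + dlon) for lat, lon in points]
```

Because every point receives the same offset, distances between points, turns, and the overall route shape are unchanged.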
Destination Regions
Choose from six destination regions:
| Region | Target Area |
|---|---|
| Europe | Paris, France |
| North America | New York, USA |
| South America | São Paulo, Brazil |
| Asia | Tokyo, Japan |
| Oceania | Sydney, Australia |
| Africa | Cape Town, South Africa |
Route Statistics
Before transposition, the modal displays:
- Track name and detected region
- Number of GPS points
- Total duration
- Elevation range
- Center coordinates
Perfect for sharing cycling or running routes in bug reports without revealing where you live or work. The route shape and performance data remain valid for debugging.
Shifting Timestamps
GPX files also contain timestamps for each point. Use the Time Shift feature (in the toolbar) to shift all timestamps by a fixed offset or to a new start date. This adds another layer of privacy by obscuring when the activity occurred.
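A fixed-offset shift of the kind Time Shift applies can be sketched as a regex substitution over ISO-8601 timestamps (illustrative only; it ignores timezones and fractional seconds):

```python
import re
from datetime import datetime, timedelta

ISO = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def shift_times(text, offset):
    """Shift every ISO-8601 timestamp in the text by a fixed
    offset, preserving the relative timing between points."""
    def repl(m):
        t = datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")
        return (t + offset).strftime("%Y-%m-%dT%H:%M:%S")
    return ISO.sub(repl, text)
```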
More GPX/FIT Tools
For more advanced GPX and FIT file manipulation (merging, splitting, editing, converting), visit skeffling.net/gpxfit.
Audit Reports
Generate detailed reports of all detected PII for compliance documentation.
Report Contents
- Timestamp and source file name
- Summary of detection counts by type
- List of unique detected values (up to 100 per type)
Export Formats
- Text (.txt) — Plain text, easy to read
- JSON (.json) — Machine-readable, for integration
- HTML (.html) — Formatted report for sharing
Access audit reports by clicking the detection count badge, then "Download Audit Report".
RTF Export with Highlighting
For a visual export of your scrubbed output, use the RTF download button (shown in green). This generates a Rich Text Format file with all replacements highlighted in green, matching the diff view appearance.
- Visual review — Easily spot all changes at a glance
- Shareable — RTF opens in Word, LibreOffice, TextEdit, and most word processors
- Printable — Great for compliance documentation or review meetings
The RTF export uses a light green background with dark green text for replacements, making it easy to see what was scrubbed while keeping the document readable.
Reverse Lookup
When sharing scrubbed logs with others for analysis, you may receive feedback like "the IP on line 100 is causing the issue." LogScrub provides several ways to map scrubbed values back to their originals.
Hover Tooltips
Hover your mouse over any replacement in the scrubbed output (e.g., [IP-1]). A tooltip will show the original value, type, and line numbers.
Mapping Table
For a complete overview, open the Stats panel and click the Mapping tab. This shows a searchable table with every unique replacement.
Export Mapping Dictionary
- Open Stats → Mapping tab
- Click Export
- Choose JSON (for scripts) or CSV (for spreadsheets)
JSON Format Example
```json
{
  "[EMAIL-1]": {
    "original": "john.doe@example.com",
    "type": "email",
    "count": 3,
    "lines": [12, 45, 89]
  }
}
```
Enable Consistency Mode before scrubbing to ensure the same original value always gets the same replacement label.
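Scripts consuming the exported JSON can resolve labels back to their originals. A minimal sketch using the format above (the `mapping` literal is the documentation example, not real data):

```python
def reverse_lookup(mapping, label):
    """Look up the original value behind a replacement label,
    using the exported mapping dictionary format."""
    entry = mapping.get(label)
    return entry["original"] if entry else None

# Example entry in the exported mapping format.
mapping = {
    "[EMAIL-1]": {"original": "john.doe@example.com",
                  "type": "email", "count": 3, "lines": [12, 45, 89]}
}
```

So when a colleague says "the user behind [EMAIL-1] is affected," a quick lookup recovers the real address on your side only.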
AI Explain
When sharing scrubbed logs with AI assistants like ChatGPT, Claude, or Copilot, they need context to understand the redaction format. The AI Explain feature generates a ready-to-use explanation.
How to Use
- Scrub your log as usual
- Click the "AI Explain" button in the scrubbed output toolbar
- Copy the generated explanation
- Paste it into your AI chat before pasting the scrubbed log
What's Included
The generated explanation includes:
- Replacement strategies — Describes each strategy used (Label, Fake, Fake (Country), Redact, Template)
- Consistency mode status — Tells the AI whether same tokens mean same values
- Detected types table — Lists each PII type found with its replacement strategy and example format
- Interpretation guidelines — Instructions for how the AI should reference replacements
Example Output
## Log Redaction Context
This log has been sanitized using LogScrub...
### Detected & Replaced Data Types
| Type | Strategy | Example Replacement | Count |
|------|----------|---------------------|-------|
| Email | Label | `[EMAIL-1]` | 5 |
| IPv4 | Fake | `142.58.201.33` | 12 |
| Hostname | Template | `<HOSTNAME-1>` | 3 |
The explanation reflects each rule's individual replacement strategy. If you use different strategies for different types (e.g., Label for emails, Fake for IPs), the AI will know exactly what each replacement format means.
Documents & Spreadsheets
LogScrub can scrub PII from document and spreadsheet files, not just plain text. All processing happens client-side in your browser using WebAssembly.
Supported Formats
| Format | Extension | Scrubbing | Preview |
|---|---|---|---|
| PDF | .pdf | True redaction (text removed) | Full page rendering |
| Word Document | .docx | Full text replacement | Formatted preview |
| Excel Spreadsheet | .xlsx | Full text replacement | Table view |
| OpenDocument Text | .odt | Full text replacement | Basic text preview |
| OpenDocument Spreadsheet | .ods | Full text replacement | Table view |
How Document Processing Works
PDF Files
PDFs are rendered using MuPDF (WebAssembly). Due to the complexity of PDF format, scrubbing uses redaction only — detected PII is covered with black boxes rather than replaced with labels or fake data. This ensures the document structure remains intact.
The underlying text is permanently removed from the PDF, not just visually covered. The redacted text cannot be recovered by selecting, copying, or using PDF extraction tools.
The PDF preview shows page-by-page rendering with match counts per page, helping you verify all sensitive data was found.
Word Documents (.docx)
DOCX files are ZIP archives containing XML. LogScrub extracts the document content, applies your scrubbing rules, and repackages the file. The preview uses the docx-preview library to show formatted content including bold, italic, tables, and images.
Excel Spreadsheets (.xlsx)
XLSX files are processed using excelize-wasm. Cell contents are scrubbed while preserving the spreadsheet structure, formula references, and formatting. The preview shows a table view of each sheet.
Legacy .xls format (Excel 97-2003) is not supported. Please save as .xlsx format first.
OpenDocument Files (.odt, .ods)
LibreOffice/OpenOffice formats are ZIP archives with XML content, similar to Microsoft Office formats. LogScrub extracts and scrubs the content XML while preserving document structure.
Document Preview Features
- Split view — See original and scrubbed documents side by side
- ScrollSync — Original and scrubbed previews scroll together
- Resizable panels — Drag the resize handle to adjust preview height
- Match highlighting — PDF preview shows match count per page
Document Metadata
Documents often contain metadata that may include sensitive information:
- Author name and company
- Creation and modification dates
- Application used to create the document
- Revision history and editing time
When you upload a document, LogScrub automatically checks for metadata. If found, you'll see a dialog showing all detected metadata fields and can choose to:
- Remove Metadata — Strip all metadata from the downloaded file
- Keep Metadata — Preserve original metadata in the downloaded file
The metadata choice you make at upload time applies when you download. If you upload a new file, you'll be asked again.
Downloading Scrubbed Documents
After scrubbing, click Download to save the sanitized document. The output file preserves the original format — a scrubbed .docx remains a .docx that can be opened in Word.
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| ⌘/Ctrl + Enter | Scrub the input text |
| ⌘/Ctrl + S | Download scrubbed output |
| ⌘/Ctrl + G | Go to line number |
| Escape | Cancel processing / Close dialogs |
Working with Network Captures
LogScrub can scrub network packet captures (PCAP files), but they must first be converted to text format.
Converting PCAP to Text
Using tcpdump
tcpdump -r capture.pcap -tttt > capture.txt
Using tshark (Wireshark CLI)
tshark -r capture.pcap > capture.txt
Using Wireshark
- Open the PCAP file in Wireshark
- Go to File → Export Packet Dissections → As Plain Text
- Save the text file, then upload to LogScrub
What Gets Detected
| Data Type | Detection Rule | Example |
|---|---|---|
| IP addresses with ports | IPv4, IPv6 | 192.168.1.1:8080 |
| MAC addresses | MAC Address | 00:1A:2B:3C:4D:5E |
| Hostnames in DNS/HTTP | Hostname, URL | api.example.com |
| HTTP headers | Various token rules | Auth tokens, cookies |
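The kinds of patterns involved can be illustrated with two simplified regexes (sketches only — LogScrub's actual rules are more robust):

```python
import re

# Simplified illustrations, not LogScrub's production patterns.
IP_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")

line = "10:00:01 IP 192.168.1.1:8080 > 10.0.0.5: Flags [S], eth 00:1a:2b:3c:4d:5e"
```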
File Size & Performance
LogScrub runs entirely in your browser, so performance depends on your device's capabilities.
| File Size | Performance |
|---|---|
| < 10 MB | Fast, smooth processing |
| 10–50 MB | Works well, may take a few seconds |
| 50–100 MB | Slower processing, may take 10–30 seconds |
| > 100 MB | May cause browser memory issues |
Tips for Large Files
- Close other browser tabs to free up memory
- Use a desktop browser rather than mobile
- Consider splitting extremely large log files
- The virtual scrolling feature helps keep the UI responsive
Privacy & Security
LogScrub runs 100% in your browser. Your data is never uploaded to any server.
How We Protect Your Data
- Client-side processing — All scrubbing happens in your browser using WebAssembly
- No server communication — Your text never leaves your device
- No analytics on content — We don't track what you scrub
- Local storage only — Presets and settings are stored in your browser
LogScrub does not guarantee detection of all PII. Pattern-based detection has inherent limitations. Always review your scrubbed output before sharing.
Best Practices
- Always review the scrubbed output before sharing
- Use "Analyze" first to understand what will be detected
- Enable additional rules for sensitive data types not detected by default
- Add custom patterns for organization-specific identifiers
- Use the "Changed only" filter to quickly review modified lines