Quick Navigation
- Entropy Basics & Shannon's Theory
- The Fundamental Equation
- Complexity vs. Length Analysis
- Understanding Exponential Growth
- Human Predictability & Entropy Loss
- Markov Chains & Password Patterns
- Computational Complexity & Crack Time
- The Birthday Paradox in Passwords
- Information-Theoretic vs. Computational Security
- Pseudorandom vs. Cryptographically Secure
- Practical Recommendations
Introduction: What Is Password Entropy?
In information theory, we measure uncertainty using a concept called entropy, formulated by Claude Shannon in 1948. When applied to passwords, entropy quantifies how many yes-or-no questions an attacker would need to ask, on average, to guess your password correctly.
Consider this: if I'm thinking of a number between 1 and 4, you'd need at most 2 questions to find it ("Is it greater than 2?" then "Is it 3?"). That's 2 bits of entropy. This simple principle scales up to explain why some passwords that look complex are actually weak, while simple-looking ones can be virtually uncrackable.
The Fundamental Equation
Let's establish our mathematical framework. For a password with:
- L = length (number of characters)
- N = size of the character set (available symbols)
The number of possible passwords is N^L, and the entropy H in bits is:
H = log₂(N^L) = L × log₂(N)
This logarithmic relationship is crucial—it explains why doubling your password length is far more effective than doubling your character set. Let me demonstrate with concrete calculations.
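In code, the formula is a one-liner. Here is a minimal TypeScript sketch (the helper name entropyBits is ours, for illustration):

```typescript
// H = L * log2(N): entropy in bits of a password of length L whose
// characters are drawn uniformly at random from a set of N symbols.
const entropyBits = (L: number, N: number): number => L * Math.log2(N);

console.log(entropyBits(1, 4));  // the 1-to-4 guessing game above: 2 bits
console.log(entropyBits(4, 10)); // a 4-digit PIN: ~13.3 bits
```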
A Mathematical Comparison: Complexity vs. Length
Let's examine two passwords and calculate their theoretical entropy:
Case 1: The "Complex" Password
Consider "P@$$w0rd!" (9 characters using uppercase, lowercase, numbers, and symbols)
- Character set size: N ≈ 94 (full ASCII printable)
- Length: L = 9
- Entropy: H = 9 × log₂(94) ≈ 9 × 6.55 ≈ 59 bits
Case 2: The "Simple" Password
Consider "correcthorsebatterystaple" (25 lowercase letters)
- Character set size: N = 26
- Length: L = 25
- Entropy: H = 25 × log₂(26) ≈ 25 × 4.70 ≈ 118 bits
The "simple" password has roughly 2^59 times more possible combinations—that's 576 quintillion times more secure! This counterintuitive result demonstrates why our mathematical intuition often fails us with passwords.
Understanding Exponential Growth
The power of length comes from exponential growth. Let me illustrate with a thought experiment:
Imagine you have a 4-digit PIN (like 1234). That's 10^4 = 10,000 possibilities. Now, add just one digit. You now have 10^5 = 100,000 possibilities—a 10x increase from adding just one character!
This scales dramatically:
- 8-character password: N^8 possibilities
- 16-character password: N^16 = (N^8)² possibilities
Doubling the length squares the number of combinations. In mathematical terms, we're dealing with an exponential function, not a linear one.
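The squaring effect is easy to verify with BigInt arithmetic, which keeps the enormous counts exact:

```typescript
// Doubling the length squares the number of combinations: N^16 = (N^8)^2.
const N = 26n; // lowercase-only character set, as a BigInt

const len8 = N ** 8n;   // 208,827,064,576 possibilities (12 digits)
const len16 = N ** 16n; // a 23-digit number

console.log(len16 === len8 * len8); // true: squared, not doubled
```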
The Entropy Reduction Problem: Human Predictability
Here's where theory meets reality. The entropy formula assumes each character is chosen uniformly at random—each possibility is equally likely. But humans don't work this way.
When we analyze real password databases, we observe severe entropy reduction:
Predictable Position Distribution
Through statistical analysis of breached passwords, we find:
- First position: 68% probability of uppercase letter (should be ~27.7% if random)
- Last position: 31% probability of "!" (should be ~1.1% if random)
- Numbers: 82% appear at the end (should be evenly distributed)
Let's calculate the entropy loss. Suppose the first character is uppercase 68% of the time (spread uniformly over the 26 uppercase letters) and anything else 32% of the time (spread uniformly over the remaining 68 printable characters). The entropy of that first character is:
H = 0.68 × log₂(26/0.68) + 0.32 × log₂(68/0.32) ≈ 3.57 + 2.47 ≈ 6.05 bits
versus log₂(94) ≈ 6.55 bits for a uniform choice, so about 0.5 bits are lost. That sounds modest, but it assumes the letters within each class are chosen uniformly; in breached data the specific characters are heavily skewed too, and the loss compounds across every position.
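As a quick check, here is the same calculation in TypeScript (the uniform-within-class model is our simplifying assumption, as noted above):

```typescript
// Entropy of a character whose probability mass is split across classes,
// with each class spread uniformly over its own symbols.
// probs: [total class probability, number of symbols in the class]
const classEntropy = (probs: Array<[number, number]>): number =>
  probs.reduce((h, [p, n]) => h + p * Math.log2(n / p), 0);

const skewed = classEntropy([[0.68, 26], [0.32, 68]]); // ~6.05 bits
const uniform = Math.log2(94);                         // ~6.55 bits
console.log((uniform - skewed).toFixed(2));            // ~0.51 bits lost
```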
Markov Chains and Password Patterns
Password patterns can be modeled as Markov chains, where each character depends on the previous ones. For instance, after "p@s", the probability of "s" is much higher than uniform (users type "p@ssword" variations).
The transition probability matrix for common passwords shows:
- After "@", probability of "s" ≈ 0.15 (vs. 0.011 if random)
- After "1", probability of "2" ≈ 0.23 (vs. 0.011 if random)
This dependency structure reduces effective entropy by approximately 40-60% in typical human-chosen passwords.
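To see how a single skewed transition translates into lost entropy, here is a sketch that takes the quoted 0.15 probability of "s" after "@" as given and, as a simplifying assumption of ours, spreads the remaining mass evenly over the other 93 characters:

```typescript
// Entropy of the next character when one outcome has probability pTop and
// the rest of the alphabet shares the remainder uniformly.
function nextCharEntropy(pTop: number, alphabet: number): number {
  const rest = (1 - pTop) / (alphabet - 1);
  return -(pTop * Math.log2(pTop) + (alphabet - 1) * rest * Math.log2(rest));
}

console.log(Math.log2(94).toFixed(2));             // 6.55 bits if uniform
console.log(nextCharEntropy(0.15, 94).toFixed(2)); // ~6.17 bits after "@"
```

A single common transition shaves off only ~0.4 bits, but real transition matrices are skewed at every position, which is how the cumulative 40-60% reduction arises.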
Computational Complexity and Time-to-Crack
Let's formalize the relationship between entropy and security. Given an attack rate of R guesses per second, the expected time T to crack a password is:
T = 2^(H-1) / R
(We divide by 2 because, on average, the password is found halfway through the search space.)
With modern GPU clusters achieving R ≈ 10^11 guesses/second for MD5 hashes:
Entropy (bits) | Time to Crack | Mathematical Expression
---|---|---
40 | 5.5 seconds | 2^39 / 10^11
60 | ~67 days | 2^59 / 10^11
80 | ~190,000 years | 2^79 / 10^11
128 | ~5 × 10^19 years | 2^127 / 10^11
Note: This assumes the hash function is fast. Deliberately slow functions like bcrypt reduce R by a factor of ~10^6.
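The table can be reproduced in a few lines; R = 10^11 guesses/second is the fast-hash rate quoted above:

```typescript
// Expected crack time T = 2^(H-1) / R seconds, per the formula above.
const R = 1e11;                   // guesses per second (fast hash, e.g. MD5)
const SECONDS_PER_YEAR = 3.156e7;

for (const H of [40, 60, 80, 128]) {
  const seconds = 2 ** (H - 1) / R;
  const years = seconds / SECONDS_PER_YEAR;
  console.log(`${H} bits: ${seconds.toExponential(1)} s (~${years.toExponential(1)} years)`);
}
```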
The Birthday Paradox in Password Security
An interesting phenomenon occurs when we consider multiple passwords. The birthday paradox tells us that in a group of just 23 people, there's a 50% chance two share a birthday.
For passwords, if we have a space of 2^n possible passwords and k users, the probability of at least one collision is approximately:
P(collision) ≈ 1 - e^(-k²/2^(n+1))
This means that with 2^32 possible passwords (~4.3 billion) and just 77,000 users, there is a 50% chance that at least two users chose the same password. This is why low-entropy password spaces are so dangerous: independent users collide on the same choices, so cracking one hash can unlock many accounts at once.
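The approximation is easy to evaluate directly (the helper name collisionProbability is ours):

```typescript
// Birthday bound: P(collision) ≈ 1 - e^(-k^2 / 2^(n+1)) for k users drawing
// independently from a space of 2^n equally likely passwords.
const collisionProbability = (k: number, nBits: number): number =>
  1 - Math.exp(-(k * k) / 2 ** (nBits + 1));

console.log(collisionProbability(77_000, 32).toFixed(2));         // ~0.50
console.log(collisionProbability(23, Math.log2(365)).toFixed(2)); // ~0.52, the classic birthday case
```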
Information-Theoretic Security vs. Computational Security
There's an important distinction in cryptography:
- Information-theoretic security: Secure even against an attacker with infinite computational power. A one-time pad achieves this: its key contains at least as much entropy as the message itself.
- Computational security: Secure against attackers with realistic computational resources. Most passwords fall here.
For passwords, we aim for computational security. With current technology, 80 bits of entropy provides a comfortable margin. But Moore's Law tells us computing power doubles every ~2 years, eroding about 0.5 bits of effective security per year; adding headroom for growing attacker budgets and algorithmic improvements, a conservative rule of thumb is 1.5 bits per year:
Future-proof entropy = Current requirement + 1.5 × years of desired security
For 20 years of security: 80 + 1.5 × 20 = 110 bits recommended.
Pseudorandom vs. Cryptographically Secure Generation
The quality of randomness matters mathematically. Consider two random number generators:
Linear Congruential Generator (the classic design behind simple, non-cryptographic Math.random() implementations)
- Next = (a × Current + c) mod m
- Period ≤ m (typically 2^48)
- Entropy ceiling: 48 bits, regardless of password length
Cryptographically Secure PRNG (like crypto.getRandomValues())
- Based on hardware entropy sources
- Passes statistical randomness tests (NIST SP 800-22)
- Entropy limited only by password length
The mathematical difference: a CSPRNG's output is computationally indistinguishable from a true random sequence by any polynomial-time algorithm, while simple PRNGs exhibit efficiently detectable patterns.
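In practice the two look like this, assuming a browser or recent Node.js runtime where globalThis.crypto is available:

```typescript
// CSPRNG: 16 bytes of OS-backed entropy, suitable for secrets.
const strong = new Uint8Array(16);
crypto.getRandomValues(strong);

// Non-cryptographic PRNG: fine for simulations, never for passwords,
// because its internal state can be recovered from observed outputs.
const weak = Array.from({ length: 16 }, () => Math.floor(Math.random() * 256));

console.log(strong, weak);
```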
Practical Recommendations Based on Mathematical Analysis
Given our analysis, here are evidence-based recommendations:
For L-length passwords with N-size character set:
- Maximize L before N: Each additional character multiplies possibilities by N
- Ensure true randomness: Use cryptographically secure generation
- Target entropy based on threat model:
- General use: H ≥ 60 bits
- Sensitive data: H ≥ 80 bits
- Long-term secrets: H ≥ 110 bits
Optimal strategies by constraint:
- If memorization required: Use the diceware method (5-7 random words; each word from the standard 7,776-word list adds ~12.9 bits)
- If length limited: Maximize character set and use true randomness
- If no constraints: Generate 20+ random characters (a minimal generator sketch follows this list)
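Here is the minimal generator sketch referenced above. It draws uniformly from an illustrative 36-symbol set with crypto.getRandomValues and uses rejection sampling to avoid modulo bias (256 is not a multiple of 36, so naive modulo would skew the distribution):

```typescript
const CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"; // 36 symbols

function generatePassword(length = 20): string {
  const limit = 256 - (256 % CHARSET.length); // reject bytes >= 252
  const out: string[] = [];
  const buf = new Uint8Array(1);
  while (out.length < length) {
    crypto.getRandomValues(buf);
    if (buf[0] < limit) out.push(CHARSET[buf[0] % CHARSET.length]);
  }
  return out.join("");
}

console.log(generatePassword()); // 20 chars × log₂(36) ≈ 103 bits
```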
Conclusion: The Mathematical Reality
The mathematics reveals a simple truth: password strength is fundamentally about entropy, and entropy is best achieved through length and randomness, not complexity patterns.
A randomly generated 20-character lowercase password (94 bits of entropy) is mathematically superior to a human-generated 12-character "complex" password (~40 bits effective entropy after pattern analysis). The difference? About 18 quadrillion times more secure.
This isn't just theoretical—it's the mathematical foundation that determines whether your password can be cracked in seconds or would outlast the heat death of the universe.
The mathematical principles discussed here are implemented in our password generator, which uses cryptographically secure randomness to ensure maximum entropy within your chosen parameters.