Verifying valid email addresses with PowerShell…sort of…
July 11th, 2009
I was recently asked to whip up a quick function that would verify email addresses from a list, and extract the ones that were valid and unique. Apparently, someone had run a report that should have produced a list of email addresses, sent it to the requester, and then left for the day. Unfortunately, they ran it incorrectly, and the results were a mix of email addresses, names, LDAP paths, partial home addresses, and the like…in fact, tens of thousands of lines' worth.
Now, if you're familiar with email address standards, you know that a very complex set of characters is considered valid in an email address. I've seen huge multi-line RegExes designed to verify *almost* any valid email address, and, fortunately for me, I'm not interested in showing you any of those here. The fact is, they all fall short in some way. Instead, I'm going to show you a simple RegEx that should catch the majority of modern email addresses. The key is to know whose standard you're parsing for, and the company I did this for has a strict standard: one or more word characters and dots on either side of the "@".
Let’s look at it.
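function Get-ValidAddresses() {
    # Collect all piped input into an array
    $lines = @($input);
    $addresses = @();
    $addresses_unique = @();
    # Named group "Address": word characters, dots and pluses, an "@", then word characters and dots
    $pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)';
    # Runs of periods at the start or end of a string
    $pattern_replace = '(\A\.+|\.+\Z)';
    foreach ($line in $lines) {
        if ($line -match $pattern_address) {
            # -match keeps only the first match on each line
            $addresses += @($matches.Address);
        }
    }
    # Strip stray leading/trailing periods, then sort and de-duplicate
    $addresses = $addresses -replace $pattern_replace, '';
    $addresses_unique = @($addresses | sort-object -unique);
    return $addresses_unique;
}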
First, we take in the piped input and assign it to an array. I like to avoid using the default/built-in variables where possible (a practice reinforced by what I read in Perl Best Practices a few years ago). Next, we set up a couple of arrays to store our email addresses in. After that, we set up the regular expressions that decide which email addresses we believe to be valid.
Let me explain our RegEx in detail. The outer part, "(?<Address>...)", is a named capture group. After a successful -match, PowerShell stores its contents in the automatic $matches hashtable under the key "Address", so we can read the match as "$matches.Address", much as $1, $2, etc. refer to the contents of numbered "()" groups in Perl.
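Here's a small sketch of the named group in action, run against a made-up sample line. After the -match, the whole match sits under key 0 in $matches and the named group sits under "Address"; since the group wraps the entire pattern, the two are the same here.

```powershell
# Sample input line (made up for this example)
$line = 'Contact: jane.doe+news@example.test today'
$pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)'

if ($line -match $pattern_address) {
    # $matches is a hashtable: key 0 is the whole match, 'Address' is the named group
    $found = $matches.Address
    $found   # jane.doe+news@example.test
}
```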
Inside it, our actual pattern is "[\w\.\+]+@[\w\.]+". Square brackets denote a character class: a set of characters, any one of which may match at that position. Our first class contains "\w", which matches word characters (roughly "a-zA-Z0-9_", although in .NET it also matches Unicode letters and digits), "\.", which matches a literal ".", and "\+", which matches a literal "+". The "+" after each class means "one or more" of the preceding element. In short, we'd like to match one or more letters, numbers, underscores, periods, and pluses on the left side of the "@", and one or more letters, numbers, underscores, and periods on the right side.
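To see what the two character classes accept in practice, here's a quick test against a few made-up strings. Note the last one: because the pattern isn't anchored and "-" isn't in either class, it quietly captures only part of the address rather than failing. That's fine under a strict standard like the one described above, but worth knowing if your data might contain hyphens.

```powershell
$pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)'

$samples = @(
    'john.smith@corp.test',                 # word characters and dots on both sides
    'john+alerts@corp.test',                # '+' is allowed on the left-hand side only
    "jos$([char]0x00E9)@corp.test",         # 'josé': \w in .NET includes Unicode letters
    'mary-jane@my-corp.test'                # '-' is in neither class
)

$results = foreach ($s in $samples) {
    if ($s -match $pattern_address) { $matches.Address } else { '<no match>' }
}
$results   # the last entry comes back as 'jane@my', a partial capture
```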
$lines = @($input);
$addresses = @();
$addresses_unique = @();
$pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)';
Our second RegEx is used in a replacement, to strip any leading or trailing periods. "\A" anchors the match to the very beginning of the string, and "\Z" anchors it to the very end (or just before a final newline). Since we apply it to each extracted address, that means the start and end of the address. I thought it would be good to include this, since the particular dataset I worked with included some otherwise valid addresses that had periods stuck at the beginning or end, making them invalid as email addresses. Removing these is a quick fix to make them valid.
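Here's a quick check of the replacement pattern on its own, using made-up strings. Applied to an array, -replace works on each element and returns a new array; periods inside the address are left alone because neither alternative matches there.

```powershell
$pattern_replace = '(\A\.+|\.+\Z)'
$raw = @('.firstname.lastname@names.test.local', 'test@test.local..', 'plain@test.local')

# Each element is processed separately; only leading/trailing runs of '.' are removed
$clean = $raw -replace $pattern_replace, ''
$clean
```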
$pattern_replace = '(\A\.+|\.+\Z)';
Next, we iterate over each element of the array, check which ones match, and add each match to the list of valid (but not yet unique) addresses.
foreach ($line in $lines) {
    if ($line -match $pattern_address) {
        $addresses += @($matches.Address);
    }
}
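One limitation worth knowing: -match stops at the first match, so a line containing two addresses contributes only the first. If your data can have several addresses per line, a sketch like the following, which uses the .NET [regex]::Matches method instead of -match, would collect all of them. This isn't part of the function above, and the sample lines are made up.

```powershell
$pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)'
$lines = @('cc: alice@corp.test; bob@corp.test', 'no address here')

$addresses = foreach ($line in $lines) {
    # [regex]::Matches returns every non-overlapping match in the string
    foreach ($m in [regex]::Matches($line, $pattern_address)) {
        $m.Groups['Address'].Value
    }
}
$addresses
```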
Next, we strip any periods from the very beginning or very end of each address using the previously mentioned RegEx pattern. Then we store the unique addresses in a separate array and pass that array back as the function's result.
$addresses = $addresses -replace $pattern_replace, '';
$addresses_unique = @($addresses | sort-object -unique);
return $addresses_unique;
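On the "unique" part: Sort-Object -Unique compares strings case-insensitively by default, so addresses that differ only in case are treated as duplicates, and which casing ends up in the output isn't worth relying on. Assuming case doesn't matter for your addresses, lowercasing first gives predictable output. A small sketch with made-up addresses:

```powershell
$addresses = @('Test@Test.local', 'test@test.local', 'other@test.local')

# Default behavior: case-insensitive, so the first two entries collapse into one
$addresses_unique = @($addresses | Sort-Object -Unique)

# Lowercase first if you want a predictable casing in the output
$addresses_lower = @($addresses | ForEach-Object { $_.ToLowerInvariant() } | Sort-Object -Unique)
$addresses_lower
```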
Once you have made the function available (for example, by pasting it into your session or dot-sourcing the script that contains it), calling it works like so. We'll use a test dataset stored in $data, pipe it in, and see what the results are.
PS > $data = @('test@test.local', '123 Anywhere St.')
PS > $data += '.firstname.lastname@names.test.local'
PS > $data += 'LDAP://dc=test,dc=com'
PS > $data | Get-ValidAddresses
firstname.lastname@names.test.local
test@test.local
Perfect.
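With tens of thousands of lines, one thing to watch is that "$addresses += ..." builds a brand-new array on every addition, which gets slower as the list grows. Below is a sketch of an alternative that collects matches in a generic List[string] and handles pipeline input as it arrives in a process block. The name Get-ValidAddressesFast is hypothetical, and this isn't the function shown above; the patterns and the final de-duplication are the same.

```powershell
function Get-ValidAddressesFast {
    # Hypothetical variant of Get-ValidAddresses (not the original function)
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    begin {
        $pattern_address = '(?<Address>[\w\.\+]+@[\w\.]+)'
        $pattern_replace = '(\A\.+|\.+\Z)'
        # A List[string] grows in place, unlike an array, which += copies each time
        $addresses = New-Object 'System.Collections.Generic.List[string]'
    }
    process {
        # Called once per piped line; keeps the first match, minus stray periods
        if ($Line -match $pattern_address) {
            $addresses.Add(($matches.Address -replace $pattern_replace, ''))
        }
    }
    end {
        # Same de-duplication as the original: sorted, case-insensitive unique values
        @($addresses | Sort-Object -Unique)
    }
}
```

Calling it works the same way, e.g. "Get-Content .\report.txt | Get-ValidAddressesFast", where report.txt stands in for whatever file holds your data.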
Categories: PowerShell