Sometimes you may want to check that an email address is not syntactically invalid, i.e. it looks like a recognisable email address. I use this approach in my zetact contact form processor.
Of course, it does not mean the address actually leads anywhere, but at least you know you are dealing with an email address that could exist.
This is the code I have been using, although I have changed it from a class method to a simple function to keep this post simple.

    """Email check using regex."""

    import re

    def invalidreg(emailkey):
        """Email validation, checks for syntactically invalid email
        courtesy of Mark Nenadov.
        See http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65215"""
        emailregex = "^.+\\@(\\[?)[a-zA-Z0-9\\-\\.]+\\.([a-zA-Z]{2,3}|[0-9]{1,3})(\\]?)$"
        if len(emailkey) > 7:
            if re.match(emailregex, emailkey) is not None:
                return False
            return True
        else:
            return True

I decided it would be more Pythonic to try to do this using the built-in string methods, rather than importing the re module and using a monster regular expression. Here was my first attempt.

    """Email checks using string methods - simple version."""

    def invalidemail(emailaddress):
        """Checks for a syntactically invalid email address."""
        try:
            emailitems = emailaddress.rsplit('@', 1)
            emailitems.extend(emailitems[1].rsplit('.', 1))
        except IndexError:
            return True  # No '@' in the address.
        # Invalid if the address is too short, or if any part contains
        # characters other than letters, digits and dots.
        if len(emailaddress) < 7 or \
                [x for x in emailitems if not x.replace(".", "").isalnum()]:
            return True
        else:
            return False

After a bit of testing and playing with this, a friend pointed me towards the relevant RFC on restrictions of email addresses. While the standard allows the use of many different special characters, in practice email addresses have to be much stricter if you actually want people in the real world to be able to send email to you.
For example, if we allow the email address []@commandline.org.uk, will whatever receives the output of this function be able to use it? As pointed out by Jan Goyvaerts, most software won't actually be able to handle obscure special characters.
We also don't want to water down the syntax check and allow junk for the sake of theoretical but non-existent addresses.
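To make the []@commandline.org.uk example concrete, here is a small sketch that runs it through the same pattern as the regex version earlier in this post. The pattern is rewritten as a raw string, and the helper name regex_accepts is mine, purely for illustration. Because the local-part is matched with ".+", the pattern lets almost anything through before the @ sign.

```python
import re

# Same pattern as the regex version earlier in this post, as a raw string.
EMAIL_REGEX = re.compile(
    r"^.+@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$")


def regex_accepts(address):
    """Return True if the pattern matches the address (hypothetical helper)."""
    return EMAIL_REGEX.match(address) is not None


# The obscure local-part is accepted just like an ordinary one.
for address in ("[]@commandline.org.uk", "warrior@example.com"):
    print(address, regex_accepts(address))
```

Both addresses match, which is exactly the kind of permissiveness a stricter check is meant to avoid.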
My compromise is to allow the special symbols -_.%+ in the local-part of the email address, and -_. in the domain name. I also sanity-check the top-level domain: it needs to be either a generic name or two characters long (country codes are all two letters).
So below is my current version. I added lots of comments and white space to make it easy to read.

    """Ditch nonsense email addresses."""

    GENERIC_DOMAINS = "aero", "asia", "biz", "cat", "com", "coop", \
        "edu", "gov", "info", "int", "jobs", "mil", "mobi", "museum", \
        "name", "net", "org", "pro", "tel", "travel"

    def invalid(emailaddress, domains = GENERIC_DOMAINS):
        """Checks for a syntactically invalid email address."""

        # Email address must be at least 7 characters in total.
        if len(emailaddress) < 7:
            return True # Address too short.

        # Split up email address into parts.
        try:
            localpart, domainname = emailaddress.rsplit('@', 1)
            host, toplevel = domainname.rsplit('.', 1)
        except ValueError:
            return True # Address does not have enough parts.

        # Check for Country code or Generic Domain.
        if len(toplevel) != 2 and toplevel not in domains:
            return True # Not a domain name.

        # Strip the permitted symbols, then everything left must be alphanumeric.
        for i in '-_.%+.':
            localpart = localpart.replace(i, "")
        for i in '-_.':
            host = host.replace(i, "")

        if localpart.isalnum() and host.isalnum():
            return False # Email address is fine.
        else:
            return True # Email address has funny characters.

    # Start the ball rolling.
    if __name__ == "__main__":
        print(invalid("warrior@example.com"))

May 03, 2008 02:00 AM :: West Midlands, England 
























