Thursday, July 23, 2009

Field Notes - Exchange 2007 Edge Troubleshooting

Ehlo All,

So, I figured I would share more of my daily work from the field. That means more troubleshooting, upgrade issues, successes, and all-around adventure. A client of REEF Solutions (using REEF Solutions' hosted clustered spam, virus, and DoS protection solution, which has handled 550,000+ messages a day over the past month) with their own in-house IT staff was working on an Exchange 2007 migration from 2003 that had email flow problems, and almost 9,000 valid messages were stuck in the Edge queue. I was called in to assist after the client's IT staff had been on the phone with Microsoft Product Support Services (PSS) for over 3 hours without a solution and were considering reinstalling Edge. The client was restless, since the email downtime was supposed to end after 7 days or so, but it didn't.

Background
The client had migrated to 2 new servers, an Exchange 2007 Mailbox/CAS/Hub server and an Edge server, both on Windows Server 2008 64-bit. During the upgrade they implemented an Exchange 2007 Edge Server to replace an existing non-Exchange SMTP gateway server. They previously had a single Exchange 2003 environment. After the Edge implementation, email would flow from the Mailbox Server to Edge to the Internet, but not the reverse. Client IT had tested, and telnetting from Edge to Mailbox worked, and vice versa, but email would not flow. Edge was in a DMZ. MS PSS had tried a lot of things, but the email was still not flowing. During the entire week-long downtime, REEF Solutions had queued up email off-site (9k valid, non-spam messages) for the client.

Troubleshooting and Solution
1) Running the built-in Exchange troubleshooting analyzer reported errors on both servers. Running it on Mailbox reported not seeing Edge, and vice versa. This was because the DMZ didn't have RPC and the other ports the analyzer needs open. Not a big deal, but it makes troubleshooting harder.
2) pinging the Mailbox and Edge servers' NetBIOS names worked from both servers.
3) from Mailbox and Edge, telnetting to port 25 to generate "homemade" email both ways was successful.
4) on Mailbox ran "Test-EdgeSynchronization" and it passed with flying colors.
5) on Mailbox ran "Test-EdgeSynchronization -VerifyRecipient bgates@yourdomain.com" and it was successful. Obviously, pick an email address in your domain. This tests the AD Application Mode (ADAM) replication [one way, from AD -> Edge], which stores configuration and recipient information on Edge. Edge needs this because it is a non-domain computer and doesn't have access to AD like a normal domain-joined server.
6) checked the hosts files on both servers and, due to a known IPv6 issue, added the NetBIOS name and FQDN of each server and of the other server to each hosts file. So, if your mailbox server was called "mailboxsrv", the hosts file would say "192.168.1.2 mailboxsrv", line 2 would be "192.168.1.2 mailboxsrv.corp.yourdomain.com", and the ::1 localhost entry would be commented out as "#::1 localhost".
7) on Mailbox server in EMC - Organization Configuration - Hub Transport - Send Connectors - EdgeSync - Inbound to Mailbox Server, set "Route mail through the following smart hosts" to {your mailbox server IP}.
8) on Edge, saw an Event log error for an invalid SSL cert, so on the Mailbox and Edge servers, if I recall, under EMC - Hub Transport - Send Connectors - Network - unchecked "Enable Domain Security (Mutual-Auth TLS)". MVP Elan Shudnow has an excellent article that discusses transport layer security between Edge and Transport.
9) on Mailbox, ran "Start-EdgeSynchronization" and the configuration changes I made replicated to the Edge server.
10) since inbound port 25 is restricted to traffic from REEF's clustered email filtering solution, I generated email from there and tested inbound flow from cluster - edge - mailbox, and it was successful. Then I tested outbound email and it worked. After that, the 9k message queue quickly drained to 0.
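
If you would rather script steps 3 and 6 than do them by hand, here is a minimal Python sketch. It is not what I ran on site (I used telnet and Notepad), and the function names are made up for illustration. The first two helpers build the hosts-file lines from step 6; the third opens an SMTP session on port 25 and sends EHLO, which is roughly the first part of the manual telnet test.

```python
import smtplib


def hosts_entries(ip, netbios_name, fqdn):
    """Return the two hosts-file lines for one server: short name, then FQDN."""
    return [f"{ip} {netbios_name}", f"{ip} {fqdn}"]


def comment_out_ipv6_localhost(hosts_text):
    """Comment out an active '::1 localhost' line; leave every other line as-is."""
    out = []
    for line in hosts_text.splitlines():
        if line.split() == ["::1", "localhost"]:
            out.append("#" + line.lstrip())
        else:
            out.append(line)
    return "\n".join(out)


def smtp_ehlo_check(host, port=25, timeout=10):
    """Connect to host:port, send EHLO, and return (reply_code, reply_text).

    Raises OSError or smtplib.SMTPException if the connection fails.
    The session is closed with QUIT when the 'with' block exits.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        code, message = server.ehlo()
        return code, message.decode(errors="replace")
```

For example, calling smtp_ehlo_check("mailboxsrv") from the Edge server (and the reverse from Mailbox) should come back with a 250 reply code if the server accepted the EHLO. Keep in mind this only proves the port is reachable and the server answers; as this case showed, that can succeed while real mail still does not flow.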

FYI: if you need to reinstall the Edge or Transport Server and have messages in your queue, you can back up the queue database, reinstall the Edge or Transport services, and then restore the database. The Edge queue database is ESE-based, like the Exchange mailbox databases. Joshua Raymond has an excellent article explaining the backup and restore process.
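
The backup half of that can be as simple as copying the queue folder somewhere safe while the transport service is stopped. Below is a rough Python sketch of just the copy step. The function name and the backup folder naming are my own invention, and you need to pass in the queue folder path for your install (I am assuming it holds the mail.que database plus its log files; check your own server). It does not stop or start any services and it is no substitute for the full procedure in the article.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_queue_folder(queue_dir, backup_root):
    """Copy the whole queue folder into a new timestamped folder under backup_root.

    Returns the path of the copy. Stop the transport service first so the
    database and log files are not in use while they are copied.
    """
    queue_dir = Path(queue_dir)
    if not queue_dir.is_dir():
        raise FileNotFoundError(f"queue folder not found: {queue_dir}")
    # Timestamp the target so repeated backups never overwrite each other.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"Queue-backup-{stamp}"
    shutil.copytree(queue_dir, target)
    return target
```

Copying the entire folder rather than a single file keeps the database and its transaction logs together, which matters for an ESE database.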

Problem solved.
.
QUIT
