Planning and Architecture
Exchange On-Premises Best Practices for Migrations from 2010 to 2016
We have had several requests for guidance on moving from on-premises Exchange 2010 to Exchange 2016. If you have a hybrid configuration, mailboxes, or public folders on Exchange 2010, you should plan your move to Exchange 2016 before Exchange 2010 reaches end of support on October 13, 2020.

Microsoft Ignite 2020 - One Week Away!
Microsoft Ignite 2020 gets underway about a week from today. It's quite a bit different this year, but one thing you can be certain of is that we have lots of new and interesting content for you to enjoy. This post highlights the Exchange, Outlook, and Bookings sessions we have created and curated for your viewing pleasure. The links won't be live until the event starts, but we wanted to give you a peek into what to expect.

Setting Up for Success With Exchange Online – A Video Series
When Microsoft Ignite 2020 became a digital event, we took it as an opportunity to think about the sessions and subjects people always ask us about, but which we rarely have the space for given the physical constraints of the venue.

Namespace Planning in Exchange 2016
If you are like the vast majority of our customers, you already have some version (or versions) of Exchange deployed in your environment. Depending on the version, you may have different namespace requirements today.

Exchange 2010

Exchange 2010 leverages the Autodiscover service for enabling client profile changes, so that namespace exists. Exchange 2010 also introduced additional namespace requirements, which resulted in additional complexity around namespace planning, especially for site resilient solutions:

- Primary datacenter Internet protocol namespace (mail.contoso.com)
- Secondary datacenter Internet protocol namespace (mail2.contoso.com)
- Primary datacenter Outlook Web App failback namespace (mailpri.contoso.com)
- Secondary datacenter Outlook Web App failback namespace (mailsec.contoso.com)
- Transport namespace (smtp.contoso.com)
- Primary datacenter RPC Client Access namespace (rpc.contoso.com)
- Secondary datacenter RPC Client Access namespace (rpc2.contoso.com)

Out of these seven namespaces, five were required on certificates. The RPC Client Access namespaces were not required on the certificate because they were accessed via RPC connectivity and not via an Internet-based protocol, like HTTP.

Exchange 2016

One of the benefits of the Exchange 2016 architecture (first introduced in Exchange 2013) is that the namespace model can be simplified when compared to Exchange 2010. An example of this simplification can be seen in a site resilience scenario. If you have two datacenters participating in a site resilient architecture, replacing the Exchange 2010 infrastructure with Exchange 2016 can potentially remove five namespaces:

- Secondary datacenter Internet protocol namespace (mail2.contoso.com)
- Primary datacenter Outlook Web App failback namespace (mailpri.contoso.com)
- Secondary datacenter Outlook Web App failback namespace (mailsec.contoso.com)
- Primary datacenter RPC Client Access namespace (rpc.contoso.com)
- Secondary datacenter RPC Client Access namespace (rpc2.contoso.com)

There are two reasons for this. First, Exchange 2016 no longer leverages an RPC Client Access namespace. This is due to the architectural changes within the product: for a given mailbox, the protocol that services the request is always the protocol instance on the Mailbox server that hosts the active copy of the database for the user's mailbox. In other words, the RPC Client Access service is no longer decoupled from the store, as it was in Exchange 2010. Second, as mentioned, the Client Access services proxy requests to the Mailbox server hosting the active database copy.

Figure 1: Client Access services (on MBX Server 1) proxying traffic to the Mailbox server hosting the active database copy (on MBX Server 3)

This proxy logic is not limited to the Active Directory site boundary. Unlike Exchange 2010, Exchange 2016 does not require the client namespaces to move with the DAG during an activation event – a Mailbox server in one Active Directory site can proxy a session to a Mailbox server located in another Active Directory site. This means that unique namespaces are no longer required for each datacenter (mail.contoso.com and mail2.contoso.com); instead, only a single namespace is needed for the datacenter pair – mail.contoso.com. It also means that failback namespaces are not required during DAG activation scenarios, so mailpri.contoso.com and mailsec.contoso.com can be removed.
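To make the single-namespace model concrete, here is a minimal sketch (the server name and URLs are placeholders, and your set of virtual directories may differ) of pointing the client-facing virtual directories on an Exchange 2016 Mailbox server at the unified namespace:

# Minimal sketch - MBX-01 and mail.contoso.com are placeholders
$server = "MBX-01"
$base = "https://mail.contoso.com"
Set-OwaVirtualDirectory "$server\owa (Default Web Site)" -InternalUrl "$base/owa" -ExternalUrl "$base/owa"
Set-EcpVirtualDirectory "$server\ecp (Default Web Site)" -InternalUrl "$base/ecp" -ExternalUrl "$base/ecp"
Set-WebServicesVirtualDirectory "$server\EWS (Default Web Site)" -InternalUrl "$base/EWS/Exchange.asmx" -ExternalUrl "$base/EWS/Exchange.asmx"
Set-ActiveSyncVirtualDirectory "$server\Microsoft-Server-ActiveSync (Default Web Site)" -InternalUrl "$base/Microsoft-Server-ActiveSync" -ExternalUrl "$base/Microsoft-Server-ActiveSync"
Set-OabVirtualDirectory "$server\OAB (Default Web Site)" -InternalUrl "$base/OAB" -ExternalUrl "$base/OAB"
Set-MapiVirtualDirectory "$server\mapi (Default Web Site)" -InternalUrl "$base/mapi" -ExternalUrl "$base/mapi"

With a split-brain DNS infrastructure (discussed later in this article), the same values can serve both internal and external clients.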
Namespace Models

Depending on your architecture and infrastructure, you have two choices:

- Deploy a unified namespace for the site resilient datacenter pair (unbound model).
- Deploy a dedicated namespace for each datacenter in the site resilient pair (bound model).

It's worth mentioning that these choices are also tied to the DAG architecture.

Unbound Model

In an unbound model, you have a single DAG deployed across the datacenter pair. This DAG has Mailbox servers in each datacenter; typically, all Mailbox servers are active and host active database copies, though you could instead deploy all active copies in a single datacenter. Mailboxes for both datacenters are dispersed across the mailbox databases within this DAG. In this model, clients can connect to both datacenters in the event of a WAN failure – neither datacenter's connectivity is a boundary, hence the term unbound. This does not guarantee an equal experience over every connection, however; one connection may provide a better user experience because it has lower latency or more bandwidth. In an unbound model, a single namespace is preferred because either datacenter can service the user request. This means that, from a load balancing perspective, the Exchange 2016 Mailbox servers in both datacenters participate in handling traffic, as seen in the following diagram, where VIP (virtual IP address) is the load balanced IP address associated with the namespace:

Figure 2: Single Namespace used across Site Resilient Datacenter Pair (Unbound Model)

As a result, for a given datacenter, the expectation is that 50% of the traffic will be proxied from the other datacenter.

Bound Model

As its name implies, in a bound model users are associated (or bound) to a specific datacenter. In other words, there is a preference for users to operate out of one datacenter during normal operations and to operate out of the second datacenter only during failure events. There is also a possibility that users do not have equal connectivity to both datacenters. Typically, in a bound model, there are two DAGs deployed in the datacenter pair. Each DAG contains a set of mailbox databases for a particular datacenter; by controlling where the databases are mounted, you control connectivity. In a bound model, multiple namespaces are preferred – two per datacenter (primary and failback namespaces) – to prevent clients from trying to connect to a datacenter where they may have no connectivity. Switchover to the other datacenter is a controlled event.

Figure 3: Multiple Namespaces used across Site Resilient Datacenter Pair (Bound Model)

Autodiscover Namespace

Exchange 2016 takes advantage of the Autodiscover service for client profile configuration, so the autodiscover.contoso.com namespace remains in place.
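Since the Autodiscover namespace remains in place, here is a hedged sketch (the URL is a placeholder for your environment) of pointing the Autodiscover service connection point on every Exchange 2016 server at that namespace:

# Sketch: set the Autodiscover SCP for all Exchange 2016 servers
Get-ClientAccessService | Set-ClientAccessService -AutoDiscoverServiceInternalUri "https://autodiscover.contoso.com/Autodiscover/Autodiscover.xml"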
Office Online Server Namespaces

The document collaboration features included in Outlook on the web require Office Online Server. In site resilient deployments, you want to deploy an Office Online Server farm in each datacenter that participates in the site resilient datacenter pair. This ensures that there is a local instance that can service the document collaboration requests for the local mailboxes and avoids cross-site proxy scenarios.

From a namespace perspective, this means that each datacenter in the site resilient datacenter pair requires a unique namespace for Office Online Server; in other words, the namespace model for Office Online Server is a bound model. The namespace model used by Office Online Server is independent of the model used by Exchange, meaning that you can deploy Exchange using an unbound model while utilizing a bound model for Office Online Server, as seen in the following figure:

Figure 4: Office Online Server Namespaces (Bound Model) with an Exchange Unbound Model Namespace

As all the data serviced by Office Online Server is stored in either Exchange or SharePoint, namespace manipulation steps are not required during a datacenter outage. For example, referring to the previous diagram: if the West datacenter fails, you don't need to change the DNS record for the Office Online Server namespace in West and point it to the load balancer in East. This is due to the architecture of Exchange and Office Online Server. Any Exchange 2016 Mailbox server will always proxy the client's request to the Mailbox server that hosts the user's mailbox database. The Mailbox server hosting the user's mailbox is responsible for generating the Office Online Server URL that is used by OWA. This URL is defined per-Mailbox server, thereby ensuring that any Office Online Server interactions are always local to the Mailbox server.

Internal vs. External Namespaces

Since the release of Exchange 2007, the recommendation has been to deploy a split-brain DNS infrastructure for the Internet-based client namespaces. A split-brain DNS infrastructure enables different IP addresses to be returned for a given namespace based on where the client resides – if the client is within the internal network, the IP address of the internal load balancer is returned; if the client is external, the IP address of the external gateway/firewall is returned. This approach simplifies the end-user experience – users only have to know a single namespace (e.g., mail.contoso.com) to access their data, regardless of where they are connecting. A split-brain DNS infrastructure (also known as split-DNS) also simplifies the configuration of the Exchange virtual directories, as the InternalURL and ExternalURL values within the environment can be the same. If you do not deploy a split-brain DNS infrastructure, Exchange 2016 does allow you to specify different namespaces for internal clients vs. external clients for all client protocols.

Important: If you are utilizing a split-brain DNS infrastructure, you must either use the same authentication value for both your internal and external Outlook Anywhere settings, or switch to different Outlook Anywhere names for inside and outside. Outlook gives priority to the internal settings over the external settings, and since the same namespace is used for both, it will utilize only the internal authentication settings, regardless of whether the client is internal or external.
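As an illustration of the Important note above, here is a hedged sketch (the identity, namespace, and authentication method are placeholders for your environment) that keeps the internal and external Outlook Anywhere settings aligned under split-brain DNS:

# Same hostname inside and out, so keep the authentication methods identical
Set-OutlookAnywhere -Identity "MBX-01\Rpc (Default Web Site)" `
    -InternalHostname "mail.contoso.com" -ExternalHostname "mail.contoso.com" `
    -InternalClientsRequireSsl $true -ExternalClientsRequireSsl $true `
    -InternalClientAuthenticationMethod Negotiate -ExternalClientAuthenticationMethod Negotiate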
Regional Namespace

The concept of regional namespaces has existed since OWA debuted in 1997. A regional namespace is a way for clients to connect to the client access endpoint that is closest to the Mailbox servers hosting the data. Use of a regional namespace does not necessarily restrict you to a bound model, either; depending on your infrastructure and network capabilities, you may choose to have a dedicated namespace for each datacenter pair. For example, your company may have a set of datacenters in North America and in Europe, and due to a desire to reduce cross-region network traffic, you deploy a dedicated namespace for each region (notice that within a region, the unbound model is used):

Figure 5: Regional Namespaces coupled with Geo-DNS to Round-Robin between Datacenters within a Region

Namespaces and Active Directory Site Topologies

When planning your namespace architecture, it is important to understand that namespaces and authentication settings must be consistent within an Active Directory site. For example, when Autodiscover generates a response to send to the client, it generates a list of internal URLs based on the virtual directory settings of the Mailbox servers located in the Active Directory site where the mailbox is located. If you attempt to have multiple namespaces within a single Active Directory site, clients will be randomly directed to different namespaces. Likewise, setting different authentication settings within an Active Directory site will lead to different behaviors for the clients. In other words, you can only define different namespace and authentication settings between Active Directory sites, not within them.

Conclusion

Exchange 2016 introduces significant flexibility in your namespace architecture, enabling deployment of a single unified namespace for a site resilient datacenter pair (or even worldwide), or deployment of multiple namespaces. As we delve into the intricacies surrounding load balancing principles and client connectivity, you will (hopefully) understand how to choose the best namespace model.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience
Load Balancing in Exchange 2016

Like Exchange 2013, Exchange 2016 does not require session affinity at the load balancing layer. To understand this statement better, and to see how it impacts your designs, we need to look at how the Mailbox server role (MBX) functions in Exchange 2016. From a protocol perspective, the following happens:

1. A client resolves the namespace to a load balanced virtual IP address.
2. The load balancer assigns the session to a MBX server in the load balanced pool.
3. The Client Access services located on the MBX server authenticate the request and perform service discovery by accessing Active Directory to retrieve the following information:
   - Mailbox version (for this discussion, we will assume an Exchange 2016 mailbox)
   - Mailbox location information (e.g., database information, ExternalURL values, etc.)
4. The Client Access services located on the MBX server decide whether to proxy the request or redirect the request to another MBX infrastructure (within the same forest).
5. The Client Access services located on the MBX server query an Active Manager instance that is responsible for the database to determine which Mailbox server is hosting the active copy.
6. The Client Access services located on the MBX server proxy the request to the Mailbox server hosting the active copy.

Step 5 is the fundamental change that enables the removal of session affinity at the load balancer. For a given protocol session, the Client Access services located on the Mailbox server now maintain a 1:1 relationship with the Mailbox server hosting the user's data. In the event that the active database copy is moved to a different Mailbox server, MBX closes the sessions to the previous server and establishes sessions to the new server. This means that all sessions, regardless of their origination point (i.e., MBX servers in the load balanced array), end up at the same place: the Mailbox server hosting the active database copy. This is vastly different from releases prior to Exchange 2013 – for example, in Exchange 2010, if all requests from a specific client did not go to the same endpoint, the user experience was negatively affected.

The protocol used in step 6 depends on the protocol used to connect to MBX. If the client leverages the HTTP protocol, then the protocol used between Mailbox servers is HTTP (secured via SSL using a self-signed certificate). If the protocol leveraged by the client is IMAP or POP, then the protocol used between the Mailbox servers is IMAP or POP. Telephony requests are unique, however. Instead of proxying the request at step 6, MBX redirects the request to the Mailbox server hosting the active copy of the user's database, as the telephony devices support redirection and need to establish their SIP and RTP sessions directly with the Unified Messaging components on that Mailbox server.

Figure 1: Exchange 2016 Client Access Protocol Architecture

However, there is a trade-off with this architectural change. Since session affinity is not used by the load balancer, the load balancer has no knowledge of the target URL or request content. All the load balancer uses is layer 4 information: the IP address and the protocol/port (TCP 443):

Figure 2: Layer 4 Load Balancing

The load balancer can use a variety of means to select the target server from the load balanced pool, such as round-robin (each inbound connection goes to the next target server in the circular list) or least-connection (the load balancer sends each new connection to the server that has the fewest established connections at that time).
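Because every session ultimately terminates on the server hosting the active database copy, a quick, hedged way (the database name is a placeholder) to see where a given database's traffic will land is:

# Which server hosts the active copy of DB01 (and will service its sessions)?
Get-MailboxDatabaseCopyStatus -Identity "DB01\*" | Format-Table Name, Status, ActiveCopy
# Or just the mount location:
Get-MailboxDatabase "DB01" -Status | Format-List Name, MountedOnServer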
Health Probe Checking

Unfortunately, this lack of knowledge about the target URL (or the content of the request) introduces complexities around health probes. Exchange 2016 includes a built-in monitoring solution known as Managed Availability. Managed Availability includes an offline responder; when the offline responder is invoked, the affected protocol (or server) is removed from service. To ensure that load balancers do not route traffic to a Mailbox server that Managed Availability has marked as offline, load balancer health probes must be configured to check <virtualdirectory>/healthcheck.htm (e.g., https://mail.contoso.com/owa/healthcheck.htm). Note that healthcheck.htm does not actually exist within the virtual directories; it is generated in-memory based on the component state of the protocol in question.

If the load balancer health probe receives a 200 status response, then the protocol is up; if the load balancer receives a different status code, then Managed Availability has marked that protocol instance down on the Mailbox server. As a result, the load balancer should also consider that endpoint down and remove the Mailbox server from the applicable load balancing pool.

Administrators can also manually take a protocol offline for maintenance, thereby removing it from the applicable load balancing pool. For example, to take the OWA proxy protocol on a Mailbox server out of rotation, you would execute the following command:

Set-ServerComponentState <Mailbox Server> -Component OwaProxy -Requestor Maintenance -State Inactive

For more information on server component states, see the article Server Component States in Exchange 2013.

What if the load balancer health probe did not monitor healthcheck.htm? In that case, the load balancer would have no knowledge of Managed Availability removing a server from (or adding it back to) the applicable load balancing pool. The end result: the load balancer would have one view of the world, while Managed Availability would have another. In this situation, the load balancer could direct requests to a Mailbox server that Managed Availability has marked down, which would result in a negative (or broken) user experience. This is why the recommendation is to utilize healthcheck.htm in the load balancing health probes.

Namespace and Affinity Scenarios

Now that we understand how health checks are performed, let's look at four scenarios:

- Single Namespace / Layer 4 (No Session Affinity)
- Single Namespace / Layer 7 (No Session Affinity)
- Single Namespace / Layer 7 (Session Affinity)
- Multiple Namespaces / No Session Affinity

Single Namespace / Layer 4 (No Session Affinity)

In this scenario, a single namespace is deployed for all HTTP protocol clients (mail.contoso.com). The load balancer operates at layer 4 and does not maintain session affinity. The load balancer is also configured to check the health of the target Mailbox servers in the load balancing pool; however, because this is a layer 4 solution, the load balancer can check the health of only a single virtual directory (as it cannot distinguish OWA requests from RPC requests). Administrators have to choose which virtual directory to target for the health probe; you will want to choose a virtual directory that is heavily used. For example, if the majority of your users utilize OWA, then targeting the OWA virtual directory in the health probe is appropriate.
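To illustrate what a load balancer's probe actually does, here is a minimal sketch (the server name is a placeholder, and certificate-name details are glossed over) of a per-protocol health check against one pool member:

# Probe the in-memory healthcheck page the way a load balancer would;
# Invoke-WebRequest throws on non-200 responses, so treat any failure as "down"
try {
    $probe = Invoke-WebRequest -Uri "https://mbx-01.contoso.com/owa/healthcheck.htm" -UseBasicParsing
    Write-Output "OWA healthy on MBX-01 (status $($probe.StatusCode))"
} catch {
    Write-Output "OWA probe failed - remove MBX-01 from the OWA pool"
}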
Figure 3: Single Namespace with No Session Affinity

As long as the OWA health probe response is healthy, the load balancer will keep the target MBX server in the load balancing pool. However, if the OWA health probe fails for any reason, the load balancer will remove the target MBX server from the load balancing pool for all requests associated with that namespace. In other words, in this example, health from the perspective of the load balancer is per-server, not per-protocol, for the given namespace. This means that if the health probe fails, all client requests for that namespace will have to be directed to another server, regardless of protocol.

Figure 4: Single Namespace with No Session Affinity - Health Probe Failure

Single Namespace / Layer 7 (No Session Affinity)

In this scenario, a single namespace is deployed for all HTTP protocol clients (mail.contoso.com). The load balancer is configured to operate at layer 7, meaning SSL termination occurs and the load balancer knows the target URL. The load balancer is also configured to check the health of the target Mailbox servers in the load balancing pool; in this case, a health probe is configured on each virtual directory.

As long as the OWA health probe response is healthy, the load balancer will keep the target MBX server in the OWA load balancing pool. However, if the OWA health probe fails for any reason, the load balancer will remove the target MBX server from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 5: Single Namespace with Layer 7 (No Session Affinity) - Health Probe Failure

A single namespace utilizing layer 7 without session affinity is the recommended namespace and load balancer configuration for Exchange 2016.

Single Namespace / Layer 7 (Session Affinity)

In this scenario, a single namespace is deployed for all HTTP protocol clients (mail.contoso.com). The load balancer is configured to maintain session affinity (layer 7), meaning SSL termination occurs and the load balancer knows the target URL. The load balancer is also configured to check the health of the target Mailbox servers in the load balancing pool; in this case, the health probe is configured on each virtual directory.

As long as the OWA health probe response is healthy, the load balancer will keep the target MBX server in the OWA load balancing pool. However, if the OWA health probe fails for any reason, the load balancer will remove the target MBX server from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 6: Single Namespace with Layer 7 with Session Affinity - Health Probe Failure

Note: With session affinity enabled, the load balancer's capacity and utilization are reduced, because processing is spent maintaining more involved affinity options, such as cookie-based load balancing or Secure Sockets Layer (SSL) session-ID. Check with your vendor on the impact session affinity will have on your load balancer's scalability.
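In all of these scenarios, the probes surface Managed Availability's per-protocol view of the server. You can compare what the load balancer sees against Managed Availability directly; a hedged sketch (the server name and health set name are placeholders, and output fields may vary by build):

# List any health sets Managed Availability considers unhealthy on this server
Get-HealthReport -Identity MBX-01 | Where-Object { $_.AlertValue -ne "Healthy" }
# Drill into the OWA proxy health set specifically
Get-ServerHealth -Identity MBX-01 -HealthSet "OWA.Proxy" | Format-Table Name, AlertValue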
Multiple Namespaces / No Session Affinity

This scenario combines the best of both worlds – it provides per-protocol health checking while not requiring complex load balancing logic. In this scenario, a unique namespace is deployed for each HTTP protocol client; for example:

Figure 7: Multiple Namespaces with No Session Affinity

Note: As seen in the picture above, ECP is provided its own namespace. ECP and OWA are intimately tied together, and thus ECP does not strictly require its own namespace. However, ECP does have its own application pool, is the endpoint for the Exchange Administration Center, and is used by Outlook clients for certain configuration items. Therefore, you may want to provide a unique namespace for ECP.

The load balancer is configured to not maintain session affinity (layer 4). The load balancer is also configured to check the health of the target Mailbox servers in the load balancing pool; in this case, the health probes are effectively configured to target the health of each virtual directory: each virtual directory is defined with a unique namespace, so while the load balancer still has no idea which URL is being accessed, the result is as if it did.

As long as the OWA health probe response is healthy, the load balancer will keep the target MBX server in the OWA load balancing pool. However, if the OWA health probe fails for any reason, the load balancer will remove the target MBX server from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 8: Multiple Namespaces with No Session Affinity - Health Probe Failure

The downside to this approach is that it introduces additional namespaces, additional VIPs (one per namespace), and increases the number of names added as subject alternative names on the certificate, which can be costly depending on your certificate provider. However, it does not introduce extra complexity for the end user – the only URL the user needs to know is the OWA URL. ActiveSync, Outlook, and Exchange Web Services clients will utilize Autodiscover to determine the correct URL. A sketch of this per-protocol namespace configuration follows the summary table below.

Exchange Scenario Summary

The following table identifies the benefits and concerns with each approach:

| Scenario | Benefits | Concerns |
| --- | --- | --- |
| Single Namespace / Layer 4 (No Session Affinity) | Single namespace; reduced load balancer complexity; session affinity maintained at MBX | Per-server health |
| Single Namespace / Layer 7 (No Session Affinity) – recommended | Single namespace; per-protocol health | SSL offloading, which may impact load balancer scalability |
| Single Namespace / Layer 7 (Session Affinity) | Single namespace; per-protocol health | Session affinity maintained at load balancer; increased load balancer complexity; reduced load balancer scalability |
| Multiple Namespaces / No Session Affinity | Per-protocol health; session affinity maintained at MBX; users only have to know the OWA URL | Multiple namespaces; additional names on certificate; increased rule set on load balancer; multiple VIPs |
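As referenced above, here is a hedged sketch of the multiple-namespace model (server names and namespaces are placeholders), giving each protocol its own namespace so a layer 4 load balancer gets per-protocol health without SSL termination:

# One namespace per protocol; repeat for each server in the pool
Set-OwaVirtualDirectory "MBX-01\owa (Default Web Site)" -InternalUrl "https://owa.contoso.com/owa" -ExternalUrl "https://owa.contoso.com/owa"
Set-ActiveSyncVirtualDirectory "MBX-01\Microsoft-Server-ActiveSync (Default Web Site)" -InternalUrl "https://eas.contoso.com/Microsoft-Server-ActiveSync" -ExternalUrl "https://eas.contoso.com/Microsoft-Server-ActiveSync"
Set-WebServicesVirtualDirectory "MBX-01\EWS (Default Web Site)" -InternalUrl "https://ews.contoso.com/EWS/Exchange.asmx" -ExternalUrl "https://ews.contoso.com/EWS/Exchange.asmx"

Autodiscover then hands the per-protocol URLs to clients, which is why users still only need to know the OWA URL.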
Office Online Server Load Balancing

Exchange Server 2016 leverages Office Online Server to provide the rich document preview and editing capabilities for OWA. While this was a necessary change to ensure a homogeneous experience across the Office Server suite, it does introduce additional complexity for environments that don't have Office Online Server. As discussed in the architecture and namespace planning articles, the Office Online Server infrastructure requires unique namespaces. The load balancer is configured for layer 7 with session affinity (using cookie-based persistence) for each Office Online Server namespace, meaning SSL termination occurs and the load balancer knows the target URL. This ensures the client is always directed to the same Office Online Server while the user is utilizing the document collaboration capabilities within OWA.

Conclusion

Exchange 2016 introduces significant flexibility in your namespace and load balancing architecture. With load balancing, the decision ultimately comes down to balancing functionality vs. simplicity. The simplest solution lacks session affinity management and per-protocol health checking, but provides the capability to deploy a single namespace. At the other end of the spectrum, you can utilize session affinity management and per-protocol health checking with a single namespace, but at the cost of increased complexity. Or you can balance functionality and simplicity by deploying a load balancing solution that doesn't leverage session affinity but provides per-protocol health checking, at the expense of requiring a unique namespace per protocol.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience
The Exchange 2016 Preferred Architecture

The Preferred Architecture (PA) is the Exchange Engineering Team's best practice recommendation for what we believe is the optimum deployment architecture for Exchange 2016, and one that is very similar to what we deploy in Office 365. While Exchange 2016 offers a wide variety of architectural choices for on-premises deployments, the architecture discussed below is our most scrutinized one ever. While there are other supported deployment architectures, they are not recommended.

The PA is designed with several business requirements in mind, such as the requirement that the architecture be able to:

- Include both high availability within the datacenter and site resilience between datacenters
- Support multiple copies of each database, thereby allowing for quick activation
- Reduce the cost of the messaging infrastructure
- Increase availability by optimizing around failure domains and reducing complexity

The specific, prescriptive nature of the PA means, of course, that not every customer will be able to deploy it (for example, customers without multiple datacenters). And some of our customers have different business requirements or other needs that necessitate a different architecture. If you fall into those categories, and you want to deploy Exchange on-premises, there are still advantages to adhering as closely as possible to the PA, deviating only where your requirements widely differ. Alternatively, you can consider Office 365, where you can take advantage of the PA without having to deploy or manage servers.

The PA removes complexity and redundancy where necessary to drive the architecture to a predictable recovery model: when a failure occurs, another copy of the affected database is activated. The PA is divided into four areas of focus:

- Namespace design
- Datacenter design
- Server design
- DAG design

Namespace Design

In the Namespace Planning and Load Balancing Principles articles, I outlined the various configuration choices that are available with Exchange 2016. For the namespace, the choices are to either deploy a bound namespace (having a preference for the users to operate out of a specific datacenter) or an unbound namespace (having the users connect to any datacenter without preference). The recommended approach is to utilize the unbound model, deploying a single Exchange namespace per client protocol for the site resilient datacenter pair (where each datacenter is assumed to represent its own Active Directory site - see more details on that below). For example:

- autodiscover.contoso.com
- For HTTP clients: mail.contoso.com
- For IMAP clients: imap.contoso.com
- For SMTP clients: smtp.contoso.com

Each Exchange namespace is load balanced across both datacenters in a layer 7 configuration that does not leverage session affinity, resulting in fifty percent of traffic being proxied between datacenters. Traffic is equally distributed across the datacenters in the site resilient pair via round robin DNS, geo-DNS, or other similar solutions. From our perspective, the simpler solution is the least complex and easiest to manage, so our recommendation is to leverage round robin DNS, as sketched below. For the Office Online Server farm, a namespace is deployed per datacenter, with the load balancer operating at layer 7 and maintaining session affinity using cookie-based persistence.
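For the round robin DNS recommendation above, a hedged sketch (zone, record name, and addresses are placeholders; assumes the Windows DnsServer module on your DNS server):

# Two A records for the same name - one per datacenter VIP - yield round robin
Add-DnsServerResourceRecordA -ZoneName "contoso.com" -Name "mail" -IPv4Address "192.0.2.10"    # West datacenter VIP
Add-DnsServerResourceRecordA -ZoneName "contoso.com" -Name "mail" -IPv4Address "198.51.100.10" # East datacenter VIP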
Figure 1: Namespace Design in the Preferred Architecture

In the event that you have multiple site resilient datacenter pairs in your environment, you will need to decide whether you want a single worldwide namespace, or whether you want to control the traffic to each specific datacenter by using regional namespaces. Ultimately, your decision depends on your network topology and the cost associated with using an unbound model; for example, if you have datacenters located in North America and Europe, the network link between these regions might not only be costly, but it might also have high latency, which can introduce user pain and operational issues. In that case, it makes sense to deploy a bound model with a separate namespace for each region. However, options like geographical DNS offer you the ability to deploy a single unified namespace even when you have costly network links; geo-DNS allows your users to be directed to the closest datacenter based on their client's IP address.

Figure 2: Geo-distributed Unbound Namespace

Site Resilient Datacenter Pair Design

To achieve a highly available and site resilient architecture, you must have two or more datacenters that are well-connected (ideally with low round-trip network latency, since high latency adversely affects replication and the client experience). In addition, the datacenters should be connected via redundant network paths supplied by different operating carriers.

While we support stretching an Active Directory site across multiple datacenters, for the PA we recommend that each datacenter be its own Active Directory site. There are two reasons:

- Transport site resilience via Shadow Redundancy and Safety Net can only be achieved when the DAG has members located in more than one Active Directory site.
- Active Directory has published guidance stating that subnets should be placed in different Active Directory sites when the round-trip latency between the subnets is greater than 10 ms.

Server Design

In the PA, all servers are physical servers. Physical hardware is deployed rather than virtualized hardware for two reasons:

- The servers are scaled to use 80% of resources during the worst-failure mode.
- Virtualization adds an additional layer of management and complexity, which introduces additional recovery modes that do not add value, particularly since Exchange provides that functionality natively.

Commodity server platforms are used in the PA. Commodity platforms include:

- 2U, dual socket servers (20-24 cores)
- up to 192GB of memory
- a battery-backed write cache controller
- 12 or more large form factor drive bays within the server chassis

Additional drive bays can be deployed per-server depending on the number of mailboxes, mailbox size, and the server's scalability.

Each server houses a single RAID1 disk pair for the operating system, Exchange binaries, protocol/client logs, and transport database. The rest of the storage is configured as JBOD, using large capacity 7.2K RPM serially attached SCSI (SAS) disks (while SATA disks are also available, the SAS equivalent provides better IO and a lower annualized failure rate). Each disk that houses an Exchange database is formatted with ReFS (with the integrity feature disabled), and the DAG is configured such that AutoReseed formats the disks with ReFS:

Set-DatabaseAvailabilityGroup <DAG> -FileSystem ReFS

BitLocker is used to encrypt each disk, thereby providing data encryption at rest and mitigating concerns around data theft or disk replacement. For more information, see Enabling BitLocker on Exchange Servers.
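A hedged sketch of preparing one database disk as described above (the drive letter and label are placeholders; PA deployments typically use mount points under C:\ExchangeVolumes for AutoReseed rather than drive letters):

# ReFS with the integrity feature disabled, then BitLocker for encryption at rest
Format-Volume -DriveLetter E -FileSystem ReFS -SetIntegrityStreams $false -NewFileSystemLabel "ExVol01"
Enable-BitLocker -MountPoint "E:" -EncryptionMethod XtsAes256 -RecoveryPasswordProtector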
To ensure that the capacity and IO of each disk is used as efficiently as possible, four database copies are deployed per disk. The normal run-time copy layout ensures that there is no more than a single active copy per disk.

At least one disk in the disk pool is reserved as a hot spare. AutoReseed is enabled and quickly restores database redundancy after a disk failure by activating the hot spare and initiating database copy reseeds.

Database Availability Group Design

Within each site resilient datacenter pair you will have one or more DAGs.

DAG Configuration

As with the namespace model, each DAG within the site resilient datacenter pair operates in an unbound model, with active copies distributed equally across all servers in the DAG. This model:

- Ensures that each DAG member's full stack of services (client connectivity, replication pipeline, transport, etc.) is validated during normal operations.
- Distributes the load across as many servers as possible during a failure scenario, thereby only incrementally increasing resource use across the remaining members within the DAG.

Each datacenter is symmetrical, with an equal number of DAG members in each datacenter. This means that each DAG has an even number of servers and uses a witness server for quorum maintenance.

The DAG is the fundamental building block in Exchange 2016. With respect to DAG size, a larger DAG provides more redundancy and resources. Within the PA, the goal is to deploy larger DAGs (typically starting out with an eight-member DAG and increasing the number of servers as required to meet your requirements). You should only create new DAGs when scalability introduces concerns over the existing database copy layout.

DAG Network Design

The PA leverages a single, non-teamed network interface for both client connectivity and data replication. A single network interface is all that is needed because, ultimately, our goal is to achieve a standard recovery model regardless of the failure - whether a server failure occurs or a network failure occurs, the result is the same: a database copy is activated on another server within the DAG. This architectural change simplifies the network stack and obviates the need to manually eliminate heartbeat cross-talk.

Note: While your environment may not use IPv6, IPv6 remains enabled, per IPv6 support in Exchange.

Witness Server Placement

Ultimately, the placement of the witness server determines whether the architecture can provide automatic datacenter failover capabilities, or whether it will require a manual activation to enable service in the event of a site failure.

If your organization has a third location with a network infrastructure that is isolated from network failures that affect the site resilient datacenter pair in which the DAG is deployed, then the recommendation is to deploy the DAG's witness server in that third location. This configuration gives the DAG the ability to automatically failover databases to the other datacenter in response to a datacenter-level failure event, regardless of which datacenter has the outage. If your organization does not have a third location, consider placing the witness in Azure; alternatively, place the witness server in one of the datacenters within the site resilient datacenter pair.

If you have multiple DAGs within the site resilient datacenter pair, place the witness server for all DAGs in the same datacenter (typically the datacenter where the majority of the users are physically located). Also, make sure the Primary Active Manager (PAM) for each DAG is located in that same datacenter. A hedged sketch of the witness configuration follows below.
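As referenced above, a sketch of placing the witness in a third location (the DAG, server, and path names are placeholders):

# A witness in a third site enables automatic failover for either datacenter
Set-DatabaseAvailabilityGroup -Identity DAG1 -WitnessServer FS01.thirdsite.contoso.com -WitnessDirectory "C:\DAG1\Witness"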
Data Resiliency

Data resiliency is achieved by deploying multiple database copies. In the PA, database copies are distributed across the site resilient datacenter pair, thereby ensuring that mailbox data is protected from software, hardware, and even datacenter failures.

Each database has four copies, with two copies in each datacenter, which means that at a minimum the PA requires four servers. Out of these four copies, three are configured as highly available. The fourth copy (the copy with the highest Activation Preference number) is configured as a lagged database copy. Due to the server design, each copy of a database is isolated from its other copies, thereby reducing failure domains and increasing the overall availability of the solution, as discussed in DAG: Beyond the "A".

The purpose of the lagged database copy is to provide a recovery mechanism for the rare event of system-wide, catastrophic logical corruption. It is not intended for individual mailbox recovery or mailbox item recovery. The lagged database copy is configured with a seven day ReplayLagTime. In addition, Replay Lag Manager is enabled to provide dynamic log file play down for lagged copies when availability is compromised. A sketch of this configuration follows below.

When using the lagged database copy in this manner, it is important to understand that the lagged database copy is not a guaranteed point-in-time backup. The lagged database copy will have an availability threshold, typically around 90%, due to periods where the disk containing the lagged copy is lost to disk failure, periods where the lagged copy becomes an HA copy (due to automatic play down), and periods where the lagged database copy is rebuilding its replay queue.

To protect against accidental (or malicious) item deletion, Single Item Recovery or In-Place Hold technologies are used, and the Deleted Item Retention window is set to a value that meets or exceeds any defined item-level recovery SLA. With all of these technologies in play, traditional backups are unnecessary; as a result, the PA leverages Exchange Native Data Protection.
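As referenced above, a hedged sketch (database, server, and DAG names are placeholders) of adding the fourth, lagged copy and enabling Replay Lag Manager:

# Fourth copy: lowest activation preference, seven day replay lag
Add-MailboxDatabaseCopy -Identity DB01 -MailboxServer MBX-04 -ActivationPreference 4 -ReplayLagTime 7.00:00:00
# Let Replay Lag Manager play the lag down automatically when availability is compromised
Set-DatabaseAvailabilityGroup -Identity DAG1 -ReplayLagManagerEnabled $true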
Office Online Server Design

At a minimum, you will want to deploy two Office Online Servers in each datacenter that hosts Exchange 2016 servers. Each Office Online Server should have 8 processor cores, 32GB of memory, and at least 40GB of space dedicated for log files.

Note: The Office Online Server infrastructure does not need to be exclusive to Exchange. As such, the hardware guidance takes into account usage by SharePoint and Skype for Business. Be sure to work with any other teams using the Office Online Server infrastructure to ensure the servers are adequately sized for your specific deployment.

The Exchange servers within a particular datacenter are configured to use the local Office Online Server farm via the following cmdlet:

Set-MailboxServer <East MBX Server> -WACDiscoveryEndPoint https://oos-east.contoso.com/hosting/discovery

Summary

Exchange Server 2016 continues the investments introduced in previous versions of Exchange by reducing server role architecture complexity, aligning with the Preferred Architecture and Office 365 design principles, and improving coexistence with Exchange Server 2013. These changes simplify your Exchange deployment without decreasing the availability or the resiliency of the deployment. And in some scenarios, when compared to previous generations, the PA increases the availability and resiliency of your deployment.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience