Quick HOWTO : Ch02 : Data Center Relocation - Preparation
- 1 Introduction
- 2 Data Center Selection Criteria
- 3 The Relocation Project Plan
- 3.1 Coordination Preparation
- 3.1.1 Project Management
- 3.1.2 Roles and Responsibilities
- 3.1.3 Disaster Recovery Team
- 3.1.4 Procedures Documentation
- 3.1.5 Interpersonal Communications
- 3.1.6 Participant Lists
- 3.1.7 Contractors
- 3.1.8 Vendors / Purchasing
- 3.1.9 Inventory
- 3.1.10 Equipment Leases
- 3.1.11 Relocation Date and Time
- 3.1.12 Failed Equipment Identification
- 3.1.13 Plan of Retreat
- 3.1.14 Practice Migrations
- 3.2 Customer Communications Preparation
- 3.3 Server Area Preparation
- 3.4 Network Preparation
- 3.5 Server Preparation
- 3.6 DNS Preparation
- 3.7 Transportation Preparation
- 4 Conclusion
Introduction
The rationale for deciding to relocate or consolidate data centers was discussed at length in Chapter 1, "Justification". This chapter explains in detail the criteria you should use to select your new data center and how to create a project plan. A lot of information will be covered, so the numerous action items mentioned are also collected in Appendix I, "Relocation Check Sheets", to help make the process easier.
Data Center Selection Criteria
There are two broad categories of data center providers. The first supplies only computer room floor space, access to an ISP, basic monitoring and power. These are called colocation providers. The second provides more comprehensive management that may include every IT service related to your site, including systems development. These are called managed hosting providers. There is a wide range of service levels in between, and the industry often interprets these terms very loosely. Always request a very specific list of the services each candidate data center provides as part of your selection process.
As expected, the selection of a suitable data center will play an important role in any data center or web farm relocation project. Many factors related to the facility and its services need to be considered, and they are often overlooked. These include:
The data center should be positioned away from zones at risk from natural disasters such as flooding from rivers and dams, hurricanes and earthquakes. It should also be at least a quarter kilometer (about 250 m) from major highways and railroads to reduce the risk of an evacuation caused by a toxic spill. Locations close to hazardous production facilities and aircraft flight corridors should be avoided.
Your employees may also have personal interests in the location, such as reasonably priced housing nearby, recreational attractions in the area, access to public transportation, and amenities such as schools and parks in the neighborhood. You should also observe how traffic patterns affect access to the site to determine whether it will be difficult to reach.
The immediate vicinity of the site is also important. Rainwater should drain away from the building and then off site to prevent localized flooding. In high security environments the building should be surrounded by embankments and perimeter fencing to reduce the risk of physical attack.
The facility should have access to multiple ISPs, with their cables entering the building at different points. This reduces the risk of outages due to technical failures as well as construction and landscaping accidents. Verify the roof access rights in the event you need to have a satellite or microwave line-of-sight antenna installed.
It is also extremely important to visually verify the type of connectivity available. Be certain that both the ISPs that enter the building and the types of data circuits they can provide are suitable. Don't sign a contract that leaves you held hostage to unsuitable or otherwise inadequate connectivity. This is discussed in greater detail in Appendix II, "Selecting an ISP".
Power should be supplied from multiple feeds from different substations. The facility should also be able to run without interruption if its largest standby generator or UPS is offline for maintenance. Ensure that the building has sufficient excess capacity to handle future growth.
In large facilities, the UPS feeds a network of power distribution units (PDUs) to supply each section of the floor with a series of circuit breaker panels. Make sure that every rack or cabinet you intend to use has access to outlets from at least two PDUs and that each PDU is operating at no more than 45% of its capacity, so that it can handle the full load of the other one if that PDU fails.
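The 45% figure is simple arithmetic: when one PDU of a pair fails, the other must carry both loads, so two PDUs loaded at 45% leave the survivor at 90% of its rating with a little headroom to spare. A minimal Python sketch of that check, assuming you have each PDU's rated capacity and measured load in amps (the function name and the figures are hypothetical):

```python
def pdu_pair_ok(capacity_a: float, load_a: float,
                capacity_b: float, load_b: float,
                max_fraction: float = 0.45) -> bool:
    """Return True if both PDUs in a redundant pair are loaded below
    max_fraction of their rated capacity AND either one could carry the
    combined load of the pair on its own if the other failed."""
    combined = load_a + load_b
    under_limit = (load_a <= max_fraction * capacity_a
                   and load_b <= max_fraction * capacity_b)
    survivor_ok = combined <= capacity_a and combined <= capacity_b
    return under_limit and survivor_ok


# Two 100 A PDUs drawing 40 A each: a survivor would carry 80 A.
print(pdu_pair_ok(100, 40, 100, 40))  # True
# Drawing 55 A each, a survivor would need 110 A, more than its rating.
print(pdu_pair_ok(100, 55, 100, 55))  # False
```

The separate survivor check matters when the two PDUs in a pair have different ratings.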
Request a history of outages or other irregularities in the site's utility feeds, and ask how the facility will notify you of any electrical maintenance work to be carried out by its own staff or by its utility providers. Monitoring equipment should also automatically notify the facility's staff of any disruptions in the power supply to the area.
Ask how quickly the generators respond to an electrical outage and how long the UPS batteries can last. Ask whether the UPSs have ever supplied the full load of the data center and when the system, including the batteries, was last maintained. A standby generator can be started regularly without revealing problems that only appear under load, so ask whether testing includes the use of a load bank to simulate the power consumption of the data center. Find out how frequently the equipment is tested and how often it is maintained.
Verify that the power per square foot that the data center can provide meets your needs. Racks of densely packed servers and data storage can be power hungry.
Most data centers try to maintain an air temperature of about 75F (24C); verify this. On your plant tour, be on the lookout for computer room air conditioning (CRAC) units that squeak or rattle loudly, as this could be a sign of poor maintenance. Condensation from CRAC units should be drained away immediately through piping, so watch for water leaks.
Verify that there is 24/7 security enforcement. This should include offices and common areas being isolated from the data center floor, mandatory visitor/employee registration or electronic ID access and interior/exterior video surveillance. Some data centers also link visitor ID cards with a person's biometric information through the use of a palm reader. This helps to deter ID card fraud.
Not only should there be smoke and heat detectors, but they should be linked to an alarm panel that graphically shows the location of the fire on the building's floor plan. The first line of defense should be a gaseous system that suffocates the fire by displacing the oxygen in the air. These systems are less damaging than water-based ones, but they are usually designed for fires of short duration.
Larger fires will often require a pre-action water-based system. Here the pipes are filled with pressurized air rather than water to reduce the risk of flooding during normal operation. Water only enters the piping after an alarm signal has been detected, and the sprinklers release it only after a predefined temperature has been reached. Because two events must occur before water is discharged, the damage caused by false alarms is minimized. This is an industry standard method of fire suppression and it should be on your checklist.
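The double-interlock logic is easy to see in a toy model. The Python sketch below only captures the two-event rule described here; the class name and the 68C head temperature rating are hypothetical, and a real system also supervises the air pressure in the pipes.

```python
from dataclasses import dataclass


@dataclass
class PreActionZone:
    """Toy model of a double-interlock pre-action sprinkler zone.

    The pipes normally hold pressurized air. Water enters them only after the
    detection system raises an alarm (first event), and a sprinkler head
    releases it only once the head reaches its trigger temperature (second
    event).
    """
    head_trigger_temp_c: float = 68.0  # hypothetical sprinkler head rating
    pipes_charged: bool = False        # True once water has entered the pipes

    def detection_alarm(self) -> None:
        # First event: the pre-action valve opens and water fills the pipes.
        self.pipes_charged = True

    def discharging(self, head_temp_c: float) -> bool:
        # Water flows only when BOTH events have occurred.
        return self.pipes_charged and head_temp_c >= self.head_trigger_temp_c


zone = PreActionZone()
print(zone.discharging(90.0))  # False: heat, but no alarm, so pipes hold only air
zone.detection_alarm()
print(zone.discharging(30.0))  # False: alarm, but the head has not reached 68C
print(zone.discharging(90.0))  # True: both events have occurred
```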
If your data center has a raised floor, ask whether there are liquid detectors underneath. These help to catch leaks from fire suppression piping and CRAC units before they cause problems. In this case, also make sure that the cabling lies in trays above the floor, out of harm's way from minor flooding. If possible, the server area should also be isolated using fireproof doors.
Not all data centers will provide you with Internet connectivity. Some will only have a demarcation point where ISPs have placed their equipment. You will then have to contract with the ISPs to extend a data circuit to your server area. Connectivity can become more complex than it first appears. There are different types of data circuits requiring varying types of adapters on your network equipment.
If you require only one link, you'll need to configure a single default gateway on your network equipment to get to the Internet. When multiple links are required, you'll need to configure a dynamic routing protocol on your network equipment. It will automatically calculate which of the links will get the data to its final destination most quickly. It can also be used to bias traffic to and from your web site toward the cheapest ISP link, and it will automatically fail traffic over to the remaining ISP circuits if one of the circuits fails.
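As a rough illustration of the cost bias and failover behavior, here is a minimal Python sketch. It is not a routing protocol: real multi-homed sites typically use BGP, whose path selection is far more involved. The link names and costs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    name: str
    cost_per_mbps: float  # relative price of the link; lower is preferred
    up: bool = True       # set False when the circuit fails


def select_link(links: list[Link]) -> Link | None:
    """Prefer the cheapest link that is up; fall back to the next cheapest
    when it fails. Returns None if every link is down."""
    available = [link for link in links if link.up]
    if not available:
        return None
    return min(available, key=lambda link: link.cost_per_mbps)


links = [Link("isp-a", 12.0), Link("isp-b", 8.0), Link("isp-c", 15.0)]
print(select_link(links).name)  # isp-b, the cheapest
links[1].up = False             # the isp-b circuit fails
print(select_link(links).name)  # isp-a, the next cheapest
```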
Evaluating network connectivity issues in detail usually requires the services of a network engineer and is beyond the scope of this chapter. Appendix II, "Selecting an ISP", covers many frequently used terms and scenarios to help you evaluate your options better.
It is often taken for granted that your data center provider continuously monitors its equipment for failure. Ask about the frequency of the checks; a polling cycle of five minutes or less is generally acceptable. Also ask about the types of checks done. ICMP (Internet Control Message Protocol) or "ping" tests only check basic network connectivity and server response. The facility should also use SNMP (Simple Network Management Protocol) to track CPU, memory, error and data throughput rates. SNMP-enabled systems can also send notifications, or "traps", when components fail or when a predefined event, such as high CPU usage, occurs. This information should be fed into some form of job ticketing system that will ensure that the problem is fixed quickly. Ask about the number of failed polls that will trigger an alarm and whether these alarms will also automatically generate a ticket.
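The alarm threshold a facility uses determines how quickly a failure reaches a technician. A minimal Python sketch of the idea, assuming a fixed number of consecutive failed polls opens a ticket; the poll results and the ticket list stand in for a real ICMP/SNMP poller and ticketing system, and all names and the threshold are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PollMonitor:
    """Opens a ticket after a number of consecutive failed polls of one host."""
    host: str
    failures_to_alarm: int = 3  # hypothetical; ask your provider for its value
    consecutive_failures: int = 0
    ticket_open: bool = False
    tickets: list[str] = field(default_factory=list)

    def record_poll(self, success: bool) -> None:
        if success:
            # A good poll resets the counter. An open ticket stays open until
            # someone resolves it in the ticketing system.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if (self.consecutive_failures >= self.failures_to_alarm
                and not self.ticket_open):
            self.ticket_open = True
            self.tickets.append(
                f"{self.host}: {self.consecutive_failures} consecutive failed polls")


mon = PollMonitor("web01")
for result in [True, False, False, True, False, False, False, False]:
    mon.record_poll(result)
print(mon.tickets)  # ['web01: 3 consecutive failed polls']
```

With five-minute polling and a threshold of three failed polls, between 10 and 15 minutes can pass before a ticket is opened, which is worth keeping in mind when comparing providers.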
Ensure that your data center uses multiple DNS servers, behind different firewalls and in multiple locations, to prevent your web site from being affected if one of the servers goes down. Some facilities will provide not just caching DNS for the exclusive use of your servers, but also authoritative DNS services to handle Internet queries for your web domain. With authoritative services, ask about the procedures for updating DNS, the lead time for requesting changes and the format of the DNS data the provider will need to enter it into their systems.
Ask about the availability of a web portal through which you can view statistics, billing, contact, and server information related to your site. Also ask when scheduled maintenance is done and what types of notifications are provided. Request a summary of the escalation procedures used when problems occur, and ask whether there is a formal means of documenting and permanently fixing problems. From time to time you may need simple services such as "remote hands" help to reboot a server or change a backup tape. Ask about the availability of such services, and possibly more complex ones, through as-needed contracts or longer-term retainer agreements.
The backup system you are using at your current location may be different from the one used at the new facility. This could be the source of difficulties if you have to restore historical data during or after the relocation due to server failure or human error. Verify whether the new facility can handle data backed up using your software on your backup media. If not, you may have to invest in data conversion services with a third party. Good backup services usually store data for a predetermined period of time before reusing the media. They should also store most of the data at a secured secondary facility. This protects the data from catastrophic events at the main data center. Verify that this type of extra data security exists.
For improved safety, the data center floor should use anti-static tiles to reduce the risk of electrostatic discharge damaging your equipment components. Water pipes, steam lines, bathrooms, kitchens and other sources of moisture should all be located a safe distance away, and none of them should be directly above the server area. You should also determine whether the location has sufficient floor space to handle your current and future needs. Some facilities allow you to reserve the area immediately surrounding your server area for future expansion.
Cost can present itself in many different ways, including pricing for bandwidth, power, cooling, security, floor space rental and custom services. Because these can be presented as recurring and/or one-time expenses, it is a good idea to determine the total cost over the period you expect your current website architecture to be used. Lower recurring costs can easily give the impression of cheaper operating expenses, but the price may become unfavorable once higher setup fees are taken into account.
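To compare quotes on an equal footing, add the one-time fees to the recurring fees over the expected life of the architecture. A minimal Python sketch with hypothetical prices from two facilities:

```python
def total_cost(setup_fee: float, monthly_fee: float, months: int) -> float:
    """Total cost of a contract over the expected life of the architecture."""
    return setup_fee + monthly_fee * months


# Hypothetical quotes, compared over 36 months.
offer_a = total_cost(setup_fee=2_000, monthly_fee=4_500, months=36)
offer_b = total_cost(setup_fee=25_000, monthly_fee=4_000, months=36)
print(offer_a, offer_b)  # 164000 169000
```

Here the offer with the lower monthly fee is the more expensive one over 36 months; with these figures the two offers only break even at 46 months.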
Remember that this is a perfect wish list. The data centers in your vicinity may not meet all the criteria but the list should allow you to reduce your final candidates to a manageable number. Data center selection is only the first phase of the physical planning for the relocation and will largely be the responsibility of your facilities and networking teams. The work that will follow will demand a lot more from your IT support staff and will have to be carefully coordinated as we will see in the following sections.
The Relocation Project Plan
Detailed logistical planning of all the steps related to the relocation needs to be started well in advance of the deadline date. You'll probably need to start with a number of meetings to inform each of the affected groups about the project. These will have to be followed by project planning meetings in which roles and responsibilities are assigned and progress reports given. As the deadline date draws near or as the complexity increases, be prepared to schedule daily and sometimes twice daily meetings to achieve your goals.
There are many aspects of the migration that need to be thought about prior to arranging the first meeting. Some of the most pressing ones will now be discussed.
Coordination Preparation
There are a number of things that need to be considered prior to setting up specific functional groups for each aspect of the relocation. These are discussed next.
Project Management
Have a single overall project manager for the activity. If the project starts to become complicated, invest the time in tracking it with software tools such as Microsoft Project. Spreadsheets track static information well but do relatively poorly at monitoring the status of dynamically changing deadlines. Constantly changing priorities can be disruptive, so set deadlines after which no further changes may be made. Hold meetings on these days to determine whether the project or sub-project should be aborted, continued as planned or given an extension of its preparation time.
Roles and Responsibilities
Create an activity checklist that assigns each team member clearly defined roles and time frames in which to get activities done. Specifically assign someone the task of tracking problem equipment that is likely to fail. The project manager will inevitably be distracted by other events, and this will help to ensure that forgotten technical issues don't threaten the success of the migration. There should be people assigned to lead transportation, networking, server shutdown/startup, application testing, customer communications and the locking of the doors at both the old and new facilities once the relocation is complete.
An often overlooked role is that of the "gofer", someone who will go for anything that you have forgotten. It could be to buy cables you forgot to order, pick up catered food, or find the software CDs that "must be somewhere over there in those boxes". Remember to give this person some small reward when it's all over, as it is one of the most thankless jobs.
Disaster Recovery Team
Assign a group of people to disaster recovery. It should include staff who are familiar with systems administration, database administration, networking and backups. These people don't have to sit idly by waiting for something to break. They can still play important roles in the preparation steps, but they should be given a reduced workload during the relocation itself so that they can dedicate their time to recovery activities.
Procedures Documentation
There are three types of procedural documentation that will have to be up to date. The first covers your existing systems that won't change as a result of the relocation. The second covers the systems that will change once the project is over.
The third type is equally important. It documents the steps each participant is expected to perform during the relocation. As part of the definition of roles and responsibilities, some participants will need a detailed task list to help prevent them from making errors. These lists would include the step by step commands that a technician or engineer needs to execute as part of the process. Each person can then cross completed tasks off their check list, giving better control of the change process.
Interpersonal Communications
Make preparations to have a permanent conference bridge open so that all members of the team can be better coordinated in the event of a crisis. Make sure all active participants have mobile phones on their person at the time of the migration.
Participant Lists
Have a complete list of participants in the relocation. This should include their work, mobile, and if possible, their home phone numbers. It should also include contact information for all third parties involved with the activity such as movers, technicians and contractors. This list should be distributed to the entire team.
Contractors
You may need to use contractors for work your staff have neither the time nor the ability to do. They should be qualified, experienced and authorized by the manufacturers they represent. Contractors should also use the correct tools and be able to test the quality of their work. A check list for contractors is provided in Appendix I, "Relocation Check Sheets".
Vendors / Purchasing
One of the most difficult aspects of a data center move is the coordination of purchases from your vendors. There are many things to track. Items have varying delivery lead times, you may forget to order something, items may have to be returned or replaced, and deadlines may shift. A sample purchasing check list is provided in Appendix I, "Relocation Check Sheets". You may want to adapt it to a spreadsheet format to make it easier to share with your vendors.
Inventory
You will have to do a complete inventory of all the equipment to be moved. This will also have to include "before" and "after" data related to the network connectivity and physical location of each device. The actual required information for each type of equipment will be covered later in the chapter and accompanying check lists are provided in Appendix I, "Relocation Check Sheets".
Record this information in a database if possible. This allows very flexible reporting, including individual status and data sheets for each device. Not everyone will have access to the application, so ensure that it can export reports as word processing or spreadsheet files for more universal distribution.
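A minimal sketch of such an inventory, using Python's built-in sqlite3 and csv modules. The table layout, device names and addresses are hypothetical (the addresses come from documentation ranges), and a real inventory would hold many more of the fields listed in Appendix I. The CSV export covers the need to share reports with people who don't have access to the database.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per device, with "before" and "after" values
# for physical location and network connectivity.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        device        TEXT PRIMARY KEY,
        old_location  TEXT,
        new_location  TEXT,
        old_ip        TEXT,
        new_ip        TEXT,
        status        TEXT
    )
""")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("web01", "A-3-2", "1-11-4", "192.0.2.10", "198.51.100.10", "moved"),
        ("db01", "A-3-5", "1-11-6", "192.0.2.20", "198.51.100.20", "pending"),
    ],
)

# Export a status report as CSV so it can be opened in any spreadsheet.
rows = conn.execute(
    "SELECT device, old_location, new_location, status "
    "FROM inventory ORDER BY device"
)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in rows.description])  # column names as header
writer.writerows(rows)
print(buffer.getvalue())
```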
Equipment Leases
Some relocations cannot afford any downtime at all, and you may be forced to purchase or lease equipment to create a duplicate environment at the new location. You may have to assign the acquisition of such equipment to a team lead and adjust your budget accordingly.
Relocation Date and Time
Determine the best time for the migration. If it takes place at night, and/or over an extended period, allow for overtime, catered food and possibly compensatory time off. For nighttime moves, make sure your daytime skeleton staff is capable of handling regular business issues and can relieve the night staff of some of the technical problems that may arise. Verify that there won't be delays due to rush hour traffic or road maintenance at the planned time.
Failed Equipment Identification
Have some way of marking equipment that isn't working. A brightly colored sticky note stuck to a server and the rack or cabinet in which it is located is usually sufficient. This makes it very easy to identify broken equipment from a distance. Make everyone aware of this process.
Plan of Retreat
Create a plan of retreat in the event that things go dreadfully wrong. Create a shortlist of scenarios during the actual relocation under which the project cannot go forward, and have some mechanism for informing everyone of the decision. Create a list that defines the sequence in which servers should be returned to the old location. Also identify a point of no return after which you cannot roll back your changes. For that case, create a minimum list of servers that need to be functioning for the website to be adequately operational. If things go wrong, ensure that these servers are functioning correctly.
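Keeping the minimum server list in priority order makes the go/no-go check mechanical. A minimal Python sketch, assuming you already know which servers are running; the server names are hypothetical:

```python
from __future__ import annotations


def missing_critical(minimum_servers: list[str], running: set[str]) -> list[str]:
    """Return the servers from the minimum list that are not yet running,
    keeping the list's priority order so the first entry is fixed first."""
    return [name for name in minimum_servers if name not in running]


# Hypothetical minimum list, in the order the servers must be brought up.
minimum = ["fw01", "lb01", "web01", "db01"]
running = {"fw01", "web01"}

still_down = missing_critical(minimum, running)
print(still_down)  # ['lb01', 'db01']
print("site adequately operational" if not still_down else "not yet operational")
```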
Practice Migrations
Plan to do a practice relocation of some non-critical servers to see whether you are really prepared for the full scale operation.
These are some of the general preparatory tasks that need to be done, and you may have to add a few that cater to your unique needs. Completing them is a good first step before proceeding with more specific plans.
Customer Communications Preparation
Notifying your internal and external customers of the expected changes will be critical to the success of the project. Consider these activities as part of the plan whether the relocation is successful or not.
Provide ample warning of the impending activities so that your customers can plan for the change.
Have a single message with varying degrees of detail for each customer group depending on their information needs. For example, web surfers may need to know that your site may be unavailable for maintenance for a specific time period but business partners may need to know about any new procedures the change may create. The message should explain the reasons, expected features and associated benefits of the change. Plan to have this message delivered in person to your most valued customers. Make sure it is delivered to all internal departments that could be affected by the planned activities.
Create a communications plan in the event of a failure. Here is a list of guidelines I strongly recommend be followed:
- Create a web page or blog to provide updates on the progress of problem resolution. This should also include contact information for your crisis spokesperson.
- Provide sufficient information to make all affected parties aware that you're taking the matter seriously. Describe the extent of the problem and include a statement that addresses any possible concerns of those who may not be affected.
- Update this page at predictable stated intervals even when there's nothing to report. Significant events outside this schedule should be reported immediately.
- If the issue has public visibility, have a video or audio clip of the company President/CEO making a factual statement about the matter that expresses remorse and promises a quick resolution.
- Create lists of potential questions, answers and discussion points for those parties that are affected, not affected, business prospects, the press, and analysts.
- In times of crisis, personalized verbal assurances and updates can be valuable. Attempt to give every customer a personal phone call. Use the Q&A discussion points. The callers will have to be prepared to take some abuse. Have them let the customers vent while sticking to the script. Senior management should make some calls too, to gain a first hand understanding of the pain everyone is feeling.
- Don't forget to contact customers that are not impacted. Focus on the most important ones to provide them with personalized verbal service assurances. Send e-mails to all other non-impacted customers with statements about service continuity.
- Have senior management call important sales prospects to provide justifications to choose your company in spite of the disruption.
- Use a company wide all-hands meeting to provide a first hand situation status. It should be brief, as there will be lots of work to do. Provide regular updates to employees via e-mail.
- You should also have someone monitoring the web media and blogosphere to get feedback on what people are saying, and who's saying it. When comments are necessary, simply reference your updates web page. Use this updates page to correct misstated facts without making excuses or being defensive. Only authorized persons should post such limited commentary on blogs.
- Contact key press and analysts. They will appreciate hearing directly from you even if they remain highly critical.
Some aspects of the migration may fail, and documenting the issues can lead to much better experiences in the future. Develop an easy-to-use post-mortem template that can be used to document any failures related to the migration. It should include the persons involved, the dates and times the events occurred, the reasons for the failure, error messages, the final solution and the steps that will be taken to prevent the problem from recurring. This information should be made available to all members of staff involved in the migration and to customers who may demand detailed explanations of the cause of disruptions.
The establishment of clear channels of communication with your customers is always important but especially so during projects of high risk. Always include them as part of any relocation plan to ensure a more complete success.
Server Area Preparation
The health of your server farm depends on the quality of your physical infrastructure. A poorly prepared area can cause unacceptable delays and even a complete site shutdown. Follow these steps to reduce your risk.
Your servers and disk storage will consume the most power in your environment. Do an audit of their power consumption to determine how much you will require in the new location. Get a total figure and an estimate of what you expect to consume per rack or cabinet. Verify that the servers in each cabinet or rack won't overload the power circuits supplying them. Some power hungry devices may need unusual voltages or electrical connectors, so double check this information ahead of time.
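A per-rack power check is easy to script once you have the audit figures. The Python sketch below assumes a single 208 V, 20 A circuit per rack and holds the continuous load to 80% of the breaker rating, a common North American practice; the device wattages, voltage and breaker size are all hypothetical, so substitute your own figures and the facility's actual limits.

```python
from __future__ import annotations

# Hypothetical audit results: watts drawn by each device, grouped by rack.
RACKS = {
    "1-11-4": {"web01": 350, "web02": 350, "db01": 900, "san01": 1800},
    "1-11-5": {"web03": 350, "app01": 500},
}


def rack_power_report(racks: dict[str, dict[str, float]],
                      volts: float = 208.0,
                      breaker_amps: float = 20.0,
                      continuous_fraction: float = 0.8) -> dict[str, tuple[float, bool]]:
    """Return each rack's total draw in watts and whether it fits within
    the continuous-load budget of one circuit (volts x amps x fraction)."""
    budget_watts = volts * breaker_amps * continuous_fraction  # 3328 W here
    report = {}
    for rack, devices in racks.items():
        total = sum(devices.values())
        report[rack] = (total, total <= budget_watts)
    return report


for rack, (watts, fits) in rack_power_report(RACKS).items():
    print(rack, watts, "OK" if fits else "OVER BUDGET")
# 1-11-4 3400 OVER BUDGET
# 1-11-5 850 OK
```

Where servers have dual power supplies on redundant circuits, each circuit on its own should still pass this check, since it must carry the whole rack if the other one fails.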
Make sure each rack or cabinet can receive power from redundant PDUs and that there is adequate excess capacity on the PDUs to support not only your server farm but also the failure of one of the PDU units.
Have the facility's management prove that you are getting UPS protected power in your area.
It can be very frustrating to arrive at the new facility and discover that the server power cords used at the old location are too short for the racks in the new one. The problem usually occurs when converting from fixed racks to ones in which the servers are mounted on rails. Rails allow you to slide the servers out into the aisles for better access, but they require cables long enough to reach when a server is extended. Verify that you have sufficient quantities of cables of adequate length.
Verify that the area has an adequate number of CRAC units to cover your anticipated power load. The rule of thumb is that each watt of power consumed by a server requires a watt of cooling. Once the migration is completed you'll have to test air temperatures and humidity to ensure they meet the requirements of your equipment.
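CRAC capacities are often quoted in BTU per hour or in tons of refrigeration rather than watts. A small Python sketch applying the watt-for-watt rule of thumb to a hypothetical 35 kW server farm, using the standard conversions of about 3.412 BTU/hr per watt and 12,000 BTU/hr per ton:

```python
from __future__ import annotations

WATTS_TO_BTU_PER_HR = 3.412   # 1 W of heat is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000   # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_required(it_load_watts: float) -> tuple[float, float]:
    """Apply the watt-for-watt rule of thumb and return the cooling needed
    in BTU/hr and in tons of refrigeration."""
    btu_per_hr = it_load_watts * WATTS_TO_BTU_PER_HR
    return btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


btu, tons = cooling_required(35_000)  # hypothetical 35 kW server farm
print(f"{btu:,.0f} BTU/hr = {tons:.1f} tons")  # 119,420 BTU/hr = 10.0 tons
```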
With raised computer room flooring, hot air is extracted from the room, chilled, and then returned to the server area under the floor, blowing up into the cabinets through vents in the floor. The space under the tiles should therefore be clean and generally clear of obstructions such as cabling and ducting. If possible, baffles can be placed under each CRAC unit to guide the air flow in the direction of the cabinets it needs to cool. In some cases the floor under the cooling zone of the unit may need to be sealed off to force the air only to the required cabinets.
You may also find yourself in a situation where the overall cooling requirements of the server farm are within the specification of your combined CRAC units but certain concentrations of servers within the farm could overtax the capacity of individual units. Plan to spread these high power density racks across the server floor to help balance the load across all CRAC units.
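One way to get a first draft of such a layout is the greedy "largest first" heuristic: place the most power-hungry racks first, each in the cooling zone with the least load so far. A minimal Python sketch, assuming each rack's load in kW and a list of CRAC zones are known (all names and figures are hypothetical). It is not guaranteed to find the best possible split, and the result is a starting point to check against the facility's actual airflow, not a final floor plan.

```python
from __future__ import annotations

import heapq


def balance_racks(rack_loads: dict[str, float], zones: list[str]) -> dict[str, list[str]]:
    """Greedy 'largest first' assignment: racks are placed heaviest first,
    each into the cooling zone with the smallest total load so far."""
    heap = [(0.0, zone) for zone in zones]  # (current load in kW, zone name)
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {zone: [] for zone in zones}
    for rack, load in sorted(rack_loads.items(), key=lambda item: item[1], reverse=True):
        zone_load, zone = heapq.heappop(heap)  # least-loaded zone
        assignment[zone].append(rack)
        heapq.heappush(heap, (zone_load + load, zone))
    return assignment


# Hypothetical rack loads in kW; r1 and r2 are the high power density racks.
racks = {"r1": 8.0, "r2": 7.5, "r3": 3.0, "r4": 2.5, "r5": 2.0}
print(balance_racks(racks, ["crac-1", "crac-2"]))
# {'crac-1': ['r1', 'r4', 'r5'], 'crac-2': ['r2', 'r3']}
```

The two densest racks, r1 and r2, end up in different cooling zones.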
This is very important. Ensure the power outlets are labeled with the PDU and circuit breaker number. This is especially important for systems with dual power supplies that should be plugged into separate power sources. Make sure power cables are labeled with the name of the server at both ends too.
Make provisions to have all servers labeled on the front and the back to reduce the risk of incorrect cabling and of power cycling the wrong server during a hard reboot after an unexpected server failure. Also make sure that all network cables are labeled at both ends.
How do you start to number? Numbering schemes for cabinets and racks are usually straightforward. Split up the server area into zones serviced by the same patch panels or switches. Each zone will have a number of rows of racks and/or cabinets. A location number such as 1-11-4 could mean zone 1, row 11, cabinet 4. You should also label patch panels in a similar way, so that 1-11-4 p7-2 would refer to the 2nd port of the 7th patch panel in cabinet 1-11-4.
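Because the scheme is regular, labels can be generated and checked by script before they are printed. A minimal Python sketch of a parser for labels in the format described above; the regular expression, class and function names are hypothetical choices for illustration.

```python
import re
from typing import NamedTuple, Optional

# zone-row-cabinet, optionally followed by " p<panel>-<port>"
LABEL_RE = re.compile(r"^(\d+)-(\d+)-(\d+)(?: p(\d+)-(\d+))?$")


class Location(NamedTuple):
    zone: int
    row: int
    cabinet: int
    panel: Optional[int] = None  # patch panel number within the cabinet
    port: Optional[int] = None   # port number on that patch panel

    def label(self) -> str:
        base = f"{self.zone}-{self.row}-{self.cabinet}"
        if self.panel is None:
            return base
        return f"{base} p{self.panel}-{self.port}"


def parse_label(text: str) -> Location:
    match = LABEL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid location label: {text!r}")
    zone, row, cabinet, panel, port = match.groups()
    return Location(int(zone), int(row), int(cabinet),
                    int(panel) if panel is not None else None,
                    int(port) if port is not None else None)


loc = parse_label("1-11-4 p7-2")
print(loc.zone, loc.row, loc.cabinet, loc.panel, loc.port)  # 1 11 4 7 2
print(loc.label())  # 1-11-4 p7-2
```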
Create diagrams that map the precise layout of servers in the racks, and pre-install shelving and rack kits at the expected locations. It is a good idea to put the heavier equipment near the bottom, as this will make it easier to insert and remove. Also make room for monitors and their KVM (keyboard, video, mouse) switches.
Video monitors on carts should also be available at the new facility for troubleshooting servers that need to be removed from racks or cabinets.
Rack Usage and Orientation
Install the servers in the same direction in the racks. This will make all power and network cabling connections reside neatly on one side of the racks.
Servers on opposite sides of an aisle should either face each other or be back to back. This creates a better cooling environment, as the hot power supply exhaust of one server won't be drawn into the front air intake of the server behind it. CRAC units extract air through filters on the top of the unit and blow chilled air through vents at the bottom. For this reason, CRAC units should be placed in line with the hot aisles so that the air can easily be extracted from them. When regular flooring is used, you may require ducting to blow the chilled air into the cold aisle. With raised floors, the CRAC unit vents are physically below floor level, blowing air up into the server cabinets. In this case the baffles and sealed floor techniques mentioned earlier in this chapter help channel the air flow better. Sometimes with raised flooring, the air blown up through the cabinets is insufficient to cool the servers, and perforated floor tiles need to be placed in the cold aisles for added cooling. Remember that perforated tiles in hot aisles are counterproductive, as they deliver cool air that the servers never use.
As expected, the servers should be stacked vertically. When cabinets are used you should insert unperforated blanking panels in any spaces between the servers to better channel the cooling air from the front of the cabinet to the hot aisle in the back. Without the blanking panels, the usual swirling vortices of exhaust air can easily be blown back to the front of the rack through the spaces.
Server cabinets come in a variety of sizes, the most common being designed for 19 inch wide rack-mounted equipment. Sometimes the walls between adjacent cabinets are removed to facilitate cabling. This can disrupt the correct channeling of the cooling airflow through the servers and can usually be avoided through better patch panel layouts ahead of moving time.
Remember to make the aisles wide enough to allow people to easily mount and dismount servers in them. Finally, some types of equipment may be too heavy for regular server cabinets and will require the use of racks as an alternative source of support. This equipment will need to be identified and located accordingly.
Determine your network cabling requirements based on your server layout. You may have to install patch panels to connect the server racks and cabinets to those containing your network equipment. You would then connect your server to a patch panel port in its rack with a standard network cable. This port is in turn connected to an equivalent port on a patch panel in the network rack. By using another standard length cable you can extend the connection to your network gear from the network rack patch panel. You may have to plan for the purchase and installation of such a system.
Remember to consider the use of both copper and fiber connections. Copper Ethernet cable used for 100 Mbps communication can be no longer than 100 m in length. Make sure that the combined length of each connection through your patch panel system does not exceed this limit. Multimode fiber has a maximum distance of 2 km when running at 100 Mbps, and between 220 m and 550 m, depending on the fiber type, when running at 1 Gbps.
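Because the 100 m copper limit applies to the whole path rather than to each patch cable, it helps to add up every segment of a planned server-to-switch run. The sketch below does exactly that; the segment lengths are hypothetical, and it assumes the 100 m figure above applies to your cabling category.

```python
# Minimal sketch: check that a planned copper path through the patch panel
# system stays within the 100 m limit. Segment lengths are hypothetical.
COPPER_LIMIT_M = 100.0


def copper_path_ok(segments_m: list[float], limit_m: float = COPPER_LIMIT_M) -> bool:
    """Return True if the combined length of all segments is within the limit."""
    return sum(segments_m) <= limit_m


# server -> rack patch panel, panel-to-panel trunk, network patch panel -> switch
planned_path = [3.0, 85.0, 5.0]
print(sum(planned_path), copper_path_ok(planned_path))  # 93.0 True
```

Leave yourself some margin below the limit, since installed cable often runs longer than the floor plan suggests.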
Glass fiber cables for servers are delicate in comparison to copper. Wherever possible, ensure that they are run in separate cable trays to help prevent possible damage. Also make sure that the power cables run in separate trays or conduits from the data cables to reduce the risk of damage and electrical interference. To further reduce the risk of damage, cables shouldn't hang in the air or be stretched taut.
Bundle data cables together with Velcro rather than plastic tie wraps to make it easier to add wiring to the bundle later. The bundles should be run to the sides of racks and cabinets so as not to impede airflow.
Make sure each person responsible for racking the servers has a correct set of tools. The most noticeable time-saving tools will be electric screwdrivers. Have plenty of them, along with lots of charged spare batteries.
Verify beforehand that each person who is going to have access to the server area has key access and parking rights at the data center.
Ensure that your Internet connectivity to the area has been secured. Verify that the network links have been installed and tested prior to your migration date. Some sites require T1 data circuit links to credit card facilities or VPNs to remote offices. Make sure these are in place and tested before the move too.
The entire relocation depends on the proper preparation of the server area, but fortunately you can save time by preparing for other aspects of the move simultaneously. These will be explored next.
One of the most obvious reasons for having redundant network hardware is to help protect against hardware failure causing your Internet connectivity to fail. Another equally important reason is to help in server farm relocations.
Redundancy allows you to shut down network equipment, move it to the new location, and preconfigure it in anticipation of the server migrations. That isn't all; there are more preparations that need to be done.
As mentioned previously, do a complete inventory of all your networking equipment. Create a comprehensive list of all the important networking information that will change as a result of the move.
Ensure you have a complete set of network diagrams that include each server and network device that will be relocated. They should include every IP address, switch port, gateway, route, and ISP circuit number.
Have separate drawings that clearly show how the network cables plug into the switches from each server. This will help illustrate whether too many of your servers are vulnerable to the failure of a single network device. Servers that play similar roles, such as database servers, should be directly connected to different switches.
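Once the cabling drawings exist, the "different switches" rule can be checked mechanically. This is a minimal sketch that assumes you have transcribed the drawings into a simple mapping of server to (role, switch); the server, role, and switch names are hypothetical.

```python
from collections import Counter

# Hypothetical cabling drawing: server -> (role, switch it is cabled to).
cabling = {
    "db1": ("database", "sw-a"),
    "db2": ("database", "sw-a"),
    "web1": ("web", "sw-a"),
    "web2": ("web", "sw-b"),
}


def shared_switches(records: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (role, switch) pairs where more than one server of a role shares a switch."""
    counts = Counter(records.values())
    return sorted(pair for pair, n in counts.items() if n > 1)


print(shared_switches(cabling))  # [('database', 'sw-a')]
```

Each pair reported is a switch whose failure would take down more than one server performing the same role.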
Set up your new network equipment at the target data center and test connectivity ahead of time. Connectivity should include tests from the Internet and practice servers at the new location. Make sure your routing, access control lists, VPN tunnels and firewall rules all take the IP addressing scheme you will be using at the new location into account.
Special attention should be given to network monitoring. Verify that you'll be able to switch monitoring from your old server address to the new one seamlessly.
Keep close track of the provisioning of data circuits for the new location so that they are installed prior to the migration date. These circuits should not only be sized to handle your expected data transfer rates but also be tested at various times of the day to ensure your ISP has met its contractual commitments. Some types of equipment require modem lines to provide out-of-band technician access in an emergency. This would require the installation of one or more additional POTS telephone lines.
Make sure that network cables are all re-labeled to reduce the risk of human error when the servers are reconnected to their new network.
Many managed networks have centralized error logging and authentication servers. Make sure your relocated network devices can continue to reach them.
Create copies of all your network configurations, both old and new prior to the relocation. The old ones will be helpful if you have to quickly roll back the work to the original data center. The new ones will help protect against hardware failure in your new facility.
Some corporate offices use VPNs to gain access to their Web server farm. If possible, terminate some test VPN tunnels on the network equipment at the new location ahead of time. Once the migration is complete you'll have to plan for recreating a redundant network architecture in the new location. This will be covered in Chapter 3, "Post Relocation Activities".
Server preparation for the migration is probably the most complicated task, because there are usually many servers, each running multiple applications that rely on the functioning of varying components. Follow these simple steps to make the job easier.
Do a complete inventory of all your servers. Create a comprehensive list of all important server information that will change as a result of the move. This can be recorded in a simple spreadsheet and would include IP addresses, subnet masks, routing gateways, backup server IP addresses, the switch ports the servers will use and server rack locations. It should also include information such as the server's name and serial number for inventory purposes.
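If you would rather generate the inventory spreadsheet than type it, a CSV file opens in any spreadsheet program. The sketch below is only an illustration; the column names, the sample server, and its addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ServerRecord:
    """One row of the relocation inventory (field names are illustrative)."""
    name: str
    serial: str
    old_ip: str
    new_ip: str
    netmask: str
    gateway: str
    backup_server_ip: str
    switch_port: str
    rack_location: str


def inventory_csv(records: list[ServerRecord]) -> str:
    """Render the inventory as CSV text that any spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ServerRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


rows = [ServerRecord("web1", "SN12345", "192.0.2.10", "198.51.100.10",
                     "255.255.255.0", "198.51.100.1", "198.51.100.5",
                     "sw-a/3", "R2-U12")]
print(inventory_csv(rows))
```

Keeping the data in one structured place makes it easy to print the per-server worksheets from the same source.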
Each server should also have its own separate worksheet document that contains all its relocation information. This would be attached to the server so that the engineers working on it can quickly reconfigure it when it arrives at the new location. Samples of both documents are available in Appendix I, "Relocation Check Sheets".
It may seem tedious, but get a printout of all the TCP/IP ports on which the server is listening and of the clients that have active connections to it. This can be done with the netstat -a command in most operating systems, including Windows. This will help you identify the applications that should be running before and after the relocation, and it can be used as a quick check to detect any unexpected failures. It will also help you restrict TCP/IP access between servers on your new networks more precisely. Finally, it will help identify application dependencies between servers, which can be used to determine which servers should be relocated together as part of the same group.
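Saved netstat output can be compared automatically after the move. This minimal sketch assumes Linux-style output captured with netstat -an, so that ports appear as numbers rather than service names; the column layout differs on other operating systems, and the sample output is made up.

```python
# Minimal sketch: compare listening TCP ports captured before and after the move.
# Assumes Linux-style output saved from "netstat -an" (numeric ports); the
# column layout differs on other operating systems.


def listening_tcp_ports(netstat_text: str) -> set[str]:
    """Return the local ports of all TCP sockets in the LISTEN state."""
    ports = set()
    for line in netstat_text.splitlines():
        cols = line.split()
        if len(cols) >= 6 and cols[0].startswith("tcp") and cols[-1] == "LISTEN":
            ports.add(cols[3].rsplit(":", 1)[1])  # local address is column 4
    return ports


BEFORE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:22           198.51.100.7:51514      ESTABLISHED
"""
AFTER = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
"""

missing = sorted(listening_tcp_ports(BEFORE) - listening_tcp_ports(AFTER), key=int)
print("Not listening after the move:", missing)  # Not listening after the move: ['3306']
```

Any port in the result points to an application that did not come back up after the relocation.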
Most operating systems can also give you a snapshot of services or applications that should be running on startup. Get printouts of these for each server as part of the server's more comprehensive post migration system check.
Note the routing tables of all servers before the migration using the netstat -nr command. Determine what the new routes should look like at the new location and write them down. This is especially important for the default gateways and for servers that have multiple NICs or act as routers.
Note: A server should have only one default gateway. In Linux and UNIX systems there is only a single place to enter this value. Windows servers provide the option of having a default gateway per NIC. In this case, make sure that only one NIC has a default gateway configured.
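The one-default-gateway rule is easy to check from saved routing tables. This sketch assumes the Linux-style netstat -nr column layout (Destination, Gateway, Genmask, Flags, MSS, Window, irtt, Iface); other systems print different columns, and the sample table is hypothetical.

```python
# Minimal sketch: flag servers with more than one default gateway.
# Assumes Linux-style "netstat -nr" output; other systems use different columns.


def default_gateways(route_text: str) -> list[tuple[str, str]]:
    """Return (gateway, interface) for every default route in the table."""
    routes = []
    for line in route_text.splitlines():
        cols = line.split()
        # Default route: destination and genmask 0.0.0.0 with the G (gateway) flag.
        if len(cols) >= 8 and cols[0] == "0.0.0.0" and cols[2] == "0.0.0.0" and "G" in cols[3]:
            routes.append((cols[1], cols[-1]))
    return routes


SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.1       0.0.0.0         UG        0 0          0 eth0
0.0.0.0         198.51.100.1    0.0.0.0         UG        0 0          0 eth1
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 eth0
198.51.100.0    0.0.0.0         255.255.255.0   U         0 0          0 eth1
"""

gateways = default_gateways(SAMPLE)
if len(gateways) != 1:
    print("Expected exactly one default gateway, found:", gateways)
```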
Archive all your server data. Make sure that you have a data restoration unit at the remote location that will be able to restore your data from your backup media using the same software. The recent advent of portable external hard disks using USB 2.0 connections could simplify smaller backup and restoration work.
Large databases are often stored on storage area network (SAN) and network attached storage (NAS) devices and are too large for the USB solution. In these cases, restoring data from tape can take excessively long, which makes it an option of last resort. With SAN and NAS databases you may need to lease a duplicate device and clone your data to it. Disaster recovery can then be much faster, as the secondary device will be preconfigured to replace the failed primary one.
Another option is to restore the data ahead of time on the new NAS / SAN equipment at the new facility. You can then set up a data circuit between the old and new data centers so that any new transactions are replicated between the two. This way, the databases at the two locations stay synchronized.
Always remember to do practice data backups and restorations for key servers and applications.
RAID BIOS Settings
An often ignored item is the server's BIOS settings. The regular parameters are easy to determine, as the defaults are usually sufficient. The real problem is the BIOS metadata on hardware RAID cards. This metadata lists all the drives in each RAID set, the order in which they are accessed within the set, and the type of RAID being used. This often cannot be guessed. Schedule a server reboot before the relocation and enter the RAID controller's BIOS setup to record this information. Without this simple precaution, a jolt in transit that loosens a RAID card's onboard backup battery could cost you hours of downtime.
If possible, have a set of spare servers that can be used for spare parts or complete system replacements in case of failure.
In many cases applications on servers have to be started or shut down in a particular sequence. Sometimes co-dependent servers have to be booted in a special order. You will need to document these special procedures wherever they exist and make note of them in your project plan. They may also influence the order in which the servers are relocated to the new data center.
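Once the dependencies are documented, a valid boot order is a topological sort of the dependency graph, and the shutdown order is simply its reverse. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the server names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: server -> servers that must already be running.
depends_on = {
    "web1": {"app1"},
    "web2": {"app1"},
    "app1": {"db1", "ldap1"},
    "db1": set(),
    "ldap1": set(),
}

# static_order() lists every server after all of its dependencies and raises
# graphlib.CycleError if the documented dependencies are circular.
boot_order = list(TopologicalSorter(depends_on).static_order())
shutdown_order = boot_order[::-1]  # stop dependents before what they depend on

print("Boot:", boot_order)
print("Shutdown:", shutdown_order)
```

A CycleError is itself useful: it means the documented procedures contradict each other and need to be reviewed before moving day.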
A careful audit can help determine the number of days over which the migration should be spread and the sequence of the server moves. This will require you to account for each application within your environment, including its interdependencies with every other application, its software interfaces, the firewall rules that protect it, and any batch or cron jobs it relies on. The audit will also help determine the groupings of systems that should move together.
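One way to derive the "move together" groups from the audit is to treat each dependency as a link between two servers and find the connected components of the resulting graph. This minimal sketch uses a union-find structure; the server names and links are hypothetical, and in practice the links might come from the netstat connection listings gathered earlier.

```python
# Minimal sketch: group servers into "move together" sets from observed
# application dependencies. Server names and links are hypothetical.


def move_groups(servers: list[str], links: list[tuple[str, str]]) -> list[list[str]]:
    """Return the connected components of the dependency graph (union-find)."""
    parent = {s: s for s in servers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: dict[str, list[str]] = {}
    for s in servers:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


servers = ["web1", "app1", "db1", "mail1", "dns1", "backup1"]
links = [("web1", "app1"), ("app1", "db1"), ("mail1", "dns1")]
print(move_groups(servers, links))
# [['app1', 'db1', 'web1'], ['backup1'], ['dns1', 'mail1']]
```

If a group turns out too large to move in one window, the links joining it are exactly the dependencies you will need to bridge temporarily between the two data centers.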
Base Equipment List
Create a minimum list of servers that absolutely have to be up and running in order to maintain the web farm's functionality. It will help to focus the minds of the team members in the event of multiple failures.
Create a short list of tests for each server that will be used to verify that it is functioning correctly. Don't limit this to just network connectivity, but also check that all the required applications have started correctly and that some simple business operations can be successfully completed.
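The network part of these tests can be scripted. This sketch only checks that a TCP connection can be opened to each expected port, so it complements rather than replaces the application and business-operation checks; the function names are hypothetical.

```python
import socket

# Minimal sketch of a connectivity test: can a TCP connection be opened to each
# (host, port) a server is expected to serve? This only proves a listener is up;
# it does not replace application-level checks such as a test transaction.


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def failed_checks(checks: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that did not accept a connection."""
    return [(host, port) for host, port in checks if not port_open(host, port)]
```

For example, failed_checks([("www.my-web-site.org", 80), ("mail.my-web-site.org", 25)]) returns an empty list when both services accept connections.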
Application Code Surveys
Check that your applications use DNS names, rather than IP addresses, to access information on remote servers. The relocation may force you to change the IP addresses of devices, which could cause some programs to fail unless this precaution has been taken.
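A quick scan of source and configuration files can point you to hard-coded addresses. This minimal sketch handles IPv4 only and skips loopback addresses, because those will not change with the move. Version strings such as 1.2.3.4 can produce false positives, so review each hit by hand. The sample configuration is hypothetical.

```python
import ipaddress
import re

# Minimal sketch: report IPv4 literals in source or configuration text so they
# can be replaced with DNS names. Loopback addresses are skipped because they
# will not change with the relocation. Only IPv4 is handled here.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hardcoded_ipv4(text: str) -> list[tuple[int, str]]:
    """Return (line number, address) for each valid non-loopback IPv4 literal."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.finditer(line):
            try:
                addr = ipaddress.IPv4Address(match.group())
            except ValueError:  # e.g. 999.1.1.1 looks like an address but is not
                continue
            if not addr.is_loopback:
                hits.append((lineno, str(addr)))
    return hits


CONFIG = """\
db_host = 192.0.2.25
cache_host = cache.my-web-site.org
listen = 127.0.0.1
"""
print(hardcoded_ipv4(CONFIG))  # [(1, '192.0.2.25')]
```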
The general steps required to prepare your servers for the migration are not hard but the process can become difficult due to the sheer volume of information you need to track. Plan well and you should be able to have a successful project.
If the relocation requires the IP addresses of your site or servers to change then you'll have to make plans to adjust your DNS settings during the relocation.
There are two things to remember with DNS. The first is that a DNS change can take up to 48 hours, or longer when high TTL values are in use, to propagate across the Internet. The second is that every DNS entry has an associated time to live (TTL) value, which defines how long caching DNS servers should store the entry for local use before they must query the entry's authoritative DNS server again to see whether there have been any changes. With this in mind, here is what needs to be done:
Set The TTL
There is no magic bullet that will allow you to tell all the caching DNS servers in the world to simultaneously flush their caches of your zone file entries. Your best alternative is to ask your existing service provider to set the TTL on your web site, for example www.my-web-site.org, in the DNS zone file to a very low value, say one minute. As the old TTL is usually set to a number of days, it can take that long, often three to five days, for all remote DNS servers to recognize the change. Once the propagation is complete, it will take only about one minute to see the results of the final DNS configuration switch to your new server. If anything goes wrong, you can then revert to the old configuration, knowing it will recover within minutes rather than days.
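The timing works out as simple arithmetic: a cache that fetched the record just before you lowered the TTL may keep it, with the old TTL, for one full old-TTL period. The sketch below computes the earliest safe switch time under that assumption; the date and the three-day old TTL are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_switch(lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Caches may hold the old record (and its old TTL) for up to old_ttl_seconds
    after the TTL was lowered, so switch no earlier than this."""
    return lowered_at + timedelta(seconds=old_ttl_seconds)


lowered = datetime(2024, 3, 1, 9, 0)  # hypothetical date the TTL was lowered
old_ttl = 3 * 24 * 3600               # hypothetical old TTL: 3 days
new_ttl = 60                          # the one-minute TTL suggested above

switch_at = earliest_safe_switch(lowered, old_ttl)
print(switch_at)                      # 2024-03-04 09:00:00
print(timedelta(seconds=new_ttl))     # 0:01:00 (worst-case delay after the switch)
```

Adding a margin of a day or so on top of the computed time protects against caches that do not honor TTLs exactly.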
Server Based Testing
Set up your test server in house. Edit its /etc/hosts file so that www.my-web-site.org refers to the test server's own IP address, not that of the www.my-web-site.org site that is currently in production. This file is usually given a higher priority than DNS, so the test server will behave as if www.my-web-site.org is really hosted on itself. You may also want to add an entry for mail.my-web-site.org if the new Web server is also going to be your new mail server.
Test your server based applications from the server itself. This should include mail, Web, and so on.
Client Based Testing
Test the server from a remote client. You can test the server running as www.my-web-site.org even though DNS hasn't been updated. Just edit the /etc/hosts file on the Linux PC you use for Web browsing so that www.my-web-site.org maps to the IP address of the new server. On Windows, the file is C:\WINDOWS\system32\drivers\etc\hosts. You may also want to add an entry for mail.my-web-site.org if the new Web server is also going to be your new mail server. Your client will usually consult these files before checking DNS, so you can use them to predefine some DNS lookups at the local client level only.
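Before testing, it is worth confirming which address the hosts file actually gives each name, because a stale or misspelled entry silently sends you back to production. This sketch parses hosts-file text in the common format (address followed by one or more names, with # comments) and assumes the first matching line is the one used, which is the usual behavior; the address shown is a hypothetical documentation address.

```python
from typing import Optional


def hosts_lookup(hosts_text: str, name: str) -> Optional[str]:
    """Return the address of the first hosts-file line listing name, else None."""
    wanted = name.lower()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if wanted in (n.lower() for n in names):
            return address
    return None


HOSTS = """\
127.0.0.1     localhost
# New server under test (hypothetical address)
192.0.2.50    www.my-web-site.org mail.my-web-site.org
"""
print(hosts_lookup(HOSTS, "www.my-web-site.org"))  # 192.0.2.50
print(hosts_lookup(HOSTS, "ftp.my-web-site.org"))  # None
```

The same format applies to the Windows hosts file, so the check works on either client.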
Check All Domains
Make sure similar steps are taken for all your DNS domains. Remember to also update the DNS entries for your mail servers, including their MX records; they are generally located in a different section of the DNS zone file and can easily be overlooked.
Prepare to Switch
Once testing is completed, coordinate with your Web hosting provider to update the DNS records for www.my-web-site.org so that they point to your new Web server at the time of the relocation.
Plan to change your DNS TTL at least a week before the expected migration so that it doesn't put the success of your project at risk. DNS management is probably the easiest task to accomplish, but poor DNS planning can unexpectedly delay your project, leaving you no recourse but to sit and wait for the changes to propagate.
A relocation cannot succeed without adequate transportation, so it should be planned well. Here are some factors to consider.
Moving Company Selection
Get multiple quotations from movers, preferably with each provider giving a guaranteed maximum price for the job. Cost shouldn't be the only factor. Investigate the reputation, staff training, moving van cleanliness and safety, performance record, reliability, and the claims settlement customer service of each moving company. Visit the mover's office to verify that it is an established business. Determine whether the company belongs to a trade organization that requires a code of ethics and operation. Carefully consider whether the staff are people you want to do business with.
Whether you rent a vehicle or use movers, get guarantees that the transportation will arrive on time.
For large quantities of servers you'll need to have racks or shelving preinstalled to accommodate as many servers as possible. You should also ensure that the servers are securely fastened to prevent shifting in transit. Do not stack servers one on top of the other as this increases the risk of damage.
Some of the more delicate devices may have to be specially wrapped for their protection in bubble wrap or foam. Some may have to be bolted to shipping pallets and mechanically moved. Finally, some vendors with full coverage maintenance contracts may stipulate that their staff be the only persons authorized to package the equipment. Make adequate preparations in advance.
Servers can be heavy. Get access to hand carts or wheeled dollies on which the servers can be manually pushed within the buildings. If practical, rent ramps to reduce the need to manually transfer servers at the various stages of transportation along the way.
You may have to insure the equipment prior to the relocation. Check to make sure the selected moving company carries the required insurance coverage. If they break or lose something, you should request that it be fixed or replaced to the limits of their liability. There can be many clauses to this type of coverage, so make sure the mover clearly explains the extent of your exposure.
Transportation is often given the least amount of thought, and servers will inevitably end up on the back seats of cars. Avoid this as much as possible. Renting a truck or using professional movers will be much faster, easier to track, less prone to equipment damage, and easier to insure. You may be tempted to save money here, but the few dollars spent on proper transportation can save thousands in potential downtime.
The preparatory tasks for a server farm relocation can be complex, but with the right tools and planning they can be very manageable. Sample checklists and post-mortem forms are available in Appendix I, "Relocation Check Sheets". Chapter 3, "Post Relocation Activities", will begin by discussing what needs to be done during the relocation and will end with a number of activities that need to be completed once the project appears to be over. Most importantly, it specifically outlines what to do if things start to go wrong.