How-To Tutorials - Cloud Computing


Monitoring Physical Network Bandwidth Using OpenStack Ceilometer

Packt
14 Oct 2015
9 min read
In this article by Chandan Dutta Chowdhury and Sriram Subramanian, authors of the book OpenStack Networking Cookbook, we will explore various means to monitor network resource utilization using Ceilometer.

Introduction

Due to the dynamic nature of virtual infrastructure and multitenancy, OpenStack administrators need to monitor the resources used by tenants. The resource utilization data can be used to bill the users of a public cloud and to debug infrastructure-related problems. Administrators can also use the utilization reports for better capacity planning. In this post, we will look at OpenStack Ceilometer to meter the physical network resource utilization.

Ceilometer components

The OpenStack Ceilometer project provides you with telemetry services. It can collect the statistics of resource utilization and also provide threshold monitoring and alarm services. The Ceilometer project is composed of the following components:

- ceilometer-agent-central: This component is a centralized agent that collects the monitoring statistics by polling individual resources.
- ceilometer-agent-notification: This agent runs on a central server and consumes notification events on the OpenStack message bus.
- ceilometer-collector: This component is responsible for dispatching events to the data stores.
- ceilometer-api: This component provides the REST API server that the Ceilometer clients interact with.
- ceilometer-agent-compute: This agent runs on each compute node and collects resource utilization statistics for the local node.
- ceilometer-alarm-notifier: This component runs on the central server and is responsible for sending alarms.
- ceilometer-alarm-evaluator: This component monitors the resource utilization and evaluates the conditions to raise an alarm when the resource utilization crosses predefined thresholds.

Installation

An installation of any typical OpenStack service consists of the following steps:

1. Creating a catalog entry in Keystone for the service
2. Installing the service packages that provide you with a REST API server to access the service

Creating a catalog entry

The following steps describe the creation of a catalog entry for the Ceilometer service:

1. An OpenStack service requires a user associated with it for authorization and authentication. To create a service user, use the following command and provide the password when prompted:

   openstack user create --password-prompt ceilometer

2. This service user will need to have administrative privileges. Use the following command in order to add an administrative role to the service user:

   openstack role add --project service --user ceilometer admin

3. Next, we will create a service record for the metering service:

   openstack service create --name ceilometer --description "Telemetry" metering

4. The final step in the creation of the service catalog is to associate this service with its API endpoints. These endpoints are REST URLs to access the OpenStack service and provide access for public, administrative, and internal use:

   openstack endpoint create --publicurl http://controller:8777 --internalurl http://controller:8777 --adminurl http://controller:8777 --region RegionOne metering
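Before installing the packages, you can confirm that the catalog entry was registered correctly. This quick check is a suggestion rather than part of the original procedure; it assumes you have admin credentials loaded in your shell:

   openstack service list | grep metering
   openstack endpoint list | grep 8777

Both commands should show the Telemetry service and the three endpoints pointing at http://controller:8777.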
Installation of the service packages

Let's now install the packages required to provide the metering service. On an Ubuntu system, use the following command to install the Ceilometer packages:

   apt-get install ceilometer-api ceilometer-collector ceilometer-agent-central ceilometer-agent-notification ceilometer-alarm-evaluator ceilometer-alarm-notifier python-ceilometerclient

Configuration

The configuration of an OpenStack service requires the following information:

- Service database-related details, such as the database server address, database name, and login credentials in order to access the database. Ceilometer uses MongoDB to store the resource monitoring data. A MongoDB database user with appropriate access for this service should be created. Conventionally, the user and database name match the name of the OpenStack service; for example, for the metering service, we will use ceilometer as both the database user and the database name.
- A Keystone user associated with the service (the one we created in the installation section).
- The credentials to connect to the OpenStack message bus, such as RabbitMQ.

All these details need to be provided in the service configuration file. The main configuration file for Ceilometer is /etc/ceilometer/ceilometer.conf. The following is a sample template for the Ceilometer configuration. The placeholders mentioned here must be replaced appropriately:

- CEILOMETER_PASS: This is the password for the Ceilometer service user
- RABBIT_PASS: This is the message queue password for an OpenStack user
- CEILOMETER_DBPASS: This is the MongoDB password for the Ceilometer database
- TELEMETRY_SECRET: This is the secret key used by Ceilometer to sign messages

Let's have a look at the following configuration:

   [DEFAULT]
   ...
   auth_strategy = keystone
   rpc_backend = rabbit

   [keystone_authtoken]
   ...
   auth_uri = http://controller:5000/v2.0
   identity_uri = http://controller:35357
   admin_tenant_name = service
   admin_user = ceilometer
   admin_password = CEILOMETER_PASS

   [oslo_messaging_rabbit]
   ...
   rabbit_host = controller
   rabbit_userid = openstack
   rabbit_password = RABBIT_PASS

   [database]
   ...
   connection = mongodb://ceilometer:CEILOMETER_DBPASS@controller:27017/ceilometer

   [service_credentials]
   ...
   os_auth_url = http://controller:5000/v2.0
   os_username = ceilometer
   os_tenant_name = service
   os_password = CEILOMETER_PASS
   os_endpoint_type = internalURL
   os_region_name = RegionOne

   [publisher]
   ...
   telemetry_secret = TELEMETRY_SECRET

Starting the metering service

Once the configuration is done, make sure that all the Ceilometer components are restarted in order to use the updated configuration:

   # service ceilometer-agent-central restart
   # service ceilometer-agent-notification restart
   # service ceilometer-api restart
   # service ceilometer-collector restart
   # service ceilometer-alarm-evaluator restart
   # service ceilometer-alarm-notifier restart
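At this point it is worth a quick sanity check that the API server is up and talking to its database. This check is not part of the original text and assumes you have admin credentials sourced in your shell; the meter list may well be empty on a fresh installation:

   ceilometer meter-list

If the command returns a (possibly empty) table rather than an authentication or connection error, the metering service is running.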
The Ceilometer compute agent must be installed and configured on the Compute node to monitor the OpenStack virtual resources. In this article, we will focus on using the SNMP daemon running on the Compute and Network nodes to monitor the physical resource utilization data.

Physical network monitoring

The SNMP daemon must be installed and configured on all the OpenStack Compute and Network nodes for which we intend to monitor the physical network bandwidth. The Ceilometer central agent polls the SNMP daemon on the OpenStack Compute and Network nodes to collect the physical resource utilization data.

The SNMP configuration

The following steps describe the process of installing and configuring the SNMP daemon:

1. To configure the SNMP daemon on the OpenStack nodes, you will need to install the following packages:

   apt-get install snmpd snmp

2. Update the SNMP configuration file, /etc/snmp/snmpd.conf, to bind the daemon on all the network interfaces:

   agentAddress udp:161

3. Update the /etc/snmp/snmpd.conf file to create an SNMP community and a view, and to provide access to the required MIBs. The following configuration assumes that the OpenStack nodes, where the Ceilometer central agent runs, have addresses in the 192.168.0.0/24 subnet:

   com2sec OS_net_sec 192.168.0.0/24 OS_community
   group OS_Group any OS_net_sec
   view OS_view included .1 80
   access OS_Group "" any noauth 0 OS_view none none

4. Next, restart the SNMP daemon with the updated configuration:

   service snmpd restart

5. You can verify that the SNMP data is accessible from the Ceilometer central agent by running the following snmpwalk command from the nodes that run the central agent. In this example, 192.168.0.2 is the IP of the Compute node running the SNMP daemon, while the snmpwalk command itself is executed on the node running the central agent. You should see a list of SNMP OIDs and their values:

   snmpwalk -mALL -v2c -cOS_community 192.168.0.2

The Ceilometer data source configuration

Once the SNMP configuration is completed on the Compute and Network nodes, a Ceilometer data source must be configured for these nodes. A pipeline defines the transformation of the data collected by Ceilometer. The pipeline definition consists of a data source and a sink. A source defines the start of the pipeline and is associated with meters, resources, and a collection interval. It also defines the associated sink. A sink defines the end of the transformation pipeline, the transformation applied to the data, and the method used to publish the transformed data. Sending a notification over the message bus is the commonly used publishing mechanism.

To use data samples provided by the SNMP daemon on the Compute and Network nodes, we will configure a meter_snmp data source with a predefined meter_sink in the Ceilometer pipeline on the Central node. We will use the OpenStack Controller node to run the central agent. The data source will also map the central agent to the Compute and Network nodes.

In the /etc/ceilometer/pipeline.yaml file, add the following lines. This configuration will allow the central agent to poll the SNMP daemon on three OpenStack nodes with the IPs 192.168.0.2, 192.168.0.3, and 192.168.0.4:

   - name: meter_snmp
     interval: 60
     resources:
         - snmp://OS_community@192.168.0.2
         - snmp://OS_community@192.168.0.3
         - snmp://OS_community@192.168.0.4
     meters:
         - "hardware.cpu*"
         - "hardware.memory*"
         - "hardware.disk*"
         - "hardware.network*"
     sinks:
         - meter_sink

Restart the Ceilometer central agent in order to load the pipeline configuration change:

   # service ceilometer-agent-central restart

Sample network usage data

With the SNMP daemon configured, the data source in place, and the central agent reloaded, the statistics of the physical network utilization will be collected by Ceilometer. Use the following command to view the utilization data:

   ceilometer statistics -m hardware.network.incoming.datagram -q resource=<node-ip>

You can use various meters such as hardware.network.incoming.datagram and hardware.network.outgoing.datagram.
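To pull the same network meters for every node defined in the pipeline in one go, a small shell loop over the Ceilometer CLI can help. This is a convenience sketch rather than part of the original procedure; it assumes admin credentials are sourced and that the node IPs match the pipeline definition above:

   for node in 192.168.0.2 192.168.0.3 192.168.0.4; do
       echo "=== ${node} ==="
       # incoming and outgoing datagram counters for the physical interfaces of each node
       ceilometer statistics -m hardware.network.incoming.datagram -q resource=${node}
       ceilometer statistics -m hardware.network.outgoing.datagram -q resource=${node}
   done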
All the meters associated with the physical resource monitoring for a host can be viewed using the following command, where 192.168.0.2 is the node IP as defined in the pipeline SNMP source:

   ceilometer meter-list | grep 192.168.0.2

How it works

The SNMP daemon runs on each Compute and Network node. It provides a measure of the physical resource usage on the local node. The data provided by the SNMP daemon on the various Compute and Network nodes is collected by the Ceilometer central agent. The Ceilometer central agent must be told about the various nodes that are capable of providing an SNMP-based usage sample; this mapping is provided in the Ceilometer pipeline definition. A single central agent can collect the data from multiple nodes, or multiple central agents can be used to partition the data collection task, with each central agent collecting the data for a few nodes.

Summary

In this article, we learned about the Ceilometer components and their installation and configuration. We also learned about monitoring the physical network. You can also refer to the following books by Packt Publishing that can help you build your knowledge of OpenStack:

- OpenStack Cloud Computing Cookbook, Second Edition by Kevin Jackson and Cody Bunch
- Learning OpenStack Networking (Neutron) by James Denton
- Implementing Cloud Storage with OpenStack Swift by Amar Kapadia, Sreedhar Varma, and Kris Rajana
- VMware vCloud Director Cookbook by Daniel Langenhan

Resources for Article:

Further resources on this subject:

- Using the OpenStack Dashboard [article]
- The orchestration service for OpenStack [article]
- Setting up VPNaaS in OpenStack [article]


Monitoring OpenStack Networks

Packt
07 Oct 2015
6 min read
In this article by Chandan Dutta Chowdhury and Sriram Subramanian, authors of the book OpenStack Networking Cookbook, we will explore various means to monitor network resource utilization using Ceilometer. We will cover the following topics:

- Virtual machine bandwidth monitoring
- L3 bandwidth monitoring

Introduction

Due to the dynamic nature of virtual infrastructure and multiple users sharing the same cloud platform, the OpenStack administrator needs to track how the tenants use the resources. The data can also help in capacity planning by giving an estimate of the capacity of the physical devices and the trends of resource usage.

The OpenStack Ceilometer project provides you with a telemetry service. It can measure the usage of resources by collecting statistics across the various OpenStack components. The usage data is collected over the message bus or by polling the various components. OpenStack Neutron provides Ceilometer with the statistics related to virtual networks; Ceilometer interacts with both the Neutron and Nova services to gather this data.

The setup used to implement these recipes has two compute nodes and one node for the controller and networking services.

Virtual machine bandwidth monitoring

OpenStack Ceilometer collects the resource utilization of virtual machines by running a Ceilometer compute agent on all the compute nodes. These agents collect the various metrics related to each virtual machine running on the compute node. The data that is collected is periodically sent to the Ceilometer collector over the message bus. In this recipe, we will learn how to use the Ceilometer client to check the bandwidth utilization of a virtual machine.

Getting ready

For this recipe, you will need the following information:

- The SSH login credentials for a node where the OpenStack client packages are installed
- A shell RC file that initializes the environment variables for the CLI

How to do it…

The following steps will show you how to determine the bandwidth utilization of a virtual machine:

1. Using the appropriate credentials, SSH into the OpenStack node installed with the OpenStack client packages.
2. Source the shell RC file to initialize the environment variables required for the CLI commands.
3. Use the nova list command to find the ID of the virtual machine instance that is to be monitored.
4. Use the ceilometer resource-list | grep <virtual-machine-id> command to find the resource ID of the network port associated with the virtual machine. Note down the resource ID of the virtual port associated with the virtual machine for use in the later commands. The virtual port resource ID is a combination of the virtual machine ID and the name of the tap interface for the virtual port; it is named in the form instance-<virtual-machine-id>-<tap-interface-name>.
5. Use ceilometer meter-list -q resource=<virtual-port-resource-id> to find the meters associated with the network port on the virtual machine.
6. Next, use ceilometer statistics -m <meter-name> -q resource=<virtual-port-resource-id> to view the network usage statistics. Use the meters that we discovered in the last step to view the associated data.

Ceilometer stores the port bandwidth data for the incoming and outgoing packets and bytes, and their rates.
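Strung together, the discovery steps above can be captured in a short shell sketch. This is not part of the original recipe: the placeholders in angle brackets must be replaced with the IDs printed by your own environment, the RC file name is illustrative, and the meter names network.incoming.bytes and network.outgoing.bytes are assumptions; use whichever meters the meter-list step actually returns.

   source ~/keystonerc_admin            # your shell RC file
   nova list                            # note the <virtual-machine-id> of the instance
   ceilometer resource-list | grep <virtual-machine-id>
   # copy the full port resource ID (instance-<virtual-machine-id>-<tap-interface-name>)
   ceilometer meter-list -q resource=<virtual-port-resource-id>
   ceilometer statistics -m network.incoming.bytes -q resource=<virtual-port-resource-id>
   ceilometer statistics -m network.outgoing.bytes -q resource=<virtual-port-resource-id>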
How it works…

The OpenStack Ceilometer compute agent collects the statistics related to the network ports connected to the virtual machines and posts them on the message bus. These statistics are collected by the Ceilometer collector daemon. The Ceilometer client can be used to query a meter and filter the statistical data based on the resource ID.

L3 bandwidth monitoring

OpenStack Neutron provides you with metering commands in order to enable Layer 3 (L3) traffic monitoring. The metering commands create a label that can hold a list of packet-matching rules. Neutron counts and associates any L3 packet that matches these rules with the metering label. In this recipe, we will learn how to use the L3 traffic monitoring commands of Neutron to enable packet counting.

Getting ready

For this recipe, we will use a virtual machine that is connected to a network that, in turn, is connected to a router. We will use a network called private with a CIDR of 10.10.10.0/24.

For this recipe, you will need the following information:

- The SSH login credentials for a node where the OpenStack client packages are installed
- A shell RC file that initializes the environment variables for the CLI
- The name of the L3 metering label
- The CIDR for which the traffic needs to be measured

How to do it…

The following steps will show you how to enable monitoring of traffic to or from any L3 network:

1. Using the appropriate credentials, SSH into the OpenStack node installed with the OpenStack client packages.
2. Source the shell RC file to initialize the environment variables required for the CLI commands.
3. Use the neutron meter-label-create command to create a metering label. Note the label ID, as this will be used later with the Ceilometer commands.
4. Use the neutron meter-label-rule-create command to create a rule that associates a network address with the label that we created in the last step. In our case, we will count any packet that reaches the gateway from the 10.10.10.0/24 CIDR to which the virtual machine is connected.
5. Use the ceilometer meter-list command with the resource filter to find the meters associated with the label resource.
6. Use the ceilometer statistics command to view the number of packets matching the metering label.

The packet counting is now enabled, and the bandwidth statistics can be viewed using Ceilometer.

How it works…

The Neutron metering agent implements the packet counting meter in the L3 router. It uses iptables to implement a packet counter. The Neutron agent collects the counter statistics periodically and posts them on the message bus, from where they are collected by the Ceilometer collector daemon.

Summary

In this article, we learned about ways to monitor the usage of virtual and physical networking resources. The resource utilization data can be used to bill the users of a public cloud and to debug infrastructure-related problems.

Resources for Article:

Further resources on this subject:

- Using the OpenStack Dashboard [Article]
- Installing Red Hat CloudForms on Red Hat OpenStack [Article]
- Securing OpenStack Networking [Article]


Deploying Highly Available OpenStack

Packt
21 Sep 2015
17 min read
In this article by Arthur Berezin, the author of the book OpenStack Configuration Cookbook, we will cover the following topics:

- Installing Pacemaker
- Installing HAProxy
- Configuring Galera cluster for MariaDB
- Installing RabbitMQ with mirrored queues
- Configuring highly available OpenStack services

Many organizations choose OpenStack for its distributed architecture and ability to deliver an Infrastructure as a Service (IaaS) platform for mission-critical applications. In such environments, it is crucial to configure all OpenStack services in a highly available configuration to provide as much uptime as possible for the control plane services of the cloud.

Deploying a highly available control plane for OpenStack can be achieved in various configurations. Each of these configurations would serve a certain set of demands and introduce a growing set of prerequisites. Pacemaker is used to create active-active clusters to guarantee services' resilience to possible faults. Pacemaker is also used to create a virtual IP address for each of the services. HAProxy serves as a load balancer for incoming calls to the services' APIs. This article discusses neither high availability of virtual machine instances nor of the Nova-Compute service of the hypervisor.

Most OpenStack services are stateless; they store persistent data in a SQL database, which is potentially a single point of failure that we should make highly available. In this article, we will deploy a highly available database using MariaDB and Galera, which implements multimaster replication. To ensure availability of the message bus, we will configure RabbitMQ with mirrored queues.

This article discusses configuring each service separately on a three-controller layout that runs the OpenStack controller services, including Neutron, the database, and the RabbitMQ message bus. All services can be configured on several controller nodes, or each service could be implemented on its own separate set of hosts.

Installing Pacemaker

All OpenStack services consist of system Linux services. The first step of ensuring service availability is to configure Pacemaker clusters for each service, so that Pacemaker monitors the services. In case of failure, Pacemaker restarts the failed service. In addition, we will use Pacemaker to create a virtual IP address for each of OpenStack's services to ensure that services are accessible using the same IP address when failures occur and the actual service has relocated to another host. In this section, we will install Pacemaker and prepare it to configure highly available OpenStack services.

Getting ready

To ensure maximum availability, we will install and configure three hosts to serve as controller nodes. Prepare three controller hosts with identical hardware and network layout. We will base our configuration for most of the OpenStack services on the configuration used in a single-controller layout, and we will deploy Neutron network services on all three controller nodes.
How to do it…

Run the following steps on all three highly available controller nodes:

1. Install the Pacemaker packages:

   [root@controller1 ~]# yum install -y pcs pacemaker corosync fence-agents-all resource-agents

2. Enable and start the pcsd service:

   [root@controller1 ~]# systemctl enable pcsd
   [root@controller1 ~]# systemctl start pcsd

3. Set a password for the hacluster user; the password should be identical on all the nodes:

   [root@controller1 ~]# echo 'password' | passwd --stdin hacluster

   We will use the hacluster password throughout the HAProxy configuration.

4. Authenticate all controller nodes, using the -p option to give the password on the command line, and provide the same password you set in the previous step:

   [root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p password --force

At this point, you may run pcs commands from a single controller node instead of running commands on each node separately.

There's more…

You may find the complete Pacemaker documentation, which includes installation documentation, a complete configuration reference, and examples, on the Cluster Labs website at http://clusterlabs.org/doc/.

Installing HAProxy

Addressing high availability for OpenStack includes avoiding high load on a single host and ensuring that incoming TCP connections to all API endpoints are balanced across the controller hosts. We will use HAProxy, an open source load balancer, which is particularly suited for HTTP load balancing as it supports session persistence and layer 7 processing.

Getting ready

In this section, we will install HAProxy on all controller hosts, configure a Pacemaker cluster for the HAProxy services, and prepare for the OpenStack services configuration.

How to do it…
Run the following steps on all controller nodes:

1. Install the HAProxy package:

   # yum install -y haproxy

2. Enable the nonlocal binding kernel parameter:

   # echo net.ipv4.ip_nonlocal_bind=1 >> /etc/sysctl.d/haproxy.conf
   # echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

3. Configure the HAProxy load balancer settings for the Galera database, RabbitMQ, and Keystone services. Edit /etc/haproxy/haproxy.cfg with the following configuration:

   global
       daemon

   defaults
       mode tcp
       maxconn 10000
       timeout connect 2s
       timeout client 10s
       timeout server 10s

   frontend vip-db
       bind 192.168.16.200:3306
       timeout client 90s
       default_backend db-vms-galera

   backend db-vms-galera
       option httpchk
       stick-table type ip size 2
       stick on dst
       timeout server 90s
       server rhos5-db1 192.168.16.58:3306 check inter 1s port 9200
       server rhos5-db2 192.168.16.59:3306 check inter 1s port 9200
       server rhos5-db3 192.168.16.60:3306 check inter 1s port 9200

   frontend vip-rabbitmq
       bind 192.168.16.213:5672
       timeout client 900m
       default_backend rabbitmq-vms

   backend rabbitmq-vms
       balance roundrobin
       timeout server 900m
       server rhos5-rabbitmq1 192.168.16.61:5672 check inter 1s
       server rhos5-rabbitmq2 192.168.16.62:5672 check inter 1s
       server rhos5-rabbitmq3 192.168.16.63:5672 check inter 1s

   frontend vip-keystone-admin
       bind 192.168.16.202:35357
       default_backend keystone-admin-vms

   backend keystone-admin-vms
       balance roundrobin
       server rhos5-keystone1 192.168.16.64:35357 check inter 1s
       server rhos5-keystone2 192.168.16.65:35357 check inter 1s
       server rhos5-keystone3 192.168.16.66:35357 check inter 1s

   frontend vip-keystone-public
       bind 192.168.16.202:5000
       default_backend keystone-public-vms

   backend keystone-public-vms
       balance roundrobin
       server rhos5-keystone1 192.168.16.64:5000 check inter 1s
       server rhos5-keystone2 192.168.16.65:5000 check inter 1s
       server rhos5-keystone3 192.168.16.66:5000 check inter 1s

   This configuration file is an example of configuring HAProxy as a load balancer for the MariaDB, RabbitMQ, and Keystone services.

4. We need to authenticate on all nodes before we are allowed to change the configuration, so that all nodes can be configured from one point. Use the previously configured hacluster user and password to do this:

   # pcs cluster auth controller1 controller2 controller3 -u hacluster -p password --force

5. Create a Pacemaker cluster for the HAProxy service as follows (note that you can now run pcs commands from a single controller node):

   # pcs cluster setup --name ha-controller controller1 controller2 controller3
   # pcs cluster enable --all
   # pcs cluster start --all

6. Finally, using the pcs resource create command, create a cloned systemd resource that will run a highly available active-active HAProxy service on all controller hosts:

   # pcs resource create lb-haproxy systemd:haproxy op monitor start-delay=10s --clone

7. Create the virtual IP address for each of the services:

   # pcs resource create vip-db IPaddr2 ip=192.168.16.200
   # pcs resource create vip-rabbitmq IPaddr2 ip=192.168.16.213
   # pcs resource create vip-keystone IPaddr2 ip=192.168.16.202

8. You may use the pcs status command to verify whether all resources are successfully running:

   # pcs status
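Before relying on the load balancer, it can save time to validate the configuration file syntax on each node. The check below is a suggestion and not part of the original recipe:

   # haproxy -c -f /etc/haproxy/haproxy.cfg

HAProxy only checks syntax here; the backend servers (Galera, RabbitMQ, and Keystone) are configured in the following sections, so their health checks will stay down until then.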
Configuring Galera cluster for MariaDB

Galera is a multimaster cluster for MariaDB, which is based on synchronous replication between all cluster nodes. Effectively, Galera treats a cluster of MariaDB nodes as one single master node that reads and writes to all nodes. Galera replication happens at transaction commit time, by broadcasting the transaction write set to the cluster for application. The client connects directly to the DBMS and experiences behavior close to that of the native DBMS. The wsrep API (write set replication API) defines the interface between Galera replication and the DBMS.

Getting ready

In this section, we will install the Galera cluster packages for MariaDB on our three controller nodes, and then we will configure Pacemaker to monitor all Galera services. Pacemaker can be stopped on all cluster nodes, as shown, if it is running from previous steps:

   # pcs cluster stop --all

How to do it…

Perform the following steps on all controller nodes:

1. Install the Galera packages for MariaDB:

   # yum install -y mariadb-galera-server xinetd resource-agents

2. Edit /etc/sysconfig/clustercheck and add the following lines:

   MYSQL_USERNAME="clustercheck"
   MYSQL_PASSWORD="password"
   MYSQL_HOST="localhost"

3. Edit the Galera configuration file /etc/my.cnf.d/galera.cnf with the following lines. Make sure to enter the host's IP address in the bind-address parameter:

   [mysqld]
   skip-name-resolve=1
   binlog_format=ROW
   default-storage-engine=innodb
   innodb_autoinc_lock_mode=2
   innodb_locks_unsafe_for_binlog=1
   query_cache_size=0
   query_cache_type=0
   bind-address=[host-IP-address]
   wsrep_provider=/usr/lib64/galera/libgalera_smm.so
   wsrep_cluster_name="galera_cluster"
   wsrep_slave_threads=1
   wsrep_certify_nonPK=1
   wsrep_max_ws_rows=131072
   wsrep_max_ws_size=1073741824
   wsrep_debug=0
   wsrep_convert_LOCK_to_trx=0
   wsrep_retry_autocommit=1
   wsrep_auto_increment_control=1
   wsrep_drupal_282555_workaround=0
   wsrep_causal_reads=0
   wsrep_notify_cmd=
   wsrep_sst_method=rsync

   You can learn more about each of Galera's default options on the documentation page at http://galeracluster.com/documentation-webpages/configuration.html.

4. Add the following lines to the xinetd configuration file /etc/xinetd.d/galera-monitor:

   service galera-monitor
   {
       port            = 9200
       disable         = no
       socket_type     = stream
       protocol        = tcp
       wait            = no
       user            = root
       group           = root
       groups          = yes
       server          = /usr/bin/clustercheck
       type            = UNLISTED
       per_source      = UNLIMITED
       log_on_success  =
       log_on_failure  = HOST
       flags           = REUSE
   }

5. Start and enable the xinetd and pcsd services:

   # systemctl enable xinetd
   # systemctl start xinetd
   # systemctl enable pcsd
   # systemctl start pcsd

6. Authenticate on all nodes. Use the previously configured hacluster user and password to do this as follows:

   # pcs cluster auth controller1 controller2 controller3 -u hacluster -p password --force

   Now commands can be run from a single controller node.

7. Create a Pacemaker cluster for the Galera service:

   # pcs cluster setup --name controller-db controller1 controller2 controller3
   # pcs cluster enable --all
   # pcs cluster start --all

8. Add the Galera service resource to the Galera Pacemaker cluster:

   # pcs resource create galera galera enable_creation=true wsrep_cluster_address="gcomm://controller1,controller2,controller3" meta master-max=3 ordered=true op promote timeout=300s on-fail=block --master

9. Create a user for the clustercheck xinetd service:

   mysql -e "CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY 'password';"

See also

You can find the complete Galera documentation, which includes installation documentation, a complete configuration reference, and examples, on the Galera Cluster website at http://galeracluster.com/documentation-webpages/.
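Before moving on to RabbitMQ, it is worth confirming that the Galera cluster actually formed. These checks are suggestions rather than part of the original recipe; they assume the clustercheck user and the xinetd listener on port 9200 configured above:

   # mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
   # curl http://controller1:9200

The first command should report a cluster size of 3, and the second should return an HTTP 200 response from the clustercheck script, which is the same health check HAProxy uses for the db-vms-galera backend.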
Installing RabbitMQ with mirrored queues

RabbitMQ is used as a message bus for services to communicate with each other. By default, the queues are located on a single node, which makes the RabbitMQ service a single point of failure. To avoid RabbitMQ being a single point of failure, we will configure RabbitMQ to use mirrored queues across multiple nodes. Each mirrored queue consists of one master and one or more slaves, with the oldest slave being promoted to the new master if the old master disappears for any reason. Messages published to the queue are replicated to all slaves.

Getting ready

In this section, we will install the RabbitMQ packages on our three controller nodes and configure RabbitMQ to mirror its queues across all controller nodes; then we will configure Pacemaker to monitor all RabbitMQ services.

How to do it…

Perform the following steps on all controller nodes:

1. Install the RabbitMQ packages on all controller nodes:

   # yum -y install rabbitmq-server

2. Start the rabbitmq-server service once so that the Erlang cookie is generated, and then stop it:

   # systemctl start rabbitmq-server
   # systemctl stop rabbitmq-server

3. RabbitMQ cluster nodes use a cookie to determine whether they are allowed to communicate with each other; for nodes to be able to communicate, they must have the same cookie. Copy .erlang.cookie from controller1 to controller2 and controller3:

   [root@controller1 ~]# scp /var/lib/rabbitmq/.erlang.cookie root@controller2:/var/lib/rabbitmq/
   [root@controller1 ~]# scp /var/lib/rabbitmq/.erlang.cookie root@controller3:/var/lib/rabbitmq/

4. Start and enable Pacemaker on all nodes:

   # systemctl enable pcsd
   # systemctl start pcsd

   Since we already authenticated all nodes of the cluster in the previous section, we can now run the following commands on controller1.

5. Create a new Pacemaker cluster for the RabbitMQ service as follows:

   [root@controller1 ~]# pcs cluster setup --name rabbitmq controller1 controller2 controller3
   [root@controller1 ~]# pcs cluster enable --all
   [root@controller1 ~]# pcs cluster start --all

6. Add a systemd resource for the RabbitMQ service to the Pacemaker cluster:

   [root@controller1 ~]# pcs resource create rabbitmq-server systemd:rabbitmq-server op monitor start-delay=20s --clone

7. Since all RabbitMQ nodes must join the cluster one at a time, stop RabbitMQ on controller2 and controller3:

   [root@controller2 ~]# rabbitmqctl stop_app
   [root@controller3 ~]# rabbitmqctl stop_app

8. Join controller2 to the cluster and start RabbitMQ on it:

   [root@controller2 ~]# rabbitmqctl join_cluster rabbit@controller1
   [root@controller2 ~]# rabbitmqctl start_app

9. Now join controller3 to the cluster as well and start RabbitMQ on it:

   [root@controller3 ~]# rabbitmqctl join_cluster rabbit@controller1
   [root@controller3 ~]# rabbitmqctl start_app

10. At this point, the cluster should be configured, and we need to set RabbitMQ's HA policy to mirror the queues to all RabbitMQ cluster nodes as follows:

   [root@controller1 ~]# rabbitmqctl set_policy HA '^(?!amq.).*' '{"ha-mode": "all"}'

There's more…

The RabbitMQ cluster should now be configured with all the queues mirrored to all controller nodes.
To verify the cluster's state, you can use the rabbitmqctl cluster_status and rabbitmqctl list_policies commands from each of the controller nodes as follows:

   [root@controller1 ~]# rabbitmqctl cluster_status
   [root@controller1 ~]# rabbitmqctl list_policies

To verify Pacemaker's cluster status, you may use the pcs status command as follows:

   [root@controller1 ~]# pcs status

See also

For complete documentation on how RabbitMQ implements the mirrored queues feature, and for additional configuration options, you can refer to the project's documentation pages at https://www.rabbitmq.com/clustering.html and https://www.rabbitmq.com/ha.html.

Configuring highly available OpenStack services

Most OpenStack services are stateless web services that keep persistent data in a SQL database and use a message bus for inter-service communication. We will use Pacemaker and HAProxy to run OpenStack services in an active-active highly available configuration, so that traffic for each of the services is load balanced across all controller nodes and the cloud can be easily scaled out to more controller nodes if needed. We will configure Pacemaker clusters for each of the services that will run on all controller nodes. We will also use Pacemaker to create a virtual IP address for each of OpenStack's services, so rather than addressing a specific node, services will be addressed by their corresponding virtual IP address. We will use HAProxy to load balance incoming requests to the services across all controller nodes.

Getting ready

In this section, we will use the virtual IP addresses we created for the services with Pacemaker and HAProxy in the previous sections. We will also configure the OpenStack services to use the highly available Galera-clustered database and RabbitMQ with mirrored queues. This is an example for the Keystone service; please refer to the Packt website for the complete configuration of all OpenStack services.

How to do it…

Perform the following steps on all controller nodes:

1. Install the Keystone service on all controller nodes:

   yum install -y openstack-keystone openstack-utils openstack-selinux

2. Generate a Keystone service token on controller1 and copy it to controller2 and controller3 using scp:

   [root@controller1 ~]# export SERVICE_TOKEN=$(openssl rand -hex 10)
   [root@controller1 ~]# echo $SERVICE_TOKEN > ~/keystone_admin_token
   [root@controller1 ~]# scp ~/keystone_admin_token root@controller2:~/keystone_admin_token
   [root@controller1 ~]# scp ~/keystone_admin_token root@controller3:~/keystone_admin_token

3. Export the Keystone service token on controller2 and controller3 as well:

   [root@controller2 ~]# export SERVICE_TOKEN=$(cat ~/keystone_admin_token)
   [root@controller3 ~]# export SERVICE_TOKEN=$(cat ~/keystone_admin_token)

Note: perform the following commands on all controller nodes.
4. Configure the Keystone service on all controller nodes to use the RabbitMQ virtual IP:

   # openstack-config --set /etc/keystone/keystone.conf DEFAULT admin_token $SERVICE_TOKEN
   # openstack-config --set /etc/keystone/keystone.conf DEFAULT rabbit_host vip-rabbitmq

5. Configure the Keystone service endpoints to point to the Keystone virtual IP:

   # openstack-config --set /etc/keystone/keystone.conf DEFAULT admin_endpoint 'http://vip-keystone:%(admin_port)s/'
   # openstack-config --set /etc/keystone/keystone.conf DEFAULT public_endpoint 'http://vip-keystone:%(public_port)s/'

6. Configure Keystone to connect to the SQL database using the Galera cluster virtual IP:

   # openstack-config --set /etc/keystone/keystone.conf database connection mysql://keystone:keystonetest@vip-mysql/keystone
   # openstack-config --set /etc/keystone/keystone.conf database max_retries -1

7. On controller1, create the Keystone PKI setup and sync the database:

   [root@controller1 ~]# keystone-manage pki_setup --keystone-user keystone --keystone-group keystone
   [root@controller1 ~]# chown -R keystone:keystone /var/log/keystone /etc/keystone/ssl/
   [root@controller1 ~]# su keystone -s /bin/sh -c "keystone-manage db_sync"

8. Using rsync, copy the Keystone SSL certificates from controller1 to controller2 and controller3:

   [root@controller1 ~]# rsync -av /etc/keystone/ssl/ controller2:/etc/keystone/ssl/
   [root@controller1 ~]# rsync -av /etc/keystone/ssl/ controller3:/etc/keystone/ssl/

9. Make sure that the keystone user is the owner of the newly copied files on controller2 and controller3:

   [root@controller2 ~]# chown -R keystone:keystone /etc/keystone/ssl/
   [root@controller3 ~]# chown -R keystone:keystone /etc/keystone/ssl/

10. Create a systemd resource for the Keystone service; use --clone to ensure it runs in an active-active configuration:

   [root@controller1 ~]# pcs resource create keystone systemd:openstack-keystone op monitor start-delay=10s --clone

11. Create the endpoint and user account for Keystone with the Keystone VIP as given:

   [root@controller1 ~]# export SERVICE_ENDPOINT="http://vip-keystone:35357/v2.0"
   [root@controller1 ~]# keystone service-create --name=keystone --type=identity --description="Keystone Identity Service"
   [root@controller1 ~]# keystone endpoint-create --service keystone --publicurl 'http://vip-keystone:5000/v2.0' --adminurl 'http://vip-keystone:35357/v2.0' --internalurl 'http://vip-keystone:5000/v2.0'
   [root@controller1 ~]# keystone user-create --name admin --pass keystonetest
   [root@controller1 ~]# keystone role-create --name admin
   [root@controller1 ~]# keystone tenant-create --name admin
   [root@controller1 ~]# keystone user-role-add --user admin --role admin --tenant admin

12. On all controller nodes, create a keystonerc_admin file with the OpenStack admin credentials, using the Keystone VIP:

   cat > ~/keystonerc_admin << EOF
   export OS_USERNAME=admin
   export OS_TENANT_NAME=admin
   export OS_PASSWORD=password
   export OS_AUTH_URL=http://vip-keystone:35357/v2.0/
   export PS1='[\u@\h \W(keystone_admin)]\$ '
   EOF

13. Source the keystonerc_admin credentials file to be able to run authenticated OpenStack commands:

   [root@controller1 ~]# source ~/keystonerc_admin

   At this point, you should be able to execute Keystone commands and create the Services tenant:

   [root@controller1 ~]# keystone tenant-create --name services --description "Services Tenant"

Summary

In this article, we have covered the installation of Pacemaker and HAProxy, the configuration of a Galera cluster for MariaDB, the installation of RabbitMQ with mirrored queues, and the configuration of highly available OpenStack services.
Resources for Article:

Further resources on this subject:

- Using the OpenStack Dashboard [article]
- Installing OpenStack Swift [article]
- Architecture and Component Overview [article]


How to Run Code in the Cloud with AWS Lambda

Ankit Patial
11 Sep 2015
5 min read
AWS Lambda is a new compute service introduced by AWS to run a piece of code in response to events. The source of these events can be AWS S3, AWS SNS, AWS Kinesis, AWS Cognito, and user applications using the AWS SDK. The idea behind this is to create backend services that are cost effective and highly scalable. If you believe in the Unix philosophy and you build your applications as components, then AWS Lambda is a nice feature that you can make use of.

Some of its benefits

- Cost-effective: AWS Lambdas are not always executing; they are triggered on certain events and have a maximum execution time of 60 seconds (that's a lot of time to do many operations, but not all). There is zero wastage, and maximum savings on the resources used.
- No hassle of maintaining infrastructure: Create a Lambda and forget. There is no need to worry about scaling infrastructure as load increases. It will all be done automatically by AWS.
- Integration with other AWS services: An AWS Lambda function can be triggered in response to various events of other AWS services. The following services can trigger a Lambda: AWS S3, AWS SNS (Publish), AWS Kinesis, AWS Cognito, and a custom call using the AWS SDK.

Creating a Lambda function

First, log in to your AWS account (create one if you haven't got one). Under Compute Services, click on the Lambda option. You will see a screen with a "Get Started Now" button. Click on it, and then you will be on a screen to write your first Lambda function. Choose a name for it that will describe it best. Give it a nice description and move on to the code. We can code it in one of two ways: inline code or uploading a zip file.

Inline code

Inline code will be very helpful for writing simple scripts like image editing. The AMI (Amazon Machine Image) that Lambda runs on comes with the Ghostscript and ImageMagick libraries preinstalled, as well as Node.js packages like aws-sdk and imagemagick.

Let's create a Lambda that can list the installed packages on the AMI that runs Lambda:

1. I will name it ls-packages.
2. The description will be "list installed packages on AMI".
3. For code entry type, choose Edit Code Inline.
4. For the code template, choose None, and paste the code below in:

   var cp = require('child_process');

   exports.handler = function(event, context) {
     cp.exec('rpm -qa', function (err, stdout, stderr) {
       if (err) {
         return context.fail(err);
       }
       console.log(stdout);
       context.succeed('Done');
     });
   };

5. Handler name: handler. This will be the entry point function name. You can change it as you like.
6. Role: select Create new role, Basic execution role. You will be prompted to create an IAM role with the required permission, that is, access to create logs. Press "Allow."
7. For the Memory (MB), I am going to keep it low: 128.
8. Timeout (s): keep it at the default of 3.
9. Press Create Lambda function.

You will see your first Lambda created and showing up in the Lambda Function list. Select it if it is not already selected, and click on the Actions drop-down. At the top, select the Edit/Test option. You will see your Lambda function in edit mode. Ignore the Sample event section on the left; just click the Invoke button at the bottom right, wait for a few seconds, and you will see nice details in the Execution result. The Execution logs are where you will find the list of installed packages on the machine that you can utilize.

I wish there was a way to install custom packages, or at least have the latest versions of the installed packages running. I mean, look at ghostscript-8.70-19.23.amzn1.x86_64. It is an old version, published in 2009. Maybe AWS will add such features in the future. I certainly hope so.
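Once the function exists, you don't have to drive it from the console; the same "custom call using the AWS SDK" trigger mentioned above can be exercised from the command line. This is a suggested sketch, not part of the original walkthrough, and it assumes the AWS CLI is configured with credentials that are allowed to invoke the function:

   $ aws lambda invoke --function-name ls-packages --payload '{}' output.txt
   $ cat output.txt

The response file contains whatever the handler passed to context.succeed (here, the string "Done"); the rpm -qa listing itself lands in the function's CloudWatch logs.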
Upload a zip file

Now suppose you have created something more complicated that spans multiple code files and uses NPM packages that are not available on the Lambda AMI. No worries: just create a simple Node.js app, install your packages, write up your code, and we are good to deploy it. A few things need to be taken care of:

- Zip the node_modules folder along with the code; don't exclude it while zipping your code.
- The steps will be the same as for inline code, but with one addition: File name. The file name will be the path to the entry file, so if you have a lib directory in your code with an index.js file, then you can mention it as lib/index.js.

Monitoring

On the Lambda dashboard, you will see a nice graph of various events like Invocation Count, Invocation Duration, Invocation Failures, and Throttled Invocations. You will also find the logs created by Lambda functions in AWS CloudWatch (Administration & Security).

Conclusion

AWS Lambda is a unique and very useful service. It can help us build nice, scalable backends for mobile applications. It can also help you centralize many components that can be shared across applications that you are running on and off the AWS infrastructure.

About the author

Ankit Patial has a Masters in Computer Applications and nine years of experience with custom APIs, web and desktop applications using .NET technologies, RoR, and Node.js. As CTO of SimSaw Inc and Pink Hand Technologies, his job is to learn and help his team implement the best practices of using cloud computing and JavaScript technologies.


How to Optimize PUT Requests

Packt
04 Sep 2015
8 min read
In this article by Naoya Hashimoto, author of the book Amazon S3 Cookbook, we look at how to optimize PUT requests. It is effective to use multipart upload because it can aggregate throughput by parallelizing PUT requests and uploading a large object in parts. It is recommended that the size of each part be between 25 and 50 MB for higher-bandwidth networks and around 10 MB for mobile networks.

Amazon S3 is a highly scalable, reliable, and low-latency data storage service at a very low cost, designed for mission-critical and primary data storage. It provides the Amazon S3 APIs to simplify your programming tasks. S3 performance optimization is composed of several factors, for example, which region to choose to reduce latency, considering the naming scheme, and optimizing the PUT and GET operations.

Multipart upload consists of a three-step process: the first step is initiating the upload, next is uploading the object parts, and finally, after uploading all the parts, the multipart upload is completed. The following methods are currently supported for uploading objects with multipart upload:

- AWS SDK for Android
- AWS SDK for iOS
- AWS SDK for Java
- AWS SDK for JavaScript
- AWS SDK for PHP
- AWS SDK for Python
- AWS SDK for Ruby
- AWS SDK for .NET
- REST API
- AWS CLI

In order to try multipart upload and see how much it aggregates throughput, we use the AWS SDK for Node.js and the s3 module via npm (the package manager for Node.js). The AWS CLI also supports multipart upload: when you use the AWS CLI s3 or s3api subcommand to upload an object, the object is automatically uploaded via multipart requests.

Getting ready

You need to complete the following setup in advance:

- Sign up on AWS and be able to access S3 with your IAM credentials
- Install and set up the AWS CLI on your PC, or use an Amazon Linux AMI
- Install Node.js

It is recommended that you run the benchmark from your local PC or, if you use an EC2 instance, that you launch the instance and create the S3 bucket in different regions. For example, if you launch an instance in the Asia Pacific Tokyo region, you should create the S3 bucket in the US Standard region. The reason is that the latency between EC2 and S3 is very low, and it is otherwise hard to see the difference.

How to do it…

We upload a 300 MB file to an S3 bucket over HTTP in two ways: one using multipart upload and the other without multipart upload, to compare the times. To clearly see how the performance differs, I launched an instance and created an S3 bucket in different regions as follows:

- EC2 instance: Asia Pacific Tokyo region (ap-northeast-1)
- S3 bucket: US Standard region (us-east-1)

First, we install the s3 Node.js module via npm, create a dummy file, and upload the object into a bucket using a sample Node.js script without enabling multipart upload; then we do the same with multipart upload enabled, so that we can see how multipart upload performs the operation.
Now, let's move on to the instructions:

1. Install s3 via the npm command:

   $ cd aws-nodejs-sample/
   $ npm install s3

2. Create a 300 MB dummy file:

   $ file=300mb.dmp
   $ dd if=/dev/zero of=${file} bs=10M count=30

3. Put in the following script and save it as s3_upload.js:

   // Load the SDK
   var AWS = require('aws-sdk');
   var s3 = require('s3');
   var conf = require('./conf');

   // Load parameters
   var client = s3.createClient({
     maxAsyncS3: conf.maxAsyncS3,
     s3RetryCount: conf.s3RetryCount,
     s3RetryDelay: conf.s3RetryDelay,
     multipartUploadThreshold: conf.multipartUploadThreshold,
     multipartUploadSize: conf.multipartUploadSize,
   });

   var params = {
     localFile: conf.localFile,
     s3Params: {
       Bucket: conf.Bucket,
       Key: conf.localFile,
     },
   };

   // upload objects
   console.log("## s3 Parameters");
   console.log(conf);
   console.log("## Begin uploading.");
   var uploader = client.uploadFile(params);
   uploader.on('error', function(err) {
     console.error("Unable to upload:", err.stack);
   });
   uploader.on('progress', function() {
     console.log("Progress", uploader.progressMd5Amount, uploader.progressAmount, uploader.progressTotal);
   });
   uploader.on('end', function() {
     console.log("## Finished uploading.");
   });

4. Create a configuration file and save it as conf.js in the same directory as s3_upload.js:

   exports.maxAsyncS3 = 20; // default value
   exports.s3RetryCount = 3; // default value
   exports.s3RetryDelay = 1000; // default value
   exports.multipartUploadThreshold = 20971520; // default value
   exports.multipartUploadSize = 15728640; // default value
   exports.Bucket = "your-bucket-name";
   exports.localFile = "300mb.dmp";
   exports.Key = "300mb.dmp";

How it works…

First of all, let's try uploading the 300 MB object using multipart upload, and then upload the same file without using multipart upload. You can upload the object and see how long it takes by typing the following command:

   $ time node s3_upload.js
   ## s3 Parameters
   { maxAsyncS3: 20,
     s3RetryCount: 3,
     s3RetryDelay: 1000,
     multipartUploadThreshold: 20971520,
     multipartUploadSize: 15728640,
     localFile: './300mb.dmp',
     Bucket: 'bucket-sample-us-east-1',
     Key: './300mb.dmp' }
   ## Begin uploading.
   Progress 0 16384 314572800
   Progress 0 32768 314572800
   …
   Progress 0 314572800 314572800
   Progress 0 314572800 314572800
   ## Finished uploading.

   real 0m16.111s
   user 0m4.164s
   sys 0m0.884s

As it took about 16 seconds to upload the object, the transfer rate was 18.75 MB/sec.

Then, let's change the following parameters in the configuration (conf.js) and see the result. The 300 MB object is now uploaded through only one S3 client, with exports.maxAsyncS3 = 1 and exports.multipartUploadThreshold = 2097152000, so that multipart upload is effectively disabled:

   exports.maxAsyncS3 = 1;
   exports.s3RetryCount = 3; // default value
   exports.s3RetryDelay = 1000; // default value
   exports.multipartUploadThreshold = 2097152000;
   exports.multipartUploadSize = 15728640; // default value
   exports.Bucket = "your-bucket-name";
   exports.localFile = "300mb.dmp";
   exports.Key = "300mb.dmp";

Let's see the result after changing the parameters in the configuration (conf.js):

   $ time node s3_upload.js
   ## s3 Parameters
   …
   ## Begin uploading.
   Progress 0 16384 314572800
   …
   Progress 0 314572800 314572800
   ## Finished uploading.

   real 0m41.887s
   user 0m4.196s
   sys 0m0.728s

As it took about 42 seconds to upload the object, the transfer rate was 7.14 MB/sec.
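The AWS CLI exposes similar knobs for its automatic multipart behavior, so you can run a comparable experiment without the Node.js script. The commands below are a suggested sketch, not part of the original recipe; the bucket name is a placeholder, and the threshold and chunk-size values simply mirror the defaults used in conf.js:

   $ aws configure set default.s3.multipart_threshold 20MB
   $ aws configure set default.s3.multipart_chunksize 15MB
   $ aws configure set default.s3.max_concurrent_requests 20
   $ time aws s3 cp 300mb.dmp s3://your-bucket-name/300mb.dmp

Raising multipart_threshold above the object size forces a single PUT, which is the CLI equivalent of the second conf.js run above.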
Now, let's quickly check each parameter, and then get to the conclusion:

- maxAsyncS3 defines the maximum number of simultaneous requests that the S3 client opens to Amazon S3. The default value is 20.
- s3RetryCount defines the number of retries when a request fails. The default value is 3.
- s3RetryDelay is how many milliseconds the S3 client will wait when a request fails. The default value is 1000.
- multipartUploadThreshold defines the size above which objects are uploaded via multipart requests. The object will be uploaded via a multipart request if it is greater than the size you specified. The default value is 20 MB, the minimum is 5 MB, and the maximum is 5 GB.
- multipartUploadSize defines the size of each part when uploaded via a multipart request. The default value is 15 MB, the minimum is 5 MB, and the maximum is 5 GB.

The following table shows the speed test scores with different parameters:

   Parameter                  Run 1        Run 2      Run 3      Run 4      Run 5
   maxAsyncS3                 1            20         20         40         30
   s3RetryCount               3            3          3          3          3
   s3RetryDelay               1000         1000       1000       1000       1000
   multipartUploadThreshold   2097152000   20971520   20971520   20971520   20971520
   multipartUploadSize        15728640     15728640   31457280   15728640   10728640
   Time (seconds)             41.88        16.11      17.41      16.37      9.68
   Transfer rate (MB/sec)     7.51         19.53      18.07      19.22      32.50

In conclusion, multipart upload is effective for optimizing the PUT operation by aggregating throughput. However, you need to benchmark your scenario and evaluate the retry count, delay, number of parts, and multipart upload size based on the networks that your application belongs to.

There's more…

Multipart upload specification

There are limits to using multipart upload. The following table shows the specification of multipart upload:

   Item                                                                  Specification
   Maximum object size                                                   5 TB
   Maximum number of parts per upload                                    10,000
   Part numbers                                                          1 to 10,000 (inclusive)
   Part size                                                             5 MB to 5 GB; the last part can be less than 5 MB
   Maximum number of parts returned for a list parts request             1,000
   Maximum number of multipart uploads returned in a list
   multipart uploads request                                             1,000

Multipart upload and charging

If you initiate a multipart upload and abort the request, Amazon S3 deletes the upload artifacts and any parts you have uploaded, and you are not billed for them. However, you are charged for all storage, bandwidth, and requests for the multipart upload requests and the associated parts of an object after the operation is completed. The point is that you are charged when a multipart upload is completed (not aborted).

See also

- Multipart Upload Overview: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
- AWS SDK for Node.js: http://docs.aws.amazon.com/AWSJavaScriptSDK/guide/node-intro.htm
- Node.js S3 package on npm: https://www.npmjs.com/package/s3
- Amazon Simple Storage Service: Introduction to Amazon S3: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
- (PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014: http://www.slideshare.net/AmazonWebServices/pfc403-maximizing-amazon-s3-performance-aws-reinvent-2014
- AWS re:Invent 2014 | (PFC403) Maximizing Amazon S3 Performance: https://www.youtube.com/watch?v=_FHRzq7eHQc

Summary

In this article, we learned how to optimize PUT requests by uploading a large object in parts.

Resources for Article:

Further resources on this subject:

- Amazon Web Services [article]
- Achieving High-Availability on AWS Cloud [article]
- Architecture and Component Overview [article]


Installing Red Hat CloudForms on Red Hat OpenStack

Packt
04 Sep 2015
8 min read
In this article by Sangram Rath, the author of the book Hybrid Cloud Management with Red Hat CloudForms, we go through the steps required to install, configure, and use Red Hat CloudForms on Red Hat Enterprise Linux OpenStack. However, you should be able to install it on OpenStack running on any other Linux distribution. The following topics are covered in this article:

- System requirements
- Deploying the Red Hat CloudForms Management Engine appliance
- Configuring the appliance
- Accessing and navigating the CloudForms web console

System requirements

Installing the Red Hat CloudForms Management Engine appliance requires an existing virtual or cloud infrastructure. The following are the latest supported platforms:

- OpenStack
- Red Hat Enterprise Virtualization
- VMware vSphere

The system requirements for installing CloudForms are different for different platforms. Since this book talks about installing it on OpenStack, we will look at the system requirements for OpenStack. You need a minimum of:

- Four vCPUs
- 6 GB RAM
- 45 GB disk space

The flavor we select to launch the CloudForms instance must meet or exceed the preceding requirements. For a list of system requirements for other platforms, refer to the following links:

- System requirements for Red Hat Enterprise Virtualization: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_Red_Hat_Enterprise_Virtualization/index.html
- System requirements for installing CloudForms on VMware vSphere: https://access.redhat.com/documentation/en-US/Red_Hat_CloudForms/3.1/html/Installing_CloudForms_on_VMware_vSphere/index.html

Additional OpenStack requirements

Before we can launch a CloudForms instance, we need to ensure that some additional requirements are met (see the CLI sketch below for the first two items):

- Security group: Ensure that a rule is created to allow traffic on port 443 in the security group that will be used to launch the appliance.
- Flavor: Based on the system requirements for running the CloudForms appliance, we can either use an existing flavor, such as m1.large, or create a new flavor for the CloudForms Management Engine appliance. To create a new flavor, click on the Create Flavor button under the Flavor option in Admin and fill in the required parameters, especially these three: at least four vCPUs, at least 6144 MB of RAM, and at least 45 GB of disk space.
- Key pair: Although you can just use the default username and password to log in to the appliance at the VNC console, it is good to have access to a key pair as well, if required, for remote SSH.

Deploying the Red Hat CloudForms Management Engine appliance

Now that we are aware of the resource and security requirements for Red Hat CloudForms, let's look at how to obtain a copy of the appliance and run it.

Obtaining the appliance

The CloudForms Management Engine appliance for OpenStack can be downloaded from your Red Hat customer portal under the Red Hat CloudForms product page. You need access to a Red Hat CloudForms subscription to be able to do so. At the time of writing this book, the direct download link for this is https://rhn.redhat.com/rhn/software/channel/downloads/Download.do?cid=20037.

For more information on obtaining the subscription and appliance, or to request a trial, visit http://www.redhat.com/en/technologies/cloud-computing/cloudforms.

Note: If you are unable to get access to Red Hat CloudForms, ManageIQ (the open source version) can also be used for hands-on experience.
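For the security group rule and the flavor mentioned under Additional OpenStack requirements, the Horizon steps can also be done from the CLI. This is a suggested sketch rather than part of the original text; the flavor name m1.cfme is illustrative, and the commands assume the 2015-era nova CLI and the default security group:

   # a flavor with 4 vCPUs, 6144 MB RAM, and a 45 GB disk
   nova flavor-create m1.cfme auto 6144 45 4
   # allow HTTPS traffic to the appliance
   nova secgroup-add-rule default tcp 443 443 0.0.0.0/0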
Creating the appliance image in OpenStack Before launching the appliance, we need to create an image in OpenStack for the appliance, since OpenStack requires instances to be launched from an image. You can create a new Image under Project with the following parameters (see the screenshot given for assistance): Enter a name for the image. Enter the image location in Image Source (HTTP URL). Set the Format as QCOW2. Optionally, set the Minimum Disk size. Optionally, set Minimum Ram. Make it Public if required and Create An Image. Note that if you have a newer release of OpenStack, there may be some additional options, but the preceding are what need to be filled in—most importantly the download URL of the Red Hat CloudForms appliance. Wait for the Status field to reflect as Active before launching the instance, as shown in this screenshot: Launching the appliance instance In OpenStack, under Project, select Instances and then click on Launch Instance. In the Launch Instance wizard enter the following instance information in the Details tab: Select an Availabilty Zone. Enter an Instance Name. Select Flavor. Set Instance Count. Set Instance Boot Source as Boot from image. Select CloudForms Management Engine Appliance under Image Name. The final result should appear similar to the following figure: Under the Access & Security tab, ensure that the correct Key Pair and Security Group tab are selected, like this: For Networking, select the proper networks that will provide the required IP addresses and routing, as shown here: Other options, such as Post-Creation and Advanced Options, are optional and can be left blank. Click on Launch when ready to start creating the instance. Wait for the instance state to change to Running before proceeding to the next step. Note If you are accessing the CloudForms Management Engine from the Internet, a Floating IP address needs to be associated with the instance. This can be done from Project, under Access & Security and then the Floating IPs tab. The Red Hat CloudForms web console The web console provides a graphical user interface for working with the CloudForms Management Engine Appliance. The web console can be accessed from a browser on any machine that has network access to the CloudForms Management Engine server. System requirements The system requirements for accessing the Red Hat CloudForms web console are: A Windows, Linux, or Mac computer A modern browser, such as Mozilla Firefox, Google Chrome, and Internet Explorer 8 or above Adobe Flash Player 9 or above The CloudForms Management Engine Appliance must already be installed and activated in your enterprise environment Accessing the Red Hat CloudForms Management Engine web console Type the hostname or floating IP assigned to the instance prefixed by https in a supported browser to access the appliance. Enter default username as admin and the password as smartvm to log in to the appliance, as shown in this screenshot: You should log in to only one tab in each browser, as the console settings are saved for the active tab only. The CloudForms Management Engine also does not guarantee that the browser's Back button will produce the desired results. Use the breadcrumbs provided in the console. Navigating the web console The web console has a primary top-level menu that provides access to feature sets such as Insight, Control, and Automate, along with menus used to add infrastructure and cloud providers, create service catalogs and view or raise requests. 
The secondary menu appears below the top primary menu, and its options change based on the primary menu option selected. In certain cases, a third, sub-level menu may also appear for additional options based on the selection in the secondary menu. The feature sets available in Red Hat CloudForms are categorized under eight menu items:

Cloud Intelligence: This provides a dashboard view of your hybrid cloud infrastructure for the selected parameters. Whatever is displayed here can be configured as a widget. It also provides additional insights into the hybrid cloud in the form of reports, chargeback configuration and information, timeline views, and an RSS feeds section.

Services: This provides options for creating templates and service catalogs that help in provisioning multitier workloads across providers. It also lets you create and approve requests for these service catalogs.

Clouds: This option in the top menu lets you add cloud providers; define availability zones; and create tenants, flavors, security groups, and instances.

Infrastructure: This option, in a way similar to Clouds, lets you add infrastructure providers; define clusters; view, discover, and add hosts; provision VMs; work with data stores and repositories; view requests; and configure the PXE.

Control: This section lets you define compliance and control policies for the infrastructure providers using events, conditions, and actions based on the conditions. You can further combine these policies into policy profiles. Another important feature is alerting the administrators, which is configured from here. You can also simulate these policies, import and export them, and view logs.

Automate: This menu option lets you manage life cycle tasks such as provisioning and retirement, and automation of resources. You can create provisioning dialogs to provision hosts and virtual machines, and service dialogs to provision service catalogs. Dialog import/export, logs, and requests for automation are all managed from this menu option.

Optimize: This menu option provides utilization, planning, and bottleneck summaries for the hybrid cloud environment. You can also generate reports for these individual metrics.

Configure: Here, you can customize the look of the dashboard; view queued and running tasks; and check errors and warnings for VMs and the UI. It lets you configure the CloudForms Management Engine appliance settings, such as the database, additional worker appliances, SmartProxy, and white labelling. You can also perform maintenance tasks such as updates and manual modification of the CFME server configuration files.

Summary

In this article, we deployed the Red Hat CloudForms Management Engine Appliance in an OpenStack environment, and you learned where to configure the hostname, network settings, and time zone. We then used the floating IP of the instance to access the appliance from a web browser, and you learned where the different feature sets are and how to navigate around.

Resources for Article: Further resources on this subject: Introduction to Microsoft Azure Cloud Services [article] Apache CloudStack Architecture [article] Using OpenStack Swift [article]

Using the OpenStack Dashboard

Packt
18 Aug 2015
11 min read
In this article by Kevin Jackson, Cody Bunch, and Egle Sigler, the authors of the book OpenStack Cloud Computing Cookbook - Third Edition, we will cover the following topics: Using OpenStack Dashboard with LBaaS Using OpenStack Dashboard with OpenStack Orchestration (For more resources related to this topic, see here.) Using OpenStack Dashboard with LBaaS The OpenStack Dashboard has the ability to view, create, and edit Load Balancers, add Virtual IPs (VIPs), and add nodes behind a Load Balancer. Dashboard also provides a user interface for creating HA Proxy server Load Balance services for our instances. We do this first by creating load balancing pools and then adding running instances to those pools. In this section, we will create an HTTP Load Balance pool, create a VIP, and configure instances to be part of the pool. The result will be the ability to use the HTTP Load Balancer pool address to send traffic to two instances running Apache. Getting ready Load a web browser, point it to our OpenStack Dashboard address at http://192.168.100.200/, and log in as an admin user. How to do it... First we will create an HTTP Load Balance pool. Creating pools To create a Load Balancer pool for a logged in user, carry out the following steps: To manage Load Balancers within our OpenStack Dashboard, select the Load Balancers tab: This will show available Load Balancer pools. Since we currently do not have any created, click on the Add Pool button in the top-right corner to add a pool. After clicking on the Add Pool button, we are presented with a modal window. Fill out the details to add a new pool: Set name, description, and provider in the modal window. We name our pool web-pool and give an appropriate description. We choose to go with a default provider since we are creating an HA Proxy. Select a subnet for the pool by clicking on the drop-down menu. All of our instances are attached to the private network, so we select 10.200.0.0/24. We select the HTTP protocol, but HTTPS and TCP are also available. This selection will depend on what kind of applications you are running. Select your preferred routing algorithm; we choose the ROUND_ROBIN balancing method. Other options are LEAST_CONNECTIONS, and SOURCE_IP. We leave the Admin State set to UP. Click on the Add button to create a new pool. You should see the new pool created in the pool list: Adding pool members To add instances to the Load Balancer, follow these steps: After adding a pool, you should still be on the Load Balancer page. Click on the Members tab. You should see a list of active members, if you have any, or an empty list: On the Members tab, click on the Add Member button. This will present you with the following menu: We select the pool we just created, web-pool, and specify members or instances that will be part of this pool. Select weights for the members of the pool. In our case, both members of the pool will have equal weights, so we assign the weight as 1. The selected protocol port will be used to access all members of the pool and, since we are using HTTP, we set the port to 80. We set Admin State to UP. Click on the Add button to add members to the pool. Now the member list should contain two newly added nodes: Adding a VIP to the Load Balancer pool Creating a VIP on the external network will allow access to the Load Balance pool and the instances behind it. To create the VIP, carry out the following steps: From the Load Balancer page, select the Pools tab and click on the drop-down arrow next to the Edit Pool button. 
This will give you a drop-down menu with an option to add a VIP: Click on the Add VIP option. This will present you with the modal window for creating a new VIP: We enter a custom name and description for our VIP. For VIP Subnet, we pick external subnet 192.168.100.0/24, followed by an available IP in that subnet. We choose 192.168.100.12. Enter 80 for Protocol Port, followed by selecting HTTP for Protocol. Set -1 for Connection Limit if you do not wish to have a maximum number of connections for this VIP. Click on the Add button to create the VIP. This will create a new VIP and show the current pool list: Now, when we click on web-pool, we will see the following details: Click on the web-vip link in the details to view the VIP details: You can test this Load Balancer by entering the VIP's IP in a browser. If you selected ROUND_ROBIN for your routing algorithm, each time you refresh your browser it should hit a different node. Deleting the Load Balancer To delete the Load Balancer, we will first need to delete the attached VIP and then delete the pool. From the Load Balancer page, check the Pools tab and click on the drop-down arrow next to the Edit Pool button. This will give you a drop-down menu with an option to delete a VIP: Selecting the Delete VIP drop-down option will give you a warning and ask you to confirm the deletion of the VIP. Click on the Delete VIP button to confirm: After deleting the VIP, now we can delete the pool. From the Load Balancer page's Pools tab, click on the drop-down arrow next to the appropriate Edit Pool button: Select the Delete Pool option from the drop-down list to delete the pool. You will get asked to confirm the deletion. If you are ready to delete the Load Balance pool, click on the Delete Pool button: How it works... We created a Load Balance pool and added two instances with Apache to it. We also created a virtual IP to be used on the external network and assigned it to our pool. To do this, we executed the following steps: Create a pool from the Load Balancer page's Pools tab. Select the subnet to which all the nodes are attached when creating the pool. Add members to the pool. Create a VIP for the pool. Both the pool and VIP can be edited after being created. Additional members can also be added to the pool at a later time. Using OpenStack Dashboard with OpenStack Orchestration Heat is the OpenStack Orchestration engine that enables users to quickly spin up whole environments using templates. Heat templates, known as Heat Orchestration Templates (HOT), are Yet Another Markup Language (YAML) based files. The files describe the resources being used, the type and the size of the instances, the network an instance will be attached to, among other pieces of information required to run that environment. We showed you how to use the Heat command line client. In this section, we will show how to use an existing Heat template file in OpenStack Dashboard to spin up two web servers running Apache, connected to a third instance running HA Proxy. Getting ready Load a web browser, point it to our OpenStack Dashboard address at http://192.168.100.200/ and log in as an admin user. How to do it... First, we will launch stack within our OpenStack Dashboard. Launching stacks To launch a Heat stack for a logged in user, carry out the following steps: To view available Heat stacks within our OpenStack Dashboard, select the Stacks tab under the Orchestration menu: After clicking on the Stacks tab, you will see all running stacks in your environment. 
In our case, our list is empty: Click on the Launch Stack button to create a new stack. You will see the following window: There are several ways to specify what template source to use in a stack: File, Direct Input, or URL. Choose which option is the most convenient for you. For our example, you can either use the URL directly or upload a file. The template file can be downloaded from https://raw.githubusercontent.com/OpenStackCookbook/OpenStackCookbook/master/cookbook.yaml. We will upload files from our system. Just like we downloaded the cookbook.yaml file, we can also download the Environment Source file. In this case, we do not have to use the environment source, but it makes it convenient. The environment file stores the values we would have to enter into the browser manually, but instead loads the values for us on the Launch Stack screen, as shown in step 8. In our example, we are using the environment file that can be downloaded from https://raw.githubusercontent.com/OpenStackCookbook/OpenStackCookbook/master/cookbook-env.yaml. Update the public_net_id, private_net_id, and private_subnet_id fields to match your environment. After selecting the Template Source and Environment Source files, click on Next: Our sample environment file contains the following code: parameters: key_name: demokey image: trusty-image flavor: m1.tiny public_net_id: 5e5d24bd-9d1f-4ed1-84b5-0b7e2a9a233b private_net_id: 25153759-994f-4835-9b13-bf0ec77fb336 private_subnet_id: 4cf2c09c-b3d5-40ed-9127-ec40e5e38343 Clicking on Next will give you a Launch Stack window with all the inputs: Note that most of the inputs in our template are now populated. If you did not specify the environment source file in the previous step, you will need to enter the key_name, image, flavor, public_net_id, private_net_id, and private_subnet_id fields. These fields are specific to each template used. Your templates may have different fields. Enter the stack name and user password for your user. If you are logged in as admin or demo, the password is openstack. Click on the Launch button to start stack creation. If all inputs were correct, you should see your stack being created: After the stack creation finishes and if there were no errors during creation, you will see your stack's status updated to Complete: Viewing stack details After launching a stack, there is a lot of information associated with it, including inputs, outputs, and, in the case of errors, information about why stack creation failed. To view the details of the stack, click on the stack name from the Stacks list. The first available view is Topology: Explore the topology by clicking on the nodes. If the graph does not fully fit or you would like a different perspective, you can drag the graph around the window. The next tab under Stack Detail will provide all of the information that was used in creating the stack: Stack information available in the Overview tab is as follows:    Info    Status    Outputs    Stack parameters    Launch parameters The Resources tab will show all the HEAT resources that were created during stack launch: If there were any errors during stack launch, check this page to see which component's creation failed. The Events tab shows all the events that occurred when the stack was created. This page can also be very helpful in troubleshooting Heat templates: While your Heat stack is running, you can also see how many instances it created under the Compute tab's Instance option. 
The following is what our instances look like on the Instances page: Note that the test1 instance was not part of the stack creation. All the other VMs were created during the stack launch. Deleting stacks Stack deletion is simple; however, it will delete all resources that were created during stack launch. Follow these steps: To delete a stack, first view the available stacks on the Stacks page: Click on the appropriate Delete Stack button to delete a stack. You will be asked to confirm the deletion: After confirming deletion, all resources associated with the stack will be deleted. How it works... We have used the OpenStack Dashboard to launch, view, and delete Orchestration stacks. We first needed to download a sample HA Proxy Heat Orchestration Template from GitHub. Since we were using an environment file, we also had to modify the appropriate inputs. Your own templates may have different inputs. After launching our HA Proxy stack, we explored its topology, resources, and events. Resources created during stack launch will also be reflected in the rest of your environment. If you are launching new instances, all of them will also be available on the Instance page. Delete and modify resources created during the stack launch only through Orchestration section in the OpenStack dashboard or on the command line. Deleting stacks through the dashboard will delete all associated resources. Unlock the full potential of OpenStack with 'OpenStack Cloud Computing Third Edition' - updated and improved for the latest challenges and opportunities unleashed by cloud.
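For readers who prefer the command line, the flattened parameter listing shown earlier corresponds to an environment file laid out like this, and the python-heatclient commands of that era can drive the same workflow. The stack name cookbook is illustrative:

parameters:
  key_name: demokey
  image: trusty-image
  flavor: m1.tiny
  public_net_id: 5e5d24bd-9d1f-4ed1-84b5-0b7e2a9a233b
  private_net_id: 25153759-994f-4835-9b13-bf0ec77fb336
  private_subnet_id: 4cf2c09c-b3d5-40ed-9127-ec40e5e38343

# Launch the stack from the template plus the environment file
heat stack-create -f cookbook.yaml -e cookbook-env.yaml cookbook

# Watch the stack and its resources come up, and inspect events when troubleshooting
heat stack-list
heat resource-list cookbook
heat event-list cookbook

# Tear everything down again (this deletes all resources created by the stack)
heat stack-delete cookbook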

Achieving High-Availability on AWS Cloud

Packt
11 Aug 2015
18 min read
In this article, by Aurobindo Sarkar and Amit Shah, author of the book Learning AWS, we will introduce some key design principles and approaches to achieving high availability in your applications deployed on the AWS cloud. As a good practice, you want to ensure that your mission-critical applications are always available to serve your customers. The approaches in this article will address availability across the layers of your application architecture including availability aspects of key infrastructural components, ensuring there are no single points of failure. In order to address availability requirements, we will use the AWS infrastructure (Availability Zones and Regions), AWS Foundation Services (EC2 instances, Storage, Security and Access Control, Networking), and the AWS PaaS services (DynamoDB, RDS, CloudFormation, and so on). (For more resources related to this topic, see here.) Defining availability objectives Achieving high availability can be costly. Therefore, it is important to ensure that you align your application availability requirements with your business objectives. There are several options to achieve the level of availability that is right for your application. Hence, it is essential to start with a clearly defined set of availability objectives and then make the most prudent design choices to achieve those objectives at a reasonable cost. Typically, all systems and services do not need to achieve the highest levels of availability possible; at the same time ensure you do not introduce a single point of failure in your architecture through dependencies between your components. For example, a mobile taxi ordering service needs its ordering-related service to be highly available; however, a specific customer's travel history need not be addressed at the same level of availability. The best way to approach high availability design is to assume that anything can fail, at any time, and then consciously design against it. "Everything fails, all the time." - Werner Vogels, CTO, Amazon.com In other words, think in terms of availability for each and every component in your application and its environment because any given component can turn into a single point of failure for your entire application. Availability is something you should consider early on in your application design process, as it can be hard to retrofit it later. Key among these would be your database and application architecture (for example, RESTful architecture). In addition, it is important to understand that availability objectives can influence and/or impact your design, development, test, and running your system on the cloud. Finally, ensure you proactively test all your design assumptions and reduce uncertainty by injecting or forcing failures instead of waiting for random failures to occur. The nature of failures There are many types of failures that can happen at any time. These could be a result of disk failures, power outages, natural disasters, software errors, and human errors. In addition, there are several points of failure in any given cloud application. These would include DNS or domain services, load balancers, web and application servers, database servers, application services-related failures, and data center-related failures. You will need to ensure you have a mitigation strategy for each of these types and points of failure. It is highly recommended that you automate and implement detailed audit trails for your recovery strategy, and thoroughly test as many of these processes as possible. 
In the next few sections, we will discuss various strategies to achieve high availability for your application. Specifically, we will discuss the use of AWS features and services such as: VPC Amazon Route 53 Elastic Load Balancing, auto-scaling Redundancy Multi-AZ and multi-region deployments Setting up VPC for high availability Before setting up your VPC, you will need to carefully select your primary site and a disaster recovery (DR) site. Leverage AWS's global presence to select the best regions and availability zones to match your business objectives. The choice of a primary site is usually the closest region to the location of a majority of your customers and the DR site could be in the next closest region or in a different country depending on your specific requirements. Next, we need to set up the network topology, which essentially includes setting up the VPC and the appropriate subnets. The public facing servers are configured in a public subnet; whereas the database servers and other application servers hosting services such as the directory services will usually reside in the private subnets. Ensure you chose different sets of IP addresses across the different regions for the multi-region deployment, for example 10.0.0.0/16 for the primary region and 192.168.0.0/16 for the secondary region to avoid any IP addressing conflicts when these regions are connected via a VPN tunnel. Appropriate routing tables and ACLs will also need to be defined to ensure traffic can traverse between them. Cross-VPC connectivity is required so that data transfer can happen between the VPCs (say, from the private subnets in one region over to the other region). The secure VPN tunnels are basically IPSec tunnels powered by VPN appliances—a primary and a secondary tunnel should be defined (in case the primary IPSec tunnel fails). It is imperative you consult with your network specialists through all of these tasks. An ELB is configured in the primary region to route traffic across multiple availability zones; however, you need not necessarily commission the ELB for your secondary site at this time. This will help you avoid costs for the ELB in your DR or secondary site. However, always weigh these costs against the total cost/time required for recovery. It might be worthwhile to just commission the extra ELB and keep it running. Gateway servers and NAT will need to be configured as they act as gatekeepers for all inbound and outbound Internet access. Gateway servers are defined in the public subnet with appropriate licenses and keys to access your servers in the private subnet for server administration purposes. NAT is required for servers located in the private subnet to access the Internet and is typically used for automatic patch updates. Again, consult your network specialists for these tasks. Elastic load balancing and Amazon Route 53 are critical infrastructure components for scalable and highly available applications; we discuss these services in the next section. Using ELB and Route 53 for high availability In this section, we describe different levels of availability and the role ELBs and Route 53 play from an availability perspective. Instance availability The simplest guideline here is to never run a single instance in a production environment. The simplest approach to improving greatly from a single server scenario is to spin up multiple EC2 instances and stick an ELB in front of them. The incoming request load is shared by all the instances behind the load balancer. 
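As a rough illustration of the building blocks described so far, the following AWS CLI sketch creates a VPC with a public subnet in each of two Availability Zones and puts a classic ELB with a health check in front of them. All names, IDs, and CIDR blocks are placeholders rather than values from the book; in practice you would substitute the VPC and subnet IDs returned by the earlier calls:

# VPC for the primary region (10.0.0.0/16, as suggested above)
aws ec2 create-vpc --cidr-block 10.0.0.0/16

# One public subnet per Availability Zone (substitute the VPC ID returned above)
aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-11111111 --cidr-block 10.0.2.0/24 --availability-zone us-east-1b

# Classic ELB spanning both subnets, listening on HTTP port 80
aws elb create-load-balancer --load-balancer-name web-elb \
  --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
  --subnets subnet-aaaaaaaa subnet-bbbbbbbb

# Health check used to take unhealthy instances out of rotation
aws elb configure-health-check --load-balancer-name web-elb \
  --health-check Target=HTTP:80/index.html,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3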
ELB uses the least outstanding requests routing algorithm to spread HTTP/HTTPS requests across healthy instances. This algorithm favors instances with the fewest outstanding requests. Even though it is not recommended to have different instance sizes between or within the AZs, the ELB will adjust the number of requests it sends to smaller or larger instances based on response times. In addition, ELBs use cross-zone load balancing to distribute traffic across all healthy instances regardless of AZs. Hence, ELBs help balance the request load even if there is an unequal number of instances in different AZs at any given time (perhaps due to a failed instance in one of the AZs). There is no bandwidth charge for cross-zone traffic (if you are using an ELB).

Instances that fail can be seamlessly replaced using auto scaling while other instances continue to operate. Though auto-replacement of instances works really well, storing application state or caching locally on your instances can make problems harder to detect and recover from when instances are replaced. Instance failure is detected and the traffic is shifted to healthy instances, which then carry the additional load.

Health checks are used to determine the health of the instances and the application. TCP and/or HTTP-based heartbeats can be created for this purpose. It is worthwhile implementing health checks iteratively to arrive at the right set that meets your goals. In addition, you can customize the frequency and the failure thresholds as well. Finally, if all your instances are down, then AWS will return a 503.

Zonal availability or availability zone redundancy

Availability zones are distinct geographical locations engineered to be insulated from failures in other zones. It is critically important to run your application stack in more than one zone to achieve high availability. However, be mindful of component-level dependencies across zones and cross-zone service calls leading to substantial latencies in your application or application failures during availability zone failures.

For sites with very high request loads, a three-zone configuration might be preferred to handle zone-level failures. In this situation, if one zone goes down, the other two AZs can ensure continuing high availability and a better customer experience. In the event of a zone failure, there are several challenges in a Multi-AZ configuration, resulting from the rapidly shifting traffic to the other AZs. In such situations, the load balancers need to expire connections quickly, and lingering connections to caches must be addressed. In addition, careful configuration is required for smooth failover by ensuring all clusters are appropriately auto scaled, avoiding cross-zone calls in your services, and avoiding mismatched timeouts across your architecture.

ELBs can be used to balance across multiple availability zones. Each load balancer will contain one or more DNS records. The DNS record will contain multiple IP addresses, and DNS round-robin can be used to balance traffic between the availability zones. You can expect the DNS records to change over time. Using multiple AZs can result in traffic imbalances between AZs due to clients caching DNS records; however, ELBs can help reduce the impact of this caching.

Regional availability or regional redundancy

ELB and Amazon Route 53 have been integrated to support a single application across multiple regions. Route 53 is AWS's highly available and scalable DNS and health checking service.
Route 53 supports high availability architectures by health checking load balancer nodes and rerouting traffic to avoid the failed nodes, and by supporting implementation of multi-region architectures. In addition, Route 53 uses Latency Based Routing (LBR) to route your customers to the endpoint that has the least latency. If multiple primary sites are implemented with appropriate health checks configured, then in cases of failure, traffic shifts away from that site to the next closest region. Region failures can present several challenges as a result of rapidly shifting traffic (similar to the case of zone failures). These can include auto scaling, time required for instance startup, and the cache fill time (as we might need to default to our data sources, initially). Another difficulty usually arises from the lack of information or clarity on what constitutes the minimal or critical stack required to keep the site functioning as normally as possible. For example, any or all services will need to be considered as critical in these circumstances. The health checks are essentially automated requests sent over the Internet to your application to verify that your application is reachable, available, and functional. This can include both your EC2 instances and your application. As answers are returned only for the resources that are healthy and reachable from the outside world, the end users can be routed away from a failed application. Amazon Route 53 health checks are conducted from within each AWS region to check whether your application is reachable from that location. The DNS failover is designed to be entirely automatic. After you have set up your DNS records and health checks, no manual intervention is required for failover. Ensure you create appropriate alerts to be notified when this happens. Typically, it takes about 2 to 3 minutes from the time of the failure to the point where traffic is routed to an alternate location. Compare this to the traditional process where an operator receives an alarm, manually configures the DNS update, and waits for the DNS changes to propagate. The failover happens entirely within the Amazon Route 53 data plane. Depending on your availability objectives, there is an additional strategy (using Route 53) that you might want to consider for your application. For example, you can create a backup static site to maintain a presence for your end customers while your primary dynamic site is down. In the normal course, Route 53 will point to your dynamic site and maintain health checks for it. Furthermore, you will need to configure Route 53 to point to the S3 storage, where your static site resides. If your primary site goes down, then traffic can be diverted to the static site (while you work to restore your primary site). You can also combine this static backup site strategy with a multiple region deployment. Setting up high availability for application and data layers In this section, we will discuss approaches for implementing high availability in the application and data layers of your application architecture. The auto healing feature of AWS OpsWorks provides a good recovery mechanism from instance failures. All OpsWorks instances have an agent installed. If an agent does not communicate with the service for a short duration, then OpsWorks considers the instance to have failed. If auto healing is enabled at the layer and an instance becomes unhealthy, then OpsWorks first terminates the instance and starts a new one as per the layer configuration. 
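The Route 53 health checks and DNS failover described above can be configured with the AWS CLI. A rough sketch follows; the domain, hosted zone ID, and endpoint values are placeholders, and a matching secondary record with "Failover": "SECONDARY" would point at the backup site or static S3 website:

# Health check that probes the primary site's load balancer endpoint
aws route53 create-health-check --caller-reference primary-site-check \
  --health-check-config Type=HTTP,FullyQualifiedDomainName=primary-elb.example.com,Port=80,ResourcePath=/

# Primary failover record that uses the health check created above
aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "www.example.com",
      "Type": "CNAME",
      "TTL": 60,
      "SetIdentifier": "primary",
      "Failover": "PRIMARY",
      "HealthCheckId": "<health-check-id-from-previous-command>",
      "ResourceRecords": [{"Value": "primary-elb.example.com"}]
    }
  }]
}'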
In the application layer, we can also perform cold starts from preconfigured images or warm starts from scaled-down instances for your web servers and application servers in a secondary region. By leveraging auto scaling, we can quickly ramp up these servers to handle full production loads. In this configuration, you would deploy the web servers and application servers across multiple AZs in your primary region, while the standby servers need not be launched in your secondary region until you actually need them. However, keep the preconfigured AMIs for these servers ready to launch in your secondary region.

The data layer can comprise SQL databases, NoSQL databases, caches, and so on. These can be AWS managed services such as RDS, DynamoDB, and S3, or your own SQL and NoSQL databases such as Oracle, SQL Server, or MongoDB running on EC2 instances. AWS services come with HA built in, while using database products running on EC2 instances offers a do-it-yourself option. It can be advantageous to use AWS services if you want to avoid taking on database administration responsibilities. For example, with the increasing sizes of your databases, you might choose to shard your databases, which is easy to do. However, resharding your databases while taking in live traffic can be a very complex undertaking and present availability risks. Choosing to use the AWS DynamoDB service in such a situation offloads this work to AWS, thereby resulting in higher availability out of the box.

AWS provides many different data replication options, and we will discuss a few of those in the following several paragraphs. DynamoDB automatically replicates your data across several AZs to provide higher levels of data durability and availability. In addition, you can use data pipelines to copy your data from one region to another. The DynamoDB Streams functionality can also be leveraged to replicate to another DynamoDB table in a different region. For very high volumes, the low-latency Kinesis services can also be used for this replication across multiple regions distributed all over the world. You can also enable the Multi-AZ setting for the AWS RDS service to ensure AWS replicates your data to a different AZ within the same region. In the case of Amazon S3, the S3 bucket contents can be copied to a different bucket, and the failover can be managed on the client side. Depending on the volume of data, always think in terms of multiple machines, multiple threads, and multiple parts to significantly reduce the time it takes to upload data to S3 buckets.

While using your own database (running on EC2 instances), use your database-specific high availability features for within-region and cross-region database deployments. For example, if you are using SQL Server, you can leverage the SQL Server AlwaysOn feature for synchronous and asynchronous replication across the nodes. If the volume of data is high, then you can also use SQL Server log shipping to first upload your data to Amazon S3 and then restore it into your SQL Server instance on AWS. A similar approach for Oracle databases uses the OSB Cloud Module and RMAN. You can also replicate your non-RDS databases (on-premise or on AWS) to AWS RDS databases. You will typically define two nodes in the primary region with synchronous replication and a third node in the secondary region with asynchronous replication. NoSQL databases such as MongoDB and Cassandra have their own asynchronous replication features that can be leveraged for replication to a different region.
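For the managed data services mentioned above, much of the redundancy is a matter of flags at creation time. The following AWS CLI sketch is illustrative only; identifiers, instance classes, table attributes, and credentials are placeholders:

# RDS MySQL instance with a synchronous standby in another AZ of the same region
aws rds create-db-instance --db-instance-identifier appdb \
  --engine mysql --db-instance-class db.m3.medium \
  --allocated-storage 100 --multi-az \
  --master-username dbadmin --master-user-password '<password>'

# DynamoDB table; replication across the AZs of the region is handled by the service itself
aws dynamodb create-table --table-name sessions \
  --attribute-definitions AttributeName=session_id,AttributeType=S \
  --key-schema AttributeName=session_id,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=10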
In addition, you can create Read Replicas for your databases in other AZs and regions. In this case, if your master database fails followed by a failure of your secondary database, then one of the read replicas can be promoted to being the master. In hybrid architectures, where you need to replicate between on-premise and AWS data sources, you can do so through a VPN connection between your data center and AWS. In case of any connectivity issues, you can also temporarily store pending data updates in SQS, and process them when the connectivity is restored. Usually, data is actively replicated to the secondary region while all other servers like the web servers and application servers are maintained in a cold state to control costs. However, in cases of high availability for web scale or mission critical applications, you can also choose to deploy your servers in active configuration across multiple regions. Implementing high availability in the application In this section, we will discuss a few design principles to use in your application from a high availability perspective. We will briefly discuss using highly available AWS services to implement common features in mobile and Internet of Things (IoT) applications. Finally, we also cover running packaged applications on the AWS cloud. Designing your application services to be stateless and following a micro services-oriented architecture approach can help the overall availability of your application. In such architectures, if a service fails then that failure is contained or isolated to that particular service while the rest of your application services continue to serve your customers. This approach can lead to an acceptable degraded experience rather than outright failures or worse. You should also store user or session information in a central location such as the AWS ElastiCache and then spread information across multiple AZs for high availability. Another design principle is to rigorously implement exception handling in your application code, and in each of your services to ensure graceful exit in case of failures. Most mobile applications share common features including user authentication and authorization, data synchronization across devices; user behavior analytics; retention tracking, storing, sharing, and delivering media globally; sending push notifications; store shared data; stream real-time data; and so on. There are a host of highly available AWS services that can be used for implementing such mobile application functionality. For example, you can use Amazon Cognito to authenticate users, Amazon Mobile Analytics for analyzing user behavior and tracking retention, Amazon SNS for push notifications and Amazon Kinesis for streaming real-time data. In addition, other AWS services such as S3, DynamoDB, IAM, and so on can also be effectively used to complete most mobile application scenarios. For mobile applications, you need to be especially sensitive about latency issues; hence, it is important to leverage AWS regions to get as close to your customers as possible. Similar to mobile applications, for IoT applications you can use the same highly available AWS services to implement common functionality such as device analytics and device messaging/notifications. You can also leverage Amazon Kinesis to ingest data from hundreds of thousands of sensors that are continuously generating massive quantities of data. Aside from your own custom applications, you can also run packaged applications such as SAP on AWS. 
These would typically include replicated standby systems, Multi-AZ and multi-region deployments, hybrid architectures spanning your own data center, and AWS cloud (connected via VPN or AWS Direct Connect service), and so on. For more details, refer to the specific package guides for achieving high availability on the AWS cloud. Summary In this article, we reviewed some of the strategies you can follow for achieving high availability in your cloud application. We emphasized the importance of both designing your application architecture for availability and using the AWS infrastructural services to get the best results. Resources for Article: Further resources on this subject: Securing vCloud Using the vCloud Networking and Security App Firewall [article] Introduction to Microsoft Azure Cloud Services [article] AWS Global Infrastructure [article]

Setting up VPNaaS in OpenStack

Packt
11 Aug 2015
7 min read
In this article by Omar Khedher, author of the book Mastering OpenStack, we will create a VPN setup between two sites using the Neutron VPN driver plugin. VPN as a Service (VPNaaS) is a network feature set provided by Neutron, the OpenStack networking service, and it has been fully integrated into OpenStack since the Havana release. We will set up two private networks in two different OpenStack sites and create some IPSec site-to-site connections. The instances that are located in each OpenStack private network should be able to connect to each other across the VPN tunnel. The article assumes that you have two OpenStack environments in different networks.

General settings

The following figure illustrates the VPN topology between two OpenStack sites, providing a secure connection between a database instance that runs in a data center in Amsterdam and a web server that runs in a data center in Tunis. Each site has a private and a public network, as well as subnets that are managed by a Neutron router.

Enabling the VPNaaS

The VPN driver plugin must be configured in /etc/neutron/neutron.conf. To enable it, add the following driver:
# nano /etc/neutron/neutron.conf
service_plugins = neutron.services.vpn.plugin.VPNDriverPlugin

Next, we'll add the VPNaaS module interface in /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py, as follows:
'enable_VPNaaS': True,

Finally, restart the web server and the neutron-server service using the following commands:
# service httpd restart
# /etc/init.d/neutron-server restart

Site's configuration

Let's assume that each OpenStack site has at least one router attached to an external gateway that can provide access to the private network attached to an instance, such as a database or a web server virtual machine. A simple network topology of a private OpenStack environment running in the Amsterdam site will look like this:

The private OpenStack environment running in the Tunis site will look like the following image:

VPN management in the CLI

Neutron offers several commands that can be used to create and manage VPN connections in OpenStack. The essential commands associated with VPN management include the following:

vpn-service-create
vpn-service-delete
vpn-service-list
vpn-ikepolicy-create
vpn-ikepolicy-list
vpn-ipsecpolicy-create
vpn-ipsecpolicy-delete
vpn-ipsecpolicy-list
vpn-ipsecpolicy-show
ipsec-site-connection-create
ipsec-site-connection-delete
ipsec-site-connection-list

Creating an IKE policy

In the first VPN phase, we can create an IKE policy in Horizon.
The following figure shows a simple IKE setup of the OpenStack environment that is located in the Amsterdam site: The same settings can be applied via the Neutron CLI using the vpn-ikepolicy-create command, as follows: Syntax: neutron vpn-ikepolicy-create --description <description> --auth-algorithm <auth-algorithm> --encryption-algorithm <encryption-algorithm> --ike-version <ike-version> --lifetime <units=UNITS, value=VALUE> --pfs <pfs> --phase1-negotiation-mode <phase1-negotiation-mode> --name <NAME> Creating an IPSec policy An IPSec policy in an OpenStack environment that is located in the Amsterdam site can be created in Horizon in the following way: The same settings can be applied via the Neutron CLI using the vpn-ipsecpolicy-create command, as follows: Syntax: neutron vpn-ipsecpolicy-create --description <description> --auth-algorithm <auth-algorithm> --encapsulation-mode <encapsulation-mode> --encryption-algorithm <encryption-algorithm> --lifetime <units=UNITS,value=VALUE> --pfs <pfs> --transform-protocol <transform-algorithm> --name <NAME> Creating a VPN service To create a VPN service, you will need to specify the router facing the external and attach the web server instance to the private network in the Amsterdam site. The router will act as a VPN gateway. We can add a new VPN service from Horizon in the following way: The same settings can be applied via the Neutron CLI using the vpn-service-create command, as follows: Syntax: neutron vpn-service-create --tenant-id <tenant-id> --description <description> ROUTER SUBNET --name <name> Creating an IPSec connection The following step requires an identification of the peer gateway of the remote site that is located in Tunis. We will need to collect the IPv4 address of the external gateway interface of the remote site as well as the remote private subnet. In addition to this, a pre-shared key (PSK) has to be defined and exchanged between both the sites in order to bring the tunnel up after establishing a successful PSK negotiation during the second phase of the VPN setup. You can collect the router information either from Horizon or from the command line. To get subnet-related information on the OpenStack Cloud that is located in the Tunis site, you can run the following command in the network node: # neutron router-list The preceding command yields the following result: We can then list the ports of Router-TN and check the attached subnets of each port using the following command line: # neutron router-port-list Router-TN The preceding command gives the following result: Now, we can add a new IPSec site connection from the Amsterdam OpenStack Cloud in the following way: Similarly, we can perform the same configuration to create the VPN service using the neutron ipsec-site-connection-create command line, as follows: Syntax: neutron ipsec-site-connection-create --name <NAME> --description <description> ---vpnservice-id <VPNSERVICE> --ikepolicy-id <IKEPOLICY> --ipsecpolicy-id <IPSECPOLICY> --peer-address <PEER-ADDRESS> --peer-id <PEER-ID> --psk <PRESHAREDKEY> Remote site settings A complete VPN setup requires you to perform the same steps in the OpenStack Cloud that is located in the Tunis site. For a successful VPN phase 1 configuration, you must set the same IKE and IPSec policy attributes and change only the naming convention for each IKE and IPSec setup. Creating a VPN service on the second OpenStack Cloud will look like this: The last tricky part involves gathering the same router information in the Amsterdam site. 
Using the command line in the network node in the Amsterdam Cloud side, we can perform the following Neutron command: # neutron router-list The preceding command yields the following result: To get a list of networks that are attached to Router-AMS, we can execute the following command line: # neutron router-port-list Router-AMS The preceding command gives the following result: The external and private subnets that are associated with Router-AMS can be checked from Horizon as well. Now that we have the peer router gateway and remote private subnet, we will need to fill the same PSK that was configured previously. The next figure illustrates the new IPSec site connection on the OpenStack Cloud that is located in the Tunis site: At the Amsterdam site, we can check the creation of the new IPSec site connection by means of the neutron CLI, as follows: # neutron ipsec-site-connection-list The preceding command will give the following result: The same will be done at the Tunis site, as follows: # neutron ipsec-site-connection-list The preceding command gives the following result: Managing security groups For VPNaaS to work when connecting the Amsterdam and Tunis subnets, you will need to create a few additional rules in each project's default security group to enable not only the pings by adding a general ICMP rule, but also SSH on port 22. Additionally, we can create a new security group called Application_PP to restrict traffic on port 53 (DNS), 80 (HTTP), and 443 (HTTPS), as follows: # neutron security-group-create Application_PP --description "allow web traffic from the Internet" # neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 80 --port_range_max 80 Application_PP --remote-ip-prefix 0.0.0.0/0 # neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 53 --port_range_max 53 Application_PP --remote-ip-prefix 0.0.0.0/0 # neutron security-group-rule-create --direction ingress --protocol tcp --port_range_min 443 --port_range_max 443 Application_PP --remote-ip-prefix 0.0.0.0/0 From Horizon, we will see the following security group rules added: Summary In this article, we created a VPN setup between two sites using the Neutron VPN driver plugin in Neutron. This process included enabling the VPNaas, configuring the site, and creating an IKE policy, an IPSec policy, a VPN service, and an IPSec connection. To continue Mastering OpenStack, take a look to see what else you will learn in the book here.
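For reference, the IKE policy, IPSec policy, VPN service, and IPSec site connection created from Horizon in this article map onto the Neutron CLI roughly as follows, run on the Amsterdam side. The policy names, subnet name, addresses, CIDRs, and PSK are illustrative and must match the values configured on the Tunis side:

neutron vpn-ikepolicy-create --auth-algorithm sha1 --encryption-algorithm aes-128 \
  --ike-version v1 --pfs group5 --phase1-negotiation-mode main AMS-ike-policy

neutron vpn-ipsecpolicy-create --auth-algorithm sha1 --encapsulation-mode tunnel \
  --encryption-algorithm aes-128 --pfs group5 --transform-protocol esp AMS-ipsec-policy

neutron vpn-service-create --name AMS-vpn-service Router-AMS private-subnet-ams

neutron ipsec-site-connection-create --name AMS-to-TN \
  --vpnservice-id AMS-vpn-service --ikepolicy-id AMS-ike-policy --ipsecpolicy-id AMS-ipsec-policy \
  --peer-address 172.24.4.233 --peer-id 172.24.4.233 \
  --peer-cidr 10.10.10.0/24 --psk secretpsk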

AWS Global Infrastructure

Packt
06 Jul 2015
5 min read
In this article by Uchit Vyas, who is the author of the Mastering AWS Development book, we will see how to use AWS services in detail. It is important to have a choice of placing applications as close as possible to your users or customers, in order to ensure the lowest possible latency and best user experience while deploying them. AWS offers a choice of nine regions located all over the world (for example, East Coast of the United States, West Coast of the United States, Europe, Tokyo, Singapore, Sydney, and Brazil), 26 redundant Availability Zones, and 53 Amazon CloudFront points of presence. (For more resources related to this topic, see here.) It is very crucial and important to have the option to put applications as close as possible to your customers and end users by ensuring the best possible lowest latency and user-expected features and experience, when you are creating and deploying apps for performance. For this, AWS provides worldwide means to the regions located all over the world. To be specific via name and location, they are as follows: US East (Northern Virginia) region US West (Oregon) region US West (Northern California) region EU (Ireland) Region Asia Pacific (Singapore) region Asia Pacific (Sydney) region Asia Pacific (Tokyo) region South America (Sao Paulo) region US GovCloud In addition to regions, AWS has 25 redundant Availability Zones and 51 Amazon CloudFront points of presence. Apart from these infrastructure-level highlights, they have plenty of managed services that can be the cream of AWS candy bar! The managed services bucket has the following listed services: Security: For every organization, security in each and every aspect is the vital element. For that, AWS has several remarkable security features that distinguishes AWS from other Cloud providers as follows : Certifications and accreditations Identity and Access Management Right now, I am just underlining the very important security features. Global infrastructure: AWS provides a fully-functional, flexible technology infrastructure platform worldwide, with managed services over the globe with certain characteristics, for example: Multiple global locations for deployment Low-latency CDN service Reliable, low-latency DNS service Compute: AWS offers a huge range of various cloud-based core computing services (including variety of compute instances that can be auto scaled to justify the needs of your users and application), a managed elastic load balancing service, and more of fully managed desktop resources on the pathway of cloud. Some of the common characteristics of computer services include the following: Broad choice of resizable compute instances Flexible pricing opportunities Great discounts for always on compute resources Lower hourly rates for elastic workloads Wide-ranging networking configuration selections A widespread choice of operating systems Virtual desktops Save further as you grow with tiered pricing model Storage: AWS offers low cost with high durability and availability with their storage services. With pay-as-you-go pricing model with no commitment, provides more flexibility and agility in services and processes for storage with a highly secured environment. AWS provides storage solutions and services for backup, archive, disaster recovery, and so on. They also support block, file, and object kind of storages with a highly available and flexible infrastructure. 
A few major characteristics of the storage services are as follows:

Cost-effective, high-scale storage varieties
Data protection and data management
Storage gateway
Choice of instance storage options

Content delivery and networking: AWS offers a wide set of networking services that enables you to create a logically isolated network that the architect defines and to create a private network connection to the AWS infrastructure, with a fault tolerant, scalable, and highly available DNS service. It also provides content delivery services to your end users, with very low latency and high data transfer speeds, through the AWS CDN service. A few major characteristics of content delivery and networking are as follows:

Application and media files delivery
Software and large file distribution
Private content

Databases: AWS offers fully managed, distributed relational and NoSQL database services. Moreover, the database services are capable of in-memory caching, sharding, and scaling with or without data warehouse solutions. A few major database services are as follows:

RDS
DynamoDB
Redshift
ElastiCache

Application services: AWS provides a variety of managed application services at lower cost, such as application streaming and queuing, transcoding, push notification, searching, and so on. A few major application services are as follows:

AppStream
CloudSearch
Elastic Transcoder
SWF, SES, SNS, SQS

Deployment and management: AWS offers management of credentials to explore AWS services, such as monitoring services, application services, and updating stacks of AWS resources. They also have deployment and security services, along with AWS API activity tracking. A few major deployment and management services are as follows:

IAM
CloudWatch
Elastic Beanstalk
CloudFormation
Data Pipeline
OpsWorks
CloudHSM
CloudTrail

Summary

There are a few more important services from AWS, such as support, integration with existing infrastructure, Big Data, and the ecosystem, which put it on top of other infrastructure providers. As a cloud architect, it is necessary to learn about cloud service offerings and their all-important functionalities.

Resources for Article: Further resources on this subject: Amazon DynamoDB - Modelling relationships, Error handling [article] Managing Microsoft Cloud [article] Amazon Web Services [article]

Installing OpenStack Swift

Packt
04 Jun 2015
10 min read
In this article by Amar Kapadia, Sreedhar Varma, and Kris Rajana, authors of the book OpenStack Object Storage (Swift) Essentials, we will see how IT administrators can install OpenStack Swift. The version discussed here is the Juno release of OpenStack. Installation of Swift has several steps and requires careful planning before beginning the process. A simple installation consists of installing all Swift components on a single node, and a complex installation consists of installing Swift on several proxy server nodes and storage server nodes. The number of storage nodes can be in the order of thousands across multiple zones and regions. Depending on your installation, you need to decide on the number of proxy server nodes and storage server nodes that you will configure. This article demonstrates a manual installation process; advanced users may want to use utilities such as Puppet or Chef to simplify the process. This article walks you through an OpenStack Swift cluster installation that contains one proxy server and five storage servers. (For more resources related to this topic, see here.) Hardware planning This section describes the various hardware components involved in the setup. Since Swift deals with object storage, disks are going to be a major part of hardware planning. The size and number of disks required should be calculated based on your requirements. Networking is also an important component, where factors such as a public or private network and a separate network for communication between storage servers need to be planned. Network throughput of at least 1 GB per second is suggested, while 10 GB per second is recommended. The servers we set up as proxy and storage servers are dual quad-core servers with 12 GB of RAM. In our setup, we have a total of 15 x 2 TB disks for Swift storage; this gives us a total size of 30 TB. However, with in-built replication (with a default replica count of 3), Swift maintains three copies of the same data. Therefore, the effective capacity for storing files and objects is approximately 10 TB, taking filesystem overhead into consideration. This is further reduced due to less than 100 percent utilization. The following figure depicts the nodes of our Swift cluster configuration: The storage servers have container, object, and account services running in them. Server setup and network configuration All the servers are installed with the Ubuntu server operating system (64-bit LTS version 14.04). You'll need to configure three networks, which are as follows: Public network: The proxy server connects to this network. This network provides public access to the API endpoints within the proxy server. Storage network: This is a private network and it is not accessible to the outside world. All the storage servers and the proxy server will connect to this network. Communication between the proxy server and the storage servers and communication between the storage servers take place within this network. In our configuration, the IP addresses assigned in this network are 172.168.10.0 and 172.168.10.99. Replication network: This is also a private network that is not accessible to the outside world. It is dedicated to replication traffic, and only storage servers connect to it. All replication-related communication between storage servers takes place within this network. In our configuration, the IP addresses assigned in this network are 172.168.9.0 and 172.168.9.99. 
Pre-installation steps

In order for the various servers to communicate easily, edit the /etc/hosts file and add the host names of each server to it. This has to be done on all the nodes. An illustrative example of these /etc/hosts entries for the proxy server node appears in the sketch at the end of this section. Install the Network Time Protocol (NTP) service on the proxy server node and the storage server nodes. This helps all the nodes synchronize their services effectively without any clock delays. The pre-installation steps to be performed are as follows:

Run the following command to install the NTP service:

# apt-get install ntp

Configure the proxy server node to be the reference server that the storage server nodes set their time from. Make sure that the following line is present in /etc/ntp.conf on the proxy server node:

server ntp.ubuntu.com

For the NTP configuration on the storage server nodes, add the following line to /etc/ntp.conf. Comment out the remaining lines with server addresses such as 0.ubuntu.pool.ntp.org, 1.ubuntu.pool.ntp.org, 2.ubuntu.pool.ntp.org, and 3.ubuntu.pool.ntp.org:

# server 0.ubuntu.pool.ntp.org
# server 1.ubuntu.pool.ntp.org
# server 2.ubuntu.pool.ntp.org
# server 3.ubuntu.pool.ntp.org
server s-swift-proxy

Restart the NTP service on each server with the following command:

# service ntp restart

Downloading and installing Swift

The Ubuntu Cloud Archive is a special repository that allows users to install new releases of OpenStack. The steps required to download and install Swift are as follows:

Enable the capability to install new releases of OpenStack, and install the latest version of Swift on each node using the commands shown in the original article. The second command creates a file named cloudarchive-juno.list in /etc/apt/sources.list.d, whose content is "deb http://ubuntu-cloud.archive.canonical.com/ubuntu".

Now, update the OS using the following command:

# apt-get update && apt-get dist-upgrade

On all the Swift nodes, we will install the prerequisite software and services using this command:

# apt-get install swift rsync memcached python-netifaces python-xattr python-memcache

Next, we create a swift folder under /etc and give users permission to access this folder, using the following commands:

# mkdir -p /etc/swift/
# chown -R swift:swift /etc/swift

Download the /etc/swift/swift.conf file from GitHub using this command:

# curl -o /etc/swift/swift.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/swift.conf-sample

Modify the /etc/swift/swift.conf file and add a variable called swift_hash_path_suffix in the swift-hash section. We then create a unique hash string using # python -c "from uuid import uuid4; print uuid4()" or # openssl rand -hex 10, and assign it to this variable. We then add another variable called swift_hash_path_prefix to the swift-hash section, and assign to it another hash string created using the method described in the preceding step. These strings are used in the hashing process to determine the mappings in the ring. The swift.conf file must be identical on all the nodes in the cluster.
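The screenshots referenced in the preceding steps are not reproduced here. The following sketch shows what the /etc/hosts entries and the swift-hash values might look like for this example cluster; the storage host names, IP addresses, and hash strings are illustrative placeholders rather than values taken from the book.

# Hypothetical /etc/hosts entries (proxy plus five storage nodes on the storage network):
cat <<'EOF' | sudo tee -a /etc/hosts
172.168.10.51 s-swift-proxy
172.168.10.52 s-swift-storage1
172.168.10.53 s-swift-storage2
172.168.10.54 s-swift-storage3
172.168.10.55 s-swift-storage4
172.168.10.56 s-swift-storage5
EOF

# Generate the two random strings and print a [swift-hash] section that can be
# pasted into /etc/swift/swift.conf; the same values must be used on every node.
SUFFIX=$(openssl rand -hex 10)
PREFIX=$(openssl rand -hex 10)
printf '[swift-hash]\nswift_hash_path_suffix = %s\nswift_hash_path_prefix = %s\n' "$SUFFIX" "$PREFIX"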
Setting up storage server nodes

This section explains the additional steps needed to set up the storage server nodes, which will run the object, container, and account services.

Installing services

The first step required to set up a storage server node is installing the services. Let's look at the steps involved:

On each storage server node, install the packages for the swift-account, swift-container, and swift-object services, along with xfsprogs (for the XFS filesystem), using this command:

# apt-get install swift-account swift-container swift-object xfsprogs

Download the account-server.conf, container-server.conf, and object-server.conf samples from GitHub, using the following commands:

# curl -o /etc/swift/account-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/account-server.conf-sample
# curl -o /etc/swift/container-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/container-server.conf-sample
# curl -o /etc/swift/object-server.conf https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/object-server.conf-sample

Edit the /etc/swift/account-server.conf file with the appropriate section for your environment. Do the same for the /etc/swift/container-server.conf and /etc/swift/object-server.conf files.

Formatting and mounting hard disks

On each storage server node, we need to identify the hard disks that will be used to store data. We will then format the hard disks and mount them on a directory, which Swift will then use to store data. We will not create any RAID levels or subpartitions on these hard disks because they are not necessary for Swift; they will be used as entire disks. The operating system will be installed on separate disks, which will be RAID configured.

First, identify the hard disks that are going to be used for storage and format them. In our storage server, we have identified sdb, sdc, and sdd to be used for storage. We will perform the following operations on sdb; these four steps should be repeated for sdc and sdd as well:

Carry out the partitioning for sdb and create the filesystem using these commands:

# fdisk /dev/sdb
# mkfs.xfs /dev/sdb1

Then create a directory at /srv/node/sdb1 that will be used to mount the filesystem, and give the swift user permission to access this directory. These operations can be performed using the following commands:

# mkdir -p /srv/node/sdb1
# chown -R swift:swift /srv/node/sdb1

We set up an entry in fstab for the sdb1 partition on the sdb hard disk, as follows. This will automatically mount sdb1 on /srv/node/sdb1 upon every boot. Add the following line to the /etc/fstab file:

/dev/sdb1 /srv/node/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 2

Mount sdb1 on /srv/node/sdb1 using the following command:

# mount /srv/node/sdb1

RSYNC and RSYNCD

In order for Swift to perform replication of data, we need to configure rsync by setting up rsyncd.conf. This is done by performing the following steps:

Create the rsyncd.conf file in the /etc folder:

# vi /etc/rsyncd.conf

We set up synchronization within the replication network by adding module definitions for the account, container, and object services to this configuration file (an illustrative rsyncd.conf is sketched after this section). 172.168.9.52 is the IP address that is on the replication network for this storage server. Use the appropriate replication network IP addresses for the corresponding storage servers.
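The configuration referenced above is not reproduced in this extract. As a reference, here is a minimal sketch of what the rsyncd.conf modules for the account, container, and object services typically look like; the address shown is the replication network IP of this particular storage server, and the path assumes the /srv/node mount point used earlier. Treat it as an illustration under those assumptions rather than the book's exact configuration.

cat <<'EOF' | sudo tee /etc/rsyncd.conf
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 172.168.9.52

[account]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/account.lock

[container]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/container.lock

[object]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/object.lock
EOF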
We then have to edit the /etc/default/rsync file and set RSYNC_ENABLE to true using the following configuration option:

RSYNC_ENABLE=true

Next, we restart the rsync service using this command:

# service rsync restart

Then we create the swift cache and recon directories using the following commands:

# mkdir -p /var/cache/swift
# mkdir -p /var/swift/recon

Setting permissions on them is done using these commands:

# chown -R swift:swift /var/cache/swift
# chown -R swift:swift /var/swift/recon

Repeat these steps on every storage server.

Setting up the proxy server node

This section explains the steps required to set up the proxy server node, which are as follows:

Install the following services only on the proxy server node:

# apt-get install python-swiftclient python-keystoneclient python-keystonemiddleware swift-proxy

Swift does not natively support HTTPS; OpenSSL, which was already installed as part of the operating system installation, is used to support HTTPS. We are going to use the OpenStack Keystone service for authentication. In order to set up the proxy-server.conf file for this, we download the sample configuration file from the following link and edit it:

https://raw.githubusercontent.com/openstack/swift/stable/juno/etc/proxy-server.conf-sample

# vi /etc/swift/proxy-server.conf

The proxy-server.conf file should be edited so that it contains the correct auth_host, admin_token, admin_tenant_name, admin_user, and admin_password values:

admin_token = 01d8b673-9ebb-41d2-968a-d2a85daa1324
admin_tenant_name = admin
admin_user = admin
admin_password = changeme

Next, we create a keystone-signing directory and give the swift user permissions to it using the following commands:

# mkdir -p /home/swift/keystone-signing
# chown -R swift:swift /home/swift/keystone-signing

Summary

In this article, you learned how to install and set up the OpenStack Swift service to provide object storage, and how to install and set up the Keystone service to provide authentication for users to access the Swift object storage.

Resources for Article:

Further resources on this subject:
Troubleshooting in OpenStack Cloud Computing [Article]
Using OpenStack Swift [Article]
Playing with Swift [Article]

Introduction to Microsoft Azure Cloud Services

Packt
04 Jun 2015
10 min read
In this article by Gethyn Ellis, author of the book Microsoft Azure IaaS Essentials, we will understand cloud computing and the various services offered by it. (For more resources related to this topic, see here.) Understanding cloud computing What do we mean when we talk about cloud from an information technology perspective? People mention cloud services; where do we get the services from? What services are offered? The Wikipedia definition of cloud computing is as follows: "Cloud computing is a computing term or metaphor that evolved in the late 1990s, based on utility and consumption of computer resources. Cloud computing involves application systems which are executed within the cloud and operated through internet enabled devices. Purely cloud computing does not rely on the use of cloud storage as it will be removed upon users download action. Clouds can be classified as public, private and [hybrid cloud|hybrid]." If you have worked with virtualization, then the concept of cloud is not completely alien to you. With virtualization, you can group a bunch of powerful hardware together, using a hypervisor. A hypervisor is a kind of software, operating system, or firmware that allows you to run virtual machines. Some of the popular Hypervisors on the market are VMware ESX or Microsoft's Hyper-V. Then, you can use this powerful hardware to run a set of virtual servers or guests. The guests share the resources of the host in order to execute and provide the services and computing resources of your IT department. The IT department takes care of everything from maintaining the hypervisor hosts to managing and maintaining the virtual servers and guests. The internal IT department does all the work. This is sometimes termed as a private cloud. Third-party suppliers, such as Microsoft, VMware, and Amazon, have a public cloud offering. With a public cloud, some computing services are provided to you on the Internet, and you can pay for what you use, which is like a utility bill. For example, let's take the utilities you use at home. This model can be really useful for start-up business that might not have an accurate demand forecast for their services, or the demand may change very quickly. Cloud computing can also be very useful for established businesses, who would like to make use of the elastic billing model. The more services you consume, the more you pay when you get billed at the end of the month. There are various types of public cloud offerings and services from a number of different providers. The TechNet top ten cloud providers are as follows: VMware Microsoft Bluelock Citrix Joyent Terrmark Salesforce.com Century Link RackSpace Amazon Web Services It is interesting to read that in 2013, Microsoft was only listed ninth in the list. With a new CEO, Microsoft has taken a new direction and put its Azure cloud offering at the heart of the business model. To quote one TechNet 2014 attendee: "TechNet this year was all about Azure, even the on premises stuff was built on the Azure model" With a different direction, it seems pretty clear that Microsoft is investing heavily in its cloud offering, and this will be further enhanced with further investment. This will allow a hybrid cloud environment, a combination of on-premises and public cloud, to be combined to offer organizations that ultimate flexibility when it comes to consuming IT resources. Services offered The term cloud is used to describe a variety of service offerings from multiple providers. 
You could argue, in fact, that the term cloud doesn't actually mean anything specific in terms of the service that you're consuming. It is, in fact, just a term that means you are consuming an IT service from a provider. Be it an internal IT department in the form of a private cloud or a public offering from some cloud provider, a public cloud, or it could be some combination of both in the form of a hybrid cloud. So, then what are the services that cloud providers offer? Virtualization and on-premises technology Most business even in today's cloudy environment has some on-premises technology. Until virtualization became popular and widely deployed several years ago, it was very common to have a one-to-one relationship between a physical hardware server with its own physical resources, such as CPU, RAM, storage, and the operating system installed on the physical server. It became clear that in this type of environment, you would need a lot of physical servers in your data center. An expanding and sometimes, a sprawling environment brings its own set of problems. The servers need cooling and heat management as well as a power source, and all the hardware and software needs to be maintained. Also, in terms of utilization, this model left lots of resources under-utilized: Virtualization changed this to some extent. With virtualization, you can create several guests or virtual servers that are configured to share the resources of the underlying host, each with their own operating system installed. It is possible to run both a Windows and Linux guest on the same physical host using virtualization. This allows you to maximize the resource utilization and allows your business to get a better return on investment on its hardware infrastructure: Virtualization is very much a precursor to cloud; many virtualized environments are sometimes called private clouds, so having an understanding of virtualization and how it works will give you a good grounding in some of the concepts of a cloud-based infrastructure. Software as a service (SaaS) SaaS is a subscription where you need to pay to use the software for the time that you're using it. You don't own any of the infrastructures, and you don't have to manage any of the servers or operating systems, you simply consume the software that you will be using. You can think of SaaS as like taking a taxi ride. When you take a taxi ride, you don't own the car, you don't need to maintain the car, and you don't even drive the car. You simply tell the taxi driver or his company when and where you want to travel somewhere, and they will take care of getting you there. The longer the trip, that is, the longer you use the taxi, the more you pay. An example of Microsoft's Software as a service would be the Azure SQL Database. The following diagram shows the cloud-based SQL database: Microsoft offers customers a SQL database that is fully hosted and maintained in Microsoft data centers, and the customer simply has to make use of the service and the database. So, we can compare this to having an on-premises database. To have an on-premises database, you need a Windows Server machine (physical or virtual) with the appropriate version of SQL Server installed. 
The server would need enough CPU, RAM, and storage to fulfill the needs of your database, and you need to manage and maintain the environment, applying various patches to the operating systems as they become available, installing, and testing various SQL Server service packs as they become available, and all the while, your application makes use of the database platform. With the SQL Azure database, you have no overhead, you simply need to connect to the Microsoft Azure portal and request a SQL database by following the wizard: Simply, give the database a name. In this case, it's called Helpdesk, select the service tier you want. In this example, I have chosen the Basic service tier. The service tier will define things, such as the resources available to your database, and impose limits, in terms of database size. With the Basic tier, you have a database size limit of 2 GB. You can specify the server that you want to create your database with, accept the defaults on the other settings, click on the check button, and the database gets created: It's really that simple. You will then pay for what you use in terms of database size and data access. In a later section, you will see how to set up a Microsoft Azure account. Platform as a service (PaaS) With PaaS, you rent the hardware, operating system, storage, and network from the public cloud service provider. PaaS is an offshoot of SaaS. Initially, SaaS didn't take off quickly, possibly because of the lack of control that IT departments and business thought they were going to suffer as a result of using the SaaS cloud offering. Going back to the transport analogy, you can compare PaaS to car rentals. When you rent a car, you don't need to make the car, you don't need to own the car, and you have no responsibility to maintain the car. You do, however, need to drive the car if you are going to get to your required destination. In PaaS terms, the developer and the system administrator have slightly more control over how the environment is set up and configured but still much of the work is taken care of by the cloud service provider. So, the hardware, operating system, and all the other components that run your application are managed and taken care of by the cloud provider, but you get a little more control over how things are configured. A geographically dispersed website would be a good example of an application offered on a PaaS offering. Infrastructure as a service (IaaS) With IaaS, you have much more control over the environment, and everything is customizable. Going with the transport analogy again, you can compare it to buying a car. The service provides you with the car upfront, and you are then responsible for using the car to ensure that it gets you from A to B. You are also responsible to fix the car if something goes wrong, and also ensure that the car is maintained by servicing it regularly, adding fuel, checking the tyre pressure, and so on. You have more control, but you also have more to do in terms of maintenance. Microsoft Azure has an offering. You can deploy a virtual machine, you can specify what OS you want, how much RAM you want the virtual machine to have, you can specify where the server will sit in terms of Microsoft data centers, and you can set up and configure recoverability and high availability for your Azure virtual machine: Hybrid environments With a hybrid environment, you get a combination of on-premises infrastructure and cloud services. 
It allows you to flexibly add resilience and high availability to your existing infrastructure. It's perfectly possible for the cloud to act as a disaster recovery site for your existing infrastructure. Microsoft Azure In order to work with the examples in this article, you need sign up for a Microsoft account. You can visit http://azure.microsoft.com/, and create an account all by yourself by completing the necessary form as follows: Here, you simply enter your details; you can use your e-mail address as your username. Enter the credentials specified. Return to the Azure website, and if you want to make use of the free trial, click on the free trial link. Currently, you get $125 worth of free Azure services. Once you have clicked on the free trial link, you will have to verify your details. You will also need to enter a credit card number and its details. Microsoft assures that you won't be charged during the free trial. Enter the appropriate details and click on Sign Up: Summary In this article, we looked at and discussed some of the terminology around the cloud. From the services offered to some of the specific features available in Microsoft Azure, you should be able to differentiate between a public and private cloud. You can also now differentiate between some of the public cloud offerings. Resources for Article: Further resources on this subject: Windows Azure Service Bus: Key Features [article] Digging into Windows Azure Diagnostics [article] Using the Windows Azure Platform PowerShell Cmdlets [article]

Architecture and Component Overview

Packt
22 May 2015
14 min read
In this article by Dan Radez, author of the book OpenStack Essentials, we will be understanding the internal architecture of the components that make up OpenStack. OpenStack has a very modular design, and because of this design, there are lots of moving parts. It's overwhelming to start walking through installing and using OpenStack without understanding the internal architecture of the components that make up OpenStack. Each component in OpenStack manages a different resource that can be virtualized for the end user. Separating each of the resources that can be virtualized into separate components makes the OpenStack architecture very modular. If a particular service or resource provided by a component is not required, then the component is optional to an OpenStack deployment. Let's start by outlining some simple categories to group these services into. (For more resources related to this topic, see here.) OpenStack architecture Logically, the components of OpenStack can be divided into three groups: Control Network Compute The control tier runs the Application Programming Interfaces (API) services, web interface, database, and message bus. The network tier runs network service agents for networking, and the compute node is the virtualization hypervisor. It has services and agents to handle virtual machines. All of the components use a database and/or a message bus. The database can be MySQL, MariaDB, or PostgreSQL. The most popular message buses are RabbitMQ, Qpid, and ActiveMQ. For smaller deployments, the database and messaging services usually run on the control node, but they could have their own nodes if required. In a simple multi-node deployment, each of these groups is installed onto a separate server. OpenStack could be installed on one node or two nodes, but a good baseline for being able to scale out later is to put each of these groups on their own node. An OpenStack cluster can also scale far beyond three nodes. Now that a base logical architecture of OpenStack is defined, let's look at what components make up this basic architecture. To do that, we'll first touch on the web interface and then work towards collecting the resources necessary to launch an instance. Finally, we will look at what components are available to add resources to a launched instance. Dashboard The OpenStack dashboard is the web interface component provided with OpenStack. You'll sometimes hear the terms dashboard and Horizon used interchangeably. Technically, they are not the same thing. This web interface is referred to as the dashboard. The team that develops the web interface maintains both the dashboard interface and the Horizon framework that the dashboard uses. More important than getting these terms right is understanding the commitment that the team that maintains this code base has made to the OpenStack project. They have pledged to include support for all the officially accepted components that are included in OpenStack. Visit the OpenStack website (http://www.openstack.org/) to get an official list of OpenStack components. The dashboard cannot do anything that the API cannot do. All the actions that are taken through the dashboard result in calls to the API to complete the task requested by the end user. Throughout, we will examine how to use the web interface and the API clients to execute tasks in an OpenStack cluster. Next, we will discuss both the dashboard and the underlying components that the dashboard makes calls to when creating OpenStack resources. 
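Because everything the dashboard does goes through the same APIs, the command-line clients can be used interchangeably with the web interface. As a hedged illustration of what that looks like in practice (the controller host name, user, tenant, and password below are placeholders, not values from this book), a client session is typically authenticated by exporting credentials and then calling the per-service clients:

# Credentials that the CLI clients read from the environment; values are examples only.
export OS_AUTH_URL=http://controller:5000/v2.0
export OS_TENANT_NAME=admin
export OS_USERNAME=admin
export OS_PASSWORD=secret

# Each client contacts its own API endpoint, discovered through Keystone's service catalog.
keystone token-get    # request a token from the identity service
glance image-list     # list images registered with the image service
nova list             # list instances known to the compute service

The same requests are what the dashboard issues on your behalf when you click through the web interface.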
Keystone Keystone is the identity management component. The first thing that needs to happen while connecting to an OpenStack deployment is authentication. In its most basic installation, Keystone will manage tenants, users, and roles and be a catalog of services and endpoints for all the components in the running cluster. Everything in OpenStack must exist in a tenant. A tenant is simply a grouping of objects. Users, instances, and networks are examples of objects. They cannot exist outside of a tenant. Another name for a tenant is project. On the command line, the term tenant is used. In the web interface, the term project is used. Users must be granted a role in a tenant. It's important to understand this relationship between the user and a tenant via a role. We will look at how to create the user and tenant and how to associate the user with a role in a tenant. For now, understand that a user cannot log in to the cluster unless they are members of a tenant. Even the administrator has a tenant. Even the users the OpenStack components use to communicate with each other have to be members of a tenant to be able to authenticate. Keystone also keeps a catalog of services and endpoints of each of the OpenStack components in the cluster. This is advantageous because all of the components have different API endpoints. By registering them all with Keystone, an end user only needs to know the address of the Keystone server to interact with the cluster. When a call is made to connect to a component other than Keystone, the call will first have to be authenticated, so Keystone will be contacted regardless. Within the communication to Keystone, the client also asks Keystone for the address of the component the user intended to connect to. This makes managing the endpoints easier. If all the endpoints were distributed to the end users, then it would be a complex process to distribute a change in one of the endpoints to all of the end users. By keeping the catalog of services and endpoints in Keystone, a change is easily distributed to end users as new requests are made to connect to the components. By default, Keystone uses username/password authentication to request a token and Public Key Infrastructure (PKI) tokens for subsequent requests. The token has a user's roles and tenants encoded into it. All the components in the cluster can use the information in the token to verify the user and the user's access. Keystone can also be integrated into other common authentication systems instead of relying on the username and password authentication provided by Keystone. Glance Glance is the image management component. Once we're authenticated, there are a few resources that need to be available for an instance to launch. The first resource we'll look at is the disk image to launch from. Before a server is useful, it needs to have an operating system installed on it. This is a boilerplate task that cloud computing has streamlined by creating a registry of pre-installed disk images to boot from. Glance serves as this registry within an OpenStack deployment. In preparation for an instance to launch, a copy of a selected Glance image is first cached to the compute node where the instance is being launched. Then, a copy is made to the ephemeral disk location of the new instance. Subsequent instances launched on the same compute node using the same disk image will use the cached copy of the Glance image. The images stored in Glance are sometimes called sealed-disk images. 
These images are disk images that have had the operating system installed but have had things such as Secure Shell (SSH) host key, and network device MAC addresses removed. This makes the disk images generic, so they can be reused and launched repeatedly without the running copies conflicting with each other. To do this, the host-specific information is provided or generated at boot. The provided information is passed in through a post-boot configuration facility called cloud-init. The images can also be customized for special purposes beyond a base operating system install. If there was a specific purpose for which an instance would be launched many times, then some of the repetitive configuration tasks could be performed ahead of time and built into the disk image. For example, if a disk image was intended to be used to build a cluster of web servers, it would make sense to install a web server package on the disk image before it was used to launch an instance. It would save time and bandwidth to do it once before it is registered with Glance instead of doing this package installation and configuration over and over each time a web server instance is booted. There are quite a few ways to build these disk images. The simplest way is to do a virtual machine install manually, make sure that the host-specific information is removed, and include cloud-init in the built image. Cloud-init is packaged in most major distributions; you should be able to simply add it to a package list. There are also tools to make this happen in a more autonomous fashion. Some of the more popular tools are virt-install, Oz, and appliance-creator. The most important thing about building a cloud image for OpenStack is to make sure that cloud-init is installed. Cloud-init is a script that should run post boot to connect back to the metadata service. Neutron Neutron is the network management component. With Keystone, we're authenticated, and from Glance, a disk image will be provided. The next resource required for launch is a virtual network. Neutron is an API frontend (and a set of agents) that manages the Software Defined Networking (SDN) infrastructure for you. When an OpenStack deployment is using Neutron, it means that each of your tenants can create virtual isolated networks. Each of these isolated networks can be connected to virtual routers to create routes between the virtual networks. A virtual router can have an external gateway connected to it, and external access can be given to each instance by associating a floating IP on an external network with an instance. Neutron then puts all configuration in place to route the traffic sent to the floating IP address through these virtual network resources into a launched instance. This is also called Networking as a Service (NaaS). NaaS is the capability to provide networks and network resources on demand via software. By default, the OpenStack distribution we will install uses Open vSwitch to orchestrate the underlying virtualized networking infrastructure. Open vSwitch is a virtual managed switch. As long as the nodes in your cluster have simple connectivity to each other, Open vSwitch can be the infrastructure configured to isolate the virtual networks for the tenants in OpenStack. There are also many vendor plugins that would allow you to replace Open vSwitch with a physical managed switch to handle the virtual networks. Neutron even has the capability to use multiple plugins to manage multiple network appliances. 
As an example, Open vSwitch and a vendor's appliance could be used in parallel to manage virtual networks in an OpenStack deployment. This is a great example of how OpenStack is built to provide flexibility and choice to its users. Networking is the most complex component of OpenStack to configure and maintain. This is because Neutron is built around core networking concepts. To successfully deploy Neutron, you need to understand these core concepts and how they interact with one another. Nova Nova is the instance management component. An authenticated user who has access to a Glance image and has created a network for an instance to live on is almost ready to tie all of this together and launch an instance. The last resources that are required are a key pair and a security group. A key pair is simply an SSH key pair. OpenStack will allow you to import your own key pair or generate one to use. When the instance is launched, the public key is placed in the authorized_keys file so that a password-less SSH connection can be made to the running instance. Before that SSH connection can be made, the security groups have to be opened to allow the connection to be made. A security group is a firewall at the cloud infrastructure layer. The OpenStack distribution we'll use will have a default security group with rules to allow instances to communicate with each other within the same security group, but rules will have to be added for Internet Control Message Protocol (ICMP), SSH, and other connections to be made from outside the security group. Once there's an image, network, key pair, and security group available, an instance can be launched. The resource's identifiers are provided to Nova, and Nova looks at what resources are being used on which hypervisors, and schedules the instance to spawn on a compute node. The compute node gets the Glance image, creates the virtual network devices, and boots the instance. During the boot, cloud-init should run and connect to the metadata service. The metadata service provides the SSH public key needed for SSH login to the instance and, if provided, any post-boot configuration that needs to happen. This could be anything from a simple shell script to an invocation of a configuration management engine. Cinder Cinder is the block storage management component. Volumes can be created and attached to instances. Then, they are used on the instances as any other block device would be used. On the instance, the block device can be partitioned and a file system can be created and mounted. Cinder also handles snapshots. Snapshots can be taken of the block volumes or of instances. Instances can also use these snapshots as a boot source. There is an extensive collection of storage backends that can be configured as the backing store for Cinder volumes and snapshots. By default, Logical Volume Manager (LVM) is configured. GlusterFS and Ceph are two popular software-based storage solutions. There are also many plugins for hardware appliances. Swift Swift is the object storage management component. Object storage is a simple content-only storage system. Files are stored without the metadata that a block file system has. These are simply containers and files. The files are simply content. Swift has two layers as part of its deployment: the proxy and the storage engine. The proxy is the API layer. It's the service that the end user communicates with. The proxy is configured to talk to the storage engine on the user's behalf. 
By default, the storage engine is the Swift storage engine. It's able to do software-based storage distribution and replication. GlusterFS and Ceph are also popular storage backends for Swift. They have similar distribution and replication capabilities to those of Swift storage. Ceilometer Ceilometer is the telemetry component. It collects resource measurements and is able to monitor the cluster. Ceilometer was originally designed as a metering system for billing users. As it was being built, there was a realization that it would be useful for more than just billing and turned into a general-purpose telemetry system. Ceilometer meters measure the resources being used in an OpenStack deployment. When Ceilometer reads a meter, it's called a sample. These samples get recorded on a regular basis. A collection of samples is called a statistic. Telemetry statistics will give insights into how the resources of an OpenStack deployment are being used. The samples can also be used for alarms. Alarms are nothing but monitors that watch for a certain criterion to be met. These alarms were originally designed for Heat autoscaling. Heat Heat is the orchestration component. Orchestration is the process of launching multiple instances that are intended to work together. In orchestration, there is a file, known as a template, used to define what will be launched. In this template, there can also be ordering or dependencies set up between the instances. Data that needs to be passed between the instances for configuration can also be defined in these templates. Heat is also compatible with AWS CloudFormation templates and implements additional features in addition to the AWS CloudFormation template language. To use Heat, one of these templates is written to define a set of instances that needs to be launched. When a template launches a collection of instances, it's called a stack. When a stack is spawned, the ordering and dependencies, shared conflagration data, and post-boot configuration are coordinated via Heat. Heat is not configuration management. It is orchestration. It is intended to coordinate launching the instances, passing configuration data, and executing simple post-boot configuration. A very common post-boot configuration task is invoking an actual configuration management engine to execute more complex post-boot configuration. Summary The list of components that have been covered is not the full list. This is just a small subset to get you started with using and understanding OpenStack. Resources for Article: Further resources on this subject: Creating Routers [article] Using OpenStack Swift [article] Troubleshooting in OpenStack Cloud Computing [article]

Patterns for Data Processing

Packt
29 Apr 2015
12 min read
In this article by Marcus Young, the author of the book Implementing Cloud Design Patterns for AWS, we will cover the following patterns:

Queuing chain pattern
Job observer pattern

(For more resources related to this topic, see here.)

Queuing chain pattern

In the queuing chain pattern, we will use a type of publish-subscribe (pub-sub) model with an instance that generates work asynchronously for another server to pick up and work with. This is described in the following diagram: the diagram depicts the scenario we will solve, which is solving fibonacci numbers asynchronously. We will spin up a Creator server that will generate random integers and publish them into an SQS queue, myinstance-tosolve. We will then spin up a second instance that continuously attempts to grab a message from the myinstance-tosolve queue, solves the fibonacci sequence of the number contained in the message body, and stores the result as a new message in the myinstance-solved queue. Information on the fibonacci algorithm can be found at http://en.wikipedia.org/wiki/Fibonacci_number. This scenario is very basic, as it is the core of the microservices architectural model. In this scenario, we could add as many worker servers as we see fit with no change to infrastructure, which is the real power of the microservices model.

The first thing we will do is create a new SQS queue. From the SQS console, select Create New Queue. From the Create New Queue dialog, enter myinstance-tosolve into the Queue Name text box and select Create Queue. This will create the queue and bring you back to the main SQS console, where you can view the queues created. Repeat this process, entering myinstance-solved for the second queue name. When complete, the SQS console should list both queues. In the following code snippets, you will need the URL for the queues. You can retrieve them from the SQS console by selecting the appropriate queue, which will bring up an information box; the queue URL is listed as URL.

Next, we will launch a creator instance, which will create random integers and write them into the myinstance-tosolve queue via its URL noted previously. From the EC2 console, spin up an instance as per your environment from the AWS Linux AMI. Once it is ready, SSH into it (note that acctarn, mykey, and mysecret need to be replaced with your actual credentials):

[ec2-user@ip-10-203-10-170 ~]$ [[ -d ~/.aws ]] && rm -rf ~/.aws/config || mkdir ~/.aws
[ec2-user@ip-10-203-10-170 ~]$ echo $'[default]\naws_access_key_id=mykey\naws_secret_access_key=mysecret\nregion=us-east-1' > .aws/config
[ec2-user@ip-10-203-10-170 ~]$ for i in {1..100}; do
value=$(shuf -i 1-50 -n 1)
aws sqs send-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolve --message-body ${value} >/dev/null 2>&1
done

Once the snippet completes, we should have 100 messages in the myinstance-tosolve queue, ready to be retrieved.
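Before moving on, it can be reassuring to confirm that the creator actually queued the work. A quick way to do this from the same shell (the account ID in the queue URL is a placeholder, as before) is to ask SQS for the approximate message count:

[ec2-user@ip-10-203-10-170 ~]$ aws sqs get-queue-attributes \
  --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolve \
  --attribute-names ApproximateNumberOfMessages

A value close to 100 indicates that the creator loop completed successfully.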
Now that those messages are ready to be picked up and solved, we will spin up a new EC2 instance, again as per your environment from the AWS Linux AMI. Once it is ready, SSH into it (note that acctarn, mykey, and mysecret need to be valid and set to your credentials):

[ec2-user@ip-10-203-10-169 ~]$ [[ -d ~/.aws ]] && rm -rf ~/.aws/config || mkdir ~/.aws
[ec2-user@ip-10-203-10-169 ~]$ echo $'[default]\naws_access_key_id=mykey\naws_secret_access_key=mysecret\nregion=us-east-1' > .aws/config
[ec2-user@ip-10-203-10-169 ~]$ sudo yum install -y ruby-devel gcc >/dev/null 2>&1
[ec2-user@ip-10-203-10-169 ~]$ sudo gem install json >/dev/null 2>&1
[ec2-user@ip-10-203-10-169 ~]$ cat <<'EOF' | sudo tee -a /usr/local/bin/fibsqs >/dev/null 2>&1
#!/bin/sh
while [ true ]; do

function fibonacci {
  a=1
  b=1
  i=0
  while [ $i -lt $1 ]
  do
    printf "%d\n" $a
    let sum=$a+$b
    let a=$b
    let b=$sum
    let i=$i+1
  done
}

message=$(aws sqs receive-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolve)
if [[ -n $message ]]; then
  body=$(echo $message | ruby -e "require 'json'; p JSON.parse(gets)['Messages'][0]['Body']" | sed 's/"//g')
  handle=$(echo $message | ruby -e "require 'json'; p JSON.parse(gets)['Messages'][0]['ReceiptHandle']" | sed 's/"//g')
  aws sqs delete-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolve --receipt-handle $handle
  echo "Solving '${body}'."
  solved=$(fibonacci $body)
  parsed_solve=$(echo $solved | sed 's/\n/ /g')
  echo "'${body}' solved."
  aws sqs send-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-solved --message-body "${parsed_solve}"
fi

sleep 1
done
EOF
[ec2-user@ip-10-203-10-169 ~]$ sudo chown ec2-user:ec2-user /usr/local/bin/fibsqs && chmod +x /usr/local/bin/fibsqs

There will be no output from this code snippet yet, so now let's run the fibsqs command we created. This will continuously poll the myinstance-tosolve queue, solve the fibonacci sequence for each integer, and store the result in the myinstance-solved queue:

[ec2-user@ip-10-203-10-169 ~]$ fibsqs
Solving '48'.
'48' solved.
{
    "MD5OfMessageBody": "73237e3b7f7f3491de08c69f717f59e6",
    "MessageId": "a249b392-0477-4afa-b28c-910233e7090f"
}
Solving '6'.
'6' solved.
{
    "MD5OfMessageBody": "620b0dd23c3dddbac7cce1a0d1c8165b",
    "MessageId": "9e29f847-d087-42a4-8985-690c420ce998"
}

While this is running, we can verify the movement of messages from the tosolve queue into the solved queue by viewing the Messages Available column in the SQS console. This shows that the worker virtual machine is in fact doing work, but we can prove that it is working correctly by viewing the messages in the myinstance-solved queue. To view messages, right-click on the myinstance-solved queue and select View/Delete Messages. If this is your first time viewing messages in SQS, you will receive a warning box that explains the impact of viewing messages in a queue. From the View/Delete Messages in myinstance-solved dialog, select Start Polling for Messages. We can now see that we are in fact working from a queue.

Job observer pattern

The previous two patterns show a very basic understanding of passing messages around a complex system so that components (machines) can work independently from each other. While they are a good starting place, the system as a whole could improve if it were more autonomous. Given the previous example, we could very easily duplicate the worker instance if either one of the SQS queues grew large, but using the Amazon-provided CloudWatch service we can automate this process. Using CloudWatch, we might end up with a system that resembles the following diagram. For this pattern, we will not start from scratch but directly from the previous priority queuing pattern.
The major difference between the previous diagram and the diagram displayed in the priority queuing pattern is the addition of a CloudWatch alarm on the myinstance-tosolve-priority queue, and the addition of an auto scaling group for the worker instances. The behavior of this pattern is that we will define a depth for our priority queue that we deem too high, and create an alarm for that threshold. If the number of messages in that queue goes beyond that point, it will notify the auto scaling group to spin up an instance. When the alarm goes back to OK, meaning that the number of messages is below the threshold, it will scale down as much as our auto scaling policy allows. Before we start, make sure any worker instances are terminated. The first thing we should do is create an alarm. From the CloudWatch console in AWS, click Alarms on the side bar and select Create Alarm. From the new Create Alarm dialog, select Queue Metrics under SQS Metrics. This will bring us to a Select Metric section. Type myinstance-tosolve-priority ApproximateNumberOfMessagesVisible into the search box and hit Enter. Select the checkbox for the only row and select Next. From the Define Alarm, make the following changes and then select Create Alarm: In the Name textbox, give the alarm a unique name. In the Description textbox, give the alarm a general description. In the Whenever section, set 0 to 1. In the Actions section, click Delete for the only Notification. In the Period drop-down, select 1 Minute. In the Statistic drop-down, select Sum. Now that we have our alarm in place, we need to create a launch configuration and auto scaling group that refers this alarm. Create a new launch configuration from the AWS Linux AMI with details as per your environment. However, set the user data to (note that acctarn, mykey, and mysecret need to be valid): #!/bin/bash[[ -d /home/ec2-user/.aws ]] && rm -rf /home/ec2-user/.aws/config ||mkdir /home/ec2-user/.awsecho $'[default]naws_access_key_id=mykeynaws_secret_access_key=mysecretnregion=us-east-1' > /home/ec2-user/.aws/configchown ec2-user:ec2-user /home/ec2-user/.aws -Rcat <<EOF >/usr/local/bin/fibsqs#!/bin/shfunction fibonacci {a=1b=1i=0while [ $i -lt $1 ]doprintf "%dn" $alet sum=$a+$blet a=$blet b=$sumlet i=$i+1done}number="$1"solved=$(fibonacci $number)parsed_solve=$(echo $solved | sed 's/n/ /g')aws sqs send-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-solved --message-body "${parsed_solve}"exit 0EOFchown ec2-user:ec2-user /usr/local/bin/fibsqschmod +x /usr/local/bin/fibsqsyum install -y libxml2 libxml2-devel libxslt libxslt-devel gcc ruby-devel>/dev/null 2>&1gem install nokogiri -- --use-system-libraries >/dev/null 2>&1gem install shoryuken >/dev/null 2>&1cat <<EOF >/home/ec2-user/config.ymlaws:access_key_id: mykeysecret_access_key: mysecretregion: us-east-1 # or <%= ENV['AWS_REGION'] %>receive_message:attributes:- receive_count- sent_atconcurrency: 25, # The number of allocated threads to process messages.Default 25delay: 25, # The delay in seconds to pause a queue when it'sempty. 
Default 0queues:- [myinstance-tosolve-priority, 2]- [myinstance-tosolve, 1]EOFcat <<EOF >/home/ec2-user/worker.rbclass MyWorkerinclude Shoryuken::Workershoryuken_options queue: 'myinstance-tosolve', auto_delete: truedef perform(sqs_msg, body)puts "normal: #{body}"%x[/usr/local/bin/fibsqs #{body}]endendclass MyFastWorkerinclude Shoryuken::Workershoryuken_options queue: 'myinstance-tosolve-priority', auto_delete:truedef perform(sqs_msg, body)puts "priority: #{body}"%x[/usr/local/bin/fibsqs #{body}]endendEOFchown ec2-user:ec2-user /home/ec2-user/worker.rb /home/ec2-user/config.ymlscreen -dm su - ec2-user -c 'shoryuken -r /home/ec2-user/worker.rb -C /home/ec2-user/config.yml' Next, create an auto scaling group that uses the launch configuration we just created. The rest of the details for the auto scaling group are as per your environment. However, set it to start with 0 instances and do not set it to receive traffic from a load balancer. Once the auto scaling group has been created, select it from the EC2 console and select Scaling Policies. From here, click Add Policy to create a policy similar to the one shown in the following screenshot and click Create: Next, we get to trigger the alarm. To do this, we will again submit random numbers into both the myinstance-tosolve and myinstance-tosolve-priority queues: [ec2-user@ip-10-203-10-79 ~]$ [[ -d ~/.aws ]] && rm -rf ~/.aws/config ||mkdir ~/.aws[ec2-user@ip-10-203-10-79 ~]$ echo $'[default]naws_access_key_id=mykeynaws_secret_access_key=mysecretnregion=us-east-1' > .aws/config[ec2-user@ip-10-203-10-79 ~]$ for i in {1..100}; dovalue=$(shuf -i 1-50 -n 1)aws sqs send-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolve --message-body ${value} >/dev/null 2>&1done[ec2-user@ip-10-203-10-79 ~]$ for i in {1..100}; dovalue=$(shuf -i 1-50 -n 1)aws sqs send-message --queue-url https://queue.amazonaws.com/acctarn/myinstance-tosolvepriority--message-body ${value} >/dev/null 2>&1done After five minutes, the alarm will go into effect and our auto scaling group will launch an instance to respond to it. This can be viewed from the Scaling History tab for the auto scaling group in the EC2 console. Even though our alarm is set to trigger after one minute, CloudWatch only updates in intervals of five minutes. This is why our wait time was not as short as our alarm. Our auto scaling group has now responded to the alarm by launching an instance. Launching an instance by itself will not resolve this, but using the user data from the Launch Configuration, it should configure itself to clear out the queue, solve the fibonacci of the message, and finally submit it to the myinstance-solved queue. If this is successful, our myinstance-tosolve-priority queue should get emptied out. We can verify from the SQS console as before. And finally, our alarm in CloudWatch is back to an OK status. We are now stuck with the instance because we have not set any decrease policy. I won't cover this in detail, but to set it, we would create a new alarm that triggers when the message count is a lower number such as 0, and set the auto scaling group to decrease the instance count when that alarm is triggered. This would allow us to scale out when we are over the threshold, and scale in when we are under the threshold. This completes the final pattern for data processing. 
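The scale-in side is left as an exercise in the text above. A hedged sketch of what it could look like with the AWS CLI follows; the auto scaling group name, policy name, and alarm name are placeholders, and the thresholds simply mirror the scale-out alarm in reverse.

# Create a policy that removes one instance from the (placeholder) auto scaling group.
# The "=" form avoids the CLI treating the negative adjustment as an option.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name myinstance-workers \
  --policy-name myinstance-scale-in \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment=-1

# Alarm when the priority queue has had roughly no visible messages for a minute,
# attaching the policy ARN returned by the previous command.
aws cloudwatch put-metric-alarm \
  --alarm-name myinstance-tosolve-priority-empty \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=myinstance-tosolve-priority \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 0 \
  --comparison-operator LessThanOrEqualToThreshold \
  --alarm-actions <scale-in-policy-arn>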
Summary

In this article, in the queuing chain pattern, we walked through creating independent systems that use the Amazon-provided SQS service to solve fibonacci numbers without interacting with each other directly. Then, we took the topic even deeper in the job observer pattern, and covered how to tie in auto scaling policies and alarms from the CloudWatch service to scale out when the priority queue gets too deep.

Resources for Article:

Further resources on this subject:
In the Cloud [Article]
Mobile Administration [Article]
Testing Your Recipes and Getting Started with ChefSpec [Article]

High Availability, Protection, and Recovery using Microsoft Azure

Packt
02 Apr 2015
23 min read
Microsoft Azure can be used to protect your on-premise assets such as virtual machines, applications, and data. In this article by Marcel van den Berg, the author of Managing Microsoft Hybrid Clouds, you will learn how to use Microsoft Azure to store backup data, replicate data, and even for orchestration of a failover and failback of a complete data center. We will focus on the following topics: High Availability in Microsoft Azure Introduction to geo-replication Disaster recovery using Azure Site Recovery (For more resources related to this topic, see here.) High availability in Microsoft Azure One of the most important limitations of Microsoft Azure is the lack of an SLA for single-instance virtual machines. If a virtual machine is not part of an availability set, that instance is not covered by any kind of SLA. The reason for this is that when Microsoft needs to perform maintenance on Azure hosts, in many cases, a reboot is required. Reboot means the virtual machines on that host will be unavailable for a while. So, in order to accomplish High Availability for your application, you should have at least two instances of the application running at any point in time. Microsoft is working on some sort of hot patching which enables virtual machines to remain active on hosts being patched. Details are not available at the moment of writing. High Availability is a crucial feature that must be an integral part of an architectural design, rather than something that can be "bolted on" to an application afterwards. Designing for High Availability involves leveraging both the development platform as well as available infrastructure in order to ensure an application's responsiveness and overall reliability. The Microsoft Azure Cloud platform offers software developers PaaS extensibility features and network administrators IaaS computing resources that enable availability to be built into an application's design from the beginning. The good news is that organizations with mission-critical applications can now leverage core features within the Microsoft Azure platform in order to deploy highly available, scalable, and fault-tolerant cloud services that have been shown to be more cost-effective than traditional approaches that leverage on-premises systems. Microsoft Failover Clustering support Windows Server Failover Clustering (WSFC) is not supported on Azure. However, Microsoft does support SQL Server AlwaysOn Availability Groups. For AlwaysOn Availability Groups, there is currently no support for availability group listeners in Azure. Also, you must work around a DHCP limitation in Azure when creating WSFC clusters in Azure. After you create a WSFC cluster using two Azure virtual machines, the cluster name cannot start because it cannot acquire a unique virtual IP address from the DHCP service. Instead, the IP address assigned to the cluster name is a duplicate address of one of the nodes. This has a cascading effect that ultimately causes the cluster quorum to fail, because the nodes cannot properly connect to one another. So if your application uses Failover Clustering, it is likely that you will not move it over to Azure. It might run, but Microsoft will not assist you when you encounter issues. Load balancing Besides clustering, we can also create highly available nodes using load balancing. Load balancing is useful for stateless servers. These are servers that are identical to each other and do not have a unique configuration or data. 
When two or more virtual machines deliver the same application logic, you will need a mechanism that is able to redirect network traffic to those virtual machines. The Windows Network Load Balancing (NLB) feature in Windows Server is not supported on Microsoft Azure. An Azure load balancer does exactly this. It analyzes incoming network traffic of Azure, determines the type of traffic, and reroutes it to a service.   The Azure load balancer is running provided as a cloud service. In fact, this cloud service is running on virtual appliances managed by Microsoft. These are completely software-defined. The moment an administrator adds an endpoint, a set of load balancers is instructed to pass incoming network traffic on a certain port to a port on a virtual machine. If a load balancer fails, another one will take over. Azure load balancing is performed at layer 4 of the OSI mode. This means the load balancer is not aware of the application content of the network packages. It just distributes packets based on network ports. To load balance over multiple virtual machines, you can create a load-balanced set by performing the following steps: In Azure Management Portal, select the virtual machine whose service should be load balanced. Select Endpoints in the upper menu. Click on Add. Select Add a stand-alone endpoint and click on the right arrow. Select a name or a protocol and set the public and private port. Enable create a load-balanced set and click on the right arrow. Next, fill in a name for the load-balanced set. Fill in the probe port, the probe interval, and the number of probes. This information is used by the load balancer to check whether the service is available. It will connect to the probe port number; do that according to the interval. If the specified number of probes all result in unable to connect, the load balancer will no longer distribute traffic to this virtual machine. Click on the check mark. The load balancing mechanism available is based on a hash. Microsoft Azure Load Balancer uses a five tuple (source IP, source port, destination IP, destination port, and protocol type) to calculate the hash that is used to map traffic to the available servers. A second load balancing mode was introduced in October 2014. It is called Source IP Affinity (also known as session affinity or client IP affinity). On using Source IP affinity, connections initiated from the same client computer go to the same DIP endpoint. These load balancers provide high availability inside a single data center. If a virtual machine part of a cluster of instances fails, the load balancer will notice this and remove that virtual machine IP address from a table. However, load balancers will not protect for failure of a complete data center. The domains that are used to direct clients to an application will route to a particular virtual IP that is bound to an Azure data center. To access application even if an Azure region has failed, you can use Azure Traffic Manager. This service can be used for several purposes: To failover to a different Azure region if a disaster occurs To provide the best user experience by directing network traffic to Azure region closest to the location of the user To reroute traffic to another Azure region whenever there's any planned maintenance The main task of Traffic Manager is to map a DNS query to an IP address that is the access point of a service. This job can be compared for example with a job of someone working with the X-ray machine at an airport. 
You have probably seen the multiple rows of X-ray machines at an airport; the length of each queue differs at any given moment. An officer standing at the entrance of the area distributes people over the available X-ray machines so that all queues remain roughly equal in length.

Traffic Manager provides you with a choice of load-balancing methods, including performance, failover, and round-robin. Performance load balancing measures the latency between the client and the cloud service endpoint. Traffic Manager is not aware of the actual load on the virtual machines servicing the applications.

As Traffic Manager resolves endpoints of Azure cloud services only, it cannot be used for load balancing between an Azure region and a non-Azure region (for example, Amazon EC2) or between on-premises and Azure services.

It performs health checks on a regular basis by querying the endpoints of the services. If an endpoint does not respond, Traffic Manager will stop distributing network traffic to that endpoint for as long as the state of the endpoint is unavailable.

Traffic Manager is available in all Azure regions. Microsoft charges for the service based on the number of DNS queries that are received by Traffic Manager. As the service is attached to an Azure subscription, you will be required to contact Azure support to transfer Traffic Manager to a different subscription.

The following table shows the differences between Azure's built-in load balancer and Traffic Manager:

                       Load balancer                         Traffic Manager
Distribution targets   Must reside in the same region        Can be across regions
Load balancing         Five-tuple hash, Source IP Affinity   Performance, failover, and round-robin
Level                  OSI layer 4, TCP/UDP ports            OSI layer 4, DNS queries

Third-party load balancers

In certain configurations, the default Azure load balancer might not be sufficient. There are several vendors supporting or starting to support Azure. One of them is Kemp Technologies.

Kemp Technologies offers a free load balancer for Microsoft Azure. The Virtual LoadMaster (VLM) provides layer 7 application delivery. The virtual appliance has some limitations compared to the commercially available unit: the maximum bandwidth is limited to 100 Mbps, and High Availability is not offered, which means the Kemp LoadMaster for Azure free edition is a single point of failure. Also, the number of SSL transactions per second is limited.

One of the use cases in which a third-party load balancer is required is when Microsoft Remote Desktop Gateway is used. As you might know, Citrix has been supporting the use of Citrix XenApp and Citrix XenDesktop running on Azure since 2013. This means service providers can offer cloud-based desktops and applications using these Citrix solutions. To make this a working configuration, session affinity is required. Session affinity makes sure that network traffic is always routed over the same server.

Windows Server 2012 Remote Desktop Gateway uses two HTTP channels, one for input and one for output, which must be routed over the same Remote Desktop Gateway. The Azure load balancer is only able to do round-robin load balancing, which does not guarantee that both channels use the same server. However, hardware and software load balancers that support IP affinity, cookie-based affinity, or SSL ID-based affinity (and thus ensure that both HTTP connections are routed to the same server) can be used with Remote Desktop Gateway.

Another use case is load balancing of Active Directory Federation Services (ADFS).
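Before moving on to the ADFS scenario, here is a minimal sketch of creating a Traffic Manager profile with the classic Azure PowerShell module. The profile name, the trafficmanager.net prefix, and the two cloud service endpoints are hypothetical placeholders.

# Create a Traffic Manager profile that uses the Performance load-balancing method
$tmProfile = New-AzureTrafficManagerProfile -Name "contoso-web-tm" `
    -DomainName "contosoweb.trafficmanager.net" -LoadBalancingMethod "Performance" `
    -Ttl 30 -MonitorProtocol "Http" -MonitorPort 80 -MonitorRelativePath "/"

# Register one cloud service per Azure region as an endpoint of the profile
$tmProfile = Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $tmProfile `
    -DomainName "contosoweb-we.cloudapp.net" -Type "CloudService" -Status "Enabled"
$tmProfile = Add-AzureTrafficManagerEndpoint -TrafficManagerProfile $tmProfile `
    -DomainName "contosoweb-ne.cloudapp.net" -Type "CloudService" -Status "Enabled"

# Commit the endpoint changes to the profile
Set-AzureTrafficManagerProfile -TrafficManagerProfile $tmProfile

Clients then resolve contosoweb.trafficmanager.net (typically through a CNAME record on the company's own domain), and Traffic Manager answers with the endpoint selected by its health checks and the chosen load-balancing method.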
Microsoft Azure can be used as a backup for on-premises Active Directory (AD). Suppose your organization is using Office 365. To provide single sign-on, a federation has been set up between the Office 365 directory and your on-premises AD. If your on-premises ADFS fails, external users will not be able to authenticate. By using Microsoft Azure for ADFS, you can provide high availability for authentication. Kemp LoadMaster for Azure can be used to load balance network traffic to ADFS and is able to do proper load balancing.

To install Kemp LoadMaster, perform the following steps:

Download the publish profile settings file from https://windows.azure.com/download/publishprofile.aspx.
Use PowerShell for Azure with the Import-AzurePublishSettingsFile command.
Upload the KEMP-supplied VHD file to your Microsoft Azure storage account.
Publish the VHD as an image. The image can then be used to create virtual machines.

The complete steps are described in the documentation provided by Kemp.

Geo-replication of data

Microsoft Azure has geo-replication of Azure Storage enabled by default. This means all of your data is not only stored at three different locations in the primary region, but also replicated and stored at three different locations in the paired region. However, this replicated data cannot be accessed by the customer. Microsoft has to declare a data center or storage stamp as lost before it will fail over to the secondary location. In the rare circumstance where a failed storage stamp cannot be recovered, you will experience many hours of downtime, so you have to make sure you have your own disaster recovery procedures in place.

Zone Redundant Storage

Microsoft offers a third option you can use to store data. Zone Redundant Storage (ZRS) is a mix of the two options for data redundancy and allows data to be replicated to a secondary data center or facility located in the same region or to a paired region. Instead of storing six copies of data like geo-replicated storage does, only three copies of data are stored. So, ZRS is a mix of locally redundant storage and geo-replicated storage. The cost for ZRS is about 66 percent of the cost for GRS.

Snapshots of the Microsoft Azure disk

Server virtualization solutions such as Hyper-V and VMware vSphere offer the ability to save the state of a running virtual machine. This can be useful when you're making changes to the virtual machine but want to have the ability to reverse those changes if something goes wrong. This feature is called a snapshot. Basically, a virtual disk is saved by marking it as read-only. All writes to the disk after a snapshot has been initiated are stored on a temporary virtual disk. When a snapshot is deleted, those changes are committed from the delta disk to the initial disk.

While the Microsoft Azure Management Portal does not have a feature to create snapshots, there is an ability to make point-in-time copies of virtual disks attached to virtual machines. Microsoft Azure Storage has the ability of versioning: under the hood, this works differently from snapshots in Hyper-V, as it creates a snapshot blob of the base blob. Snapshots are by no means a replacement for a backup, but it is nice to know that you can save the state and quickly revert if required.

Introduction to geo-replication

By default, Microsoft replicates all data stored on Microsoft Azure Storage to the secondary location located in the paired region. Customers are able to enable or disable this replication.
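Both the redundancy option and the blob snapshots described in the previous section can be scripted with the classic Azure PowerShell cmdlets. The following is a minimal sketch; the storage account, container, and blob names are hypothetical placeholders.

# Create a storage account with geo-redundant storage; other -Type values include
# Standard_LRS (locally redundant), Standard_ZRS, and Standard_RAGRS (read-access geo-redundant)
New-AzureStorageAccount -StorageAccountName "contosodata" -Location "West Europe" -Type "Standard_GRS"

# Toggle geo-replication on an existing account (switches between locally redundant and geo-redundant)
Set-AzureStorageAccount -StorageAccountName "contosodata" -GeoReplicationEnabled $true

# Take a point-in-time snapshot of a VHD blob (the blob 'versioning' described above)
$key  = (Get-AzureStorageKey -StorageAccountName "contosodata").Primary
$ctx  = New-AzureStorageContext -StorageAccountName "contosodata" -StorageAccountKey $key
$blob = Get-AzureStorageBlob -Container "vhds" -Blob "web01-osdisk.vhd" -Context $ctx
$snapshot = $blob.ICloudBlob.CreateSnapshot()

A snapshot created this way can later be copied back over the base blob to revert the disk to that point in time.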
When geo-replication is enabled, customers are charged for it. When Geo Redundant Storage (GRS) has been enabled on a storage account, all data is asynchronously replicated. At the secondary location, data is stored on three different storage nodes, so even when two nodes fail, the data is still accessible.

However, before the Read Access Geo Redundant feature was available, customers had no way to actually access the replicated data. The replicated data could only be used by Microsoft when the primary storage could not be recovered again. Microsoft will try everything to restore data in the primary location and avoid a so-called geo-failover process. A geo-failover process means that a storage account's secondary location (the replicated data) will be configured as the new primary location. The problem is that a geo-failover cannot be done per storage account, but needs to be done at the storage stamp level. A storage stamp has multiple racks of storage nodes. You can imagine how much data and how many customers are involved when a storage stamp needs to fail over. A failover will have an effect on the availability of applications. Also, because of the asynchronous replication, some data will be lost when a failover is performed. Microsoft is working on an API that allows customers to fail over a storage account themselves.

When geo-redundant replication is enabled, you will only benefit from it when Microsoft has a major issue. Geo-redundant storage is a replacement neither for a backup nor for a disaster recovery solution.

Microsoft states that the Recovery Point Objective (RPO) for Geo Redundant Storage will be about 15 minutes. That means that if a failover is required, customers can lose about 15 minutes of data. Microsoft does not provide an SLA on how long geo-replication will take, nor does it give an indication of the Recovery Time Objective (RTO). The RTO indicates the time required by Microsoft to make the data available again after a major failure that requires a failover. Microsoft once had to deal with a failure of storage stamps; they did not perform a failover, but it took many hours to restore the storage service to a normal level.

In 2013, Microsoft introduced a new feature called Read Access Geo Redundant Storage (RA-GRS). This feature allows customers to perform reads on the replicated data. It increases the read availability from 99.9 percent when GRS is used to above 99.99 percent when RA-GRS is enabled. Microsoft charges more when RA-GRS is enabled. RA-GRS is an interesting addition for applications that are primarily meant for read-only purposes. When the primary location is not available and Microsoft has not performed a failover, writes are not possible. The availability of the Azure Virtual Machine service is not increased by enabling RA-GRS: while the VHD data is replicated and can be read, the virtual machine itself is not replicated. Perhaps this will be a feature in the future.

Disaster recovery using Azure Site Recovery

Disaster recovery has always been one of the top priorities for organizations. IT has become a very important, if not mission-critical, factor for doing business. A failure of IT could result in the loss of money, customers, orders, and brand value.
There are many situations that can disrupt IT, such as:

Hurricanes
Floods
Earthquakes
Disasters such as the failure of a nuclear power plant
Fire
Human error
Outbreak of a virus
Hardware or software failure

While these threats are clear and the risk of being hit by one can be calculated, many organizations do not have proper protection against them. Disaster recovery solutions can help an organization to continue doing business in three different situations:

Avoiding a possible failure of IT infrastructure by moving servers to a different location.
Avoiding a disaster situation, such as hurricanes or floods, since such situations are generally well known in advance due to weather forecasting capabilities.
Recovering as quickly as possible when a disaster has hit the data center. Disaster recovery is done when a disaster unexpectedly hits the data center, such as a fire, hardware error, or human error.

Some reasons for not having a proper disaster recovery plan are complexity, lack of time, and ignorance; however, in most cases, a lack of budget and the belief that disaster recovery is expensive are the main reasons. Almost all organizations that have been hit by a major disaster causing unacceptable periods of downtime started to implement a disaster recovery plan, including the technology, immediately after they recovered. However, in many cases, this insight came too late. According to Gartner, 43 percent of companies experiencing disasters never reopen and 29 percent close within 2 years.

Server virtualization has made disaster recovery a lot easier and more cost-effective. Verifying that your DR procedure actually works as designed and matches the RTO and RPO is much easier using virtual machines.

Since Windows Server 2012, Hyper-V has had a feature for asynchronous replication of a virtual machine's virtual disks to another location. This feature, Hyper-V Replica, is very easy to enable and configure and does not cost extra. Hyper-V Replica is storage agnostic, which means the storage type at the primary site can be different from the storage type used at the secondary site. So, Hyper-V Replica works perfectly when your virtual machines are hosted on, for example, EMC storage at the primary site while an HP solution is used at the secondary site. (A PowerShell sketch of enabling Hyper-V Replica follows after the list of scenarios below.)

While replication is a must for DR, another very useful feature in DR is automation. As an administrator, you really appreciate the option to click on a button after deciding to perform a failover, and then sit back and relax. Recovery is mostly a stressful job when your primary location is flooded or burned, and lots of things can go wrong if recovery is done manually. This is why Microsoft designed Azure Site Recovery.

Azure Site Recovery is able to assist in disaster recovery in several scenarios:

A customer has two data centers, both running Hyper-V managed by System Center Virtual Machine Manager. Hyper-V Replica is used to replicate data at the virtual machine level.
A customer has two data centers, both running Hyper-V managed by System Center Virtual Machine Manager. NetApp storage is used to replicate between the two sites at the storage level.
A customer has a single data center running Hyper-V managed by System Center Virtual Machine Manager.
A customer has two data centers, both running VMware vSphere. In this case, InMage Scout software is used to replicate between the two data centers. Azure is not used for orchestration.
A customer has a single data center not managed by System Center Virtual Machine Manager.
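As mentioned above, Hyper-V Replica is the replication engine behind several of these scenarios. The following is a minimal sketch of enabling it for a single virtual machine with the Hyper-V PowerShell module; the server and virtual machine names are hypothetical, and Kerberos authentication over HTTP port 80 is assumed.

# On the Replica (secondary) server: allow it to receive replication traffic
Set-VMReplicationServer -ReplicationEnabled $true -AllowedAuthenticationType Kerberos `
    -ReplicationAllowedFromAnyServer $true -DefaultStorageLocation "D:\Replica"

# On the primary server: enable replication for one VM and start the initial copy
# (the 5-minute interval parameter requires Windows Server 2012 R2 or later)
Enable-VMReplication -VMName "sql01" -ReplicaServerName "hv-replica.contoso.com" `
    -ReplicaServerPort 80 -AuthenticationType Kerberos -ReplicationFrequencySec 300
Start-VMInitialReplication -VMName "sql01"

For the Hyper-V scenarios, Azure Site Recovery builds on top of this per-VM replication and adds the orchestration described next.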
In the second scenario, Microsoft Azure is used as a secondary data center if a disaster makes the primary data center unavailable.

Microsoft has also announced support for a scenario where vSphere is used on-premises and Azure Site Recovery is used to replicate data to Azure. To enable this, InMage software will be used. Details were not available at the time this article was written.

In the first two scenarios described, Site Recovery is used to orchestrate the failover and failback to the secondary location. Management is done using the Azure Management Portal, which is available in any browser that supports HTML5, so a failover can be initiated even from a tablet or smartphone.

Using Azure as a secondary data center for disaster recovery

Azure Site Recovery went into preview in June 2014. For organizations using Hyper-V, there is no direct need to have a secondary data center, as Azure can be used as a target for Hyper-V Replica. Some of the characteristics of the service are as follows:

Allows nondisruptive disaster recovery failover testing
Automated reconfiguration of the network configuration of guests
Storage agnostic: supports any type of on-premises storage supported by Hyper-V
Support for VSS to enable application consistency
Protects more than 1,000 virtual machines (Microsoft tested with 2,000 virtual machines and this went well)

To be able to use Site Recovery, customers do not have to use System Center Virtual Machine Manager; Site Recovery can be used without it installed. When SCVMM is used, Site Recovery will use information such as the virtual networks provided by SCVMM to map them to networks available in Microsoft Azure.

Site Recovery does not support the ability to send a copy of the virtual hard disks on removable media to an Azure data center to avoid doing the initial replication over the WAN (seeding). Customers will need to transfer all the replication data over the network. ExpressRoute will help to get a much better throughput compared to a site-to-site VPN over the Internet.

Failover to Azure can be as simple as clicking on a single button. Site Recovery will then create new virtual machines in Azure and start them in the order defined in the recovery plan. A recovery plan is a workflow that defines the startup sequence of virtual machines. It is possible to pause the recovery plan to allow a manual check, for example; if all is okay, the recovery plan will continue doing its job. Multiple recovery plans can be created.

Microsoft Volume Shadow Copy Service (VSS) is supported, which allows application consistency.

Replication of data can be configured at intervals of 30 seconds, 5 minutes, or 15 minutes. Replication is performed asynchronously. For recovery, 24 recovery points are available. These are like snapshots or point-in-time copies. If the most recent replica cannot be used (for example, because of damaged data), another replica can be used for the restore.

You can also configure extended replication. In extended replication, your Replica server forwards changes that occur on the primary virtual machines to a third server (the extended Replica server). After a planned or unplanned failover from the primary server to the Replica server, the extended Replica server provides further business continuity protection. As with ordinary replication, you configure extended replication by using Hyper-V Manager, Windows PowerShell (using the -Extended option), or WMI.

At the moment, only the VHD virtual disk format is supported.
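Because only VHD is supported, virtual machines whose disks use the newer VHDX format have to be converted before they can be protected. A minimal sketch with the Hyper-V PowerShell module follows; the paths, the virtual machine name, and the IDE controller position are hypothetical, and the virtual machine must be shut down during the conversion.

# Convert a VHDX system disk to the VHD format so that it can be replicated to Azure
Convert-VHD -Path "D:\VMs\web01\web01-osdisk.vhdx" -DestinationPath "D:\VMs\web01\web01-osdisk.vhd"

# Swap the converted disk into the virtual machine (IDE controller 0, location 0 assumed)
Remove-VMHardDiskDrive -VMName "web01" -ControllerType IDE -ControllerNumber 0 -ControllerLocation 0
Add-VMHardDiskDrive -VMName "web01" -ControllerType IDE -ControllerNumber 0 -ControllerLocation 0 `
    -Path "D:\VMs\web01\web01-osdisk.vhd"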
Generation 2 virtual machines that can be created on Hyper-V are not supported by Site Recovery. Generation 2 virtual machines have a simplified virtual hardware model and support Unified Extensible Firmware Interface (UEFI) firmware instead of BIOS-based firmware. Boot from PXE, SCSI hard disk, or SCSI DVD, as well as Secure Boot, are supported in Generation 2 virtual machines. However, on March 19, Microsoft responded to numerous customer requests regarding Site Recovery support for Generation 2 virtual machines: Site Recovery will soon support Gen 2 VMs. On failover, the VM will be converted to a Gen 1 VM; on failback, the VM will be converted back to Gen 2. This conversion is done until the Azure platform natively supports Gen 2 VMs.

Customers using Site Recovery are charged only for the consumption of storage as long as they do not perform a failover or a failover test.

Failback is also supported. After running in Microsoft Azure for a while, customers are likely to move their virtual machines back to the on-premises primary data center. Site Recovery will replicate back only the changed data.

Note that customer data is not stored in Microsoft Azure when Hyper-V Recovery Manager is used. Azure is used to coordinate the failover and recovery. To be able to do this, it stores information on network mappings, runbooks, and the names of virtual machines and virtual networks. All data sent to Azure is encrypted.

By using Azure Site Recovery, we can perform service orchestration in terms of replication, planned failover, unplanned failover, and test failover. The entire engine is powered by Azure Site Recovery Manager.

Let's have a closer look at the main features of Azure Site Recovery. It enables three main scenarios:

Test Failover or DR Drills: Enables support for application testing by creating test virtual machines and networks as specified by the user. Without impacting production workloads or their protection, HRM can quickly enable periodic workload testing.
Planned Failovers (PFO): For compliance or in the event of a planned outage, customers can use planned failovers: virtual machines are shut down, final changes are replicated to ensure zero data loss, and then the virtual machines are brought up in order on the recovery site as specified by the recovery plan (RP). More importantly, failback is a single-click gesture that executes a planned failover in the reverse direction.
Unplanned Failovers (UFO): In the event of an unplanned outage or a natural disaster, HRM opportunistically attempts to shut down the primary machines if some of the virtual machines are still running when the disaster strikes. It then automates their recovery on the secondary site as specified by the RP.

If your secondary site uses a different IP subnet, Site Recovery is able to change the IP configuration of your virtual machines during the failover.

Part of the Site Recovery installation is the installation of a VMM provider, a component that communicates with Microsoft Azure. Site Recovery can be used even if you have a single VMM server to manage both the primary and secondary sites.

Site Recovery does not rely on the availability of any component in the primary site when performing a failover. So, it doesn't matter if the complete site, including the link to Azure, has been destroyed; Site Recovery will still be able to perform the coordinated failover.

Azure Site Recovery to customer-owned sites is billed per protected virtual machine per month. The cost is approximately €12 per month. Microsoft bills for the average number of protected virtual machines per month.
So, if you are protecting 20 virtual machines in the first half of the month and 0 in the second half, you will be charged for 10 virtual machines for that month. When Azure is used as a target, Microsoft will only charge for the consumption of storage during replication. The cost for this scenario is €40.22 per month per protected instance. As soon as you perform a test failover or an actual failover, Microsoft will charge for the virtual machine's CPU and memory consumption.

Summary

This article covered the concepts of High Availability in Microsoft Azure and disaster recovery using Azure Site Recovery, and also gave an introduction to geo-replication.

Resources for Article:

Further resources on this subject:

Windows Azure Mobile Services - Implementing Push Notifications using [article]
Configuring organization network services [article]
Integration with System Center Operations Manager 2012 SP1 [article]