Thursday 9 July 2015

Managing Gluster Volume Snapshots Using oVirt

Gluster - An Introduction
Gluster is a scale-out, software-only, distributed file system that can store petabytes of data and support thousands of clients. It runs entirely in user space on commodity hardware. Gluster aggregates storage exports over a network interconnect to provide a single unified namespace.



oVirt - An Introduction
oVirt is an open source virtualization management platform to manage virtual machines, storage and networks. It's an alternative to VMware vSphere and provides an awesome KVM management interface for multi-node virtualization. oVirt has two main components -
  • Engine (oVirt Engine) - Manages the oVirt hosts and allows system administrators to create and deploy new VMs. It also supports Gluster volume management.
  • Host Agent (VDSM) - The oVirt Engine communicates with VDSM (Virtual Desktop Server Manager) to manage the VMs, storage and networks.
 
Gluster Volume Snapshots

The Gluster volume snapshot feature provides an online, crash-consistent recovery mechanism for Gluster volumes. A volume snapshot is a read-only, point-in-time view of the volume itself. If the volume becomes inconsistent, a snapshot can be used to restore it to a consistent state. Snapshots also serve as volume backups for future reference.
oVirt 3.6 adds features to manage Gluster volume snapshots. An administrator can take scheduled or unscheduled snapshots of a volume and thereby back up a Gluster volume. Using this feature in oVirt, you can create, schedule, list, delete, activate and de-activate snapshots. It also provides a mechanism to restore a volume to the state of a given snapshot.
To read more on the Gluster Volume Snapshot feature, refer to https://forge.gluster.org/snapshot/pages/Home

Gluster Volume Snapshot management in oVirt
All Gluster volume snapshot actions are available from two places under the main tab Volumes in the oVirt UI: the menu option Snapshot and the sub-tab Snapshots. The Snapshots sub-tab lists all the available snapshots for the volume selected in the list. It also provides snapshot-level actions: Restore, Delete, Delete All, Activate and Deactivate.
The menu option Snapshot provides actions for creating a volume snapshot, scheduling snapshot creation, and editing an existing snapshot creation schedule for the selected volume. It also provides actions for maintaining the snapshot configuration parameters, which can be set at volume level or cluster level.



1. Volume Snapshot Creation
To create a snapshot of the selected volume, choose the menu option Snapshot -> New under the main tab Volumes. Provide a snapshot name prefix and an optional description. Gluster appends the time zone and time stamp when it creates the snapshot, so the resulting name has the form <prefix>_<time zone name (e.g. GMT)>-yyyy.MM.dd-hh.mm.ss. All snapshots are created in the de-activated state by default. You need to activate a snapshot to make it usable (mount and read).
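As a rough illustration of the naming pattern described above, here is a minimal Python sketch. The function name build_snapshot_name and the prefix "nightly" are made up for this example, and a 24-hour clock is assumed for the hh field; Gluster itself generates the real name.

```python
from datetime import datetime


def build_snapshot_name(prefix: str, created_at: datetime, tz_name: str = "GMT") -> str:
    """Compose <prefix>_<time zone name>-yyyy.MM.dd-hh.mm.ss.

    Assumption: the hour field uses a 24-hour clock.
    """
    stamp = created_at.strftime("%Y.%m.%d-%H.%M.%S")
    return f"{prefix}_{tz_name}-{stamp}"


example = build_snapshot_name("nightly", datetime(2015, 7, 9, 14, 30, 5))
print(example)  # nightly_GMT-2015.07.09-14.30.05
```

The prefix is the only part the administrator controls from the dialog; the rest depends on when and where the snapshot is taken.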

2. Scheduling volume snapshot creation
Use the same menu option Snapshot -> New under the main tab Volumes to create a volume snapshot schedule. The only difference is that you also fill in the Schedule tab of the dialog. As before, provide the snapshot name prefix and an optional description in the General tab. Then provide the schedule details in the Schedule tab.
The first thing to select in the Schedule tab is the Recurrence type. The available recurrence types are
  • Minute - In this case you can schedule snapshot creation every n-minutes
  • Hourly - In this case you can select hourly creation of snapshots
  • Daily - In this case you can select daily snapshot creation at a defined time
  • Weekly - In this case you can select weekly snapshot creation on specified days of the week and at specified time
  • Monthly - In this case you can select snapshot creation on specified days of the month and at specified time
2.1 Scheduling a snapshot creation every n-minutes
For this recurrence type, provide the following details to schedule the snapshot creation
  • Interval - snapshots are created every n minutes
  • Time Zone - time zone in which the schedule is created
  • Start Schedule by - time at which the schedule should start creating snapshots
  • End by - whether the schedule has an end date or no end date
  • End Schedule by - if End by is set to "Date", the end date for the schedule
2.2 Scheduling a snapshot creation hourly
For this recurrence type, provide the following details to schedule the snapshot creation
  • Time Zone - time zone in which the schedule is created
  • Start Schedule by - time at which the schedule should start creating snapshots
  • End by - whether the schedule has an end date or no end date
  • End Schedule by - if End by is set to "Date", the end date for the schedule
  
2.3 Scheduling a snapshot creation daily
For this recurrence type, provide the following details to schedule the snapshot creation
  • Time Zone - time zone in which the schedule is created
  • Start Schedule by - time at which the schedule should start creating snapshots
  • Execute At - time of day at which the snapshot is created daily
  • End by - whether the schedule has an end date or no end date
  • End Schedule by - if End by is set to "Date", the end date for the schedule

2.4 Scheduling a snapshot creation weekly
For this recurrence type, provide the following details to schedule the snapshot creation
  • Time Zone - time zone in which the schedule is created
  • Start Schedule by - time at which the schedule should start creating snapshots
  • Execute At - time of day at which the snapshot is created on the selected days
  • Days Of Week - the week days on which snapshots should be created
  • End by - whether the schedule has an end date or no end date
  • End Schedule by - if End by is set to "Date", the end date for the schedule

2.5 Scheduling a snapshot creation monthly
For this recurrence type, provide the following details to schedule the snapshot creation
  • Time Zone - time zone in which the schedule is created
  • Start Schedule by - time at which the schedule should start creating snapshots
  • Execute At - time of day at which the snapshot is created on the selected days
  • Days Of month - the days of the month on which snapshots should be created (if the last day of the month is selected, it must be the only selected day)
  • End by - whether the schedule has an end date or no end date
  • End Schedule by - if End by is set to "Date", the end date for the schedule
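The fields required by each recurrence type in sections 2.1 to 2.5 can be summarised in a small data model. The sketch below is only an illustration in Python: the class SnapshotSchedule, its field names, the recurrence keys and the LAST_DAY marker are all hypothetical and are not taken from oVirt's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List, Optional

# Extra fields each recurrence type needs on top of the common ones
# (time zone, start, end by / end date), following sections 2.1 to 2.5.
EXTRA_FIELDS = {
    "MINUTE": {"interval_minutes"},
    "HOURLY": set(),
    "DAILY": {"execute_at"},
    "WEEKLY": {"execute_at", "days_of_week"},
    "MONTHLY": {"execute_at", "days_of_month"},
}

LAST_DAY = "L"  # hypothetical marker for "last day of month"


@dataclass
class SnapshotSchedule:
    recurrence: str
    time_zone: str
    start_by: datetime
    end_by_date: Optional[datetime] = None  # None means "no end date"
    interval_minutes: Optional[int] = None
    execute_at: Optional[time] = None
    days_of_week: List[str] = field(default_factory=list)
    days_of_month: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        if self.recurrence not in EXTRA_FIELDS:
            return [f"unknown recurrence type: {self.recurrence}"]
        problems = []
        for name in sorted(EXTRA_FIELDS[self.recurrence]):
            if not getattr(self, name):
                problems.append(f"{name} is required for {self.recurrence}")
        # The last day of the month cannot be combined with other days.
        if (self.recurrence == "MONTHLY" and LAST_DAY in self.days_of_month
                and len(self.days_of_month) > 1):
            problems.append("last day of month must be the only selected day")
        return problems


weekly = SnapshotSchedule("WEEKLY", "GMT", datetime(2015, 7, 10),
                          execute_at=time(2, 0), days_of_week=["Mon", "Thu"])
print(weekly.validate())  # []

bad = SnapshotSchedule("MONTHLY", "GMT", datetime(2015, 7, 10),
                       execute_at=time(2, 0), days_of_month=["1", LAST_DAY])
print(bad.validate())  # ['last day of month must be the only selected day']
```

Keeping the per-type requirements in one table makes it easy to see that Hourly needs nothing beyond the common fields, while Weekly and Monthly differ only in which days list they use.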


3. Editing Volume Snapshot schedule
If a volume snapshot schedule exists for the selected volume, the menu option Snapshot -> Edit Schedule is enabled for it. An indicator in the Info column of the volume list also shows that scheduled snapshot creation is enabled for the volume. Select the menu option Snapshot -> Edit Schedule to modify the details of the schedule, including the snapshot name prefix, description and recurrence details. The dialog is pre-populated with the current values for the volume, which can be modified and saved.

Once a snapshot schedule exists for a volume, the new snapshot dialog no longer shows the Schedule tab, so effectively only one snapshot schedule is possible per volume. The option Snapshot -> New can still be used for one-off creation of a volume snapshot.


4. List/Edit volume level snapshot configuration parameters
Select the menu option Snapshot -> Options - Volume to list and edit the snapshot configuration parameters. The only configuration parameter that can be set at volume level is snap-max-hard-limit, which represents the maximum number of snapshots allowed for a volume. The cluster-level value of the parameter is also listed in the dialog for reference.



5. List/Edit cluster level snapshot configuration parameters
Select the menu option Snapshot -> Options - Cluster for listing and editing the snapshot configuration parameters. The configuration parameters which can be set at cluster level are -
  • snap-max-hard-limit - the maximum number of snapshots allowed for volumes of this cluster. If a specific value is set for a volume, it overrides the cluster-level value.
  • activate-on-create - valid values are enable and disable. This flag controls whether a newly created snapshot in the cluster is activated automatically.
  • auto-delete - valid values are enable and disable. This flag controls whether older snapshots are removed automatically once the hard limit is reached for the volumes of the cluster.
  • snap-max-soft-limit - the percentage of snap-max-hard-limit after which Gluster starts warning, during snapshot creation, that the snapshot limit is being approached.
The menu option Snapshot -> Options - Cluster is always enabled as long as a cluster is selected in the system tree. If a volume is selected in the list and you click the menu option Snapshot -> Options - Cluster, the values for the volume's cluster are listed in the dialog.
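To make the relationship between the two limits concrete, here is a small worked example in Python. It follows the override rule as stated above (a volume-level value replaces the cluster-level one). The function names and the numbers 256 and 90 are illustrative only, and rounding down is an assumption.

```python
from typing import Optional


def effective_hard_limit(cluster_limit: int, volume_limit: Optional[int] = None) -> int:
    """A value set on the volume overrides the cluster-level value."""
    return cluster_limit if volume_limit is None else volume_limit


def soft_limit_count(hard_limit: int, soft_limit_percent: int) -> int:
    """Snapshot count after which warnings start (rounding down is an assumption)."""
    return (hard_limit * soft_limit_percent) // 100


hard = effective_hard_limit(cluster_limit=256)
print(hard)                        # 256
print(soft_limit_count(hard, 90))  # 230

hard = effective_hard_limit(cluster_limit=256, volume_limit=100)
print(hard)                        # 100
print(soft_limit_count(hard, 90))  # 90
```

Because snap-max-soft-limit is a percentage, lowering the hard limit for a single volume also lowers the count at which warnings begin for that volume.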


6. Activating a volume snapshot
Select the menu option Activate under the Snapshots sub-tab of the main tab Volumes to activate the selected volume snapshot. It asks for confirmation and then activates the snapshot. The status of the snapshot in the list changes to UP. This action can be performed only on a de-activated snapshot; otherwise the option remains disabled.

7. De-Activating a volume snapshot
Select the menu option Deactivate under the Snapshots sub-tab of the main tab Volumes to de-activate the selected volume snapshot. It asks for confirmation and then de-activates the snapshot. The status of the snapshot in the list changes to DOWN. This action can be performed only on an activated snapshot; otherwise the option remains disabled.


8. Deleting a volume snapshot
Select the menu option Delete under the Snapshots sub-tab of the main tab Volumes to delete the selected volume snapshot. It asks for confirmation and then deletes the snapshot. This option is enabled only if one or more snapshots are selected in the list; otherwise it remains disabled.

9. Deleting all the volume snapshots
Select the menu option Delete All under Snapshots sub-tab of main tab Volumes to delete all the volume snapshots. It asks for a confirmation and then deletes all the snapshots. This option is enabled even if no snapshots are selected from the list.

10. Restoring a volume to the state of a specific snapshot
Select the menu option Restore under the Snapshots sub-tab of the main tab Volumes to restore the selected volume to the state of the selected snapshot. It warns that the volume will be restored to the state of the selected snapshot and, after confirmation, restores the volume. This option is enabled only if a snapshot is selected in the list; otherwise it remains disabled. After the restore, the brick names of the volume change accordingly.

View of the volume bricks before restore


View of the volume bricks after restore


The volume is seamlessly brought DOWN and then restored to the state of the snapshot. Once the action succeeds, the volume is brought UP again. A restored snapshot is deleted as part of the restore, so it disappears from the list.
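The enablement rules described in sections 6 to 10 can be collected into one function. This is a minimal Python sketch of those rules, not oVirt's UI code. The function name enabled_actions is hypothetical, and treating Activate, Deactivate and Restore as single-selection actions is an assumption based on the wording "a selected snapshot".

```python
from typing import List, Set


def enabled_actions(selected_statuses: List[str]) -> Set[str]:
    """selected_statuses holds "UP" or "DOWN" for each selected snapshot."""
    actions = {"Delete All"}            # enabled even when nothing is selected
    if selected_statuses:
        actions.add("Delete")           # one or more snapshots selected
    if len(selected_statuses) == 1:     # assumption: single selection only
        actions.add("Restore")
        if selected_statuses[0] == "DOWN":
            actions.add("Activate")     # only for a de-activated snapshot
        elif selected_statuses[0] == "UP":
            actions.add("Deactivate")   # only for an activated snapshot
    return actions


print(sorted(enabled_actions([])))        # ['Delete All']
print(sorted(enabled_actions(["DOWN"])))  # ['Activate', 'Delete', 'Delete All', 'Restore']
print(sorted(enabled_actions(["UP", "DOWN"])))  # ['Delete', 'Delete All']
```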

Gluster Volume Snapshots with geo-replication enabled for a volume
If geo-replication is set up for a volume, snapshot creation and restore should happen for the slave volume first and then for the master volume. The geo-replication session should also be in the PAUSED state for these actions. oVirt seamlessly takes care of the multiple steps involved in volume snapshot creation and restore when geo-replication is set up for the volume.
Steps involved (in order) in creating a volume snapshot (if geo-replication is set up for the volume) are
  • Pause the geo-replication session
  • Create a snapshot for the slave volume
  • Create a snapshot for the master volume
  • Resume the geo-replication session
Steps involved (in order) in restoring a volume to the state of a snapshot (if geo-replication is set up for the volume) are -
  • Stop the geo-replication session
  • Stop the slave volume
  • Restore the slave volume to the state of corresponding snapshot
  • Stop the master volume
  • Restore the master volume to the state of selected snapshot
  • Start the slave volume
  • Start the master volume
  • Start the geo-replication session
  • Resume the geo-replication session
All the above-mentioned steps for creating or restoring snapshots are performed as part of one click from oVirt. This makes the administrator's life quite easy, as (s)he does not need to perform these steps manually one by one.
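The two step sequences above can be written down as ordered lists with a tiny runner. The Python sketch below only models the ordering; the step strings and the run_in_order helper are illustrative, and stopping at the first failed step is an assumption rather than documented oVirt behaviour.

```python
from typing import Callable, List

CREATE_STEPS = [
    "pause geo-replication session",
    "create snapshot of slave volume",
    "create snapshot of master volume",
    "resume geo-replication session",
]

RESTORE_STEPS = [
    "stop geo-replication session",
    "stop slave volume",
    "restore slave volume",
    "stop master volume",
    "restore master volume",
    "start slave volume",
    "start master volume",
    "start geo-replication session",
    "resume geo-replication session",
]


def run_in_order(steps: List[str], execute: Callable[[str], bool]) -> List[str]:
    """Run the steps one by one and return those that completed.

    Assumption: the sequence stops at the first step that fails.
    """
    completed = []
    for step in steps:
        if not execute(step):
            break
        completed.append(step)
    return completed


# Pretend every step succeeds.
done = run_in_order(CREATE_STEPS, lambda step: True)
print(len(done))  # 4

# Pretend restoring the slave volume fails.
done = run_in_order(RESTORE_STEPS, lambda step: step != "restore slave volume")
print(done)  # ['stop geo-replication session', 'stop slave volume']
```

In both sequences the slave volume is handled before the master volume, which matches the ordering requirement stated at the start of this section.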

oVirt sync job for volume snapshots
A background sync job refreshes the list of snapshots and the snapshot configuration from the Gluster cluster every 5 minutes (the default, which is configurable). If snapshots are created or deleted, or snapshot configuration is modified, from the Gluster CLI, the changes are reflected in the oVirt UI within 5 minutes.
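Conceptually, each run of such a sync job compares what oVirt already knows with what Gluster reports. The Python sketch below shows only that comparison; diff_snapshots and the snapshot names are hypothetical, and the real job also refreshes configuration values, which is not modelled here.

```python
from typing import Set, Tuple

SYNC_INTERVAL_SECONDS = 5 * 60  # default refresh period mentioned above


def diff_snapshots(known: Set[str], reported: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so the known list matches what Gluster reports."""
    return reported - known, known - reported


known = {"snap_a", "snap_b"}
reported = {"snap_b", "snap_c"}  # snap_a deleted and snap_c created via the CLI
to_add, to_remove = diff_snapshots(known, reported)
print(sorted(to_add))     # ['snap_c']
print(sorted(to_remove))  # ['snap_a']
```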

Known Issues
There is a known issue where snapshot restore fails if geo-replication is enabled for the volume. The master and slave can be in different time zones when the snapshots are created, so the two snapshots get different names (Gluster appends the time zone and time stamp to each). On restore, the slave volume is restored first, but the restore is triggered from the UI with the master volume snapshot selected. A snapshot with that name will not exist for the slave volume, so an error is thrown saying that the snapshot does not exist for the slave.

Summary
The Gluster Volume Snapshots feature in oVirt provides a very easy-to-use mechanism for maintaining volume snapshots and their configuration. It takes care of the seamless execution of snapshot creation and restore when geo-replication is set up for a volume, which is a pretty painful job for an administrator to perform from the Gluster CLI.

Gluster volume snapshot scheduling is the feature that stands out, as the administrator gets enough flexibility to set up snapshot creation with the different recurrence types available. A schedule can also be set to end at a defined date and time, after which it is removed.

Gluster does provide a rudimentary cron-based scheduler for snapshot creation. If it is already enabled for a Gluster cluster, then as soon as oVirt starts managing the cluster and the first volume snapshot schedule is created from oVirt, the Gluster CLI-based snapshot schedule is disabled and the onus shifts to oVirt.