The OpenDedup Deduplication NAS Virtual Appliance is designed to provide deduplication-backed iSCSI and NFS storage as a guest within virtual environments. It includes capabilities to create, mount, delete, and export SDFS volumes via NFS and iSCSI. It also includes VMware storage API integration that allows quick cloning of virtual machines located on SDFS volumes and the creation of data stores.
The SDFS file system, which is used by the OpenDedup Deduplication NAS Appliance, is designed to support the unique needs of virtual environments and supports enhanced functionality for VMware, Xen, and KVM. It can deduplicate a petabyte or more of data. It can address over 5TB of data per gigabyte of memory at a 128k chunk size. It can perform deduplication/reduplication at a line speed of 800 MB/s or more. It can also deduplicate at 4k block sizes, which is required to deduplicate virtual machines effectively. Deduplicated data can be stored locally, on the network across multiple nodes, or in the cloud. The file system can deduplicate inline or periodically, based on needs, and this can be changed on the fly. There is support for volume, file, or folder level snapshots and replication.
This document will guide you through setup and basic management of the Virtual Appliance. The instructions are written from the perspective of using VMware vCenter but are applicable to all virtual environments.
Virtual Appliance Build Information
The Virtual appliance is built on Ubuntu 14.0.4 Server and includes some customization to create a more streamlined NAS Appliance.
OS Build Information:
* Ubuntu 14.0.4 Server
* Base Ubuntu Server Install
* NFS Kernel Server
* OpenSSH Server
* SDFS Manager Version 3.2.3
* LIO iSCSI Target
Virtual Hardware Setup
* 6 GB / (root ext4) 2.2 GB Used
* 1 TB /opt/ (to store SDFS Data ext4) 0 GB Used
* 6 GB of RAM
* 4 CPUs
The Datish Virtual Appliance is licensed for up to 1TB of backend storage. A larger license can be requested from Polar Key, a partner that has invested heavily in OpenDedup and has great expertise in supporting customers. To obtain a key, go to http://polarkey.com/.
The guest is set up by deploying the OVF file. OVF is an open standard for packaging virtual guests for deployment. The link to the OVA file can be found on the download page. Refer to your hypervisor (e.g. VMware) documentation for how to deploy an OVA file.
Once the OVF is deployed and the system is booted, the Virtual Appliance console will present a message with the URL of the web-based management console.
The shell login and the web-based login are the same and are synchronized. If the login screen is blank or flashing without a prompt, press Return in the console and it will refresh.
To log in to the web-based management console, go to https://<server-ip>. This will take you to the login page for the management console. To log in for the first time, use the default user name and password.
The default password is set to:
Username : root
Initial Administrative Console Setup
Once you have logged in for the first time you will be presented with a setup wizard for initial configuration.
By default, networking for the virtual appliance is set to use DHCP. A static network configuration can be set within the management console. To do this, navigate to Storage Nodes->sdfsnas2->Network Configuration->Ethernet – eth0. In the networking screen, deselect DHCP and enter the appropriate network information. After you have clicked submit, navigate to the new IP address of the system in the browser to continue the setup.
Remote Shell Access
Remote access to the appliance is provided via SSH. Most volume level actions should be performed from the web-based console.
NTP Client Configuration
Time synchronization is important when setting up replication between nodes or for multi-node management setups. By default the Virtual Storage Appliance is set up with the default Ubuntu NTP configuration with GMT -8 (PDT). This can be changed at any time by navigating to Storage Nodes->sdfsnas2->time configuration. Within this panel you can change the time zone as well as the time servers. The time servers are entered one per line.
Web Based Virtual Storage Management
The OpenDedup NAS Virtual Appliance provides a web-based management interface that is used to manage the lifecycle of data stored within the appliance. The NAS appliance URL is https://<IP/HostName>. The appliance IP address is presented on the banner before console login, or can be found by logging into the console and typing ifconfig.
Capabilities of the interface include:
Multi Node Central Management:
* Manage multiple nodes from a single console
* View events
* Manage Folder/Volume Replication
* Creating Volumes
* Mounting Volumes
* Expanding Volumes
* Exporting Volumes via NFS
* Exporting Volumes via iSCSI
* Reporting on volume usage
* Creating folders
* Deleting files/folders
* Fast Cloning/Snapshotting Files and Folders
* Reporting on file deduplication statistics in real time
Creating a Volume
The SDFS Volume Manager provides the capability to create multiple volumes on the appliance. The Volume Manager allows you to create two types of volumes: Local Storage or Amazon Web Service (S3). The Local Storage volume will store all unique blocks of data on the local appliance disk. The Amazon Web Service option will store all unique blocks in the cloud at Amazon using the S3 service.
By default, no volumes are created. To create a volume click on the actions button on the top of the left tree navigation and select “Create an SDFS Volume”. Once selected, a Volume Creation wizard will appear and provide options for creating a volume.
Creating a Volume that Stores Data Locally
By default the SDFS Volume Manager will create a volume that stores data locally. This option will store all data in /opt/sdfs/volumes/<volume-name>. Local Storage is a good option for data that is used in production, since writing and retrieving unique blocks will be done locally.
The options for creating a local volume are as follows:
* SDFS Storage Node : This is the system the volume will live on.
* Volume Name : This is the name of the volume. It must be unique and must not include spaces.
* Default Mount Point : The mount point where the volume will be mounted on the appliance. By default it will mount to /media/<volume-name>.
* Mount on Startup : The volume will be mounted when the system starts if this is selected. Otherwise it can be mounted manually.
* Enable Authentication : When selected, the volume management API requires authentication. This option allows you to control who can manage the volume through the command line shell sdfscli.
* Volume Password : The password for authentication. This defaults to admin.
* Volume Block Size : This option defines the block size at which SDFS will deduplicate. Block size determines the deduplication rate at a trade-off of memory. Smaller block sizes deduplicate better than larger ones but take up more memory. Each block stored takes up 25 bytes of memory. With the same amount of memory, a 128 KB block size can store 32x as much data as a 4 KB block size, but it will not deduplicate well for data like VMDKs.
* Volume Size : This is the size at which the volume will be presented to the NAS Appliance and the NFS clients. This can be any size and has no bearing on the amount of data that can actually be stored within the unique data store. It is safe to make this 4x the unique store size.
* Unique Store Size : This is the store size for unique blocks of data. This cannot be changed after the volume is created and cannot exceed the size of the mount point /opt/sdfs or the amount of memory on the system. The memory requirements for a volume are determined as (unique store size in bytes / block size in bytes) * 25. An additional amount will be added on to this calculation for data caching. This additional amount will be at most 1 GB.
* Backend Storage Type : The storage type for backend data. To store data locally select “Local File System”.
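The memory rule given for Unique Store Size can be worked through with a few lines of Python. This is only a sketch of the arithmetic stated above: the function and constant names are hypothetical, and it assumes binary units (1 KB = 1024 bytes) and the worst-case 1 GB cache allowance.

```python
# Sketch of the sizing rule from this guide:
#   memory = (unique store size in bytes / block size in bytes) * 25
# plus a data cache of at most 1 GB. All names here are hypothetical.

BYTES_PER_BLOCK = 25            # per-block memory cost stated in this guide
MAX_CACHE_BYTES = 1024 ** 3     # upper bound on the additional cache amount

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def index_memory_bytes(unique_store_bytes: int, block_size_bytes: int) -> int:
    """Memory needed to track every block in the unique store."""
    blocks = -(-unique_store_bytes // block_size_bytes)  # ceiling division
    return blocks * BYTES_PER_BLOCK


def worst_case_memory_bytes(unique_store_bytes: int, block_size_bytes: int) -> int:
    """Index memory plus the maximum cache allowance."""
    return index_memory_bytes(unique_store_bytes, block_size_bytes) + MAX_CACHE_BYTES


if __name__ == "__main__":
    for label, block_size in (("4 KB", 4 * KIB), ("128 KB", 128 * KIB)):
        index = index_memory_bytes(1 * TIB, block_size)
        total = worst_case_memory_bytes(1 * TIB, block_size)
        print(f"1 TB store, {label} blocks: "
              f"index {index / GIB:.2f} GiB, worst case {total / GIB:.2f} GiB")
```

For a 1 TB unique store this works out to about 200 MiB of index memory at a 128 KB block size and 6.25 GiB at a 4 KB block size, before the cache allowance, which is the 32x difference mentioned under Volume Block Size.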
After you enter data into the fields on this wizard page, click next. The wizard will present the volume settings for verification. Once you are comfortable with the selections, select create volume. If mount on startup was selected, the volume will be mounted at this point. Otherwise, the volume can be mounted by right clicking on the volume and selecting “mount volume”.
Creating a Cloud Based Storage Volume (Amazon S3)
To create an S3 backed volume, select “Amazon Web Service (S3)” from the backend storage type option within the “Create Volume Wizard”. For a volume that is being stored at Amazon, a volume block size of 128k should be chosen. Additional options for this volume type will appear below the storage type selection. These must be filled out accurately to correctly create a volume and include:
* AWS Access Key : The access key Amazon provides when an S3 account is created.
* AWS Secret Key : The secret key Amazon provides when an S3 account is created.
* Unique Bucket Name : The unique name the bucket is given.
More detail on S3 storage for SDFS can be found here.
Once a volume is created, it can be managed by clicking on the desired option in the tree menu.
The Options Available are:
* Volume Information tab provides volume statistics and performance information. This page will only be available if the volume is mounted.
* Volume Configuration tab provides some basic options that can be modified. Additional configuration parameters are also shown, but these cannot be changed.
* Browse Volume File System provides the capabilities to manage the SDFS file system. Actions can be activated by right clicking on the file or folder that you would like to perform the action on. These actions include browse, delete, or take snapshots of a file or folder.
* NFS Exports tab is used to manage NFS exports for an SDFS volume. Exports can be created, modified, deleted, exported, and unexported from this screen. When an NFS export is created, the target folder of the export will be created if it does not already exist. The nfs folder is created and exported by default when a volume is created and mounted. Other exports can be created manually by clicking the add button. If changes are made to the NFS volume while it is mounted, it must be re-exported for the changes to apply.
To take a snapshot of a file or folder, right click on the file or folder and select take a snapshot. Once selected, a snapshot wizard will appear. In the bottom text field, enter the path that you would like to target the snapshot to.
Managing Multiple Storage Appliances from a single console
As of SDFS version 2.0.0, central management of multiple virtual appliances is enabled. This feature allows you to create, manage, and view the entire volume and NAS environment of your SDFS infrastructure. This feature is a prerequisite for appliance to appliance replication.
To add a host for management, click on the actions button and select add a storage host. The following dialog will be presented.
The field descriptions are as follows:
* Host Name or IP : This is the resolvable host name or IP of the system that will be managed. As it is entered, it will autofill the Host Connection URL.
* Description : A non-required description of the Host.
* Host Connection URL : The URL used to access the management console on that system. This field will auto-populate from the host name field.
* Password : The password used to log in to the management console on the remote system.
Once submitted, the SDFS web console will import the remote system configuration and populate the interface. You will see volumes, replication tasks, and events from the remote appliance. To make changes to the remote appliance once it is imported, navigate to storage nodes-><system name>. The system can simultaneously be managed from the local console and the remote console, although it is advisable that the system only be managed from the remote console after it is imported.
To remove a managed appliance, right click on it within the storage nodes folder and select delete. This will remove it from management but will not delete the node or any of the volumes on the node. Once removed, the Storage Appliance can be managed by directly accessing the web console on the node itself.
SDFS provides asynchronous master/slave volume and subvolume replication through the web management console. SDFS volume replication takes a snapshot of the designated master volume or subfolder and then replicates metadata and unique blocks to the secondary, or slave, SDFS volume. Only unique blocks that are not already stored on the slave volume are replicated, so data transfer should be minimal.
The benefits of SDFS Replication are:
* Fast replication – SDFS can replicate large volume sets quickly.
* Reduced bandwidth – Only unique data is replicated between volumes
* Built-in scheduling – The sdfsreplicate service has a built-in scheduling engine based on cron style syntax.
* Sub-volume replication – The sdfsreplicate service can replicate volumes or subfolders to slave volumes. In addition, replication can be set to be targeted to sub-volumes on the slave.
* Sub-volume targets on the slave allow for wildcard naming such as an appended timestamp or the hostname of the master.
* Replicate to S3 (Cloud based Storage) with limited bandwidth requirements.
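The reduced-bandwidth behaviour, where only blocks the slave does not already hold are sent, can be illustrated with a small Python model. This is not SDFS code: the fixed 4k chunking, the use of SHA-256 as the block fingerprint, the in-memory dictionary standing in for the slave's unique store, and all function names are assumptions made for the sketch.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumption: fixed-size 4k blocks


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    # SHA-256 is a stand-in; this guide does not say which hash SDFS uses.
    return hashlib.sha256(block).hexdigest()


def replicate(master_data: bytes, slave_store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Replicate data to the slave, sending only blocks it does not hold.

    Returns the metadata (ordered list of fingerprints) and the number of
    block bytes that actually had to be transferred.
    """
    metadata: List[str] = []
    bytes_sent = 0
    for block in split_blocks(master_data):
        key = fingerprint(block)
        metadata.append(key)              # metadata is always replicated
        if key not in slave_store:        # unique block the slave lacks
            slave_store[key] = block
            bytes_sent += len(block)
    return metadata, bytes_sent


def restore(metadata: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data on the slave from metadata plus blocks."""
    return b"".join(store[key] for key in metadata)


if __name__ == "__main__":
    slave: Dict[str, bytes] = {}
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    meta, sent = replicate(data, slave)
    print("first run sent", sent, "bytes")        # two distinct blocks
    meta2, sent2 = replicate(data, slave)
    print("second run sent", sent2, "bytes")      # nothing new to send
    assert restore(meta, slave) == data
```

In this model a repeat replication of unchanged data transfers only metadata, which is the effect the benefits list describes.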
Once a second Storage Appliance is imported, you can configure folder level replication. To configure replication, both Storage Nodes must be resolvable from each other; it is therefore recommended that all nodes use IP addresses initially for configuration. To make sure all nodes are using IP addresses for names, select each “host configuration” within the node and change the Host Name or IP field to the IP address of the system. This will ensure all the nodes can talk to each other.
The next step for setting up replication is to create a replication task. To create a replication task, click “Actions” and select “Add a Replication Task”.
The fields are described as follows:
* Task Name : The name for the task
* Description : A brief description of the task
* Schedule Type : The schedule type can be Scheduled or Manual. A scheduled job requires that a cron schedule be entered in the Cron Schedule field in Quartz cron format.
* Cron Schedule : The cron schedule, in Quartz cron format, that will be used to run the replication job.
* Next Run : This shows when this job will run next based on the cron schedule. It is a hyperlink and if selected, will show the next 100 runs.
* Source Volume: The volume that will be the source of the replica
* Remote Folder: The remote folder that will be replicated
* Destination Volume: The volume that will be the destination of the replica
* Destination Folder Target : The folder on the slave to replicate to. Wildcards are %d (date as yyMMddHHmmss) and %h (remote host); e.g. backup-%h-%d will output backup-mastername-timestamp.
* Number of replicas to keep : The number of replicas to keep. If set to 0, this option will be ignored. Otherwise, older versions will be removed on a first in, first out basis. The replicated folder will be appended with a sequence number to distinguish it from other copies (e.g. /mount/replica-3).
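Three of the fields above are easier to follow with concrete values. The Python sketch below lists a few Quartz cron expressions (Quartz uses six fields, seconds first, with an optional seventh for the year), expands the %h and %d wildcards the way the Destination Folder Target description reads, and models first in, first out pruning. The function names are hypothetical, and the exact behaviour of the appliance may differ from this model.

```python
from datetime import datetime
from typing import List, Tuple

# Quartz field order: seconds minutes hours day-of-month month day-of-week [year]
EXAMPLE_SCHEDULES = {
    "0 0 2 * * ?": "every day at 02:00:00",
    "0 0/30 * * * ?": "every 30 minutes, on the hour and half hour",
    "0 15 23 ? * MON-FRI": "weekdays at 23:15:00",
}


def has_quartz_shape(expression: str) -> bool:
    """Rough shape check only: six fields, or seven with the optional year."""
    return len(expression.split()) in (6, 7)


def expand_target(template: str, master_host: str, when: datetime) -> str:
    """Expand %d (yyMMddHHmmss) and %h (master host) in a folder template."""
    stamp = when.strftime("%y%m%d%H%M%S")   # Python equivalent of yyMMddHHmmss
    # Replace %d first: the stamp is digits only, so it cannot introduce "%h".
    return template.replace("%d", stamp).replace("%h", master_host)


def prune_replicas(replicas: List[str], keep: int) -> Tuple[List[str], List[str]]:
    """Model of FIFO retention; replicas are ordered oldest first.

    Returns (kept, removed). A keep value of 0 disables pruning.
    """
    if keep <= 0 or len(replicas) <= keep:
        return list(replicas), []
    return replicas[-keep:], replicas[:-keep]


if __name__ == "__main__":
    for expression, meaning in EXAMPLE_SCHEDULES.items():
        print(f"{expression!r:28} {meaning}")
    print(expand_target("backup-%h-%d", "mastername", datetime(2015, 3, 9, 14, 5, 7)))
    print(prune_replicas(["replica-1", "replica-2", "replica-3"], 2))
```

Note that a five-field expression in the classic Unix cron style, such as 0 2 * * *, does not fit the Quartz format because it lacks the leading seconds field.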