When running OpenDedupe at scale, disk, CPU, and memory requirements need to be considered to size the deployment appropriately.
Data Stored on Disk
Cloud Volumes – SDFS stores file metadata, a local hashtable, and a cache of unique blocks on local disk.
Local Volumes – SDFS stores file metadata, a local hashtable, and all unique blocks on local disk.
Data Types:
File MetaData – Information about files and folders stored on OpenDedupe volumes. This data is also stored in the cloud for DR purposes when using cloud storage. File MetaData represents .21% of the non-deduplicated size of the files stored.
HashTable – The hashtable is the lookup table used to identify whether incoming data is unique. For object storage backed volumes, the hashtable is stored on local disk and in the cloud. For local volumes, the hashtable is stored on local disk only. The hashtable is .4% of the unique storage size.
Local Cache – For object storage backed volumes, active data is cached locally. The local cache stores compressed, deduplicated blocks only. The local cache size is set to 10GB by default but can be set to any capacity required, with a minimum of 1GB. The local cache improves restore performance and accelerates backup performance.
Local Unique Data Store – OpenDedupe stores all unique blocks locally for volumes not backed by object storage. For object storage backed volumes this is not used. Local storage size will depend on the data being backed up and the retention period, but typically represents 100% of the front end data for a 60 day retention. OpenDedupe uses a variable block deduplication method similar to DataDomain, so its sizing requirements will be in line with DataDomain's.
Storage Performance:
Minimum local disk storage performance:
- 2000 random read IOPS
- 2400 random write IOPS
- 180 MB/s of streaming reads
- 120 MB/s of streaming writes
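A minimal sketch of how measured disk numbers could be compared against the minimums above. The dictionary keys and the `disk_shortfalls` helper are hypothetical names chosen for illustration; the measured values are assumed to come from a benchmark tool such as fio run against the volume's local disk.

```python
# Minimum local disk performance, taken from the list above.
MINIMUMS = {
    "random_read_iops": 2000,
    "random_write_iops": 2400,
    "stream_read_mb_s": 180,
    "stream_write_mb_s": 120,
}


def disk_shortfalls(measured):
    """Return {metric: (measured, minimum)} for every metric below its minimum.

    A metric missing from `measured` is treated as 0, so it is reported.
    """
    return {
        name: (measured.get(name, 0), minimum)
        for name, minimum in MINIMUMS.items()
        if measured.get(name, 0) < minimum
    }


# Example benchmark results (made-up numbers for illustration only).
example = {
    "random_read_iops": 3500,
    "random_write_iops": 1000,
    "stream_read_mb_s": 400,
    "stream_write_mb_s": 250,
}
print(disk_shortfalls(example))
```

An empty result means the disk meets all four minimums.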
Supported Filesystems:
- VXFS
- XFS
- EXT4
- NTFS (Windows Only)
Storage Requirements:
The following percentages should be used to calculate local storage requirements for Object Backed dedupe Volumes:
- MetaData: .21% of Non-Deduped Data Stored
- Local Cache: 10GB by default
- HashTable: .4% of Deduped Data
An example for 100TB of deduped data with an 8:1 dedupe rate would be as follows:
- Logical Data Stored on Disk = 8x100TB = 800TB
- Local Cache = 10GB
- Unique Data Stored in the Object Store = 100TB
- MetaData
  - .21% x Logical Data Stored on Disk = MetaData Size
  - .0021 x 800TB = 1.68TB
- HashTable
  - .4% x Unique Storage = HashTable Size
  - .004 x 100TB = 400GB
- Total Volume Storage Requirements
  - Local Cache + MetaData + HashTable
  - 10GB + 1.68TB + 400GB = 2.09TB
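The arithmetic for an object storage backed volume can be sketched as a small function. This is an illustration, not part of OpenDedupe: the function name and parameters are hypothetical, decimal units (1TB = 1000GB) are assumed because that is what the worked example uses, and the hashtable ratio defaults to 0.4% because that is the ratio that reproduces the 400GB figure.

```python
GB_PER_TB = 1000  # decimal units, matching the worked example


def object_volume_local_storage(unique_tb, dedupe_ratio, cache_gb=10.0,
                                metadata_ratio=0.0021, hashtable_ratio=0.004):
    """Estimate local disk needs (in TB) for an object storage backed volume."""
    logical_tb = unique_tb * dedupe_ratio        # non-deduplicated size
    metadata_tb = logical_tb * metadata_ratio    # .21% of logical data
    hashtable_tb = unique_tb * hashtable_ratio   # share of unique data
    cache_tb = cache_gb / GB_PER_TB              # local cache, 10GB by default
    return {
        "logical_tb": logical_tb,
        "metadata_tb": metadata_tb,
        "hashtable_tb": hashtable_tb,
        "cache_tb": cache_tb,
        "total_tb": cache_tb + metadata_tb + hashtable_tb,
    }


# 100TB of unique data at an 8:1 dedupe rate, as in the example above.
sizing = object_volume_local_storage(unique_tb=100, dedupe_ratio=8)
print(round(sizing["total_tb"], 2))  # 2.09
```

Unique blocks themselves are not counted because they live in the object store; only the cache, metadata, and hashtable occupy local disk.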
The following percentages should be used to calculate local storage requirements for local dedupe Volumes:
- MetaData: .21% of Non-Deduped Data Stored
- Local Cache: 10GB by default
- HashTable: .4% of Deduped Data
- Unique Data: 100% of Deduped Data
An example for 100TB of deduped data with an 8:1 dedupe rate would be as follows:
- Logical Data Stored on Disk = 8x100TB = 800TB
- Unique Data Stored on Disk = 100TB
- MetaData
  - .21% x Logical Data Stored on Disk = MetaData Size
  - .0021 x 800TB = 1.68TB
- HashTable
  - .4% x Unique Storage = HashTable Size
  - .004 x 100TB = 400GB
- Total Volume Storage Requirements
  - Unique + MetaData + HashTable
  - 100TB + 1.68TB + 400GB = 102.08TB
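For local volumes the same calculation can also be run in reverse, to estimate how much unique data fits on a disk of a given size. The sketch below is illustrative only: both function names are hypothetical, decimal units are assumed, and the 0.4% hashtable ratio is the one that reproduces the 400GB figure in the example.

```python
def local_volume_storage_tb(unique_tb, dedupe_ratio,
                            metadata_ratio=0.0021, hashtable_ratio=0.004):
    """Total local disk (TB) for a local volume: unique + metadata + hashtable."""
    logical_tb = unique_tb * dedupe_ratio
    return unique_tb + logical_tb * metadata_ratio + unique_tb * hashtable_ratio


def max_unique_tb(disk_tb, dedupe_ratio,
                  metadata_ratio=0.0021, hashtable_ratio=0.004):
    """Inverse of the above: unique data (TB) that fits on `disk_tb` of disk."""
    # Every TB of unique data costs 1TB of blocks plus its metadata and
    # hashtable overhead.
    disk_per_unique_tb = 1 + dedupe_ratio * metadata_ratio + hashtable_ratio
    return disk_tb / disk_per_unique_tb


print(round(local_volume_storage_tb(100, 8), 2))  # 102.08
print(round(max_unique_tb(102.08, 8), 2))         # 100.0
```

The inverse form is useful when the disk is already purchased and the question is how much backup data it can hold at an expected dedupe rate.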
Memory Sizing:
Memory for OpenDedupe is primarily used for internal simplified lookup tables (bloom filters) that indicate, with some likelihood, whether a hash is already stored. These data structures take about 256MB per TB of data stored. 1GB of additional base memory is required for other uses.
In addition to the memory used by OpenDedupe, you will want memory available for the filesystem cache so that the most active parts of the lookup hashtable are cached in RAM. For a volume less than 1TB you will need an additional 1GB of RAM. For a volume less than 100TB you will need an additional 8GB of RAM. For a volume over 100TB you will need an additional 16GB of RAM.
An example for 100TB of deduped data:
- Hash Table Memory
  - 256MB per 1TB of Storage
  - 256MB x 100TB = 25.6GB
- 1GB of base memory
- 8GB of Free RAM for Disk Cache
- Total = 25.6 + 1 + 8 = 34.6GB of RAM
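The memory rule can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, decimal units (1GB = 1000MB) are used to match the 25.6GB figure, and the 8GB filesystem cache tier is read as covering volumes up to and including 100TB, which is what the worked example implies.

```python
def sdfs_memory_gb(stored_tb, lookup_mb_per_tb=256, base_gb=1.0):
    """Estimate RAM (GB) for a volume holding `stored_tb` of deduped data."""
    # Bloom filter / lookup table memory: about 256MB per TB stored.
    lookup_gb = stored_tb * lookup_mb_per_tb / 1000

    # Free RAM for the filesystem cache. Boundaries are an assumption:
    # exactly 100TB is placed in the 8GB tier, as in the example above.
    if stored_tb < 1:
        fs_cache_gb = 1
    elif stored_tb <= 100:
        fs_cache_gb = 8
    else:
        fs_cache_gb = 16

    return lookup_gb + base_gb + fs_cache_gb


print(round(sdfs_memory_gb(100), 1))  # 34.6
```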
CPU Sizing:
As long as the disk meets the minimum throughput and IOPS requirements, the primary limiter for OpenDedupe performance at higher dedupe rates will be CPU. At lower dedupe rates, volumes will be limited by the speed of the underlying disk.
For a single 16 core CPU, SDFS will perform at:
- 2GB/s for 2% Unique Data
- Speed of local disk for 100% unique data. Using minimum requirements this would equal 120MB/s.