

About ZFS

ZFS is a modern filesystem designed to be extremely flexible, powerful, and easy to use and administer. There are already many great articles that go into depth, which I will leave you to explore. Please see the References.

About Proxmox

Proxmox is a great set of packages to help you manage your virtual machines. It simplifies a lot of virtual machine management and can handle high availability, clustering and much more. Behind the scenes it uses KVM for full virtualization and OpenVZ containers for high-performance, lightweight OS-level virtualization.

Check the following for more detail:

https://www.proxmox.com/proxmox-ve/features

Notes

I came from ESXi and Hyper-V. I find that I keep adopting open source solutions, primarily to avoid vendor lock-in and to broaden my skill set.

Installing Proxmox

Install Proxmox onto your desired device. I used a spare USB flash drive. Most of the time, Proxmox will be running from memory so I don’t mind a slow install if it frees up a much faster disk for storage.

The only problem I came across is that Proxmox installs the GRUB bootloader onto the first available disk rather than the selected disk. Ideally it should not install a bootloader at all unless you ask it to, and then only where you ask it to. It wiped the existing bootloader in my MBR, so I had to chroot into that system and restore it. This bug has been reported. For now, if possible, I suggest temporarily disabling all drives in the BIOS except the one you want Proxmox installed onto.

Update Proxmox

http://pve.proxmox.com/wiki/Package_repositories

Follow the instructions from http://pve.proxmox.com/wiki/Package_repositories#Proxmox_VE_No-Subscription_Repository if you have not purchased a subscription from Proxmox.

Install ZFS on Linux

http://ispire.me/native-zfs-for-linux-on-proxmox/ is an excellent, to-the-point article that will get you going. You may also look at the official article at http://pve.proxmox.com/wiki/ZFS.

I chose to go with raidz across 4 spare ~400GB hard drives I had lying around. This gives me a good balance between redundancy and usable capacity. I also decided to leave the pool name as tank.

Note
  • Make sure you have the correct repositories enabled before you install the SPL and ZFS modules, as module versions can get out of sync with the running kernel (a quick check for this follows this list).
  • I strongly suggest you use the /dev/disk/by-id prefix for your devices, to avoid confusion if the devices get remapped (e.g. after a BIOS change). For example, in my case:
    zpool create -f tank raidz /dev/disk/by-id/ata-Hitachi_HDP725032GLA380_GEK033RG2Z160C /dev/disk/by-id/ata-SAMSUNG_HD321KJ_S0ZEJ1MPB79243 /dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ9AS701687 /dev/disk/by-id/ata-SAMSUNG_HD403LJ_S0NFJQSS200987
    
  • I left checksumming enabled (the default), so ZFS can detect corrupted data on read. Whether to keep it on is up to you.
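One quick sanity check after installing the modules (this assumes SPL/ZFS were installed as DKMS packages; if you used prebuilt modules, compare the package versions instead) is to compare the running kernel against what DKMS has built:

uname -r
dkms status | grep -Ei 'spl|zfs'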

Configuring ZFS on Linux

First, I made some changes to my pool to improve performance:

zfs set compression=on tank
zfs set primarycache=all tank
zfs set atime=off tank
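To double-check that these properties took effect (nothing here is specific to my setup other than the pool name), you can query them all at once:

zfs get compression,primarycache,atime tank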

Now that we have finished, we can check the status of our pool:

root@proxmox:~# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-Hitachi_HDP725032GLA380_GEK033RG2Z160C  ONLINE       0     0     0
            ata-SAMSUNG_HD321KJ_S0ZEJ1MPB79243          ONLINE       0     0     0
            ata-SAMSUNG_HD322HJ_S17AJ9AS701687          ONLINE       0     0     0
            ata-SAMSUNG_HD403LJ_S0NFJQSS200987          ONLINE       0     0     0

errors: No known data errors

It’s also a good idea to add a write and/or read cache for ZFS. Synchronous writes can be sped up by placing the ZFS Intent Log (ZIL) on a dedicated log device (often called a SLOG); reads are served by a second-level cache called the L2ARC. It is best to keep these on separate devices for both data integrity and performance. I only have one 120GB SSD available, and since most of my workload is reads I’ll dedicate it all to L2ARC and add a separate log device later when my budget allows.
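For reference, adding a dedicated log device later uses the same zpool add syntax as adding a cache; the device id below is only a placeholder for whatever SSD you end up buying:

zpool add tank log /dev/disk/by-id/ata-YOUR-SSD-HERE-part1   # placeholder device id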

Create a partition on the cache device using fdisk, parted or your preferred partitioning utility.


fdisk /dev/sdb
Command (m for help): p

Disk /dev/sdb: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00073527

Device Boot Start End Blocks Id System
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-234441647, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-234441647, default 234441647):
Using default value 234441647

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
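If you prefer a non-interactive tool, a roughly equivalent parted one-liner would look like the following (a sketch only; double-check the target device before running it):

parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%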

Then add the cache device to the previously created pool, like so:

zpool add tank cache /dev/disk/by-id/ata-OCZ-AGILITY3_OCZ-CP2JK78KJ8T96IIN-part1

To check that the L2ARC cache was successfully added, run zpool status again; you should see something similar to the following.

root@proxmox:~# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        tank                                             ONLINE       0     0     0
          raidz1-0                                       ONLINE       0     0     0
            ata-Hitachi_HDP725032GLA380_GEK033RG2Z160C   ONLINE       0     0     0
            ata-SAMSUNG_HD321KJ_S0ZEJ1MPB79243           ONLINE       0     0     0
            ata-SAMSUNG_HD322HJ_S17AJ9AS701687           ONLINE       0     0     0
            ata-SAMSUNG_HD403LJ_S0NFJQSS200987           ONLINE       0     0     0
        cache
          ata-OCZ-AGILITY3_OCZ-CP2JK78KJ8T96IIN-part1    ONLINE       0     0     0

errors: No known data errors
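You can also watch the cache device being exercised over time with zpool iostat; the -v flag breaks out per-device statistics and the trailing 5 repeats the report every five seconds:

zpool iostat -v tank 5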

Further configuration for Proxmox

So now we have a nice zpool, ready for some filesystems to hold all of our images, ISOs and containers. Before we start, we are going to create additional datasets in our pool for these different types of storage.

We will create the following filesystems (ISO, IMAGES, CONTAINERS, BACKUPS, STORAGE). Simply do the following to create them:

zfs create tank/STORAGE
zfs create tank/ISO
zfs create tank/IMAGES
zfs create tank/CONTAINERS
zfs create tank/BACKUPS

When you have finished, you will be able to see them by running zfs list.

root@proxmox:~# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              384K   878G  47.9K  /tank
tank/BACKUPS     41.9K   878G  41.9K  /tank/BACKUPS
tank/CONTAINERS  41.9K   878G  41.9K  /tank/CONTAINERS
tank/IMAGES      41.9K   878G  41.9K  /tank/IMAGES
tank/ISO         41.9K   878G  41.9K  /tank/ISO

Now that we have the filesystems created, we can add them to the storage pools in Proxmox. A quick check of mount shows that ZFS already mounts the filesystems for us.

root@proxmox:~# mount
tank on /tank type zfs (rw,noatime,xattr,noacl)
tank/ISO on /tank/ISO type zfs (rw,noatime,xattr,noacl)
tank/IMAGES on /tank/IMAGES type zfs (rw,noatime,xattr,noacl)
tank/CONTAINERS on /tank/CONTAINERS type zfs (rw,noatime,xattr,noacl)
tank/BACKUPS on /tank/BACKUPS type zfs (rw,noatime,xattr,noacl)

Now it’s time to add these mounts to Proxmox. Simply add them using the Proxmox UI. It will look similar to the following.

[Screenshot: the ZFS mount points added as directory storage in the Proxmox web UI]
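If you would rather use the shell, the same storage entries can be created with Proxmox’s pvesm tool (or by editing /etc/pve/storage.cfg). This is only a sketch; the storage IDs are arbitrary names I made up, and the content types should match what you actually plan to keep in each dataset:

pvesm add dir zfs-iso --path /tank/ISO --content iso
pvesm add dir zfs-images --path /tank/IMAGES --content images
pvesm add dir zfs-backups --path /tank/BACKUPS --content backup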

Sharing your ZFS datasets

To share these datasets on the network we will do it at the ZFS level rather than through the Proxmox UI. ZFS simplifies setting up and exporting shares, which I am a big fan of.

1) Install NFS daemon and Samba

apt-get install nfs-kernel-server samba samba-common-bin

2) Set up the NFS exports via ZFS

zfs set sharenfs=on tank/BACKUPS
zfs set sharenfs=on tank/IMAGES ...

Note that sharenfs is set on the dataset name (tank/BACKUPS), not the mount point (/tank/BACKUPS); using the mount point gives an ‘invalid dataset name’ error.
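If you want to restrict who can mount the exports, sharenfs can carry export options directly instead of a plain on; the subnet below is just an example that happens to match my 192.168.1.x network:

zfs set sharenfs="rw=@192.168.1.0/24" tank/STORAGE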

3) Update the NFS daemon to tolerate an empty export file
By default the nfs-kernel-server init script checks /etc/exports for exported filesystems and will not start the daemon if it finds none (ZFS manages its NFS exports itself, so that file stays empty). I opted to modify the init script so that it always starts; a dummy entry in /etc/exports also works, as sketched below.
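A minimal version of the dummy-entry workaround looks like this (the path and host are placeholders; the entry exists only so the init script sees a non-empty /etc/exports):

echo '/tank localhost(ro,no_subtree_check)' >> /etc/exports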

4) Start the NFS daemon

/etc/init.d/nfs-kernel-server start

5) Check on a client machine

sudo showmount -e 192.168.1.2

Note – You should specify NFS version 3 when you mount the NFS share, otherwise it may take a LONG time to mount.

e.g. for manually mounting at the shell

sudo mount -v -t nfs 192.168.1.2:/tank/STORAGE /media/ZFS/ -onfsvers=3

e.g. for mounting at boot via an /etc/fstab entry

192.168.1.2:/tank/STORAGE /media/ZFS nfs rsize=8192,wsize=8192,noatime,auto,rw,exec,nfsvers=3 0 0

6) Setup SAMBA/SMB shares

zfs set sharesmb=on tank/STORAGE
zfs set sharesmb=on tank/IMAGES ...
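As with NFS, these are dataset names rather than mount points. To confirm the property stuck, query it for the datasets you just shared:

zfs get sharesmb tank/STORAGE tank/IMAGES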

7) Restart the Samba daemon

root@proxmox:~# service samba restart
Stopping Samba daemons: nmbd smbd.
Starting Samba daemons: nmbd smbd.

8) Check on a client

smbclient -L 192.168.1.2
Domain=[WORKGROUP] OS=[Unix] Server=[Samba 3.6.6]

        Sharename        Type    Comment
        ---------        ----    -------
        print$           Disk    Printer Drivers
        IPC$             IPC     IPC Service (proxmox server)
        tank_CONTAINERS  Disk    Comment: /tank/CONTAINERS
        tank_STORAGE     Disk    Comment: /tank/STORAGE
        tank_ISO         Disk    Comment: /tank/ISO
        tank_BACKUPS     Disk    Comment: /tank/BACKUPS
        tank_IMAGES      Disk    Comment: /tank/IMAGES
        tank             Disk    Comment: /tank
Domain=[WORKGROUP] OS=[Unix] Server=[Samba 3.6.6]

        Server           Comment
        ---------        -------
        PROXMOX          proxmox server

        Workgroup        Master
        ---------        -------
        WORKGROUP        PROXMOX

Notes
  • Datasets in a pool all share the pool’s resources: they draw from the same storage capacity and inherit the properties of their parent unless you explicitly constrain them with quotas or override settings on a per-dataset basis (see the example below).
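For example, if you wanted to stop backups from swallowing the whole pool, you could cap that dataset with a quota and guarantee space to another with a reservation; the sizes here are made up purely for illustration:

zfs set quota=200G tank/BACKUPS
zfs set reservation=50G tank/IMAGES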

Summary

I’m very happy with Proxmox, ZFS and this whole inexpensive setup in general. I can comfortably run numerous virtual machines and serve files directly should I need to. I’m serving files at a constant speed of greater than 100MB per second over a single Gigabit connection. I’d like to see what I could achieve with a higher budget and aggregated 10Gb links.

I strongly suggest that you don’t skimp on RAM if you are setting up something like this: ZFS loves lots of RAM, and you also have to account for the memory of guest VMs.

Soon I will be benchmarking this setup directly on the server to find out what sort of performance can be achieved with lowly SATA drives, the L2ARC and 32GB of RAM.

Notes
  • Don’t forget to move SWAP to an appropriate drive. By default I had swap on the slow flash disk. You could create a zvol such as tank/SWAP:
    zfs create -V 32G tank/SWAP
    mkswap /dev/zvol/tank/SWAP
    swapon /dev/zvol/tank/SWAP
    free
    
    
    total used free shared buffers cached
    Mem: 32634688 16905980 15728708 0 60260 1985236
    -/+ buffers/cache: 14860484 17774204
    Swap: 33554424 0 33554424
    
  • Don’t forget to also put /tmp in an appropriate place. By default it also lived on the slow flash disk. I suggest you use something like tmpfs (see my /etc/fstab below).
    cat /etc/fstab
    
    
    # /dev/pve/root / ext3 errors=remount-ro 0 1
    /dev/pve/data /var/lib/vz ext3 defaults 0 1
    UUID=b5bd555f-bac8-473f-9357-74a79d015bba /boot ext3 defaults 0 1
    /dev/zvol/tank/SWAP none swap sw 0 0
    proc /proc proc defaults 0 0
    tmpfs /tmp tmpfs nodev,nosuid 0 0
    