Recently I noticed that our primary mail store was becoming alarmingly large and running out of disk space. Originally 200GB seemed ample, but fast forward a couple of years and it's vastly inadequate, with 92% of the available disk space consumed and growing rapidly. In fact, if it wasn't for the GFC I'd dare say we'd have well exceeded it by now.
Since the mail store is an iSCSI volume mounted as a physical disk (/dev/sdc), increasing the volume size was easy: I just allocated more space to it at the SAN. That alone doesn't do much for us though, as the operating system doesn't know to use the additional space, as we can see from;
Disk /dev/sdc: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 26108 209712478+ 83 Linux
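A side note here: in my case the kernel had already picked up the bigger disk (note the 429.4 GB above), but if yours is still reporting the old size after you've grown the LUN, the iSCSI device probably just needs a rescan. With open-iscsi something along these lines normally does the trick, though the exact incantation depends on your initiator so treat it as a pointer rather than gospel;
iscsiadm -m session --rescan
or by poking the SCSI layer directly;
echo 1 > /sys/block/sdc/device/rescan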
We need to grow our /dev/sdc1 partition to use these new disk cylinders. But before you start messing with any partition or file system, the golden rule is to back up, and when it comes to mission-critical data not just once or in the same format. A mistake here is likely to land you in the shit at the least, and maybe get you fired or worse.
So the next step in all this is to do a couple of FULL backups. The first I did was from within Zimbra itself; this will not only compress the mail store but also grab all the LDAP info etc. as well. This backup would let you deploy to another server instance if things went really bad;
zmbackup -f -s $server.domain.net.au -a all -t /mnt/SANBackup/zimbra.backup/ -z
where server.domain.net.au is the fully qualified domain name of your mail server. The syntax is simple: -f = full, -s = server, -a = accounts (all of them in this case), -t = target path and -z = compress.
Now comes a long waiting game; as you can see from this query, the backup process took some 8+ hours to complete;
zmbackupquery -lb full-20091204.005207.351 -t /mnt/SANBackup/zimbra.backup/
Label: full-20091204.005207.351
Type: full
Status: completed
Started: Fri, 2009/12/04 10:52:07.351 EST
Ended: Fri, 2009/12/04 19:17:24.923 EST
Redo log sequence range: 3102 .. 3106
Number of accounts: 171
But of course, since I like to cover myself, I didn't stop here; I also decided to create an rsync copy of the mail store;
/sbin/rsync -avpHK /mnt/home/zimbra/ /mnt/SANBackup/MailRsync/
Again, go away for a few hours, then come back and hopefully you'll have this;
du -sh /mnt/home/zimbra/
167G /mnt/home/zimbra/
du -sh /mnt/SANBackup/MailRsync/
167G /mnt/SANBackup/MailRsync/
Awesome, we have two exact copies. The rsync syntax I use is: -a = archive, -v = verbose, -p = preserve permissions, -H = preserve hard links and -K = keep directory links, i.e. treat a symlinked directory as a directory.
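If you're as paranoid as me, you can run the same rsync once more right before the surgery, after Zimbra has been stopped; adding --delete makes the second pass copy only the changes and drop anything that's gone from the source, leaving an exact mirror. Purely an optional extra, not something I did above;
/sbin/rsync -avpHK --delete /mnt/home/zimbra/ /mnt/SANBackup/MailRsync/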
Now that I felt we had a point to come back to if all went to hell, it's time to attack that partition. There are a couple of ways people will tell you to do this, and the most common would favour a Knoppix CD with gparted or similar. In a home environment that would probably do the job and be less hands-on, but in this production environment it's not an option, and besides, I'm 200 km from the server looking at a beach.
We'll have to do it the console way, and besides, it's quicker. There is a great GNU tool called parted, which is what gparted is a front end for, but in my experience it has issues with ext3 and journals; in fact, every time I've used it with ext3 I've gotten the error;
Error: Filesystem has incompatible feature enabled
To get around this you need to remove the offending file system features, and we'll do that later anyway, but for now I just suggest people forget about parted and use good ole fdisk.
Of course I can hear the cries of "but fdisk can't resize, only create and destroy" and you'd be right, but we can use this to our advantage. See, we don't want to alter the starting cylinder, just expand the partition to use more cylinders than before. Before we start you need to run;
/sbin/fdisk -l
Disk /dev/sdc: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 26108 209712478+ 83 Linux
The reason we do this is to find our starting cylinder. In my case it's easy as I only have the one partition, so my starting cylinder is 1, but if this partition was in the middle of the disk, say, you'd need to carefully write down this starting point, because get it wrong later and you're in for some pain as you'll damage your data. From that output we can also see that my partition ends at cylinder 26108, but we want it to continue on till the end of the available space at 52216.
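For a bit of cheap insurance at this point (not strictly needed with a single partition, but worth knowing) you can also dump the entire partition table to a file with sfdisk before touching anything; the file name is whatever suits you;
/sbin/sfdisk -d /dev/sdc > /root/sdc-partition-table.txt
and if the worst happens the table goes back exactly as it was with;
/sbin/sfdisk /dev/sdc < /root/sdc-partition-table.txt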
Here comes the fun part 🙂
First we need to stop all services in zimbra using the mail store;
zmcontrol stop
Stopping stats…Done
Stopping mta…Done
Stopping spell…Done
Stopping snmp…Done
Stopping archiving…Done
Stopping antivirus…Done
Stopping antispam…Done
Stopping imapproxy…Done
Stopping mailbox…Done
Stopping logger…Done
Stopping ldap…Done
You should just confirm it really is all stopped with;
zmcontrol status
Unable to determine enabled services from ldap.
Enabled services read from cache. Service list may be inaccurate.
Host grange.langs.net.au
antispam Stopped
zmmtaconfigctl is not running
zmamavisdctl is not running
antivirus Stopped
zmmtaconfigctl is not running
zmamavisdctl is not running
zmclamdctl is not running
ldap Stopped
logger Stopped
logmysql.server is not running
zmlogswatchctl is not running
mailbox Stopped
zmmtaconfig is not running.
zmmtaconfigctl is not running
mysql.server is not running
zmconvertctl is not running
mailboxd is not running.
zmmailboxdctl is not running
mta Stopped
zmmtaconfigctl is not running
postfix is not running
zmsaslauthdctl is not running
snmp Stopped
zmswatch is not running.
spell Stopped
zmapachectl is not running
stats Stopped
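Another quick sanity check is simply to list anything still running as the zimbra user, which is the account the whole suite runs under on a standard install;
ps -u zimbra
An empty result is what you want to see.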
and because I don't have a lot of faith, I also run a script I came up with;
cat /opt/packages/killuser
#!/bin/bash
# Kill every process owned by the given user - handy for making sure
# nothing is still holding files open on the mail store.
USER=$1
MYNAME=`basename $0`

if [ -z "$USER" ]
then
    echo "Usage: $MYNAME username" >&2
    exit 1
elif ! grep "^$USER:" /etc/passwd >/dev/null
then
    echo "User $USER does not exist!" >&2
    exit 2
fi

# Keep sweeping until nothing owned by the user is left running
while [ `ps -ef | grep "^$USER" | wc -l` -gt 0 ]
do
    PIDS=`ps -ef | grep "^$USER" | awk '{print $2}'`
    echo "Killing" `echo $PIDS | wc -w` "processes for user $USER."
    for PID in $PIDS
    do
        kill -9 $PID >/dev/null 2>&1
    done
done
echo "User $USER has 0 processes still running."
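In my case it gets pointed at the zimbra user (adjust if your install runs as something else);
/opt/packages/killuser zimbra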
This will clean up any leftover processes. The second part of this exercise is unmounting the file system the mail store uses;
umount /mnt/home
This has effectively parked our ext3 file system, making it ready for manipulation.
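As a general tip, if umount ever complains the target is busy, something still has files open on the mount; fuser will name the offenders so you can deal with them and try again (I didn't need it here);
/sbin/fuser -vm /mnt/home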
We need to ensure the file system is in good order before we begin;
/sbin/fsck -n /dev/sdc1
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/sdc1: clean, 504961/26214400 files, 44472606/52428119 blocks
Okay, looking good. Now we need to remove those features that give parted a hard time, basically turning our ext3 file system into an ext2 file system. To do this we remove the journal;
/sbin/tune2fs -O ^has_journal /dev/sdc1
tune2fs 1.35 (28-Feb-2004)
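If you want to convince yourself the journal really is gone (and later, that it came back), tune2fs will list the current feature flags;
/sbin/tune2fs -l /dev/sdc1 | grep -i features
has_journal should be missing from that list at this point.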
Now it's fdisk time. We're going to delete the partition, but don't be alarmed: remember, we're not changing that starting cylinder, only expanding the cylinder count. This will be non-destructive if you have calmly written down that starting cylinder, and besides, we took all those awesome backups;
[root@grange matthewd]# /sbin/fdisk /dev/sdc
The number of cylinders for this disk is set to 52216.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sdc: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 26108 209712478+ 83 Linux
Command (m for help): d
Selected partition 1
Command (m for help): p
Disk /dev/sdc: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
At this point we’ve now removed the partition and it’s time to create the new one encompassing all the new space;
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-52216, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-52216, default 52216): 52216
Command (m for help): p
Disk /dev/sdc: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 52216 419424988+ 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
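The "Calling ioctl()" line tells us the kernel re-read the table, which it will happily do with the file system unmounted. It doesn't hurt to double-check it really does see the new size before moving on;
cat /proc/partitions
If for some reason it hadn't re-read (say the device was still busy), partprobe /dev/sdc or a reboot would sort it, but that shouldn't be needed here.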
Right, now it's time for some husbandry on our new partition and file system;
/sbin/e2fsck -f /dev/sdc1
e2fsck 1.35 (28-Feb-2004)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdc1: 504961/26214400 files (25.1% non-contiguous), 44464404/52428119 blocks
Note this part takes some time, so don't be alarmed, your data is safe; for me this was about an hour. Next do;
/sbin/resize2fs /dev/sdc1
resize2fs 1.35 (28-Feb-2004)
Resizing the filesystem on /dev/sdc1 to 104856247 (4k) blocks.
The filesystem on /dev/sdc1 is now 104856247 blocks long.
This is actually the step that grows your file system to match the new partition size. We need to check it one last time with;
/sbin/fsck -n /dev/sdc1
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/sdc1: clean, 504961/52428800 files, 45289879/104856247 blocks
Remember how we removed those file system features? Time to put them back, getting our ext3 back in order;
/sbin/tune2fs -j /dev/sdc1
tune2fs 1.35 (28-Feb-2004)
Creating journal inode: mount done
This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
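As that output says, tune2fs -c and -i control the automatic check intervals; a forced fsck of a 400GB mail store at boot can be an unpleasant surprise, so if the defaults don't suit you they can be relaxed or disabled. The values below are just an example (0 turns the automatic checks off entirely), so pick what you're comfortable with;
/sbin/tune2fs -c 0 -i 0 /dev/sdc1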
Now let's remount our file system and see a) if the new size is there, and b) that yes, indeed, we still have data;
mount /mnt/home/
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md0 178G 12G 157G 7% /
none 2.0G 0 2.0G 0% /dev/shm
pluto:/sanbackup 1.8T 871G 870G 51% /mnt/SANBackup
/dev/sdc1 394G 167G 212G 45% /mnt/home
Woot, it's mounted and our extra space is there. Now how about our data;
# ls -la /mnt/home
total 32
drwxr-xr-x 4 root root 4096 May 8 2008 .
drwxr-xr-x 4 root root 4096 May 8 2008 ..
drwx------ 2 root root 16384 Aug 1 2006 lost+found
drwxr-xr-x 4 zimbra zimbra 4096 Sep 13 2008 zimbra
# ls -la /mnt/home/zimbra
total 16
drwxr-xr-x 4 zimbra zimbra 4096 Sep 13 2008 .
drwxr-xr-x 4 root root 4096 May 8 2008 ..
drwxr-xr-x 3 zimbra zimbra 4096 Aug 29 2006 index
drwxr-xr-x 4 zimbra zimbra 4096 Aug 29 2006 store
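And for one last warm fuzzy, the size on disk should still line up with the 167G we measured before all this started;
du -sh /mnt/home/zimbra/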
All seems in order, so let's go ahead and restart our mail server;
zmcontrol start
Starting ldap…Done.
Starting logger…Done.
Starting mailbox…Done.
Starting antispam…Done.
Starting antivirus…Done.
Starting snmp…Done.
Starting spell…Done.
Starting mta…Done.
Starting stats…Done.
zmcontrol status
antispam Running
antivirus Running
ldap Running
logger Running
mailbox Running
mta Running
snmp Running
spell Running
stats Running
There we have it: the mail system is back up, we no longer have to worry about the mail store running out of space, and we didn't lose anything, so we even get to keep our jobs. A warning however: do NOT try this on a Windows box or if you are faint of heart, and I can't stress enough that you must always have the backups to go with it.
I promise you the day you don’t have a decent backup is the day you will need it, that’s just Murphy.