Testbox Imaging (Backup / Restore)
==================================


Introduction
------------

This document explores deploying a very simple drive imaging solution to help
avoid needing to manually reinstall testboxes when a disk goes bust or the OS
install seems to be corrupted.


Definitions / Glossary
======================

See AutomaticTestingRevamp.txt.


Objectives
==========

- Off site, no admin interaction (no need for ILOM or similar).
- OS independent.
- Space and bandwidth efficient.
- As automatic as possible.
- Logging.


Overview of the Solution
========================

Here is a brief summary:

- Always boot testboxes via PXE using PXELINUX.
- Default configuration is local boot (hard disk / SSD).
- Restore/backup action triggered by machine specific PXE config.
- Boots special debian maintenance install off NFS.
- A maintenance service (systemd style) does the work.
- The service reads action from TFTP location and performs it.
- When done the service removes the TFTP machine specific config
  and reboots the system.

Maintenance actions are:

- backup
- backup-again
- restore
- refresh-info
- rescue

A possible modifier could indicate a subset of disks on testboxes that have
other OSes installed. Support for partition level backup/restore is not
explored here.


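
The maintenance actions form a closed set, so a wrapper can reject a mistyped
action before any PXE configuration is written. A minimal sketch, assuming a
hypothetical helper named ``is_valid_action`` (it is not part of the scripts
that accompany this document)::

    # Hypothetical helper: succeeds only for the maintenance actions
    # listed in the overview.
    is_valid_action() {
        case "$1" in
            backup|backup-again|restore|refresh-info|rescue) return 0 ;;
            *) return 1 ;;
        esac
    }

A caller could use it as ``is_valid_action "$action" || exit 1``.
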
How to use
----------

To perform one of the above maintenance actions on a testbox, run the
``testbox-pxe-conf.sh`` script::

    /mnt/testbox-tftp/pxelinux.cfg/testbox-pxe-conf.sh 10.165.98.220 rescue

Then trigger a reboot. The box will then boot the NFS rooted debian image and
execute the maintenance action. On success, it will remove the testbox hex-IP
config file and reboot again.


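
The hex-IP config file name follows the PXELINUX convention of writing each
octet of the IPv4 address as two uppercase hexadecimal digits. A minimal sketch
of that conversion, assuming a hypothetical helper named ``ip_to_pxe_hex`` (how
``testbox-pxe-conf.sh`` actually derives the name is not shown here)::

    # Hypothetical helper: print the PXELINUX style hexadecimal form of a
    # dotted decimal IPv4 address, e.g. for naming a per-machine config file.
    ip_to_pxe_hex() {
        (
            # Split the argument on dots; the subshell keeps IFS local.
            IFS=.
            set -- $1
            [ $# -eq 4 ] || exit 1
            printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
        )
    }

For example, ``ip_to_pxe_hex 10.0.2.15`` prints ``0A00020F``.
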
Storage Server
==============

The storage server will have three areas used here. Using NFS for all three
avoids extra work getting CIFS sharing right too (NFS is already a pain).

1. /export/testbox-tftp    - TFTP config area. Read-write.
2. /export/testbox-backup  - Images and logs. Read-write.
3. /export/testbox-nfsroot - Custom debian. Read-only, no root squash.


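
As an illustration only, the three areas could be exported like this on a
server running the Linux kernel NFS server. Both the server type and the
``10.42.0.0/16`` client network are assumptions; other NFS servers use a
different syntax for the same settings::

    # /etc/exports (assumed Linux NFS server, assumed client network)
    /export/testbox-tftp     10.42.0.0/16(rw,sync,root_squash,no_subtree_check)
    /export/testbox-backup   10.42.0.0/16(rw,sync,root_squash,no_subtree_check)
    /export/testbox-nfsroot  10.42.0.0/16(ro,sync,no_root_squash,no_subtree_check)

The nfsroot export needs to be switched to ``rw`` temporarily while the root
file system is being copied onto it.
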
TFTP (/export/testbox-tftp)
============================

The testbox-tftp share needs to be writable; root squashing is okay.

We need files from both PXELINUX and SYSLINUX to make this work now. On a
debian system, the ``pxelinux`` and ``syslinux`` packages need to be
installed. We actually do this further down when setting up the nfsroot, so
it's possible to get them from there by postponing this step a little. On
debian 8.6.0 the PXELINUX files are found in ``/usr/lib/PXELINUX`` and the
SYSLINUX ones in ``/usr/lib/syslinux``.

The initial PXE image as well as the associated modules come in three variants:
BIOS, 32-bit EFI and 64-bit EFI. We'll only need the BIOS one for now.
Perform the following copy operations::

    cp /usr/lib/PXELINUX/pxelinux.0 /mnt/testbox-tftp/
    cp /usr/lib/syslinux/modules/*/ldlinux.* /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/bios /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/efi32 /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/efi64 /mnt/testbox-tftp/


For simplicity, all the testboxes boot using good old fashioned BIOS, no EFI.
However, it doesn't really hurt to be prepared.

The PXELINUX related files go in the root of the testbox-tftp share. (As
mentioned further down, these can be installed on a debian system by running
``apt-get install pxelinux syslinux``.) We need the ``*pxelinux.0`` files
typically found in ``/usr/lib/PXELINUX/`` on debian systems (recent ones
anyway). It is possible we may need one or more of the modules [1]_ that
ship with PXELINUX/SYSLINUX, so do copy ``/usr/lib/syslinux/modules`` to
``testbox-tftp/modules`` as well.


The directory layout related to the configuration files is dictated by the
PXELINUX configuration file searching algorithm [2]_. Create a subdirectory
``pxelinux.cfg/`` under ``testbox-tftp`` and create the world readable file
``default`` with the following content::

    PATH bios
    DEFAULT local-boot
    LABEL local-boot
      LOCALBOOT

This makes booting from the local disk the default behavior.

Copy the ``testbox-pxe-conf.sh`` script file found in the same directory as
this document to ``/mnt/testbox-tftp/pxelinux.cfg/``. Edit the copy to correct
the IP addresses near the top, as well as any linux, TFTP and PXE details near
the bottom of the file. This script will generate the PXE configuration file
when performing maintenance on a testbox.


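
After the copy operations it can be worth checking that the files the BIOS boot
path relies on actually ended up on the share. A minimal sketch, assuming a
hypothetical helper named ``missing_tftp_files``; the file list is derived from
the steps in this section and is not exhaustive::

    # Hypothetical helper: print each expected file that is missing below
    # the given TFTP root. Prints nothing when everything is in place.
    missing_tftp_files() {
        tftp_root=$1
        for f in pxelinux.0 ldlinux.c32 bios/ldlinux.c32 pxelinux.cfg/default; do
            [ -e "$tftp_root/$f" ] || echo "$f"
        done
    }

Running ``missing_tftp_files /mnt/testbox-tftp`` on a complete setup should
produce no output.
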
Images and logs (/export/testbox-backup)
=========================================

The testbox-backup share needs to be writable; root squashing is okay.

In the root there must be a file ``testbox-backup`` so we can easily tell
whether we've actually mounted the share or are just staring at an empty mount
point directory.

The ``testbox-maintenance.sh`` script maintains a global log in the root
directory that's called ``maintenance.log``. Errors will be logged there as
well as a ping and the action.

We use a directory layout based on dotted decimal IP addresses here, so for a
server with the IP 10.40.41.42 all its files will be under ``10.40.41.42/``:

``<hostname>``
    The name of the testbox (empty file). Helps with finding a testbox by name.

``testbox-info.txt``
    Information about the testbox. Starting off with the name, decimal IP,
    PXELINUX style hexadecimal IP, and more.

``maintenance.log``
    Maintenance log file recording what the maintenance service does.

``disk-devices.lst``
    Optional list of disk devices to consider backing up or restoring. This is
    intended for testboxes with additional disks that are used for other
    purposes and should not be touched.

``sda.raw.gz``
    The gzipped raw copy of the sda device of the testbox.

``sd[bcdefgh].raw.gz``
    The gzipped raw copies of sdb, sdc, sdd, sde, sdf, sdg, sdh, etc. if any of
    them exist and are disks/SSDs.


Note! If it turns out we can be certain to get a valid host name, we might just
switch to use the hostname as the directory name instead of the IP.


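
The marker file check and the image naming scheme described in this section can
be sketched as follows. This is an illustration under stated assumptions, not
the actual ``testbox-maintenance.sh``: the function names are hypothetical, and
the real script may use other tools for the copy::

    # Hypothetical helper: succeeds only if the marker file is visible,
    # i.e. the share is really mounted at the given mount point.
    backup_share_mounted() {
        [ -f "$1/testbox-backup" ]
    }

    # Hypothetical helper: write a gzipped raw image of a device.
    # $1 = share mount point, $2 = dotted decimal IP, $3 = device path
    backup_disk() {
        backup_share_mounted "$1" || { echo "backup share not mounted" >&2; return 1; }
        mkdir -p "$1/$2" || return 1
        gzip -c < "$3" > "$1/$2/$(basename "$3").raw.gz"
    }

    # Hypothetical helper: write a stored image back to the device.
    restore_disk() {
        backup_share_mounted "$1" || { echo "backup share not mounted" >&2; return 1; }
        gzip -dc < "$1/$2/$(basename "$3").raw.gz" > "$3"
    }

With these, ``backup_disk /mnt/testbox-backup 10.40.41.42 /dev/sda`` would
produce ``/mnt/testbox-backup/10.40.41.42/sda.raw.gz``.
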
Debian NFS root (/export/testbox-nfsroot)
==========================================

The testbox-nfsroot share should be read-only and must **not** have root
squashing enabled. Also, make sure setting the set-uid-bit is allowed by the
server, or ``su`` and ``sudo`` won't work.

There are several ways of creating a debian nfsroot, but since we've got a
tool like VirtualBox around we've just installed it in a VM, prepared it,
and copied it onto the NFS server share.

As of writing debian 8.6.0 is current, so a minimal 64-bit install of it was
done in a VM. After installation the following modifications were made:

- ``apt-get install pxelinux syslinux initramfs-tools zip gddrescue sudo joe``
  and optionally ``apt-get install smbclient cifs-utils``.

- ``/etc/default/grub`` was modified to set ``GRUB_CMDLINE_LINUX_DEFAULT`` to
  ``""`` instead of ``"quiet"``. This allows us to see messages during boot
  and perhaps spot why something doesn't work on a testbox. Regenerate the
  grub configuration file by running ``update-grub`` afterwards.

- ``/etc/sudoers`` was modified to allow the ``vbox`` user to use sudo without
  requiring any password.

- Create the directory ``/etc/systemd/system/getty@tty1.service.d`` and create
  the file ``noclear.conf`` in it with the following content::

    [Service]
    TTYVTDisallocate=no

  This stops getty from clearing VT1 and lets us see the tail of the boot up
  messages, which includes messages from the testbox-maintenance service.

- Mount the testbox-nfsroot under ``/mnt/`` with write privileges (the write
  privileges are temporary, so don't forget to remove them later on)::

    mount -t nfs myserver.com:/export/testbox-nfsroot /mnt/

  Note! Adding ``-o nfsvers=3`` may help with some NFSv4 servers.

- Copy the debian root and dev file system onto nfsroot. If you have ssh
  access to the NFS server, the quickest way to do it is to use ``tar``::

    tar -cz --one-file-system -f /mnt/testbox-maintenance-nfsroot.tar.gz . dev/

  The archive then needs to be unpacked into the share on the server. An
  alternative is ``cp -ax . /mnt/. && cp -ax dev/. /mnt/dev/.`` but this
  is quite a bit slower, obviously.

- Edit ``/etc/ssh/sshd_config`` setting ``PermitRootLogin`` to ``yes`` so we
  can ssh in as root later on.

- chroot into the nfsroot: ``chroot /mnt/``

- ``mount -t proc proc /proc``

- ``mount -t sysfs sysfs /sys``

- ``mkdir /mnt/testbox-tftp /mnt/testbox-backup``

- Recreate ``/etc/fstab`` with::

    proc                             /proc               proc defaults 0 0
    /dev/nfs                         /                   nfs  defaults 1 1
    10.42.1.1:/export/testbox-tftp   /mnt/testbox-tftp   nfs  tcp,nfsvers=3,noauto 2 2
    10.42.1.1:/export/testbox-backup /mnt/testbox-backup nfs  tcp,nfsvers=3,noauto 3 3

  We use NFS version 3 as that works better for our NFS server and client;
  remove the option if it is not necessary. The ``noauto`` option is to work
  around mount trouble during early bootup on some of our boxes.

- Do ``mount /mnt/testbox-tftp && mount /mnt/testbox-backup`` to mount the
  two shares. This may be a good time to execute the instructions in the
  sections above relating to these two shares.

- Edit ``/etc/initramfs-tools/initramfs.conf`` and change the ``MODULES``
  value from ``most`` to ``netboot``.

- Append ``aufs`` to ``/etc/initramfs-tools/modules``. The advanced
  multi-layered unification filesystem (aufs) enables us to use a
  read-only NFS root. [3]_ [4]_ [5]_

- Create ``/etc/initramfs-tools/scripts/init-bottom/00_aufs_init`` as
  an executable file with the following content::

    #!/bin/sh
    # Don't run during update-initramfs:
    case "$1" in
        prereqs)
            exit 0;
            ;;
    esac

    modprobe aufs
    mkdir -p /ro /rw /aufs
    # Writable tmpfs layer that goes on top of the read-only NFS root.
    mount -t tmpfs tmpfs /rw -o noatime,mode=0755
    mount --move $rootmnt /ro
    mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro
    # Keep both layers reachable inside the union, then make it the root.
    mkdir -p /aufs/rw /aufs/ro
    mount --move /ro /aufs/ro
    mount --move /rw /aufs/rw
    mount --move /aufs /root
    exit 0

- Update the init ramdisk: ``update-initramfs -u -k all``

  Note! It may be necessary to do ``mount -t tmpfs tmpfs /var/tmp`` to help
  this operation succeed.

- Copy ``/boot`` to ``/mnt/testbox-tftp/maintenance-boot/``.

- Copy the ``testbox-maintenance.sh`` file found in the same directory as this
  document to ``/root/scripts/`` (need to create the dir) and make it
  executable.

- Create the systemd service file for the maintenance service as
  ``/etc/systemd/system/testbox-maintenance.service`` with the content::

    [Unit]
    Description=Testbox Maintenance
    After=network.target
    Before=getty@tty1.service

    [Service]
    Type=oneshot
    RemainAfterExit=True
    ExecStart=/root/scripts/testbox-maintenance.sh
    ExecStartPre=/bin/echo -e \033%G
    ExecReload=/bin/kill -HUP $MAINPID
    WorkingDirectory=/tmp
    Environment=TERM=xterm
    StandardOutput=journal+console

    [Install]
    WantedBy=multi-user.target

- Enable our service: ``systemctl enable /etc/systemd/system/testbox-maintenance.service``

- xxxx ... more ???

- Before leaving the chroot, do ``umount /proc /sys /mnt/testbox-*``.


- Testing the setup from a VM is kind of useful (if the nfs server can be
  convinced to accept root nfs mounts from non-privileged client ports):

  - Create a VM using the 64-bit debian profile. Let's call it "pxe-vm".
  - Mount the TFTP share somewhere, like M: or /mnt/testbox-tftp.
  - Reconfigure the NAT DHCP and TFTP bits::

      VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/AboveDriver NAT
      VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Action mergeconfig
      VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/TFTPPrefix M:/
      VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/BootFile pxelinux.0

  - Create the file ``testbox-tftp/pxelinux.cfg/0A00020F`` containing::

      PATH bios
      DEFAULT maintenance
      LABEL maintenance
        MENU LABEL Maintenance (NFS)
        KERNEL maintenance-boot/vmlinuz-3.16.0-4-amd64
        APPEND initrd=maintenance-boot/initrd.img-3.16.0-4-amd64 ro ip=dhcp aufs=tmpfs \
               boot=nfs root=/dev/nfs nfsroot=10.42.1.1:/export/testbox-nfsroot
      LABEL local-boot
      LOCALBOOT


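
The per-machine file used in the VM test depends on just two site specific
values: the kernel version and the NFS server address. A minimal sketch of
generating it, assuming a hypothetical helper named ``print_maintenance_conf``
(this is not how ``testbox-pxe-conf.sh`` is known to work, and the ``APPEND``
options are written on a single line here)::

    # Hypothetical helper: print a PXELINUX config that boots the NFS rooted
    # maintenance system. $1 = kernel version, $2 = NFS server address.
    print_maintenance_conf() {
        cat <<EOF
    PATH bios
    DEFAULT maintenance
    LABEL maintenance
      MENU LABEL Maintenance (NFS)
      KERNEL maintenance-boot/vmlinuz-$1
      APPEND initrd=maintenance-boot/initrd.img-$1 ro ip=dhcp aufs=tmpfs boot=nfs root=/dev/nfs nfsroot=$2:/export/testbox-nfsroot
    LABEL local-boot
    LOCALBOOT
    EOF
    }

For the VM above this could be called as
``print_maintenance_conf 3.16.0-4-amd64 10.42.1.1 > testbox-tftp/pxelinux.cfg/0A00020F``.
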
Troubleshooting
===============

``PXE-E11`` or something like ``No ARP reply``
    You probably have the TFTP and DHCP servers on different machines. Try
    moving the TFTP server to the same machine as the DHCP server, so the PXE
    stack won't have to do any additional ARP resolving. Google results suggest
    that a congested network could cause the ARP reply to get lost. Our
    suspicion is that it might also be related to the PXE stack shipping with
    the NIC.



-----

.. [1] See http://www.syslinux.org/wiki/index.php?title=Category:Modules
.. [2] See http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
.. [3] See https://en.wikipedia.org/wiki/Aufs
.. [4] See http://shitwefoundout.com/wiki/Diskless_ubuntu
.. [5] See http://debianaddict.com/2012/06/19/diskless-debian-linux-booting-via-dhcppxenfstftp/


-----

:Status: $Id: TestBoxImaging.txt 82972 2020-02-04 11:13:09Z vboxsync $
:Copyright: Copyright (C) 2010-2020 Oracle Corporation.
