Testbox Imaging (Backup / Restore)
==================================


Introduction
------------

This document explores deploying a very simple drive imaging solution to help
avoid needing to manually reinstall testboxes when a disk goes bust or the OS
install seems to be corrupted.


Definitions / Glossary
======================

See AutomaticTestingRevamp.txt.


Objectives
==========

 - Off site, no admin interaction (no need for ILOM or similar).
 - OS independent.
 - Space and bandwidth efficient.
 - As automatic as possible.
 - Logging.


Overview of the Solution
========================

Here is a brief summary:

 - Always boot testboxes via PXE using PXELINUX.
 - Default configuration is local boot (hard disk / SSD).
 - Restore/backup action triggered by machine specific PXE config.
 - Boots a special Debian maintenance install off NFS.
 - A maintenance service (systemd style) does the work.
 - The service reads the action from the TFTP location and performs it.
 - When done, the service removes the TFTP machine specific config
   and reboots the system.

Maintenance actions are:

 - backup
 - backup-again
 - restore
 - refresh-info
 - rescue

A possible modifier would indicate a subset of disks on testboxes with other
OSes installed. Support for partition level backup/restore is not explored here.


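The action names listed above lend themselves to a simple allow-list check.
The following is a minimal sketch only, not code taken from
``testbox-maintenance.sh``; the function name ``is_valid_action`` is
hypothetical, and only the five action names come from this document:

```shell
# Hypothetical helper: succeed only for the maintenance actions listed above.
is_valid_action() {
    case "$1" in
        backup|backup-again|restore|refresh-info|rescue)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

A caller could reject anything else before touching a disk, for example
``is_valid_action "$action" || exit 1``.
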
How to use
----------

To perform one of the above maintenance actions on a testbox, run the
``testbox-pxe-conf.sh`` script::

    /mnt/testbox-tftp/pxelinux.cfg/testbox-pxe-conf.sh 10.165.98.220 rescue

Then trigger a reboot. The box will then boot the NFS rooted Debian image and
execute the maintenance action. On success, it will remove the testbox hex-IP
config file and reboot again.


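The hex-IP config file name is the IPv4 address of the testbox written as
eight uppercase hexadecimal digits, which is one of the names PXELINUX looks
for. A minimal sketch of the conversion follows; the helper name
``ip_to_pxe_hex`` is hypothetical, and how ``testbox-pxe-conf.sh`` itself
derives the name is not shown in this document:

```shell
# Convert a dotted decimal IPv4 address to the eight-digit uppercase
# hexadecimal form used for per-machine PXELINUX config file names.
# ip_to_pxe_hex is a hypothetical helper name.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_pxe_hex() (
    IFS=.
    set -- $1          # split the address into its four octets
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

ip_to_pxe_hex 10.165.98.220
```

For the address used above this prints ``0AA562DC``; 10.0.2.15 maps to
``0A00020F``.
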
Storage Server
==============

The storage server will have three areas used here. Using NFS for all three
avoids extra work getting CIFS sharing right too (NFS is already a pain).

 1. /export/testbox-tftp    - TFTP config area. Read-write.
 2. /export/testbox-backup  - Images and logs. Read-write.
 3. /export/testbox-nfsroot - Custom Debian. Read-only, no root squash.


TFTP (/export/testbox-tftp)
============================

The testbox-tftp share needs to be writable; root squashing is okay.

We need files from both PXELINUX and SYSLINUX to make this work now. On a
Debian system, the ``pxelinux`` and ``syslinux`` packages need to be
installed. We actually do this further down when setting up the nfsroot, so
it's possible to get them from there by postponing this step a little. On
Debian 8.6.0 the PXELINUX files are found in ``/usr/lib/PXELINUX`` and the
SYSLINUX ones in ``/usr/lib/syslinux``.

The initial PXE image as well as the associated modules come in three variants:
BIOS, 32-bit EFI and 64-bit EFI. We'll only need the BIOS one for now.
Perform the following copy operations::

    cp /usr/lib/PXELINUX/pxelinux.0 /mnt/testbox-tftp/
    cp /usr/lib/syslinux/modules/*/ldlinux.* /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/bios /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/efi32 /mnt/testbox-tftp/
    cp -R /usr/lib/syslinux/modules/efi64 /mnt/testbox-tftp/


For simplicity, all the testboxes boot using good old-fashioned BIOS, no EFI.
However, it doesn't really hurt to be prepared.

The PXELINUX related files go in the root of the testbox-tftp share. (As
mentioned further down, these can be installed on a Debian system by running
``apt-get install pxelinux syslinux``.) We need the ``*pxelinux.0`` files
typically found in ``/usr/lib/PXELINUX/`` on Debian systems (recent ones
anyway). It is possible we may need one or more of the modules [1]_ that
ship with PXELINUX/SYSLINUX, so do copy ``/usr/lib/syslinux/modules`` to
``testbox-tftp/modules`` as well.


The directory layout related to the configuration files is dictated by the
PXELINUX configuration file searching algorithm [2]_. Create a subdirectory
``pxelinux.cfg/`` under ``testbox-tftp`` and create the world-readable file
``default`` with the following content::

    PATH bios
    DEFAULT local-boot
    LABEL local-boot
      LOCALBOOT

This makes booting from the local disk the default behavior.

Copy the ``testbox-pxe-conf.sh`` script file found in the same directory as
this document to ``/mnt/testbox-tftp/pxelinux.cfg/``. Edit the copy to correct
the IP addresses near the top, as well as any Linux, TFTP and PXE details near
the bottom of the file. This script will generate the PXE configuration file
when performing maintenance on a testbox.


Images and logs (/export/testbox-backup)
=========================================

The testbox-backup share needs to be writable; root squashing is okay.

In the root there must be a file ``testbox-backup`` so we can easily tell
whether we've actually mounted the share or are just staring at an empty mount
point directory.

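The marker file check can be sketched as below. This is an illustration only:
the helper name ``backup_share_is_mounted`` is hypothetical, and the way
``testbox-maintenance.sh`` performs the check is not shown in this document:

```shell
# Hypothetical helper: succeed only if the marker file is present, i.e. the
# share is really mounted. The mount point is passed as the first argument.
backup_share_is_mounted() {
    [ -f "$1/testbox-backup" ]
}
```

A script could then bail out early with
``backup_share_is_mounted /mnt/testbox-backup || exit 1`` rather than writing
images into an empty mount point directory.
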
The ``testbox-maintenance.sh`` script maintains a global log in the root
directory that's called ``maintenance.log``. Errors are logged there, as
well as a ping and the action.

We use a directory layout based on dotted decimal IP addresses here, so for a
server with the IP 10.40.41.42 all its files will be under ``10.40.41.42/``:

``<hostname>``
    The name of the testbox (empty file). Helps finding a testbox by name.

``testbox-info.txt``
    Information about the testbox. Starting off with the name, decimal IP,
    PXELINUX style hexadecimal IP, and more.

``maintenance.log``
    Maintenance log file recording what the maintenance service does.

``disk-devices.lst``
    Optional list of disk devices to consider backing up or restoring. This is
    intended for testboxes with additional disks that are used for other
    purposes and should not be touched.

``sda.raw.gz``
    The gzipped raw copy of the sda device of the testbox.

``sd[bcdefgh].raw.gz``
    The gzipped raw copies of sdb, sdc, sdd, sde, sdf, sdg, sdh, etc. if any of
    them exist and are disks/SSDs.


Note! If it turns out we can be certain to get a valid host name, we might just
switch to using the hostname as the directory name instead of the IP.


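To illustrate the layout above, the sketch below derives the image path for a
device and round-trips a raw device through gzip. It rests on assumptions: the
real work is done by ``testbox-maintenance.sh``, whose copy commands are not
shown in this document, and both the function names and the use of plain
``gzip`` on standard input/output are hypothetical:

```shell
# Hypothetical helpers; the real logic lives in testbox-maintenance.sh.

# image_path <backup-root> <ip> <device-name>
# Builds <backup-root>/<ip>/<device-name>.raw.gz, following the layout above.
image_path() {
    printf '%s/%s/%s.raw.gz\n' "$1" "$2" "$3"
}

# backup_device <device-node> <image-file>
# Writes a gzipped raw copy of the device, creating the directory if needed.
backup_device() {
    mkdir -p "$(dirname "$2")" && gzip -c < "$1" > "$2"
}

# restore_device <image-file> <device-node>
# Decompresses the image back onto the device.
restore_device() {
    gzip -dc < "$1" > "$2"
}
```

With these, ``image_path /mnt/testbox-backup 10.40.41.42 sda`` yields
``/mnt/testbox-backup/10.40.41.42/sda.raw.gz``, which can be passed to
``backup_device`` or ``restore_device`` together with ``/dev/sda``.
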
Debian NFS root (/export/testbox-nfsroot)
==========================================

The testbox-nfsroot share should be read-only and must **not** have root
squashing enabled. Also, make sure setting the set-uid-bit is allowed by the
server, or ``su`` and ``sudo`` won't work.

There are several ways of creating a Debian nfsroot, but since we've got a
tool like VirtualBox around we've just installed it in a VM, prepared it,
and copied it onto the NFS server share.

At the time of writing, Debian 8.6.0 is current, so a minimal 64-bit install of
it was done in a VM. After installation the following modifications were made:

 - ``apt-get install pxelinux syslinux initramfs-tools zip gddrescue sudo joe``
   and optionally ``apt-get install smbclient cifs-utils``.

 - ``/etc/default/grub`` was modified to set ``GRUB_CMDLINE_LINUX_DEFAULT`` to
   ``""`` instead of ``"quiet"``. This allows us to see messages during boot
   and perhaps spot why something doesn't work on a testbox. Regenerate the
   grub configuration file by running ``update-grub`` afterwards.

 - ``/etc/sudoers`` was modified to allow the ``vbox`` user to use sudo without
   requiring any password.

 - Create the directory ``/etc/systemd/system/getty@tty1.service.d`` and create
   the file ``noclear.conf`` in it with the following content::

        [Service]
        TTYVTDisallocate=no

   This stops getty from clearing VT1 and lets us see the tail of the boot up
   messages, which includes messages from the testbox-maintenance service.

 - Mount the testbox-nfsroot under ``/mnt/`` with write privileges. (The write
   privileges are temporary - don't forget to remove them later on.)::

        mount -t nfs myserver.com:/export/testbox-nfsroot /mnt/

   Note! Adding ``-o nfsvers=3`` may help with some NFSv4 servers.

 - Copy the Debian root and dev file system onto nfsroot. If you have ssh
   access to the NFS server, the quickest way to do it is to use ``tar``::

        tar -cz --one-file-system -f /mnt/testbox-maintenance-nfsroot.tar.gz . dev/

   An alternative is ``cp -ax . /mnt/. && cp -ax dev/. /mnt/dev/.`` but this
   is quite a bit slower, obviously.

 - Edit ``/etc/ssh/sshd_config`` setting ``PermitRootLogin`` to ``yes`` so we
   can ssh in as root later on.

 - chroot into the nfsroot: ``chroot /mnt/``

 - ``mount -t proc proc /proc``

 - ``mount -t sysfs sysfs /sys``

 - ``mkdir /mnt/testbox-tftp /mnt/testbox-backup``

 - Recreate ``/etc/fstab`` with::

        proc                             /proc               proc  defaults              0 0
        /dev/nfs                         /                   nfs   defaults              1 1
        10.42.1.1:/export/testbox-tftp   /mnt/testbox-tftp   nfs   tcp,nfsvers=3,noauto  2 2
        10.42.1.1:/export/testbox-backup /mnt/testbox-backup nfs   tcp,nfsvers=3,noauto  3 3

   We use NFS version 3 as that works better for our NFS server and client;
   remove it if not necessary. The ``noauto`` option is to work around mount
   trouble during early bootup on some of our boxes.

 - Do ``mount /mnt/testbox-tftp && mount /mnt/testbox-backup`` to mount the
   two shares. This may be a good time to execute the instructions in the
   sections above relating to these two shares.

 - Edit ``/etc/initramfs-tools/initramfs.conf`` and change the ``MODULES``
   value from ``most`` to ``netboot``.

 - Append ``aufs`` to ``/etc/initramfs-tools/modules``. The advanced
   multi-layered unification filesystem (aufs) enables us to use a
   read-only NFS root. [3]_ [4]_ [5]_

 - Create ``/etc/initramfs-tools/scripts/init-bottom/00_aufs_init`` as
   an executable file with the following content::

        #!/bin/sh
        # Don't run during update-initramfs:
        case "$1" in
            prereqs)
                exit 0;
                ;;
        esac

        modprobe aufs
        mkdir -p /ro /rw /aufs
        # Writable tmpfs layer that goes on top of the read-only NFS root.
        mount -t tmpfs tmpfs /rw -o noatime,mode=0755
        mount --move $rootmnt /ro
        mount -t aufs aufs /aufs -o noatime,dirs=/rw:/ro=ro
        # Keep both branches reachable from inside the union.
        mkdir -p /aufs/rw /aufs/ro
        mount --move /ro /aufs/ro
        mount --move /rw /aufs/rw
        # The union becomes the new root file system.
        mount --move /aufs /root
        exit 0

 - Update the init ramdisk: ``update-initramfs -u -k all``

   Note! It may be necessary to do ``mount -t tmpfs tmpfs /var/tmp`` to help
   this operation succeed.

 - Copy ``/boot`` to ``/mnt/testbox-tftp/maintenance-boot/``.

 - Copy the ``testbox-maintenance.sh`` file found in the same directory as this
   document to ``/root/scripts/`` (the directory needs to be created) and make
   it executable.

 - Create the systemd service file for the maintenance service as
   ``/etc/systemd/system/testbox-maintenance.service`` with the content::

        [Unit]
        Description=Testbox Maintenance
        After=network.target
        Before=getty@tty1.service

        [Service]
        Type=oneshot
        RemainAfterExit=True
        ExecStart=/root/scripts/testbox-maintenance.sh
        ExecStartPre=/bin/echo -e \033%G
        ExecReload=/bin/kill -HUP $MAINPID
        WorkingDirectory=/tmp
        Environment=TERM=xterm
        StandardOutput=journal+console

        [Install]
        WantedBy=multi-user.target

 - Enable our service: ``systemctl enable /etc/systemd/system/testbox-maintenance.service``

 - xxxx ... more ???

 - Before leaving the chroot, do ``umount /proc /sys /mnt/testbox-*``.


 - Testing the setup from a VM is kind of useful (if the NFS server can be
   convinced to accept root NFS mounts from non-privileged client ports):

   - Create a VM using the 64-bit Debian profile. Let's call it "pxe-vm".
   - Mount the TFTP share somewhere, like M: or /mnt/testbox-tftp.
   - Reconfigure the NAT DHCP and TFTP bits::

        VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/AboveDriver NAT
        VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Action mergeconfig
        VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/TFTPPrefix M:/
        VBoxManage setextradata pxe-vm VBoxInternal/PDM/DriverTransformations/pxe/Config/BootFile pxelinux.0

   - Create the file ``testbox-tftp/pxelinux.cfg/0A00020F`` containing::

        PATH bios
        DEFAULT maintenance
        LABEL maintenance
          MENU LABEL Maintenance (NFS)
          KERNEL maintenance-boot/vmlinuz-3.16.0-4-amd64
          APPEND initrd=maintenance-boot/initrd.img-3.16.0-4-amd64 ro ip=dhcp aufs=tmpfs \
                 boot=nfs root=/dev/nfs nfsroot=10.42.1.1:/export/testbox-nfsroot
        LABEL local-boot
          LOCALBOOT


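A per-machine maintenance config like the one above could be generated rather
than typed by hand. The sketch below is an assumption-based illustration and
not the content of ``testbox-pxe-conf.sh``: the function name
``maintenance_pxe_config`` and its two parameters are hypothetical, and the
``APPEND`` line is emitted on a single line instead of being wrapped:

```shell
# Hypothetical generator for a per-machine maintenance config.
# Arguments: <kernel-version> <nfs-server-ip>; prints the config on stdout.
maintenance_pxe_config() {
    cat <<EOF
PATH bios
DEFAULT maintenance
LABEL maintenance
  MENU LABEL Maintenance (NFS)
  KERNEL maintenance-boot/vmlinuz-$1
  APPEND initrd=maintenance-boot/initrd.img-$1 ro ip=dhcp aufs=tmpfs boot=nfs root=/dev/nfs nfsroot=$2:/export/testbox-nfsroot
LABEL local-boot
  LOCALBOOT
EOF
}
```

For the test VM above, the output of
``maintenance_pxe_config 3.16.0-4-amd64 10.42.1.1`` would be redirected to
``testbox-tftp/pxelinux.cfg/0A00020F``.
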
Troubleshooting
===============

``PXE-E11`` or something like ``No ARP reply``
    You probably have the TFTP and DHCP servers on different machines. Try
    moving the TFTP server to the same machine as the DHCP server; then the PXE
    stack won't have to do any additional ARP resolving. Google results suggest
    that a congested network could cause the ARP reply to get lost. Our
    suspicion is that it might also be related to the PXE stack shipping with
    the NIC.



-----

.. [1] See http://www.syslinux.org/wiki/index.php?title=Category:Modules
.. [2] See http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration
.. [3] See https://en.wikipedia.org/wiki/Aufs
.. [4] See http://shitwefoundout.com/wiki/Diskless_ubuntu
.. [5] See http://debianaddict.com/2012/06/19/diskless-debian-linux-booting-via-dhcppxenfstftp/


-----

:Status: $Id: TestBoxImaging.txt 106065 2024-09-16 21:42:41Z vboxsync $
:Copyright: Copyright (C) 2010-2024 Oracle Corporation.