AutomaticTestingRevamp.txt@ 98107

Last change on this file since 98107 was 98107, checked in by vboxsync, 22 months ago
Manual (C) year updates.
2	Revamp of Automatic VirtualBox Testing
3	======================================
4
5
6	Introduction
7	------------
8
9	This is the design document for a revamped automatic testing framework.
10	The revamp aims at replacing the current tinderbox based testing by a new
11	system that is written from scratch.
12
The old system is not easy to work with and was never meant to be used for
managing tests. After all, it is just a simple build manager tailored for
continuous building. Modifying the existing tinderbox system to do what
we want would require fundamental changes that would render it useless as
a build manager, so it would end up as a fork. The amount of work
required would probably be about the same as writing a new system from
scratch. Other considerations, such as the license of the tinderbox
system (MPL) and the language it is implemented in (Perl), are also in favor
of doing it from scratch.
22
23	The language envisioned for the new automatic testing framework is Python. This
24	is for several reasons:
25
26	- The VirtualBox API has Python bindings.
27	- Python is used quite a bit inside Sun (dunno about Oracle).
28	- Works relatively well with Apache for the server side bits.
29	- It is more difficult to produce write-only code in Python (alias the
30	we-don't-like-perl argument).
31	- You don't need to compile stuff.
32
33	Note that the author of this document has no special training as a test
34	engineer and may therefore be using the wrong terms here and there. The
35	primary focus is to express what we need to do in order to improve
36	testing.
37
38	This document is written in reStructuredText (rst) which just happens to
39	be used by Python, the primary language for this revamp. For more
40	information on reStructuredText: http://docutils.sourceforge.net/rst.html
41
42
43	Definitions / Glossary
44	======================
45
46	sub-test driver
47	A set of test cases that can be used by more than one test driver. Could
48	also be called a test unit, in the pascal sense of unit, if it wasn't so
49	easily confused with 'unit test'.
50
51	test
  This is somewhat ambiguous and this document tries to avoid using it where
  possible. When used, it normally refers to doing testing by executing one or
  more test cases.
55
56	test case
57	A set of inputs, test programs and expected results. It validates system
58	requirements and generates a pass or failed status. A basic unit of testing.
59	Note that we use the term in a rather broad sense.
60
61	test driver
62	A program/script used to execute a test. Also known as a test harness.
63	Generally abbreviated 'td'. It can have sub-test drivers.
64
65	test manager
66	Software managing the automatic testing. This is a web application that runs
67	on a dedicated server (tindertux).
68
69	test set
70	The output of testing activity. Logs, results, ++. Our usage of this should
71	probably be renamed to 'test run'.
72
73	test group
74	A collection of related test cases.
75
76	testbox
77	A computer that does testing.
78
79	testbox script
80	Script executing orders from the test manager on a testbox. Started
81	automatically upon bootup.
82
83	testing
84	todo
85
86	TODO: Check that we've got all this right and make them more exact
87	where possible.
88
89	See also http://encyclopedia2.thefreedictionary.com/testing%20types
90	and http://www.aptest.com/glossary.html .
91
92
93
94	Objectives
95	==========
96
97	- A scalable test manager (>200 testboxes).
98	- Optimize the web user interface (WUI) for typical workflows and analysis.
- Efficient and flexible test configuration.
- Import test results from other test systems (logo testing, VDI, ++).
101	- Easy to add lots of new testscripts.
102	- Run tests locally without a manager.
- Revamp a bit at a time.
104
105
106
107	The Testbox Side
108	================
109
110	Each testbox has a unique name corresponding to its DNS zone entry. When booted
111	a testbox script is started automatically. This script will query the test
112	manager for orders and execute them. The core order downloads and executes a
113	test driver with parameters (configuration) from the server. The test driver
114	does all the necessary work for executing the test. In a typical VirtualBox
115	test this means picking a build, installing it, configuring VMs, running the
116	test VMs, collecting the results, submitting them to the server, and finally
117	cleaning up afterwards.
118
119	The testbox environment which the test drivers are executed in will have a
120	number of environment variables for determining location of the source images
121	and other test data, scratch space, test set id, server URL, and so on and so
122	forth.
123
124	On startup, the testbox script will look for crash dumps and similar on
125	systems where this is possible. If any sign of a crash is found, it will
126	put any dumps and reports in the upload directory and inform the test
127	manager before reporting for duty. In order to generate the proper file
128	names and report the crash in the right test set as well as prevent
129	reporting crashes unrelated to automatic testing, the testbox script will
130	keep information (test set id, ++) in a separate scratch directory
131	(${TESTBOX_PATH_SCRATCH}/../testbox) and make sure it is synced to the
132	disk (both files and directories).
133
After checking for crashes, the testbox script will clean up any previous test
which might be around. This involves first invoking the test driver in cleanup
mode and then wiping the scratch space.
137
138	When reporting for duty the script will submit information about the host: OS
139	name, OS version, OS bitness, CPU vendor, total number of cores, VT-x support,
140	AMD-V support, amount of memory, amount of scratch space, and anything else that
141	can be found useful for scheduling tests or filtering test configurations.
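
As a rough illustration of the host information listed above, the following Python sketch collects the portable subset using only the standard library. The function name and dictionary keys are made up for this example. Detecting VT-x/AMD-V and total memory is platform specific and is deliberately left out here.

```python
import os
import platform
import shutil


def collect_host_info(scratch_path="."):
    """Gather a few host facts for the sign-on request (illustrative only)."""
    scratch = shutil.disk_usage(scratch_path)
    return {
        "os_name": platform.system(),        # e.g. 'Linux', 'Windows', 'Darwin'
        "os_version": platform.release(),
        "cpu_arch": platform.machine(),
        "cpu_count": os.cpu_count() or 1,    # logical CPUs; 1 if unknown
        "scratch_mb": scratch.free // (1024 * 1024),
    }
```

A real testbox script would add the virtualization capabilities and memory size using per-OS code.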
142
143
144
145	Testbox Script Orders
146	---------------------
147
148	The orders are kept in a queue on the server and the testbox script will fetch
149	them one by one. Orders that cannot be executed at the moment will be masked in
150	the query from the testbox.
151
152	Execute Test Driver
  Downloads and executes the specified test driver with the given
154	configuration (arguments). Only one test driver can be executed at a time.
155	The server can specify more than one ZIP file to be downloaded and unpacked
156	before executing the test driver. The testbox script may cache these zip
157	files using http time stamping.
158
159	Abort Test Driver
160	Aborts the current test driver. This will drop a hint to the driver and give
161	it 60 seconds to shut down the normal way. If that fails, the testbox script
162	will kill the driver processes (SIGKILL or equivalent), invoke the
163	testdriver in cleanup mode, and finally wipe the scratch area. Should either
164	of the last two steps fail in some way, the testbox will be rebooted.
165
166	Idle
167	Ask again in X seconds, where X is specified by the server.
168
169	Reboot
  Reboot the testbox. If a test driver is currently running, an attempt at
171	aborting it (Abort Test Driver) will be made first.
172
173	Update
174	Updates the testbox script. The order includes a server relative path to the
175	new testbox script. This can only be executed when no test driver is
176	currently being executed.
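
The first half of the Abort Test Driver order (drop a hint, wait, then kill) could look roughly like the sketch below. It assumes the driver was started with ``subprocess.Popen`` and that the hint is the test-driver-abort file mentioned elsewhere in this document. The function name and its return value are invented for illustration. Cleanup mode, scratch wiping and the reboot fallback are not shown.

```python
import subprocess
from pathlib import Path


def abort_test_driver(proc, scripts_dir, grace_seconds=60.0):
    """Hint the driver to stop; kill it if it is still alive after the grace period.

    Returns True if the driver exited on its own, False if it had to be killed.
    """
    abort_file = Path(scripts_dir) / "test-driver-abort"
    abort_file.touch()                 # the hint the driver is expected to poll for
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()                    # SIGKILL on POSIX, TerminateProcess on Windows
        proc.wait()                    # reap the process
        return False
```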
177
178
179	Testbox Environment: Variables
180	------------------------------
181
182	COMSPEC
183	This will be set to C:\Windows\System32\cmd.exe on Windows.
184
185	PATH
186	This will contain the kBuild binary directory for the host platform.
187
188	SHELL
189	This will be set to point to kmk_ash(.exe) on all platforms.
190
191	TESTBOX_NAME
192	The testbox name.
193	This is not required by the local reporter.
194
195	TESTBOX_PATH_BUILDS
196	The absolute path to where the build repository can be found. This should be
197	a read only mount when possible.
198
199	TESTBOX_PATH_RESOURCES
200	The absolute path to where static test resources like ISOs and VDIs can be
  found. The test drivers know the layout of this. This should be a read only
202	mount when possible.
203
204	TESTBOX_PATH_SCRATCH
205	The absolute path to the scratch space. This is the current directory when
206	starting the test driver. It will be wiped automatically after executing the
207	test.
208	(Envisioned as ${TESTBOX_PATH_SCRIPTS}/../scratch and that
209	${TESTBOX_PATH_SCRATCH}/ will be automatically wiped by the testbox script.)
210
211	TESTBOX_PATH_SCRIPTS
  The absolute path to the test driver and the other files that were unzipped
213	together with it. This is also where the test-driver-abort file will be put.
214	(Envisioned as ${TESTBOX_PATH_SCRATCH}/../driver, see above.)
215
216	TESTBOX_PATH_UPLOAD
217	The absolute path to the upload directory for the testbox. This is for
  putting VOBs, PNGs, core dumps, crash dumps, and such on. The files should be
  bzipped or zipped if they aren't compressed already. The names should contain
220	the testbox and test set ID.
221
222	TESTBOX_REPORTER
223	The name of the test reporter back end. If not present, it will default to
224	the local reporter.
225
226	TESTBOX_TEST_SET_ID
227	The test set ID if we're running.
228	This is not required by the local reporter.
229
230	TESTBOX_MANAGER_URL
231	The URL to the test manager.
232	This is not required by the local reporter.
233
234	TESTBOX_XYZ
235	There will probably be some more of these.
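
A test driver might read these variables along the following lines. This is only a sketch: the function name is hypothetical, and using the literal string ``local`` as the name of the default reporter is an assumption, since the document only says that the local reporter is the default.

```python
import os

# Variables the document says the local reporter does not need.
MANAGER_ONLY_VARS = ("TESTBOX_NAME", "TESTBOX_TEST_SET_ID", "TESTBOX_MANAGER_URL")


def read_testbox_environment(environ=None):
    """Return the testbox settings a driver needs, or raise if some are missing."""
    environ = os.environ if environ is None else environ
    # 'local' as the literal reporter name is an assumption.
    reporter = environ.get("TESTBOX_REPORTER", "local")
    settings = {"reporter": reporter}
    for name in ("TESTBOX_PATH_SCRATCH", "TESTBOX_PATH_SCRIPTS", "TESTBOX_PATH_UPLOAD"):
        settings[name] = environ.get(name)   # None when running outside a testbox
    if reporter != "local":
        missing = [name for name in MANAGER_ONLY_VARS if name not in environ]
        if missing:
            raise KeyError("missing testbox variables: " + ", ".join(missing))
        for name in MANAGER_ONLY_VARS:
            settings[name] = environ[name]
    return settings
```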
236
237
238	Testbox Environment: Core Utilities
239	-----------------------------------
240
241	The testbox will not provide the typical unix /bin and /usr/bin utilities. In
242	other words, cygwin will not be used on Windows!
243
The testbox will provide the unixy utilities that ship with kBuild and possibly
245	some additional ones from tools/./bin in the VirtualBox tree (wget, unzip,
246	zip, and so on). The test drivers will avoid invoking any of these utilities
247	directly and instead rely on generic utility methods in the test driver
248	framework. That way we can more easily reimplement the functionality of the
249	core utilities and drop the dependency on them. It also allows us to quickly
250	work around platform specific oddities and bugs.
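
To give an idea of what such generic utility methods could look like, here is a small sketch of two of them written with the Python standard library instead of external unzip and rm tools. The function names are placeholders, not the framework's actual API.

```python
import shutil
import zipfile
from pathlib import Path


def unzip_file(zip_path, dest_dir):
    """Unpack zip_path into dest_dir and return the extracted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def wipe_directory(path):
    """Delete everything inside path but keep the directory itself."""
    for entry in Path(path).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
```

Because test drivers would only call the wrappers, the implementation behind them can change without touching any driver.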
251
252
253	Test Drivers
254	------------
255
The test drivers are programs that will do the actual testing. In addition to
running under the testbox script, they can be executed in the VirtualBox development
258	environment. This is important for bug analysis and for simplifying local
259	testing by the developers before committing changes. It also means the test
260	drivers can be developed locally in the VirtualBox development environment.
261
262	The main difference between executing a driver under the testbox script and
263	running it manually is that there is no test manager in the latter case. The
264	test result reporter will not talk to the server, but report things to a local
265	log file and/or standard out/err. When invoked manually, all the necessary
266	arguments will need to be specified by hand of course - it should be possible
267	to extract them from a test set as well.
268
269	For the early implementation stages, an implementation of the reporter interface
that talks to the tinderbox based test manager will be needed. This will be
271	dropped later on when a new test manager is ready.
272
273	As hinted at in other sections, there will be a common framework
274	(libraries/packages/classes) for taking care of the tedious bits that every
275	test driver needs to do. Sharing code is essential to easing test driver
276	development as well as reducing their complexity. The framework will contain:
277
- A generic way of submitting output. This will be a generic interface with
  multiple implementations; the TESTBOX_REPORTER environment variable
  will decide which of them to use. The interface will have very specific
  methods to allow the reporter to do the best possible job of reporting the
  results to the test manager.
283
284	- Helpers for typical tasks, like:
285	- Copying files.
286	- Deleting files, directory trees and scratch space.
287	- Unzipping files.
288	- Creating ISOs
289	- And such things.
290
291	- Helpers for installing and uninstalling VirtualBox.
292
293	- Helpers for defining VMs. (The VBox API where available.)
294
295	- Helpers for controlling VMs. (The VBox API where available.)
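
The reporter part of the framework list above could be structured roughly as follows. This is a sketch under assumptions: the class names, the method names (``test_start``, ``test_done``) and the ``local`` registry key are all invented here, and only a local back end is shown.

```python
import os
import sys


class ReporterBase:
    """Interface every reporter back end would implement (hypothetical methods)."""

    def test_start(self, name):
        raise NotImplementedError

    def test_done(self, name, passed):
        raise NotImplementedError


class LocalReporter(ReporterBase):
    """Writes results to a stream instead of talking to a test manager."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout

    def test_start(self, name):
        self.stream.write("START  %s\n" % name)

    def test_done(self, name, passed):
        self.stream.write("%s %s\n" % ("PASSED" if passed else "FAILED", name))


REPORTERS = {"local": LocalReporter}   # a manager-backed reporter would register here


def create_reporter(environ=None, **kwargs):
    """Pick the back end named by TESTBOX_REPORTER, defaulting to the local one."""
    environ = os.environ if environ is None else environ
    name = environ.get("TESTBOX_REPORTER", "local")
    if name not in REPORTERS:
        raise ValueError("unknown reporter back end: %s" % name)
    return REPORTERS[name](**kwargs)
```

Test drivers would only ever see ``ReporterBase``, which is what makes running them without a test manager possible.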
296
The VirtualBox bits will be separate from the more generic ones, simply because
this is cleaner and it will allow us to reuse the system for testing other products.
299
300	The framework will be packaged in a zip file other than the test driver so we
301	don't waste time and space downloading the same common code.
302
303	The test driver will poll for the file
304	${TESTBOX_PATH_SCRIPTS}/test-driver-abort and abort all testing when it sees it.
305
The test driver can be invoked in three modes: execute, help and cleanup. The
default is execute mode. The help mode shows a configuration summary, and the
cleanup mode is for cleaning up after a reboot or aborted run. The latter is done by the
309	testbox script on startup and after abort - the driver is expected to clean up
310	by itself after a normal run.
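
A minimal sketch of the three modes and the abort polling might look like this. Passing the mode as an optional positional argument is an assumption, as is the idea of checking for the abort file between test steps. The function names are hypothetical.

```python
import argparse
from pathlib import Path


def parse_mode(argv):
    """Return 'execute', 'help' or 'cleanup'; execute is the default."""
    parser = argparse.ArgumentParser(description="Hypothetical test driver front end")
    parser.add_argument("mode", nargs="?", default="execute",
                        choices=("execute", "help", "cleanup"))
    return parser.parse_args(argv).mode


def abort_requested(scripts_dir):
    """True once the testbox script has dropped the abort hint file."""
    return (Path(scripts_dir) / "test-driver-abort").exists()


def run_steps(steps, scripts_dir):
    """Run callables in order, stopping as soon as the abort file shows up."""
    completed = 0
    for step in steps:
        if abort_requested(scripts_dir):
            break
        step()
        completed += 1
    return completed
```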
311
312
313
314	The Server Side
315	===============
316
317	The server side will be implemented using a webserver (apache), a database
318	(postgres) and cgi scripts (Python). In addition a cron job (Python) running
319	once a minute will generate static html for frequently used pages and maybe
execute some other tasks for driving the testing forwards. The order queries
from the testbox script are the primary driving force in the system. Together,
these parts make up the test manager.
323
324	The test manager can be split up into three rough parts:
325
326	- Configuration (of tests, testgroups and testboxes).
327	- Execution (of tests, collecting and organizing the output).
328	- Analysis (of test output, mostly about presentation).
329
330
331	Test Manager: Requirements
332	==========================
333
334	List of requirements:
335
336	- Two level testing - L1 quick smoke tests and L2 longer tests performed on
337	builds passing L1. (Klaus (IIRC) meant this could be realized using
338	test dependency.)
339	- Black listing builds (by revision or similar) known to be bad.
340	- Distinguish between build types so we can do a portion of the testing with
341	strict builds.
342	- Easy to re-configure build source for testing different branch or for
343	testing a release candidate. (Directory based is fine.)
344	- Useful to be able to partition testboxes (run specific builds on some
345	boxes, let an engineer have a few boxes for a while).
346	- Interaction with ILOM/...: reset systems.
347	- Be able to suspend testing on selected testboxes when doing maintenance
348	(where automatically resuming testing on reboot is undesired) or similar
349	activity.
350	- Abort testing on selected testboxes.
351	- Scheduling of tests requiring more than one testbox.
- Scheduling of tests that cannot be executed concurrently on several
353	machines because of some global resource like an iSCSI target.
354	- Jump the scheduling queue. Scheduling of specified test the next time a
355	testbox is available (optionally specifying which testbox to schedule it
356	on).
357	- Configure tests with variable configuration to get better coverage. Two modes:
358	- TM generates the permutations based on one or more sets of test script arguments.
359	- Each configuration permutation is specified manually.
360	- Test specification needs to be flexible (select tests, disable test, test
361	scheduling (run certain tests nightly), ... ).
362	- Test scheduling by hour+weekday and by priority.
363	- Test dependencies (test A depends on test B being successful).
364	- Historize all configuration data, in particular test configs (permutations
365	included) and testboxes.
- Test sets have at a minimum a build reference, a testbox reference and a
  primary log associated with them.
- Test sets store further results as a recursive collection of:
369	- hierarchical subtest name (slash sep)
370	- test parameters / config
371	- bool fail/succ
372	- attributes (typed?)
373	- test time
374	- e.g. throughput
375	- subresults
376	- log
377	- screenshots, video,...
- The test sets database structure needs to be designed such that data mining
  can be done in an efficient manner.
380	- Presentation/analysis: graphs!, categorize bugs, columns reorganizing
381	grouped by test (hierarchical), overviews, result for last day.
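
For the requirement that the TM generates permutations from one or more sets of test script arguments, a sketch using ``itertools.product`` is shown below. The option names in the usage example are invented, and the real argument format may well differ.

```python
import itertools


def generate_permutations(arg_sets):
    """Expand {option: [values...]} into one argument list per combination."""
    options = list(arg_sets)
    combos = itertools.product(*(arg_sets[opt] for opt in options))
    return [
        [token for opt, value in zip(options, combo) for token in (opt, value)]
        for combo in combos
    ]
```

For example, two hypothetical options with two values each, such as ``{"--cpu-count": ["1", "2"], "--paging": ["nested", "shadow"]}``, expand to four argument lists.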
382
383
384
385	Test Manager: Configuration
386	===========================
387
388
389	Testboxes
390	---------
391
392	Configuration of testboxes doesn't involve much work normally. A testbox
393	is added manually to the test manager by entering the DNS entry and/or IP
394	address (the test manager resolves the missing one when necessary) as well as
395	the system UUID (when obtainable - should be displayed by the testbox script
installer). Queries from unregistered testboxes will be declined as a kind of
security measure; the incident should be logged in the webserver log if
possible. In later dealings with the client, the system UUID will be the key
identifier. It's permissible for the IP address to change when the testbox
isn't online, but not while testing (just imagine live migration tests and
network tests). Ideally, the testboxes should not change IP address.
402
403	The testbox edit function must allow changing the name and system UUID.
404
405	One further idea for the testbox configuration is indicating what they are
406	capable of to filter out tests and test configurations that won't work on that
testbox. To exemplify this, take the ACP2 installation test. If the test
manager does not make sure the testbox has VT-x or AMD-V capabilities, the test
409	is surely going to fail. Other testbox capabilities would be total number of
410	CPU cores, memory size, scratch space. These testbox capabilities should be
411	collected automatically on bootup by the testbox script together with OS name,
412	OS version and OS bitness.
413
414	A final thought, instead of outright declining all requests from new testboxes,
415	we could record the unregistered testboxes with ip, UUID, name, os info and
416	capabilities but mark them as inactive. The test operator can then activate
417	them on an activation page or edit the testbox or something.
418
419
420	Testcases
421	---------
422
423	We use the term testcase for a test.
424
425
426	Testgroups
427	----------
428
429	Testcases are organized into groups. A testcase can be member of more than one
430	group. The testcase gets a priority assigned to it in connection with the
431	group membership.
432
433	Testgroups are picked up by a testbox partition (aka scheduling group) and a
priority, scheduling time restriction and dependencies on other test groups are
435	associated with the assignment. A testgroup can be used by several testbox
436	partitions.
437
438	(This used to be called 'testsuites' but was renamed to avoid confusion with
439	the VBox Test Suite.)
440
441
442	Scheduling
443	----------
444
The initial scheduler will be modelled after what we're already doing in the
tinderbox driven testing. It's best described as a best effort continuous
integration scheduler, meaning it will always use the latest build suitable
for a testcase. It will schedule on a testcase level, using the combined
priority of the testcase in the test group and the test group with the testbox
partition, trying to spread the test case argument variations out accordingly
over the whole scheduling queue. Which argument variation to start with is
not defined (random would be best).
453
454	Later, we may add other schedulers as needed.
455
456
457
458	The Test Manager Database
459	=========================
460
461	First a general warning:
462
463	The guys working on this design are not database experts, web
464	programming experts or similar, rather we are low level guys
    whose main job is x86 & AMD64 virtualization. So, please don't
466	be too hard on us. :-)
467
468
469	A logical table layout can be found in TestManagerDatabaseMap.png (created by
470	Oracle SQL Data Modeler, stored in TestManagerDatabase.dmd). The physical
471	database layout can be found in TestManagerDatabaseInit.pgsql postgreSQL
472	script. The script is commented.
473
474
475	Data History
476	------------
477
478	We need to somehow track configuration changes over time. We also need to
479	be able to query the exact configuration a test set was run with so we can
480	understand and make better use of the results.
481
There are different techniques for achieving this, one is tuple-versioning
483	( http://en.wikipedia.org/wiki/Tuple-versioning ), another is log trigger
484	( http://en.wikipedia.org/wiki/Log_trigger ). We use tuple-versioning in
485	this database, with 'effective' as start date field name and 'expire' as
486	the end (exclusive).
487
Tuple-versioning has a shortcoming with respect to keys, both primary and foreign.
489	The primary key of a table employing tuple-versioning is really
490	'id' + 'valid_period', where the latter is expressed using two fields
491	([effective...expire-1]). Only, how do you tell the database engine that
492	it should not allow overlapping valid_periods? Useful suggestions are
493	welcomed. :-)
494
Foreign key references to a table using tuple-versioning run into
trouble because of the time axis and because, to our knowledge, foreign keys
must reference exactly one row in the other table. When time is involved,
what we wish to tell the database is that at any given time, there actually
is exactly one row we want to match in the other table, only we've no idea
how to express this. So, many foreign keys are not expressed in the SQL of
this database.
502
503	In some cases, we extend the tuple-versioning with a generation ID so that
504	normal foreign key referencing can be used. We only use this for recording
505	(references in testset) and scheduling (schedqueue), as using it more widely
506	would force updates (gen_id changes) to propagate into all related tables.
507
508	See also:
509	- http://en.wikipedia.org/wiki/Slowly_changing_dimension
510	- http://en.wikipedia.org/wiki/Change_data_capture
511	- http://en.wikipedia.org/wiki/Temporal_database
512
513
514
515	Test Manager: Execution
516	=======================
517
518
519
520	Test Manager: Scenarios
521	=======================
522
523
524
525	#1 - Testbox Signs On (At Bootup)
526	---------------------------------
527
528	The testbox supplies a number of inputs when reporting for duty:
529	- IP address.
530	- System UUID.
531	- OS name.
532	- OS version.
533	- CPU architecture.
534	- CPU count (= threads).
535	- CPU VT-x/AMD-V capability.
536	- CPU nested paging capability.
537	- Chipset I/O MMU capability.
538	- Memory size.
539	- Scratch size space (for testing).
540	- Testbox Script revision.
541
542	Results:
543	- ACK or NACK.
544	- Testbox ID and name on ACK.
545
After receiving an ACK the testbox will ask for work to do, i.e. continue with
547	scenario #2. In the NACK case, it will sleep for 60 seconds and try again.
548
549
550	Actions:
551
552	1. Validate the testbox by looking the UUID up in the TestBoxes table.
553	If not found, NACK the request. SQL::
554
       SELECT  idTestBox, sName
       FROM    TestBoxes
       WHERE   uuidSystem = :sUuid
         AND   tsExpire = 'infinity'::timestamp;
559
2. Check if any of the information supplied by the testbox script has changed.
   The two sizes are normalized first: memory size is rounded to the nearest
   4 MB and scratch space is rounded down to the nearest 64 MB. If anything
   changed, insert a new row in the testbox table and historize the current
   one, i.e. set OLD.tsExpire to NEW.tsEffective and get a new value for
   NEW.idGenTestBox.
565
566	3. Check with TestBoxStatuses:
   a) If there is already a row for the testbox, change it to the 'idle'
      state and deal with any open testset as described in scenario #9.
570	b) If there is no row, add one with 'idle' state.
571
572	4. ACK the request and pass back the idTestBox.
573
574
575	Note! Testbox.enabled is not checked here, that is only relevant when it asks
576	for a new task (scenario #2 and #5).
577
578	Note! Should the testbox script detect changes in any of the inputs, it should
579	redo the sign in.
580
581	Note! In scenario #8, the box will not sign on until it has done the reboot and
582	cleanup reporting!
583
584
585	#2 - Testbox Asks For Work To Do
586	---------------------------------
587
588
589	Inputs:
590	- The testbox is supplying its IP indirectly.
591	- The testbox should supply its UUID and ID directly.
592
593	Results:
594	- IDLE, WAIT, EXEC, REBOOT, UPGRADE, UPGRADE-AND-REBOOT, SPECIAL or DEAD.
595
596	Actions:
597
598	1. Validate the ID and IP by selecting the currently valid testbox row::
599
       SELECT  idGenTestBox, fEnabled, idSchedGroup, enmPendingCmd
       FROM    TestBoxes
       WHERE   id = :id
         AND   uuidSystem = :sUuid
         AND   ip = :ip
         AND   tsExpire = 'infinity'::timestamp;
606
607	If NOT found return DEAD to the testbox client (it will go back to sign on
608	mode and retry every 60 seconds or so - see scenario #1).
609
610	Note! The WUI will do all necessary clean-ups when deleting a testbox, so
611	contrary to the initial plans, we don't need to do anything more for
612	the DEAD status.
613
614	2. Check with TestBoxStatuses (maybe joined with query from 1).
615
616	If enmState is 'gang-gathering': Goto scenario #6 on timeout or pending
617	'abort' or 'reboot' command. Otherwise, tell the testbox to WAIT [done].
618
619	If enmState is 'gang-testing': The gang has been gathered and execution
620	has been triggered. Goto 5.
621
622	If enmState is not 'idle', change it to 'idle'.
623
   If idTestSet is not NULL, CALL scenario #9 to clean it up.
625
626	If there is a pending abort command, remove it.
627
628	If there is a pending command and the old state doesn't indicate that it was
629	being executed, GOTO scenario #3.
630
631	Note! There should be a TestBoxStatuses row after executing scenario #1,
632	however should none be found for some funky reason, returning DEAD
633	will fix the problem (see above)
634
635	3. If the testbox was marked as disabled, respond with an IDLE command to the
636	testbox [done]. (Note! Must do this after TestBoxStatuses maintenance from
637	point 2, or abandoned tests won't be cleaned up after a testbox is disabled.)
638
639	4. Consider testcases in the scheduling queue, pick the first one which the
   testbox can execute. There is a concurrency issue here, so we put an
641	exclusive lock on the SchedQueues table while considering its content.
642
643	The cursor we open looks something like this::
644
       SELECT  idItem, idGenTestCaseArgs,
               idTestSetGangLeader, cMissingGangMembers
       FROM    SchedQueues
       WHERE   idSchedGroup = :idSchedGroup
         AND   (   bmHourlySchedule IS NULL
                OR get_bit(bmHourlySchedule, :iHourOfWeek) = 1 ) --< does this work?
       ORDER BY idItem ASC;
652
   If no rows are returned (this can happen because no testgroups are
654	associated with this scheduling group, the scheduling group is disabled,
655	or because the queue is being regenerated), we will tell the testbox to
656	IDLE [done].
657
658	For each returned row we will:
659	a) Check testcase/group dependencies.
660	b) Select a build (and default testsuite) satisfying the dependencies.
661	c) Check the testcase requirements with that build in mind.
662	d) If idTestSetGangLeader is NULL, try allocate the necessary resources.
663	e) If it didn't check out, fetch the next row and redo from (a).
664	f) Tentatively create a new test set row.
665	g) If not gang scheduling:
666	- Next state: 'testing'
667	ElIf we're the last gang participant:
668	- Set idTestSetGangLeader to NULL.
669	- Set cMissingGangMembers to 0.
670	- Next state: 'gang-testing'
671	ElIf we're the first gang member:
672	- Set cMissingGangMembers to TestCaseArgs.cGangMembers - 1.
673	- Set idTestSetGangLeader to our idTestSet.
674	- Next state: 'gang-gathering'
675	Else:
676	- Decrement cMissingGangMembers.
677	- Next state: 'gang-gathering'
678
679	If we're not gang scheduling OR cMissingGangMembers is 0:
680	Move the scheduler queue entry to the end of the queue.
681
682	Update our TestBoxStatuses row with the new state and test set.
683	COMMIT;
684
685	5. If state is 'testing' or 'gang-testing':
      EXEC response.

      The EXEC response for a gang scheduled testcase includes a number of
      extra arguments so that the script knows the position of the testbox
      it is running on and of the other members. This means that the
      TestSet.iGangMemberNo is passed using --gang-member-no and the IP
      addresses of all the gang members using --gang-ipv4-<memb-no> <ip>.
693	Else (state is 'gang-gathering'):
694	WAIT
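
The state decision in step 4 (g) can be written down as a small pure function, sketched below. It assumes that "last gang participant" means cMissingGangMembers is 1 before the testbox joins and that a gang size of 1 means no gang scheduling. The function name and the tuple it returns are illustrative only.

```python
def next_gang_state(gang_size, id_leader, missing, id_test_set):
    """Return (next_state, idTestSetGangLeader, cMissingGangMembers)."""
    if gang_size <= 1:
        # Not gang scheduling.
        return ("testing", None, 0)
    if id_leader is None:
        # First gang member: become the leader and wait for the rest.
        return ("gang-gathering", id_test_set, gang_size - 1)
    if missing == 1:
        # Last gang participant: the gang is complete.
        return ("gang-testing", None, 0)
    # Somewhere in the middle: one fewer member missing.
    return ("gang-gathering", id_leader, missing - 1)
```

The queue entry would be moved to the end of the queue whenever the returned state is 'testing' or 'gang-testing'.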
695
696
697
698	#3 - Pending Command When Testbox Asks For Work
699	-----------------------------------------------
700
701	This is a subfunction of scenario #2 and #5.
702
As seen in scenario #2, the testbox will send 'abort' commands to /dev/null
when it finds them while not executing a test. This includes when it reports
that the test has completed (no need to abort a completed test, wasting a lot
of effort when standing at the finish line).
707
708	The other commands, though, are passed back to the testbox. The testbox
709	script will respond with an ACK or NACK as it sees fit. If NACKed, the
710	pending command will be removed (pending_cmd set to none) and that's it.
711	If ACKed, the state of the testbox will change to that appropriate for the
712	command and the pending_cmd set to none. Should the testbox script fail to
713	respond, the command will be repeated the next time it asks for work.
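
The ACK/NACK handling described above might be sketched like this. The status is modelled as a plain dictionary, and the state names in ``STATE_AFTER_ACK`` are pure assumptions, since the document does not name the states that follow each command.

```python
# Hypothetical mapping from command to the state entered once the testbox ACKs.
STATE_AFTER_ACK = {"reboot": "rebooting", "upgrade": "upgrading"}


def apply_command_reply(status, reply):
    """Update status (keys 'pending_cmd', 'state') for 'ACK', 'NACK' or None."""
    command = status["pending_cmd"]
    if command == "none" or reply is None:
        return status               # no reply: the command is repeated next time
    if reply == "ACK":
        status["state"] = STATE_AFTER_ACK.get(command, status["state"])
    status["pending_cmd"] = "none"  # cleared on both ACK and NACK
    return status
```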
714
715
716
717	#4 - Testbox Uploads Results During Test
718	----------------------------------------
719
720
721	TODO
722
723
724	#5 - Testbox Completes Test and Asks For Work
725	---------------------------------------------
726
727	This is very similar to scenario #2
728
729	TODO
730
731
732	#6 - Gang Gathering Timeout
733	---------------------------
734
735	This is a subfunction of scenario #2.
736
737	When gathering a gang of testboxes for a testcase, we do not want to wait
738	forever and have testboxes doing nothing for hours while waiting for partners.
739	So, the gathering has a reasonable timeout (imagine something like 20-30 mins).
740
741	Also, we need some way of dealing with 'abort' and 'reboot' commands being
issued while waiting. The easy way out is to pretend it's a timeout.
743
744	When changing the status to 'gang-timeout' we have to be careful. First of all,
745	we need to exclusively lock the SchedQueues and TestBoxStatuses (in that order)
746	and re-query our status. If it changed redo the checks in scenario #2 point 2.
747
If we still want to timeout/abort, change the state from 'gang-gathering' to
'gang-gathering-timedout' on all the gang members that have gathered so far.
750	Then reset the scheduling queue record and move it to the end of the queue.
751
752
753	When acting on 'gang-timeout' the TM will fail the testset in a manner similar
754	to scenario #9. No need to repeat that.
755
756
757
758	#7 - Gang Cleanup
759	-----------------
760
761	When a testbox completes a gang scheduled test, we will have to serialize
762	resource cleanup (both globally and on testboxes) as they stop. More details
763	can be found in the documentation of 'gang-cleanup'.
764
765	So, the transition from 'gang-testing' is always to 'gang-cleanup'. When we
766	can safely leave 'gang-cleanup' is decided by the query::
767
768	    SELECT COUNT(*)
769	    FROM   TestBoxStatuses,
770	           TestSets
771	    WHERE  TestSets.idTestSetGangLeader = :idTestSetGangLeader
772	      AND  TestSets.idTestBox = TestBoxStatuses.idTestBox
773	      AND  TestBoxStatuses.enmState = 'gang-running'::TestBoxState_T;
774
775	As long as there are testboxes still running, we stay in the 'gang-cleanup'
776	state. Once there are none, we continue closing the testset and such.
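
A minimal sketch of how the check could be wrapped on the TM side. It uses an in-memory SQLite database purely so the example is self-contained; the real schema is assumed to have more columns, the PostgreSQL style ``::TestBoxState_T`` cast is dropped because SQLite does not support it, and the function name is hypothetical.

```python
import sqlite3


def can_leave_gang_cleanup(db, id_gang_leader):
    """True once no gang member is left in the 'gang-running' state."""
    (running,) = db.execute(
        "SELECT COUNT(*)"
        " FROM   TestBoxStatuses, TestSets"
        " WHERE  TestSets.idTestSetGangLeader = :idTestSetGangLeader"
        "   AND  TestSets.idTestBox = TestBoxStatuses.idTestBox"
        "   AND  TestBoxStatuses.enmState = 'gang-running'",
        {"idTestSetGangLeader": id_gang_leader},
    ).fetchone()
    return running == 0


# Throwaway schema holding only the columns the query touches.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE TestSets (idTestSet INTEGER, idTestSetGangLeader INTEGER,
                           idTestBox INTEGER);
    CREATE TABLE TestBoxStatuses (idTestBox INTEGER, enmState TEXT);
    INSERT INTO TestSets VALUES (1, 1, 10), (2, 1, 11);
    INSERT INTO TestBoxStatuses VALUES (10, 'gang-cleanup'), (11, 'gang-running');
""")
```

With the sample rows, testbox 11 is still running, so the gang has to stay in 'gang-cleanup' until that row changes.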
777
778
779
780	#8 - Testbox Reports A Crash During Test Execution
781	--------------------------------------------------
782
783	TODO
784
785
786	#9 - Cleaning Up Abandoned Testcase
787	-----------------------------------
788
789	This is a subfunction of scenario #1 and #2. The actions taken are the same in
790	both situations. The precondition for taking this path is that the row in the
791	TestBoxStatuses table is referring to a testset (i.e. idTestSet is not NULL).
792
793
794	Actions:
795
796	1. If the testset is incomplete, we need to complete it:
797	    a) Add a message to the root TestResults row, creating one if necessary,
798	       that explains that the test was abandoned. This is done
799	       by inserting/finding the string into/in TestResultStrTab and adding
800	       a row to TestResultMsgs with idStrMsg set to that string id and
801	       enmLevel set to 'failure'.
802	    b) Mark the testset as failed.
803
804	2. Free any global resources referenced by the test set. This is done by
805	deleting all rows in GlobalResourceStatuses matching the testbox id.
806
807	3. Set the idTestSet to NULL in the TestBoxStatuses row.
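
The three actions can be sketched against a toy database as shown below. Only the table names and the idStrMsg, enmLevel and idTestSet columns come from the text above; every other column, the 'running'/'failure' status values, the message wording and the function names are assumptions made for the sake of a runnable example.

```python
import sqlite3

ABANDONED_MSG = "The test was abandoned"  # hypothetical wording


def make_demo_db():
    """Create a throwaway in-memory database (simplified, assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE TestSets (idTestSet INTEGER PRIMARY KEY, enmStatus TEXT);
        CREATE TABLE TestResults (idTestResult INTEGER PRIMARY KEY,
                                  idTestResultParent INTEGER, idTestSet INTEGER);
        CREATE TABLE TestResultStrTab (idStr INTEGER PRIMARY KEY, sValue TEXT UNIQUE);
        CREATE TABLE TestResultMsgs (idTestResult INTEGER, idStrMsg INTEGER,
                                     enmLevel TEXT);
        CREATE TABLE GlobalResourceStatuses (idGlobalRsrc INTEGER, idTestBox INTEGER);
        CREATE TABLE TestBoxStatuses (idTestBox INTEGER PRIMARY KEY, idTestSet INTEGER);
    """)
    return db


def cleanup_abandoned_testset(db, id_testbox):
    """Return True if an abandoned testset was cleaned up, False otherwise."""
    row = db.execute("SELECT idTestSet FROM TestBoxStatuses WHERE idTestBox = ?",
                     (id_testbox,)).fetchone()
    if row is None or row[0] is None:
        return False  # precondition not met: no testset is referenced
    id_testset = row[0]
    with db:  # one transaction, committed when the block ends
        (status,) = db.execute("SELECT enmStatus FROM TestSets WHERE idTestSet = ?",
                               (id_testset,)).fetchone()
        if status == 'running':  # 1. the testset is incomplete
            # 1a. Find or create the root TestResults row.
            root = db.execute("SELECT idTestResult FROM TestResults"
                              " WHERE idTestSet = ? AND idTestResultParent IS NULL",
                              (id_testset,)).fetchone()
            if root is None:
                id_root = db.execute("INSERT INTO TestResults (idTestSet)"
                                     " VALUES (?)", (id_testset,)).lastrowid
            else:
                id_root = root[0]
            # 1a. Find or insert the message string, then attach it.
            found = db.execute("SELECT idStr FROM TestResultStrTab WHERE sValue = ?",
                               (ABANDONED_MSG,)).fetchone()
            if found is None:
                id_str = db.execute("INSERT INTO TestResultStrTab (sValue)"
                                    " VALUES (?)", (ABANDONED_MSG,)).lastrowid
            else:
                id_str = found[0]
            db.execute("INSERT INTO TestResultMsgs (idTestResult, idStrMsg, enmLevel)"
                       " VALUES (?, ?, 'failure')", (id_root, id_str))
            # 1b. Mark the testset as failed.
            db.execute("UPDATE TestSets SET enmStatus = 'failure'"
                       " WHERE idTestSet = ?", (id_testset,))
        # 2. Free the global resources held by this testbox.
        db.execute("DELETE FROM GlobalResourceStatuses WHERE idTestBox = ?",
                   (id_testbox,))
        # 3. Detach the testset from the testbox status row.
        db.execute("UPDATE TestBoxStatuses SET idTestSet = NULL WHERE idTestBox = ?",
                   (id_testbox,))
    return True
```

Running the whole thing in a single transaction is a design choice of this sketch, so that a failure half way through does not leave a failed testset that is still attached to the testbox.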
808
809
810
811	#10 - Cleaning Up a Disabled/Dead TestBox
812	-----------------------------------------
813
814	The UI needs to be able to clean up the remains of a testbox which for some
815	reason is out of action. Normal cleaning up of abandoned testcases requires
816	that the testbox signs on or asks for work, but if the testbox is dead or
817	in some way indisposed, it won't be doing any of that. So, the testbox
818	sheriff needs to have a way of cleaning up after it.
819
820	It's basically a manual scenario #9 but with some safeguards, like checking
821	that the box hasn't been active for the last 1-2 mins (max idle/wait time * 2).
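
The inactivity safeguard could look something like this. The function name and the 60 second default for the max idle/wait time are assumptions; the text above only says that the limit is twice that time.

```python
from datetime import datetime, timedelta


def is_safe_to_clean_up(last_seen, now, max_idle=timedelta(seconds=60)):
    """True if the testbox has been silent for more than twice the max idle time."""
    return now - last_seen > 2 * max_idle
```

The UI would presumably call this with the time of the last request seen from the testbox and refuse the manual cleanup when it returns False.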
822
823
824	Note! When disabling a box that is still executing the testbox script, this
825	cleanup isn't necessary as it will happen automatically. Also, it's
826	probably desirable that the testbox finishes whatever it is doing first
827	before going dormant.
828
829
830
831	Test Manager: Analysis
832	=======================
833
834	One of the testbox sheriff's tasks is to try to figure out the reason why something
835	failed. The test manager will provide facilities for doing so from very early
836	in its implementation.
837
838
839	We need to work out some useful status reports for the early implementation.
840	Later there will be more advanced analysis tools, where for instance we can
841	create graphs from selected test result values or test execution times.
842
843
844
845	Implementation Plan
846	===================
847
848	This has changed for various reasons. The current plan is to implement the
849	infrastructure (TM & testbox script) first and do a small deployment with the
850	2-5 test drivers in the Testsuite as basis. Once the bugs are worked out, we
851	will convert the rest of the tests and start adding new ones.
852
853	We just need to finally get this done, no point in doing it piecemeal by now!
854
855
856	Test Manager Implementation Sub-Tasks
857	-------------------------------------
858
859	The implementation of the test manager and adjusting/completing of the testbox
860	script and the test drivers are tasks which can be done by more than one
861	person. Splitting up the TM implementation into smaller tasks should allow
862	parallel development of different tasks and get us working code sooner.
863
864
865	Milestone #1
866	------------
867
868	The goal is to get the fundamental testmanager engine implemented, debugged
869	and working. With the exception of testboxes, the configuration will be done
870	via SQL inserts.
871
872	Tasks in somewhat prioritized order:
873
874	- Kick off test manager. It will live in testmanager/. Salvage as much as
875	possible from att/testserv. Create basic source and file layout.
876
877	- Adjust the testbox script, part one. There currently is a testbox script
878	in att/testbox, this shall be moved up into testboxscript/. The script
879	needs to be adjusted according to the specification laid down earlier
880	in this document. Installers or installation scripts for all relevant
881	host OSes are required. Left for part two is result reporting beyond the
882	primary log. This task must be 100% feature complete on all host OSes;
883	there is no room for FIXME, XXX or @todo here.
884
885	- Implement the schedule queue generator.
886
887	- Implement the testbox dispatcher in TM. Support all the testbox script
888	responses implemented above, including upgrading the testbox script.
889
890	- Implement simple testbox management page.
891
892	- Implement some basic activity and result reports so that we can see
893	what's going on.
894
895	- Create a testmanager / testbox test setup. This lives in selftest/.
896
897	1. Set up something that runs, no fiddly bits. Debug till it works.
898	2. Create a setup that tests testgroup dependencies, i.e. real tests
899	depending on smoke tests.
900	3. Create a setup that exercises testcase dependency.
901	4. Create a setup that exercises global resource allocation.
902	5. Create a setup that exercises gang scheduling.
903
904	- Check that all features work.
905
906
907	Milestone #2
908	------------
909
910	The goal is getting to VBox testing.
911
912	Tasks in somewhat prioritized order:
913
914	- Implement full result reporting in the testbox script and testbox driver.
915	A testbox script specific reporter needs to be implemented for the
916	testdriver framework. The testbox script needs to forward the results to
917	the test manager, or alternatively the testdriver reporter can talk
918	directly to the TM.
919
920	- Implement the test manager side of the test result reporting.
921
922	- Extend the selftest with some setup that reports all kinds of test
923	results.
924
925	- Implement script/whatever feeding builds to the test manager from the
926	tinderboxes.
927
928	- The toplevel test driver is a VBox thing that must be derived from the
929	base TestDriver class or maybe the VBox one. It should move from
930	toptestdriver to testdriver and be renamed to vboxtltd or smth.
931
932	- Create a vbox testdriver that boots the t-xppro VM once and that's it.
933
934	- Create a selftest setup which tests booting t-xppro taking builds from
935	the tinderbox.
936
937
938	Milestone #3
939	------------
940
941	The goal for this milestone is configuration and converting current testcases;
942	the result will be a minimal test deployment (4-5 new testboxes).
943
944	Tasks in somewhat prioritized order:
945
946	- Implement testcase configuration.
947
948	- Implement testgroup configuration.
949
950	- Implement build source configuration.
951
952	- Implement scheduling group configuration.
953
954	- Implement global resource configuration.
955
956	- Re-visit the testbox configuration.
957
958	- Black listing of builds.
959
960	- Implement simple failure analysis and reporting.
961
962	- Implement the initial smoke tests modelled on the current smoke tests.
963
964	- Implement installation tests for Windows guests.
965
966	- Implement installation tests for Linux guests.
967
968	- Implement installation tests for Solaris guest.
969
970	- Implement installation tests for OS/2 guest.
971
972	- Set up a small test deployment.
973
974
975	Further work
976	------------
977
978	After milestone #3 has been reached and issues found by the other team members
979	have been addressed, we will probably go for full deployment.
980
981	Beyond this point we will need to improve reporting and analysis. There may be
982	configuration aspects needing reporting as well.
983
984	Once deployed, a golden rule will be that all new features shall have test
985	coverage. Preferably, implemented by someone else and prior to the feature
986	implementation.
987
988
989
990
991	Discussion Logs
992	===============
993
994	2009-07-21,22,23 Various Discussions with Michal and/or Klaus
995	-------------------------------------------------------------
996
997	- Scheduling of tests requiring more than one testbox.
998	- Scheduling of tests that cannot be executing concurrently on several machines
999	because of some global resource like an iSCSI target.
1000	- Manually create the test config permutations instead of having the test
1001	manager create all possible ones and wasting time.
1002	- Distinguish between build types so we can run smoke tests on strict builds as
1003	well as release ones.
1004
1005
1006	2009-07-20 Brief Discussion with Michal
1007	----------------------------------------
1008
1009	- Installer for the testbox script to make bringing up a new testbox even
1010	smoother.
1011
1012
1013	2009-07-16 Raw Input
1014	--------------------
1015
1016	- test set. recursive collection of:
1017	- hierarchical subtest name (slash sep)
1018	- test parameters / config
1019	- bool fail/succ
1020	- attributes (typed?)
1021	- test time
1022	- e.g. throughput
1023	- subresults
1024	- log
1025	- screenshots,....
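
One possible reading of the notes above as a data structure, purely as a sketch. The nesting of the notes is ambiguous, so the field layout is a guess, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResultNode:
    """One node in the recursive test set collection (assumed layout)."""
    name: str                                         # one component of the slash separated name
    params: dict = field(default_factory=dict)        # test parameters / config
    succeeded: bool = True                            # bool fail/succ
    attributes: dict = field(default_factory=dict)    # e.g. test time, throughput
    subresults: list = field(default_factory=list)    # child ResultNode objects
    log: str = ""
    screenshots: list = field(default_factory=list)   # e.g. file names

    def add(self, sub):
        """Attach a subresult and return it so calls can be chained."""
        self.subresults.append(sub)
        return sub

    def full_names(self, prefix=""):
        """Yield the hierarchical name of this node and of every descendant."""
        mine = prefix + "/" + self.name if prefix else self.name
        yield mine
        for sub in self.subresults:
            yield from sub.full_names(mine)

    def passed(self):
        """A node passes only if it and all of its subresults succeeded."""
        return self.succeeded and all(sub.passed() for sub in self.subresults)
```

For example, a root node "install" with a child "winxp" that has a child "boot" gives the names "install", "install/winxp" and "install/winxp/boot".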
1026
1027	- client package (zip) dl from server (maybe client caching)
1028
1029
1030	- thoughts on bits to do at once.
1031	- We really need the basic bits ASAP.
1032	- client -> support for test driver
1033	- server -> controls configs
1034	- cleanup on both sides
1035
1036
1037	2009-07-15 Raw Input
1038	--------------------
1039
1040	- testing should start automatically
1041	- switching to branch too tedious
1042	- useful to be able to partition testboxes (run specific builds on some boxes, let an engineer have a few boxes for a while).
1043	- test specification needs to be more flexible (select tests, disable test, test scheduling (run certain tests nightly), ... )
1044	- testcase dependencies (blacklisting builds, run smoketests on box A before long tests on box B, ...)
1045	- more testing flexibility, more tests than just install/smoke. For instance unit tests, benchmarks, ...
1046	- presentation/analysis: graphs!, categorize bugs, columns reorganizing grouped by test (hierarchical), overviews, result for last day.
1047	- testcase specification, variables (e.g. I/O-APIC, SMP, HWVIRT, SATA...) as sub-tests
1048	- interaction with ILOM/...: reset systems
1049	- Changes need LDAP authentication
1050	- historize all configuration w/ name
1051	- ability to run testcase locally (provided the VDI/ISO/whatever extra requirements can be met).
1052
1053
1054	-----
1055
1056	.. [1] no such footnote
1057
1058	-----
1059
1060	:Status: $Id: AutomaticTestingRevamp.txt 98107 2023-01-17 22:56:50Z vboxsync $
1061	:Copyright: Copyright (C) 2010-2023 Oracle Corporation.
