Opened 7 years ago
Closed 6 years ago
#17716 closed defect (invalid)
VBox stuck in write loop with 5.2.10, corruption -> Arch Linux issue
Reported by: | xnoreq | Owned by: | |
---|---|---|---|
Component: | shared folders | Version: | VirtualBox 5.2.10 |
Keywords: | Cc: | ||
Guest type: | Linux | Host type: | Windows |
Description
Host: Windows 10, VBox Version 5.2.10 r122406 (Qt5.6.2)
Guest: Arch Linux 4.16.4-1-ARCH
Installed package: virtualbox-guest-modules-arch 5.2.10-3 https://www.archlinux.org/packages/community/x86_64/virtualbox-guest-modules-arch/
In the guest I have a shared folder, fstab:
shared /mnt/shared vboxsf uid=1000,gid=1000,rw,dmode=700,fmode=600,comment=systemd.automount
Now if I run qbittorrent (also installed from Arch repos) in the guest and e.g. download Arch (see magnet link at https://www.archlinux.org/download/) then as soon as it starts to download the iso:
1) VirtualBox.exe CPU usage on the host increases dramatically
2) VirtualBox.exe shows heavy I/O writes (20x to 30x the download rate within the guest)
3) qbittorrent in the guest shows high CPU usage
Analyzing the writes on the host:
20:55:36,3166640 VirtualBox.exe 1840 CreateFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Disallow Exclusive, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
20:55:36,3170140 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 4.096, Length: 4.096
20:55:36,3171295 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 8.192, Length: 4.096
20:55:36,3172539 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 12.288, Length: 4.096
... the file is being allocated ...
20:55:42,1340471 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 284.164.096, Length: 4.096
20:55:42,1341867 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 284.168.192, Length: 4.096
20:55:42,1342749 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 284.688.384, Length: 4.096
20:55:42,1343604 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 285.212.672, Length: 4.096
20:55:42,1344587 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 285.736.960, Length: 4.096
20:55:42,1345399 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 286.261.248, Length: 4.096
20:55:42,1346192 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 286.785.536, Length: 4.096
20:55:42,1347007 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 287.309.824, Length: 4.096
20:55:42,1347766 VirtualBox.exe 1840 ReadFile C:\shared\archlinux-2018.04.01-x86_64.iso END OF FILE Offset: 287.834.112, Length: 4.096
20:55:45,6189281 VirtualBox.exe 1840 QueryOpen C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6190843 VirtualBox.exe 1840 QueryOpen C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6191496 VirtualBox.exe 1840 CreateFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Desired Access: Generic Read/Write, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Disallow Exclusive, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Opened
20:55:45,6192084 VirtualBox.exe 1840 QueryInformationVolume C:\shared\archlinux-2018.04.01-x86_64.iso BUFFER OVERFLOW VolumeCreationTime: xxx, VolumeSerialNumber: xxx, SupportsObjects: True, VolumeLabel: xxx
20:55:45,6192206 VirtualBox.exe 1840 QueryAllInformationFile C:\shared\archlinux-2018.04.01-x86_64.iso BUFFER OVERFLOW CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, FileAttributes: A, AllocationSize: 284.168.192, EndOfFile: 284.168.192, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x3400000001536f, EaSize: 0, Access: Generic Read/Write, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word
20:55:45,6193266 VirtualBox.exe 1840 CloseFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS
20:55:45,6194764 VirtualBox.exe 1840 QueryOpen C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS CreationTime: 27.04.2018 20:52:44, LastAccessTime: 27.04.2018 20:52:44, LastWriteTime: 27.04.2018 20:54:20, ChangeTime: 27.04.2018 20:54:20, AllocationSize: 284.168.192, EndOfFile: 284.168.192, FileAttributes: A
20:55:45,6196035 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096, Priority: Normal
20:55:45,6197488 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096
20:55:45,6198623 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096
20:55:45,6199472 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096
20:55:45,6200234 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096
... this keeps on going and never stops ...
20:56:09,7814536 VirtualBox.exe 1840 WriteFile C:\shared\archlinux-2018.04.01-x86_64.iso SUCCESS Offset: 1.572.864, Length: 4.096
Naturally, the resulting file is corrupted.
Change History (21)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
No, I really don't think that the load is too much. The guest application and host system are the *constants* here - it worked with every version of VirtualBox before in much heavier load scenarios without problems but with 5.2.10 it breaks even in the simplest case.
comment:3 by , 7 years ago
Does it work if you downgrade? Can you try different combinations?
- 5.2.8 VirtualBox with 5.2.10 GAs
- 5.2.8 VirtualBox with 5.2.8 GAs
- 5.2.10 VirtualBox with 5.2.8 GAs
comment:4 by , 7 years ago
As soon as I upgrade the guest modules to 5.2.10 it breaks, regardless of the host version (5.2.8, 5.2.10, I had even tried 5.2.11 builds).
As soon as I downgrade the guest to 5.2.8 it starts to work again as it did with the previous versions, that is, every released version since 5.1.34, I believe.
Maybe it is a change in the 5.2.10 vboxsf implementation that confuses the application. That would explain the high CPU usage of the guest process.
I will try to strace the guest process to see what the kernel (module) does differently in terms of I/O.
comment:5 by , 7 years ago
Excellent analysis! That will give the developers a much narrower focus on the issue. Kudos!
comment:7 by , 7 years ago
My first comment still stands; you shouldn't be using VirtualBox shared folders for this type of activity. There's no guarantee that it will work, or that it will keep on working.
Maybe they changed something that was breaking something more fundamental than your case, and it broke your case. As I said, no guarantees were ever made (or implied) that this would ever work. It's a small miracle that it did.
My advice? Switch to normal networked folders and be done with it. If it gets fixed in VirtualBox (who knows), great. If it doesn't, you still have your rock-solid solution and you don't care in any event...
comment:8 by , 7 years ago
you shouldn't be using VirtualBox shared folders for this type of activity
Having an application reading and writing to a shared folder?! You're essentially saying that shared folders should not be used to share data.
There's no guarantee that it will work, or that it will keep on working.
I don't understand this response at all. You're again essentially saying that users cannot expect features of VirtualBox, that are also present in other virtualization solutions, to work or not to break randomly.
That makes it a horrible product which forces users to switch to a different solution .. fine. It's just a shame because shared folders worked fine for years before 5.2.10.
My advice? Switch to normal networked folders
Been there, done that. The result is abysmal I/O performance and applications failing in very interesting ways. So not an option.
comment:9 by , 7 years ago
On the contrary, shared folders are meant as an easy way to share data. But that's it. An open database residing in a shared folder is not shared data. I said (and I'm sticking by my statement) that shared folders are good enough to copy data. Period. Anything more than that, anything that asks for advanced filesystem features, might fail.
The features that are there, are expected to work. But work as expected. If for application XYZ it works, but that advanced feature gets changed in the future, a feature that was never explicitly promised to work, it might fail. A copy will always work.
If your network folders are failing when using your application, chances are that it's the application that's written without shared folders in mind. Expect more failures if using VirtualBox shared folders. If true network folders are not an option, you've got to change your game plan.
comment:10 by , 7 years ago
If your network folders are failing when using your application, chances are that it's the application that's written without shared folders in mind.
There is no such thing as "shared folders" in Linux. The shared folder feature of VirtualBox is implemented as a VFS which is transparent to the applications. That's the whole point of VFSs... and properly implemented, it supports a wide variety of different features which may or may not be supported by any filesystem.
So it's broken. I don't see how this attempt at making excuses for this breakage helps. It just adds more noise to the ticket.
comment:11 by , 7 years ago
and properly implemented
That's the part you don't want to understand. The implementation part doesn't fit your needs, therefore it's a bug? Not really. If FAT32 doesn't fit your needs to transfer multi-GB files with your stick, it's not a bug. That's how it's supposed to work.
it supports a wide variety of different features which may or may not be supported by any filesystem.
Not the VirtualBox ones. They support a very specific set of features. Copying a file. End of story.
I don't see how this attempt at making excuses for this breakage helps
You do realize that if something states "Not to be used for any purpose other than ..." then it's not broken, but that's the specification it was built with, right? Huge difference.
It just adds more noise to the ticket.
Not really. It merely tries to make you understand why this ticket shouldn't exist in the first place, why this ticket should be closed as "Invalid".
In all fairness, it could very well turn out to be a bug. What I'm simply trying to get to people that use shared folders with weird, corner cases is "Don't use shared folders with weird, corner cases".
comment:12 by , 7 years ago
That's the part you don't want to understand.
That shared folders after 5.2.10 aren't properly implemented? That's the reason for the ticket, duh.
The implementation part doesn't fit your needs, therefore it's a bug?
You don't know what you're talking about. If a filesystem doesn't support a feature then it is typically not a problem for applications because then the filesystem doesn't advertise the feature.
Not the VirtualBox ones. They support a very specific set of features. Copying a file. End of story.
Again, you don't know what you're talking about. VFS has no concept of "copying a file". The only relevant operations here are basic file operations, such as open(2), read(2) or write(2) ...
that's the specification it was built with, right?
What specification?! Again, you are just making up stuff you have no clue about.
It merely tries to make you understand why this ticket shouldn't exist in the first place, why this ticket should be closed as "Invalid".
If anything shouldn't exist, then it is the noise you've added. Please stop it.
In all fairness, it could very well turn out to be a bug.
... but let's rather spam a ticket with excuses and noise and derail it instead of getting it fixed?!
This is outrageous behavior. Are you working for Oracle?
I'm simply trying to get to people that use shared folders with weird, corner cases
What file operations that I'm using are weird corner cases?
Please answer me this.
comment:13 by , 7 years ago
I'll skip the "duh" and the rest of the tone/insults/misunderstandings, until you've graduated from high school, it's not fair.
But, I'll ask you this: Do you have a problem copying files from/to a shared folder? Then it would be a problem. For anything else you'll need to adjust your expectation-meter.
I tried to explain a couple of things, you seem to be not wanting to hear. I'm outta here... Buona fortuna!
PS. A couple of questions unrelated to the ticket:
- No, I'm not Oracle. Why, do you have a support contract? Or am I only allowed to speak if I'm Oracle? You have heard of the concept of "open source" and "user supported", right?
- You do realize that everybody can comment on a ticket right? This isn't exactly personalized support, you haven't paid for that, you can't afford that. If you could you wouldn't be here...
comment:14 by , 7 years ago
I hope you finally understood that you really shouldn't be commenting on tickets if you have no clue what you're talking about.
Evading my questions seals the deal. Thank you & goodbye.
comment:15 by , 7 years ago
Now, after all this noise, back to the bug:
Pre 5.2.10:
openat(AT_FDCWD, "/mnt/shared/archlinux-2018.05.01-x86_64.iso", O_RDWR|O_CREAT|O_NOATIME, 0666) = 59
pwritev(59, [{iov_base="\3148"..., iov_len=16384}], 32, 412090368) = 524288 <2.838525>
pwritev(59, [{iov_base="i\331"..., iov_len=16384}], 16, 487063552) = 262144 <0.422510>
These are the first two writes, and I have confirmed that the bytes written at those offsets match the bytes that the host wrote to the file at those offsets.
The operations take some time, but they finish.
Contrast this with 5.2.10 and 5.2.12:
openat(AT_FDCWD, "/mnt/shared/archlinux-2018.05.01-x86_64.iso", O_RDWR|O_CREAT|O_NOATIME, 0666) = 67
pwritev(67, [{iov_base="a\236"..., iov_len=16384}], 32, 331874304 <unfinished ...>
pwritev(67, [{iov_base="\377\331"..., iov_len=16384}], 28, 250609664 <unfinished ...>
pwritev(67, [{iov_base="qH"..., iov_len=16384}], 18, 455081984 <unfinished ...>
Even the first pwritev doesn't finish.
At the beginning of the first write offset (331874304) the data in the file matches the pwritev data.
After 4096 bytes the file ends however (should be 16k) and the bytes stop matching at some point within those 4k.
So it seems that the host gets stuck in what appears to be a loop writing to the file somewhere around this offset.
This is a very serious bug causing data loss and/or corruption.
comment:16 by , 7 years ago
Test program to reproduce the issue:
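#define _DEFAULT_SOURCE /* exposes pwritev() when compiling with a strict -std= mode */
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFLEN 16384
#define NUMBUFS 32
#define OFFSET 10000

int main(int argc, char** argv) {
    struct iovec iov[NUMBUFS];
    /* Fill each 16 KiB buffer with a single letter (A, B, C, ...) so the written data is easy to check. */
    for (int i = 0; i < NUMBUFS; i++) {
        uint8_t* buf = (uint8_t*)malloc(BUFLEN);
        if (buf == NULL) { perror("malloc"); return 1; }
        iov[i].iov_base = buf;
        iov[i].iov_len = BUFLEN;
        for (int j = 0; j < BUFLEN; j++) {
            buf[j] = (uint8_t)(i%26 + 'A');
        }
    }
    int fd = open("/mnt/shared/test.txt", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
    if (fd < 0) { perror("open"); return 1; }
    /* One vectored write of NUMBUFS * BUFLEN bytes at OFFSET; this is the call expected to hang with the affected guest modules. */
    ssize_t count = pwritev(fd, iov, NUMBUFS, OFFSET);
    printf("wrote: %zd\n", count);
    close(fd);
    return 0;
}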
fstab entry:
shared /mnt/shared vboxsf uid=1000,gid=1000,rw,dmode=700,fmode=600 0 0
Same bug occurs with updated package versions: linux 4.16.8-1, virtualbox-guest-modules-arch 5.2.12-1
comment:17 by , 6 years ago
Four months and two kernel versions later, the issue still hasn't been fixed.
Archlinux bug: https://bugs.archlinux.org/task/58583
Redhat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1481630#c80
comment:19 by , 6 years ago
This hang/loop/corruption issue with shared folders still exists in linux 4.20.3 with virtualbox 6.0.2.
comment:20 by , 6 years ago
Ok, so I've finally gotten around to debugging this, sorry for taking so long and thank you for the reproducer.
This only happens when using my cleaned-up standalone version of vboxsf, which is intended for merging upstream from: https://github.com/jwrdegoede/vboxsf/
So it seems that you are seeing this problem because the virtualbox-guest-modules-arch package has been using my version of vboxsf starting with the troublesome version. Therefore I believe that this ticket can be closed, as this is not an upstream VirtualBox bug.
During the refactoring / cleanup of the code to prepare it for merging into the mainline kernel I messed up the return value of the sf_write_end function. Unlike the other mmap handling functions it is supposed to return the number of bytes written on success or 0 on error, instead of 0 on success and negative errno on error (which is documented nowhere). With that fixed your reproducer works as expected.
This is fixed in my vboxsf version with this commit: https://github.com/jwrdegoede/vboxsf/commit/6738af37c935f3d9b0db138678c2cd3d8bc1fc99
comment:21 by , 6 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Summary: | VBox stuck in write loop with 5.2.10, corruption → VBox stuck in write loop with 5.2.10, corruption -> Arch Linux issue |
Just FYI, VirtualBox's shared folders present a very simplified file system implementation, just enough to read/write files from/to the guest. Many applications can error when using shared folders, because they expect advanced features, like file locking or access controls, which don't exist for shared folders.
Like a torrent client that needs to keep the file(s) open at all times and update specific chunks. Maybe the load is too much and it can't take the I/O.
I would use a true network share (Samba, NFS). Shared folders were never designed to be anything more than a simple copy mechanism, AFAIK...