Official eMule-Board: Supporting Sparse Files - Official eMule-Board


Supporting Sparse Files: some code to make new files sparse

#1 monkeydonkey

  • Newbie
  • Group: Members
  • Posts: 4
  • Joined: 17-February 03

Posted 17 February 2003 - 11:54 PM

I've seen that this has been discussed before, but I still don't get it why there is no support for sparse files on NTFS.
I've downloaded the 26d code, and with minor modifications I have sparse file support; so, if I get a chunk which is located at the end of the file, then no space is lost due to premature allocation of previous segments in the file.


Here is what I've added to CPartFile::CreatePartFile() -

	if (!m_hpartfile.Open(partfull,CFile::modeCreate|CFile::modeReadWrite|CFile::shareDenyWrite|CFile::osSequentialScan)){
	//if (!m_hpartfile.Open(partfull,CFile::modeCreate|CFile::modeReadWrite|CFile::shareDenyWrite|CFile::osSequentialScan|CFile::osWriteThrough)){
		theApp.emuledlg->AddLogLine(false,GetResString(IDS_ERR_CREATEPARTFILE));
		status = PS_ERROR;
	}
	else /**** THIS IS THE ADDITION !!!!!! ******/
	{
		// Mark the freshly created (still empty) part file as sparse, so that
		// regions which are never written do not get allocated on disk.
		DWORD d = 0; // receives the returned byte count; nothing is returned for this control code
		theApp.emuledlg->AddLogLine(false, CString("Setting file as sparse"));
		// The cast keeps this compiling whether CFile::m_hFile is declared as UINT or as HANDLE.
		if (DeviceIoControl((HANDLE)m_hpartfile.m_hFile, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &d, NULL))
		{
			theApp.emuledlg->AddLogLine(false, CString("Sparse successful"));

			// should we extend the file, so its final size is reflected in the file system?
			/*
			ULONGLONG ull;

			ull = m_nFileSize;
			m_hpartfile.SetLength(ull);
			m_hpartfile.SeekToBegin();
			*/
		}
		else
			theApp.emuledlg->AddLogLine(false, CString("Sparse unsuccessful"));
	}


One more modification needs to be made, and that is to stdafx.h: change the Windows version defines to 0x0500 (I've modified all of them, but perhaps _WIN32_WINNT is sufficient).

#2 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 18 February 2003 - 12:04 AM

Quote

I still don't get it why there is no support for sparse files on NTFS.


Well, for one, it breaks 9x compatibility a bit. The more appropriate solution would be to check the OS version first and then issue the DeviceIoControl if applicable. Although it *should* just fail under 9x, knowing MS it might crash the whole OS too.

For two, why? Although at first glance it seems like a really cool thing to do, all that the result will be is that your downloads will eventually take up all the space on the drive without you realizing why. There's the same argument against using NTFS compression on your temp files - you might be able to start many more downloads, but there's no way you can complete all of them at the same time, and you'll have to manage your space yourself much more efficiently; plus, as the files DO need to expand, they'll get fragmented all to hell. And yes, I understand that you could have temp and incoming on two separate volumes, keep temp sparse or compressed and have eMule unpause a file as a file completes, so any that did get paused will eventually continue and finish. But trust me, it's just not a good idea unless you know what you're doing, and let's face it, probably 95% of the users out there DON'T.

Just my thoughts. But hey, Kudos on the addition for those that'll want to use it :)
-D

PS Hmm, and one more thing... Just setting the file as sparse won't do the whole thing. You need to issue sparse commands when the file gets extended, otherwise it'll expand like normal. So you either need to set as sparse and extend to end right away (the commented out part), or do it on every expansion. Also, I don't remember how good NT is about de-sparsing; as in, you've said 0-30000 should be sparse; now you write 500 bytes at pos 29000 - does it then slice the regions into sparse 0-29000, data 29000-29499, sparse 29500-30000? Or does it de-sparse the whole block, resulting in 0-30000 written out physically...

This post has been edited by LoneStar: 18 February 2003 - 12:10 AM


#3 Jordy

  • Member
  • Group: Members
  • Posts: 32
  • Joined: 05-January 03

Posted 18 February 2003 - 12:17 AM

I've found that eMule seriously fragments files across my hard drive as it is. I'd actually prefer if eMule preallocated the whole thing so it wouldn't jump around my drive so much whenever it extends the file.

#4 monkeydonkey

  • Newbie
  • Group: Members
  • Posts: 4
  • Joined: 17-February 03

Posted 18 February 2003 - 12:32 AM

Quote

PS Hmm, and one more thing... Just setting the file as sparse won't do the whole thing. You need to issue sparse commands when the file gets extended, otherwise it'll expand like normal. So you either need to set as sparse and extend to end right away (the commented out part), or do it on every expansion. Also, I don't remember how good NT is about de-sparsing; as in, you've said 0-30000 should be sparse; now you write 500 bytes at pos 29000 - does it then slice the regions into sparse 0-29000, data 29000-29499, sparse 29500-30000? Or does it de-sparse the whole block, resulting in 0-30000 written out physically...


Because the file is marked as sparse when it is created (file size 0), any write that causes the file size to expand will not have the empty areas allocated. The section of code that I have commented out will make the file have its final size, but it will be completely empty (only one cluster).
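
To make the "does it slice the regions or de-sparse the whole block" question concrete, here is a toy model, not an NTFS API. It assumes the file system hands out space in fixed-size allocation units and backs only the units that a write actually touches. The unit size is a parameter because the thread does not say what granularity NTFS really uses; `Write` and `AllocatedBytes` are made-up names.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: a write is (offset, length), both in bytes.
using Write = std::pair<std::uint64_t, std::uint64_t>;

// Bytes backed by disk if only the allocation units touched by at least one
// write are allocated and everything else stays sparse.
std::uint64_t AllocatedBytes(const std::vector<Write>& writes, std::uint64_t unitSize)
{
    if (unitSize == 0)
        return 0;                                   // no meaningful granularity
    std::set<std::uint64_t> touchedUnits;           // indices of units holding data
    for (const Write& w : writes) {
        if (w.second == 0)
            continue;                               // a zero-length write touches nothing
        const std::uint64_t first = w.first / unitSize;
        const std::uint64_t last = (w.first + w.second - 1) / unitSize;
        for (std::uint64_t u = first; u <= last; ++u)
            touchedUnits.insert(u);
    }
    return touchedUnits.size() * unitSize;
}
```

Under this model, the 500-byte write at position 29000 from the question costs one unit with 4096-byte units (or two with 512-byte units), and the ranges before and after it stay unallocated.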

In regards to fragmentation - I can see your line of thinking, but has it been proved/checked that NTFS's allocation is so flawed that it causes really nasty fragmentation?

As far as final size is concerned, well, some files are huge and slow to download - so I don't see any reason for having a 1GB file occupying 1GB only because the last chunk was the first to be downloaded.
I think sparse files should be supported; I'd suggest having it off by default, so people who don't know what they're doing do not encounter free space issues.

#5 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 18 February 2003 - 12:54 AM

Quote

any write that causes the file size to expand will not have the empty areas allocated

Ok, that answers that question. I only thought of it because of needing FSCTL_SET_ZERO_DATA to allocate sparse chunks; but I guess as long as it's a seek-ed expansion and not a zero-write expansion, it should be ok.

Quote

but has it been proved/checked that NTFS's allocation is so flawed that it causes really nasty fragmentation?

I can't speak for sparse files, but since the sparse mechanism is built along the same lines as the compression mechanism, I can say yes, it will be horrendous, depending on your number of concurrent downloads. As new data gets downloaded and written, it will get dynamically allocated, and wind up all over the place. I may add in the sparse code to my mod to run a test, but I know with compression on, and a nice clean disk to start with, I now have had files with > 30,000 fragments! My average right now is about 15,000 frags/file (mostly ~700MB files)

I'm not really against having the option in there - in fact I'm one of the people who WOULD use it :) I just have to be the devil's advocate all the time and bring the other side to the table.

I do use NTFS compression so I can keep many downloads going on my small drive (only 4GB of eMule space to work with). That way the 700MB file with the block at the end DOES only take up a little space. But I also know how to manage my downloads properly, and when to shift files over network shares, etc. The worry is the people who will see this, say "Cool", and then expect eMule to manage everything automatically. So I do agree that it should be off by default - too many people with potential to really screw things up (my biggest worry is the ones who will run out of diskspace 'cuz they had too many concurrent, unfinished downloads, and then get their .met files killed, and lose all the downloads.)
-D

#6 monkeydonkey

  • Newbie
  • Group: Members
  • Posts: 4
  • Joined: 17-February 03

Posted 18 February 2003 - 01:09 AM

Quote

Ok, that answers that question. I only thought of it because of needing FSCTL_SET_ZERO_DATA to allocate sparse chunks; but I guess as long as it's a seek-ed expansion and not a zero-write expansion, it should be ok.


The only reason to explicitly use FSCTL_SET_ZERO_DATA would be to "sparsify" existing part files (although I think this should be done by a utility rather than in the client's code), and perhaps to free corrupted chunks (this is just a thought, since I don't know the details of how it is all done).


Quote

I now have had files with > 30,000 fragments! My average right now is about 15,000 frags/file (mostly ~700MB files)


I still don't think compression is exactly the same situation.
Can you tell me how you check the fragmentation of an individual file?
I want to get some numbers by running my modified eMule and grabbing some large files.

#7 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 18 February 2003 - 01:29 AM

I don't think there's any existing code to "release" a corrupted chunk, just to mark it as a gap in the, um, gap list for the file (lack of a better term). So you could use FSCTL_SET_ZERO_DATA to have NT deallocate that range of the file, but it might not be worth it, because you'd be giving up the capability to perform ICH; so you get some space back, but definitely have to download the full chunk again.
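
A rough sketch of what "have NT deallocate that range" could look like. `PartRange` and `ZeroRange` are made-up helpers, not eMule functions, and the 9,728,000-byte part size is an assumption about the eD2k chunk size rather than something stated in this thread. The Win32 part uses FSCTL_SET_ZERO_DATA with a FILE_ZERO_DATA_INFORMATION range and is only compiled on Windows; on a sparse file NTFS may then release the zeroed range.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>
#endif

// Assumed eD2k part (chunk) size in bytes.
const std::uint64_t kPartSize = 9728000;

// Half-open byte range [begin, end).
struct ByteRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Byte range covered by part number `part` in a file of `fileSize` bytes,
// clamped so that the last (shorter) part does not run past the end.
ByteRange PartRange(std::uint64_t part, std::uint64_t fileSize)
{
    ByteRange r;
    r.begin = part * kPartSize;
    r.end = r.begin + kPartSize;
    if (r.begin > fileSize) r.begin = fileSize;
    if (r.end > fileSize) r.end = fileSize;
    return r;
}

#ifdef _WIN32
// Zeroes the range; returns false if the control call fails.
bool ZeroRange(HANDLE file, ByteRange r)
{
    FILE_ZERO_DATA_INFORMATION info;
    info.FileOffset.QuadPart = static_cast<LONGLONG>(r.begin);
    info.BeyondFinalZero.QuadPart = static_cast<LONGLONG>(r.end); // first byte NOT zeroed
    DWORD returned = 0;
    return DeviceIoControl(file, FSCTL_SET_ZERO_DATA, &info, sizeof(info),
                           NULL, 0, &returned, NULL) != 0;
}
#endif
```

Calling `ZeroRange(handle, PartRange(n, fileSize))` throws away everything in part n, which is exactly the ICH trade-off described above: the space comes back, but the whole chunk has to be downloaded again.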

For fragmentation analysis (and defragmenting single files) grab contig.exe from Sysinternals.com - http://www.sysintern...re/contig.shtml

I look forward to seeing your results :)
-D

#8 stobo

  • Advanced Member
  • Group: Members
  • Posts: 96
  • Joined: 26-November 02

Posted 19 February 2003 - 04:03 AM

and run into the scenario that none of the files can be completed because the disk is _totally_ full?

i like the idea better that files are completely allocated as soon as they start downloading, and paused BEFORE attempting if there is no disk space.

nothing can help/hurt NTFS fragmentation, i think if there is any logic in where blocks are written, it is "where the drive head happens to be right now"

anyone know a decent (not contig *.*) background defragmenter that doesn't crash all the time? thought so.

#9 Avi

  • Golden eMule
  • Group: Members
  • Posts: 1,460
  • Joined: 11-September 02

Posted 19 February 2003 - 05:28 AM

stobo, on Feb 18 2003, 11:03 PM, said:

i like the idea better that files are completely allocated as soon as they start downloading

I would personally like to see an option to allocate the files completely even when they're added to the list.. and if you can (i wish.) in a different thread.. ;)

#10 .:fl0yd:.

  • Magnificent Member
  • Group: Members
  • Posts: 355
  • Joined: 17-February 03

Posted 19 February 2003 - 06:24 AM

Since I'm not really the Byte-billionaire I'm all for sparse files. As opinions appear to differ on this subject, the call for two or three additional checkboxes in the preferences dialog arises and everyone will be happy. Too bad I cannot patch the client myself as I only have VS6. Thanks for bringing attention to this issue, monkeydonkey.

#11 stobo

  • Advanced Member
  • Group: Members
  • Posts: 96
  • Joined: 26-November 02

Posted 19 February 2003 - 09:53 AM

Quote

I would personally like to see an option to allocate the files completely even when they're added to the list.. and if you can (i wish.) in a different thread..


yep. definitely. i'm "really" a java programmer and therefore think that EVERYTHING should be a different thread :D

not a reality on c++, though :(...

#12 shadow#

  • Splendid Member
  • Group: Members
  • Posts: 218
  • Joined: 23-October 02

Posted 19 February 2003 - 05:45 PM

perhaps you forgot something:
PartFile.cpp(206): error C2065: 'FSCTL_SET_SPARSE' : undeclared identifier

#13 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 19 February 2003 - 05:49 PM

Quote

One more modification needs to be made, and that is to stdafx.h: change the Windows version defines to 0x0500 (I've modified all of them, but perhaps _WIN32_WINNT is sufficient).


Don't forget to do that, otherwise the compile won't include all the constants you need (specifically 'FSCTL_SET_SPARSE', 'cuz it's only a 2k+ control code, so it's not included by default in the headers)

If you still have trouble, throw in an #include <winioctl.h> somewhere (either in stdafx or in partfile.?, wherever you want really)
-D
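
For anyone hitting the same C2065 error, the two suggestions above could look roughly like this. It is only a sketch: the `#ifndef`/`#ifdef` guards and the `#error` check are additions for illustration, not lines from the eMule sources, and the version define has to appear before any Windows header is pulled in.

```cpp
// stdafx.h (fragment): target Windows 2000 or later so that the headers
// expose the sparse-file control codes.
// Assumption: nothing earlier has already set _WIN32_WINNT to a lower value.
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0500
#endif

// Stop the build early if the define ended up too low after all.
#if _WIN32_WINNT < 0x0500
#error "FSCTL_SET_SPARSE needs _WIN32_WINNT >= 0x0500"
#endif

#ifdef _WIN32
// In the MFC project itself, windows.h already comes in through the afx
// headers, so only the winioctl.h line would be added after them.
#include <windows.h>
#include <winioctl.h>   // FSCTL_SET_SPARSE, FSCTL_SET_ZERO_DATA
#endif
```

If the project already defines _WIN32_WINNT as something lower, the `#error` fires, which is a louder hint than the undeclared identifier message.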

This post has been edited by LoneStar: 19 February 2003 - 06:53 PM


#14 monkeydonkey

  • Newbie
  • Group: Members
  • Posts: 4
  • Joined: 17-February 03

Posted 20 February 2003 - 02:26 AM

Ok, here are the statistics.... and they don't look good:

626 MB ==> 6260 frags
652 MB ==> 7490 frags
1115 MB ==> 13040 frags
1418 MB ==> 14770 frags


This fragmentation is far higher than that of any other file I have on my drive.
For example a ~600MB file would be <500 frags.
I will get some files in the 100 and 200 MB range and see how fragmented they turn out to be.
Still, I think I've proved my point about having huge empty part files lying around - it took me several days to download all that. If I had a scheme to move files to another location once completed (as you suggested), then sparse files would allow me to download more than I could actually store on the temp drive.

Nevertheless, even with these numbers, I still want to see support for sparse files; make it disabled by default, and put a big warning about fragmentation.
The amount of code that needs to be added in order to support it is ridiculously minimal, so might as well give the users this option.
I also agree with stobo about having an option to preallocate the space; the amount of code to implement that is in the same range as the sparse file option (excluding GUI code).

Just a thought: perhaps a mixture of partial preallocation and sparse support could help reduce fragmentation, or do you think that this would lead to micromanagement that isn't worth it?

#15 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 20 February 2003 - 04:12 AM

Well, it would help to reduce the fragmentation, but the more preallocation (non-sparse) there is, the less the benefit of the sparse file to begin with :( It might be possible to find some compromise ratio, but I don't know how easy that would be (probably a lot of trial and error), and the optimal value might be different depending on the size of the file. So, eh :) Just remember to defrag from time to time. And really, although the fragmentation slows things down, in the grand scheme of things, it's not THAT much slower (and drive damage? I don't really believe in that as being a factor)

And hey, sorry about being negative before. One of the biggest things I forgot to think of was that sure, you could just use NTFS compression and get nearly the same effect without modding the program, but on slower comps like mine, you *do* notice the extra overhead from time to time. One thing that might be smart to add would be OS checks. Like I said above, it's probably not a big deal and the call should just fail on 9x (or NT with FAT), but it's pretty easy to add the "Does the volume support sparse files" query, so might be appropriate simply to avoid the error condition. But whatever, it's good the way it is! Again, nice job!
-D

#16 xrmb

  • Magnificent Member
  • Group: Members
  • Posts: 442
  • Joined: 29-September 02

Posted 24 February 2003 - 11:48 PM

one more cent from me, i would like to see sparse files, because my temp directory is on an old slow drive, and because it's old it's full all the time. But with sparse files I could download more because the missing parts won't use any bytes, and there is a slight chance that a file completes and gets moved to /dev/null.

Oh, one more cent, the ntfs compression sucks, because if emule moves the file to the incoming dir the compressed attribute is kept, that doesn't make sense for 99% of our downloads, if you remove the attribute it takes time, even more free space and more fragmentation...


just imagine 64gb large files using only 110byte real space :)

#17 LoneStar

  • Golden eMule
  • Group: Members
  • Posts: 1,005
  • Joined: 27-September 02

Posted 25 February 2003 - 12:08 AM

xrmb, on Feb 24 2003, 11:48 PM, said:

just imagine 64gb large files using only 110byte real space :)

But eDonkey/mule can't handle 64GB files :P

I hear ya though. I always had just used compression, but now have thrown this patch into the version I run. As you can see I was initially of the "What's the point" frame of mind, but it really does make sense to use instead of compression; I'm in the midst of making a converter like compact to sparsify existing part files.

Incidentally, about how I said we should probably check for sparse support before trying to set it: the easiest way I could find is GetVolumeInformation; which unfortunately needs a volume name, but you don't need to parse it from the filepath yourself; complete the filename path with GetFullPathName and then pull the mounted volume with GetVolumePathName. I don't know which of these are NT-only though.
-D
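
The GetFullPathName, GetVolumePathName, GetVolumeInformation chain described above might be put together like this. It is a sketch, not tested against eMule: `FlagsAllowSparse` and `VolumeSupportsSparse` are made-up names, the ANSI variants of the calls are used for brevity, and the constant repeats the SDK's FILE_SUPPORTS_SPARSE_FILES value so the flag test compiles on any platform. As for the open question of what is NT-only, GetVolumePathName is, as far as I know, documented for Windows 2000 and later.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Same value as FILE_SUPPORTS_SPARSE_FILES in the Windows SDK headers.
const unsigned long kSupportsSparseFiles = 0x00000040;

// True if the file-system flags reported for a volume include sparse support.
bool FlagsAllowSparse(unsigned long fsFlags)
{
    return (fsFlags & kSupportsSparseFiles) != 0;
}

#ifdef _WIN32
// Resolves the volume that holds filePath and asks it for its feature flags.
bool VolumeSupportsSparse(const char* filePath)
{
    char full[MAX_PATH];
    char* filePart = NULL;
    const DWORD n = GetFullPathNameA(filePath, MAX_PATH, full, &filePart);
    if (n == 0 || n >= MAX_PATH)
        return false;                       // failed, or the buffer was too small

    char root[MAX_PATH];
    if (!GetVolumePathNameA(full, root, MAX_PATH))
        return false;

    DWORD fsFlags = 0;
    if (!GetVolumeInformationA(root, NULL, 0, NULL, NULL, &fsFlags, NULL, 0))
        return false;
    return FlagsAllowSparse(fsFlags);
}
#endif
```

A caller would check `VolumeSupportsSparse(partfull)` before issuing FSCTL_SET_SPARSE and simply skip the call (and the log line) when it returns false.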

This post has been edited by LoneStar: 25 February 2003 - 12:12 AM


#18 SlugFiller

  • The one and only master slug
  • Group: Members
  • Posts: 6,988
  • Joined: 15-September 02

Posted 25 February 2003 - 11:38 AM

Aren't you afraid that all this "maybe I'll have more room" state of mind would cause you to end up with not enough room to finish any of your files? Even though my CheckDiskspace patch is compatible with sparse-files and compressed-files, you have to consider that in the official release eMule wouldn't give a second thought about filling your harddrive until another chunk can't be downloaded, but none of your files are complete. You need to consider that you do want your files complete at one point.
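
The worry above boils down to one comparison: the bytes every unfinished download still has to claim versus the free space on the temp drive. The sketch below shows only that arithmetic. It is not the CheckDiskspace patch, and `PartFileInfo`, `BytesStillNeeded` and `CanCompleteAll` are hypothetical names. On Windows the `allocated` figure could come from GetCompressedFileSize and the free space from GetDiskFreeSpaceEx, but that wiring is left out here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping record; the names are illustrative, not eMule's.
struct PartFileInfo {
    std::uint64_t finalSize;   // size of the completed download in bytes
    std::uint64_t allocated;   // bytes the part file occupies on disk right now
};

// Total number of bytes the listed downloads still need to allocate.
std::uint64_t BytesStillNeeded(const std::vector<PartFileInfo>& files)
{
    std::uint64_t needed = 0;
    for (const PartFileInfo& f : files) {
        if (f.finalSize > f.allocated)
            needed += f.finalSize - f.allocated;
    }
    return needed;
}

// True if every listed download could finish within freeBytes of disk space.
bool CanCompleteAll(const std::vector<PartFileInfo>& files, std::uint64_t freeBytes)
{
    return BytesStillNeeded(files) <= freeBytes;
}
```

A client with sparse part files could run this check before starting a new download and pause it when the answer is no, instead of finding out when a chunk write fails.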

#19 Dummy

  • Splendid Member
  • Group: Members
  • Posts: 116
  • Joined: 13-September 02

Posted 25 February 2003 - 12:22 PM

I support sparse files. I usually remove files about a week after they finish downloading, so with sparse files I could use that instead of NTFS compression, which I can't use because I need to encrypt the whole partition for some reason (encryption and compression in NTFS can't be turned on at the same time).