Official eMule-Board: Stop Rehashing Of Files With Foreign-language Name - Official eMule-Board




Stop Rehashing Of Files With Foreign-language Name

#1 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 15 August 2008 - 09:16 AM

Here is a patch that addresses this problem.

This part is not required, but it removes redundant code.
If you make the same change in friend.cpp, partfile.cpp and serverlist.cpp,
then WriteOptED2KUTF8Tag() can be safely removed from the source code.

KnownFile.cpp
bool CKnownFile::WriteToFile(CFileDataIO* file)
{
...
...
	/* Borschtsch - we always save file name using UTF8
	 * Helps to eliminate re-hashing of files with
	 * the "wrong" symbols in their names
	if (WriteOptED2KUTF8Tag(file, GetFileName(), FT_FILENAME))
		uTagCount++;
	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file);
	*/
	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file, utf8strOptBOM);
...
}
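At the byte level, the change above means the file name is stored as UTF-8 preceded by a byte-order mark. The sketch below shows that encoding for a single name. EncodeNameUtf8Bom and Utf16ToUtf8Bmp are hypothetical helpers, not eMule functions, and the sketch assumes that utf8strOptBOM means the standard 3-byte BOM (EF BB BF) followed by UTF-8, and that the name contains no surrogate pairs.

```cpp
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Assumption: BMP-only input (no surrogate pairs), to keep the sketch short.
std::string Utf16ToUtf8Bmp(const std::u16string& s)
{
    std::string out;
    for (char16_t c : s)
    {
        unsigned cp = c;
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                      // 1 byte
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));        // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xE0 | (cp >> 12));       // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical model of a string tag value written "UTF-8 with BOM".
std::string EncodeNameUtf8Bom(const std::u16string& name)
{
    return std::string("\xEF\xBB\xBF") + Utf16ToUtf8Bmp(name);
}
```

For example, EncodeNameUtf8Bom(u"caf\u00e9") yields 8 bytes: the 3-byte BOM, "caf", and C3 A9 for the accented e. Whether the BOM form is actually chosen may presumably still depend on NeedUTF8String(), which is what the second part of the patch addresses.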


This part is required.
It removes the faulty condition if (*pwsz >= 0x100U) by saving all text tags as Unicode in UTF-8. It also saves some CPU cycles.

StringConversion.h
/* Borschtsch
__inline bool NeedUTF8String(LPCWSTR pwsz)
*/
__inline bool NeedUTF8String(LPCWSTR pwsz)
{
	return true;
	/* Borschtsch - we always use Unicode
	while (*pwsz != L'\0')
	{
		if (*pwsz >= 0x100U)
			return true;
		pwsz++;
	}
	return false;
	*/
}
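The gap in the original check is easiest to see with a name such as "café.avi": the accented e is U+00E9, which is below 0x100, so the unpatched function reports that no UTF-8 is needed and the name takes the non-UTF-8 path. The sketch below reproduces both versions in portable C++. The names NeedUTF8StringOriginal and NeedUTF8StringPatched are hypothetical, and char16_t stands in for the Windows wchar_t.

```cpp
// Unpatched logic from StringConversion.h, rewritten with char16_t so the
// sketch compiles outside Windows (on Windows, wchar_t is a UTF-16 unit too).
bool NeedUTF8StringOriginal(const char16_t* pwsz)
{
    while (*pwsz != u'\0')
    {
        // Only characters outside Latin-1 (>= U+0100) trigger UTF-8.
        if (*pwsz >= 0x100U)
            return true;
        pwsz++;
    }
    return false;
}

// Patched behavior: every string is treated as needing UTF-8.
bool NeedUTF8StringPatched(const char16_t* /*pwsz*/)
{
    return true;
}
```

So NeedUTF8StringOriginal(u"caf\u00e9.avi") returns false, while a Cyrillic name such as u"\u0444\u0430\u0439\u043b.avi" returns true. Latin-1 characters therefore fall through to the narrow encoding, which can lose them if the ANSI code page in use cannot represent them.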

This post has been edited by Borschtsch: 22 August 2008 - 07:56 AM


#2 Stulle

  • [Enter Mod] Dev
  • Group: Members
  • Posts: 5804
  • Joined: 07-April 04

Posted 15 August 2008 - 10:12 PM

Does that work on all supported file systems and operating systems, like Win95 and FAT32?
I am an emule-web.de member and fan!

[Imagine there was a sarcasm meter right here!]

No, there will not be a new version of my mods. No, I do not want your PM. No, I am certain, use the board and quit sending PMs. No, I am not kidding, there will not be a new version of my mods just because of YOU asking for it!

#3 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 17 August 2008 - 12:45 PM

Stulle, on Aug 16 2008, 04:12 AM, said:

Does that work on all supported file systems and operating systems, like Win95 and FAT32?

This patch changes nothing except the encoding of the FT_FILENAME tag stored in known.met, so it should not depend on the operating system or the file system.

If, on the first run, eMule finds a file whose name contains "incorrect" characters, it saves the hash information in known.met, but the file name is stored using ASCII encoding, so the information about those characters is lost. On the next run, when eMule finds that file again, comparing the name of the existing file with the name stored in known.met fails, so eMule considers the file unknown and hashes it again.
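The failure described above can be modeled in a few lines: convert the name to a narrow string, read it back, and compare. This is a simplified sketch, not eMule code. ToAnsiLossy, FromAnsi and WouldRehash are hypothetical, the narrow encoding is assumed to represent ASCII only (everything else becomes '?'), and only the file-name part of the known.met lookup is modeled.

```cpp
#include <string>

// Hypothetical stand-in for converting a name to a narrow ANSI string.
// Assumption: the code page in use can represent only ASCII, so every other
// character becomes '?', as a lossy wide-to-narrow conversion would do.
std::string ToAnsiLossy(const std::u16string& name)
{
    std::string out;
    for (char16_t c : name)
        out += (c < 0x80) ? static_cast<char>(c) : '?';
    return out;
}

// Reads the narrow name back (ASCII only in this model).
std::u16string FromAnsi(const std::string& s)
{
    std::u16string out;
    for (char c : s)
        out += static_cast<char16_t>(static_cast<unsigned char>(c));
    return out;
}

// Models only the file-name comparison: if the name read back from
// known.met differs from the name on disk, the file is treated as unknown
// and hashed again.
bool WouldRehash(const std::u16string& nameOnDisk)
{
    const std::u16string stored = FromAnsi(ToAnsiLossy(nameOnDisk));
    return stored != nameOnDisk;
}
```

A plain ASCII name survives the round trip, while u"pel\u00edcula.avi" comes back as "pel?cula.avi", so WouldRehash returns true for it on every run.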

I also found that this patch needs to come with a change to this function. Sorry that I forgot it.
StringConversion.h
/* Borschtsch
__inline bool NeedUTF8String(LPCWSTR pwsz)
*/
__inline bool NeedUTF8String(LPCWSTR pwsz)
{
	return true;
	/* Borschtsch - we always use Unicode
	while (*pwsz != L'\0')
	{
		if (*pwsz >= 0x100U)
			return true;
		pwsz++;
	}
	return false;
	*/
}

This was done because it doesn't matter if we save all local text tags in Unicode; it is not a waste. Over a long period, this change has had no side effects for a lot of people.

I also thought that we could probably use the utf8strRaw encoding flag instead of utf8strOptBOM, but I am not sure about this.
What I do know is that this patch has helped a lot of people, including me :)

This post has been edited by Borschtsch: 17 August 2008 - 12:59 PM


#4 Stulle

  • [Enter Mod] Dev
  • Group: Members
  • Posts: 5804
  • Joined: 07-April 04

Posted 19 August 2008 - 05:51 AM

Well, right, but if we are not able to save Unicode filenames (and that was what I was pointing to with my question), there is no point in enforcing Unicode filenames at all times for the known.met file...

#5 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 19 August 2008 - 03:56 PM

Stulle, on Aug 19 2008, 11:51 AM, said:

Well, right, but if we are not able to save Unicode filenames (and that was what I was pointing to with my question), there is no point in enforcing Unicode filenames at all times for the known.met file...

How could we end up in that situation? Is it mandatory for a system to support Unicode, through built-in means or libraries, in order to run eMule?
Are there any restrictions in Win95 or FAT32 with LFN that would prevent you from having a filename with characters like "a with an accent, n with a tilde, c with a cedilla, etc."?
I didn't find one here or here. Filenames are encoded in UTF-16.

This post has been edited by Borschtsch: 19 August 2008 - 04:55 PM


#6 CiccioBastardo

  • Doomsday Executor
  • Group: Italian Moderators
  • Posts: 5541
  • Joined: 22-November 03

Posted 19 August 2008 - 08:06 PM

Just for the record, I have integrated this patch into the latest Bastard version.
It doesn't hurt even on Win95+FAT; at most you "waste" a few bytes in the known.met file, so IMHO there's no point in arguing whether it is valid or not.
eMule was designed to be Unicode-compatible. This part was surely never updated for that, which caused problems on systems with real Unicode support. Now it works correctly.
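The "few bytes" can be estimated with simple arithmetic. Assuming every FT_FILENAME value carries a 3-byte BOM and the ANSI alternative uses one byte per character, a pure ASCII name costs exactly 3 extra bytes, and each character between U+0080 and U+07FF adds one more. The helpers below are hypothetical and assume BMP-only names.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-8 bytes needed for a UTF-16 string.
// Assumption: BMP-only input (no surrogate pairs).
std::size_t Utf8Length(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t c : s)
        n += (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
    return n;
}

// Extra bytes per name when it is stored as BOM + UTF-8 instead of a
// single-byte ANSI code page (one byte per character, assumed).
std::size_t ExtraBytesVsAnsi(const std::u16string& name)
{
    const std::size_t kBomSize = 3;
    return kBomSize + Utf8Length(name) - name.size();
}
```

For "movie.avi" the overhead is 3 bytes; for u"pel\u00edcula.avi" it is 4 (3 for the BOM plus 1 for the accented i).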

Thanks, Borschtsch.
The problem is not the client, it's the user

#7 PacoBell

  • Professional Lurker ¬_¬ (so kyoot!)
  • Group: Moderator
  • Posts: 7296
  • Joined: 04-February 03

Posted 19 August 2008 - 11:29 PM

Borschtsch, on Aug 19 2008, 08:56 AM, said:

Are there any restrictions in Win95 or FAT32 with LFN that would prevent you from having a filename with characters like "a with an accent, n with a tilde, c with a cedilla, etc."?
I didn't find one here or here. Filenames are encoded in UTF-16.

Quote

What MSLU Does and Does Not Address
MSLU was designed to be a translation layer for Windows NT-based Unicode APIs on Windows 9x. The layer, however, is not a complete rewrite of Windows 9x, nor is it some type of Windows NT emulator for the Windows 9x platform. It does not provide support for Unicode-only scripts like Devanagari or Georgian, or for any of the new supplementary characters that have been added to Unicode (specifically UTF-16) as surrogate pairs. It also cannot provide extended international support beyond the platform that it runs on, since it is relying on the operating system for any particular international functionality that developers may want to provide. For example, MSLU does not provide updated versions of the cp_*.nls files that contain support for code pages, nor does it add support on Windows 95 for GetLocaleInfo LCTypes for which the operating system has no specific information.

Sed quis custodiet ipsos custodes
Math is delicious!
MmMm! Mauna Loa Milk Chocolate Toffee Macadamias are little drops of Heaven ^_^
Si vis pacem, para bellum DIE SPAMMERS DIE!

#8 Stulle

  • [Enter Mod] Dev
  • Group: Members
  • Posts: 5804
  • Joined: 07-April 04

Posted 20 August 2008 - 02:09 AM

If it works, it's fine. I was merely asking whether it works, because that would be good to know before implementing it in the official eMule, which still keeps Win95 support, and I think dropping that support at this point is unwanted...

#9 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 20 August 2008 - 04:54 AM

CiccioBastardo, on Aug 20 2008, 02:06 AM, said:

Just for the record, I have integrated this patch into the latest Bastard version.
It doesn't hurt even on Win95+FAT; at most you "waste" a few bytes in the known.met file, so IMHO there's no point in arguing whether it is valid or not.
eMule was designed to be Unicode-compatible. This part was surely never updated for that, which caused problems on systems with real Unicode support. Now it works correctly.

Thanks, Borschtsch.

Thank you.

There is one thing I have been thinking about. NeedUTF8String() is called by WriteOptED2KUTF8Tag(), and not only when saving known.met:
it is also called when we save the friend list, part files, or the server list.

partfile.cpp
SavePartFile()
{
...
		if (WriteOptED2KUTF8Tag(&file, GetFileName(), FT_FILENAME))
			uTagCount++;
		CTag nametag(FT_FILENAME, GetFileName());
		nametag.WriteTagToFile(&file);
...
}

Should we also change this part? It seems reasonable to do so.
SavePartFile()
{
...
		if (WriteOptED2KUTF8Tag(&file, GetFileName(), FT_FILENAME))
			uTagCount++;
		CTag nametag(FT_FILENAME, GetFileName());
		/*
		nametag.WriteTagToFile(&file);
		*/
		nametag.WriteTagToFile(&file, utf8strOptBOM);
...
}


But...
I investigated further.

Suppose we had not made any changes to the code:
	if (WriteOptED2KUTF8Tag(file, GetFileName(), FT_FILENAME))
		uTagCount++;
	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file);
	uTagCount++;


If NeedUTF8String() returns true, we can mentally inline WriteOptED2KUTF8Tag(), and the code effectively looks like this:
...
	CTag tag(FT_FILENAME, GetFileName());
	tag.WriteTagToFile(file, utf8strOptBOM);
	uTagCount++;

	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file);
	uTagCount++;
...


So FT_FILENAME is written to known.met twice, but that does no harm at the moment, because when LoadTagsFromFile() is called, this part prevents the UTF-8 value (which is read first) from being discarded:
			case FT_FILENAME:{
				ASSERT( newtag->IsStr() );
				if (newtag->IsStr()){
					if (GetFileName().IsEmpty())		// <<-- only the first FT_FILENAME tag read is used
						SetFileName(newtag->GetStr());	// <<--
				}
				delete newtag;
				break;
			}
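The write-side expansion and the load-side check above can be combined into one small model: the unpatched writer puts a UTF-8 FT_FILENAME tag first and an ANSI one second, and the loader keeps whichever name it sees first. Everything here is hypothetical and simplified (NameTag, LossyAnsiRoundTrip, WriteNameTags and LoadFileName are not eMule types or functions, and the lossy conversion assumes an ASCII-only code page); it only illustrates why the tag order matters.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one FT_FILENAME tag after it has been decoded.
struct NameTag
{
    std::u16string value;  // decoded file name
    bool utf8;             // true if the tag was written as UTF-8
};

// Stand-in for a lossy ANSI round trip: non-ASCII characters become '?'.
std::u16string LossyAnsiRoundTrip(const std::u16string& s)
{
    std::u16string out;
    for (char16_t c : s)
        out += (c < 0x80) ? c : u'?';
    return out;
}

// Unpatched write path when NeedUTF8String() is true: the UTF-8 tag from
// WriteOptED2KUTF8Tag() comes first, then the plain (ANSI) FT_FILENAME tag.
std::vector<NameTag> WriteNameTags(const std::u16string& name)
{
    return {{name, true}, {LossyAnsiRoundTrip(name), false}};
}

// Mirrors the FT_FILENAME case in LoadTagsFromFile(): the name is only set
// while it is still empty, so the first tag read wins.
std::u16string LoadFileName(const std::vector<NameTag>& tags)
{
    std::u16string fileName;
    for (const NameTag& tag : tags)
        if (fileName.empty())
            fileName = tag.value;
    return fileName;
}
```

With a Cyrillic name such as u"\u0444\u0430\u0439\u043b.avi", LoadFileName returns the original name because the UTF-8 tag comes first; if the two tags were swapped, the '?'-mangled ANSI name would win.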


Based on this, the KnownFile.cpp part of the patch could look like this:
KnownFile.cpp
bool CKnownFile::WriteToFile(CFileDataIO* file)
{
...
...
	/* Borschtsch - we always save file name using UTF8
	 * Helps to eliminate re-hashing of files with
	 * the "wrong" symbols in their names
	if (WriteOptED2KUTF8Tag(file, GetFileName(), FT_FILENAME))
		uTagCount++;
	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file);
	*/
	CTag nametag(FT_FILENAME, GetFileName());
	nametag.WriteTagToFile(file, utf8strOptBOM);
...
}


But now I am concerned about something else.
LoadTagsFromFile() reads all tags using CTag::CTag(CFileDataIO* data, bool bOptUTF8), and bOptUTF8 is false in that code.
The patch works, with no complaints for more than a year, but then how are UTF-8 values read from known.met? Do we want to set bOptUTF8 to true?

Any comments?

#10 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 21 August 2008 - 02:00 AM

Quote

But now I am concerned about something else.
LoadTagsFromFile() reads all tags using CTag::CTag(CFileDataIO* data, bool bOptUTF8), and bOptUTF8 is false in that code.
The patch works, with no complaints for more than a year, but then how are UTF-8 values read from known.met? Do we want to set bOptUTF8 to true?

Any comments?

I just ran a test with filenames that contain tilde characters.

The function CString CFileDataIO::ReadString(bool bOptUTF8, UINT uRawSize) checks for the leading UTF-8 marker characters (the byte-order mark), so it reads the Unicode string correctly.
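A simplified model of that read-side check: if the raw tag bytes start with the UTF-8 BOM, the rest is treated as UTF-8 even when bOptUTF8 is false; otherwise the caller's flag decides. ClassifyRawString, NameEncoding and DecodedName are hypothetical names, and eMule's actual ReadString may differ in details (for instance, in how it validates or falls back).

```cpp
#include <string>

enum class NameEncoding { Utf8, Ansi };

struct DecodedName
{
    NameEncoding encoding;
    std::string payload;   // bytes after the BOM, if one was present
};

// Hypothetical sketch of the BOM check: a leading EF BB BF selects UTF-8
// regardless of bOptUTF8; without a BOM, bOptUTF8 picks UTF-8 or ANSI.
DecodedName ClassifyRawString(const std::string& raw, bool bOptUTF8)
{
    static const std::string kBom = "\xEF\xBB\xBF";
    if (raw.compare(0, kBom.size(), kBom) == 0)
        return {NameEncoding::Utf8, raw.substr(kBom.size())};
    return {bOptUTF8 ? NameEncoding::Utf8 : NameEncoding::Ansi, raw};
}
```

This is why a tag written with utf8strOptBOM can be read back correctly even though LoadTagsFromFile() passes bOptUTF8 = false: the BOM itself selects the UTF-8 path.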
I will update the first post.

Thank you everybody.

#11 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 22 August 2008 - 04:15 AM

I investigated further. Take a look here.
The solution at the link above suits the official eMule source code perfectly, but if you want to get the most out of it, refer to the first post of this thread.

This post has been edited by Borschtsch: 22 August 2008 - 04:16 AM


#12 Enig123

  • Golden eMule
  • Group: Members
  • Posts: 553
  • Joined: 22-November 04

Posted 25 August 2008 - 02:27 AM

In your case, even the NeedUTF8String() function can be removed by simply replacing every NeedUTF8String() call with true.

#13 Borschtsch

  • Advanced Member
  • Group: Members
  • Posts: 77
  • Joined: 25-November 04

Posted 26 August 2008 - 03:46 AM

Enig123, on Aug 25 2008, 08:27 AM, said:

In your case, even the NeedUTF8String() function can be removed by simply replacing every NeedUTF8String() call with true.

Right, it can be removed along with some other redundant code, such as every call to that function and any code that relies on the condition if (NeedUTF8String() == false).
The current patch doesn't require many changes to the source code.

This post has been edited by Borschtsch: 26 August 2008 - 08:55 AM
