Finally I think the cause of my heap corruption has been figured out. When a downloading file is completing, if we received OP_FILESTATUS, which obviously will trigger SwapToAnotherFile() action, which may have thread conflict with another SwapToAnotherFile() when hashing the file is done and CPartFile::RemoveAllSources() and ended up with heap corruption.
I have two different crashes been caught after download file completed, and DoSwap() leave with 'Unsync between parfile->srclist and client otherfiles list' logs before both crashes.
The fix from my side is as following:
case OP_FILESTATUS:
{
if (thePrefs.GetDebugClientTCPLevel() > 0)
DebugRecv("OP_FileStatus", client, (size >= 16) ? packet : NULL);
theStats.AddDownDataOverheadFileRequest(size);
CSafeMemFile data(packet, size);
uchar cfilehash[16];
data.ReadHash16(cfilehash);
CPartFile* file = theApp.downloadqueue->GetFileByID(cfilehash);
if (file == NULL)
client->CheckFailedFileIdReqs(cfilehash);
//Enig123::fixed a rare crash bug around AFAF handling, after a file is completed while still received OP_FILESTATUS
else if (file->GetStatus() == PS_COMPLETE || file->GetStatus() == PS_COMPLETING)
{
AddDebugLogLine(false, _T("client send FileStatus during the file is being completed, don't process: client '%s' file '%s'"), client->DbgGetClientInfo(), file->GetFileName());
break;
}
//Enig123::end
client->ProcessFileStatus(false, &data, file);
break;
}
After applying this patch, the crash seems never happened again. I dig a little deeper, and found the extra OP_FILESTATUS was asked by introducing NetFinity's Delayed NNP, which tried to ask for the latest partstatus from a source became NNP.
Since there's buggy clients out there sending OP_FILESTATUS in no time, the patch seems to do no harm, and prevented potential heap corruption.
Need developer's opinions on this.
Edit: I thought it was caused by thread conflict, later after I tried to put the RemoveAllSources invokes to the main thread, eMule still crashes, which made me think that maybe two consecutive invoke of SwapToAnotherFile() actions is enough to cause the heap corruption I spotted in this case.
Edit2: The trying to ask for the latest part status before put the source to real NNP would merely made the heap corruption emerge earlier. Even if I commented that that part of code, the "client send FileStatus during the file is being completed" still in the log sometimes.
This post has been edited by Enig123: 02 March 2015 - 03:41 AM