New AICH Verification - How does it work?
Posted 15 April 2010 - 08:31 PM
MD4 is known to be weak. Weak means you can create data matching a certain hash in fewer than 2^128 attempts. (IIRC that's only possible for certain specially formed hashes.) So there are two possible attack scenarios: publish false MD4 hash sets for a download, or upload data that hashes correctly but has wrong content. Just sharing a single part with bad content and a correct MD4 hash, and uploading it at high speed, would be enough to corrupt a share.
Q1: There is no protection against publishing of false MD4 hash sets at the moment, right?
So the idea is to always verify a downloaded chunk both with MD4 and with AICH, at least if we have a trusted AICH hash. Trusted means, it comes from a trusted ED2K link (from a trusted link site) or from a KAD search.
Q2: How can an AICH hash from a KAD search be trusted? Attackers can always publish files with wrong AICH hashes. Is a majority evaluation used for this? How do we know that clients publishing AICH haven't already completed downloading a fake with a right-hash-wrong-content part, and so publish a bad AICH?
The changelog says AICH part hashsets are built out of existing AICH recovery hashsets; no rehashing is done.
Q3: So is an AICH part hashset different from a recovery hashset? How and why? I can see an AICH recovery hashset is still stored in addition to the AICH part hashset in CFileIdentifier.
I am not sure whether this new feature really adds security, or whether it just opens the door to new attack scenarios, like bringing down valid shares by publishing links with bad AICH hashes.
Posted 17 April 2010 - 07:36 PM
Well, a good start would be to look into the sources. It's actually even described in the comments:
// About the AICH hash: We received a list of possible AICH hashes for this file and now have to decide what to do.
// If it weren't for backwards compatibility, the choice would be easy: each different md4+aich+size is its own result.
// But we can't do this, for the simple fact that for the next years we will always have publishers which don't report
// the AICH hash at all (which would mean having a different entry, which leads to double files in search results).
// So here is what we do for now:
// If we have exactly 1 AICH hash and more than 1/3 of the publishers reported it, we set it as the verified AICH hash for
// the file (which is as good as using an ed2k link with an AICH hash attached). If fewer publishers reported it, or if we
// have multiple AICH hashes, we ignore them and use the MD4 only.
// This isn't a perfect solution, but it makes sure not to open any new attack vectors (a wrong AICH hash means we cannot
// download the file successfully) nor to confuse users by requiring them to select an entry out of several equal-looking results.
// Once the majority of nodes in the network publishes AICH hashes, this might get reworked to make the AICH hash more sticky.
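The rule in that comment can be sketched roughly as follows (the function name and container choices here are illustrative, not eMule's actual code): trust a published AICH hash only if it is the single candidate reported and strictly more than a third of all publishers reported it; otherwise fall back to MD4 only.

```cpp
#include <map>
#include <optional>
#include <string>

// Sketch of the majority rule described in the comment above.
// reportsPerHash maps each distinct reported AICH hash to the number of
// publishers that reported it; totalPublishers includes non-reporters.
std::optional<std::string> pickTrustedAich(
    const std::map<std::string, int>& reportsPerHash, int totalPublishers)
{
    if (reportsPerHash.size() != 1)
        return std::nullopt;               // multiple candidates: ignore AICH
    const auto& [hash, count] = *reportsPerHash.begin();
    if (count * 3 > totalPublishers)       // strictly more than one third
        return hash;
    return std::nullopt;                   // too few reports: MD4 only
}
```

For example, 5 reports of a single hash out of 12 publishers would be trusted, while 4 out of 12 (exactly one third) would not.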
Your definition of weak is not quite helpful in practice. Current MD4 weaknesses allow producing two different pieces of data with the same hash only if you are able to create both pieces yourself (which means you cannot easily corrupt an existing file in the network). This may change in the future, however, if the attacks develop further - that's why we made those changes in 0.50a.
Given that we have a full identifier (including the AICH hash), the file won't finish in this case - but we will neither complete nor spread any corrupted data.
// Right now we demand that AICH (if we have one) and MD4 agree on a part hash, no matter what.
// This is the most secure way to make sure eMule will never deliver a corrupt file,
// even if one of the hash algorithms is completely broken or both are somewhat broken.
// This however doesn't mean that eMule is guaranteed to be able to finish a file in case
// one of the algorithms is completely broken, but we will worry about that if it becomes an
// issue; with the current implementation at least nothing can go horribly wrong (from a security PoV).
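A minimal sketch of that policy (the struct and function names are mine, not eMule's): a downloaded part is accepted only when its MD4 part hash matches and, if a trusted AICH part hash is available, the AICH hash matches as well.

```cpp
#include <string>

// Illustrative types, not eMule's API. An empty aich string stands for
// "no trusted AICH part hash available".
struct PartHashes {
    std::string md4;   // MD4 part hash from the ED2K hashset
    std::string aich;  // trusted AICH part hash, or empty
};

// Accept a part only if MD4 matches and, when we have a trusted AICH
// part hash, AICH matches too - so both available algorithms must agree.
bool acceptPart(const PartHashes& expected,
                const std::string& computedMd4,
                const std::string& computedAich)
{
    if (computedMd4 != expected.md4)
        return false;   // MD4 mismatch: always reject
    if (!expected.aich.empty() && computedAich != expected.aich)
        return false;   // trusted AICH disagrees: reject
    return true;
}
```

Note the trade-off stated in the comment: a wrong trusted AICH hash makes every download of that part fail, so the file cannot finish, but no corrupt data is ever accepted or spread.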
A recovery hashset contains all hashes, because they are needed to recover from corruption. A part hashset contains only the part hashes. Since AICH is a tree hash, you can always create the part hashset out of the recovery one.
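To illustrate the tree-hash point (a simplified bottom-up binary tree with a toy string combine step standing in for SHA-1; the real AICH tree is built somewhat differently, so treat this purely as a sketch): the part hashes are inner nodes of the same tree, so they can be computed from the block hashes the recovery hashset already contains, without rehashing any file data.

```cpp
#include <string>
#include <vector>

// Toy combine step standing in for hashing two child nodes together.
std::string combine(const std::string& l, const std::string& r) {
    return "H(" + l + "+" + r + ")";
}

// Hash one tree level up to the next; an odd last node is promoted as-is.
std::vector<std::string> nextLevel(const std::vector<std::string>& level) {
    std::vector<std::string> up;
    for (std::size_t i = 0; i < level.size(); i += 2) {
        if (i + 1 < level.size())
            up.push_back(combine(level[i], level[i + 1]));
        else
            up.push_back(level[i]);
    }
    return up;
}

// Given the block (leaf) hashes and the number of blocks per part, hash
// upwards until one node per part remains: that level is the part hashset.
std::vector<std::string> partHashes(std::vector<std::string> nodes,
                                    std::size_t blocksPerPart) {
    while (blocksPerPart > 1) {
        nodes = nextLevel(nodes);
        blocksPerPart = (blocksPerPart + 1) / 2;
    }
    return nodes;
}
```

For example, with block hashes {a, b, c, d} and two blocks per part, one upward step yields the two part hashes H(a+b) and H(c+d) - no file data is touched.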
Posted 17 April 2010 - 08:31 PM
Posted 19 April 2010 - 07:17 PM
So at the moment a Kad search is only helpful to give AICH usage a boost. If AICH is ignored whenever the reported hashes differ, an attacker could probably prevent its usage by publishing a false hash.
Ah, it's the same tree: a recovery hashset is only a thin branch reaching up to the leaves, while the part hashset is the full tree, but cropped at the twigs (== the parts).
Posted 19 April 2010 - 08:13 PM
Yes. But that can't be helped. As soon as the majority of all nodes publishes AICH, we might change this. Of course what we would really like is for all nodes to publish complete file identifiers, so we can create one item per unique identifier. But this would take at least 1-2 years even if we intended to force it (by breaking backward compatibility).
It is not a biggie, however, as long as MD4 doesn't get broken further.
Unlikely, for compatibility reasons. It would also be quite some work, and it doesn't have many advantages. It is possible that at some point we will trust AICH more than MD4 (but only if it can't be helped otherwise, as both algorithms together are much stronger than each one alone), but even in that case MD4 will stay the main identifier.