Official eMule-Board: Aich Building The Tree - Official eMule-Board

Jump to content


Page 1 of 1

Aich Building The Tree

#1 User is offline   slivovice 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 12
  • Joined: 08-February 22

Posted 31 March 2023 - 06:39 PM

I dont understand how the tree is built up to the root.
SHAHashSet.h has an example:

Example tree:
	FileSize: 19506000 Bytes = 18,6 MB

                     X (18,6)                      MasterHash
                    /        \
                   X (18,55)   \
                /    \           \
                   X(9,28)  x(9,28)         X (0,05MB)               PartHashes
              /     \      /       \          \
         X(4,75)  X(4,53) X(4,53)  X(4,75)     \
			[...............]
X(180KB)   X(180KB)  [...] X(140KB) | X(180KB) X(180KB [...]			   BlockHashes
                                    v
                    Border between first and second Part (9.28MB)


I thought the hashtree is build up by pair of each 180kb Block, then by pair from the resulting level,... until the top.
Remaining hashes when the number in each level is not even are pushed up to the next level,... I implemented this but it works only for files with size of 180kb*4. With bigger size it fails.


I dont understand where the sizes of 4.75 and 4.53 come from and why is this 4.75 hash on the left not joined to a Part 9.28.

This post has been edited by slivovice: 01 April 2023 - 12:59 PM

0

#2 User is offline   megaT 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 09-May 20

Posted 20 April 2023 - 09:13 PM

Well I had some issue's with understanding this AICH tree too, but I implemented it too.
Take the original code as reference, I rewrote it completely because the original code isn't so nice written and also not good documented (barely readable, at least if we consider todays standards).

Basically the AICH is an B-Tree which is used to calculate SHA checksums, each node contains one of these checksums for every block.
The most important ed2k/emule data sizes are the blocks with 180kb and the so-called parts with 9.28Mb.
One part consists of 53 blocks.

Since the complete tree has to cover a whole file it need's to be divided for each level, the last part's are the leaf's, which contain the SHA checksums.
The node's are there like in any b-tree in order to search quickly thru the tree until you have the correct node for the part of the file you want to examine (check) the SHA or generate one.
In this example this all starts with an file size of 18.60MB, which has to be divided into part's.

18.60 / 9.28 = 2.0043...
So you have two full parts, the next level represents these with 18.55. Why? Because two complete parts fit into this level, the rest is 0.05MB.
Then you follow the next level which is 18.55... divide again:
18.55 / 9.28 = 1.9989...
So this explains now why 4.75 is not joined with 9.28 because 9.28 + 4.75 are not two complete part's otherwise you'd need to reach 18.56MB for it.
It's bit tricky why 4.75 is on the very left... but it has to do with distributing the blocks more or less evenly (there is a calculation for this in the original code, basically I just copied it - just wrote it a bit nicer)

You have to do this whenever you want to find the right node, if you cannot divide into less parts you have to divide them evenly related to the block count... you end up with smaller chunks.
In the end you have a single 180K node where you can calculate the SHA for.
0

#3 User is offline   slivovice 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 12
  • Joined: 08-February 22

Posted 03 May 2023 - 03:15 PM

thank you for your answer!

18.55 are two parts (2x9728000 = 19456000).
18.55/9.28 must be 2 with no remainder.



The ASCII Graphics in the source is not clear bec. it uses tabs mixed with spaces

Maybe it is meant like this:
                          X(18,6)                         MasterHash
                         /      \___.
                        /            \
                     X(18,55)         \
                    /        \         \
               X(9,28)        x(9,28)   X(0,05MB)         PartHashes
              /    \           /     \          \
        X(4,75)     X(4,53)  |X(4,53) X(4,75)    \
                                                  \
                       [...............]           \
                                                    \
X(180KB)X(180KB)[...]X(140KB)|X(180KB)X(180KB)[...]|0,05MB       BlockHashes
                             |
                             v
                        Border between first and second Part (9.28MB)


How do you get the values 4.75 and 4.53?
The levels between the lowest (52 x 180k + 140k) up to the level with parthashes (9.28) is nowhere explained, they are a black box.
0

#4 User is offline   megaT 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 09-May 20

Posted 03 May 2023 - 07:06 PM

Hi,

it's not a blackbox - you have to understand that the tree is growing from top-left to bottom-left.. so the left leaf's are favoured, the algorithm
tries to fulfill these parts of the tree at first.

The 140K or 4.53 are both reminders of what's left in the level's.

180K is no blackbox, this is the block size, 140K < 180K - so that's a reminder to what's left in that part of the tree.

Edit:

Quote

18.55/9.28 must be 2 with no remainder.

Yup you're right, sorry - this are 2 parts and the diagram is showing how to split those two. (eg. 4.75 and 4.53)

if you want to know how the algorithm is working you need to read the source code - the diagram isn't sufficient.
There is the base size which is either the part or the block size if they file is too small.
Have you checked how the block count and the find function works?

This post has been edited by megaT: 03 May 2023 - 07:32 PM

0

#5 User is offline   slivovice 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 12
  • Joined: 08-February 22

Posted 04 May 2023 - 07:50 AM

View PostmegaT, on 03 May 2023 - 07:06 PM, said:

Hi,

it's not a blackbox - you have to understand that the tree is growing from top-left to bottom-left.. so the left leaf's are favoured, the algorithm
tries to fulfill these parts of the tree at first.

Yes I know. Theres the flag that decides what goes to left or right. Beginning from top is left.

Quote

The 140K or 4.53 are both reminders of what's left in the level's.

Yes I know that 140k is a remainder of a whole part: 52*180k + 140k.

Quote

180K is no blackbox, this is the block size, 140K < 180K - so that's a reminder to what's left in that part of the tree.

With blackbox I mean how do I get 4.53/4.75 from a 9.28 node like in the example.
I never get this values, that was where the black box begun for me.

But in the meantime I found the error in my code…

Quote

if you want to know how the algorithm is working you need to read the source code - the diagram isn't sufficient.
There is the base size which is either the part or the block size if they file is too small.
Have you checked how the block count and the find function works?

Yes, I even implemented find more or less 1:1 and it works until I reach a 9.28 Node, from there I dont
get the 4.75,4.53 values.

The main part for building the tree is this code fragment. It appears in several places,
ASSERTs deleted here.

uint64 nBlocks = m_nDataSize / GetBaseSize() + static_cast<uint64>(m_nDataSize % GetBaseSize() != 0);
uint64 nLeft = (nBlocks + static_cast<unsigned>(m_bIsLeftBranch)) / 2 * GetBaseSize();
uint64 nRight = m_nDataSize - nLeft;

 if (m_pLeftTree == NULL)
   m_pLeftTree = new CAICHHashTree(nLeft, true, (nLeft <= PARTSIZE) ? EMBLOCKSIZE : PARTSIZE);

 if (m_pRightTree == NULL)
   m_pRightTree = new CAICHHashTree(nRight, false, (nRight <= PARTSIZE) ? EMBLOCKSIZE : PARTSIZE);

  return m_pLeftTree->LoadLowestLevelHashes(fileInput) &&  m_pRightTree->LoadLowestLevelHashes(fileInput);


I rechecked my code and found out that my GetBaseSize was wrong. The emule code uses a flag to save space
saving the two values, I didnt do this but use the simple return from the original GetBaseSize that does
not do the check. The check is done in the original SetBaseSize. So my GetBaseSize always returns 9.28 - ARGH!


Thank you for your answers.
0

#6 User is offline   megaT 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 09-May 20

Posted 04 May 2023 - 08:56 AM

Heh, yeah happens - to me it was complicated to understand that old code.
I've done a type/class called AICH_Hash_tree which basically represents a single node, nothing else.

In my code, base size is determined like this:
size_t getBaseSize( ) const noexcept {
 return (m_BaseSize) ? _blockSize : _partSize;
}


m_BaseSize is just a bool, which is initialized by an constructor:
m_BaseSize( (_datasiz <= _partSize) )


I've an inspect function which validates and prints the whole tree and tested it with several files,
along (in the end) with stuff I received via ed2k.
0

#7 User is offline   slivovice 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 12
  • Joined: 08-February 22

Posted 06 May 2023 - 01:38 PM

I found an alternative implementation in NeoLoader:
Look in the source to FileHashTree.cpp

Start at FileHashTree::Calculate(QIODevice *pFile) to trace what methods are called for hashing (AICH) a complete file.

This is much more readable than the emule source.
0

#8 User is offline   megaT 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 09-May 20

Posted 22 May 2023 - 09:07 AM

Yes you're correct, well since you asked me about my design of my application.
One choice I did make was that my code doesn't base on Qt.
I think Qt is a valid choice, a very good framework for C++ however sometimes you get into compatibility troubles
and license wise it might not be so good (since Qt6..).

So I did everything in Standard C++ without taking any big framework.
I only used Qt5 for a (testing?) GUI, but the code base isn't really that big.
0

  • Member Options

Page 1 of 1

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users