
GSoC 2016 Week 7 - Indexation and Data Persistence over OpenDHT



Hi,

As mentioned last week, this week I've started working on the multidimensional key for the Pht.
I've managed to build the first basic pieces needed for this.

First, we need to convert a Prefix (which is a combination of a vector of uint8_t and the size of the prefix) into a "double" prefix, in which each piece of data takes two bits instead of one (either 0 or 1): the first bit indicates whether the second bit (which contains the data) is known or not.

This means I need to do some "clever" binary operations to achieve that.
I've finished this first part (converting from a normal prefix to a prefix with unknown values, and vice versa); a simplified sketch is below.
I've also added padding for those prefixes and a method to undo the padding.
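
To illustrate the idea, here is a minimal, self-contained sketch (this is not the actual Pht code; the Prefix layout is simplified and the helpers are written just for this example):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Simplified stand-in for the Pht Prefix: a bit string stored in
    // bytes, plus the number of valid bits.
    struct Prefix {
        std::vector<uint8_t> content;
        size_t size;
    };

    // Read bit i (bit 0 is the most significant bit of the first byte).
    bool getBit(const Prefix& p, size_t i) {
        return (p.content[i / 8] >> (7 - i % 8)) & 1;
    }

    // Set bit i to v, growing the byte vector as needed.
    void setBit(Prefix& p, size_t i, bool v) {
        if (p.content.size() <= i / 8)
            p.content.resize(i / 8 + 1, 0);
        if (v)
            p.content[i / 8] |= uint8_t(1 << (7 - i % 8));
    }

    // Each data bit b becomes the pair (1, b): the first bit flags the
    // data bit as "known". Padding then appends (0, x) pairs for the
    // unknown positions.
    Prefix toDoublePrefix(const Prefix& in) {
        Prefix out { {}, in.size * 2 };
        for (size_t i = 0; i < in.size; ++i) {
            setBit(out, 2 * i, true);              // "known" flag
            setBit(out, 2 * i + 1, getBit(in, i)); // the data bit
        }
        return out;
    }

    // Reverse operation: keep the data bit of each "known" pair and
    // stop at the first "unknown" pair, which is padding.
    Prefix fromDoublePrefix(const Prefix& in) {
        Prefix out { {}, 0 };
        for (size_t i = 0; 2 * i + 1 < in.size; ++i) {
            if (!getBit(in, 2 * i))
                break; // unknown bit: padding starts here
            setBit(out, out.size, getBit(in, 2 * i + 1));
            ++out.size;
        }
        return out;
    }

    int main() {
        Prefix p { {0b10110000}, 4 };  // bits: 1 0 1 1
        Prefix d = toDoublePrefix(p);  // bits: 11 01 11 11
        Prefix back = fromDoublePrefix(d);
        std::printf("%zu -> %zu -> %zu bits\n", p.size, d.size, back.size);
    }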

Here is what I did:
From prefix with unknowns to normal prefix: https://ideone.com/8qeiZY
From normal prefix to prefix with unknowns: https://ideone.com/eiCZNS (also includes unpadding)

I've also written an article for https://blog.savoirfairelinux.com/en/ which should come out this week (for information, it is not a really technical article).

Anyway, see you next week.

Regards,


REYNAUD Nicolas


From: kaldoran@live.fr
To: debian-outreach@lists.debian.org
Subject: GSoC 2016 Week 6 - Indexation and Data Persistence over OpenDHT
Date: Tue, 5 Jul 2016 19:04:49 +0200

Hi,

Last week I was talking about a new solution for the merge. (As a reminder: I work on node splitting when there is too much data in one node.)
The splitting works perfectly and I'm good to go.

For the merge, I've tried the second solution I mentioned last week, which was to let the node expire.
But here comes another problem: when nodes at the end of the trie expire, all the nodes above will expire too ... and then the data that was at the end of the trie becomes unreachable.
So I need to come up with another solution ...

For the moment, since I have none, I will just finish off the splitting by including a "move" of the data instead of a reinsert, which means I need to keep track of each value's timer and reinsert it with the same time_point at which it was originally inserted (a sketch of the idea is below).
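
For illustration, a hypothetical sketch of that idea (the names and the expiration constant are invented for this example; the real code goes through the Pht/OpenDHT machinery):

    #include <chrono>
    #include <cstddef>
    #include <string>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    // Hypothetical stored entry: the payload plus the time_point at
    // which it was first inserted.
    struct StoredValue {
        std::string data;
        Clock::time_point created;
    };

    // Invented expiration delay, for the example only.
    constexpr auto EXPIRATION = std::chrono::minutes(10);

    // Moving a value keeps its original 'created' time_point, so it
    // expires exactly when it would have in the old node, instead of
    // getting a fresh full lifetime from a plain reinsert.
    void moveValue(std::vector<StoredValue>& from,
                   std::vector<StoredValue>& to, std::size_t i) {
        to.push_back(from[i]); // the created time_point travels along
        from.erase(from.begin() + static_cast<std::ptrdiff_t>(i));
    }

    bool expired(const StoredValue& v, Clock::time_point now) {
        return now - v.created > EXPIRATION;
    }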

I will also write an article for the Savoir-faire Linux blog (https://blog.savoirfairelinux.com/en/).
The article should be done by Friday :D

After that I will try other solutions for merging the data, and will then start working on the multidimensional data indexation.

Anyway, see you next week, hopefully with some good news about this merge that doesn't want to merge :D

Regards,


REYNAUD Nicolas




From: kaldoran@live.fr
To: debian-outreach@lists.debian.org
Subject: FW: GSoC 2016 Week 5 - Indexation and Data Persistence over OpenDHT
Date: Thu, 30 Jun 2016 18:41:55 +0200

Hi,

Last week I was talking about my new solution for data persistence; now there is an implementation of it.
But here comes another problem:

If there are a lot of identical values, the split will be done all the way down (that's OK), but the values will not stay at the end of the trie since there are not enough of them per node.
What happens is the following:
  • Split down to the end of the trie
  • Put the data on the last node
  • Since the last node and its sibling do not together contain the maximum number of values possible for a node, the data is put back in the parent
  • All the other values do the same, until n + 1 values have done it (n is the maximum for a node)
  • Then the node splits into two subnodes, and all n values are split between those two


The problem, as you may guess, is that the data does a kind of "ping-pong" between the last nodes, and we don't want that since it causes a lot of useless data transfer.


This week I'm working on another method for the merge, which is basically to let the node expire.
Then, when we put another value, we will not update the node, and after some time it will expire ... and so the data will be put into the parent (see the sketch below).
And voilà, the merge is done.
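
A hypothetical sketch of that strategy (all names here are invented; the real maintenance loop lives in the Pht code):

    #include <chrono>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    // Hypothetical trie node (names invented for this sketch).
    struct TrieNode {
        std::vector<int> values;       // data held by this node
        Clock::time_point lastRefresh;
        bool mergePending = false;     // leaf that should merge upward
    };

    // Periodic maintenance pass: nodes are normally re-put (refreshed)
    // before they expire. Simply not refreshing a leaf marked for merge
    // lets the DHT's normal expiration remove it, after which the data
    // is served from the parent node.
    void refreshPass(std::vector<TrieNode>& nodes, Clock::time_point now) {
        for (auto& node : nodes) {
            if (node.mergePending)
                continue;              // let this node expire naturally
            node.lastRefresh = now;    // keep this node alive
        }
    }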

I hope that this will work.

Anyway, see you next week. Next week I'll probably start the indexation of data with multiple keys :D

Regards,


REYNAUD Nicolas
