Suppose I have several PCs (call them Raspberry Pis, because they probably will be) each collecting data from local sources (eg temperature data via a local sensor), logging it to a local database, and making it visible via a local display on HDMI and/or via a web browser.
If I want to take advantage of the fact that they're all networked and allow each of them to pull the stored data from all their neighbours, what would be the simplest way to achieve this?
Database replication etc feels like overkill.
I don't really want there to be a "master", I'd like to think of them all as equivalent peers.
How about...
- http://memcached.org
- http://hazelcast.org/
- Oracle Coherence
?
On 10 October 2014 14:08, Mark Rogers mark@quarella.co.uk wrote:
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) @ 13 Clarke Rd, Milton Keynes, MK1 1LG
main@lists.alug.org.uk http://www.alug.org.uk/ http://lists.alug.org.uk/mailman/listinfo/main Unsubscribe? See message headers or the web site above!
On 10 October 2014 14:16, Ewan Slater ewan.slater@gmail.com wrote:
How about...
Good idea, I'll look into it (and also I just found memcachedb).
The potential issue, though, is how devices add themselves to the system without pre-configuration.
Also: In systems that are designed to store key/value pairs, what's the best way to store historic data?
Looks powerful, but it's Java based, so I'm guessing it might push my little Pi harder than I'd like?
It also crossed my mind that I could use any Dropbox-type service (though I'm thinking of FOSS ones); that still needs a central server, though. Or there's BTSync, which doesn't need a central server. That way I could just dump data into a shared filesystem in some way.
On 10 October 2014 14:59, Mark Rogers mark@quarella.co.uk wrote:
Also: In systems that are designed to store key/value pairs, what's the best way to store historic data?
Generally it depends on:
- what the natural keys for your data are (you may have a key with two or three components);
- how you are going to access the data (if you're always searching by timestamp, it might make sense to include this in the key);
- whether the system that you're using supports indexing / value extraction, or some other mechanism that avoids searching through the whole store to find a particular value (a "full table scan").
On 10/10/14 21:10, Ewan Slater wrote:
what are the natural keys for your data (you may have a key with two or three components)?
I guess timestamp and "name" (eg temperature, humidity).
- how you are going to access the data (if you're always searching for timestamp, it might make sense to include this in the key)?
What I'm going to need are:
- Latest values (with their timestamps so I know how current they are)
- All values in a date range (eg to draw a graph)
- whether the system that you're using supports indexing / value extraction, or some other mechanism that avoids searching through the whole store to find a particular value ("full table scan").
That effectively brings me full circle to where I started: asking which system I should be using! I'd say I want something pretty flexible, although not as flexible as an RDBMS. The brief research I've done on memcached-style storage systems suggests that if I know the key I can get the value pretty much instantly, but I've not really found anything that indicates how (or indeed if) I can search for keys matching some criteria, and how fast that would be. If, for example, it's possible to extract values for keys matching a regex then I could probably construct something pretty useful out of that.
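The two access patterns above fall out naturally if the timestamp goes at the end of a composite key, since ISO-8601 timestamps sort lexically — no regex scan needed. A minimal sketch, with a plain Python dict and a sorted key list standing in for the store (all names illustrative, independent of any particular product):

```python
import bisect

store = {}          # key -> value, stands in for the KV store
sorted_keys = []    # sorted key list, stands in for an ordered index

def put(sensor, ts, value):
    key = f"{sensor}|{ts}"
    store[key] = value
    bisect.insort(sorted_keys, key)

def latest(sensor):
    """Most recent (ts, value) for a sensor, or None if it has no readings."""
    lo = bisect.bisect_left(sorted_keys, sensor + "|")
    hi = bisect.bisect_right(sorted_keys, sensor + "|~")  # '~' sorts after digits
    if lo == hi:
        return None
    key = sorted_keys[hi - 1]
    return key.split("|", 1)[1], store[key]

def value_range(sensor, start_ts, end_ts):
    """All (ts, value) pairs in [start_ts, end_ts] -- eg to draw a graph."""
    lo = bisect.bisect_left(sorted_keys, f"{sensor}|{start_ts}")
    hi = bisect.bisect_right(sorted_keys, f"{sensor}|{end_ts}")
    return [(k.split("|", 1)[1], store[k]) for k in sorted_keys[lo:hi]]

put("temperature", "2014-10-10T14:00:00", 19.5)
put("temperature", "2014-10-10T15:00:00", 20.1)
put("humidity", "2014-10-10T14:30:00", 55)
```

Any store with ordered keys (or a maintained key index, as here) turns both queries into range scans over a key prefix rather than a full table scan.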
Mark
OK. I would try to use Coherence or Hazelcast to implement your use case. Both support range queries and indexing. Hazelcast is open source, Coherence is not.
I have not seen Coherence installed on a Raspberry Pi, but apparently Hazelcast can be:
https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=7...
Caveats:
1. I have never used Hazelcast myself (I am aware of it as a competitor to Coherence).
2. I have used Coherence lots and it rocks (but I am somewhat biased, given that I work for Oracle and Coherence was one of the products I specialised in).
If you want to go for a store that does not provide indexes out of the box, you can create your own "index views": create an index key corresponding to a particular time slice (say), and store in it a collection of the real keys for all the readings in that time slice.
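A rough sketch of that index-view idea, with a plain Python dict standing in for the KV store (key layout and names are illustrative, not from any particular product):

```python
store = {}

def put_reading(sensor, ts, value):
    """Store a reading and register its key in the per-hour index entry."""
    real_key = f"reading:{sensor}:{ts}"
    store[real_key] = value
    # ts[:13] of an ISO timestamp is the hour, eg "2014-10-10T14"
    slice_key = f"index:{sensor}:{ts[:13]}"
    store.setdefault(slice_key, []).append(real_key)

def readings_in_hour(sensor, hour):
    """Fetch all readings in one hour without scanning the whole store."""
    return [store[k] for k in store.get(f"index:{sensor}:{hour}", [])]

put_reading("temperature", "2014-10-10T14:08:51", 19.5)
put_reading("temperature", "2014-10-10T14:59:02", 20.1)
```

The cost is that every write touches two keys, and the index entry must be kept in step with the data; that's the trade for avoiding a full scan at read time.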
HTH,
Ewan
On 13 October 2014 12:25, Mark Rogers mark@quarella.co.uk wrote:
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER
On 13 October 2014 14:10, Ewan Slater ewan.slater@gmail.com wrote:
OK. I would try to use Coherence or Hazelcast to implement your use case. Both support range queries and indexing. Hazelcast is open source, Coherence is not.
OK, I've looked a bit more into this (although not got as far as trying anything).
Coherence/Hazelcast look like great tools that I should probably learn to use at some point. Indeed I need to start playing with NoSQL in general some day just to have it in the mental toolbox for when it's needed.
But the more I look at them, the less they look like the right way to share a small amount of data between (small, underpowered, low-memory) hosts.
The other thing I'm finding is that everything is based around high availability clusters, rather than the "simpler" concept of just making sure that all the boxes have the same data, which is really all I want.
I'm now looking at things like btsync and syncthing to keep a directory synchronised between boxes (and, optionally, to an external backup server, although the mechanisms will be P2P with no "master"). The issues there are with speed (eg by default syncthing will synchronise the directories every minute, and although I can reduce that to 1s I don't feel like it "wants" me to be doing that), and then obviously with file locking. I'm sure that something based around memcache would be better suited though, if only I can get my head around it!
On 13 October 2014 15:38, Mark Rogers mark@quarella.co.uk wrote:
But the more I look at them, the less they look like the right way to share a small amount of data between (small, underpowered, low-memory) hosts.
It wouldn't be the first time I'd suggested a sledgehammer to crack a nut ;-)
A lot of sensor-data use cases use NoSQL databases. Admittedly, doing everything in memory isn't the first choice for low-powered / embedded solutions (see previous comment).
The other thing I'm finding is that everything is based around high availability clusters, rather than the "simpler" concept of just making sure that all the boxes have the same data, which is really all I want.
Fair point - although you can usually turn the HA features off through configuration.
I'm now looking at things like btsync and syncthing to keep a directory synchronised between boxes (and, optionally, to an external backup server, although the mechanisms will be P2P with no "master"). The issues there are with speed (eg by default syncthing will synchronise the directories every minute, and although I can reduce that to 1s I don't feel like it "wants" me to be doing that), and then obviously with file locking. I'm sure that something based around memcache would be better suited though, if only I can get my head around it!
It depends on whether the file sync option gives you what you want. Sounds like deep down you feel that's a bit clunky.
A couple of other thoughts:
If you're going to use an external server why not just put a (lightweight) database on that? Or could you use cloud storage / database?
Or, how about using a pub/sub model? You could have each node:
1. publish its own events and listen to the events of all the others
2. write all events (its own and the others') to a file (or files) which is its view of the data
There is a Python pub/sub library (which I am assuming will run on the Raspberry Pi): http://pubsub.sourceforge.net/#
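A minimal in-process sketch of the model (not the linked pubsub library's API, just the shape of steps 1 and 2 — all names illustrative):

```python
class Bus:
    """Stand-in for the network transport: delivers every event to everyone."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, event):
        for callback in self.subscribers:
            callback(event)

class Node:
    def __init__(self, name, bus):
        self.name = name
        self.view = []                    # this node's view of all the data
        bus.subscribe(self.view.append)   # listen to everyone, itself included
        self.bus = bus
    def record(self, sensor, value):
        self.bus.publish((self.name, sensor, value))

bus = Bus()
pi1, pi2 = Node("pi1", bus), Node("pi2", bus)
pi1.record("temperature", 19.5)
pi2.record("humidity", 55)
```

On a real network the Bus would be a transport such as multicast, but the invariant is the same: every node ends up holding the full event stream as its own copy of the data, with no master.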
Cheers,
Ewan
On 13 October 2014 16:26, Ewan Slater ewan.slater@gmail.com wrote:
It wouldn't be the first time I'd suggested a sledgehammer to crack a nut ;-)
Or the first time I'd used one! But the Pi might be a bit puny to withstand the attack!
(Aside: Actually I'm planning to use these: https://www.olimex.com/Products/OLinuXino/A20/A20-OLinuXino-LIME/open-source... .. rather than the Pi; quite a bit more powerful and a bit more "industrial" but still only 33 Euros, and fully open source design.)
It depends if the file sync option gives you what you want? Sounds like deep down you feel that's a bit clunky.
Yep, that's my fear, but it feels simpler than the data centre options.
If you're going to use an external server why not just put a (lightweight) database on that? Or could you use cloud storage / database?
I don't want to *need* a central server, and in particular I don't want to need an Internet connection. (I do want to be able to use one, just not require it.)
Or, how about using a pub/sub model? You could have each node:
- publish its own events and listen to the events of all the others
- write all events (its own and the others') to a file (or files) which is its view of the data
I looked at this, but again the concept seems geared towards having a known infrastructure.
I want to end up with something where any of the individual boxes can be switched off without affecting the overall system, and that I can add another box somewhere to add something extra (eg monitor a temperature in a different location) without a dependence on one of the boxes being a "master". I wouldn't be trying if I weren't convinced the technology is all there to be used, though!
I have today found: http://www.consul.io/ ... which looks promising, and pretty lightweight.
Mark
On 14 October 2014 15:38, Mark Rogers mark@quarella.co.uk wrote:
(Aside: Actually I'm planning to use these: https://www.olimex.com/Products/OLinuXino/A20/A20-OLinuXino-LIME/open-source... .. rather than the Pi; quite a bit more powerful and a bit more "industrial" but still only 33 Euros, and fully open source design.)
Those look fun - I may have to treat myself to one (or two)!
I have today found: http://www.consul.io/ ... which looks promising, and pretty lightweight.
It does. I'd be interested to know how it works out.
Cheers,
Ewan
On 14 October 2014 18:43, Ewan Slater ewan.slater@gmail.com wrote:
Those look fun - I may have to treat myself to one (or two)!
If you do, and if you plan to use the I/O, note that they use a 0.05" pitch rather than 0.1" as per the Pi (and almost everything else!), which is great for getting more stuff in a small space, but less good when it comes to actually using them. I mention it because although shipping in Europe is cheap, it's a bit of a pain having to then go back and separately order cables that you didn't think you'd need first time around!
Also: The supplied Debian image seems OK but expect to be pretty much on your own after that (there's far less support for it than the Pi, for example).
For me it's well worth a look though; the extra power is very useful, the wider temperature range seems handy to have, and it also has built-in battery charging, so you can connect a cheap battery that then acts as a UPS. I'd also have said that MicroSD is a plus, but the current crop of Pis have caught up there (though on the flip side, a lot of Pi cases no longer fit, and one of the best things about the Pi was the ecosystem around it).
I have today found: http://www.consul.io/ ... which looks promising, and pretty lightweight.
It does. I'd be interested to know how it works out.
Well, as a simple KV store it's trivial to set up and use: download a binary and run it, and a couple of commands later you have a distributed multi-master KV store that is pretty responsive in the tests I've done so far. I haven't had time to push it very hard, but it looks like a great starting point.
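For reference, those reads and writes are just HTTP calls against the local agent's KV API (v1/kv). A hedged sketch using nothing but the standard library — it assumes a Consul agent is already running on its default port 8500, and the key names are illustrative:

```python
import urllib.request

CONSUL_KV = "http://localhost:8500/v1/kv"

def put(key, value):
    """PUT a value into the KV store; Consul replies 'true' on success."""
    req = urllib.request.Request(
        f"{CONSUL_KV}/{key}", data=str(value).encode(), method="PUT")
    return urllib.request.urlopen(req).read() == b"true"

def get(key):
    """GET a value; ?raw returns the bare bytes rather than the JSON wrapper."""
    with urllib.request.urlopen(f"{CONSUL_KV}/{key}?raw") as resp:
        return resp.read().decode()

# eg: put("sensors/pi1/temperature", 19.5)
#     get("sensors/pi1/temperature")
```

Nothing here touches the network until the functions are called; any HTTP client (curl included) works just as well against the same endpoints.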
On 15 October 2014 13:33, Mark Rogers mark@quarella.co.uk wrote:
[History snipped, not relevant]
Just to provide an update on my quest for those who may be interested:
I appear to have more or less settled on OrientDB, which didn't get mentioned here. It's a distributed database (using Hazelcast, which did get a mention here), and is Java based (which I was biased against initially but it seems happy within the constraints of something like a Pi). It's a NoSQL database (with a fairly straightforward SQL interface for people like me who are new to NoSQL). The distributed side of things has seemed mostly trivial so far (install Java, download it and run it on two PCs and they find each other and take it from there themselves).
Biggest challenge for me is the lack of NoSQL knowledge but I'll post a question about that in a new thread.
Mark
Sounds good - I look forward to hearing how you get on :-)
On 3 November 2014 10:23, Mark Rogers mark@quarella.co.uk wrote:
On Fri, Oct 10, 2014 at 02:08:51PM +0100, Mark Rogers wrote:
NFS? Come up with a reasonably sensible set of names for them, then each can export an area (specific user?) that all the others can see. Or alternatively have one provide a common NFS share that all can write to and put their data.
On 10 October 2014 14:52, Chris Green cl@isbd.net wrote:
NFS? Come up with a reasonably sensible set of names for them, then each can export an area (specific user?) that all the others can see. Or alternatively have one provide a common NFS share that all can write to and put their data.
I've never actually used NFS so I might have my idea of it wrong, but the key point is that all systems need to be equal (ie no "server"), and also I don't want devices to have to jump through hoops to get added into the system (ie if I have 10 devices, I don't want to have to reconfigure all 10 to tell them that I've added an 11th).
On Fri, Oct 10, 2014 at 03:01:55PM +0100, Mark Rogers wrote:
I've never actually used NFS so I might have my idea of it wrong, but the key point is that all systems need to be equal (ie no "server"), and also I don't want devices to have to jump through hoops to get added into the system (ie if I have 10 devices, I don't want to have to reconfigure all 10 to tell them that I've added an 11th).
Well they're all servers if you do the "each can export an area" suggestion. Then each can scan for other hosts on the LAN and mount the exported directories.
You are surely going to have to configure the names of the systems in your 'peer sharing' group somewhere, aren't you, whatever sort of system you use?
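A sketch of that "each exports an area" layout, as a config fragment (hostnames, paths, and subnet all illustrative):

```
# /etc/exports on every Pi -- export the local data directory read-only:
/var/lib/sensordata  192.168.1.0/24(ro,all_squash)

# then on each Pi, reload exports and mount each neighbour:
#   sudo exportfs -ra
#   sudo mount -t nfs pi2:/var/lib/sensordata /mnt/pi2
```

Each box is then both server and client, though as noted it still needs to be told (or to discover) which neighbours to mount.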
A lot of in memory data grids use multicast auto discovery, so you can scale the cluster out and back easily, without having to specify what's running where up front.
They're also peer-to-peer, so there's no SPOF/SPOB (single point of failure or bottleneck).
If you don't believe me, download Coherence and have a play :-)
I don't have a Pi, so don't know how hard it is to run Java on one.
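For what it's worth, in Hazelcast that auto-discovery lives in the join section of hazelcast.xml; multicast is the default join mechanism, so nodes on one LAN find each other with no per-host member list (the group and port shown are Hazelcast's documented defaults):

```xml
<hazelcast>
  <network>
    <join>
      <!-- nodes announce themselves on this multicast group; no member list needed -->
      <multicast enabled="true">
        <multicast-group>224.2.2.3</multicast-group>
        <multicast-port>54327</multicast-port>
      </multicast>
      <!-- explicit member lists (tcp-ip) stay disabled -->
      <tcp-ip enabled="false"/>
    </join>
  </network>
</hazelcast>
```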
Sent from my iSheep
On 10 Oct 2014, at 15:53, Chris Green cl@isbd.net wrote:
On Fri, Oct 10, 2014 at 06:50:44PM +0100, Ewan Slater wrote:
A lot of in memory data grids use multicast auto discovery, so you can scale the cluster out and back easily, without having to specify what's running where up front.
Does that assume that *everything* on the cluster is part of the sharing? If not then how do they decide who to share with?
On 10 October 2014 19:04, Chris Green cl@isbd.net wrote:
Does that assume that *everything* on the cluster is part of the sharing? If not then how do they decide who to share with?
In the simplest case yes - the data is available across all the nodes in the cluster. The data will be partitioned (the key for an object is hashed to determine which partition it goes on) and the nodes exchange data about which node has which partition(s). So the amount of data that the cluster can hold increases as you add more nodes.
You may then have clients which can call the cluster, but do not themselves store data.
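A toy illustration of that hashing step (illustrative only, not Coherence's or Hazelcast's actual scheme): each key hashes to a fixed partition, and partitions are spread over whichever nodes are currently in the cluster, so capacity grows as nodes join.

```python
import zlib

PARTITIONS = 8

def partition(key):
    """Stable partition for a key (same key -> same partition on any node)."""
    return zlib.crc32(key.encode()) % PARTITIONS

def owner(key, nodes):
    """Which node currently holds the partition this key lives on."""
    return nodes[partition(key) % len(nodes)]

nodes = ["pi1", "pi2", "pi3"]
```

Real grids also rebalance partitions (and their backups) when membership changes; this sketch only shows why adding a node adds storage capacity.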