
Re: Advice on hardware server to use for a small dedicated data center



debian-user:

I sent the following to the OP rather than the list by mistake...


David



On 2020-06-28 03:34, echo test wrote:

... about 2000 users, all restaurants, that save their selling history
locally on their own server; then 2 or 3 times in the morning they
rsync their postgres data to my data center.

We are a startup and for the moment we have a production and a
development environment; in fact, production is just like a test
environment, because we do continuous delivery. We push every day in
order to know more quickly when something has been broken and our
semi-automated tests didn't detect it. Personally, I'm a self-learner,
and probably many on my team are too, so any advice here is also
welcome.

We want to be able to handle 2500+ rsyncs in the morning (probably
distributing them in time in order to avoid a single big load acting
as a DDoS) and, for each client of my clients (restaurants), a get and
put profile request.

Note: clients' profiles are shared across restaurants, and clients can
find/filter restaurants on the website, which is not yet built but
which we are working on.


Thank you for describing your services. That makes it easier for people to comment.


It sounds like you have three services -- mail, web, and database. The web and database services interact heavily, but mail not so much.


I once worked on a project with a similar aspect. One server dialed remote data acquisition units and downloaded data updates once a day. The engineers would then copy the database from the server to their workstations and crunch the numbers. The data collection server was a modest computer. The engineers always wanted more powerful computers. I recall one query took 30+ minutes to run on my workstation. I added indexes for the relevant fields, and the query took a few minutes. Powerful hardware is nice, but using it efficiently is even better.
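
As a rough illustration (the database, table, and column names below are made up; adjust to your schema), adding an index in PostgreSQL is a one-liner, and EXPLAIN ANALYZE will tell you whether a query actually uses it:

    psql -d sales -c "CREATE INDEX ON sales_history (restaurant_id, sale_date);"
    psql -d sales -c "EXPLAIN ANALYZE SELECT * FROM sales_history WHERE restaurant_id = 42;"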


ZFS being a filesystem and mdadm a utility, I think I'll go for
mdadm. I didn't know that Debian supported ZFS; I have always used
Ext4.


For ZFS on Debian, install the 'zfs-dkms' package.
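
Roughly (this assumes you have enabled the "contrib" component in your APT sources, which is where Debian ships the ZFS packages, and that you are on amd64):

    apt-get install linux-headers-amd64 zfs-dkms zfsutils-linux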


I migrated my file servers from Linux, md, and ext4 to FreeBSD and ZFS over the past few years. ZFS requires a different way of thinking, and I am still wrestling with automating system administration chores. Key benefits include unification of the storage stack, flexible storage allocation and management, fine-grained control via metadata, detection and correction of bit rot, snapshots, and replication. With careful planning, the last feature can replace rsync(1) and has the advantage that it can be done both synchronously and asynchronously. So, you can replicate over a network in real-time and you can send a replication stream to a file and receive it later.
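
A minimal sketch of what that replication looks like (the pool and dataset names here are hypothetical):

    zfs snapshot tank/pgdata@2020-06-28
    zfs send tank/pgdata@2020-06-28 | ssh backuphost zfs receive backup/pgdata
    zfs send tank/pgdata@2020-06-28 > /mnt/external/pgdata.zfs

The pipe replicates over the network; the redirection writes the stream to a file that can be received later. Incremental sends ('zfs send -i') keep subsequent transfers small.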


Given the trends of increasing storage, increasing bandwidth, and flat bit error rates, bit rot has become a reality that we must contend with. For file systems, the only two choices I know of are btrfs and ZFS. I chose ZFS.


Similarly for memory.  I now buy computers with ECC memory.


... raspberry pi ... external hard drive ...
I dislike the idea that, if I encrypt my hard drive, anybody with
enough knowledge can just take the SD card and break my encryption.

If all an attacker needs to decrypt your external hard drive is the SD card from your Raspberry Pi, then your encryption is not implemented correctly.


My practice for external drives is to encrypt them with a random alphanumeric password (I believe the current recommendation is 12 or more characters).
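
One common way to do that on Debian is LUKS via cryptsetup. A rough sketch, where /dev/sdX1 is a placeholder for the external drive's partition:

    cryptsetup luksFormat /dev/sdX1
    cryptsetup luksOpen /dev/sdX1 backup
    mkfs.ext4 /dev/mapper/backup

luksFormat prompts for the passphrase; the random password goes there.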


About power consumption, any advice about low-power hardware is also
welcome.

I believe power consumption for transistors is proportional to clock frequency squared. So, one core running twice as fast does twice the work, but consumes four times the power. If your program(s) can be readily divided into many concurrent tasks, a 16-core processor running at 2 GHz will accomplish twice the work as a 4-core processor running at 4 GHz, for the same amount of power.
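
To put numbers on it, assuming (as above) that power scales with frequency squared per core:

    4 cores  x 4 GHz:  work = 16 GHz-cores,  power ~ 4  x 4^2 = 64
    16 cores x 2 GHz:  work = 32 GHz-cores,  power ~ 16 x 2^2 = 64

Twice the work for the same power, provided the workload parallelizes across all the cores.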


Choose energy efficient hardware and components, especially power supplies.


Write and use efficient software.


Why do you dislike systemd? I have heard many people say the same
thing, and I don't really understand their motivation, except that
init is less invasive.

systemd has been debated endlessly. I install and run Debian "oldstable" OOTB, I avoid unofficial packages, and I avoid compiling/installing from source. As a result, systemd is mostly invisible to me. If you build your services to run on a Unix-like OS without customizing the OS itself, that would simplify things.


On 2020-06-28 00:16, David Christensen wrote:

Even if you do not use their services, you might find it useful to
emulate them and implement a private cloud.

Very interesting, can you tell me more about that emulation process, please?

There has been a trend towards virtualization and containerization for many years, with open- and closed-source ideas and solutions developing in parallel. If you go with a commercial provider, they will have put the pieces in place so you can focus on your app, but your app must be designed to fit into their solution. If you would rather do it all yourself with open-source software, then it's a matter of researching all the pieces and making them happen in your environment. If you choose your pieces to match a commercial offering that is based on open-source software, it is possible that your app could work in both places. Study the commercial offerings. On Linux, I expect you will see Kubernetes and Docker.

That said, I use FreeBSD and jails for Samba and CVS services. My Debian VPS/network management service is vendor specific (linode.com). I use VirtualBox on my Debian desktop for development work.
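
As a trivial, hypothetical example of the container route (the image tag, password, and host path are placeholders), running PostgreSQL under Docker is one command:

    docker run -d --name pg -e POSTGRES_PASSWORD=changeme \
        -v /srv/pgdata:/var/lib/postgresql/data postgres:12

Kubernetes then adds scheduling, restarts, and fail-over on top of containers like that one.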


Are you monitoring your current services? If so, you could compare measurements against hardware, and then scale to your goals.


Rolling out changes directly from development to production is risky (but I've done it too). Now would be the time to implement two sets of equipment, one for live and one for standby, and the means to swap which is live and which is not. That way, you could deploy to the standby set, test, swap roles if good, and swap back if the updates later fail. The crux will be keeping live data up to date on both machines.
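
One simple, entirely hypothetical way to make that swap is to keep one reverse-proxy backend definition per set and flip a symlink:

    ln -sf /etc/nginx/backends/standby.conf /etc/nginx/backends/active.conf
    nginx -t && systemctl reload nginx

The file names are illustrative; the point is that "which set is live" should be a single, quick, reversible switch.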


Do you have good backup and restore processes for your current systems? What are your plans for the new systems?


Do you desire 24x7 operations, live maintenance, automatic fail-over, high-availability, or similar? If so, how?


What is your budget and schedule?  Manpower?


David

