
Re: file server



On 7/12/23 02:44, lina wrote:
Dear all,

My computer only has 2 TB data storage capacity,

I want to have 100 TB capacity to store/analyze data.

I am thinking of adding 5 hard drives, each is 18TB, and then merge
them into one volume? or get a file server? What is the best option
for me, and what is the budget?

Thanks so much for your advice, best, lina


On 7/12/23 04:48, lina wrote:
Currently I do not have a plan to keep the data, once the data
finished analyzing, I can just remove it.


On 7/12/23 06:00, lina wrote:
I need to extract the data for downstream analysis. after that, these
data can be removed.

It is hard to provide recommendations without knowing your computer, your network, your analysis, your quality metrics, or your budget.


I use ZFS. Given an x86_64/amd64 computer with Debian, sufficient HDD bays, and sufficient HBA ports, yes, you could install 5 @ 18 TB HDDs and merge them into one 90 TB ZFS pool. If your computer has 5 free bays and ports, this will be your lowest-cost solution, but it is unlikely to be your "best" solution.
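

As a rough illustration of what "merge them into one pool" means in practice, the following Python sketch only assembles and prints a `zpool create` command line; it does not run it. The pool name "tank" and the device paths are placeholders I made up, not real disks. Listing the drives with no vdev keyword produces a plain stripe with no redundancy, which is one reason this is the cheapest layout rather than the best one.

```python
import shlex


def zpool_create_cmd(pool, devices):
    """Build the argv for a striped pool: zpool create <pool> <dev>..."""
    return ["zpool", "create", pool, *devices]


# Hypothetical device paths; substitute the real /dev/disk/by-id names.
devices = [f"/dev/disk/by-id/EXAMPLE-DISK-{n}" for n in range(1, 6)]

# Print the command for review instead of executing it.
print(shlex.join(zpool_create_cmd("tank", devices)))
```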


ZFS likes memory; the more the better. (I use ECC memory.) For 90 TB, I would consider filling all memory slots with the fastest and largest modules that are supported.


ZFS allows SSDs to be added as read cache devices (L2ARC) and/or separate log devices (SLOG, which speeds up synchronous writes). Done correctly, either or both can improve performance at a fraction of the cost of all-SSD storage.


If your analysis can make use of concurrent I/O, a larger number of smaller drives will improve performance. One or more external chassis may be desirable:

	 6 @ 15 TB
	 9 @ 10 TB
	10 @  9 TB
	15 @  6 TB
	18 @  5 TB
	30 @  3 TB
	45 @  2 TB
	90 @  1 TB
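
The combinations above are simply the whole-number drive sizes that divide evenly into 90 TB of raw capacity. A minimal Python sketch of that arithmetic follows; the function name and the 18 TB upper limit are my own choices for illustration, and real drives are not sold in every one of these sizes.

```python
def drive_options(total_tb=90, max_drive_tb=18):
    """Return (count, size_tb) pairs where count * size_tb == total_tb."""
    return [
        (total_tb // size, size)
        for size in range(max_drive_tb, 0, -1)
        if total_tb % size == 0
    ]


for count, size in drive_options():
    print(f"{count:3d} @ {size:2d} TB")
```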


And smaller drives make RAID more feasible. E.g. 20 @ 6 TB arranged as 5 raidz1 virtual devices (vdevs) of 4 drives each would provide 90 TB of storage, support 5 concurrent I/O operations, and tolerate 1 drive failure per vdev at an incremental cost of +33% (20 drives instead of 15). In contrast, 10 @ 18 TB arranged as 5 mirror vdevs of 2 drives each would provide 90 TB of storage, support 5 concurrent I/O operations, and tolerate 1 drive failure per vdev at an incremental cost of +100% (10 drives instead of 5). But the latter will resilver faster when you replace a failed drive (or a spare activates).
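

The numbers for those two layouts can be checked with a few lines of Python. This is only a sketch of the raw arithmetic: it ignores ZFS metadata, padding, and the TB versus TiB difference, so real usable space will be somewhat lower. The function and parameter names are mine.

```python
def layout(vdevs, drives_per_vdev, drive_tb, redundant_per_vdev):
    """Return (total_drives, usable_tb, overhead_pct) for a pool layout."""
    data_drives = vdevs * (drives_per_vdev - redundant_per_vdev)
    total_drives = vdevs * drives_per_vdev
    usable_tb = data_drives * drive_tb
    # Extra drives bought for redundancy, relative to the data drives.
    overhead_pct = round(100 * (total_drives - data_drives) / data_drives)
    return total_drives, usable_tb, overhead_pct


# 5 raidz1 vdevs of 4 x 6 TB: one drive's worth of parity per vdev.
print(layout(vdevs=5, drives_per_vdev=4, drive_tb=6, redundant_per_vdev=1))
# -> (20, 90, 33)

# 5 mirror vdevs of 2 x 18 TB: one drive's worth of copy per vdev.
print(layout(vdevs=5, drives_per_vdev=2, drive_tb=18, redundant_per_vdev=1))
# -> (10, 90, 100)
```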


If your analysis can be partitioned across multiple threads and the threads have independent memory and I/O patterns, putting the data onto a file server (or NAS) would allow multiple computers to work together and do the analysis in less time. You will want a fast connection between the analysis computers and the storage server (e.g. 10+ Gbps Ethernet). Alternatively, use a storage area network (SAN).
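

To see why link speed matters, here is a back-of-the-envelope Python estimate of how long it takes to move 90 TB once over a network link. It assumes decimal units (1 TB = 10^12 bytes), a link running at full line rate, and no protocol or disk bottlenecks, so real transfers will take longer. The `efficiency` parameter is a knob I added for experimenting with that.

```python
def transfer_hours(data_tb, link_gbps, efficiency=1.0):
    """Hours to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for gbps in (1, 10, 25):
    print(f"{gbps:2d} Gbps: {transfer_hours(90, gbps):6.1f} hours")
```

At line rate, 90 TB over 10 Gbps works out to about 20 hours, versus about 200 hours over 1 Gbps.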


David

