
Re: raw disk access



On Mon, Feb 10, 2003 at 08:43:22AM -0500, Phillip Hofmeister wrote:
> On Mon, 10 Feb 2003 at 01:24:29PM +0100, Alberto Cortés wrote:
> > cp, dd and every command use the system calls, and system calls use
> > the drivers, and i am not sure the drivers don't modify "structure".
> 
> dd, cat, etc. do modify the structure.  One common way I rip an ISO is:
> 
> cat /dev/cdrom > myfile.iso
> 
> The geometry of the HD and the cdrom are almost certainly different.

 They don't modify any "structure".  There is no structure, just a linear
array of bytes.  The partition table says where on the disk the sub-arrays
called partitions start and end.  There are various ways of addressing this
byte array, such as Cylinder/Head/Sector addresses and Logical Block
Addressing.  These are only used in the partition table and in PC BIOS APIs.
They have no relevance to anything other than fdisk and lilo (and the
partition detection code in the kernel).  The byte array that is a partition
can of course be any sequence of bytes whatsoever, but usually partitions
hold filesystems.  Filesystems (AFAIK) only use relative offsets;  e.g. an
ext3 filesystem stores the free block list in terms of where the blocks are
relative to the start of the partition, not in terms of CHS or LBA
addresses.  You can dd a filesystem from a partition into a file, and mount
it with the loopback device, without modifying the data from the partition,
and it works.  (Somebody has probably written a filesystem that deals with
absolute disk addresses, but I hope there aren't any like that in common
use!) 
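
 To make the "sub-array" idea concrete, here is a minimal Python sketch. It
assumes a classic PC (MBR) partition table and 512-byte sectors; the toy
disk built by make_demo_disk() and the function names are made up for
illustration. It reads the start and length of each partition from sector 0,
then slices the partition out of the byte array unchanged, which is all that
dd from a partition into a file amounts to.

```python
import struct

SECTOR = 512  # assumed sector size for this sketch


def parse_mbr(sector0):
    """Return (byte_offset, byte_length) for each used MBR partition entry."""
    if sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature found")
    parts = []
    for i in range(4):
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        # status, CHS start, type, CHS end, first sector (LBA), sector count
        _status, _chs_first, ptype, _chs_last, lba_start, nsectors = \
            struct.unpack("<B3sB3sII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            parts.append((lba_start * SECTOR, nsectors * SECTOR))
    return parts


def make_demo_disk():
    """Build a toy disk image: one MBR sector plus a two-sector partition."""
    fs = bytes(range(256)) * 4  # stand-in for filesystem data (1024 bytes)
    mbr = bytearray(SECTOR)
    mbr[446:462] = struct.pack("<B3sB3sII", 0x80, b"\0\0\0", 0x83,
                               b"\0\0\0", 1, 2)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr) + fs, fs


disk, fs = make_demo_disk()
(offset, length), = parse_mbr(disk[:SECTOR])
image = disk[offset:offset + length]  # the partition, copied byte for byte
```

 Nothing inside the slice refers to where it sat on the disk, which is why
such an image can be loop-mounted as is.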

 Hard drives don't store their size in the byte array that is accessible.
They can report it (as when you run hdparm -I /dev/hdx) via a channel other
than normal reads and writes.  This is how fdisk can tell how big it should
let you make your partitions, not by reading the size of the old partition
table.  (There won't be a partition table on a new hard drive:  they come
filled with zero bytes, as would result from dd if=/dev/zero of=/dev/hdx.)
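
 As a rough illustration of asking for the size out of band, here is a
Python sketch that queries the Linux kernel with the BLKGETSIZE64 ioctl
rather than reading anything from the disk. It is not necessarily what
hdparm or fdisk do internally. The request number is computed with the
generic _IOR encoding, which is an assumption that holds on x86 and ARM but
not on every architecture; size_in_bytes() is a made-up helper name.

```python
import fcntl
import os
import stat
import struct


def blkgetsize64_request():
    """BLKGETSIZE64 = _IOR(0x12, 114, size_t), generic ioctl encoding.

    Assumption: direction bits as on x86/ARM; some architectures differ.
    """
    read_dir = 2
    return (read_dir << 30) | (struct.calcsize("N") << 16) | (0x12 << 8) | 114


def size_in_bytes(path):
    """Size of a block device (via ioctl) or of a regular file (via fstat)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if stat.S_ISBLK(os.fstat(fd).st_mode):
            # The kernel fills in a 64-bit byte count; no disk data is read.
            buf = fcntl.ioctl(fd, blkgetsize64_request(), b"\0" * 8)
            return struct.unpack("Q", buf)[0]
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

 Calling size_in_bytes("/dev/hdx") would normally need read permission on
the device; on a regular file (such as a dd image) it just returns the file
size.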

 Also note that CDROMs don't use partition tables.  Thus, CDROMs don't have
"geometry".  Incidentally, the data on a CDROM is physically in one long
spiral, like a vinyl record.  The data is logically broken up into 2kB
(2048-byte) blocks, each with its own error correction data.  (Audio CDs use
all 2352 bytes of each raw sector for samples, so they lack that extra
per-sector layer of error correction.)

The difference between dd and cp is that you can tell dd what block size to
use.  If you are writing to a flash device, cp (which reads 4kB, then writes
4kB, etc.) will take a lot longer than dd bs=32k. Why?  (some) flash devices
have a block size of 32kB, so you have to rewrite the whole block to modify
any part of it.  If you write in small chunks, you have to do several
read-modify-write cycles, instead of just one write cycle.  Hard drive block
sizes are 512B, so cp vs. dd doesn't make a difference.  (Bigger block sizes
result in fewer system calls, and probably lower CPU overhead, though.  I
usually use dd bs=1024k.)
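
 The block-size argument can be sketched in Python. copy_with_block_size()
mimics dd bs=N by issuing one read() and one write() per block and counting
the writes; erase_block_rewrites() is a deliberately naive model of a flash
device (an assumption: no caching or write coalescing in the device or the
kernel) that counts how many times an erase block has to be rewritten. Both
function names are invented for this example.

```python
import os
import tempfile


def copy_with_block_size(src, dst, bs):
    """Copy src to dst using bs-byte reads; return the number of write calls."""
    writes = 0
    fin = os.open(src, os.O_RDONLY)
    fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            chunk = os.read(fin, bs)
            if not chunk:  # empty read means end of file
                break
            view = memoryview(chunk)
            while view:  # retry if the kernel accepts only part of the chunk
                n = os.write(fout, view)
                view = view[n:]
                writes += 1
    finally:
        os.close(fin)
        os.close(fout)
    return writes


def erase_block_rewrites(total, chunk, erase_block):
    """Naive model: every write rewrites each erase block it overlaps."""
    rewrites = 0
    for start in range(0, total, chunk):
        end = min(start + chunk, total)
        rewrites += (end - 1) // erase_block - start // erase_block + 1
    return rewrites


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(1024 * 1024))
        for bs in (4096, 32768):
            n = copy_with_block_size(src, os.path.join(tmp, "dst.bin"), bs)
            print(bs, "byte blocks:", n, "write calls,",
                  erase_block_rewrites(1024 * 1024, bs, 32768),
                  "erase-block rewrites in the naive model")
```

 In this model, writing 1MB in 4kB pieces touches each 32kB erase block
eight times, while 32kB pieces touch each one once.  Real devices and the
kernel's buffering can narrow that gap.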

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@llama.nslug. , ns.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC


