
Re: Minidisk support (was: Installation Question)



On 2009-12-15, Frans Pop wrote:
>
> OK. My s390 knowledge is very limited. My understanding was that minidisks 
> were not supported at all (as there's a longstanding BR open to add 
> support for them in the installer).
>

OK, now I think I understand where the confusion lies.  I'd start a new
thread here; but since the subject line already says "minidisk support",
and since that's exactly what we're discussing now, I'll just continue
with the current thread.  If you want to split this off into another
thread, be my guest.  I assume that you are referring to bug report
number 447755, which I opened.  (That reminds me, I opened it under my
old e-mail address.  I've got to get the e-mail address updated.)

Please forgive me if I insult your intelligence or give too much
information.  That is not my intent.  I have a tendency to do that at
times, but I don't do it on purpose.  I do respect you.  I just don't
know what you do know and what you don't know.  So I'll just explain the
whole thing and please politely ignore what you already know.

We'll start with the definition of DASD.  DASD is an acronym which stands
for Direct Access Storage Device.  It's a general term for any storage
device in which the records can be easily accessed in random order,
such as a disk.  This is in contrast with a sequential access storage
device, in which the records must be accessed in sequential order, such
as a magnetic tape.  Historically, DASD was not necessarily a disk device.
In the early days of mainframes, there were magnetic drum devices as well,
and they were also classified as DASD.  But those devices fell by the
wayside long ago.  In today's environment, DASD and disk are practically,
though not technically, synonymous.

Mainframe DASD comes in two basic types: FBA (Fixed Block Architecture)
and CKD (Count Key Data).  FBA DASD is similar to the type of disk
devices used in the world of PCs.  The physical blocks on disk are all
of a fixed size: 512 bytes.  Sometimes you will see FBA DASD described
as an FB-512 device.  In theory, an FBA device can use other blocksizes;
but to the best of my knowledge every FBA device ever made for mainframe
use has a physical blocksize of 512.

CKD DASD is different.  With CKD DASD, the
physical blocks on disk can be of all different sizes, from a theoretical
minimum of 1 to a theoretical maximum of 65535 (hex ffff).  In order to
keep track of things, there is a special little block in front of every
main block called a count block which contains the size (length) of the
following main block, or data block.  In addition, some types of blocks,
such as directory blocks for partitioned data sets, also contain keys.
The key is typically a sort key that is significant to the type of data
being stored.  In the case of a partitioned data set directory block,
it is the key (member name) of the highest-sorting member (in the
EBCDIC collating sequence) of all the members described in that
directory block.  Thus, for a keyed block, there are actually three
blocks on the disk: a count block, a key block, and a data block.
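
To make the layout concrete, here is a minimal Python sketch of the
count/key/data arrangement described above.  It is a simplified model,
not the real on-disk encoding: the class name CkdRecord and its methods
are my own invention for illustration, and the count is modeled as
nothing more than the two lengths.

```python
from dataclasses import dataclass

MAX_DATA_LENGTH = 0xFFFF  # 65535, the theoretical maximum mentioned above


@dataclass
class CkdRecord:
    """Simplified, hypothetical model of one CKD record."""
    data: bytes
    key: bytes = b""  # empty for the common unkeyed case

    def __post_init__(self):
        # Data length must fall in the theoretical range 1..65535.
        if not 1 <= len(self.data) <= MAX_DATA_LENGTH:
            raise ValueError("data length must be between 1 and 65535")

    def count(self):
        """Lengths the count block has to describe: (key length, data length)."""
        return (len(self.key), len(self.data))

    def areas(self):
        """Names of the blocks physically present for this record."""
        return ["count", "key", "data"] if self.key else ["count", "data"]


keyed = CkdRecord(data=b"x" * 256, key=b"MEMBER08")
unkeyed = CkdRecord(data=b"abc")
print(keyed.areas(), keyed.count())
print(unkeyed.areas(), unkeyed.count())
```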

Most blocks do not have keys; they have only a count block and a data
block.  But in the general case, a block on this type of DASD device
may have keys.  Hence the name count-key-data or CKD DASD.  In Linux,
the FBA driver (the dasd_fba_mod kernel module) supports FBA devices and
the ECKD driver (the dasd_eckd_mod kernel module) supports CKD devices.
The E in ECKD stands for Extended.  This refers to some extra channel
commands supported by the control unit which allow some high-performance
options, such as reading a whole track at a time, etc.  But the underlying
data format is still CKD.  When people speak of ECKD DASD, what they
mean is CKD-format DASD which has a control unit which supports the extra
ECKD channel commands.

Different IBM DASD devices are identified by a four-digit device-type
number.  For example, 3370, 9336, 9332, and 9335 are four different
device types of FBA DASD.  9345, 3390, 3380, and 3350 are four different
device types of CKD DASD.  These different device types differ from each
other in things like track capacity, number of tracks per cylinder,
average seek time, average rotational delay, channel speed, etc.
In addition to the main four-digit number, there is often a suffix
to distinguish different models.  These different models most
often differ from each other only in the number of cylinders they possess.
For example, a 3390-3, the most popular model of 3390, has 3339 cylinders,
numbered from 0-3338.  The 3390-9 has 10017 cylinders, numbered 0-10016.
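
The zero-based numbering is easy to get wrong by one, so here is a tiny
Python sketch of it.  The table holds only the two models mentioned
above; the names MODEL_CYLINDERS and cylinder_range are hypothetical.

```python
# Cylinder counts for the two 3390 models mentioned above.
MODEL_CYLINDERS = {"3390-3": 3339, "3390-9": 10017}


def cylinder_range(model):
    """Return (first, last) cylinder number; numbering starts at zero."""
    count = MODEL_CYLINDERS[model]
    return 0, count - 1


for model in MODEL_CYLINDERS:
    print(model, cylinder_range(model))
```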

IBM has two main historical operating systems: VSE and MVS.
VSE added support for FBA DASD, but MVS never did.  MVS can only use CKD
DASD.  And since MVS is IBM's most popular (and most lucrative) operating
system, CKD DASD is far more popular with mainframe customers than FBA
DASD is.

Of course, these days, hardly anybody uses "real" mainframe DASD anymore,
be it FBA or CKD.  Most mainframe customers use some kind of RAID box
which is emulating traditional mainframe DASD under the covers.  To the
mainframe, it looks like traditional mainframe DASD.  But the physical
implementation on the back end is some type of RAID implementation which
uses PC hard disks.  So even though the actual physical disks on which
the data is stored may use fixed-length 512-byte blocks, the software
within the RAID box has to make it look like CKD DASD to the mainframe.
Otherwise, you can't run MVS.

In many RAID implementations, a physical
hard disk will contain only whole emulated DASD volumes.  It may be
mirrored somewhere else, but the point is you won't see part of a
logical volume on one physical disk and the rest of that logical volume
on another physical disk.  Since the size of PC hard disks is rarely
a multiple of the size of mainframe logical volumes, this leaves some
unused space left over.  In order to get the maximum space utilization,
some vendors offer non-standard sized disks to the customer when they
configure the RAID box.  They might have, let's say, ten standard-sized
3390-3 volumes of 3339 cylinders each and one partial volume of whatever
is left.  Maybe it's only, say, 1597 cylinders long.  Different vendors
have different names for these things, but one vendor calls it a
"hyper-volume".  It has all the characteristics of a 3390-3, but it's
only, in this example, 1597 cylinders long instead of the standard 3339
cylinders long.  Some customers don't want these odd-ball sized volumes;
so the vendor doesn't configure them and the leftover space is wasted.

I believe that the Linux FBA driver and the Linux ECKD driver will
support these short volumes, or hyper-volumes.  Somehow they can obtain
the number of cylinders, perhaps through the RDC (Read Device Character-
istics) or RCD (Read Configuration Data) channel commands.  I don't
know how they do it, but somehow they can find out the number of cylinders.
And when Linux is running in an LPAR, or in basic mode on older S390
models, they can use either a standard-sized volume or a hyper-volume.

OK, so far so good.  All of this is background.  So what's a minidisk?
A minidisk is a construct of the z/VM operating system.  z/VM is another
IBM operating system, like VSE and MVS.  The "z" comes from zSeries
or System z, the hardware it runs on (64-bit mainframes).  The VM
stands for Virtual Machine.  You can think of it as "VMware for the
mainframe".  z/VM creates virtual images of a mainframe, called
virtual machines.  Each virtual machine has a userid associated with it,
similar to a username under Linux.  (Unlike Linux, however, z/VM does
not allow multiple simultaneous logins under the same name.)
The component of z/VM which creates and manages these virtual machines
is called CP, which stands for the Control Program.

So how does an operating system such as Linux, which is running in a
virtual machine under z/VM, access its DASD?  Well, there are two
basic ways.  One way is by using DEDICATED DASD.  A whole DASD volume
(a regular volume or a hyper-volume) can be dedicated to a particular
virtual machine, either through the DEDICATE statement in the CP
directory entry for the virtual machine or dynamically via the CP
ATTACH command.  In this mode, as the name implies, only that virtual
machine has any access to the DASD volume.  Other virtual machines
cannot touch it.

The other way is by the use of minidisks.  A minidisk is a contiguous
range of cylinders on a DASD volume that CP knows by name.  Its name
is the owning virtual machine combined with the virtual address.
For example, if there is a virtual machine defined to CP called
DEBIAN1, and there is an MDISK statement in the CP directory entry
of DEBIAN1 which defines virtual device number 500, then the name
of the minidisk is DEBIAN1 500.  Minidisks can (potentially) be shared
by multiple virtual machines.  Here is an example MDISK statement:

MDISK 0201 3390 1751 0075 VMSY05 MR HARRY LARRY MARY

This minidisk definition, present in the directory entry for virtual
machine DEBIAN1, defines virtual device number 0201 (or simply 201;
the leading zero is not significant) as a minidisk.  (Device numbers,
whether they are real or virtual, are implicitly hexadecimal numbers.)
The definition states that the underlying device type is a 3390,
which is a CKD device.  It states that the starting cylinder number
of the minidisk is 1751 (a decimal number).  It states that the size
of the minidisk is 75 cylinders (a decimal number).  And it states
that the volume serial number of the real DASD volume on which this
minidisk resides is VMSY05.  The MR is the access mode used by DEBIAN1
to link to this minidisk at logon time.  HARRY, LARRY, and MARY are
minidisk link passwords (as opposed to virtual machine logon passwords).
They allow READ, WRITE, and MULT access, respectively.  If the device
type of the real DASD volume does not match the device type specified
in the MDISK statement, then the MDISK statement is in error.
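
Here is a rough Python sketch of how the fields of that example
statement break down.  It handles only the exact ten-field form shown
above (it does not handle the END keyword or omitted passwords), and
the names Mdisk and parse_mdisk are hypothetical; this is not how CP
itself parses the directory.

```python
from typing import NamedTuple


class Mdisk(NamedTuple):
    vdev: int        # virtual device number (hexadecimal in the statement)
    devtype: str     # underlying device type, e.g. "3390"
    start_cyl: int   # starting cylinder on the real volume (decimal)
    cylinders: int   # size in cylinders (decimal)
    volser: str      # volume serial number of the real DASD volume
    mode: str        # access mode used by the owner at logon time
    read_pw: str     # link password for READ access
    write_pw: str    # link password for WRITE access
    multi_pw: str    # link password for MULT access


def parse_mdisk(statement):
    """Parse an MDISK statement of the exact ten-field form shown above."""
    fields = statement.split()
    if len(fields) != 10 or fields[0] != "MDISK":
        raise ValueError("unsupported MDISK form: " + statement)
    _, vdev, devtype, start, size, volser, mode, rpw, wpw, mpw = fields
    # Device numbers are hexadecimal; cylinder values are decimal.
    return Mdisk(int(vdev, 16), devtype, int(start), int(size),
                 volser, mode, rpw, wpw, mpw)


m = parse_mdisk("MDISK 0201 3390 1751 0075 VMSY05 MR HARRY LARRY MARY")
last_real_cyl = m.start_cyl + m.cylinders - 1
print(format(m.vdev, "X"), m.volser, m.start_cyl, last_real_cyl)
```

With the example values, the minidisk occupies real cylinders 1751
through 1825 of VMSY05.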

A special case of a minidisk is when the minidisk spans the entire
real DASD volume.  That is, the starting cylinder is zero and the
number of cylinders is equal to the number of cylinders of the real
DASD volume (or is equal to the special keyword END).  This is called
a full-pack minidisk.  By creating a full-pack minidisk you can
share real DASD volumes between virtual machines.  Minidisks,
including full-pack minidisks, can be created on regular or hyper-
volumes.

Do the Linux DASD drivers support minidisks in a virtual machine under
z/VM?  Yes, they do.  A minidisk is similar to a hyper-volume in that
it has the characteristics of the underlying device, but may (and
usually does) have a non-standard number of cylinders.  But instead
of the shortened disk being emulated in hardware by a RAID box, it
is emulated in software by CP.  CP steps in and alters the channel
program to change the cylinder numbers in channel commands so that
when the "real" device sees the channel program, it goes after the
right data.  It's all done by smoke and mirrors inside CP.  The
operating system in the virtual machine thinks it's going after cylinder
0 but it's really going after cylinder 1751 (in this example).
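
The arithmetic behind that relocation can be sketched in a few lines of
Python.  This shows only the cylinder-number mapping, assuming the
example minidisk (start 1751, size 75); it says nothing about how CP
actually rewrites channel programs, and the function name
real_cylinder is hypothetical.

```python
def real_cylinder(virtual_cyl, start_cyl, size):
    """Map a minidisk-relative cylinder number to the real cylinder number."""
    # The guest may only address cylinders 0 .. size-1 of its minidisk.
    if not 0 <= virtual_cyl < size:
        raise ValueError("cylinder %d is outside the minidisk" % virtual_cyl)
    return start_cyl + virtual_cyl


print(real_cylinder(0, 1751, 75))   # first cylinder of the minidisk
print(real_cylinder(74, 1751, 75))  # last cylinder of the minidisk
```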

Now that we're talking about virtual machines under z/VM, the DIAG
driver (kernel module dasd_diag_mod) comes into play.
The DIAG driver works *only* in a virtual machine under z/VM.  It uses
a special instruction (DIAGNOSE code X'250') which is meaningful
*only* in a virtual machine.  But the DIAG driver will work with either
a minidisk or a dedicated DASD device.

OK, so if all three drivers support minidisks, then what is Debian
bug report 447755 all about?  The issue here is the *format* of the
minidisk.  A DASD device, be it a dedicated device or a minidisk,
can have one of four formats under Linux for s390: cdl, ldl, CMS
non-reserved, and CMS reserved.  The FBA driver supports two of the
four formats: CMS non-reserved and CMS reserved.  The DIAG driver
supports three of the four formats: ldl, CMS non-reserved, and
CMS reserved.  The ECKD driver supports all four formats.
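
The same support matrix, written out as a small Python lookup table.
The data comes straight from the paragraph above; the names
SUPPORTED_FORMATS and drivers_for are hypothetical.

```python
# Disk formats supported by each Linux DASD driver, as listed above.
SUPPORTED_FORMATS = {
    "dasd_fba_mod": {"CMS non-reserved", "CMS reserved"},
    "dasd_diag_mod": {"ldl", "CMS non-reserved", "CMS reserved"},
    "dasd_eckd_mod": {"cdl", "ldl", "CMS non-reserved", "CMS reserved"},
}


def drivers_for(fmt):
    """Return the sorted list of driver modules that support a disk format."""
    return sorted(drv for drv, fmts in SUPPORTED_FORMATS.items() if fmt in fmts)


for fmt in ("cdl", "ldl", "CMS non-reserved", "CMS reserved"):
    print(fmt, "->", ", ".join(drivers_for(fmt)))
```

Note that cdl is the only format for which the lookup returns the ECKD
driver alone.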

Preparing a disk for use under Linux for s390, as with other operating
systems and other platforms, involves three basic steps: low-level
formatting, partitioning, and high-level formatting.  How you do that
depends on which disk format you want to end up with.

----------

cdl format:

Low-level formatting: dasdfmt -d cdl (this is the default format for dasdfmt)
Partitioning: fdasd (up to three partitions can be created)
High-level formatting: mke2fs or mkswap

ldl format:

Low-level formatting: dasdfmt -d ldl
Partitioning: none (a single partition is implied)
High-level formatting: mke2fs or mkswap

CMS non-reserved:

Low-level formatting: CMS FORMAT command
Partitioning: none (a single partition is implied)
High-level formatting: mke2fs or mkswap

CMS reserved:

Low-level formatting: CMS FORMAT command
Partitioning: CMS RESERVE command (a single partition is created)
High-level formatting: mke2fs or mkswap

----------
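
For anyone who wants to script around this, here is the summary above
as a Python table, with two small helpers.  The step strings are copied
from the summary; the names PREPARATION, needs_cms, and max_partitions
are hypothetical, and the table leaves out device names and any other
command options.

```python
# Tuple order: (low-level formatting, partitioning, high-level formatting).
# None means there is no explicit partitioning step (one partition implied).
PREPARATION = {
    "cdl": ("dasdfmt -d cdl", "fdasd", "mke2fs or mkswap"),
    "ldl": ("dasdfmt -d ldl", None, "mke2fs or mkswap"),
    "CMS non-reserved": ("CMS FORMAT", None, "mke2fs or mkswap"),
    "CMS reserved": ("CMS FORMAT", "CMS RESERVE", "mke2fs or mkswap"),
}


def needs_cms(fmt):
    """True if any preparation step is a CMS command rather than a Linux tool."""
    return any(step and step.startswith("CMS") for step in PREPARATION[fmt])


def max_partitions(fmt):
    """Only cdl allows more than one partition (up to three, via fdasd)."""
    return 3 if fmt == "cdl" else 1


for fmt in PREPARATION:
    print(fmt, needs_cms(fmt), max_partitions(fmt))
```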

The issue in 447755 is that the Debian installer only supports
cdl format.  And since this is the only format that the DIAG
driver *doesn't* support, the DIAG driver cannot be used, even
after installation, without migrating the data to other minidisks
after the installation.  (It also means that you can't install
to FBA DASD, since cdl format is not valid for FBA DASD.
Since I don't have any FBA DASD, that does not affect me.
But it might affect others.)

My preferred format, for reasons
explained below, is the CMS reserved format.
I have published migration instructions for how to migrate the
data to CMS reserved minidisks after installation here:

http://www.wowway.com/~zlinuxman/diag250.htm

This document also explains why the CMS reserved minidisk is my
preferred format for Linux virtual machines under z/VM.  There is
no problem with Debian *running* on CMS reserved minidisks.
It works great.  The problem is that I can't *install* to CMS
reserved minidisks.  I can with Suse, though.  ;-)
And once I have the data on CMS reserved minidisks, I can use
the DIAG driver.  It works great.  Except for the bug that we
are discussing that prevents it from working with minidisks
that are linked read-only.  To get that to work I need to apply
a patch to the kernel source code and compile a custom kernel.
And that bug affects all distributions, not just Debian.
It's not a Debian-installer-specific problem.

There is also another issue mentioned in 447755, and that is
a problem with mounting devices in the wrong order.  In hindsight,
I should have created two separate bug reports: one for the lack
of support for CMS minidisks and one for the mount order issue.

I apologize for the long reply, but again, I don't know what
you know and what you don't know.  I hope I haven't put you to
sleep.  And I hope that I have resolved the confusion.

On 2009-12-15, Frans Pop wrote:
> Eh, what? Why would I have to do that? I have no special status
> here.

As an official Debian developer, you carry more weight than I do.
I can appeal to the kernel team, of course; but if you say I'm
full of excrement, they'll believe you more than they will me.

