
Re: perspectives on 32 bit vs 64 bit



Adam Skutt wrote:

Helge Hafting wrote:

Adam Skutt wrote:

Helge Hafting wrote:

You can address more than 4GiB by using the always-unpopular
"segment" registers found on intel processors.



How? In protected mode, they're in use as segment descriptor selectors. Certain bits have specific meanings you cannot override, as they're part of the memory protection mechanism.



Yes, so?

That means it's logically impossible to have a 48-bit pointer at all, period.

You are right that this isn't a true 48-bit pointer.  The upper 16
bits of such a pointer are not a numerical part that can be incremented
the ordinary way.  But it _is_ a way that lets you have more than
32 bits of address space, although this way is so cumbersome that
nobody sane would bother to implement it.

(Pointer arithmetic no longer being simple add/subtract, precisely
due to the descriptors, invoking the _swapper_ whenever we
reference a pointer to another 4G area . . .)

Sigh.  All mechanisms that let the OS support more than 4GB for
several processes can be used to support more than 4GB for a
single process as well.  That is trivial, although also less efficient
than only supporting 4GB.

Yes, but it's obvious now you didn't understand what I said.

You /cannot/ have more than 32 bits of virtual address space.  Period.
There is no way to do it.

What you can do is remap the same virtual space to different physical addresses. Which is different from having extra v.a.s.


 Whenever the app reloads a segment register,

(i.e. trying to use a 48-bit pointer where the segment descriptor
             differs from the last pointer used)

This isn't a 48-bit pointer, because descriptor selectors aren't pointers.

Not a true 48-bit pointer, it doesn't give you 48 bits of address space.
But it gives you more than 32 bits, that's my point.  And I called it a
"48-bit pointer" because storing such a pointer indeed takes 48 bits
for the selector & offset.
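
To make the storage argument concrete, here is a minimal C sketch of such a
selector + offset pair.  The names far48, far48_store, far48_load and the
sel_* helpers are made up for illustration, they are not from any compiler
or OS.  The selector field split (13-bit index, table indicator, 2-bit RPL)
and the offset-first memory order are the standard IA-32 layout, and the
split is why the upper 16 bits cannot be incremented like a number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 16-bit selector plus a 32-bit offset. */
typedef struct {
    uint16_t selector; /* picks a descriptor; it is not part of an address */
    uint32_t offset;   /* 32-bit offset within the selected segment */
} far48;

/* Protected-mode selector layout: bits 15..3 index, bit 2 TI, bits 1..0 RPL. */
static unsigned sel_index(uint16_t sel)  { return sel >> 3; }
static unsigned sel_is_ldt(uint16_t sel) { return (sel >> 2) & 1u; }
static unsigned sel_rpl(uint16_t sel)    { return sel & 3u; }

/* Store in 6 bytes: 4 offset bytes, then 2 selector bytes, little-endian. */
static void far48_store(uint8_t out[6], far48 p)
{
    out[0] = (uint8_t)(p.offset & 0xFF);
    out[1] = (uint8_t)((p.offset >> 8) & 0xFF);
    out[2] = (uint8_t)((p.offset >> 16) & 0xFF);
    out[3] = (uint8_t)((p.offset >> 24) & 0xFF);
    out[4] = (uint8_t)(p.selector & 0xFF);
    out[5] = (uint8_t)(p.selector >> 8);
}

/* Rebuild the pair from its 6-byte stored form. */
static far48 far48_load(const uint8_t in[6])
{
    far48 p;
    p.offset = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
               ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    p.selector = (uint16_t)(in[4] | (in[5] << 8));
    return p;
}
```

So the pointer occupies 48 bits of storage, but only the 32 offset bits
behave like an ordinary number.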

And it won't work anyway. How do I get a base offset higher than 0xFFFFFFFF? And if I add to it, what behavior is yielded?

You don't get a higher base offset than that - but I never said so
either.  Your compiler has to support a segment switch whenever
you cross a 4GB boundary.  Needless to say, this makes all
pointer arithmetic slow.

Not what is desired, to say the least.

Nobody desires this way of programming - but it is possible.
I never claimed it was useful - get a 64-bit processor instead I said.


You can't have more than 32-bit v.a.s. Any tricks to get around that don't really get around that, they just have the same addresses the user-space code sees point to different physical addresses.

I really don't see how this is possible leafing through the IA-32 System Programming Guide, so links or text would be preferred.

No guide will tell you how, they'll guide you towards something saner.
It is all there in the specs though, and is easier to understand if
you compare to a similar situation in the 1980s:

Nobody ever used the 48-bit pointer system, but a 32-bit pointer
system (16-bit selector + 16-bit offset) was widely used to support more
than 1MB on the 80286 processor.  Of course these weren't true
32-bit pointers either: they needed 32 bits of storage space but
merely allowed a 24-bit address space.  Pointer arithmetic was
highly nontrivial due to the "selector" part of the pointer, but it worked.
The compilers did support data structures bigger than 64kB (and bigger
than 1MB), even though you couldn't have an offset bigger than 64kB.
They supported this by changing the segment selector when necessary.
Such pointer arithmetic was slow -
and programmers laughed at it because
true 32-bit processors were available at the time. But those didn't run
Microsoft Windows.
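
A rough C sketch of what that compiler-generated pointer arithmetic amounts
to.  The names far32, huge_add and SEL_STEP are made up for illustration.
It assumes the OS has handed out consecutive descriptors for one big object,
each covering the next 64kB, so that the selectors are 8 apart (one
descriptor table slot); a real system may tile its selectors differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: consecutive descriptors, so neighbouring selectors differ by 8. */
#define SEL_STEP 8u

/* Hypothetical 16:16 segmented pointer as used in 80286 protected mode. */
typedef struct {
    uint16_t selector;
    uint16_t offset;
} far32;

/*
 * Add a byte count to a segmented pointer.  The offset wraps at 64kB and
 * every wrap moves the selector to the next descriptor.  This is not an
 * ordinary 32-bit addition: the carry out of the offset is scaled by
 * SEL_STEP before it reaches the selector.
 */
static far32 huge_add(far32 p, uint32_t delta)
{
    uint64_t sum = (uint64_t)p.offset + delta;
    uint64_t segments_crossed = sum >> 16;
    uint64_t new_selector = p.selector + segments_crossed * SEL_STEP;

    assert(new_selector <= 0xFFFF); /* ran off the end of the descriptor table */
    p.selector = (uint16_t)new_selector;
    p.offset = (uint16_t)(sum & 0xFFFF);
    return p;
}
```

For example, adding 0x20 to selector 0x0107, offset 0xFFF0 gives selector
0x010F, offset 0x0010.  Every increment pays for the extra shift, multiply
and selector reload, which is where the slowness comes from.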

At least two operating systems used this programming model:
Windows 3.0 and OS/2 1.3. The 80286 was popular, unfortunately.

Using "48-bit" pointers (16-bit selector + 32-bit offset) works much
the same way, but with an added problem:  Where the 80286
created a 24-bit address from a 32-bit segmented pointer,
the 80386 creates a 32-bit address from a 48-bit segmented pointer.
This is the only extra problem that we get.  Other problems,
such as the offset being limited to 32 bits, are solved the
same way as 80286 programmers solved the problem of the offset
being limited to 16 bits.  The offset limitation doesn't stop us, it
is merely a performance problem.

The 32-bit address problem is solved by having only one segment selector
marked present at any time.  Accessing any other selector will then
give a "segment not present" trap, similar to a page fault.  The OS
can then resolve the problem by switching the PAE-extended page
tables to map a different 32-bit address space, marking the new
selector present
(and marking the previously used one not-present) and then
restarting the instruction.  This step makes 48-bit segmented pointers
even slower than the 32-bit segmented pointers once were,
but the approach is doable.  This technique also limits your
use of the 80386 instruction set: you can't use any instruction that
might use two selectors at the same time. (It'd be very hard to map
both if they have the same offset, particularly in case of the
string move instructions.)  Fortunately, such instructions are
not needed to get working programs, so the compiler could
simply be set to not generate them.   (Someone executing such
code would not bring down the system, but the system
might want to kill the process when it keeps faulting on the
same instruction over and over.)
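
The scheme can be modelled in a few lines of user-space C.  This is only a
toy simulation under loud assumptions: machine, not_present_handler,
far_access and the tiny window sizes are all invented here, the "present"
array stands in for the descriptor present bits, and the "active" field
stands in for whichever page directory is loaded.  No real trap or page
table is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_WINDOWS  4   /* hypothetical: four selectors, one window each */
#define WINDOW_BYTES 16  /* tiny stand-in for a 4GB window */

typedef struct {
    uint8_t  backing[NUM_WINDOWS][WINDOW_BYTES]; /* memory behind all windows */
    int      present[NUM_WINDOWS]; /* models the descriptor present bit */
    int      active;               /* models the loaded mapping, -1 = none */
    unsigned faults;               /* simulated segment-not-present traps */
} machine;

static void machine_init(machine *m)
{
    memset(m, 0, sizeof *m);
    m->active = -1;
}

/* OS side: on a not-present trap, swap the mapping and flip the present bits. */
static void not_present_handler(machine *m, int sel)
{
    if (m->active >= 0)
        m->present[m->active] = 0; /* old selector becomes not-present */
    m->active = sel;               /* stands in for switching page tables */
    m->present[sel] = 1;
    m->faults++;
}

/* Application side: every access names a selector and an offset. */
static uint8_t *far_access(machine *m, int sel, uint32_t off)
{
    assert(sel >= 0 && sel < NUM_WINDOWS && off < WINDOW_BYTES);
    if (!m->present[sel])
        not_present_handler(m, sel); /* trap, then the access is restarted */
    return &m->backing[sel][off];
}
```

Alternating between two selectors traps on every switch, while repeated
accesses through the same selector cost nothing extra.  An instruction
needing two selectors at once would bounce between the two mappings
forever, which is the restriction described above.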

How much time will the remapping need?  Not
that much, because there is no need to alter the page tables:
you simply load a new page directory into cr3.  Everything is
in memory, all you get is the overhead of the not-present fault trap
and the TLB refills resulting from the new page table.  Also, you
won't really map 4GB at a time.  There is some overhead - the
program code will have to be mapped the same way in all the 4GB
mappings, or you won't be able to restart the instructions. I have
never heard of anyone needing more than 4GB of _code_ though.
The same goes for the OS kernel, which also must be mapped all
the time.
Helge Hafting






