
Re: sysadmin qualifications (Re: apt-get vs. aptitude)



On 10/22/2013 8:47 PM, berenger.morel@neutralite.org wrote:


On 22.10.2013 at 23:01, Jerry Stuckle wrote:
On 10/21/2013 5:26 PM, berenger.morel@neutralite.org wrote:
On 18.10.2013 at 19:36, Jerry Stuckle wrote:
On 10/18/2013 1:10 PM, berenger.morel@neutralite.org wrote:
On 18.10.2013 at 17:22, Jerry Stuckle wrote:
On 10/17/2013 12:42 PM, berenger.morel@neutralite.org wrote:
On 16.10.2013 at 17:51, Jerry Stuckle wrote:
I only know a few people who actually like them :)
I liked them too, at one time, but since I can now use standard smart
pointers in C++, I tend to avoid them. I had so much trouble with them
that now I only use them for polymorphism and sometimes RTTI.
I hope that someday references will become usable in standard
containers... (I think they are not because of technical problems, but I
do not know a lot about that. C++ is easy to learn, but hard to master.)


Good design and code structure eliminate most pointer problems; proper
testing will get the rest.  Smart pointers are nice, but in real-time
processing they are an additional overhead (and an unknown one at that,
since you don't know the underlying libraries).

It depends on the smart pointer. shared_ptr does indeed have a runtime
cost, since it maintains additional data (the reference count), but
unique_ptr does not; AFAIK it is built from pure templates, so the cost
is only at compile time.
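
(As a rough illustration of that claim, a sketch added here rather than
something from the original mail: with the default deleter, common
implementations make unique_ptr exactly the size of a raw pointer, while
shared_ptr also carries a pointer to its control block. The standard does
not guarantee these sizes, hence printing instead of asserting.)

#include <cstdio>
#include <memory>

int main()
{
    // Typical results with libstdc++ or libc++: unique_ptr<int> is the
    // size of one pointer, shared_ptr<int> is twice that.
    std::printf( "raw    : %zu\n", sizeof( int* ) );
    std::printf( "unique : %zu\n", sizeof( std::unique_ptr<int> ) );
    std::printf( "shared : %zu\n", sizeof( std::shared_ptr<int> ) );
    return 0;
}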


You need to check your templates.  Templates generate code.  Code
needs resources to execute.  Otherwise there would be no difference
between a unique_ptr and a C pointer.

In practice, you can replace every occurrence of std::unique_ptr<int>
with int* in your code. It will still work and have no new bugs, except,
of course, that you will have to remove some ".get()", ".release()" and
the like here and there.
You cannot do the inverse transformation, because you cannot copy a
unique_ptr.

The main purpose of unique_ptr is to forbid certain operations. The code
it generates is the same code you would have written around your raw
pointers: new, delete, swap, etc.
Of course, you can argue that merely calling a method has an overhead,
but most of unique_ptr's methods are inlined, even before compiler
optimizations come into play.


Even inlined code requires resources to execute.  It is NOT as fast
as regular C pointers.

I did some testing, to be sure. With -O3, the code is exactly the same.
I did not try with -O1 and -O2. Without optimization, the five lines
using raw pointers compiled to half the size of those using unique_ptr.
But I never ship unoptimized software (the level depends on my needs,
and usually I do not use -O3, though).


First of all, with the -O1 and -O2 optimization you got extra code.

Did you try it? I just did, with code that simply does a new and a
delete, raw pointer against unique_ptr. In short, the simplest usage
possible.
The numbers below are the optimization levels; p means raw pointer and u
means unique_ptr. It seems that it is the second level of optimization
which removes the difference.
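
(A minimal reconstruction of such a test, assuming g++ and a
-DUSE_UNIQUE switch to select the variant; this is a sketch, not the
code actually used for the numbers below.)

#include <memory>

int main()
{
#ifdef USE_UNIQUE
    std::unique_ptr<int> p( new int( 42 ) );  // deleted automatically
    return *p;
#else
    int* p = new int( 42 );
    int v = *p;
    delete p;                                 // deleted by hand
    return v;
#endif
}

Compiled, for example, as "g++ -O2 test.cpp -o p2.out" and
"g++ -O2 -DUSE_UNIQUE test.cpp -o u2.out".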


Which is why your -O3 was able to optimize the extra code out. A more complicated test would not do that.

  7244 oct.  23 01:57 p0.out
  6845 oct.  23 01:58 p1.out
  6845 oct.  23 01:58 p2.out
  6845 oct.  23 01:58 p3.out

11690 oct.  23 01:59 u0.out
10343 oct.  23 01:59 u1.out
  6845 oct.  23 01:59 u2.out
  6845 oct.  23 01:59 u3.out

That means the template DOES create more code.  With -O3, your
*specific test* allowed the compiler to optimize out the extra code.
But that's only in your test; other code will not fare as well.

Indeed it adds code. But what is relevant is what you will release, and
if by adding some switches you get the same final result, then it is not
a problem, at least for me. (I am interested in that stuff now, but too
tired for it at the moment. Tomorrow I'll test with various switches to
find out which one exactly gives those results, plus write better test
code. Sounds like a good occasion to learn a few things.)


Only in your simple case.

Now, I have never found any benchmark comparing raw pointers and
unique_ptr; it could be interesting to have real numbers instead of
assumptions. I'll probably do that tomorrow.


I don't care about benchmarks in such things. If I need a unique_ptr, I use a unique_ptr. If I don't, I (may) use a raw pointer.

If there is a performance problem later, I will find the problem and fix it. But I don't prematurely optimize.

unique_ptr must manage the object at *run time* - not at *compile
time*. To ensure uniqueness, there has to be an indication in the
object that it is being managed by a unique_ptr object.

Wrong. std::unique_ptr is not intrusive; the pointed-to object never
knows whether it is managed manually or by a smart pointer. The same is
true for shared_ptr.

In fact, the uniqueness is not guaranteed. This code shows what I mean:

std::unique_ptr<int> foo, bar( new int( 5 ) );
foo.reset( bar.get() );  // foo now "owns" the same int as bar
bar.reset();             // deletes the int; foo is left dangling
printf( "%d", *foo );    // undefined behaviour, and foo will delete it again

This is a simple example of how to break the guarantees given by
unique_ptr. But the get() method is needed for compatibility reasons, and
because they did not make a weak_ptr counterpart for unique_ptr as they
did for shared_ptr. I have read that this issue might be addressed in
2014 (the need for raw pointers, not the fact that get() can break the
guarantees).
In real code you could have that problem too, if responsibilities are not
correctly defined, but it will probably happen less often than with raw
pointers, because the copy constructor and copy assignment of unique_ptr
are explicitly deleted, so the compiler will give you an error. That
error is one of the good points of this smart pointer, and it is
compile-time stuff.


True, my mistake. You should never have two unique_ptr objects pointing at the same object, but it is not guaranteed.

Additionally,
when the unique_ptr object is destroyed, the object being pointed to
must also be destroyed.

If you do not provide an empty deleter, you are right; that is the
default behavior. But you can provide one, for example if you need to
interface with C libraries, like SDL.

The object is destroyed, whether you provide a destructor or not. If you do not provide one, the compiler provides an empty one.

For example, to manage SDL_Surface objects, you can declare something
like "std::unique_ptr<SDL_Surface, decltype(&SDL_FreeSurface)>
surface_ptr( nullptr, SDL_FreeSurface );" (the deleter parameter must be
a type, so you pass the function's type and then the function itself).
Or you can provide an empty deleter.
Or you can call the release method before destruction.
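
(A sketch of that idea, assuming the SDL 1.2 header path and a
placeholder "image.bmp"; a small deleter type makes the cleanup
automatic.)

#include <memory>
#include <SDL/SDL.h>   // adjust the path for SDL2

// Custom deleter so unique_ptr calls SDL_FreeSurface instead of delete.
struct SurfaceDeleter {
    void operator()( SDL_Surface* s ) const { SDL_FreeSurface( s ); }
};

using SurfacePtr = std::unique_ptr<SDL_Surface, SurfaceDeleter>;

void show_image()
{
    SurfacePtr surface( SDL_LoadBMP( "image.bmp" ) );
    if ( !surface )
        return;                     // load failed, nothing to free
    // ... use surface.get() with the usual SDL calls ...
}                                   // SDL_FreeSurface runs here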

But anyway, unique_ptr is made to automate RAII, so deleting
automatically is a good thing. I cannot see why you would want to use a
unique_ptr and not allow it to delete things... That would be the same as
using a vector and never using its capacity to be a dynamic container. In
that case, use a raw pointer or a reference instead.


You don't.  That's one of the purposes of a unique_ptr.

Neither of these can be handled by the compiler.  There must be
run-time code associated with the unique_ptr to ensure the above.

As I said, the destructor compiled with -O3 needs no extra runtime code
(same with -O2).
Same for allocations and constructors.

Now, if you have any code where what I said is wrong, I would be happy
to take a look at it.


You never *need* a constructor or a destructor, unless you need to do things yourself. But if you do not provide them, the compiler will provide dummy ones for you. This satisfies the C++ requirement that all objects have both a constructor and a destructor.

You should look at the unique_ptr template code.  It's not easy to
read or understand (none of STL is).  But you can see the code in it.

I do it quite regularly, and honestly, many parts are not as complex as
people usually say.
unique_ptr is a good example: it would be really easy to read if the
programmers had not mixed tabs and spaces for indentation. And if you use
the classic setup of 8 spaces per tab, you won't have my problem (I
usually use 2 spaces in terminals, and 4 in graphical applications).

Other parts may be harder, but still, if you understand how templates
work, it is not so hard once you are accustomed to the naming
conventions. Plus, the code is not too badly documented, which helps.

Indeed, it is not easy to read for people not used to C++ syntax, but I
allow myself to think that I am no longer a beginner with this language.


STL is a very poor example of how to code. Bad naming conventions, even worse documentation...

Sure, you can spend a lot of time trying to decode it, but good code doesn't need that much work. Even after about 25 years of C++ I find myself spending too much time trying to read it when I have to.

But all of this has nothing to do with the need to understand the basics
of what you use when writing a program. Not understanding, in broad
strokes, how a resource you acquired works implies that you will not be
able to manage it correctly by yourself. That is valid for RAM, but also
for CPU, network sockets, etc.


Do you know how the SQL database you're using works?

No, but I do understand why comparing text is slower than comparing
integers on x86 computers: I know that an int can be stored in one word,
which can be compared with a single instruction, while text usually means
comparing more than one word, which is indeed slower. And it can get even
worse when the text is not ASCII.
I can use that understanding to explain why I often avoid using text as
keys. But sometimes the more problematic cost is not speed but memory,
and then I'll use text as keys anyway.
Knowing the word size of the SQL server is not needed to make things
work, but it helps to make them work faster, instead of requiring you to
buy more hardware.


First of all, there is no difference between comparing ASCII text and
non-ASCII text, if case-sensitivity is observed.

Character size, in bits. ASCII uses 7 bits, extended ASCII uses 8, UTF-8
uses 8 per code unit, UTF-16 uses 16, etc. It has an impact on memory,
bandwidth and the instructions used.


But ASCII, even if it only uses 7 bits, is stored in an 8-bit byte.
Four ASCII characters will take up exactly the same amount of room as a
32-bit integer.  And comparison can use exactly the same machine
language instructions for both.

Be fair. Compare the maximum number you can represent with a uint32_t
with the maximum you can represent with a char*.
For int8_t versus a char used as a digit:
256 values versus 10, if we limit ourselves to decimal digits. If we
allow all printable ASCII characters, OK, we can have a little more than
100, but what a strange numeric base that would be...


The max number of values you can store in a char is 2^8.  The max you
can represent with a 4 byte character field is exactly the same as
that of an int.  Both have 32 bits, so it's 2^32.  Nothing says a
char* field has to contain only the digits 0-9 (or A-Z for that
matter).
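
(To make that point concrete, a small sketch of my own, not from either
mail: a 4-byte character field can be compared for equality with a single
32-bit comparison, exactly like an int. The memcpy avoids alignment and
aliasing problems, and compilers reduce it to one load.)

#include <cstdint>
#include <cstdio>
#include <cstring>

// Equality of two 4-byte character fields via one 32-bit comparison.
// Note this gives equality only, not a collation (sort) order.
static bool equal4( const char a[4], const char b[4] )
{
    std::uint32_t x, y;
    std::memcpy( &x, a, 4 );
    std::memcpy( &y, b, 4 );
    return x == y;
}

int main()
{
    const char k1[4] = { 'A', 'B', 'C', 'D' };
    const char k2[4] = { 'A', 'B', 'C', 'D' };
    std::printf( "%d\n", equal4( k1, k2 ) );  // prints 1
    return 0;
}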

I know that pretty well.
If you use chars as very short integers, you are right. I did that
sometimes too, before I knew about uint8_t and int8_t.
But when you use them for real text, you will only rarely use values
like, say, 0x01 to 0x10. I am not being very precise, and that range
could contain \t; I do not remember its ASCII value... (and IIRC, 0x10 is
\r or \n, not sure which one, but you probably know what I mean here).

Technically, everything is only bits, 0 and 1. But when I see a char[4],
I expect it to contain printable characters, not the equivalent of an
int.


That's one difference. I don't expect it to necessarily contain printable characters. I expect it only to contain 4 characters worth of information. Documentation will say whether it is printable characters or not.

The exact same set
of machine language instructions is generated.  However, if you are
doing a case-insensitive comparison, ASCII is definitely slower.

And saying "comparing text is slower than integers" is completely
wrong.  For instance, a CHAR(4) field can be compared just as quickly
as an INT field, and CHAR(2) may in fact be faster, depending on many
factors.

It is only partially wrong. Comparing a text of 6 characters will be
slower than comparing a short.
Six characters, "-12345", and you have the same data in only 2 bytes.


I didn't say 6 characters.  I SPECIFICALLY said 4 characters - one
case where your "strings take longer to compare than integers" is
wrong.

With your char[4], you can only go up to 9999 if you store digits; even
a short can represent more, and so compute (including comparison) faster.
Now, you say I'm completely wrong by taking one case where I am wrong. I
agreed that I was partially wrong, but when I said that comparing text is
slower than comparing integers, I did not specifically mean values of the
same binary size. Plus, in the SQL context, I thought it was obvious that
I was referring to the fact that integers are sometimes good replacements
for text as primary keys.


And who says you are limited to digits?  And since both are 4-byte
fields, they can hold exactly the same number of bits, and therefore
the same number of values.  Now, often a char[4] will only contain
alphanumeric values, which limits the number of values it does
contain.  But either way, the comparison is exactly the same.

I never said integers weren't good replacements as primary keys; I
just disagreed with your statement that char fields always take longer
for comparison.  They do not.

But, fine. You said that, leaving aside uppercase/lowercase, it is the
same?
So, how would you sort texts which include, for example, éèê?
Those characters are greater than 'f' if you only take their numerical
values, but then the results would be... just wrong. They should, at
least in French, be considered equal to 'e'; that is what users would
expect. So comparing the texts needs a substitution pass and then a
comparison, and that is the simplest (and worst) solution, since it
neither sorts the accented characters among themselves nor keeps that
information. And we have plenty of characters like that here, used in a
lot of words (and names, indeed).
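
(As an illustration of that point, a sketch added here, not from the
original mails: a byte-wise strcmp and a locale-aware strcoll can order
accented strings differently. The "fr_FR.UTF-8" locale name is an
assumption and must be installed on the system.)

#include <clocale>
#include <cstdio>
#include <cstring>

int main()
{
    // If the locale is missing, setlocale returns NULL and strcoll
    // degrades to plain byte order.
    if ( !std::setlocale( LC_COLLATE, "fr_FR.UTF-8" ) )
        std::printf( "fr_FR.UTF-8 locale not available\n" );

    const char *a = "été";
    const char *b = "eu";

    // Byte-wise: 'é' is 0xC3 0xA9 in UTF-8, so "été" sorts after "eu".
    std::printf( "strcmp  says été > eu: %d\n", std::strcmp( a, b ) > 0 );
    // Locale-aware: 'é' collates with 'e', so "été" sorts before "eu".
    std::printf( "strcoll says été > eu: %d\n", std::strcoll( a, b ) > 0 );
    return 0;
}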


How can you do it in an integer field?

But now you're changing the rules (again), bringing into play
localization.  I said nothing about that in my previous statements. A
plain char[4] field does not include localization.

Indeed, if you use characters as if they were numbers... the syntax is
correct, yes. But I wonder about the semantics.


Absolutely nothing wrong with it.  Works fine.

Text is directly linked with localization, so you cannot simply compare
it as if it were integers. That only works for English (or maybe there is
another language which uses only 26 unaccented letters, but I do not know
of one). The old trick of treating chars as bytes should no longer be
taught, in my opinion. Students could use it in real situations where
there are accented characters, and cause bugs for nothing.


Text is not necessarily linked to localization.  Maybe in France it
is, but not here in the U.S.

As I said, in English it is obviously not needed, since you can simply
use ASCII and things work automatically. But localization is something
every other country has to deal with, if its language has accented
characters. Of course, I am only speaking about Latin languages; for
other languages I have no idea.


That's where I admit virtually all of my experience (even when I was in Hong Kong and Germany) was with latin_1 (American) with no accented characters. But then I've always worked with American firms, even overseas.


Plus, CHAR(4) is not necessarily encoded in 4 bytes. Characters and
bytes are different notions.


In any current database, CHAR(4) for ASCII data is encoded in 4
bytes. Please show where that is not the case.

OK, you got me on the CHAR(4) stuff. All that time I was thinking about
text, but forgot to be explicit enough. I thought I mentioned Unicode
somewhere... but I'm too lazy to check.


I was talking specifically about char(4).  I did not mention unicode
or other character sets.

And from the start, I was speaking about text.


As a side note, I'm surprised. CHAR does not seem to be in the
standard:

http://jtc1sc32.org/doc/N1951-2000/32N1964T-text_for_ballot-FCD_9075-2.pdf




It is.  See page 151 - CHAR is a reserved word.  Page 177 - CHAR can
be used in place of CHARACTER.

Oh, I did not see it. Thanks.


But if an extra 4-byte key is going to cause you memory problems,
your hardware is already undersized.

Or your program could be too hungry, because you did not take into
account that you have limited hardware.


As I said - your hardware is already undersized.  If adding 4 bytes
to a row is going to cause problems now, you'll have even greater
problems later.

Smaller stuff is often better. It is not a problem of hardware being
undersized.


Smaller is better within limits.  Clarity and portability are more
important, IMHO.

I said often, right?


Yes, and I just said I don't worry about size unless it is a problem.


And here, we are not talking about mere efficiency, but about something
which can make an application completely unusable, with "random" errors.


Not at all.  Again, it's a matter of understanding the language you
are using.  Different languages have different limitations.

So it must be that C's limits are not fixed firmly enough, because the
sizes of its types can vary according to the hardware (and/or compiler).


Sure.  And you need to understand those limitations.

Indeed. And those are dependent on hardware and compiler, for the C and
C++ languages at least.


No, they are completely dependent on the compiler being used.

And BTW - even back in the days of 16-bit PCs, C compilers still
used 32-bit ints.

I can remember having used Borland Turbo C (I can't remember the
version, nor even whether it supported C++), and its "int" type was a
short: 16 bits. It is when I switched to another (more recent) compiler
that I stopped using int, for this exact reason. And when I discovered
stdint.h, I immediately loved it. No more surprises, no implicitly
defined sizes.


Yes, and long was 32 bits.  Check the C and C++ specs.  The only
thing they say is that short <= int <= long.  There is no reason an int
could not have been 32 bits; it's just the way Borland defined it in
their compiler.

I never said that their behavior was standard or not. I simply explained
why I now try to avoid using short, int and long, which behave
differently depending on the tools, and why I think bringing in stdint.h
was a nice improvement in C99 (which MS does not provide... I won't
comment on that here).
But even with that standard, I have never seen any learning document (or
any lesson) encouraging its use. As with some other very important things
in programming, I had to discover it myself. Not a problem, of course;
when I chose programming as a job, I knew that, but still...
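
(A small sketch of that approach, added as an illustration: the C++11
<cstdint> fixed-width types plus static_assert make the size assumptions
explicit instead of implicit.)

#include <climits>
#include <cstdint>

// These say exactly what they are, on every conforming compiler.
std::int16_t  counter = 0;   // always 16 bits
std::uint32_t key     = 0;   // always 32 bits

// And the remaining assumptions are written down, not guessed:
static_assert( CHAR_BIT == 8, "this code assumes 8-bit bytes" );
static_assert( sizeof( int ) >= 4, "this code assumes int of at least 32 bits" );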


Once again, it's all about knowing your tools. We teach their use, and tell people what the standard is, and how to pick the appropriate type.

But even then it can be a problem. For instance, there are differences between short, int and long in 32 and 64 bit versions of many C and C++ compilers.


So, OK, if you can find a job where every single low-level feature you
require is available through high-level functions/objects, then knowledge
of what you are sitting on is useless. Maybe I am wrong because I
actually am interested in knowing what is behind the screen, and not only
in the screen itself. But still, if you only know about your own stuff,
and the man who will deploy it only knows about his own stuff, won't you
need a third person to allow you to communicate? Which implies a loss of
time.

No, it's called a DESIGN - which, amongst other things, defines the
communications method between the two.  I've been on projects with up
to 100 or so programmers working on various pieces.  But everything
worked because it was properly designed from the outset and every
programmer knew how his/her part fit into the whole program.

I do not think that most programmers work in teams of hundreds of
people. But I may be wrong. I do not know.


I didn't say most did.  I DID say they exist, for large projects.

I never said you said all did. I simply said I do not know the average
size of programmers' teams around the world.
I think, but it is pure guessing based on my small experience, that IT
departments with more than 30 people in R&D are not the most common
situation, and if I am right, then programmers... no, people, have to be
able to interact with other teams which are doing different things. And
to interact, you need to be able to speak the same language.
At least, that is what I have seen in my experience. But, indeed, my CV
is far less impressive than yours, and I would never use it to prove that
I am right.

(Note: we were exactly 6 programmers. There were 3 sysadmins, 1 project
lead, and 3 others for support.)


In most companies, programmers are not part of R&D.  They are there
to support the business.  For instance, in an insurance company, they
write programs to support customers, transactions, various policy
types offered by the company, billing, accounting, payroll - on and
on.  Engineering firms (like the one it sounds like you work for) have
a much higher percentage in R&D, often because they are smaller and use
a lot more packaged solutions to run their businesses, whereas there are
no packaged solutions to support their engineering needs (e.g. ARM
controllers, etc.).

In my last job, when we had something to release, we usually talked
directly with the people who then had to deploy it, to explain to them
some requirements and consequences that were not directly our job as
programmers. Indeed, I was not employed by Microsoft, Google or IBM, but
very far from that; we were fewer than 10 developers.
But now, are most programmers paid by companies with hundreds of
programmers?


In the jobs I've had, the programmers have never had to talk to the
deployers.  Both were given the design details; the programmers wrote
to the design and the deployers generated the necessary scripts to
deploy what the design indicated.  When the programmers were done, the
deployers were ready to install.

Maybe you have only worked in big organizations, or maybe this one was
doing things wrong. But the IT team was quite small, if we only count
sysadmins, developers, project leads and a few other roles: fewer than
15 people.


Even 15 person teams can do it right.  Unfortunately, too many
companies (both big and small) won't do it right.  A major reason why
there are so many bugs out there.  Also a major reason why projects go
over time and over budget.

Sorry, but I do not see what is wrong with having communication between
the various actors of a project.
I have taken several classes on good ways to manage projects, stuff about
ITIL for example, and I cannot remember having learned that this part of
the process was wrong at my last job. Other parts were, and not only a
little, but I do not think I learned that anything was wrong in that
part. But those were only lessons, and I never read the whole ITIL
material.



There's no problem with having communications with small groups of
programmers.  But when you have > 100 programmers working on a
project, you can't have all of them communicating with each other -
nothing would get done.  So you break them up into small teams and
assign each group a piece of the program.  Those people talk together,
and the team
leaders talk together.  It is much more manageable.

But "doing it right" has very little to do with team size.  Rather,
it has to do with management.

<snip>
To be fair, it wasn't all the manager's fault.  He was under a lot of
pressure from above to do better.  But never having been a programmer
himself, he had never seen how a well-oiled operation worked.  But he
never listened to the couple of guys on his team who tried to convince
him of how it should run before I was brought in.

I did not think for a second that it was only his fault. But it is a
common problem from what I can see, even at my young age: some people
think they know what others should do even when they do not know what
those others need.
Computer science is a young science; it will take time for people to
admit that software is not just wind. Giving people enough time to do
their job does not seem to be the rule these days. And not only in
computing.


Computer Science is not young any longer. It was young when I started around 45 years ago; it was maturing 30 years ago. But I would never call it "young" any more.

I could go into a lot about what's going on, but we've strayed way off topic here. I think this is a good time to stop and I won't respond any longer.

Jerry

