
Re: Extracting individual files or directories from XYZ.tar.xz - Possible?



On Sat, Aug 9, 2025 at 2:50 AM Vincent Lefevre <vincent@vinc17.net> wrote:
> On 2025-08-09 01:30:52 -0700, Michael Paoli wrote:
> > < tar.xz | xz -d | tar tf -
>
> With tar utilities that support xz (like GNU tar), not using "xz -d"
> could be more efficient as "xz -d" will uncompress the whole file
> while this may not be necessary:
>
>   tar tf file.tar.xz
>
> is sufficient. This may allow one to skip xz blocks if the archive
> contains big files. That said, I don't know whether GNU tar has
> such an optimization.
I rather doubt any tar implementation has such an optimization.
I don't think there's any tar format that has an "index" or the like.
A tar archive is basically just a sequence of members: for each file
(of any type), a header specific to that file, followed by any
associated data padded out to a multiple of 512 bytes, and then, in the
POSIX formats, an end-of-archive marker (two zero-filled 512-byte
blocks) at the very end. I think that's essentially it.
And if compression is used, it adds the same kind of layer the
compression program alone would: an additional header, the compressed
data (organized however that format does it), and, for most formats,
some type of end marker, and that's it. So, even if tar is asked to
restore a single file, and has gotten to the point in the archive where
it has extracted that file, that doesn't mean it can quit at that
point. Alas, the same file/pathname may appear again later in the
archive, with the same or differing data.
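A small sketch of that layout, assuming GNU tar: its -R (--block-number) option prefixes each listed entry with the 512-byte block where that entry's header starts. The file names and scratch directory are made up for illustration. Since each file here fits in a single data block, first should be reported at block 0 and second at block 2: header, data, header, data, with no index anywhere.

```sh
# Sketch, assuming GNU tar: -R (--block-number) prints the 512-byte
# block at which each member's header starts.
set -eu
cd "$(mktemp -d)"

printf 'a\n' > first     # 2 bytes of data -> 1 data block
printf 'b\n' > second

tar -cf demo.tar first second

# No index to consult: tar has to step from header to header, so
# "second" is found only after getting past "first".
tar -tRf demo.tar
```

For a .tar.xz, GNU tar pipes the archive through an external xz and does the same header-to-header walk over the decompressed stream.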

So, e.g.:
$ tar -tf tar
f
this_could_be_a_huge_file
f
$ tar -tvf tar
-rw------- michael/users     0 2025-08-09 19:20 f
-rw------- michael/users     0 2025-08-09 19:20 this_could_be_a_huge_file
-rw------- michael/users     1 2025-08-09 19:24 f
$
Exact same pathname f, two different sets of contents and mtimes.
So, after tar has read past the first f and into the next file, it
doesn't know whether that pathname will appear again, with the same or
differing content. So in almost all circumstances it will continue,
in fact reading through the entire archive.
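One way to end up with an archive like the one in the transcript above, as a sketch assuming GNU tar: create it with f and the big file, change f, then append the new f with -r. The "huge" file is left empty here just to keep things small. Asking for only f afterwards still reports both occurrences, because tar keeps reading to the end.

```sh
# Sketch, assuming GNU tar. Names mirror the transcript; the "huge"
# file is empty here only to keep the example small.
set -eu
cd "$(mktemp -d)"

: > f                             # first version of f: 0 bytes
: > this_could_be_a_huge_file
tar -cf tar f this_could_be_a_huge_file

printf 'x' > f                    # f changes: now 1 byte
tar -rf tar f                     # -r appends the new f at the end

tar -tf tar                       # f, this_could_be_a_huge_file, f

# Requesting just f: tar still scans the whole archive and
# lists both occurrences.
tar -tf tar f
```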
But some versions of tar have an option to shortcut that.
E.g. bsdtar has the -q (--fast-read) option, which extracts or lists only
the first entry matching each given name and exits as soon as every name
has been matched. GNU tar has something similar, --occurrence[=N], which
processes only the Nth occurrence of each named file (N defaults to 1).
So if there are 3 occurrences of the same pathname, one can request
extracting only the 2nd occurrence. And at a lower level, libraries have
access to each entry individually, so they can be asked to handle that
however is needed.
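A sketch of picking one particular occurrence, assuming GNU tar and its --occurrence[=N] option (only valid together with file names on the command line). It rebuilds an archive shaped like the transcript's (f, this_could_be_a_huge_file, f) in a scratch directory and then extracts one chosen copy of f at a time.

```sh
# Sketch, assuming GNU tar. Rebuilds an archive like the transcript's,
# then extracts one chosen occurrence of f at a time.
set -eu
cd "$(mktemp -d)"

: > f
: > this_could_be_a_huge_file
tar -cf tar f this_could_be_a_huge_file
printf 'x' > f
tar -rf tar f

# Only the 1st occurrence of f (the empty one).
rm f
tar --occurrence=1 -xf tar f
wc -c < f

# Only the 2nd occurrence of f (the 1-byte one).
rm f
tar --occurrence=2 -xf tar f
wc -c < f
```

The two wc -c calls should print 0 and then 1.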
See also tar's [-]r (append) option.
Note also that this can be highly practical. E.g. in our example,
suppose f is a small/tiny file, the other file is huge, and we earlier
created the tar archive with the first f in it, then the huge file.
Now f has changed and we want to update the archive, but we don't want
to reread the data of our huge file, or rewrite all that data to the
tar file. Well, we can simply append the current f to the existing tar
archive. And when extracting, at least by default, each occurrence of
f will be extracted, and the last extraction will generally clobber
any earlier ones.
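The update-by-appending idea, as a sketch assuming GNU tar and an uncompressed archive (GNU tar refuses to append to a compressed one, so this does not apply directly to a .tar.xz). The names backup.tar and restore/ are made up for illustration.

```sh
# Sketch, assuming GNU tar and an uncompressed archive.
# backup.tar and restore/ are hypothetical names.
set -eu
cd "$(mktemp -d)"

printf 'old\n' > f
printf 'pretend this is huge\n' > this_could_be_a_huge_file
tar -cf backup.tar f this_could_be_a_huge_file

# f changes: append only the new f; the big file's data is not
# reread and the existing members are not rewritten.
printf 'new\n' > f
tar -rf backup.tar f

# Extract everything into a fresh directory. Both copies of f are
# extracted in archive order, so the appended one ends up on disk.
mkdir restore
tar -xf backup.tar -C restore
cat restore/f
```

Run as is, the final cat prints new.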

