...making Linux just a little more fun!

compressed issues of LG

Minh Nguyen [nguyenminh2 at gmail.com]


Fri, 2 Nov 2007 14:13:10 +1100

So far, issues of LG have been compressed using tar and gzip. Is there any intention to use tar with bzip2 for future issues? Since most of the files in each issue are text files, bzip2 is more efficient (in terms of the size of the compressed file) than gzip. Here is a comparison of bzip2 and gzip using the current issue; i.e. November 2007 (#144):

1028042 lg-144.tar.bz2
1045337 lg-144.tar.gz

IMHO, providing a bzip2 compressed format of LG issues would save some download time.

Regards

Minh Van Nguyen




Ramon van Alteren [ramon at forgottenland.net]


Fri, 02 Nov 2007 09:41:34 +0100

Minh Nguyen wrote:

> So far, issues of LG have been compressed using tar and gzip. Is there
> any intention to use tar with bzip2 for future issues? Since most of
> the files in each issue are text files, bzip2 is more efficient (in
> terms of the size of the compressed file) than gzip. Here is a
> comparison of bzip2 and gzip using the current issue; i.e. November
> 2007 (#144):
>
> 1028042 lg-144.tar.bz2
> 1045337 lg-144.tar.gz

That is a size decrease of less than 2%.
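
For anyone who wants to check the arithmetic, here is a minimal sketch using the two byte counts quoted above. The helper name pct_saved is made up for this example; it is not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# pct_saved BIG SMALL: print how much smaller SMALL is than BIG, in percent.
# (pct_saved is a hypothetical helper name used only for illustration.)
pct_saved() {
    awk -v big="$1" -v small="$2" \
        'BEGIN { printf "%.2f\n", (big - small) * 100 / big }'
}

# Sizes in bytes, as quoted above for issue #144: gzip first, then bzip2.
pct_saved 1045337 1028042
```

With those two sizes this prints 1.65, i.e. the bzip2 archive is a bit under 2% smaller than the gzip one.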

Best regards,

Ramon




Ben Okopnik [ben at linuxgazette.net]


Fri, 2 Nov 2007 09:02:47 -0400

On Fri, Nov 02, 2007 at 02:13:10PM +1100, Minh Nguyen wrote:

> So far, issues of LG have been compressed using tar and gzip. Is there
> any intention to use tar with bzip2 for future issues? Since most of
> the files in each issue are text files, bzip2 is more efficient (in
> terms of the size of the compressed file) than gzip. Here is a
> comparison of bzip2 and gzip using the current issue; i.e. November
> 2007 (#144):
> 
> 1028042 lg-144.tar.bz2
> 1045337 lg-144.tar.gz
> 
> IMHO, providing a bzip2 compressed format of LG issues would save some
> download time.

As I recall, we had a similar discussion here in TAG quite a while back (digging through my 'Sent_mail' says 2002 - but I can't find it in LG. Annoying, that.) In any case, here's the comparison that I ran then:

	OK, I'm the curious type... Here's a bunch of files from many walks of
	life; let's see who does what.
 	
	-rw-r--r--    1 ben      ben       1474560 May 20 05:51 test.bin
	-rw-rw-r--    1 ben      ben        102970 Sep 19  2000 test.bmp
	-rw-rw-r--    1 ben      ben        121880 Sep 19  2000 test.gif
	-rw-rw----    1 ben      ben        939783 Jun 17 15:29 test.jpg
	-rw-r--r--    1 ben      ben       1727320 Oct  6 15:51 test.mov
	-rw-r--r--    1 ben      ben       1048576 Oct 16 20:59 test.nulls
	-rw-r--r--    1 ben      ben       1048576 Oct 16 21:03 test.ones
	-rw-r--r--    1 ben      ben        490765 Sep  1  2001 test.pbm
	-rw-r--r--    1 ben      ben        197029 Oct 12 13:53 test.ps
	-rw-rw-r--    1 ben      ben       1995119 May 29  2001 test.txt
	-rw-r--r--    1 ben      ben      36354922 Oct 16 20:29 test.wav
 	
	# So then, I was like, "Dude, check out some of this stuff:"
 	
	rar a ../rar.rar *      # Very slow
	zip ../zip.zip *
	tar czf ../tgz.tgz *    # Uses gzip as compressor
	tar cjf ../tbz2.tbz2 *  # Uses bz2 as compressor, slowest of all
	tar cf - * | compress > ../Z.Z  # Classic LZW 'compress'
 	
	# And the winnah and champeen is...
 	
	-rw-r--r--    1 ben      ben      26653542 Oct 16 21:09 rar.rar
	-rw-r--r--    1 ben      ben      33171830 Oct 16 21:26 tbz2.tbz2
	-rw-r--r--    1 ben      ben      36128937 Oct 16 21:10 zip.zip
	-rw-r--r--    1 ben      ben      36132733 Oct 16 21:14 tgz.tgz
	-rw-r--r--    1 ben      ben      43458125 Oct 16 21:21 Z.Z
	
	I'll be darned. Looks like "rar" is it. Whodathunk?

Unfortunately, the only method that shows appreciable savings in size - 'rar', that is - uses a proprietary algorithm.
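
A comparison like this is easy to rerun on other files. What follows is only a rough sketch, not the procedure used in 2002: it streams a single file through whichever of gzip, bzip2 and compress happen to be installed and counts the output bytes. The function names compressed_size and compare_compressors are invented for this example.

```shell
#!/bin/sh
# compressed_size COMPRESSOR FILE: print the number of bytes COMPRESSOR emits
# when FILE is streamed through it. COMPRESSOR is run as "COMPRESSOR -c",
# reading stdin and writing stdout, which gzip, bzip2 and compress all accept.
# (compressed_size and compare_compressors are hypothetical helper names.)
compressed_size() {
    "$1" -c < "$2" | wc -c | tr -d ' '
}

# compare_compressors FILE: one output line per compressor that is installed.
compare_compressors() {
    for c in gzip bzip2 compress; do
        command -v "$c" >/dev/null 2>&1 || continue   # skip missing tools
        printf '%-9s %s\n' "$c" "$(compressed_size "$c" "$1")"
    done
}
```

Calling, say, "compare_compressors test.txt" prints one size per available tool. Results depend heavily on the input, so a single file says little about a mixed archive like the one above.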

Given that there's no appreciable gain to be had by changing, and that a change may cause problems (e.g., it would break any automated scripts that download and decompress the monthly archives), I don't see it changing any time soon. I'm usually pretty reluctant to change things like this without a really compelling reason.
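
To illustrate the breakage concern: a download script that hard-codes "tar xzf" would fail on a .tar.bz2 file. A script could guard against that by choosing the tar flag from the file extension, roughly as sketched here. This is an assumption about how such a script might be written, not a description of any script actually in use; tar_flag_for and extract_issue are invented names.

```shell
#!/bin/sh
# tar_flag_for ARCHIVE: print the tar decompression flag that matches the
# file extension; fail (status 1) for anything unrecognized.
# (tar_flag_for and extract_issue are hypothetical names for illustration.)
tar_flag_for() {
    case "$1" in
        *.tar.gz|*.tgz)          echo z ;;
        *.tar.bz2|*.tbz2|*.tbz)  echo j ;;
        *.tar)                   echo "" ;;
        *)                       return 1 ;;
    esac
}

# extract_issue ARCHIVE: unpack ARCHIVE into the current directory.
extract_issue() {
    flag=$(tar_flag_for "$1") || {
        echo "unknown archive type: $1" >&2
        return 1
    }
    tar "x${flag}f" "$1"
}
```

Usage would be along the lines of "extract_issue lg-144.tar.gz". Many newer versions of GNU tar also detect the compression on their own when extracting, but scripts written against older versions cannot rely on that.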

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 2 Nov 2007 19:54:17 +0530

Hello,

On Fri, 02 Nov 2007, Ben Okopnik wrote:

> 	# And the winnah and champeen is...
> 	
> 	-rw-r--r--    1 ben      ben      26653542 Oct 16 21:09 rar.rar
> 	-rw-r--r--    1 ben      ben      33171830 Oct 16 21:26 tbz2.tbz2
> 	-rw-r--r--    1 ben      ben      36128937 Oct 16 21:10 zip.zip
> 	-rw-r--r--    1 ben      ben      36132733 Oct 16 21:14 tgz.tgz
> 	-rw-r--r--    1 ben      ben      43458125 Oct 16 21:21 Z.Z
> 	
> 	I'll be darned. Looks like "rar" is it. Whodathunk? 

You should've tried "7zip".

Regards,

Kapil.




Breen Mullins [breen.mullins at gmail.com]


Fri, 2 Nov 2007 07:58:58 -0700

* Kapil Hari Paranjape <kapil@imsc.res.in> [2007-11-02 19:54 +0530]:

>
>You should've tried "7zip". 
>
That would've been quite a trick in 2002...

Breen

-- 
Breen Mullins
Menlo Park, California



Ben Okopnik [ben at linuxgazette.net]


Fri, 2 Nov 2007 11:09:00 -0400

On Fri, Nov 02, 2007 at 07:58:58AM -0700, Breen Mullins wrote:

> * Kapil Hari Paranjape <kapil@imsc.res.in> [2007-11-02 19:54 +0530]:
> 
> >
> >You should've tried "7zip". 
> >
> That would've been quite a trick in 2002...

I was wondering about that. Like I said, I only heard about it much later - and it was being touted as a brand-new widget then.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Ben Okopnik [ben at linuxgazette.net]


Fri, 2 Nov 2007 11:08:06 -0400

On Fri, Nov 02, 2007 at 07:54:17PM +0530, Kapil Hari Paranjape wrote:

> Hello,
> 
> On Fri, 02 Nov 2007, Ben Okopnik wrote:
> > 	# And the winnah and champeen is...
> > 	
> > 	-rw-r--r--    1 ben      ben      26653542 Oct 16 21:09 rar.rar
> > 	-rw-r--r--    1 ben      ben      33171830 Oct 16 21:26 tbz2.tbz2
> > 	-rw-r--r--    1 ben      ben      36128937 Oct 16 21:10 zip.zip
> > 	-rw-r--r--    1 ben      ben      36132733 Oct 16 21:14 tgz.tgz
> > 	-rw-r--r--    1 ben      ben      43458125 Oct 16 21:21 Z.Z
> > 	
> > 	I'll be darned. Looks like "rar" is it. Whodathunk? 
> 
> You should've tried "7zip". 

I recall finding out about and playing with 7zip well after this discussion; I don't recall being particularly impressed with it one way or another. Looking at it now, one reason, at least, stands out:

From the man page:

 Backup and limitations
       DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
        - 7-zip does not store the owner/group of the file.

Compression-wise, using my 'Sent_mail' archive (I've trimmed the output for readability):

ben@Tyr:/tmp/t$ time tar cvzf Sent_mail.tgz Sent_mail 
real    0m34.554s
ben@Tyr:/tmp/t$ time tar cvjf Sent_mail.tbz Sent_mail 
real    1m52.085s
ben@Tyr:/tmp/t$ time tar cvZf Sent_mail.tar.Z Sent_mail 
real    0m47.239s
ben@Tyr:/tmp/t$ time tar cvf - Sent_mail | 7zr a -si Sent_mail.7z
real    2m34.064s
ben@Tyr:/tmp/t$ ls -lS
total 551944
-rw-r--r-- 1 ben ben 162769893 2007-11-02 10:40 Sent_mail
-rw-r--r-- 1 ben ben 128435455 2007-11-02 10:56 Sent_mail.tar.Z
-rw-r--r-- 1 ben ben  96948867 2007-11-02 10:52 Sent_mail.tgz
-rw-r--r-- 1 ben ben  92754358 2007-11-02 10:54 Sent_mail.tbz
-rw-r--r-- 1 ben ben  83686986 2007-11-02 11:00 Sent_mail.7z

Yep, "7zip" is smallest (I don't have "rar" anymore; won't be using proprietary software for key LG functions anyway. :) It's also slowest, by a large margin. TANSTAAFL, I guess.
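
To put the listing above in relative terms, here is a small sketch that expresses each archive as a percentage of the original 'Sent_mail' size. The byte counts are copied from the 'ls -lS' output; the helper name size_ratio is made up for this example.

```shell
#!/bin/sh
# size_ratio ORIGINAL COMPRESSED: compressed size as a percentage of the
# original size. (size_ratio is a hypothetical helper name.)
size_ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", c * 100 / o }'
}

orig=162769893   # Sent_mail, from the listing above

# Each entry is "label:bytes", taken from the same listing.
for entry in tar.Z:128435455 tgz:96948867 tbz:92754358 7z:83686986; do
    printf '%-6s %s%%\n' "${entry%%:*}" "$(size_ratio "$orig" "${entry#*:}")"
done
```

This prints 78.9% for compress, 59.6% for gzip, 57.0% for bzip2 and 51.4% for 7z. By the 'real' times shown above, that last step from gzip to 7z cost roughly four and a half times the wall-clock time (2m34s versus 34.5s).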

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
