Open compressed file with gzip zcat perl php lua python

I have a compressed file of:


250.000.000 lines
Compressed the file size is: 671M
Uncompressed, it's: 6,5G

Need to extract a plethora of things and verify some others.

I dont want to use bash but something more elegant, like python or lua.

Looking through “The-Internet”, I’ve created some examples for the single purpose of educating my self.

So here are my results.
BE AWARE they are far-far-far away from perfect in code or execution.

Sorted by (less) time of execution:

pigz

pigz - Parallel gzip - Zlib



# time pigz  -p4 -cd  2016-08-04-06.ldif.gz &> /dev/null

real    0m9.980s
user    0m16.570s
sys 0m0.980s

gzip

gzip 1.8



# time /bin/gzip -cd 2016-08-04-06.ldif.gz &> /dev/null

real    0m23.951s
user    0m23.790s
sys 0m0.150s

zcat

zcat (gzip) 1.8



# time zcat 2016-08-04-06.ldif.gz &> /dev/null

real    0m24.202s
user    0m24.100s
sys 0m0.090s

Perl

Perl v5.24.0

code:



#!/usr/bin/perl

open (FILE, '/bin/gzip -cd 2016-08-04-06.ldif.gz |');

while (my $line = ) {
  print $line;
}

close FILE;

time:


# time ./dump.pl &> /dev/null

real    0m49.942s
user    1m14.260s
sys 0m2.350s

PHP

PHP 7.0.9 (cli)

code:


#!/usr/bin/php

< ? php

  $fp = gzopen("2016-08-04-06.ldif.gz", "r");

  while (($buffer = fgets($fp, 4096)) !== false) {
        echo $buffer;
  }

  gzclose($fp);

 ? >

time:


# time php -f dump.php &> /dev/null

real    1m19.407s
user    1m4.840s
sys 0m14.340s

PHP - Iteration #2

PHP 7.0.9 (cli)

Impressed with php results, I took the perl-approach on code:



< ? php

  $fp = popen("/bin/gzip -cd 2016-08-04-06.ldif.gz", "r");

  while (($buffer = fgets($fp, 4096)) !== false) {
        echo $buffer;
  }

  pclose($fp);

 ? >

time:


# time php -f dump2.php &> /dev/null

real    1m6.845s
user    1m15.590s
sys 0m19.940s

not bad !

Lua

Lua 5.3.3

code:


#!/usr/bin/lua

local gzip = require 'gzip'

local filename = "2016-08-04-06.ldif.gz"

for l in gzip.lines(filename) do
  print(l)
end

time:


# time ./dump.lua &> /dev/null

real    3m50.899s
user    3m35.080s
sys 0m15.780s

Lua - Iteration #2

Lua 5.3.3

I was depressed to see that php is faster than lua!!
Depressed I say !

So here is my next iteration on lua:

code:


#!/usr/bin/lua

local file = assert(io.popen('/bin/gzip -cd 2016-08-04-06.ldif.gz', 'r'))

while true do
        line = file:read()
        if line == nil then break end
        print (line)
end
file:close()

time:


# time ./dump2.lua &> /dev/null

real    2m45.908s
user    2m54.470s
sys 0m21.360s

One minute faster than before, but still too slow !!

Lua - Zlib

Lua 5.3.3

My next iteration with lua is using zlib :

code:



#!/usr/bin/lua

local zlib = require 'zlib'
local filename = "2016-08-04-06.ldif.gz"

local block = 64
local d = zlib.inflate()

local file = assert(io.open(filename, "rb"))
while true do
  bytes = file:read(block)
  if not bytes then break end
  print (d(bytes))
end

file:close()

time:



# time ./dump.lua  &> /dev/null

real    0m41.546s
user    0m40.460s
sys 0m1.080s

Now, that's what I am talking about !!!

Playing with window_size (block) can make your code faster or slower.

Python v3

Python 3.5.2

code:


#!/usr/bin/python

import gzip

filename='2016-08-04-06.ldif.gz'
with gzip.open(filename, 'r') as f:
    for line in f:
        print(line,)

time:


# time ./dump.py &> /dev/null

real    13m14.460s
user    13m13.440s
sys 0m0.670s

Not enough tissues on the whole damn world!

Python v3 - Iteration #2

Python 3.5.2

but wait ... a moment ... The default mode for gzip.open is 'rb'.
(read binary)

let's try this once more with rt(read-text) mode:

code:


#!/usr/bin/python

import gzip

filename='2016-08-04-06.ldif.gz'
with gzip.open(filename, 'rt') as f:
    for line in f:
        print(line, end="")

time:


# time ./dump.py &> /dev/null

real    5m33.098s
user    5m32.610s
sys 0m0.410s

With only one super tiny change and run time in half!!!
But still tooo slow.

Python v3 - Iteration #3

Python 3.5.2

Let's try a third iteration with popen this time.

code:


#!/usr/bin/python

import os

cmd = "/bin/gzip -cd 2016-08-04-06.ldif.gz"
f = os.popen(cmd)
for line in f:
  print(line, end="")
f.close()

time:


# time ./dump2.py &> /dev/null

real    6m45.646s
user    7m13.280s
sys 0m6.470s

Python v3 - zlib Iteration #1

Python 3.5.2

Let's try a zlib iteration this time.

code:



#!/usr/bin/python

import zlib

d = zlib.decompressobj(zlib.MAX_WBITS | 16)
filename='2016-08-04-06.ldif.gz'

with open(filename, 'rb') as f:
    for line in f:
        print(d.decompress(line))

time:


# time ./dump.zlib.py &> /dev/null

real    1m4.389s
user    1m3.440s
sys 0m0.410s

finally some proper values with python !!!

Specs

All the running tests occurred to this machine:


4 x Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
8G RAM

Conclusions

Ok, I Know !

The shell-pipe approach of using gzip for opening the compressed file, is not fair to all the above code snippets.
But ... who cares ?

I need something that run fast as hell and does smart things on those data.

Get in touch

As I am not a developer, I know that you people know how to do these things even better!

So I would love to hear any suggestions or even criticism on the above examples.

I will update/report everything that will pass the "I think I know what this code do" rule and ... be gently with me ;)

PLZ use my email address: evaggelos [ _at_ ] balaskas [ _dot_ ] gr

to send me any suggestions

Thanks !

Tag(s): php, perl, python, lua, pigz

Open compressed file with gzip zcat perl php lua python

pigz

gzip

zcat

Perl

PHP

PHP - Iteration #2

Lua

Lua - Iteration #2

Lua - Zlib

Python v3

Python v3 - Iteration #2

Python v3 - Iteration #3

Python v3 - zlib Iteration #1

Specs

Conclusions

Get in touch

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112