On 26 January 2018 at 10:47, Mark Rogers mark@more-solutions.co.uk wrote:
Can awk either write directly to gzipped files, or else can the above be modified to pipe through gzip? It's not just the relative efficiency of doing it in one step but also the volume of disk space I'm going to chew up otherwise. I know that PHP can write .gz directly, but that feels like a horrible tool for the job in general.
I've written a Python script to do this (as it can write directly to .gz much as PHP can but doesn't feel like a horrible choice).
I'm currently splitting into one file per hour's data due to their size; each file has around 1,700,000 lines (yes, that's per hour) by the look of it, and I'm averaging 30-45s per file. That works out to around 8 hours to process one month's data. I have maybe 10 years of files to work through (so that'll run for about a month and a half).
The script is below (I'm only really getting to grips with Python, so I'm sure it's not great code). I run it using: $ zcat *.log.gz | ./filter.py
Any comments? For example, would it make more sense to open the .gz files directly in Python rather than piping them in? (I assume that zcat is efficient and so are pipes, and I'm unlikely to achieve anything better myself.)
#!/usr/bin/env python3
import gzip, sys, time
def closeLast():
    # fh, ts and ctr are module-level globals set in the loop below
    if fh:
        td = time.time() - ts
        print(' %d lines, %0.3f seconds, %d lines/sec' % (ctr, td, ctr / td))
        fh.close()

oldFilename = False
fh = False
for l in sys.stdin:
    # the first 11 characters of each line identify the hour it belongs to
    filename = l[0:11]
    if filename != oldFilename:
        closeLast()
        fn = 'filtered/%s.event.log.gz' % filename
        fh = gzip.open(fn, 'wt')
        ts = time.time()
        ctr = 0
        oldFilename = filename
        # end='' so that closeLast() appends the stats to the same line
        print("Logging to " + fn, end='', flush=True)
    ctr += 1
    fh.write(l)
closeLast()
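
For reference, reading the .gz files directly would look something like the untested sketch below: gzip.open in text mode ('rt') yields the same stream of decoded lines that zcat produces, so the pipe could be dropped. The glob pattern and handle_line are placeholders, not part of the script above.

import glob
import gzip

def handle_line(line):
    pass  # the per-line filtering from filter.py would go here

# assumes the inputs match *.log.gz in the current directory
for path in sorted(glob.glob('*.log.gz')):
    # 'rt' decompresses and decodes to text, yielding lines like zcat does
    with gzip.open(path, 'rt') as src:
        for line in src:
            handle_line(line)

Whether that beats zcat through a pipe is an empirical question; zcat's C decompressor feeding a pipe may well be as fast as, or faster than, decompressing in-process.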
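On the write side, if the gzip module's compression turns out to be the bottleneck, one alternative would be to pipe each output file through an external gzip process rather than compressing in-process. A rough, untested sketch (the output path here is hypothetical):

import subprocess

out = open('filtered/example.event.log.gz', 'wb')
# gzip reads from the pipe and writes compressed data to the file
gz = subprocess.Popen(['gzip', '-c'], stdin=subprocess.PIPE, stdout=out)

gz.stdin.write(b'one log line\n')  # bytes, not str, on this pipe

gz.stdin.close()  # lets gzip flush its buffers and exit
gz.wait()
out.close()

No idea whether that's actually faster here; it just moves the compression work into a separate process so it can overlap with the Python loop.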