Once you have your ordered stream, with awk you could do e.g.:
awk -F '-' '{f = "split" $1 ".log"; print >> f}' log.txt
which creates split20180114.log, split20180115.log and so forth.
As for best/most efficient: there may be faster ways, but I'd not optimise prematurely; if the simple approach above gets the job done and runs fast enough, I'd stick with it.
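If you'd rather not leave large uncompressed split files lying around, awk can also pipe each day's output straight through gzip. A sketch only, reading the ordered stream on stdin and assuming each line starts with YYYYMMDD-HHMMSS:

  awk -F '-' '{
      out = "gzip >> split" $1 ".log.gz"     # one gzip pipe per day
      if (out != prev) { if (prev != "") close(prev); prev = out }
      print | out
  }'

close() ends the previous day's gzip so you don't accumulate open pipes; because the stream is already ordered, each day's file is written in one go (and if a .log.gz already exists from an earlier run, appending just adds another gzip member, which still decompresses cleanly).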
-- Martijn
On 25 Jan 2018, at 12:00, Mark Rogers <mark@more-solutions.co.uk> wrote:
I have many gigabytes of .gz compressed log files, generally in very small files. For example, one file only contains entries from 20180114-205608 to 20180114-205746 (about 90s), and is about 350k compressed.
I want to extract these and combine them into one file per day*, compressed.
Log file entries are in date/time order, and the filenames sort easily into date/time order too, so I can generate a single stream of log entries in date/time order to work from. So I assume what I need is something I can pipe that stream into that looks at the date on each line and, whenever it changes, closes any existing output file, opens a new (gzipped) one, and writes to it until the date changes again or EOF.
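Something like this would generate that stream (just a sketch: the path is a placeholder, and it assumes the .gz filenames sort lexically into date/time order and contain no spaces):

  find /path/to/logs -name '*.gz' | sort | xargs zcat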
I could write something like that in a scripting language (for my sins, PHP would be easiest; I'm getting better at Python; bash could likely do it too). But given the volume of data, are there any suggestions as to the "right" tool to use? Is this a job for awk or similar?
As an aside: the files are all archived on my Linux box, but they're sourced from a Windows box, so a cross-platform solution would let me do this on the host; transferring lots of small files isn't as efficient as a few big files (although I have archives going back years on my Linux box to work through first).
* I say a file per day, but ideally I could be flexible about the period - day is most likely and probably easiest though.
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER