I have many gigabytes of .gz-compressed log files, generally split into very small files. For example, one file contains only entries from 20180114-205608 to 20180114-205746 (about 90 seconds) and is about 350 KB compressed.
I want to extract these and combine them into one file per day*, compressed.
Log entries are in date/time order within each file, and the filenames sort easily into date/time order too, so I can generate a single date/time-ordered stream of log entries to work from. So I assume what I need is something to pipe that stream into which will look at the date on each line and, whenever it changes, close any existing output file, open a new one (gzipped), and write to it until the date changes again or EOF.
I could write something like that in a scripting language (for my sins, PHP would be easiest; I'm getting better at Python; bash could likely do it too). But given the volume of data, are there any suggestions as to the "right" tool to use? Is this a job for awk or similar?
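To illustrate, here's roughly the sort of thing I have in mind, as a minimal Python sketch. It assumes each log line begins with its YYYYMMDD-HHMMSS timestamp (as in the example above), so the date is just the first eight characters of the line; the output filenames are just an example.

```python
#!/usr/bin/env python3
# Rough sketch: read date/time-ordered log lines on stdin and split them
# into one gzip-compressed file per day.
# Assumes each line starts with a YYYYMMDD-HHMMSS timestamp, so the date
# is the first eight characters; adjust the slice and filenames as needed.
import gzip
import sys

current_date = None
out = None

for line in sys.stdin:
    date = line[:8]              # e.g. "20180114"
    if date != current_date:     # day changed: rotate the output file
        if out is not None:
            out.close()
        out = gzip.open(date + '.log.gz', 'wt')
        current_date = date
    out.write(line)

if out is not None:
    out.close()
```

I'd feed it with something like `zcat *.gz | python3 split-by-day.py` (the glob expands in sorted order, which matches the filenames' date/time order); `split-by-day.py` is just a placeholder name.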
As an aside: the files are all archived on my Linux box, but they're sourced from a Windows box, so a cross-platform solution would let me do this on the host; transferring lots of small files isn't as efficient as transferring a few big ones (although I have archives going back years on my Linux box to work through first).
* I say one file per day, but ideally I could be flexible about the period; a day is the most likely choice and probably the easiest, though.