Manipulating files from bash


Mark Rogers

22 Sep 2017

4:05 p.m.

There are lots of ways to process text files (eg sed, awk, etc); I'm after suggestions on the best/easiest way to achieve something fairly trivial to describe.

Below is example output from sfdisk -d:

label: dos
label-id: 0xe3f4f21a
device: /dev/sdd
unit: sectors

/dev/sdd1 : start=        8192, size=       85622, type=c
/dev/sdd2 : start=       94208, size=     5521408, type=83
/dev/sdd3 : start=     5615616, size=    25499648, type=83

I want to convert this into a new script to go back into sfdisk but with some changes: I want the start values skipped (so sfdisk will put partition boundaries wherever it thinks is best), I want the last partition not to have a size (ie "rest of disk"), and I don't really need any of the settings at the top.

In fact, a suitable (minimal) script to go into sfdisk would be simply:

,85622,c
,5521408
,;

I don't really mind at this point whether I end up with something more like the first format or minified like the latter. In fact to be honest I'm more interested just in learning the best ways to play with text files like this.

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER


steve-ALUG＠hst.me.uk

22 Sep

4:44 p.m.

On 22/09/17 17:05, Mark Rogers wrote:

...

There are lots of ways to process text files (eg sed, awk, etc); I'm after suggestions on the best/easiest way to achieve something fairly trivial to describe.


sed will probably do what you want. It can edit files in place or write output to a separate file, search and replace, and has wildcards. There's lots to it though, so I suggest a trip to the manual.
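
As a rough sketch of those two modes (the file names demo.txt and demo.out and the scratch directory are made up for illustration, and the "wildcard" here is just a regular expression matching a run of digits):

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched (hypothetical paths).
workdir=$(mktemp -d)

printf 'start=8192, size=85622\n' > "$workdir/demo.txt"

# 1. Search and replace, writing the result to a separate file.
#    [0-9]* matches any run of digits.
sed 's/start=[0-9]*, //' "$workdir/demo.txt" > "$workdir/demo.out"

# 2. The same edit applied in place. Attaching a suffix to -i keeps a backup
#    copy (demo.txt.bak) and is accepted by both GNU and BSD sed.
sed -i.bak 's/start=[0-9]*, //' "$workdir/demo.txt"

cat "$workdir/demo.out"
```

Writing to a separate file first is the safer habit while you are still building up the expression; the in-place form is convenient once you trust it.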

Steve

Huge

10:17 p.m.

On Fri, 2017-09-22 at 17:05 +0100, Mark Rogers wrote:

...

I don't really mind at this point whether I end up with something more like the first format or minified like the latter. In fact to be honest I'm more interested just in learning the best ways to play with text files like this.

perl, of course.

-- Today is Setting Orange, the 46th day of Bureaucracy in the YOLD 3183 I don't have an attitude problem. If you have a problem with my attitude, that's your problem.

Martijn Koster

23 Sep

11:28 a.m.

Typically the way I deal with something like that is hook up simple grep/seds into a few lines, inspecting/massaging the data as we go. For example, to create your minimal format:

---
# write the data file
cat > data <<EOM
label: dos
label-id: 0xe3f4f21a
device: /dev/sdd
unit: sectors

/dev/sdd1 : start=        8192, size=       85622, type=c
/dev/sdd2 : start=       94208, size=     5521408, type=83
/dev/sdd3 : start=     5615616, size=    25499648, type=83
EOM

# we only want lines starting with /dev, and get rid of everything up to and including size=,
# remove the default type=83, and remove the type=
# I built up the command-line, and the -e sections one by one.
grep -E '^/dev/' < data | sed -e 's/.*, size= */,/' -e 's/, type=83//' -e 's/ type=//' > data2

# remove the last line first number, and add a semicolon at the end
# btw, what's that semicolon? Isn't that meant to be a comma?
(head -n -1 data2; tail -n 1 data2 | sed -E -e 's/^,[0-9]+/,/' -e 's/$/;/') > data3
---

and then put it in a script (with 'set -euo pipefail' at the top) and be done. For quick one-off things that's fine, especially if you're going to look at the data before blindly feeding it back to fdisk. It's easy once you know the basics of grep/sed and regular expressions, and oddly satisfying, I find.

In awk, you can combine all of these into a single program (a bit ugly because of the "last line is different" requirement):

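---
# gensub() is a gawk extension; $6 keeps its trailing comma (e.g. "85622,")
/^\/dev\// {
    SIZE=$6
    TYPE=gensub(/type=83/, "", "g", $7)
    TYPE2=gensub(/type=/, "", "g", TYPE)
    # print the previous line only once we know it was not the last one
    if (length(PREV)) {
        print PREV
    }
    PREV=sprintf(",%s%s", SIZE, TYPE2)
}
END {
    # the last partition loses its size and gets the closing semicolon
    print gensub(/^,[0-9]+,/, ",;", "g", PREV)
}
---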

In both approaches, there is lots of scope for errors. It could be that the grep picked up more than it should (eg if you forgot the ^), or one of the sed changes didn't actually change anything. And what if the output of sfdisk changes in a future version (mine shows "Id" instead of "type")? Or what if on some systems it produces extra output you don't cope with? What if you didn't pass the device to sfdisk, and you end up with conflicting data from partitions for multiple disks? sfdisk prints some fixed-width right-aligned columns; what happens when those numbers get really large, do the columns run into each other?

So if you want to make something more robust, you'd have to add a bunch of extra checking. You can do that by splitting up the commands more and adding verification steps, and/or by checking the produced output to ensure it has the correct format and the desired number of lines.
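
A check of the produced output along those lines might look like the sketch below. The helper name check_script and its two arguments are invented for this example, and the pattern only covers the minimal ",SIZE", ",SIZE,TYPE" and ",;" lines discussed in this thread, not everything sfdisk accepts.

```shell
#!/bin/sh
# check_script FILE EXPECTED_LINES  (hypothetical helper, not part of sfdisk)
# Succeeds only if FILE has exactly EXPECTED_LINES lines and every line
# looks like ",SIZE", ",SIZE,TYPE" or the final ",;".
check_script() {
    file=$1
    expected=$2

    # Count the lines actually produced.
    actual=$(wc -l < "$file" | tr -d ' ')
    if [ "$actual" -ne "$expected" ]; then
        echo "expected $expected lines, got $actual" >&2
        return 1
    fi

    # grep -v prints the lines that do NOT match the pattern and succeeds
    # if it found any, so success here means something unexpected got in.
    if grep -Ev '^,([0-9]+(,[0-9a-f]+)?|;)$' "$file" >&2; then
        echo "unexpected line(s) in $file" >&2
        return 1
    fi
}

# Usage: a well-formed three-partition script passes the check.
good=$(mktemp)
printf ',85622,c\n,5521408\n,;\n' > "$good"
check_script "$good" 3 && echo "looks sane"
```

Running the check between generating the file and feeding it to sfdisk means a changed input format shows up as a loud failure rather than a surprising partition table.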

It helps if you can produce better input in the first place. In your case, have a look at e.g. `sudo partx --noheadings --bytes --show --output SIZE,TYPE /dev/sda`, which prints just the info you are looking for, in 2 columns, making it easier to parse.

If you find you need lots of extra checking, or lots of complex logic, or need to be portable (macOS and BSD have subtle differences in grep/sed/awk), it may be easier to do it in a more general programming language (python, perl, or whatever) where you can fully parse the text file (with error checking) into an internal representation, and then generate the desired output from that representation; that way you can enforce correctly formatted output. Often that ends up more readable/maintainable. And with some luck, there may be existing libraries (https://www.linuxvoice.com/issues/005/pyparted.pdf) to help you do what you need to do. But it's more skills to learn, more effort, and comes with its own set of issues (version differences, having to set up environments for libraries etc).

— Martijn


Mark Rogers

25 Sep

9:27 a.m.

On 23 September 2017 at 12:28, Martijn Koster mak-alug@greenhills.co.uk wrote:

...

Typically the way I deal with something like that is hook up simple grep/seds into a few lines, inspecting/massaging the data as we go. For example, to create your minimal format: [.. big snip of really useful detailed info ..]

Thanks for that, I was already leaning towards sed (because I use it a lot but only in basic ways so it makes sense to expand my knowledge of it) and that has helped a huge amount.

awk is something I used to use a lot (getting on for 20 years ago) but I'd have to think very hard to remember much from back then now.

...

# remove the last line first number, and add a semicolon at the end # btw, what's that semicolon? Isn't that meant to be a comma?

The semicolon came from Googling for sfdisk scripting examples. Stupidly I started there rather than "man sfdisk" which actually explains things pretty well, and from a quick read of that (a quick read is all you need when playing around with partitions right?? :-)) whitespace, commas and semicolons are all equivalent separators.

Having read the docs I'll go for something much closer to the output generated by "sfdisk -d", massively simplifying the process for sed.

However, the reason for asking was a more broad general query as to the best way to do things that are more complex. Stream editing is fine when what you want out is broadly similar to what you put in, but what about when the structure changes completely and values from near the end of the input need to end up near the start of the output, etc? What would be the best (generic) way to parse the input, capture a few fields, and then generate a new output?
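
One possible approach to that kind of restructuring, sketched here with plain awk rather than anything proposed in the thread, is to store the captured fields in arrays while reading and print nothing until the END block, at which point the output order is entirely free. The report layout and the variable names (dump, report, name, size, type) are invented for the example; the input is the sfdisk -d dump from the start of the thread.

```shell
#!/bin/sh
# Sample input: the sfdisk -d dump quoted at the start of the thread.
dump='label: dos
label-id: 0xe3f4f21a
device: /dev/sdd
unit: sectors

/dev/sdd1 : start=        8192, size=       85622, type=c
/dev/sdd2 : start=       94208, size=     5521408, type=83
/dev/sdd3 : start=     5615616, size=    25499648, type=83'

# Parse everything first, print only in END, so the output order is free.
report=$(printf '%s\n' "$dump" | awk '
    $1 == "device:" { device = $2 }
    /^\/dev\// {
        n++
        name[n] = $1
        size[n] = $6; sub(/,$/, "", size[n])      # strip the trailing comma
        type[n] = $7; sub(/^type=/, "", type[n])  # keep only the type code
        total += size[n]
    }
    END {
        # The total is only known after the last line, yet is printed first.
        printf "%s: %d partitions, %d sectors in use\n", device, n, total
        # Partitions are listed in reverse order to show the order is ours.
        for (i = n; i >= 1; i--)
            printf "%s type=%s size=%s\n", name[i], type[i], size[i]
    }
')
printf '%s\n' "$report"
```

The same parse-then-generate shape carries over to python or perl if the logic outgrows awk.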


Mark Rogers

10:19 a.m.

On 25 September 2017 at 10:27, Mark Rogers mark@more-solutions.co.uk wrote:

...

Having read the docs I'll go for something much closer to the output generated by "sfdisk -d", massively simplifying the process for sed.

Here's what I end up with:

sudo sfdisk -d /dev/sdd | \
  sed -E '/^(label(-id)?|device):/d # Remove unwanted headers
    ; /^$/d # Remove blank lines
    ; s#^/dev/[a-z0-9]+ *: *## # Remove device prefixes
    ; s# *= *#=#g # Remove whitespace around = signs
    ; s#start=[0-9]+,## # Remove start values
    ; $ s#size=[0-9]+,## # Remove last size value
  '

(Apologies for any wrapping!)

The result is:

unit: sectors
size=85622, type=c
size=5521408, type=83
type=83

The "$" trick to apply an expression to the last line of the input only was a new one on me, as was EOL commenting within the script.
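
For anyone else who hasn't met the "$" address, here is a cut-down sketch on made-up three-line input. It differs slightly from the script above: the comments sit on their own lines (which should be safe in any sed), and the last expression also swallows the space after the comma.

```shell
#!/bin/sh
# Three input lines; only the last one should lose its size= field.
result=$(printf 'size= 1, type=c\nsize= 2, type=83\nsize= 3, type=83\n' |
    sed -E '
        # No address: this substitution runs on every line.
        s# *= *#=#g
        # "$" address: this substitution runs on the last line only.
        $ s#size=[0-9]+, *##
    ')
printf '%s\n' "$result"
```

This should print the first two lines with their size= fields tidied up and the last line as just type=83.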

