[ALUG] Manipulating files from bash

Martijn Koster mak-alug at greenhills.co.uk
Sat Sep 23 12:28:01 BST 2017

Typically the way I deal with something like that is to chain simple grep/sed commands over a few lines,
inspecting/massaging the data as I go. For example, to create your minimal format:

# write the data file
cat > data <<EOM
label: dos
label-id: 0xe3f4f21a
device: /dev/sdd
unit: sectors

/dev/sdd1 : start=        8192, size=       85622, type=c
/dev/sdd2 : start=       94208, size=     5521408, type=83
/dev/sdd3 : start=     5615616, size=    25499648, type=83
EOM

# we only want lines starting with /dev, and get rid of everything up to and including size=,
# remove the default type=83, and remove the type= label
# I built up the command line, and the -e sections, one by one.
grep '^/dev/' data | sed -e 's/.*, size= */,/' -e 's/, type=83//' -e 's/ type=//' > data2

# remove the first number from the last line, and add a semicolon at the end
# btw, what's that semicolon? Isn't that meant to be a comma?
(head -n -1 data2; tail -n 1 data2 | sed -E -e 's/^,[0-9]+/,/' -e 's/$/;/') > data3

and then put it in a script (with 'set -euo pipefail' at the top) and be done.
For quick one-off things that's fine, especially if you're going to look at the data before blindly feeding it back to fdisk.
It's easy once you know the basics of grep/sed and regular expressions, and oddly satisfying, I find.
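
To illustrate the "put it in a script" step, here is a minimal sketch that wraps the two commands above in one function.
The function name to_minimal is made up for this example, and it assumes bash (for pipefail) and GNU head
(for 'head -n -1'). Treat it as one possible arrangement, not the definitive one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads `sfdisk -d` output on stdin and
# prints the minimal ",size[,type]" format on stdout.
to_minimal() {
  local body
  # Keep only the partition lines, drop everything up to and including
  # "size=", drop the default type=83, and drop the "type=" label.
  body=$(grep '^/dev/' \
    | sed -e 's/.*, size= */,/' -e 's/, type=83//' -e 's/ type=//')

  # Every line except the last is printed unchanged.
  printf '%s\n' "$body" | head -n -1
  # The last line loses its size and gains a trailing semicolon.
  printf '%s\n' "$body" | tail -n 1 | sed -E -e 's/^,[0-9]+/,/' -e 's/$/;/'
}
```

You would call it as e.g. `to_minimal < data > data3`. Because of 'set -euo pipefail', the script stops with an
error if grep finds no /dev lines, instead of carrying on silently.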

In awk (gawk, since gensub is a GNU extension), you can combine all of these into a single program (a bit ugly because of the "last line is different" requirement):

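/^\/dev\// {
  SIZE=$6   # size field including its trailing comma, e.g. "85622,"
  TYPE=gensub(/type=83/, "", "g", $7)
  TYPE2=gensub(/type=/, "", "g", TYPE)
  # print the previous line only once we know it wasn't the last one
  if (length(PREV)) {
    print PREV
  }
  PREV=sprintf(",%s%s", SIZE, TYPE2)
}
END { print gensub(/^,[0-9]+,/, ",;", "g", PREV) }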

In both approaches, there is lots of scope for errors. It could be that the grep picked up more than it should (e.g. if you forgot the ^),
or one of the sed changes didn't actually change anything. And what if the output of sfdisk changes in a future
version (mine shows "Id" instead of "type")? Or what if on some systems it produces extra output you don't cope with?
What if you didn't pass the device to sfdisk, and you end up with conflicting data from partitions for multiple disks?
sfdisk prints some fixed-width right-aligned columns; what happens when those numbers get really large, do the columns run into each other?
So if you want to make something more robust, you'd have to add a bunch of extra checking. You can do that
by splitting up the commands more and adding verification steps, and/or by checking the produced output to ensure
it has the correct format and desired number of lines.
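
The kind of output check mentioned above could look something like this sketch. The function name check_minimal
and the exact rules (every line but the last is ",size" or ",size,hextype", and the last line is ",;") are my
assumptions based on the minimal format in this thread, not something sfdisk itself mandates. It again assumes GNU head.

```shell
# Hypothetical checker: $1 = generated script file, $2 = expected number of lines.
# Returns 0 if the file looks sane, 1 (with a message on stderr) otherwise.
check_minimal() {
  local file=$1 expected=$2 count bad
  count=$(wc -l < "$file")
  if [ "$count" -ne "$expected" ]; then
    echo "expected $expected lines, got $count" >&2
    return 1
  fi
  # Count the lines (all but the last) that are NOT ",size" or ",size,hextype".
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  bad=$(head -n -1 "$file" | grep -Evc '^,[0-9]+(,[0-9a-fA-F]+)?$' || true)
  if [ "$bad" -ne 0 ]; then
    echo "$bad line(s) with unexpected format" >&2
    return 1
  fi
  # The last line must be the "rest of disk" entry.
  if [ "$(tail -n 1 "$file")" != ",;" ]; then
    echo "last line is not ',;'" >&2
    return 1
  fi
}
```

For example, `check_minimal data3 "$(grep -c '^/dev/' data)"` compares the result against the number of partition
lines in the original dump before anything gets fed back to sfdisk.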

It helps if you can produce better input in the first place. In your case, have a look at e.g.
 `sudo partx --noheadings --bytes --show --output SIZE,TYPE /dev/sda`
which prints just the info you are looking for, in 2 columns, making it easier to parse.

If you find you need lots of extra checking, or lots of complex logic, or need to be portable (macOS and the BSDs have subtle
differences in grep/sed/awk), it may be easier to do it in a more general programming language (python, perl, or whatever) where you can
fully parse the text file (with error checking) into an internal representation, and then generate the desired output from that
representation; that way you can enforce correctly formatted output. Often that ends up more readable/maintainable.
And with some luck, there may be existing libraries (https://www.linuxvoice.com/issues/005/pyparted.pdf)
to help you do what you need to do. But it's more skills to learn, more effort, and comes with its own set of issues
(version differences, having to set up environments for libraries etc).

— Martijn

> On 22 Sep 2017, at 17:05, Mark Rogers <mark at more-solutions.co.uk> wrote:
> There are lots of ways to process text files (eg sed, awk, etc); I'm
> after suggestions on the best/easiest way to achieve something fairly
> trivial to describe.
> Below is example output from sfdisk -d:
>   label: dos
>   label-id: 0xe3f4f21a
>   device: /dev/sdd
>   unit: sectors
>   /dev/sdd1 : start=        8192, size=       85622, type=c
>   /dev/sdd2 : start=       94208, size=     5521408, type=83
>   /dev/sdd3 : start=     5615616, size=    25499648, type=83
> I want to convert this into a new script to go back into sfdisk but
> with some changes; I want the start values skipped (so sfdisk will put
> partition boundaries wherever it thinks is best), I want the last
> sector not have a size (ie "rest of disk"), and I don't really need
> any of the settings at the top.
> In fact, a suitable (minimal) script to go into sfdisk would be simply:
>  ,85622,c
>  ,5521408
>  ,;
> I don't really mind at this point whether I end up with something more
> like the first format or minified like the latter. In fact to be
> honest I'm more interested just in learning the best ways to play with
> text files like this.
> -- 
> Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450
> Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER