text wrapping regex


Richard Lewis

15 Feb 2005

12:13 p.m.

Hello ALUG,

I wonder if anyone can help with a regular expression (for text wrapping)?

I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.

I can do the first space after the Xth character:

$ echo "A string with quite a lot of words and spaces in it." | sed "s/(.{,X}) /\1\n/g"

Any ideas?

Richard


MJ Ray

15 Feb 2005

2:01 p.m.

Richard Lewis wrote:

...

I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.

I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed "s/(.{,X}) /\1\n/g"

Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.

-- MJR/slef http://mjr.towers.org.uk/
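A minimal sketch of the point above, written in Python rather than sed. The language, the function names and the width of 22 are assumptions for illustration and do not come from the thread. Python's `re` module treats `.{0,X}` greedily, as sed does, so it can show both behaviours: with a bare trailing space the last word is split off onto its own line, and with the `( |$)` alternative it is not. As quoted here, the sed one-liner would also need extended-regex mode (for example GNU `sed -E`) for the unescaped parentheses and braces to act as operators.

```python
import re

TEXT = "A string with quite a lot of words and spaces in it."


def wrap_space_only(text: str, width: int) -> str:
    """Mirror of the quoted pattern: up to `width` characters, then one space."""
    return re.sub(r"(.{0,%d}) " % width, r"\1\n", text)


def wrap_space_or_end(text: str, width: int) -> str:
    """Variant with ( |$): the run may also stop at the end of the string."""
    # {1,width} instead of {0,width}, so the pattern can never match an
    # empty string at the very end of the input.
    return re.sub(r"(.{1,%d})(?: |$)" % width, r"\1\n", text)


print(wrap_space_only(TEXT, 22))
print(wrap_space_or_end(TEXT, 22))
```

The second variant leaves a trailing newline after the last row, which a caller would probably want to strip.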

Richard Lewis

3:43 p.m.

On Tue, 15 Feb 2005 14:00:03 +0000, "MJ Ray" mjr@dsl.pipex.com said:

...

Richard Lewis wrote:

...
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.

I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed "s/(.{,X}) /\1\n/g"

Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.

Yes, that seems to be it. Thanks.

On Tue, 15 Feb 2005 14:06:13 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:

...

Hows about forgetting the regexp and using the right tool for the job?

Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(

Cheers, Richard

Brett Parker

3:52 p.m.

Richard Lewis richardlewis@fastmail.co.uk wrote:

...

...
Richard Lewis wrote:

iDunno@sommitrealweird.co.uk said:

...
Hows about forgetting the regexp and using the right tool for the job?

Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(

Ahh, there must be a neater way to do it than that in XSLT. With a quick google I get to: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html

Which may be of interest to you.

Cheers,

-- Brett Parker web: http://www.sommitrealweird.co.uk/ email: iDunno@sommitrealweird.co.uk

Richard Lewis

4:11 p.m.

On Tue, 15 Feb 2005 15:51:55 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:

...

Richard Lewis richardlewis@fastmail.co.uk wrote:

...
...
Richard Lewis wrote:

iDunno@sommitrealweird.co.uk said:

...
Hows about forgetting the regexp and using the right tool for the job?

Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(

Ahh, there must be a neater way to do it than that in XSLT. With a quick google I get to: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html

Yes, there may be. But ATM it looks like regular expressions (as long as you're using XSLT2) are winning ;-)

Richard Lewis

16 Feb 2005

1:12 p.m.

On Tue, 15 Feb 2005 14:00:03 +0000, "MJ Ray" mjr@dsl.pipex.com said:

...

Richard Lewis wrote:

...
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.

I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed "s/(.{,X}) /\1\n/g"

Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.

Um, on second thoughts, actually I don't think this does work (the difference is quite subtle).

XSLT regex syntax allows:

"* X{n}? matches X, exactly n times
* X{n,}? matches X, at least n times
* X{n,m}? matches X, at least n times, but not more than m times"

of which I'm trying to use the second with the replace() function like this:

replace($text, '(.{n,}?)\s+', '$1&#10;')

[replace any character repeated at least n times (saved as $1) followed by any number and type of spaces with match $1 plus a line break]

(I then split the string on new-line characters to display it using tokenize().)

Um, yeah. This isn't really a question, but if anyone has any thoughts.....

Richard

PS. I'm just having an idea about taking exactly n characters then further processing the result by trying to find the last space......I'm not sure there /is/ a regex which does that.

PPS. prove me wrong!
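A small sketch of why the reluctant "at least n" form feels subtly wrong for wrapping. It uses Python's `re` module as a stand-in for the XSLT 2.0 `replace()` function; the function name and the value n = 22 are assumptions for illustration. Because `.{n,}?` must consume at least n characters before it may stop, each break lands on the first whitespace at or after the nth character, so rows can come out longer than n.

```python
import re

TEXT = "A string with quite a lot of words and spaces in it."


def wrap_lazy_at_least(text: str, n: int) -> list[str]:
    """Break after the first whitespace run that follows at least n characters."""
    return re.sub(r"(.{%d,}?)\s+" % n, r"\1\n", text).split("\n")


rows = wrap_lazy_at_least(TEXT, 22)
for row in rows:
    # Print each row with its length to show which ones overshoot n.
    print(len(row), row)
```

With n = 22 the first row is 25 characters long, which is the "first space after the Xth character" behaviour rather than the last space before it.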

MJ Ray

1:37 p.m.

Richard wrote:

...

XSLT regex syntax allows: "* X{n}? matches X, exactly n times

X{n,}? matches X, at least n times

X{n,m}? matches X, at least n times, but not more than m times"

of which I'm trying to use the second with the replace() function like this: replace($text, '(.{n,}?)\s+', '$1&#10;')

Why did you change from the third type to the second type when moving from sed to XSLT? Your sed example was equivalent to (.{0,X}?)\s+ wasn't it? Changing type will change the meaning of the regex.

-- MJR/slef http://mjr.towers.org.uk/

Richard Lewis

4:55 p.m.

On Wed, 16 Feb 2005 13:36:12 +0000, "MJ Ray" mjr@dsl.pipex.com said:

...

Richard wrote:

...
XSLT regex syntax allows: "* X{n}? matches X, exactly n times

X{n,}? matches X, at least n times

X{n,m}? matches X, at least n times, but not more than m times"

of which I'm trying to use the second with the replace() function like this: replace($text, '(.{n,}?)\s+', '$1&#10;')

Why did you change from the third type to the second type when moving from sed to XSLT? Your sed example was equivalent to (.{0,X}?)\s+ wasn't it? Changing type will change the meaning of the regex.

I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)

I'll keep fiddling.

Richard
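One possible explanation for the single-word result, assuming the trailing `?` was still in the pattern: a reluctant `{1,n}?` stops at the first whitespace it can reach, so every word becomes its own row, whereas the greedy `{1,n}` runs on to the last whitespace within n characters. The sketch below uses Python's `re` module as a stand-in for XSLT 2.0 `replace()`; the helper name and the width of 22 are assumptions for illustration.

```python
import re

TEXT = "A string with quite a lot of words and spaces in it."


def split_rows(pattern: str, text: str) -> list[str]:
    """Replace each match with group 1 plus a newline, then split into rows."""
    return re.sub(pattern, r"\1\n", text).split("\n")


# Reluctant quantifier: matches as few characters as possible before \s+.
lazy = split_rows(r"(.{1,22}?)\s+", TEXT)
# Greedy quantifier: matches as many characters as possible before \s+.
greedy = split_rows(r"(.{1,22})\s+", TEXT)

print(lazy)
print(greedy)
```

The greedy rows stay within 22 characters, although the final word still ends up on a row of its own because nothing follows it for `\s+` to match.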

Brett Parker

6:18 p.m.

Richard Lewis richardlewis@fastmail.co.uk wrote:

...

I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)

I'll keep fiddling.

You could just use the XSLT stylesheets I linked earlier which will tokenise and then split the strings to a certain character length... that link looked fairly useful for those type things... The link was: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html

Which is a mailing list post, but following the links round there should give you an includable stylesheet that you can then just call with some parameters for the formatting.

Cheers,

-- Brett Parker web: http://www.sommitrealweird.co.uk/ email: iDunno@sommitrealweird.co.uk

Richard Lewis

17 Feb 2005

4:03 p.m.

On Wed, 16 Feb 2005 18:17:53 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:

...

Richard Lewis richardlewis@fastmail.co.uk wrote:

...
I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)

I'll keep fiddling.

You could just use the XSLT stylesheets I linked earlier which will tokenise and then split the strings to a certain character length... that link looked fairly useful for those type things... The link was: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html

Which is a mailing list post, but following the links round there should give you an includable stylesheet that you can then just call with some parameters for the formatting.

I had a look through those templates and they're good, but they're designed for use with XSLT 1.0 really. I'm lucky enough to be working with XSLT 2.0 which gives me regular expressions in the form of replace(string, regex, replacement), matches(string, regex) and tokenize(string, onRegex) functions and an analyze-string template. Much of the hard work performed in those templates can be replaced with the tokenize() function.

Anyway, I'm sure you'll all be glad to know I've now found a working solution. The template I'm writing wraps text into an arbitrary polygon for use with transformations to SVG (the current version of which does not support text wrapping). The solution I've come up with first determines the path of the polygon from the given points and then fits rows of conceptual (not visible) horizontal rectangles inside it. The text is then displayed along these rows.

The regular expression came into it because I needed a way of selecting the correct portion of text for each row; i.e. a portion of text which has a length (in pixels) which is less than or equal to the length of the row but which does not allow the last word to be split. I decided that the regex idea wasn't going to work and did it like this:

<xsl:variable name="stringToDisplay">
  <xsl:variable name="maxPortionOfText">
    <xsl:value-of select="substring($remainingText,1,$maxStringWidthInChars)" />
  </xsl:variable>
  <xsl:choose>
    <xsl:when test="string-length($maxPortionOfText) >= string-length($remainingText)">
      <xsl:value-of select="$maxPortionOfText" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:for-each select="tokenize($maxPortionOfText,'\s+')">
        <xsl:if test="position() &lt; last() or position() = 1">
          <xsl:value-of select="." /><xsl:if test="position()!=last()"><xsl:text> </xsl:text></xsl:if>
        </xsl:if>
      </xsl:for-each>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>

Thanks for your help on this.

Cheers, Richard
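The same idea can be sketched outside XSLT. The Python below is an illustrative approximation, not a translation of the stylesheet: the names `next_row` and `wrap`, the up-front whitespace normalisation and the loop over rows are all assumptions added here. It takes the first `max_chars` characters, returns them unchanged if that covers everything left, and otherwise drops the last (possibly cut) token.

```python
import re

TEXT = "A string with quite a lot of words and spaces in it."


def next_row(remaining: str, max_chars: int) -> str:
    """Longest prefix of `remaining` within max_chars that ends on a whole word."""
    head = remaining[:max_chars]
    if len(head) >= len(remaining):
        return head  # everything that is left fits on this row
    tokens = re.split(r"\s+", head)
    if len(tokens) == 1:
        return tokens[0]  # a single long word: keep it rather than emit nothing
    # Drop the last token, which may be a cut word. Unlike the stylesheet,
    # no trailing space is emitted after the last kept word.
    return " ".join(tokens[:-1])


def wrap(text: str, max_chars: int) -> list[str]:
    """Repeatedly peel rows off the text (hypothetical driver loop)."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    remaining = " ".join(text.split())  # collapse runs of whitespace
    rows = []
    while remaining:
        row = next_row(remaining, max_chars)
        rows.append(row)
        remaining = remaining[len(row):].lstrip()
    return rows


print(wrap(TEXT, 22))
```

One consequence of always dropping the last token: when the cut happens to fall exactly at the end of a word, that whole word is still pushed to the next row, so `wrap(TEXT, 21)` starts with "A string with quite" even though "a" would have fitted. That errs on the side of shorter rows.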

Brett Parker

15 Feb 2005

2:07 p.m.

Richard Lewis richardlewis@fastmail.co.uk wrote:

...

Hello ALUG,

I wonder if anyone can help with a regular expression (for text wrapping)?

I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.

I can do the first space after the Xth character:

$ echo "A string with quite a lot of words and spaces in it." | sed "s/(.{,X}) /\1\n/g"

Hows about forgetting the regexp and using the right tool for the job?

$ echo "A string with quite a lot of words and spaces in it." | fmt -w 22
A string with quite a lot of words and spaces in it.
$
$ echo "A string with quite a lot of words and spaces in it." | fmt -w 44
A string with quite a lot of words and spaces in it.
$

fmt is nice and will do what you want.

On debian systems it's in the coreutils package, so should be on every debian system going.

Thanks,

-- Brett Parker web: http://www.sommitrealweird.co.uk/ email: iDunno@sommitrealweird.co.uk
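For readers without fmt to hand, a rough analogue of the "right tool" approach is a library word-wrapper. The sketch below uses Python's standard `textwrap` module as a stand-in; that choice is an assumption made for illustration, and fmt may pick different break points because it does more than fill each line greedily.

```python
import textwrap

TEXT = "A string with quite a lot of words and spaces in it."

# textwrap.fill breaks on whitespace and keeps every line within `width`.
narrow = textwrap.fill(TEXT, width=22)
wide = textwrap.fill(TEXT, width=44)

print(narrow)
print()
print(wide)
```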


main@lists.alug.org.uk
