Hello ALUG,
I wonder if anyone can help with a regular expression (for text wrapping)?
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.
I can do the first space after the Xth character:
$ echo "A string with quite a lot of words and spaces in it." | sed -E "s/(.{,X}) /\1\n/g"
Any ideas?
Richard
Richard Lewis wrote:
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.
I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed -E "s/(.{,X}) /\1\n/g"
Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.
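As a rough illustration of the point above, here is a sketch that compares the two patterns directly. It assumes GNU sed with the -E flag for extended regular expressions; the function names and the width of 22 are made up for the example. A lower bound of 1 is used instead of an empty one so that the pattern can never match a zero-length string.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.
# Assumes GNU sed (-E, and \n understood in the replacement).

# Break at the last space found within the first WIDTH+1 characters of each chunk.
wrap_space() {
  sed -E "s/(.{1,$1}) /\1\n/g"
}

# Same, but end-of-line is also accepted as a break point,
# so the final word is not split off onto its own line.
wrap_space_or_end() {
  sed -E "s/(.{1,$1})( |\$)/\1\n/g"
}

text="A string with quite a lot of words and spaces in it."
printf '%s\n' "$text" | wrap_space 22
printf '%s\n' "$text" | wrap_space_or_end 22
```

With the first function the tail of the text comes out as "spaces in" and "it." on separate lines, because there is no space after the final word. The second keeps "spaces in it." together, though it may leave an empty line at the end since the end of the line is itself replaced by a newline.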
On Tue, 15 Feb 2005 14:00:03 +0000, "MJ Ray" mjr@dsl.pipex.com said:
Richard Lewis wrote:
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.
I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed -E "s/(.{,X}) /\1\n/g"
Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.
Yes, that seems to be it. Thanks.
On Tue, 15 Feb 2005 14:06:13 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:
How about forgetting the regexp and using the right tool for the job?
Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(
Cheers, Richard
Richard Lewis richardlewis@fastmail.co.uk wrote:
Richard Lewis wrote:
iDunno@sommitrealweird.co.uk said:
How about forgetting the regexp and using the right tool for the job?
Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(
Ahh, there must be a neater way to do it than that in XSLT. With a quick google I get to: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html
Which may be of interest to you.
Cheers,
On Tue, 15 Feb 2005 15:51:55 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:
Richard Lewis richardlewis@fastmail.co.uk wrote:
Richard Lewis wrote:
iDunno@sommitrealweird.co.uk said:
How about forgetting the regexp and using the right tool for the job?
Oh, yes. I should have said: I was only using sed to work out the regex, I need it for an XSLT transformation so unfortunately I don't have the luxury of neat little UNIX tools :-(
Ahh, there must be a neater way to do it than that in XSLT. With a quick google I get to: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html
Yes, there may be. But ATM it looks like regular expressions (as long as you're using XSLT2) are winning ;-)
R.
On Tue, 15 Feb 2005 14:00:03 +0000, "MJ Ray" mjr@dsl.pipex.com said:
Richard Lewis wrote:
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.
I can do the first space after the Xth character: $ echo "A string with quite a lot of words and spaces in it." | sed -E "s/(.{,X}) /\1\n/g"
Doesn't that do the last space before the (X+1)th character? I tested it a bit and it seems to. All you need to do is change the space to ( |$) so you don't always split the last word off, I think.
Um, on second thoughts, actually I don't think this does work (the difference is quite subtle).
XSLT regex syntax allows:
"* X{n}? matches X, exactly n times
* X{n,}? matches X, at least n times
* X{n,m}? matches X, at least n times, but not more than m times"
of which I'm trying to use the second with the replace() function like this:
replace($text, '(.{n,}?)\s+', '$1
')
[that is: replace at least n characters of any kind (captured as $1), followed by one or more whitespace characters, with $1 plus a line break]
(I then split the string on new-line characters to display it using tokenize().)
Um, yeah. This isn't really a question, but if anyone has any thoughts.....
Richard
PS. I'm just having an idea about taking exactly n characters then further processing the result by trying to find the last space......I'm not sure there /is/ a regex which does that.
PPS. prove me wrong!
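One way the idea in the PS could look outside of a regex is sketched below in POSIX shell rather than XSLT. The function name wrap_by_truncation is invented here, and it takes width+1 characters rather than exactly n, so that a space sitting directly after the limit still counts as a break point.

```shell
#!/bin/sh
# Hypothetical sketch of "cut a fixed-size chunk, then back up to the last space".
# Usage: wrap_by_truncation WIDTH "some text"
wrap_by_truncation() {
  width=$1
  rest=$2
  while [ "${#rest}" -gt "$width" ]; do
    # Take width+1 characters from the front of the remaining text.
    chunk=$(printf '%s' "$rest" | cut -c "1-$((width + 1))")
    case $chunk in
      *" "*)
        # Drop everything from the last space onwards.
        line=${chunk% *}
        ;;
      *)
        # No space at all: fall back to a hard break at the width.
        line=$(printf '%s' "$chunk" | cut -c "1-$width")
        ;;
    esac
    printf '%s\n' "$line"
    # Remove the emitted line and one separating space from the remainder.
    rest=${rest#"$line"}
    rest=${rest# }
  done
  printf '%s\n' "$rest"
}

text="A string with quite a lot of words and spaces in it."
wrap_by_truncation 22 "$text"
```

The loop only needs plain substring and "last space" operations, which is roughly what substring() and tokenize() offer on the XSLT side; whether a single regex can do the same job is left open here.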
Richard wrote:
XSLT regex syntax allows:
"* X{n}? matches X, exactly n times
* X{n,}? matches X, at least n times
* X{n,m}? matches X, at least n times, but not more than m times"
of which I'm trying to use the second with the replace() function like this: replace($text, '(.{n,}?)\s+', '$1
')
Why did you change from the third type to the second type when moving from sed to XSLT? Your sed example was equivalent to (.{0,X}?)\s+ wasn't it? Changing type will change the meaning of the regex.
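To make the difference between the two quantifier types concrete, here is a hedged sketch in shell, assuming GNU sed with -E. POSIX extended regexes have no reluctant quantifiers, so .{n,}? followed by whitespace is emulated with .{n}[^ ]* followed by spaces. That emulation is an approximation introduced for this example, as are the function names and the width of 22.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread. Assumes GNU sed.

# "At least N characters, then the next space": lines can overshoot N.
at_least() {
  sed -E "s/(.{$1}[^ ]*) +/\1\n/g"
}

# "At most N characters, then a space or end of line": lines never exceed N
# as long as no single word is longer than N.
at_most() {
  sed -E "s/(.{1,$1})( |\$)/\1\n/g"
}

text="A string with quite a lot of words and spaces in it."
printf '%s\n' "$text" | at_least 22
printf '%s\n' "$text" | at_most 22
```

The first call produces a 25-character first line ("A string with quite a lot"), while the second keeps every line within 22 characters, which is the behaviour wanted for wrapping.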
On Wed, 16 Feb 2005 13:36:12 +0000, "MJ Ray" mjr@dsl.pipex.com said:
Richard wrote:
XSLT regex syntax allows:
"* X{n}? matches X, exactly n times
* X{n,}? matches X, at least n times
* X{n,m}? matches X, at least n times, but not more than m times"
of which I'm trying to use the second with the replace() function like this: replace($text, '(.{n,}?)\s+', '$1
')
Why did you change from the third type to the second type when moving from sed to XSLT? Your sed example was equivalent to (.{0,X}?)\s+ wasn't it? Changing type will change the meaning of the regex.
I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)
I'll keep fiddling.
Richard
Richard Lewis richardlewis@fastmail.co.uk wrote:
I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)
I'll keep fiddling.
You could just use the XSLT stylesheets I linked earlier, which will tokenise and then split the strings to a certain character length... that link looked fairly useful for that type of thing... The link was: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html
Which is a mailing list post, but following the links round there should give you an includable stylesheet that you can then just call with some parameters for the formatting.
Cheers,
On Wed, 16 Feb 2005 18:17:53 +0000, "Brett Parker" iDunno@sommitrealweird.co.uk said:
Richard Lewis richardlewis@fastmail.co.uk wrote:
I hadn't noticed that. Um, {,n} and {n,} seem to work in sed, but {,n} causes a runtime error in XSLT saying that it expects a digit, and if you use {0,n} it says it doesn't allow a regex that matches a zero-length string (presumably because it's the replace() function). ({1,n} just splits it into single-word strings...)
I'll keep fiddling.
You could just use the XSLT stylesheets I linked earlier, which will tokenise and then split the strings to a certain character length... that link looked fairly useful for that type of thing... The link was: http://sources.redhat.com/ml/xsl-list/2001-12/msg00651.html
Which is a mailing list post, but following the links round there should give you an includable stylesheet that you can then just call with some parameters for the formatting.
I had a look through those templates and they're good, but they're really designed for use with XSLT 1.0. I'm lucky enough to be working with XSLT 2.0, which gives me regular expressions in the form of the replace(string, regex, replacement), matches(string, regex) and tokenize(string, onRegex) functions and the xsl:analyze-string instruction. Much of the hard work performed in those templates can be replaced with the tokenize() function.
Anyway, I'm sure you'll all be glad to know I've now found a working solution. The template I'm writing wraps text into an arbitrary polygon for use with transformations to SVG (the current version of which does not support text wrapping). The solution I've come up with first determines the path of the polygon from the given points and then fits rows of conceptual (not visible) horizontal rectangles inside it. The text is then displayed along these rows.
The regular expression came into it because I needed a way of selecting the correct portion of text for each row; i.e. a portion of text which has a length (in pixels) which is less than or equal to the length of the row but which does not allow the last word to be split. I decided that the regex idea wasn't going to work and did it like this:
<xsl:variable name="stringToDisplay">
  <xsl:variable name="maxPortionOfText">
    <xsl:value-of select="substring($remainingText,1,$maxStringWidthInChars)" />
  </xsl:variable>
  <xsl:choose>
    <xsl:when test="string-length($maxPortionOfText) >= string-length($remainingText)">
      <xsl:value-of select="$maxPortionOfText" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:for-each select="tokenize($maxPortionOfText,'\s+')">
        <xsl:if test="position() &lt; last() or position() = 1">
          <xsl:value-of select="." /><xsl:if test="position()!=last()"><xsl:text> </xsl:text></xsl:if>
        </xsl:if>
      </xsl:for-each>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>
Thanks for your help on this.
Cheers, Richard
Richard Lewis richardlewis@fastmail.co.uk wrote:
Hello ALUG,
I wonder if anyone can help with a regular expression (for text wrapping)?
I need to match all the characters from either the beginning of the line or the last match (in global mode) up to the last space before the Xth character.
I can do the first space after the Xth character:
$ echo "A string with quite a lot of words and spaces in it." | sed -E "s/(.{,X}) /\1\n/g"
How about forgetting the regexp and using the right tool for the job?
$ echo "A string with quite a lot of words and spaces in it." | fmt -w 22 A string with quite a lot of words and spaces in it. $ $ echo "A string with quite a lot of words and spaces in it." | fmt -w 44 A string with quite a lot of words and spaces in it. $
fmt is nice and will do what you want.
On debian systems it's in the coreutils package, so should be on every debian system going.
Thanks,