2009-02-27, 11:58
I have a problem matching a Unicode (Korean) string that is surrounded by lots of tabs and spaces.
Code:
<strong>등급</strong></dt>
<dd>
청소년관람불가(한국) </dd>
What I am trying to get is the words between <dd> and </dd>.
Code:
<RegExp input="$$7" output="<mpaa>\1</mpaa>" dest="8+">
	<RegExp input="$$1" output="\1" dest="7">
		<expression noclean="1"><strong>등급</strong></dt>[^>]*>(.[^<]*)</dd></expression>
	</RegExp>
	<expression trim="1"></expression>
</RegExp>
With this, I can capture everything between <dd> and </dd>.
The problem is that I cannot get rid of the whitespace around the words.
I tried removing "noclean", adding "trim", and using \s and \t, none of which helped.
If I use \b, it gets rid of the whole string. The regex engine does not seem to support \p. I looked at the PCRE documentation, and it says that support for \p is optional.
Please guide me on this.
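To make the goal concrete, here is a minimal sketch of the result I am after, written with Python's re module rather than the scraper's regex engine. The sample string and the variable names are only illustrative, and I am assuming (not claiming) that the same idea of keeping \s* outside the capture group would carry over to the scraper.

```python
import re

# Illustrative sample: the rating text is padded with newlines, tabs and spaces.
html = "<strong>등급</strong></dt>\n<dd>\n\t\t  청소년관람불가(한국) \t</dd>"

# \s* on both sides of the group swallows the padding; the lazy (.*?) stops
# at the last non-whitespace character because \s*</dd> must match right after it.
pattern = re.compile(r"<strong>등급</strong></dt>\s*<dd>\s*(.*?)\s*</dd>", re.DOTALL)

match = pattern.search(html)
rating = match.group(1) if match else None
print(rating)
```

What I want in the <mpaa> output is exactly this trimmed value, with no leading or trailing tabs or spaces.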