I had some strange errors when adding a user control.
First of all it told me:
Element 'searchresults' is not a known element. This can occur if there is a compilation error in the Web site
Then I tried to compile the user control and it said:
The name 'lnavigation' does not exist in the current context
So I figured the variable wiring wasn't working and added a definition for lnavigation manually. Then I got:
The type 'referencesearchresultsusercontrol' already contains a definition for 'lnavigation'
Very frustrating.
Later I also got:
The file 'src' is not a valid here because it doesn't expose a type in the register tag.
It turned out that the Register tag was the problem: I had to remove the ".cs" from the Src attribute, so that it points at the .ascx file rather than at its code-behind file.
Change
<%@ Register TagPrefix="blahblah" TagName="searchresults" Src="searchresults.ascx.cs" %>
To
<%@ Register TagPrefix="blahblah" TagName="searchresults" Src="searchresults.ascx" %>
Tuesday, April 29, 2008
Monday, April 28, 2008
.Net Regular Expressions and accented / unicode characters
I was trying to strip out any non-letter characters using regular expressions, which turned out to be a bit of a pain once Unicode / accented characters were involved.
I ended up matching the stuff that I wanted to keep and removing everything else. Regular expressions aren't really set up for this, as there is no general "not" operator that can be applied to a whole sub-expression.
This page was very useful:
http://www.regular-expressions.info/unicode.html
This expression did the trick for me. It matches every character, but the replacement only keeps the matches that I wanted.
Regex.Replace(authors, @"(?<all>(?<allowed>\p{L}\p{M}*|[ ,;|-])|(?<notallowed>.))", "${allowed}", RegexOptions.Compiled | RegexOptions.Multiline);
\p{L} matches any letter character without a separate accent
\p{M} matches any accent (combining mark)
\p{L}\p{M}* matches any letter character followed by any number of accents (it is possible to have more than one)
[ ,;|-] matches any special characters that I wanted to keep
the all group matches everything
the allowed group matches characters that I want to keep
the not allowed group matches anything else
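As a rough sketch, the pieces listed above can be assembled into a small helper. The group names all, allowed and notallowed are assumed here (the post only calls them the all, allowed and not allowed groups), and the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class AuthorCleaner
{
    // Assumed pattern: the outer "all" group tries "allowed" first
    // (a letter plus any combining marks, or one of the kept punctuation
    // characters) and falls back to "notallowed" (any other character).
    private static readonly Regex KeepAllowed = new Regex(
        @"(?<all>(?<allowed>\p{L}\p{M}*|[ ,;|-])|(?<notallowed>.))",
        RegexOptions.Compiled | RegexOptions.Multiline);

    // Replaces every match with the contents of the "allowed" group,
    // which is empty whenever the "notallowed" branch matched.
    public static string Clean(string authors)
    {
        return KeepAllowed.Replace(authors, "${allowed}");
    }
}
```

For example, AuthorCleaner.Clean("Müller, J.; O'Brien") gives "Müller, J; OBrien": the full stop and the apostrophe fall into the notallowed branch and are dropped. With these options "." does not match a newline, so line breaks are left in place.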
The first expression in an or (|) group that matches is the one that is used, so if the allowed group matches then the not allowed group doesn't.
"${allowed}" in the replace string replaces a match with the contents of the allowed group. Since everything is matched, nothing remains of the original string apart from what the replacement puts back. If a not allowed match is replaced, there is nothing in the allowed group, so the character is dropped.
Some notes:
Accented characters in Unicode can be represented by a single character (for legacy reasons) or as a combination of a base character and one or more accent characters.
Thus a single character on screen such as é can be represented by either one or two Unicode characters.
A character class in square brackets [] matches exactly one character, so it is not possible to reliably match accented characters in the usual way by listing them inside the brackets: the two-character form would not match.
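To make the notes above concrete, here is a small sketch that compares the two representations of é. The class and method names are hypothetical; the calls used are Regex.IsMatch and string.Normalize from the .NET base class library.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class AccentForms
{
    public const string Precomposed = "\u00E9"; // é as a single character
    public const string Decomposed = "e\u0301"; // e followed by a combining acute accent

    // True when a bracket class listing the single-character é
    // matches the whole input.
    public static bool MatchesBracketClass(string input)
    {
        return Regex.IsMatch(input, "^[\u00E9]$");
    }

    // True when the input is one letter followed by any number of accents.
    public static bool MatchesLetterWithMarks(string input)
    {
        return Regex.IsMatch(input, @"^\p{L}\p{M}*$");
    }

    // Converts the base-plus-accent form to the single-character form
    // where Unicode defines one.
    public static string Compose(string input)
    {
        return input.Normalize(NormalizationForm.FormC);
    }
}
```

MatchesBracketClass is true for Precomposed but false for Decomposed, while MatchesLetterWithMarks accepts both. Normalizing the input with Compose before matching is another option, though it only helps for combinations that have a single-character form.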