Sorry, I meant folding, not unfolding.
If you put certain characters such as “o=” or “u=” into UTF-8 HTML documents (only ü and ö are in the UTF-8 table), they work fine for general purposes. I thought they got encoded in the HTML with their HTML entity number. I would like to fold these, but I failed miserably.
Let’s take “u=” as an example (the Hungarian character, u with a double acute accent). Envato can’t even process it. See, that’s how important this would be if it could work.
The only way it works is when I look for Å±. I don’t know what that is called, I can’t reproduce how I got it, and I don’t want to look for such pairs one by one. I only know that I have seen these kinds of characters when there was an encoding error, mostly in SQL tables, so it’s a kind of last-resort entity reference. I have no idea…
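One way to probe where this pair comes from is the rough check below. It assumes PHP 7+ with the mbstring extension, and the variable names are only for illustration. It takes the UTF-8 bytes of the character with number 369 and reads them as if they were ISO-8859-1, which appears to give exactly the Å± pair:

```php
<?php
// U+0171 (decimal 369) is the "u=" character; UTF-8 stores it as bytes C5 B1.
$u = "\u{0171}";
$hex = bin2hex($u); // "c5b1"

// Pretend those two bytes are ISO-8859-1 and convert them to UTF-8 for display.
// 0xC5 is "Å" and 0xB1 is "±" in ISO-8859-1.
$misread = mb_convert_encoding($u, 'UTF-8', 'ISO-8859-1');

echo $hex, ' ', $misread, PHP_EOL; // c5b1 Å±
```

If that holds for the real data, the pair is not an entity reference at all but the same character seen through the wrong charset.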
$string = preg_replace( array(htmlentities('/Å±/')), array("u"), htmlentities($string));
$string = html_entity_decode($string);

The string is UTF-8 encoded by default. Before any further processing and accent folding, this code strips the “u=” and converts it to “u”. But this is only one character. What am I missing here?
$string = preg_replace( array('/&(.)[^;]*;/'), array('$1'), htmlentities(utf8_decode($string)));

This is close to what I use. Just an example.
I thought I’d just write a regex search for the unnamed characters, referencing them by their entity numbers.
This pattern would match all the “u=”-like characters from &#360; to &#371;, including “u=”, which is number &#369;, but it didn’t work. How could I use this pattern? It seems like I can only use it for UTF-8 characters.
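To show what a number-range match could look like, here is a minimal sketch. It is not the pattern from this post. It assumes the mbstring extension, and the function name fold_u_by_entity_number is made up. It first rewrites every non-ASCII character as a decimal numeric reference, folds the references numbered 360 to 371, and turns the rest back into characters:

```php
<?php
// Hypothetical helper: fold the characters numbered 360..371 to plain U/u.
function fold_u_by_entity_number(string $s): string
{
    // Conversion map covering every non-ASCII code point: [start, end, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // The "u=" character becomes "&#369;", "ü" becomes "&#252;", and so on.
    $encoded = mb_encode_numericentity($s, $map, 'UTF-8');

    $folded = preg_replace_callback(
        '/&#(\d+);/',
        function (array $m): string {
            $n = (int) $m[1];
            if ($n >= 360 && $n <= 371) {
                // In this range, even numbers are uppercase and odd ones lowercase.
                return $n % 2 === 0 ? 'U' : 'u';
            }
            return $m[0]; // leave every other reference untouched
        },
        $encoded
    ) ?? $encoded;

    // Turn the remaining numeric references back into UTF-8 characters.
    return mb_decode_numericentity($folded, $map, 'UTF-8');
}

echo fold_u_by_entity_number("t\u{0171}z f\u{00FC}r"), PHP_EOL; // tuz für
```

One caveat of this sketch: numeric references that were already in the input as literal text are decoded in the last step too.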
(This is what I use, like in the example above.) This would match a lot of things, basically everything, including all number references and also some non-UTF-8 but named entities like &Sigma;. When I told it to replace with the $1 backreference, it would return S (the first character), but no joy here either. I also tried it with sigma’s number reference, with no luck.
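To make that behaviour concrete, here is a small sketch that applies the same pattern to strings that are already entity-encoded. The sample inputs are made up for illustration:

```php
<?php
// Same pattern as above: keep only the first character after the ampersand.
$pattern = '/&(.)[^;]*;/';

// Named entities start with a letter, so the captured character is a letter.
$named = preg_replace($pattern, '$1', 'f&uuml;r &Sigma;');

// Numeric references start with "#", so "#" is what gets captured.
$numeric = preg_replace($pattern, '$1', '&#369;');

echo $named, ' | ', $numeric, PHP_EOL; // fur S | #
```

So with this pattern a number reference can only ever collapse to “#”, never to a base letter.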
What can I do? I don’t want to just strip them; the number range pattern solution would be so convenient! Um… do I smell hex in the air?
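As for hex, one possible direction is a sketch like the one below. It assumes PHP 7.2+ with mbstring and a valid UTF-8 input string, and the function name fold_u_range is made up. PCRE’s u modifier lets a character class address code points in hex, so the range 360 to 371 can be written as \x{0168} to \x{0173} and matched directly on the UTF-8 string, with no entity round trip:

```php
<?php
// Hypothetical helper: fold U+0168..U+0173 (decimal 360..371) to plain U/u.
function fold_u_range(string $s): string
{
    $result = preg_replace_callback(
        '/[\x{0168}-\x{0173}]/u', // the u modifier makes PCRE treat the subject as UTF-8
        function (array $m): string {
            // In this range, even code points are uppercase and odd ones lowercase.
            return mb_ord($m[0], 'UTF-8') % 2 === 0 ? 'U' : 'u';
        },
        $s
    );

    // preg_replace_callback returns null on invalid UTF-8; fall back to the input.
    return $result ?? $s;
}

echo fold_u_range("t\u{0171}z \u{0170}r"), PHP_EOL; // tuz Ur
```

The same idea should extend to other letter ranges by adding more character classes, although each range needs its own check of which code points are uppercase.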