Firsh says

Sorry, I meant folding, not unfolding.

If you put certain characters like "o=" or "u=" (that is, ő and ű; only ü and ö are in the UTF-8 table) into UTF-8 HTML documents, they work fine for general purposes. I thought they got encoded in the HTML as their numeric HTML entity. I would like to fold these, but I failed miserably.

Let’s take “u=” for example (the Hungarian character ű). Envato can’t even process it :D See, that’s how important it would be to get this working.

The only way it works is when I look for ű. I don’t know what that form is called, I can’t reproduce how I got it, and I don’t want to look these up one by one. I only know I have seen this kind of character when there was an error with the encoding, mostly in SQL tables, so it’s some kind of last-resort entity reference. I have no idea…

        $string = preg_replace( 
            array(htmlentities('/ű/')),
            array("u"),
            htmlentities($string));
        $string = html_entity_decode($string);
The string is UTF-8 encoded by default. Before any further processing and accent folding, this code strips the “u=” and converts it to “u”. But this handles only one character. What am I missing here?


        $string = preg_replace( 
            array('/&(.)[^;]*;/'),
            array('$1'),
            htmlentities(utf8_decode($string)));
This is close to what I use. Just an example.
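To see where the characters get lost in this approach, here is a minimal sketch. It assumes PHP 7+ with the mbstring extension; `fold_via_latin1` is a made-up name, and `mb_convert_encoding()` stands in for `utf8_decode()` because it performs roughly the same UTF-8 to ISO-8859-1 conversion. Anything that has no Latin-1 equivalent is turned into "?" before `htmlentities()` ever sees it, so there is no entity left for the regex to match.

```php
<?php
// Hypothetical helper that mirrors the snippet above step by step.
function fold_via_latin1(string $utf8): string
{
    // Roughly what utf8_decode() does: characters outside Latin-1 become "?".
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

    // Latin-1 characters such as ü get a named entity (&uuml;).
    $entities = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

    // Keep only the first letter of each entity name.
    return preg_replace('/&(.)[^;]*;/', '$1', $entities);
}

echo fold_via_latin1("\u{00FC}"), "\n"; // ü is in Latin-1: prints "u"
echo fold_via_latin1("\u{0171}"), "\n"; // ű is not: prints "?"
```

So the pattern itself may be fine for ü and ö; the ű seems to be gone one step earlier.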

I thought I’d just write a regex search for the unnamed characters, referencing them by their numeric codes:

&#(36[0-9]|37[01]);

This pattern would match all the “u=”-like characters from Ũ to ų (numbers 360 to 371), including “u=” itself, which is number &#369;, but it didn’t work. How could I use this pattern? It seems like I can only use it for UTF-8 characters.
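One way the number-range pattern could be made to match is to turn the characters into decimal references first. The following is only a sketch under assumptions: PHP 7+ with mbstring, a hypothetical helper name `fold_u_range`, and input that does not already contain literal numeric references (those would be folded or decoded as well). It relies on the fact that in the 360 to 371 range the uppercase letters sit on even numbers and the lowercase ones on odd numbers.

```php
<?php
// Hypothetical helper: fold code points 360-371 (Ũ..ų) to a plain "U" or "u".
function fold_u_range(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // Every non-ASCII character becomes a decimal reference such as &#369;.
    $entities = mb_encode_numericentity($utf8, $map, 'UTF-8');

    // The range pattern from the post now has something to match.
    $folded = preg_replace_callback(
        '/&#(36[0-9]|37[01]);/',
        function (array $m): string {
            // Even numbers are uppercase, odd numbers are lowercase.
            return ((int) $m[1] % 2 === 0) ? 'U' : 'u';
        },
        $entities
    );

    // Turn the untouched references back into UTF-8 characters.
    return mb_decode_numericentity($folded, $map, 'UTF-8');
}

echo fold_u_range("f\u{0171}z\u{0151}"), "\n"; // "fűző" prints "fuző"
```

The ő survives because its number (337) is outside the range; it would need its own pattern.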

'/&(.)[^;]*;/'

(This is what I use, like in the example above.) This would match a lot of things (basically everything), including all numeric references and also some named entities that are not UTF-8, like &Sigma;. When told to replace with the $1 backreference, it would return S (the first character), but no joy here either. I also tried it with sigma’s numeric reference, no luck.

What can I do? I don’t want to just strip them, the number range pattern solution would be so convenient! Um.. do I smell hex in the air?
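On the hex idea, a rough sketch, assuming PHP 7.2+ with mbstring (for `mb_ord()`) and valid UTF-8 input; `fold_u_hex` is a made-up name. With the `/u` modifier, PCRE accepts code points written in hex, so the decimal range 360 to 371 can be written as `\x{0168}-\x{0173}` and matched without any entity round trip. The first line also shows that ű is stored as the two bytes C5 B1, which a Latin-1 reader would see as two separate characters (Å and ±).

```php
<?php
// ű (U+0171) is two bytes in UTF-8.
echo bin2hex("\u{0171}"), "\n"; // prints "c5b1"

// Hypothetical helper: same folding as the decimal range, addressed in hex.
function fold_u_hex(string $utf8): string
{
    $result = preg_replace_callback(
        '/[\x{0168}-\x{0173}]/u', // 0x0168 = 360, 0x0173 = 371
        function (array $m): string {
            // Even code points are uppercase, odd ones lowercase in this range.
            return (mb_ord($m[0], 'UTF-8') % 2 === 0) ? 'U' : 'u';
        },
        $utf8
    );

    // preg_replace_callback() returns null on invalid UTF-8; keep the input then.
    return $result ?? $utf8;
}

echo fold_u_hex("\u{0170}r \u{0171}"), "\n"; // "Űr ű" prints "Ur u"
```

This keeps the convenience of a range pattern while working on the UTF-8 string directly.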
