Firsh says

Sorry, I meant folding, not unfolding.

If you put certain characters in UTF-8 HTML documents, like ő or ű (“o=” and “u=”; only ü and ö are in the Latin-1 (ISO-8859-1) table), they work fine for general purposes. I thought they would get encoded in the HTML with their numeric entity references. I would like to fold these, but I failed miserably..
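A quick way to check the table claim: ö (U+00F6) and ü (U+00FC) fall inside ISO-8859-1 (Latin-1, U+0000..U+00FF), while ő (U+0151) and ű (U+0171) do not, so `htmlentities()` with its default HTML 4.01 table has no named entity for them and leaves them as they are. This is a small sketch assuming the mbstring extension (for `mb_ord`); `latin1_status` is a hypothetical helper name.

```php
<?php
// Sketch (assumes mbstring for mb_ord): compare the code points of the
// Hungarian vowels with the Latin-1 range U+0000..U+00FF.
function latin1_status(string $ch): string
{
    $cp = mb_ord($ch, 'UTF-8');
    $inLatin1 = $cp <= 0xFF;
    return sprintf('%s U+%04X %s', $ch, $cp, $inLatin1 ? 'Latin-1' : 'outside Latin-1');
}

foreach (['ö', 'ü', 'ő', 'ű'] as $ch) {
    echo latin1_status($ch), PHP_EOL;
    // With the HTML 4.01 table, htmlentities() names the Latin-1 letters
    // (&ouml;, &uuml;) but returns ő and ű unchanged.
    echo '  ', htmlentities($ch, ENT_QUOTES, 'UTF-8'), PHP_EOL;
}
```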

Let’s take “u=” (ű, a Hungarian character) as an example. Envato can’t even process it :D That’s how important this would be, if only it worked.

The only way it works is when I search for ű. I don’t know what that form is called, I can’t reproduce how I got it, and I don’t want to look for these characters one by one. I only know I have seen characters like these when there was an encoding error, mostly in SQL tables, so maybe it’s some kind of last-resort entity reference; I have no idea…
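One plausible explanation, assuming the PHP source file is saved as UTF-8 and `htmlentities()` runs with ISO-8859-1 (its default charset before PHP 5.4): ű is stored as the two bytes 0xC5 0xB1, and Latin-1 reads those bytes as “Å±”, the classic mojibake from mis-encoded SQL data. Because the pattern and the subject go through the same conversion, the two sides still match. A minimal sketch:

```php
<?php
// In UTF-8, ű (U+0171) is the two bytes 0xC5 0xB1.
$u = "\u{0171}"; // ű
echo bin2hex($u), PHP_EOL; // c5b1

// Read as ISO-8859-1, 0xC5 is Å and 0xB1 is ±, so each byte gets its own
// Latin-1 entity: this is the "Å±" mojibake.
$asLatin1 = htmlentities($u, ENT_QUOTES, 'ISO-8859-1');
echo $asLatin1, PHP_EOL; // &Aring;&plusmn;
```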

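        // Treat the UTF-8 string as ISO-8859-1 (the htmlentities() default
        // before PHP 5.4) so each byte of 'ű' becomes a Latin-1 entity; the
        // pattern gets the same treatment, so both sides match.
        $string = preg_replace(
            array(htmlentities('/ű/', ENT_QUOTES, 'ISO-8859-1')),
            array('u'),
            htmlentities($string, ENT_QUOTES, 'ISO-8859-1'));
        // Turn the remaining entities back into the original bytes.
        $string = html_entity_decode($string, ENT_QUOTES, 'ISO-8859-1');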
The string is UTF-8 encoded by default. Before any further processing and accent folding, this code replaces the “u=” (ű) with a plain “u”. But that handles only one character. What am I missing here?
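One character at a time can be avoided without any entity tricks by mapping every accented letter explicitly. This is a minimal sketch assuming the source file is saved as UTF-8; `fold_hungarian` is a hypothetical name, and the map covers only the Hungarian vowels, so other languages would need more entries.

```php
<?php
// Explicit folding map for the Hungarian accented vowels (an assumption:
// extend it for other languages).
function fold_hungarian(string $s): string
{
    static $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ö' => 'o', 'ő' => 'o',
        'ú' => 'u', 'ü' => 'u', 'ű' => 'u',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ö' => 'O', 'Ő' => 'O',
        'Ú' => 'U', 'Ü' => 'U', 'Ű' => 'U',
    ];
    // strtr() with an array replaces whole byte sequences, so multi-byte
    // UTF-8 keys work without decoding the string first.
    return strtr($s, $map);
}

echo fold_hungarian('Árvíztűrő tükörfúrógép'), PHP_EOL; // Arvizturo tukorfurogep
```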


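        // utf8_decode() converts to ISO-8859-1 (anything outside Latin-1
        // becomes '?'; deprecated as of PHP 8.2). htmlentities() then names
        // the accented letters (&ouml;, &uuml;, ...) and the regex keeps
        // only the first character after '&'.
        $string = preg_replace(
            array('/&(.)[^;]*;/'),
            array('$1'),
            htmlentities(utf8_decode($string), ENT_QUOTES, 'ISO-8859-1'));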
This is close to what I use. Just an example.
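A likely reason this approach cannot reach ő and ű: `utf8_decode()` converts to ISO-8859-1, and characters that do not exist there are replaced with '?', so there is no entity left to strip. The sketch below reproduces the pipeline with `mb_convert_encoding()` standing in for `utf8_decode()`, assuming the mbstring extension is enabled and uses its default '?' substitute character; `strip_via_latin1` is a hypothetical name.

```php
<?php
// Same steps as the snippet above, with mb_convert_encoding() in place of
// the deprecated utf8_decode().
function strip_via_latin1(string $utf8): string
{
    // Characters outside Latin-1 cannot be represented and become '?'.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    // State the charset: since PHP 5.4 htmlentities() assumes UTF-8.
    $entities = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');
    // &ouml; -> o, &uuml; -> u, ...
    return preg_replace('/&(.)[^;]*;/', '$1', $entities);
}

echo strip_via_latin1('ö ü ő ű Σ'), PHP_EOL; // o u ? ? ?
```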

I thought I’d just write a regex that searches for the unnamed characters by their numeric references:

&#(36[0-9]|37[01]);

This pattern would match all the “u=”-like characters from Ũ (&#360;) to ų (&#371;), including ű itself, which is &#369;, but it didn’t work. How could I use this pattern? It seems like I can only use it for UTF-8 characters.
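The pattern itself looks right; in the pipeline above it probably never matches because `htmlentities()` does not turn ű into &#369;, and `utf8_decode()` has already replaced it with '?'. One way to make numeric references exist first is `mb_encode_numericentity()`. This sketch assumes mbstring; `fold_u_block` is a hypothetical name. Note that in U+0168..U+0173 the even code points are uppercase and the odd ones lowercase.

```php
<?php
// Encode every non-ASCII character as a decimal numeric reference, fold the
// &#360;..&#371; block (Ũ..ų) by number, then decode what is left.
function fold_u_block(string $utf8): string
{
    // convmap: [start, end, offset, mask] covering all non-ASCII code points.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $encoded = mb_encode_numericentity($utf8, $convmap, 'UTF-8');

    // The user's pattern; even numbers are capitals (Ũ Ū Ŭ Ů Ű Ų).
    $folded = preg_replace_callback(
        '/&#(36[0-9]|37[01]);/',
        fn (array $m): string => ((int) $m[1] % 2 === 0) ? 'U' : 'u',
        $encoded
    );

    // Characters outside the range (e.g. ő = &#337;) are restored.
    return mb_decode_numericentity($folded, $convmap, 'UTF-8');
}

echo fold_u_block('Műű Ũ ő'), PHP_EOL; // Muu U ő
```

One caveat of this round trip: a literal “&#369;” already present in the input text would be folded as well.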

'/&(.)[^;]*;/'

(This is the pattern I use, as in the example above.) It matches a lot, basically every entity, including all numeric references and also named entities for characters outside Latin-1, like &Sigma; (Σ). Told to replace with the $1 backreference, it should return S (the first character), but no joy here either. I also tried it with Sigma’s numeric reference (&#931;), no luck.
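Both Σ failures fit the pipeline: a literal Σ is outside Latin-1, so `utf8_decode()` turns it into '?' before `htmlentities()` sees it, and for a numeric reference the `(.)` group captures '#' rather than a letter. This sketch shows the capture behaviour on plain strings; the narrower `$accentOnly` pattern and its suffix list are illustrative assumptions, not a complete set.

```php
<?php
// The group captures the first character after '&': a letter for named
// entities, but '#' for numeric references.
$pattern = '/&(.)[^;]*;/';
echo preg_replace($pattern, '$1', 'A&Sigma;B'), PHP_EOL; // ASB
echo preg_replace($pattern, '$1', 'A&#931;B'), PHP_EOL;  // A#B

// Narrower sketch: strip only letter entities with an accent suffix, so
// &amp; and other entities survive for a later html_entity_decode().
$accentOnly = '/&([A-Za-z])(acute|grave|circ|tilde|uml|ring|cedil|dblac);/';
echo preg_replace($accentOnly, '$1', '&ouml;&amp;&Sigma;'), PHP_EOL; // o&amp;&Sigma;
```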

What can I do? I don’t want to just strip them; the number-range pattern would be so convenient! Um... do I smell hex in the air?
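Hex does work, and it removes the need for entities entirely: with the `/u` modifier PCRE matches UTF-8 code points directly, and `\x{hhhh}` names a code point in hex, so U+0168..U+0173 is the hex form of the decimal 360..371 range. A sketch, with a hypothetical function name:

```php
<?php
// Match the U+0168..U+0173 block directly on the UTF-8 string; even code
// points are capitals, odd ones lowercase.
function fold_u_block_hex(string $utf8): string
{
    return preg_replace(
        [
            '/[\x{0168}\x{016A}\x{016C}\x{016E}\x{0170}\x{0172}]/u', // Ũ Ū Ŭ Ů Ű Ų
            '/[\x{0169}\x{016B}\x{016D}\x{016F}\x{0171}\x{0173}]/u', // ũ ū ŭ ů ű ų
        ],
        ['U', 'u'],
        $utf8
    );
}

echo fold_u_block_hex('Műű Ũ'), PHP_EOL; // Muu U
```

If case does not matter, the single class `[\x{0168}-\x{0173}]` is the direct counterpart of `&#(36[0-9]|37[01]);`. For folding every accented letter at once, a common alternative where the intl extension is available is `Normalizer::normalize($s, Normalizer::FORM_D)` followed by removing `\p{Mn}` combining marks.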
