Monday, April 25, 2011

PHP: Invalid character breaks html

Hello,

How should I clean up a string that contains invalid characters and breaks the HTML when it is printed inside a textarea?

PHP's ord() returns 0 for the character in question, but I suspect it's not actually a null byte, though I don't think that matters anyway.

When the string is displayed in a textarea, all text after the invalid character disappears, as do all HTML elements after the textarea.

I tried htmlentities, htmlspecialchars, mb_convert_encoding, and iconv('UTF-8', 'UTF-8//IGNORE', $str), but none of them worked.

filter_var() isn't available in PHP 5.1.2, which is the version we are using.

echo and var_dump print the character as �

From stackoverflow
  • Try filter_var($string, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW).

    Edit: Since 5.1.2 doesn't have filter_var, you could try this (which is almost the same thing):

    $string = preg_replace('/\p{Cc}/u', '', $string);
    
    Mitja : Thanks, but we are using 5.1.2 which doesn't feature filter_var() yet.
    Ant P. : I think they have a PECL version of the filter extension for 5.1, but I've added another answer to the post anyway...
  • I have used this regular expression before when htmlentities, htmlspecialchars, mb_convert_encoding, and iconv('UTF-8', 'UTF-8//IGNORE', $str) didn't work. It strips out the control characters:

    $str = preg_replace( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F-\x9F]/', '', $str );
    
  • Possibly:

    $str = trim($str, chr(0));
    

    ??
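
As a follow-up to the answers above, here is a minimal sketch of a byte-level cleanup that uses only functions that predate PHP 5, so it should not need filter_var(). The helper name strip_control_chars() and the sample string are made up for illustration and do not come from the thread. It assumes tab, line feed, and carriage return should be kept, since they are normally wanted in a textarea. It deliberately leaves bytes 0x80 and above alone: without the /u modifier a range such as \x7F-\x9F matches single bytes, and those bytes can be part of multi-byte UTF-8 characters. It also avoids /u, because preg_replace() returns NULL when /u is applied to a string that is not valid UTF-8.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the thread).
// Removes ASCII control bytes 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F and DEL (0x7F),
// keeping tab (0x09), line feed (0x0A) and carriage return (0x0D).
// The pattern works on raw bytes, so it does not require valid UTF-8 input.
function strip_control_chars($str)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $str);
}

// Made-up sample containing a NUL byte, like the one ord() reported as 0.
$dirty = "before\x00after\ttab\nline";
$clean = strip_control_chars($dirty);

// bin2hex() shows the actual bytes, which helps confirm what is in the string.
echo bin2hex($dirty), "\n";

// Escape for HTML only after the control bytes are gone.
echo '<textarea>', htmlspecialchars($clean), '</textarea>', "\n";
```

If the string may also contain invalid UTF-8 sequences, this sketch will not repair them; it only removes the control bytes that the answers above target.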
