PHP - UTF-8 all the way through
I'm setting up a new server, and I want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.
Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL, and PHP to do this. Is there a standard checklist I can follow, or a way to troubleshoot where the mismatches occur?
This is for a new Linux server, running MySQL 5, PHP 5, and Apache 2.
Data storage:
Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use the utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).
In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use plain utf8, which only supports a subset of Unicode characters. I wish I were kidding.
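To make the utf8 versus utf8mb4 difference concrete, here is a minimal sketch, assuming PHP 7 or later. MySQL's utf8 stores at most three bytes per character, so it cannot hold code points at U+10000 and above (most emoji, for example). The helper name needsUtf8mb4 is hypothetical and only illustrates how such characters can be detected.

```php
<?php
// Hypothetical helper: returns true when $text contains a code point at
// U+10000 or above, i.e. one that takes four bytes in UTF-8 and therefore
// does not fit into MySQL's three-byte utf8 character set.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$plain = "caf\u{00E9}";   // U+00E9 takes two bytes in UTF-8
$emoji = "ok \u{1F600}";  // U+1F600 takes four bytes in UTF-8

var_dump(needsUtf8mb4($plain)); // bool(false)
var_dump(needsUtf8mb4($emoji)); // bool(true)
```

A check like this is only useful when you are stuck on a server limited to utf8; with utf8mb4 everywhere, no such filtering is needed.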
Data access:
In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application, and vice versa.
Some drivers provide their own mechanism for configuring the connection character set, which both updates their own internal state and informs MySQL of the encoding to be used on the connection; this is usually the preferred approach. In PHP:
- If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify charset in the DSN: $dbh = new PDO('mysql:charset=utf8mb4');
- If you're using mysqli, you can call set_charset():
  $mysqli->set_charset('utf8mb4'); // object oriented style
  mysqli_set_charset($link, 'utf8mb4'); // procedural style
- If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.
If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.
The same consideration regarding utf8mb4/utf8 applies as above.
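The PDO approach described above might be wrapped up as follows. This is only a sketch, assuming PHP 7 or later with the pdo_mysql extension. The helper names buildMysqlDsn and connectUtf8 are hypothetical, as are the host and database name. The connection function is defined but not called, since it needs a running MySQL server.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that always requests utf8mb4.
function buildMysqlDsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: opens the connection with the charset set in the DSN.
// Not called in this sketch because it requires a reachable MySQL server.
function connectUtf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(buildMysqlDsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

echo buildMysqlDsn('localhost', 'app'), "\n";
// prints "mysql:host=localhost;dbname=app;charset=utf8mb4"
```

Keeping the DSN in one place means every connection in the application asks for the same charset, which removes one common source of mismatches.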
Output:
If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).
In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type MIME header yourself, which is more work but has the same effect.
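Both output options can be sketched in a few lines. This is a minimal illustration, not a complete response handler; it assumes an HTML response and that nothing has been sent to the client yet.

```php
<?php
// Option 1: set the default charset at runtime. This has the same effect as
// setting default_charset in php.ini, but only for the current request.
ini_set('default_charset', 'UTF-8');

// Option 2: send the Content-Type header explicitly.
// Guarded so it is skipped if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```

Setting default_charset in php.ini itself is less work, because it then applies to every script without any per-request code.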
Input:
Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
From my reading of the current HTML spec, the following sub-bullets are not necessary or even valid anymore for modern HTML. My understanding is that browsers will work with and submit data in the character set specified for the document. However, if you're targeting older versions of HTML (XHTML, HTML4, etc.), these points may still be useful:
- For HTML before HTML5 only: You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
- For HTML before HTML5 only: Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
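The input validation described above could look roughly like this. It is a sketch assuming PHP 7 or later with the mbstring extension loaded. The helper name validUtf8OrNull is hypothetical, and rejecting invalid input by returning null is just one possible policy.

```php
<?php
// Hypothetical helper: returns the value unchanged if it is valid UTF-8,
// otherwise null so the caller can reject the request.
function validUtf8OrNull(string $value): ?string
{
    return mb_check_encoding($value, 'UTF-8') ? $value : null;
}

$good = "na\u{00EF}ve"; // valid UTF-8 (U+00EF encoded as two bytes)
$bad  = "na\xEFve";     // lone ISO-8859-1 byte 0xEF, not valid UTF-8

var_dump(validUtf8OrNull($good) !== null); // bool(true)
var_dump(validUtf8OrNull($bad) === null);  // bool(true)
```

In a real application, every value from $_GET, $_POST, cookies, and headers would need to pass through a check like this before being stored or used.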
Other code considerations:
Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.
PHP's built-in string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
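A short sketch of why the byte-oriented built-ins are unsafe, assuming PHP 7 or later with the mbstring extension. The sample string is made up for illustration.

```php
<?php
$word = "h\u{00E9}llo"; // five characters, six bytes in UTF-8

echo strlen($word), "\n";             // 6 (counts bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 5 (counts characters)

// substr() cuts by bytes and can split a multi-byte character in half.
$broken = substr($word, 0, 2);             // "h" plus the first byte of U+00E9
$safe   = mb_substr($word, 0, 2, 'UTF-8'); // "h" plus the complete U+00E9

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($safe, 'UTF-8'));   // bool(true)
```

Concatenation stays safe because joining two valid UTF-8 strings never splits a byte sequence; the trouble starts with anything that counts, cuts, or changes the case of characters.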