Page encoding examples and encoding errors. Solving problems with incorrect web page encoding. The <meta> tag determines the type of document and its encoding

15.03.2016



Hi all!
Let's continue learning the basics of HTML. In this lesson we will look at how to specify the encoding of a site (web page) in HTML.
This lesson is very important: if you don't know how to specify the encoding of a web page, visitors may be unable to read it. You may ask: “How can a page be unreadable?”
Let me show you what my blog looks like with an incorrect encoding:

So, an HTML encoding is a table of correspondence between numeric codes and the characters of an alphabet. Using this table, the computer turns the stored codes into clear, readable letters.
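To make the idea of a "table of correspondence" more tangible, here is a small sketch in Python (not part of the lesson itself, just an illustration). It assumes the sample word "Привет" and prints the codes that the two encodings discussed in this lesson assign to its letters.

```python
# The same word stored with two different encoding tables.
text = "Привет"

utf8_bytes = text.encode("utf-8")           # two bytes per Cyrillic letter
cp1251_bytes = text.encode("windows-1251")  # one byte per Cyrillic letter

print(utf8_bytes.hex(" "))    # d0 9f d1 80 d0 b8 d0 b2 d0 b5 d1 82
print(cp1251_bytes.hex(" "))  # cf f0 e8 e2 e5 f2

# Decoding with the matching table gives the original letters back.
print(utf8_bytes.decode("utf-8"))
print(cp1251_bytes.decode("windows-1251"))
```

The letters are the same, but the stored codes differ, which is why the browser has to know which table was used.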

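To tell the browser which encoding the characters on the web page are in, place the following meta tag between the <head></head> tags:

<meta http-equiv="Content-Type" content="text/html; charset=encoding name">

Note the words “encoding name” in the code. This is where you specify the HTML encoding.
This is usually utf-8 or windows-1251.

Encoding for utf-8:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Encoding for windows-1251:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
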
If you forget to tell the browser which encoding a site or web page uses, the browser will try to determine the encoding automatically, but it does not always guess correctly. When it guesses wrong, the result is the same as in the picture above.
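A wrong guess by the browser can be imitated in a few lines of Python. This is only a sketch under one assumption: the page was saved as UTF-8 and the browser picked windows-1251. The sample phrase is arbitrary.

```python
text = "Привет, мир!"

saved = text.encode("utf-8")          # the file on disk is UTF-8
shown = saved.decode("windows-1251")  # the browser reads it as windows-1251

print(shown)  # unreadable characters instead of the original phrase

# Nothing is lost: reading the same bytes with the right table restores the text.
restored = shown.encode("windows-1251").decode("utf-8")
print(restored)
```

The bytes of the page never change; only the table used to read them does.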

Let's move on to practice.

How to create an HTML document with utf-8 encoding

Open a standard Notepad: “All Programs” => “Accessories” => “Notepad”.

<p>Next, paste the standard HTML code into Notepad:</p><p> <html> <head> <title>My first HTML page on StepkinBlog..</title> </head> <body> </body> </html> </p><p>Now we indicate the encoding in which the web page is saved. To do this, place this meta tag between the <head></head> tags:</p><p> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </p><p>This is what it should look like (line #4):</p><p> <html> <head> <title>My first HTML page on StepkinBlog..</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body> </body> </html> </p><p>In Notepad, click <span>“File” => “Save as...”</span>:</p> <p><img src='https://i0.wp.com/stepkinblog.ru/wp-content/uploads/2016/03/kak-ukazat-kodirovku-sajta-na-html-osnovy-html-dlya-nachinayushhix-urok-20-3.png' width="100%" loading=lazy></p> <p><br>In the “Encoding:” field, select “UTF-8”. <br>Click "Save":</p> <p><img src='https://i2.wp.com/stepkinblog.ru/wp-content/uploads/2016/03/kak-ukazat-kodirovku-sajta-na-html-osnovy-html-dlya-nachinayushhix-urok-20-4.png' width="100%" loading=lazy></p> <h3><span>How to create an HTML document with windows-1251 encoding</span></h3> <p>Open a standard Notepad: 
<span><i>“All Programs” => “Accessories” => “Notepad”</i> </span>.<br>Next, paste the standard HTML code into Notepad:</p><p> <html> <head> <title>My first HTML page on StepkinBlog..</title> </head> <body> </body> </html> </p><p>Now we indicate the encoding in which the web page is saved. To do this, place this meta tag between the <head></head> tags:</p><p> <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> </p><p>This is what it should look like (line #4):</p><p> <html> <head> <title>My first HTML page on StepkinBlog..</title> <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> </head> <body> </body> </html> </p><p>In Notepad, click <span>“File” => “Save as...”</span>:</p> <p><img src='https://i1.wp.com/stepkinblog.ru/wp-content/uploads/2016/03/kak-ukazat-kodirovku-sajta-na-html-osnovy-html-dlya-nachinayushhix-urok-20-5.png' width="100%" loading=lazy></p> <p>In the “File name” field, type the name of the web page in Latin letters with the “.html” extension. I think you remember this from the first lessons. <br>In the “Encoding:” field, select “ANSI” (on a Russian-language Windows, “ANSI” means windows-1251). <br>Click "Save":</p> <p><img src='https://i2.wp.com/stepkinblog.ru/wp-content/uploads/2016/03/kak-ukazat-kodirovku-sajta-na-html-osnovy-html-dlya-nachinayushhix-urok-20-6.png' width="100%" loading=lazy></p> <p>That's all!</p> <p>Most webmasters choose UTF-8 encoding. 
I won’t go into the reasons here, because I don’t want to overload you with information you do not yet need at this stage of learning HTML.</p> <p>The encoding in the meta tag must match the encoding in which the file is saved. For example, suppose you write this in Notepad:</p><p> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </p><p>And select “ANSI” when saving:</p> <p><img src='https://i2.wp.com/stepkinblog.ru/wp-content/uploads/2016/03/kak-ukazat-kodirovku-sajta-na-html-osnovy-html-dlya-nachinayushhix-urok-20-6.png' width="100%" loading=lazy></p> <p>Since the two do not match, the result will be like this:</p> <p>Save your web pages correctly to avoid results like this.</p> <p>Almost every beginner in web development sooner or later runs into encoding problems in their projects. And then, as if following a script, the forums are bombarded with questions about how to defeat the hated “<b>krakozyabry</b>” (garbled characters). The vast majority of these problems are well known and easy to treat; you just need to know “<i>where it hurts and what pill to take</i>”. So I propose to go through the most common mistakes that cause this problem; perhaps my recommendations will save you from running into them again.</p> <p>Firstly, I strongly recommend that <span>all documents be in the same encoding</span> and that the database, namely its fields with string data, use that encoding too. The database encoding is set when the database is created, and a collation can also be specified for each individual field. If you create the database with phpMyAdmin, there should be no difficulties: open the "Databases" tab, enter the name of your future database in the field under "Create database", and choose a value in the "Collation" drop-down list next to it. 
If you create the database with an SQL query, write something like this:</p><p>CREATE DATABASE IF NOT EXISTS `my_db_name` CHARACTER SET utf8 COLLATE utf8_general_ci;</p><p>The choice of encoding is up to you, but I would recommend “<b>UTF-8 without BOM</b>” for files and the “<b>utf8_general_ci</b>” collation for the database (<i>Unicode multilingual, case insensitive</i>). Keep in mind that MySQL's utf8 stores at most three bytes per character; if you need characters such as emoji, use utf8mb4 instead. Just don’t forget to play it safe and make a dump before manipulating the database! I won’t describe in detail what a BOM is, but in short: it is an invisible marker (byte order mark) intended to distinguish between encodings such as UTF-16LE and UTF-16BE. UTF-8 does not need it, and it only gets in the way of web developers ;) The BOM is the character U+FEFF and sits at the very beginning of the document. Why UTF-8? Here are at least a couple of reasons. You can easily display Cyrillic, a quote from Al-Mutanabbi's poems and Chinese characters on the same page. This is because the Windows-1251 (cp1251) encoding has only 256 characters, while UTF-8 can represent every Unicode character: well over a hundred thousand, including special characters, pictograms, icons, etc. If you are going to use ajax requests on your site, that is another plus for UTF-8, because the XMLHttpRequest object works with this encoding out of the box, while with others you have to resort to workarounds, sometimes unsuccessfully. The sitemap (sitemap.xml) used by search engines for indexing must also be a UTF-8 file. Additionally, UTF-8 is the default that many PHP functions work with, and it is the encoding recommended by the W3C.</p> <p>When creating a new document everything is clear, but what about an existing one whose encoding you want to change? One of the easiest ways is to open the document in Notepad++, open the “<i>Encoding</i>” menu and choose “<i>Convert to UTF-8 without BOM</i>”. 
Next, we change the meta tag that defines the encoding:</p><p> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </p><p>For php files you can also send the appropriate header, but only if the file is not included in another document where such a header has already been sent. This applies both to the meta tag and to the header sent by the header() function:</p><p>header("Content-Type: text/html; charset=utf-8");</p><p>We check the result in the browser. There are several possible outcomes:</p> <ol><li>Everything works fine and the issue is closed</li> <li>Static text is displayed normally, but data from the database is still garbled</li> <li>Nothing has changed and the encoding is still broken</li> </ol><p>Let's start with the last point. Happy owners of dedicated servers or VPS/VDS can change the encoding with the <b>default_charset</b> directive in the php.ini configuration file. If you do not have access to php.ini, or you do but need to change the encoding for only one site, you can use the .htaccess file and write the following in it:</p><p># in principle, the line below is enough:<br>AddDefaultCharset UTF-8<br># but sometimes additional settings may be required:<br>DefaultLanguage ru<br>php_value default_charset "utf-8"</p><p>The .htaccess file is located in the root of your site. If you haven’t found it there, create it yourself. In a regular Notepad, create a document > “<i>Save as</i>” > select the file type “<i>All files</i>” > in the "File name" field type only a dot and the extension: “<b>.htaccess</b>”.</p> <p>Let's move on to the second point: the database has been converted to the required encoding, but the data from it is still displayed incorrectly on the page. First, make sure that the characters are displayed normally in the database itself. 
If the stored data is fine and only the output on the page is broken, you can either turn to the configuration files again, or run a query immediately after connecting to the database:</p><p>SET NAMES utf8;</p><p><b>* </b> I give only the query text itself, but since I don’t know which extension you use to work with MySQL, here are several options:</p><p>// for the legacy mysql_* extension<br>$db = mysql_connect("localhost", "username", "password");<br>mysql_select_db("db_name", $db);<br>mysql_query("SET NAMES utf8");<br><br>// for PDO and php versions below 5.3.6<br>$dbh = new PDO("mysql:host=localhost;dbname=db_name", "username", "password");<br>$dbh->exec("SET NAMES utf8");<br><br>// for PDO and php versions 5.3.6 and later, the charset can be given directly when creating the object<br>$dbh = new PDO("mysql:host=localhost;dbname=db_name;charset=utf8", "username", "password");<br>// or<br>$db = new PDO("mysql:host=localhost;dbname=db_name", "username", "password", array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));<br><br>// for MySQLi<br>$mysqli = new mysqli("localhost", "username", "password", "db_name");<br>$mysqli->set_charset("utf8");</p><p>Since I have touched on the “legacy mysql_*” extension, I would like to draw your attention to the warning highlighted in red in the php documentation: the extension was deprecated in PHP 5.5 and removed in PHP 7.0, so new code should use PDO or MySQLi. <br>If you had one of the standard problems, then after following some or all of the steps above the encoding issue should be resolved. But I would also like to mention some functions that may be useful in unusual situations. 
You can read more about them in the documentation; I will just give a couple of examples without going into details:</p><p>mb_internal_encoding()<br>With this function we can set or get the internal encoding of the script:<br>mb_internal_encoding("UTF-8"); // set<br>echo mb_internal_encoding(); // without an argument: get<br><br>mb_http_input() and mb_http_output()<br>Two functions that detect, set or get the character encoding of HTTP input or output:<br>print_r(mb_http_input("I")); // detect the encoding of the HTTP request input<br>mb_http_output("UTF-8"); // set the encoding for HTTP output<br>echo mb_http_output(); // get the current encoding of HTTP output<br><br>iconv()<br>Converts a string to the desired encoding:<br>echo iconv("utf-8", "cp1251", "Привет, мир!"); // Привет, мир! in cp1251<br><br>mb_convert_encoding()<br>Similar to iconv(), but in my opinion it is better because it behaves more predictably:<br>echo mb_convert_encoding("Привет, мир!", "cp1251", "utf-8"); // Привет, мир! in cp1251</p><p>In general, do not forget about the multibyte counterparts of the string functions. Most often they have the same name with the prefix <b>mb_</b>. The difference is easy to feel. Take, for example, the functions <b>strlen()</b> and <b>mb_strlen()</b> and run an experiment by measuring the length of a string:</p><p>// set the internal encoding<br>mb_internal_encoding("utf-8");<br><br>// for Latin characters there is no difference<br>echo strlen("incode"); // 6<br>echo mb_strlen("incode"); // 6<br><br>// but with Cyrillic the result is sadder<br>echo strlen("инкод"); // 10<br>echo mb_strlen("инкод"); // 5</p><p>Maybe some readers don’t need this explained, but for beginners: in UTF-8 each Cyrillic letter is encoded in two bytes, and <b>strlen()</b> counts the number of bytes in a string, not the number of letters. So five Cyrillic characters multiplied by two gives 10. 
Chinese characters, if I’m not mistaken, usually take three bytes each, so to avoid misunderstandings in such cases, use the appropriate mb_ functions.</p> <p>I repeat that these solutions cover the most frequent cases, and in the overwhelming majority of them they solve the problem. But if none of these methods helps in your situation, write here; we’ll try to figure it out together and add a new “recipe for headaches” to the article ;) With that, let me take my leave.</p> 1. We have a file: Myfile.html. <br>2. It needs to be saved in Unicode -> UTF-8 encoding. <b>Solution 1.</b> <ol><li>Open Myfile.html in the text editor <b>Notepad</b>.</li><li>Select “Save as...”.</li><li>Select the UTF-8 encoding.</li><li>Click the Save button.</li> </ol><br><img src='https://i0.wp.com/u4ilka.kcbux.ru/Raznoe/image/raz-019-02.png' width="100%" loading=lazy><b>Solution 2.</b> <ol><li>Open Myfile.html in the text editor <b>Notepad++</b> (the PSPad editor also works)</li><li>Menu -> Encodings. <br>Here we see the encoding of the opened file (Notepad++ detects it itself).</li><li>Choose <span>Convert to UTF-8 without BOM</span> (BOM - Byte Order Mark). <br>(The encoding "UTF-8 without BOM" is preferred and differs from plain "UTF-8").</li><li>Menu -> File -> Save.</li> </ol><br><img src='https://i0.wp.com/u4ilka.kcbux.ru/Raznoe/image/raz-019-03.png' width="100%" loading=lazy><h4>Browser encoding detection</h4>We ourselves tell the browser which encoding is used for this HTML file. <br>This is done using the META tag: 1) <meta http-equiv="Content-Type" content="text/html; charset=utf-8">The example above tells the browser that the downloaded HTML file is saved in utf-8 encoding. If the HTML file is saved in windows-1251 encoding, then: 2) <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> <b>Important!</b> <br>When converting files to another encoding, <b>don't forget to change</b> the directive in the META tag to match. 
<br>If one encoding is specified in the META tag but the file is saved in another, we will see “gibberish” on the screen. <p>3) <b>If</b> the META tag contains the right encoding but the site still displays “abracadabra”, check the site settings on the hosting (web server). <br>Usually the hosting's site settings have the encoding set to utf-8. <br>If the hosting settings specify windows-1251 and your files are in utf-8, change the setting to utf-8.</p> <p>Hello, dear readers of my blog. Today we will talk about encoding. If you have read my earlier article, you know that any document on the Internet is not stored in the form in which we are used to seeing it. It is written using symbols and signs incomprehensible to humans. It's exactly the same with text.</p> <p>There are several encodings, which is why you sometimes see strange characters when opening a book in a mobile application or uploading an article to a website; after changing some values in the settings, you see the familiar alphabet again.</p> <p>The Windows-1251 encoding: what is it, what role does it play when creating a website, which characters does it offer, and is it the best solution today? All of this is covered in today's article. As always, in simple language, as clearly as possible and with a minimum of terms.</p> <h2>A little theory</h2> <p>Any document on a computer or on the Internet, as I said, is stored as binary code. For example, in the CP866 (DOS) encoding the Cyrillic letter “К” is written as 10001010, while in windows-1251 the same number stands for the character Љ. As a result, if a browser or program consults the wrong table and reads the bytes as windows-1251 instead of CP866, the reader will see a character that makes no sense to him.</p> <p>The logical question is, why bother coming up with so many tables with codes? 
The fact is that besides the Russian alphabet there are also English, German, Chinese and many others. By some estimates, there are about 200,000 characters in total. Although I don’t really trust these statistics, remembering Japanese.</p> <p>Don't forget that capital and lowercase letters each need their own code, and there are also commas, dashes, and so on.</p> <p>The more characters in the table, the longer the code for each of them, which means the document weighs more.</p> <p><img src='https://i1.wp.com/start-luck.ru/wp-content/uploads/3-162.jpg' align="center" width="100%" loading=lazy></p> <p>Imagine if one book weighed 4 GB! It would take a very long time to load and take up all the free space on the computer. The decision to download it would not be an easy one.</p> <p>As for websites, it’s scary to think what would have happened. Each page would take more than an hour to open even on high-speed fiber optics! I think mobile phones could be safely thrown away. Could you use them outdoors even with 4G? I doubt it.</p> <p>For these reasons, at one time every developer tried to come up with their own character table that would be convenient to use and keep the weight optimal.</p> <p>Microsoft, for example, created windows-1251 for the Russian-language segment. It, of course, has its advantages and disadvantages, just like any other product.</p> <p>Nowadays, only 2% of all pages on the Internet are written in 1251. Most webmasters use UTF-8. Why is that?</p> <h2><span>Disadvantages and advantages</span></h2> <p>UTF-8, unlike windows-1251, is a universal encoding; it contains letters of various alphabets. 
There are also UTF-16 and UTF-32, which cover the same full set of Unicode characters for all languages: Telugu, Swahili, Lao, Maltese and so on.</p> <p><img src='https://i0.wp.com/start-luck.ru/wp-content/uploads/citata-2-48.jpg' align="center" width="100%" loading=lazy></p> <p>UTF-8 is economical: Latin letters, digits and punctuation take only one byte of memory, while Cyrillic letters take two bytes instead of the one byte they occupy in 1251. Rare characters from other languages and special characters take three or four bytes, but they appear rarely in a typical document.</p> <p>This encoding is better thought out, which is why most applications use it by default. That is, if you do not tell a program which encoding you are using, the first thing it will try is UTF-8.</p> <p>When you create an HTML document for a website, you tell browsers which table to use when decoding it.</p> <p>To do this, insert the following tag into the head section. After “charset=” comes either UTF-8 or windows-1251, as in the example below.</p> <table><tr><td class="code"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> </td> </tr></table> <p><img src='https://i2.wp.com/start-luck.ru/wp-content/uploads/4-151.jpg' align="center" width="100%" loading=lazy></p> <p>If in the future you want to change something and insert a phrase in Albanian while using this table, nothing will work, because the encoding does not support that language. UTF-8 will let you do this without any problems.</p> <p>If you are interested in building a website properly, I can recommend the course by Mikhail Rusakov “<i><b><span>Website creation and promotion from A to Z</span> </b> </i>”.</p> <p><br><img src='https://i1.wp.com/start-luck.ru/wp-content/uploads/1-174.jpg' align="center" width="100%" loading=lazy></p> <p>It contains a lot: 256 lessons covering, among other things, JavaScript and XML. 
In addition to programming languages, you will learn how to monetize a site, that is, how to earn more and faster. It is one of the few courses that explains everything you need in such detail.</p> <p>I've been studying for a year now <i><b><span>at Alexander Borisov's school of bloggers</span> </b> </i>. It takes many times longer and the end is not yet in sight, but it is no less thorough and it builds discipline. It motivates me to keep developing.</p> <p>And if questions arise, there is no need to search the Internet. There is always a competent mentor.</p> <p><br><img src='https://i0.wp.com/start-luck.ru/wp-content/uploads/5-131.jpg' align="center" width="100%" loading=lazy></p> <p>Somehow I went off topic. Let's get back to encodings.</p> <h2>Databases</h2> <p>When it comes to PHP, things get really scary. I have already talked about databases; they are used to speed up the website. Usually you don’t touch them, but when the need arises to move a site, you become uneasy.</p> <p>Difficulties happen to everyone, no matter how much work experience you have. Some records in the database may be stored in Windows 1251 while others, for example page templates, are in a different encoding.</p> <p>Until the site has to be moved, everything works, although not entirely correctly. But after the move, the trouble begins. Ideally, you should use either only UTF-8 or only Windows 1251, but in practice such inconsistencies happen to everyone.</p> <p>For the decoding to be consistent, you need to run the query mysql_query("SET NAMES cp1251") right after connecting to the database. 
In this case, data is exchanged with the database in the cp1251 encoding.</p> <p><img src='https://i2.wp.com/start-luck.ru/wp-content/uploads/5-138.jpg' align="center" width="100%" loading=lazy></p> <h2>Htaccess</h2> <p>If you are determined to use 1251 on your site, find or create an .htaccess file. It is responsible for configuration settings. You will have to add three lines to it for everything to come together.</p> <table><tr><td class="code">DefaultLanguage ru<br>AddDefaultCharset windows-1251<br>php_value default_charset "cp1251"</td> </tr></table> <p>I still strongly recommend that you consider using UTF-8. It is more popular, simple and rich. Whatever decisions you make now, it is important that you can correct everything later. Adding an English version of the site will be much easier with this encoding. Nothing will need to be fixed.</p> <p>The decision is yours.</p> <p>See you again and good luck in your endeavors.</p> <p>If the encoding is incorrect, the entire site or part of it is displayed as “krakozyabry”, i.e. strange characters that make the text unreadable. This can happen if the web server encoding is configured incorrectly or not configured at all. Let's consider the possible cases and ways to solve the problem.</p> <h2>Incorrect HTML page encoding <br></h2> <p>Let's create a test file:</p><p>sudo gedit /var/www/html/encoding.html</p><p>Let's copy into it:</p><p> <html> <head> <title>Encoding check</title> </head> <body> Test file to check encoding </body> </html> </p>



Let's open this file in the browser http://localhost/encoding.html

As you can see, the encoding is detected incorrectly by the browser:

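There are several ways to correct this. Let's start with the simplest one: explicitly specify the encoding of the web page. This is done with a meta tag, which must be placed inside the head tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Let's add this line to our test file so it looks like this:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Encoding check</title>
</head>
<body>
Test file to check encoding
</body>
</html>
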

As we can see in the following screenshot, the problem is resolved:

If your file is saved in an encoding other than UTF-8, replace utf-8 in the meta tag with windows-1251 or with whatever encoding the file is actually saved in.
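If you are not sure which of the two encodings a file is in, a rough check can be sketched in Python. This is only a heuristic under a loud assumption: the file is either UTF-8 or windows-1251 and nothing else. The function name guess_encoding is made up for this example.

```python
def guess_encoding(data: bytes) -> str:
    """Return "utf-8" if the bytes are valid UTF-8, otherwise "windows-1251".

    Assumption: only these two encodings are possible. A file that contains
    only Latin letters is reported as "utf-8", because such bytes are the
    same in both encodings.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1251"


# Usage with a real file (the path is only an example):
# from pathlib import Path
# print(guess_encoding(Path("/var/www/html/encoding.html").read_bytes()))
```

The check works because Cyrillic text saved as windows-1251 almost never forms a valid UTF-8 byte sequence.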

This was the easiest way to fix the encoding problem - without changing the server settings.

Let's return our test file to its original state and continue studying ways to specify the encoding.

If .htaccess files are enabled in the Apache settings, they can be used to specify the encoding of the pages sent by the web server. To enable .htaccess support, find this group of lines in the Apache configuration file (/etc/apache2/apache2.conf):

<Directory /var/www/>
	Options Indexes FollowSymLinks
	AllowOverride None
	Require all granted
</Directory>

and replace

AllowOverride None

with

AllowOverride All

After this, the server needs to be restarted.

sudo systemctl restart apache2.service

The .htaccess file must be placed in the site's directory. My site is in the root directory of the web server. If yours is too, create a .htaccess file in the /var/www/html/ folder and add the AddDefaultCharset directive to it, followed by the desired encoding. Examples:

AddDefaultCharset UTF-8

AddDefaultCharset windows-1251

You can specify an encoding that will be applied only to files of a certain format:

AddCharset utf-8 .atom .css .js .json .rss .vtt .xml

The set of files can be anything, for example:

AddCharset utf-8 .html .css .php .txt .js

The next option is an alternative that can also set the encoding for files of a certain type (for example, when it is wrapped in a <FilesMatch> block); it requires the mod_headers module to be enabled:

Header set Content-Type "text/html; charset=utf-8"

Another directive you may see in a .htaccess file is IndexOptions; note that it sets the UTF-8 encoding only for the directory listings that Apache generates automatically:

IndexOptions +Charset=UTF-8

If the site is written in PHP, you may additionally need to duplicate the encoding with php_value default_charset:

AddDefaultCharset windows-1251
php_value default_charset "cp1251"

Instead of creating a .htaccess file, you can set the encoding in the web server configuration file. For Apache on CentOS/Fedora this is the httpd.conf file, and on Debian/Ubuntu it is the apache2.conf file. Add the following line to set the encoding, then restart the web server for the change to take effect:

AddDefaultCharset UTF-8

How to set UTF-8 encoding in PHP

In a PHP script, the encoding is set with the header function. The charset is a parameter of the Content-Type header, so it is sent together with the content type (in this example, for an HTML page):

header("Content-Type: text/html; charset=utf-8");

Another option, for an RSS feed:

header("Content-type: text/xml; charset=utf-8");

Remember that the header function must be called before any output is sent to the browser. Once output has started, the headers have already been sent and can no longer be changed. This includes error messages: if one was printed to the browser, the headers are already gone and calling header will produce a “Cannot modify header information - headers already sent” warning. To check whether headers have already been sent, use headers_sent.

The described method only works when the PHP script generates the whole page. Static pages (such as .html files) should be saved in utf-8 and declare it with the meta tag or the web server settings described above, because the web server does not detect a file's encoding by itself. PHP files should likewise be saved in utf-8 so that the text inside them matches the header the script sends.

Incorrect encoding of results from MySQL database

If your site consists of a static part (template) and a dynamic part, which is formed from data received from the database, then a situation may arise when part of the site has the correct encoding, and another part of the site has the wrong one. In this case, it is useless to change the web server settings - since all the same, part of the page will have the wrong encoding.
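To confirm that the broken part of the page really is UTF-8 data read through a windows-1251 connection, you can try to reverse the damage on a sample string. Below is a minimal Python sketch; the function name repair_mojibake is invented for this example, and it assumes exactly this one kind of mix-up.

```python
def repair_mojibake(broken: str) -> str:
    """Try to undo "UTF-8 bytes decoded as windows-1251".

    If the text cannot be converted back (it was not damaged this way),
    the input is returned unchanged.
    """
    try:
        return broken.encode("windows-1251").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return broken


# Simulate a value that was stored as UTF-8 but read as windows-1251.
damaged = "Привет".encode("utf-8").decode("windows-1251")
print(repair_mojibake(damaged))   # the original word is restored
print(repair_mojibake("Привет"))  # correct text is left as it is
```

If the sample becomes readable, the data in the database is fine and only the connection encoding needs to be fixed.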

You need to start by determining the encoding of your tables. You can check it in phpMyAdmin:

Pay attention to the "Collation" column: the entry "utf8_unicode_ci" means that the UTF-8 encoding is used.

You can also connect to the MySQL DBMS and check the encoding of tables without phpMyAdmin. To do this, run:

mysql -u root -p

If you forgot the database name, then run the command:

SHOW DATABASES;

Let's say I want to look up the encoding for tables in the information_schema database

USE information_schema;

If you forgot the names of the tables, run:

SHOW TABLES;

Then look at the columns of the table you are interested in:

SHOW FULL COLUMNS FROM table_name;

For example:

SHOW FULL COLUMNS FROM GLOBAL_STATUS;

You will see something like this:

Look at the Collation column. In my case it shows utf8_general_ci, which, like utf8_unicode_ci, means the UTF-8 encoding. The collations utf8_general_ci, utf8_unicode_ci, utf8mb4_general_ci and utf8mb4_unicode_ci differ in their sorting rules and in how many bytes per character they allow, so choosing one for a MySQL database is a separate topic.

Now that we know the encoding (in my case it’s UTF-8), each time you connect to the MySQL DBMS you need to execute these queries in sequence:

SET NAMES UTF8;
SET CHARACTER SET UTF8;
SET character_set_client = UTF8;
SET character_set_connection = UTF8;
SET character_set_results = UTF8;

In PHP this can be done something like this:

// fragment of a class method: connect, log a failed connection, then set the encoding
$this->mysqli = new mysqli($server, $username, $password, $basename);
if ($this->mysqli->connect_error) {
    $this->errorHandler_c->logError(1, "Connect Error (" . $this->mysqli->connect_errno . ") " . $this->mysqli->connect_error, $_SERVER["REQUEST_URI"]);
}
$this->mysqli->query("SET NAMES UTF8");
$this->mysqli->query("SET CHARACTER SET UTF8");
$this->mysqli->query("SET character_set_client = UTF8");
$this->mysqli->query("SET character_set_connection = UTF8");
$this->mysqli->query("SET character_set_results = UTF8");

Note that UTF8 must be replaced with the encoding that is used for your tables.

Changing file encoding

If you decide to go the other way and, instead of declaring a different encoding, change the encoding of your files, you first need to find out the current encoding of the files and then convert them to the desired one (not necessarily UTF-8).
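As one possible way to do the conversion, here is a short Python sketch. It assumes the file really is in windows-1251 (check this first, otherwise the text will be damaged) and rewrites it in place as UTF-8 without a BOM. The function name convert_to_utf8 and the default source encoding are choices made for this example only.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="windows-1251"):
    """Re-save a text file as UTF-8 without a BOM, overwriting it in place.

    Assumption: the file is really stored in source_encoding.
    Make a backup before running this on real files.
    """
    file = Path(path)
    text = file.read_bytes().decode(source_encoding)
    file.write_bytes(text.encode("utf-8"))  # the "utf-8" codec writes no BOM


# Usage (the path is only an example):
# convert_to_utf8("/var/www/html/encoding.html")
```

After the conversion, remember to update the charset in the meta tag and in the server settings so that they match the new file encoding.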

How to find out what encoding the server is sending

If you want to find out which encoding the web server is configured to use (which encoding it sends in the headers), use the following command:

curl URL -s -o /dev/null -D /dev/stdout | grep -E "charset"

Replace URL with the real address of the site you are checking. If the site uses HTTPS, specify the address together with the protocol, for example:

curl https://softocracy.ru -s -o /dev/null -D /dev/stdout | grep -E "charset"
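The command prints the whole Content-Type line. If you want to pull out just the encoding name in a script, the header value can be parsed with the Python standard library. This is a sketch: the function name charset_from_content_type is made up, and the sample header values are assumptions, not output from a real server.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None

# For a live site, the response headers offer the same method:
# from urllib.request import urlopen
# print(urlopen("https://softocracy.ru").headers.get_content_charset())
```

A result of None means the server sends no charset at all, so the browser has to rely on the meta tag or guess.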

Which encoding to choose for a website


