I ran into this problem when converting Microsoft Word documents to HTML.

Word inserts "smart quotes" in a character set that isn't recognized by the latest version of Apache in Fedora. All the quotation marks appear as question marks ("?") in the browser output, though the document looks fine when it is edited in Word or FrontPage.

I found this link which tells you how to disable them and then replace all the bad characters. You need to use the "Tools, AutoCorrect Options" menu and disable the smart quotes feature in two places, the "AutoFormat" tab and the "AutoFormat As You Type" tab.


The problem is that Word has a number of other smart features you need to disable too. Hyphens are "smarted" to dashes, and apostrophes are "smarted" to a character that looks almost the same. Ellipses ("...") are "smarted" to a single character that looks almost the same as well. The copyright symbol that Word substitutes for "(c)" is nonstandard too. Bulleted lists are also affected: I had to convert all my bullets into standard ASCII characters like "o" and "-". I disabled as many smart features in the AutoCorrect list as I could find.
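
The manual search-and-replace described above could probably be scripted. Here is a rough sketch in Python, not something I have run against a real Word export. The table of characters and their ASCII stand-ins is my own assumption about what Word produces, and the function name asciify is made up for illustration.

```python
# Assumed mapping from Word's "smart" characters to plain ASCII stand-ins.
# Extend the table if other characters still show up as "?".
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / smart apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # single-character ellipsis
    "\u00a9": "(c)",  # copyright symbol
    "\u2022": "-",    # round bullet
}

# Build the translation table once; str.translate applies it in one pass.
_TABLE = str.maketrans(SMART_TO_ASCII)


def asciify(text):
    """Return text with the smart characters above replaced by ASCII."""
    return text.translate(_TABLE)
```

To use it you would read the exported file, run the text through asciify, and write it back out. If Word saved the file in its Windows character set, opening it with encoding="cp1252" is probably what you want, but that is a guess about the export settings.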

All these nonstandard characters show up as "?" marks when converted to HTML. There were question marks all over my web pages! It took me a day to figure out what was wrong.

To add insult to injury, Word saves multiple spaces as multiple space characters, but HTML requires multiple spaces to be represented with multiple "&nbsp;" strings. If you have more than one space in a row, they will show up as a "?" too!
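
The space problem could be handled the same way. This is a minimal sketch, assuming the runs in the file are ordinary space characters. The function name preserve_spaces is hypothetical. It keeps the first space of each run and turns the rest into "&nbsp;" entities.

```python
import re


def preserve_spaces(text):
    """Replace every space after the first in a run with &nbsp;."""
    def fix_run(match):
        extra = len(match.group()) - 1  # spaces beyond the first one
        return " " + "&nbsp;" * extra

    # Only runs of two or more spaces are touched; single spaces stay as-is.
    return re.sub(r" {2,}", fix_run, text)
```

This should only be run on the text content. Applied to a whole HTML file, it would also rewrite runs of spaces used for indentation in the markup, which does no harm between tags but adds clutter.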

It worked fine in Red Hat 9, but the Fedora version of Apache seems to be more sensitive.

Has anyone else had this problem? Is there an easier way to fix this? I know some of you will say "dump MS Word", but I need to create HTML from Word documents.