Dan's PowerShell Stuff: Get file encoding even if no Byte Order Marker

Note: you can find the latest version of the encoding functions in my PowerShell beautifier project:
https://github.com/DTW-DanWard/PowerShell-Beautifier
Check out file src/DTW.PS.FileSystem.Encoding.psm1

Every now and then you need to be able to programmatically determine a file's encoding. Maybe you are writing a utility that edits files and you want to ensure you maintain the original encoding type. Perhaps you want to make sure that certain files have a Byte Order Marker (BOM).

If the file has a BOM, this is easy. If it doesn't... aw, crap. At that point you have to analyze the file's contents and make a judgement call based on what you see. I wrote a function to do this: Get-DTWFileEncoding
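To show why the BOM case is the easy one, here is a minimal sketch that only looks at the first few bytes of a file. It is not the code from Get-DTWFileEncoding; the function name Get-BomEncodingSketch is made up for illustration, and it simply returns $null when no BOM is found (the hard case, which it does not try to solve).

```powershell
# Hypothetical helper for illustration only; NOT the author's Get-DTWFileEncoding.
# It recognizes the common Unicode BOMs and returns $null if none is present.
function Get-BomEncodingSketch {
    param([Parameter(Mandatory = $true)][string]$Path)

    # .NET file APIs do not follow PowerShell's current location, so resolve first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Read at most the first 4 bytes; no BOM checked here is longer than that.
    $buffer = New-Object byte[] 4
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $count = $stream.Read($buffer, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    $hex = ''
    if ($count -gt 0) {
        # Produces a string such as "EF-BB-BF-41"
        $hex = [System.BitConverter]::ToString($buffer, 0, $count)
    }

    # Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones, because
    # FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE).
    if ($hex -like 'FF-FE-00-00*') { return [System.Text.Encoding]::UTF32 }
    if ($hex -like '00-00-FE-FF*') { return New-Object System.Text.UTF32Encoding -ArgumentList $true, $true }
    if ($hex -like 'EF-BB-BF*')    { return [System.Text.Encoding]::UTF8 }
    if ($hex -like 'FE-FF*')       { return [System.Text.Encoding]::BigEndianUnicode }
    if ($hex -like 'FF-FE*')       { return [System.Text.Encoding]::Unicode }

    # No BOM: the encoding would have to be guessed from the file's contents.
    return $null
}
```

A $null result from this sketch is exactly the situation where you have to analyze the contents and make a judgement call.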

Get-DTWFileEncoding returns a System.Text.Encoding type based on the file specified. Here's an example of a big-endian file with a BOM:

As you can see, the System.Text.Encoding type is returned and the BOM type has the correct value: FE FF

Here's an example for another big-endian file, this time with no BOM:

The returned Encoding type info looks the same as the first but if you inspect the Preamble, there's no value.
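The difference between the two results can be reproduced with plain .NET encoding objects, independent of Get-DTWFileEncoding. The sketch below builds two big-endian UTF-16 encodings, one that emits a BOM and one that does not, and prints each preamble. Format-PreambleSketch is a made-up helper name, and this only illustrates how System.Text.Encoding exposes the preamble; it is an assumption that the function builds its result along these lines.

```powershell
# Big-endian UTF-16 that emits a BOM: the built-in static instance.
$withBom = [System.Text.Encoding]::BigEndianUnicode

# Big-endian UTF-16 constructed with bigEndian = $true, byteOrderMark = $false.
$withoutBom = New-Object System.Text.UnicodeEncoding -ArgumentList $true, $false

# Hypothetical helper: show a preamble as hex, or '(none)' when it is empty.
function Format-PreambleSketch {
    param([System.Text.Encoding]$Encoding)
    $bytes = $Encoding.GetPreamble()
    if ($bytes.Length -eq 0) { return '(none)' }
    return [System.BitConverter]::ToString($bytes).Replace('-', ' ')
}

Format-PreambleSketch $withBom      # FE FF
Format-PreambleSketch $withoutBom   # (none)
```

Both objects report the same code page (1201), which is why the two results look alike until you inspect the preamble.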

There are some other handy functions in there as well:

Add-DTWFileEncodingByteOrderMarker - adds a byte order marker to a file.
Compare-DTWFiles - compares two files and returns $true if same, $false otherwise. Uses the two functions below to do comparisons.
Compare-DTWFilesIgnoringBOM - compares two files, ignoring BOMs, returning $true if same, $false otherwise.
Compare-DTWFilesIncludingBOM - compares two files, including BOMs, returning $true if same, $false otherwise.

Again, you can get the encoding functions at the beautifier:

https://github.com/DTW-DanWard/PowerShell-Beautifier

6 comments:

Anonymous, February 20, 2013 at 12:02 PM
I love it when you find someone has done all the hard work for you. Thanks this is brilliant.
Dan Ward, February 20, 2013 at 12:41 PM
Thanks! I think there might be a bug in the algorithm (there might be a corner case I missed); let me know if you find any issues.
Anonymous, March 27, 2013 at 8:03 AM
"file --mime filename" on Linux works so much faster
Anonymous, September 28, 2014 at 9:49 PM
I think you have your byte order mark wrong. Your script has the BOM reversed for big and little endian. http://en.wikipedia.org/wiki/Byte_order_mark
Ian M, November 22, 2025 at 2:42 AM
This is a really helpful set of PowerShell tools for working with file encodings.

Monday, February 20, 2012
