Check out file src/DTW.PS.FileSystem.Encoding.psm1
Every now and then you need to programmatically determine a file's encoding. Maybe you are writing a utility that edits files and you want to ensure you maintain the original encoding. Perhaps you want to make sure that certain files have a Byte Order Mark (BOM).
If the file has a BOM, this is easy. If it doesn't... aw, crap. At that point you have to analyze the file's contents and make a judgement call based on what you see. I wrote a function to do this: Get-DTWFileEncoding
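To illustrate the easy case, here is a minimal sketch of BOM-only detection. It is not the code from Get-DTWFileEncoding; the function name `Get-BomEncodingSketch` is hypothetical, and it only relies on the standard .NET `GetPreamble()` method. When no BOM is found it returns `$null`, which is the point where content analysis would have to take over.

```powershell
# Hypothetical helper, not part of the DTW module: identifies an encoding from a BOM only.
function Get-BomEncodingSketch {
    param([byte[]]$Bytes)

    # Check the longer BOMs first: the UTF-32 LE BOM (FF FE 00 00)
    # begins with the same two bytes as the UTF-16 LE BOM (FF FE).
    $candidates = @(
        (New-Object System.Text.UTF32Encoding($true, $true)),   # UTF-32 BE: 00 00 FE FF
        [System.Text.Encoding]::UTF32,                           # UTF-32 LE: FF FE 00 00
        [System.Text.Encoding]::UTF8,                            # UTF-8:     EF BB BF
        [System.Text.Encoding]::BigEndianUnicode,                # UTF-16 BE: FE FF
        [System.Text.Encoding]::Unicode                          # UTF-16 LE: FF FE
    )

    foreach ($encoding in $candidates) {
        $bom = $encoding.GetPreamble()
        if ($Bytes.Length -lt $bom.Length) { continue }

        $isMatch = $true
        for ($i = 0; $i -lt $bom.Length; $i++) {
            if ($Bytes[$i] -ne $bom[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $encoding }
    }

    # No BOM found: the contents would have to be analyzed instead.
    return $null
}

$sample = [byte[]](0xFE, 0xFF, 0x00, 0x41)        # UTF-16 BE BOM followed by "A"
(Get-BomEncodingSketch -Bytes $sample).CodePage   # 1201, the UTF-16 big-endian code page
```

For a real file you would pass in its leading bytes, for example from `[System.IO.File]::ReadAllBytes($path)`, where `$path` is whatever file you want to check.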
Get-DTWFileEncoding returns a System.Text.Encoding object for the specified file. Here's an example of a big-endian file with a BOM:
As you can see, the System.Text.Encoding type is returned and the BOM type has the correct value: FE FF
Here's an example for another big-endian file, this time with no BOM:
The returned Encoding type info looks the same as the first but if you inspect the Preamble, there's no value.
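The difference between the two results can be reproduced with plain .NET objects, without any file involved. This is only a sketch: it assumes the returned objects behave like `System.Text.UnicodeEncoding` instances constructed with and without a BOM, which matches the description above but is not taken from the module's source.

```powershell
# UTF-16 big-endian encoding that emits a BOM (bigEndian = $true, byteOrderMark = $true)
$withBom = New-Object System.Text.UnicodeEncoding($true, $true)

# Same encoding, constructed without a BOM (byteOrderMark = $false)
$withoutBom = New-Object System.Text.UnicodeEncoding($true, $false)

# Both report the same code page, so they look alike at first glance
$withBom.CodePage -eq $withoutBom.CodePage                                  # True

# The preamble is where they differ
($withBom.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '   # FE FF
$withoutBom.GetPreamble().Length                                            # 0
```

For big-endian UTF-16 the BOM bytes are FE FF; little-endian UTF-16 uses the reverse order, FF FE.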
There are some other handy functions in there as well:
- Add-DTWFileEncodingByteOrderMarker - adds a byte order mark (BOM) to a file.
- Compare-DTWFiles - compares two files and returns $true if same, $false otherwise. Uses the two functions below to do comparisons.
- Compare-DTWFilesIgnoringBOM - compares two files, ignoring BOMs, returning $true if same, $false otherwise.
- Compare-DTWFilesIncludingBOM - compares two files, including BOMs, returning $true if same, $false otherwise.
Again, you can get the encoding functions at the beautifier:
I love it when you find someone has done all the hard work for you. Thanks, this is brilliant.
Thanks! I think there might be a bug in the algorithm (there might be a corner case I missed); let me know if you find any issues.
"file --mime filename" on Linux works so much faster
I think you have your bit order mark wrong. Your script has the BOM reversed for Big and Little Endian. http://en.wikipedia.org/wiki/Byte_order_mark
* Byte Order Mark. sorry, buzzed.