Auto-detect the correct MIME type regardless of file extension or response mime-type

pganti · June 28, 2013, 9:44pm

This might be a corner case, but it would nevertheless be good to identify mismatches between a file's extension or MIME type and its true file type.

If you go to http://www.httparchive.org/viewsite.php?pageid=9285021 and look at image requests by format, you would conclude that most of this site's images are PNG. Since WPT powers that data, let's look at the real waterfall and inspect the response MIME types, which again tell us that they are image/png:

http://www.webpagetest.org/result/130628_HQ_18TD/1/details/

However, if you download one of those files (say /department-bucket-folder/department_buckets/6-1-69-1-480x480-1370359903.png) and use the “file” command to look at the actual content of the file, you will see that it is a JPEG:

localhost:global pganti$ file test.png
test.png: JPEG image data, JFIF standard 1.01

You could even just run strings on the file and still guess that it is a JPEG.

The same is true of the other images. Most likely this is caused by https://drupal.org/node/568772 or some variant, wherein the content owners just use a convenient extension that bears no relation to the content type.

My final ask here is: how can we catch these kinds of mismatches in WPT, so that we can flag the true content type independent of the MIME type or file extension? Would it be helpful, or too much to do?

pmeenan · July 2, 2013, 12:56am

For images it is reasonably easy to detect the correct image format by looking at the first 2 bytes of the response (GIF, PNG and JPEG all have unique signatures). It wouldn’t be too hard to add another field that tracks the actual image type (though it’s more of a feature request than a bug).
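As a rough illustration of the signature check described above, here is a minimal Python sketch. It is not WebPageTest code: the names `sniff_image_type` and `is_mismatch` are hypothetical, and it assumes the response body and the Content-Type header value are already available to the caller.

```python
from typing import Optional

# Leading byte signatures for the three formats mentioned above.
# Only the first two bytes are compared, which is enough to tell
# these three formats apart from each other.
SIGNATURES = (
    (b"\xff\xd8", "image/jpeg"),  # JPEG start-of-image marker
    (b"\x89P", "image/png"),      # start of the 8-byte PNG signature
    (b"GI", "image/gif"),         # start of "GIF87a" / "GIF89a"
)


def sniff_image_type(body: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if body.startswith(magic):
            return mime
    return None


def is_mismatch(declared_mime: str, body: bytes) -> bool:
    """True when the body is a recognised image whose type differs
    from the declared Content-Type (hypothetical helper)."""
    actual = sniff_image_type(body)
    # Drop any parameters such as "; charset=..." before comparing.
    declared = declared_mime.split(";")[0].strip().lower()
    return actual is not None and actual != declared
```

With a check like this, a response served as image/png whose body starts with the JPEG marker (the case reported above) would be flagged, while a body with no recognised signature is left alone rather than guessed at. Comparing more of each signature than two bytes would reduce false positives on arbitrary binary data.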

Images are probably the only file type where auto-detection is feasible though.

pganti · July 5, 2013, 6:52pm

Thank you

Agreed. In general, other than images, anything else that has a standard format with specific magic headers can be auto-detected, right?

pmeenan · July 10, 2013, 2:34pm

In theory, though that really only opens up Flash or video. None of the text resources (JS, CSS, HTML) would be auto-detectable.
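To make the Flash and video case concrete, here is a small Python sketch of the same idea for a few container formats. The function name `sniff_media_type` and the returned labels are hypothetical, and the signature list is an assumption chosen for illustration rather than a complete set.

```python
from typing import Optional


def sniff_media_type(body: bytes) -> Optional[str]:
    """Return a coarse container label from leading bytes, or None.

    The labels are illustrative only; they are not MIME types.
    """
    # SWF files start with FWS (uncompressed), CWS (zlib) or ZWS (LZMA).
    if body[:3] in (b"FWS", b"CWS", b"ZWS"):
        return "swf"
    # Flash Video container.
    if body[:3] == b"FLV":
        return "flv"
    # ISO base media files (MP4 and relatives) carry "ftyp" at offset 4.
    if body[4:8] == b"ftyp":
        return "iso-bmff"
    # EBML header, used by both WebM and Matroska.
    if body[:4] == b"\x1a\x45\xdf\xa3":
        return "ebml"
    # Text resources such as JS, CSS and HTML have no fixed signature.
    return None
```

A script or stylesheet passed to this function returns None, which matches the point above that text resources cannot be identified this way.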
