This may be a corner case, but it would be good to identify cases where the file extension or MIME type does not match the true file type.
If you go to http://www.httparchive.org/viewsite.php?pageid=9285021 and look at image requests by format, you would conclude that most of the images on this site are PNG. Since WPT powers that data, let's look at the real waterfall and inspect the response MIME types, which again tell us that it's image/png.
However, if you download one of those files (say /department-bucket-folder/department_buckets/6-1-69-1-480x480-1370359903.png) and use the “file” command to look at the actual content of the file, you will see it is a JPEG.
You could even just run strings on the file and still guess that it is a JPEG.
The same is true of other images. It is most likely caused by https://drupal.org/node/568772 or some variant, wherein the content owners just use a convenient extension that bears no relation to the content type.
My final ask here is: how can we catch these kinds of mismatches in WPT, so that we can flag the true content type independent of the MIME type and file extension? Would that be helpful, or too much to do?
For images it is reasonably easy to detect the correct image format by looking at the first 2 bytes of the response (GIF, PNG and JPEG all have unique signatures). It wouldn’t be too hard to add another field that tracks the actual image type (though it’s more of a feature request than a bug).
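As a rough illustration of the signature check described above, here is a minimal Python sketch. It is not WPT code: the names sniff_image_type and find_mismatches, the extension table, and the idea of returning a list of mismatches are all assumptions made for the example. It also compares slightly more than the first 2 bytes, using the longer well-known signatures to reduce false positives.

```python
import os

# Well-known leading-byte signatures for the three formats mentioned above.
SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)

# Hypothetical mapping from file extension to the type it implies.
EXTENSION_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
}


def sniff_image_type(body):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if body.startswith(magic):
            return mime
    return None


def find_mismatches(url_path, declared_mime, body):
    """Return (source, claimed, actual) tuples where the claim disagrees with the bytes."""
    actual = sniff_image_type(body)
    if actual is None:
        return []  # Not a recognized image signature, so nothing to compare.

    mismatches = []
    ext = os.path.splitext(url_path)[1].lower()
    from_ext = EXTENSION_TYPES.get(ext)
    if from_ext is not None and from_ext != actual:
        mismatches.append(("extension", from_ext, actual))

    # Drop any parameters such as "; charset=..." before comparing.
    declared = declared_mime.split(";")[0].strip().lower()
    if declared and declared != actual:
        mismatches.append(("content-type", declared, actual))
    return mismatches
```

For the case in the first post, a response served as image/png with a .png extension whose body starts with the JPEG signature would be reported as mismatching on both the extension and the Content-Type header.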
Images are probably the only file type where auto-detection is feasible though.