[funsec] Image forensics

Dr. Neal Krawetz hf at hackerfactor.com
Mon Dec 28 11:13:55 CST 2009


On 27 Dec 2009, Rob, grandpa of Ryan, Trevor, Devon & Hannah wrote:
> An interesting analysis of a graphic recently used by Victoria's Secret in their
> advertising.  This gives chapter and verse of the techniques used, and results
> obtained, demonstrating the ability to determine if an image has been altered, and 
> even which parts of an image have been modified, and how.
>
> http://www.hackerfactor.com/blog/index.php?/archives/322-Body-By-Victoria.html

[snip]

Thanks for the compliments.
(I'm just catching up on my emails...)


Re: Dan Kaminsky
> Neal's code is neat and pretty, but chapter and verse is no substitute  
> for open code and side by side checks. A LOT of his output bears a  
> strong resemblance to edge detection (really, look for high frequency  
> signal, it'll show up in every test).

Edges can show up for many reasons.
  - The edge may be a high-frequency region (as you stated).
  - With algorithms like ELA and LG, high-contrast edges (like the stripes
    on a zebra) can show a higher error level or a stronger gradient than
    the rest of the image. However, they will not be significantly stronger.
    (If ELA has a black background, then a high-contrast edge may be
    grayish, but not white.)
  - Artists usually make changes along edges to avoid visual detection.
    Think about it: if you are going to cut out or mask something, you are
    going to do it along the edge.  In the VS example, her outline is
    visible, but the interior edges are not.  If the algorithms were only
    picking up edges, then all edges (inside, outside, and outline) would
    be at the same level.  They are not.

As a counterexample to your edge theory, consider:
http://www.hackerfactor.com/blog/index.php?/archives/338-Id-Rather-Wear-Photoshop.html
(If you get a 503 server error, just reload.  GoDaddy's server is having
trouble with the concurrent connection load right now.  This will be
fixed in January.)
In the Error Level Analysis, the halo totally disappears, even though it
is a high contrast and high frequency element (white on dark).
If the algorithm were measuring edges, then the halo should still be visible,
at least to some degree.
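
ELA is usually described as resaving a JPEG at a known quality and measuring how much each pixel changes. Below is a minimal 1-D toy of that idea, not the actual tool: plain DCT-coefficient rounding with a single step size (`step`, an assumed stand-in for a JPEG quantization table) replaces a real JPEG encoder, and all function names are hypothetical. Content that has already been through the same quantization barely changes on another pass, while freshly inserted content does, even though both are equally high-contrast here.

```python
import math
import random

N = 8  # block length; JPEG uses 8x8 blocks, this toy uses 1-D blocks of 8

def _scale(k):
    # Orthonormal DCT scaling factor for coefficient k
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(block):
    """Orthonormal DCT-II of one block of N samples."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]

def resave(signal, step):
    """Toy lossy save: round every block's DCT coefficients to a multiple of step."""
    out = []
    for i in range(0, len(signal), N):
        coeffs = dct(signal[i:i + N])
        out.extend(idct([step * round(c / step) for c in coeffs]))
    return out

def error_levels(signal, step):
    """Per-sample change caused by one more resave (the 'error level')."""
    return [abs(a - b) for a, b in zip(signal, resave(signal, step))]

random.seed(1)
# A "photo" that has already been saved once at this quality
photo = resave([random.uniform(0, 255) for _ in range(32)], step=20)
# Paste fresh, never-compressed content over the second block
edited = photo[:8] + [random.uniform(0, 255) for _ in range(8)] + photo[16:]
ela = error_levels(edited, step=20)
print(max(ela[:8]), max(ela[8:16]))
```

The first block's error level is essentially zero, while the pasted block stands out.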

Second, with regard to "open code", I strongly disagree with your
premise.  You seem to assume that releasing the code will allow people
to validate the methods.

 - If I release my own tool, then they will just use it and look at the
   results.  This does not validate the code nor the methods.

 - If I don't release my own tools, but describe the algorithms, then
   people will create their own and perform a more scientific comparison.

If you create your own tool that implements a variation of the algorithm(s)
and you cannot generate the same kind of results, then there is either
something wrong with your code or with mine.  Now we can do a proper
comparison.  We have a hypothesis and multiple tools to test it.

As an example, I have implemented my own PCA, DCT, and wavelet libraries.
(I couldn't use any of the public ones due to GPL issues.)  To validate
my libraries, I compared the results with GSL and other public libraries.
Since GSL and the other public libraries generate the same output as
my own library, that validates both the implementation and the method.
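
The same cross-validation idea in miniature: two independent implementations of one transform must agree to within floating-point error. GSL is not used here; as an assumed stand-in for a reference library, this sketch checks a textbook DCT-II (direct cosine sum) against the same transform derived a different way, from the DFT of a mirrored sequence. Function names are hypothetical.

```python
import cmath
import math
import random

def dct_direct(x):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi * (2n + 1) * k / (2N))."""
    n_len = len(x)
    return [sum(xn * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n, xn in enumerate(x))
            for k in range(n_len)]

def dct_via_dft(x):
    """Same transform, computed independently from a length-2N DFT."""
    n_len = len(x)
    y = list(x) + list(reversed(x))  # even-symmetric extension, length 2N
    out = []
    for k in range(n_len):
        # Naive DFT bin k of the mirrored sequence
        yk = sum(yn * cmath.exp(-2j * math.pi * k * n / (2 * n_len))
                 for n, yn in enumerate(y))
        # A half-sample phase shift turns it into 2 * X_k (purely real)
        out.append((cmath.exp(-1j * math.pi * k / (2 * n_len)) * yk).real / 2)
    return out

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

random.seed(0)
worst = 0.0
for length in (1, 2, 7, 8, 16):
    for _ in range(20):
        x = [random.uniform(-255, 255) for _ in range(length)]
        worst = max(worst, max_abs_diff(dct_direct(x), dct_via_dft(x)))
print("worst disagreement:", worst)
```

If either implementation had a sign, index, or scaling bug, the worst disagreement would be large rather than rounding noise.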

Thus, to validate the algorithms I use, someone else needs to implement
something based on the description of the algorithm.  Already, someone
implemented ELA based on the description in my Black Hat presentation:
  http://www.tinyappz.com/wiki/Error_Level_Analyser
His tool creates different coloring (he decided to use a temperature map),
but it generates results that are similar enough to validate the algorithm
and implementation.
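
One way to make "similar enough" concrete when two tools render the same analysis with different color schemes: compare the underlying per-pixel values with a rank (Spearman) correlation, which ignores any monotonic remapping such as a grayscale-to-temperature palette. This is a sketch under the assumption that both outputs can be reduced to one scalar per pixel on the same grid; it is not how either tool was actually compared, and the names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            r[idx] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Rank correlation: unaffected by monotonic rescaling (e.g. a different palette)."""
    return pearson(ranks(a), ranks(b))

map_a = [0, 12, 40, 40, 200, 90, 5]       # tool A: flattened error levels
map_b = [v * v / 255 for v in map_a]      # tool B: same ordering, nonlinear scale
map_c = [255 - v for v in map_a]          # a map that disagrees completely
print(spearman(map_a, map_b), spearman(map_a, map_c))
```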

There is another group that is working on their own variation of Luminance
Gradient, but they have not yet released their code. (And I don't know if
they plan to.)  Then again, my LG implementation is not unique.  There are
dozens of published papers that implement variations of the algorithm.
The algorithm I use is one of the most trivial methods (but it is fast
and effective).
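
The email does not spell out which LG variant is used. As an assumption, here is one of the most basic forms: convert each pixel to luminance (Rec. 601 weights here), then take finite differences and report the gradient's strength and direction per pixel. The function names are hypothetical, and a real tool would also map the direction and magnitude to colors for display.

```python
import math

def luminance(rgb):
    """Rec. 601 luma from an (r, g, b) tuple."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def luminance_gradient(image):
    """image: rows of (r, g, b) tuples. Returns rows of (magnitude, angle_radians).

    Central differences in the interior, one-sided differences at the borders."""
    h, w = len(image), len(image[0])
    lum = [[luminance(p) for p in row] for row in image]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx = (lum[y][x1] - lum[y][x0]) / max(x1 - x0, 1)
            gy = (lum[y1][x] - lum[y0][x]) / max(y1 - y0, 1)
            row.append((math.hypot(gx, gy), math.atan2(gy, gx)))
        out.append(row)
    return out

# A gray ramp that brightens left to right: constant gradient pointing along +x
ramp = [[(v, v, v) for v in (0, 10, 20, 30)] for _ in range(3)]
grad = luminance_gradient(ramp)
print(grad[1][1])
```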

Finally, I have no intention of releasing my code to the open source
community.  My code is designed to assist forensic investigators with a
serious problem: distinguishing real photos from computer graphics, and
identifying manipulation.  (This is the "real vs virtual" child porn
problem.)  A full, public release only helps the bad guys.
(Yes: this is the Security by Obscurity vs Full Disclosure debate.  I've
chosen my side.)


Re: Imri Goldberg
John Graham-Cumming's copy-move code is really pretty cool.
I wrote my own variation (based on the same paper that he cites); mine is
heavily optimized.  I described some of my optimizations at:
http://www.hackerfactor.com/blog/index.php?/archives/308-Send-In-The-Clones.html
There is even a group working on their own variation:
  http://www.tinyappz.com/wiki/Copymove
(If John's code, my code, and Tinyappz all generate similar results, then
the algorithm must work and the methodology must be sound!)
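
For readers new to copy-move detection, a deliberately simplified sketch of the core idea: find pixel blocks that occur more than once and tally the offsets between matches; a cloned region shows up as blocks sharing one offset. Assumption: it matches exact pixel values only, so unlike practical tools (which compare approximate block features), it would miss clones that were recompressed or retouched. It is not John's, mine, or Tinyappz's implementation, and the names and parameters are hypothetical.

```python
from collections import defaultdict

def find_copy_move(gray, block=4, min_shift=4):
    """gray: rows of integer pixel values.

    Returns {(dy, dx): count} for pairs of identical blocks at least
    min_shift pixels apart (so neighboring, overlapping blocks are ignored)."""
    h, w = len(gray), len(gray[0])
    seen = defaultdict(list)  # block contents -> top-left corners where it occurs
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            key = tuple(tuple(gray[y + j][x:x + block]) for j in range(block))
            if len({v for row in key for v in row}) == 1:
                continue  # flat blocks (sky, walls) match everywhere; skip them
            seen[key].append((y, x))
    shifts = defaultdict(int)
    for corners in seen.values():
        for i, (y1, x1) in enumerate(corners):
            for y2, x2 in corners[i + 1:]:
                dy, dx = y2 - y1, x2 - x1
                if max(abs(dy), abs(dx)) >= min_shift:
                    shifts[(dy, dx)] += 1
    return dict(shifts)

size = 12
img = [[y * size + x for x in range(size)] for y in range(size)]  # no natural repeats
for j in range(4):  # clone the 4x4 patch at (1, 1) onto (6, 7)
    for i in range(4):
        img[6 + j][7 + i] = img[1 + j][1 + i]
print(find_copy_move(img))  # {(5, 6): 1}
```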


Re: Martin Tomasek
> I like wavelet-based algorithms the most.

To each their own. :-)
Wavelets definitely have some strong points.
But for signal analysis, I'm actually growing very fond of Gaussian
Pyramid Decomposition.
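
For reference, the standard construction of a Gaussian pyramid: repeatedly low-pass filter with a small Gaussian-like kernel and halve the resolution, so each level keeps coarser structure. The 5-tap binomial kernel, border clamping, and function names below are assumptions; the email does not say how this decomposition is built.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # binomial approximation to a Gaussian

def blur_1d(values):
    """Convolve with KERNEL, clamping indices at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wgt in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += wgt * values[j]
        out.append(acc)
    return out

def pyramid_reduce(image):
    """One level: blur rows, blur columns, keep every other sample each way."""
    rows = [blur_1d(r) for r in image]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    blurred = [list(r) for r in zip(*cols)]
    return [row[::2] for row in blurred[::2]]

def gaussian_pyramid(image, levels):
    """Return [image, level1, level2, ...], stopping early once a side is < 2."""
    pyramid = [image]
    for _ in range(levels):
        top = pyramid[-1]
        if min(len(top), len(top[0])) < 2:
            break
        pyramid.append(pyramid_reduce(top))
    return pyramid

pyr = gaussian_pyramid([[100.0] * 16 for _ in range(16)], 4)
print([len(level) for level in pyr])  # [16, 8, 4, 2, 1]
```

Subtracting an upsampled copy of each level from the level above it gives the Laplacian pyramid, which splits the image into separate frequency bands.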

					-Neal
--
Neal Krawetz, Ph.D.
Hacker Factor Solutions
http://www.hackerfactor.com/
Author of "Introduction to Network Security" (Charles River Media, 2006)
and "Hacking Ubuntu" (Wiley, 2007)


