Apple’s CSAM system was deceived, but the company has two safeguards

Update: Apple has mentioned a second, server-side check, and a professional computer vision company has outlined one possibility of what that could look like, described in “How the second inspection might work” below.
After developers reverse-engineered parts of it, an early version of Apple’s CSAM system was effectively tricked into flagging an innocent image. However, Apple says it has additional safeguards to prevent this from happening in real-world use.
The latest development came after the NeuralHash algorithm was published to the open-source developer site GitHub, where anyone can experiment with it…
All CSAM detection systems work by importing a database of known child sexual abuse material from organizations such as the National Center for Missing and Exploited Children (NCMEC). That database is supplied in the form of hashes, or digital fingerprints, derived from the images.
While most technology giants scan photos after they have been uploaded to the cloud, Apple uses the NeuralHash algorithm on the customer’s iPhone to generate a hash of each stored photo and then compares it against a downloaded copy of the CSAM hash database.
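To make the matching idea concrete, here is a minimal sketch in Python. It uses a generic perceptual hash (pHash from the `imagehash` library) purely as a stand-in for NeuralHash, and the file path and example database values are hypothetical; it is not Apple’s implementation.

```python
# Sketch of on-device hash matching, with pHash standing in for NeuralHash.
# The "known_hashes" database and the photo path are placeholders.
from PIL import Image
import imagehash

# Pretend this is the downloaded database of known fingerprints.
known_hashes = {imagehash.hex_to_hash("d1d1d1d1d1d1d1d1")}  # placeholder value

def matches_database(photo_path: str) -> bool:
    """Hash a local photo and check it against the known-hash database."""
    photo_hash = imagehash.phash(Image.open(photo_path))
    # Perceptual hashes are compared by Hamming distance; 0 means identical.
    return any(photo_hash - known == 0 for known in known_hashes)

print(matches_database("photo.jpg"))
```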
Yesterday, a developer claimed to have reverse-engineered Apple’s algorithm and published the code to GitHub, a claim Apple effectively confirmed.
Within hours of the GitHub release, researchers had used the algorithm to create an intentional false positive: two completely different images that generate the same hash value. This is known as a collision.
There is always a risk of collisions with such systems, since a hash is by nature a greatly simplified representation of an image, but it was surprising that someone could generate one so quickly.
The deliberate collision here is only a proof of concept. The developers do not have access to the CSAM hash database, which would be needed to create false positives in the live system, but it does show that collision attacks are relatively easy in principle.
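A collision simply means that two visibly different images hash to the same value. A short sketch of how you would check for one, again using pHash as a stand-in for NeuralHash and with hypothetical file names:

```python
# Check whether two different images collide under a perceptual hash.
from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("dog.png"))
hash_b = imagehash.phash(Image.open("noise.png"))

if hash_a == hash_b:
    print("Collision: different images, identical hash", hash_a)
else:
    print("No collision; Hamming distance =", hash_a - hash_b)
```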
Apple effectively confirmed that the algorithm is the basis of its own system, but told Motherboard that it is not the final version. The company also stated that it never intended to keep the algorithm secret.
Apple told Motherboard in an email that the version analyzed by users on GitHub is a generic version, not the final version that will be used for iCloud Photos CSAM detection. Apple said it had also made the algorithm public.
“The NeuralHash algorithm [...] is part of the signed operating system code [and] security researchers can verify that its behavior conforms to the description,” an Apple document states.
The company went on to say that there are two further steps: running a secondary (secret) matching system on its own servers, and manual review.
Apple also stated that once a user passes the 30-match threshold, a second, non-public algorithm running on Apple’s servers will check the results.
“This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the encrypted CSAM database on the device.”
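A hypothetical sketch of that two-stage safeguard follows. SHA-256 and BLAKE2b stand in for NeuralHash and Apple’s second, non-public hash, and the databases and function names are illustrative only; the real system uses perceptual hashes and threshold logic Apple has not published in full.

```python
# Illustrative two-stage check: a match threshold, then an independent
# server-side hash, then human review. Not Apple's implementation.
import hashlib

THRESHOLD = 30

def on_device_hash(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()    # stand-in for NeuralHash

def server_side_hash(image_bytes: bytes) -> str:
    return hashlib.blake2b(image_bytes).hexdigest()   # stand-in for the second hash

def review_account(flagged_images, device_db, server_db):
    # Stage 1: count on-device matches; below the threshold, nothing happens.
    matches = [img for img in flagged_images if on_device_hash(img) in device_db]
    if len(matches) < THRESHOLD:
        return "no action"

    # Stage 2: each match must also pass an independent server-side hash,
    # which defeats images crafted only to collide with the first algorithm.
    confirmed = [img for img in matches if server_side_hash(img) in server_db]
    if not confirmed:
        return "discard as adversarial false positives"

    # Stage 3: anything that survives both checks goes to manual review.
    return "send to human review"
```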
Brad Dwyer of Roboflow found an easy way to distinguish between the two images posted as a proof of concept of the collision attack.
I was curious how these images would look in CLIP, a similar but different neural feature extractor from OpenAI. CLIP works similarly to NeuralHash: it takes an image and uses a neural network to produce a set of feature vectors that map to the image’s content.
But OpenAI’s network is different. It is a general-purpose model that can map between images and text, which means we can use it to extract human-understandable information about an image.
I ran the two collision images above through CLIP to see whether it was also fooled. The short answer is: it was not. This means that Apple should be able to apply a second feature extractor network, such as CLIP, to detected CSAM images to determine whether they are real or fake. It is much harder to generate an image that fools both networks at the same time.
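The idea Dwyer describes can be sketched as follows: run both collision images through CLIP and compare their embeddings. A low cosine similarity means CLIP sees two very different images even though the perceptual hashes collide. The model choice (Hugging Face’s `openai/clip-vit-base-patch32`) and file names are illustrative, not a description of Apple’s server-side check.

```python
# Compare two colliding images with CLIP image embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("collision_a.png"), Image.open("collision_b.png")]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    features = model.get_image_features(**inputs)           # one embedding per image

features = features / features.norm(dim=-1, keepdim=True)   # L2-normalize
similarity = (features[0] @ features[1]).item()              # cosine similarity

# A low similarity means the two images are clearly different to CLIP,
# despite sharing a perceptual hash.
print(f"CLIP cosine similarity: {similarity:.3f}")
```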
Finally, as mentioned earlier, the images are manually reviewed to confirm that they are CSAM.
A security researcher said that the only real risk is that anyone wanting to annoy Apple could feed false positives to its human reviewers.
“Apple actually designed this system so the hash function doesn’t need to stay secret, because the only thing you can do with ‘non-CSAM as CSAM’ is annoy Apple’s response team with some junk images until they implement filters to eliminate those garbage false positives from their analysis pipeline,” Nicholas Weaver, a senior researcher at the International Computer Science Institute at the University of California, Berkeley, told Motherboard in an online chat.
Ben Lovejoy is a British technology writer and EU editor for 9to5Mac. He is known for his opinion columns and diary pieces, exploring his experience with Apple products over time to produce more comprehensive reviews. He also writes fiction: two technothrillers, a few science fiction shorts, and a rom-com!

