Amazon Rekognition Collection Strategies
Many companies are looking at ways to build applications and services that include facial recognition features. Amazon
Rekognition makes it easy to integrate facial recognition into an application, but how you set up your facial collections can
impact your searches. Amazon provides great information on the service and how to use the API. But to architect a more complete
solution, we need to understand how facial data is stored and how searching and retrieving this data can impact our
application. In particular, we need to understand how Rekognition manages new facial data.
We will use a simple photo and video storage application to discuss some design decisions. We will look at some options for
facial recognition and the trade-offs of a couple of different approaches.
Application Requirements
Our basic image and video sharing application will upload images and video to an S3 bucket and provide a user interface to
manage them. There are lots of other application requirements, but for this article we are going to focus on the
facial recognition requirements that could have an impact on our use of Rekognition collections. For the facial recognition
portion of the application, there are some services we would want to provide to our users:
- Search for faces matching a face in our collection
- Filter current images based on a face in one image
- Filter on other detected face details (gender, age range, eyeglasses, beard, etc.)
- Search for matching faces based on a new image that has not been uploaded yet
- Match images to a face detected in a video
- Assign a name and other information to an image or face
Amazon Rekognition is able to detect a number of things about each face, like age range, beard, emotions, eyeglasses, gender,
smile, etc. This information is returned from the indexFaces API call. Since we don't want to index the same image multiple times,
we should save this information somewhere that makes it easy to search and retrieve, like DynamoDB or Elasticsearch. If we also
want to attach other information to a face, like a name, this would need to be added to our database. Other information about the
image itself could also be stored in the database, like location, date taken, etc.
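As a rough sketch of this flow in Python with boto3 (the collection name 'photo-app-faces' and the DynamoDB table 'faces' are placeholder names of our own, not anything Rekognition defines), we could index an image once and persist the returned attributes:

    import boto3

    rekognition = boto3.client('rekognition')
    faces_table = boto3.resource('dynamodb').Table('faces')  # hypothetical table

    def index_and_store(bucket, key):
        # Index the image once, asking Rekognition for all facial attributes.
        response = rekognition.index_faces(
            CollectionId='photo-app-faces',          # hypothetical collection name
            Image={'S3Object': {'Bucket': bucket, 'Name': key}},
            ExternalImageId=key.replace('/', ':'),   # ExternalImageId cannot contain '/'
            DetectionAttributes=['ALL'],
        )
        # Persist each detected face so we never have to re-index this image.
        for record in response['FaceRecords']:
            face = record['Face']
            detail = record['FaceDetail']
            faces_table.put_item(Item={
                'FaceId': face['FaceId'],
                'ExternalImageId': face['ExternalImageId'],
                'AgeRange': detail['AgeRange'],
                'Gender': detail['Gender']['Value'],
                'Beard': detail['Beard']['Value'],
                'Eyeglasses': detail['Eyeglasses']['Value'],
                'Name': None,  # filled in later when a user identifies the face
            })
        return response['FaceRecords']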
We also need to understand whether this is a single-user application or whether multiple users will be using it. If we have
multiple users, will there be any relationship between them, or are they isolated and independent? For our example, we are going
to allow multiple users, with a combination where some users are related and others are independent.
Understanding Rekognition
The Amazon Rekognition service has a lot of features. In this article we are going to focus on the facial recognition
features. Before we can start making design decisions, we need to understand some details of how the Amazon Rekognition service
works. First we need to understand the concept of a collection and how some of the API calls interact with these
collections. If we expect to have lots of users all around the world, how are certain features of Amazon Rekognition going to
impact our application? How much information can we share between users? How much do we want to share? Let's take a deeper look
at Rekognition collections and a few specific points to understand how this is going to scale.
What are Amazon Rekognition collections?
Amazon Rekognition can be used to detect faces in images and videos. Videos can either be stored on S3 or streamed. But
to match a face with other faces, we need to have a collection of face data. An Amazon Rekognition collection is
a group of facial feature data that we can use to try to match faces from other images and videos.
We can think of a Rekognition collection as a face database. Amazon Rekognition does not store the image bytes of each face; it
stores data points about facial features (eyes, nose, mouth, etc.) that it can use to find similarities to other images. The key
parameter in a collection is the FaceId. This parameter is created by the indexFaces API call, and each face that is stored in a
collection has a unique FaceId. Because there may be multiple faces in a .jpg or .png image, a bounding box is defined to locate
each face within the image file. All this information, and a lot more, is returned from the indexFaces API call, and we can save
it in another database for easy lookup later.
A FaceId is the unique key into an Amazon Rekognition collection. It represents a specific face in a specific image.
So, similar faces of the same person with a slightly different tilt, or eyes open versus closed, will have different FaceIds. Amazon
Rekognition can detect that the same image file is being processed by indexFaces: running the indexFaces API with the same image
produces the same FaceId, even if the file is given a different name on S3. To help
map the face or faces back to the source image, Rekognition provides a parameter called ExternalImageId. The
ExternalImageId parameter is provided to the indexFaces API call and stored in the collection with the FaceId. This parameter is
optional but can be useful to map a FaceId back to the original file on S3, for example. If an image has multiple faces, the
indexFaces API will detect all the faces, giving each face a unique FaceId, but they will all have the same ExternalImageId.
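To make the FaceId / ExternalImageId relationship concrete, here is a minimal sketch (the bucket and collection names are placeholders) that indexes one group photo and prints the unique FaceId, shared ExternalImageId, and bounding box of each detected face:

    import boto3

    rekognition = boto3.client('rekognition')

    # Index a group photo; ExternalImageId ties every face back to the S3 object.
    response = rekognition.index_faces(
        CollectionId='photo-app-faces',   # hypothetical collection name
        Image={'S3Object': {'Bucket': 'photo-app-uploads', 'Name': 'group.jpg'}},
        ExternalImageId='group.jpg',
    )
    for record in response['FaceRecords']:
        face = record['Face']
        # Unique FaceId per face, same ExternalImageId for all of them, and a
        # bounding box locating each face within the original image.
        print(face['FaceId'], face['ExternalImageId'], face['BoundingBox'])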
Before we can search or compare faces, we need to create an Amazon Rekognition collection to hold all the faces that we find in
our images. We do this with the createCollection API call. There is no published limit on the number of collections an AWS
account can create, but as we will see, there are some trade-offs of one collection versus many. For our photo application, users
will upload images that will be analyzed for faces, and the faces will be put into an Amazon Rekognition collection. The more
images a user uploads, the more faces we can add to our collection. This could allow users to filter images or to identify people
in images and videos.
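Creating a collection is a single API call; a minimal sketch, with the region and collection name as placeholders:

    import boto3

    rekognition = boto3.client('rekognition', region_name='us-east-1')

    # Create the collection that will hold our indexed face data.
    response = rekognition.create_collection(CollectionId='photo-app-faces')
    print(response['CollectionArn'], response['StatusCode'])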
When doing a search in a collection, the source face can be an existing face in our collection, a face from an image file, or
video data. An image file, or some data cropped from an image, can be passed to the searchFacesByImage API
call. This API call will detect the largest face in the image passed in and search for matching stored faces in our collection.
During a video scan, Rekognition will detect a face and use the facial data to find similar faces in the collection. These all work
basically the same way, but for this discussion we will use searches using faces already in our collection. All the search functions
work against a single collection, so the design decisions will be the same regardless of the search API used. Details about
working with videos can be found in the AWS Rekognition documentation.
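As an example, a sketch of searching with a new image that has not been indexed yet (bucket and collection names are placeholders); Rekognition will use the largest face it detects in the supplied image:

    import boto3

    rekognition = boto3.client('rekognition')

    # Search the collection using the largest face detected in a new image.
    response = rekognition.search_faces_by_image(
        CollectionId='photo-app-faces',   # hypothetical collection name
        Image={'S3Object': {'Bucket': 'photo-app-uploads', 'Name': 'new-photo.jpg'}},
        FaceMatchThreshold=80,
        MaxFaces=10,
    )
    for match in response['FaceMatches']:
        print(match['Face']['FaceId'], match['Similarity'])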
The searchFaces API call will look for faces in the collection and try to match them to the source face. The API call will use
the facial details and return a list of faces that are "similar". The data returned from the searchFaces API will include the
FaceId, some face details, the bounding box, the ExternalImageId, and the predicted "similarity". Facial recognition is not exact,
but we can set thresholds for how similar the faces should be. Amazon does a good job explaining this in the Rekognition
documentation. In our application we may use an 80% threshold to find similar faces that might be the same person at a different
age or wearing makeup, or even relatives. If we are trying to identify a person in the image, then we may want to look for matches
of 99% and map that FaceId to a name we stored in another database. This data gives us some options to filter the images we
display. As our collection grows, the search API calls will provide us with lots of data that we can use in our application to
sort, filter, and identify faces in our images and videos.
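A sketch of both cases, assuming the hypothetical 'faces' table from earlier holds any names users have assigned: a loose 80% search for similar faces, and a strict 99% search resolved to stored names:

    import boto3

    rekognition = boto3.client('rekognition')
    faces_table = boto3.resource('dynamodb').Table('faces')  # hypothetical table

    def similar_faces(face_id, threshold=80):
        # Loose match: may surface the same person at a different age,
        # wearing makeup, or even relatives.
        response = rekognition.search_faces(
            CollectionId='photo-app-faces',   # hypothetical collection name
            FaceId=face_id,
            FaceMatchThreshold=threshold,
        )
        return response['FaceMatches']

    def identify(face_id):
        # Strict match: near-certain matches only, resolved to names we
        # stored in DynamoDB when a user labeled the face.
        for match in similar_faces(face_id, threshold=99):
            record = faces_table.get_item(
                Key={'FaceId': match['Face']['FaceId']}).get('Item', {})
            if record.get('Name'):
                return record['Name'], match['Similarity']
        return None, None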
Rekognition collections are located in a specific region
Amazon Rekognition collections are created in a specific region. So depending on where our users are located and how many users
we have, this could have an impact on our collections. This should also be considered along with the location of the S3 buckets
storing the images and video. One approach would be to consolidate everything into a couple of regions. Another would be to
distribute the application to multiple regions around the world, with storage and Rekognition collections in each location.
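If we go multi-region, the mechanics are simple: one client, and one collection, per region. A minimal sketch with illustrative region choices:

    import boto3

    # One Rekognition client per region we deploy to; the collection name can
    # be reused across regions because collections are regional resources.
    REGIONS = ['us-east-1', 'eu-west-1', 'ap-southeast-2']   # illustrative choices
    clients = {region: boto3.client('rekognition', region_name=region)
               for region in REGIONS}

    def create_regional_collections(collection_id):
        for region, client in clients.items():
            client.create_collection(CollectionId=collection_id)
            print(f'created {collection_id} in {region}')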
Collections have a 20 million face limit
A 20 million face limit may seem like a lot of faces, but if each user uploads 1,000 faces, that is a maximum of 20,000 users in a
single collection. We would then need to create a new collection for the next set of 20,000 users. So if we plan to have a lot of
users, one collection will not work.
FaceIds are valid for a specific collection
When we use the indexFaces API call to add faces from an image to a collection, Amazon Rekognition creates a FaceId that is unique
within the collection provided to the indexFaces call. We can call indexFaces multiple times with the same parameters and we get the
same FaceId. However, if we index the same image with the same parameters but into a different collection, we get a different FaceId.
If we think of the collection as a table and the FaceId as its key, a FaceId is only meaningful within its own collection. There is
no way to link collections together or express any relationship between collections. For example, we cannot use a FaceId from
collection A to find similar faces in collection B. To find similar images in two collections, we would need to either A) use
indexFaces to put the face into collection B and then search, or B) use the original image and do a searchFacesByImage on collection
B to find similar faces. Either way, we are doing extra search API calls to find matches in other collections.
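Because a FaceId from collection A means nothing to collection B, option B above amounts to re-running the search with the original image; a sketch with placeholder names:

    import boto3

    rekognition = boto3.client('rekognition')

    def search_other_collection(bucket, key, other_collection):
        # A FaceId from collection A cannot be used here; we must go back to
        # the source image and let Rekognition re-detect the face.
        response = rekognition.search_faces_by_image(
            CollectionId=other_collection,
            Image={'S3Object': {'Bucket': bucket, 'Name': key}},
            FaceMatchThreshold=80,
        )
        return response['FaceMatches']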
Different ExternalImageIds create new FaceIds
The ExternalImageId is a parameter provided to the indexFaces API call to attach some external information to the FaceId. The
searchFaces API call will take a source FaceId and return similar FaceIds along with the ExternalImageId for each FaceId.
However, one thing to understand is that given the exact same image file and collection ID but a different ExternalImageId, the
indexFaces API call will create a different FaceId. This could happen if duplicate images are uploaded and the ExternalImageId is
based on a random ID number or some other non-deterministic method. Even if the image file is exactly the same, any change in the
ExternalImageId will create a different FaceId. With multiple FaceIds, a search will return all the FaceIds and the different
ExternalImageIds, so it could look like two images match, but when displayed it will look like the same image shown twice. The
point to understand is that the strategy we use to create the ExternalImageId parameter can affect the number of FaceIds in our
collection.
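One way to keep duplicate uploads from minting extra FaceIds is to derive the ExternalImageId deterministically from the image content rather than from a random ID. This is purely our own convention, not anything Rekognition requires; a sketch:

    import hashlib
    import boto3

    s3 = boto3.client('s3')
    rekognition = boto3.client('rekognition')

    def index_with_stable_id(bucket, key):
        # Hash the image bytes so the same file always gets the same
        # ExternalImageId, no matter what it was named on upload.
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        stable_id = hashlib.sha256(body).hexdigest()
        return rekognition.index_faces(
            CollectionId='photo-app-faces',   # hypothetical collection name
            Image={'S3Object': {'Bucket': bucket, 'Name': key}},
            ExternalImageId=stable_id,
        )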
Cost of API calls
Amazon Rekognition image API calls start at about $0.001 per call, and video is billed at about $0.10 per minute of video analyzed.
A collection may act like a database, but searching a Rekognition collection is much more expensive than typical database API calls.
So it is probably best to save the data from indexFaces in a less expensive database like DynamoDB. This can be used to save
previous searches and build relationships that are quicker to search. Then use the search APIs for more complex searches and for
updating relationships as new files are added. Building the relationships as images are added could reduce the need to
reindex or perform extra searches. We can update related images that may have been indexed earlier with likely matches and save
this in the database. We can then display images based on the database relationships first, and only run a searchFaces call when we
need a similarity score for other faces.
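A sketch of this cache-first pattern, assuming a hypothetical 'face-matches' DynamoDB table: check for matches recorded by an earlier search before paying for a new Rekognition search, and remember new results as we find them:

    import boto3

    rekognition = boto3.client('rekognition')
    matches_table = boto3.resource('dynamodb').Table('face-matches')  # hypothetical

    def find_matches(face_id):
        # Cheap path: return matches recorded by an earlier search.
        cached = matches_table.get_item(Key={'FaceId': face_id}).get('Item')
        if cached:
            return cached['Matches']
        # Expensive path: ask Rekognition, then remember the answer.
        response = rekognition.search_faces(
            CollectionId='photo-app-faces',   # hypothetical collection name
            FaceId=face_id,
            FaceMatchThreshold=80,
        )
        matches = [m['Face']['FaceId'] for m in response['FaceMatches']]
        matches_table.put_item(Item={'FaceId': face_id, 'Matches': matches})
        return matches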
What are the trade-offs?
As with every application, we are going to have to make some trade-offs. Our simple application is primarily focused around
Rekognition, so how we implement this component will have an impact on other areas of the application. The first thing we need to
decide is whether we will have one collection or many.
A single collection
The simplest option would be to have a single collection for all the face details. All images would run through indexFaces and
the details would be stored in a common collection.
Pros: A big positive with one collection is the ease of searches. indexFaces adds the face details to a single
collection, so searches match against everything in the collection. This can make sharing information between
groups of users easier.
Cons: Unfortunately, the negatives are significant for our application. The limit of 20 million faces in a
collection caps the number of users or images that can be processed. Also, having all the faces
in a single collection means a search will return all the matches. There is no built-in filter, so we will
need another way to filter out matches that belong to other users. Since a collection is located in a single
region, we could have some extra data transfer and storage issues for users that are not located in that
region. There could also be security or privacy concerns with all this information for different users in
the same collection.
A collection per user
The opposite approach from the single collection is to give each user their own collection. This does address some of the
negatives from the single collection, but it also has some trade-offs.
Pros: Each user will have their own collection that can hold up to 20 million face details, which should be
plenty for each user. Collections could be located in different regions so they are closer to each user and
to the images and videos uploaded to their S3 bucket. Each user gets privacy, since only their own data is
stored in their collection.
Cons: Trying to create a group to share face data is difficult because we cannot link collections together, so
multiple searches would have to be done to compare face data from multiple users. Rekognition API calls
cost much more than other database API calls, so this could drive up application costs. The more people we
allow in a group, the more searches of different collections we need; larger groups and more cross-group
searches drive up the API costs and increase the complexity of search and other functions.
A hybrid approach
If we take a hybrid approach, we can use a single collection for a small group of users, and then have many
collections across individuals or groups. While not perfect, we get some of the benefits of a single collection and some of the
benefits of separate collections.
Pros: Users would be created under a group or family account. The group would have the limit of 20 million
faces, so we can adjust group sizes based on the estimated images per user to stay under this limit.
We could start with 5 or 10 users per group and adjust based on the average number of images and faces
uploaded. Isolation and privacy of facial images would be at the group level, so people that are part of a
group would have to agree that some of their images or data may be shared with others in the group. A
single search of the collection can pull up matches from anyone's images in the collection. The collection
can be located close to the group's region, which should provide better performance because the majority
of the group's users should be close to the region with the collection and images.
Cons: There will be a limited number of people in a group, so sharing between groups is not possible without
extra API calls. People will need to agree to be part of a group and understand that others may have
access to some of their uploaded images and face information. Since multiple users are part of a group,
there will need to be a way to filter face detail information to only show images that belong to a
specific user. This adds a little overhead to searches and to displaying data. There could be performance
issues for users that are not located in the region where the collection is stored, but mostly during
uploading and indexing.
Summary
As we start to architect the application, we see how important it is to understand the options for using collections. For our
simple photo sharing application, the hybrid approach seems to make the most sense. But for a single-purpose application, like
tenant identification at a condo security desk, a single collection would be a perfect fit. For another application, like security
for a large office building with multiple tenants, separate collections for each company, with no sharing, might be the best
option. The key is to understand how collections work with searches and what results are needed for each application.
The hybrid solution seems to be the best option for our application. We can create the group account and then create user
accounts under the group account. The collection will belong to the group, and all the faces will be stored in a common collection
for the group. We can also store other face details returned from indexFaces in another database for quick searches. We will need
to use the ExternalImageId as a key into a table to pull up information about the image. This could be image info, but it will also
need to contain some user ownership information and some custom user data. We will use the user ownership information to filter the
data when we only want to display images for a specific user.
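A sketch of that ownership filter (the table layout and OwnerId attribute are our own design, not part of Rekognition): search the group collection, then keep only matches whose stored owner is the requesting user:

    import boto3

    rekognition = boto3.client('rekognition')
    faces_table = boto3.resource('dynamodb').Table('faces')  # hypothetical table

    def matches_for_user(face_id, user_id):
        # Search the group's shared collection; this returns matches from
        # every member's images.
        response = rekognition.search_faces(
            CollectionId='group-faces',       # hypothetical group collection
            FaceId=face_id,
            FaceMatchThreshold=80,
        )
        visible = []
        for match in response['FaceMatches']:
            record = faces_table.get_item(
                Key={'FaceId': match['Face']['FaceId']}).get('Item', {})
            # Keep only faces from images owned by the requesting user.
            if record.get('OwnerId') == user_id:
                visible.append(match)
        return visible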
The account or group owner would determine the location of the collection; this could be automated or selected by the user. It
might also determine where the S3 bucket is located and where the primary storage for the images is. If users are not in the same
region, files may need to be uploaded to the region where the collection is located to improve indexing performance. Since this
only happens during the uploading of images, it may not have a major impact on overall performance. We can always use tools like
CloudFront to cache images that will be sent to web pages and other apps for display.
While Amazon Rekognition offers a lot of capabilities, it is important to understand the details as we plan to integrate these
services into our application. A single collection has some search and matching advantages, but also has some limits that could
impact scalability. Going with multiple collections allows some segmentation of the data and can help with scalability. We hope
that this has provided some insight into Amazon Rekognition and some important details that you can use when building your
applications.