FINDINGPEOPLEINNEWS
athesis
submittedtothedepartmentofcomputerengineering
andtheinstituteofengineeringandscience
ofbilkentuniversity
inpartialfulfillmentoftherequirements
forthedegreeofmasterofscience
ByDeryaOzkanJuly,2007
IcertifythatIhavereadthisthesisandthatinmyopinionitisfullyadequate,inscopeandinquality,asathesisforthedegreeofMasterofScience.
Asst.Prof.Dr.PinarDuyguluS¸ahin(Advisor)
IcertifythatIhavereadthisthesisandthatinmyopinionitisfullyadequate,inscopeandinquality,asathesisforthedegreeofMasterofScience.
Asst.Prof.Dr.SelimAksoy
IcertifythatIhavereadthisthesisandthatinmyopinionitisfullyadequate,inscopeandinquality,asathesisforthedegreeofMasterofScience.
Prof.Dr.
ApprovedfortheInstituteofEngineeringandScience:
Prof.Dr.MehmetB.Baray
DirectoroftheInstitute
ii
ABSTRACT
AGRAPHBASEDAPPROACHFORFINDING
PEOPLEINNEWS
DeryaOzkan
M.S.inComputerEngineering
Supervisor:Asst.Prof.Dr.PinarDuyguluS¸ahin
July,2007
Alongwiththerecentadvancesintechnology,largequantitiesofmulti-modaldatahasarisenandbecameprevalent.Hence,effectiveandefficientretrieval,organizationandanalysisofsuchdataconstitutesabigchallenge.Bothnewsphotographsonthewebandnewsvideosontelevisionformsthiskindofdatabycoveringrichsourcesofinformation.Peoplearemostlythemainsubjectofthenews;therefore,queriesrelatedtoaspecificpersonareoftendesired.
Inthisstudy,weproposeagraphbasedmethodtoimprovetheperformanceofpersonqueriesinlargenewsvideoandphotographcollections.Weexploitthemulti-modalstructureofthedatabyassociatingtextandfaceinformation.Ontheassumptionthataperson’sfaceislikelytoappearwhenhis/hernameismentionedinthenews,onlythefacesassociatedwiththequerynameareselectedfirsttolimitthesearchspaceforaqueryname.Then,weconstructasimilaritygraphofthefacesinthislimitedsearchspace,wherenodescorrespondtothefacesandedgescorrespondtothesimilaritybetweenthefaces.Amongthesefaces,therecouldbemanyfacescorrespondingtothequeriedpersonindifferentconditions,posesandtimes.Therecouldalsobeotherfacescorrespondingtootherpeopleinthenewsorsomenon-faceimagesduetotheerrorsinthefacedetectionmethodused.However,inmostcases,thenumberofcorrespondingfacesofthequeriedpersonwillbelarge,andthesefaceswillbemoresimilartoeachotherthantoothers.Tothisend,theproblemistransformedintoagraphproblem,inwhichweseektofindthedensestcomponentofthegraph.Thismostsimilarsubset(densestcomponent)islikelytocorrespondtothefacesofthequeryname.Finally,theresultofthegraphalgorithmisusedasamodelforfurtherrecognitionwhennewfacesareencountered.Inthepaper,ithasbeen
iii
iv
shownthatthegraphapproachcanalsobeusedfordetectingthefacesoftheanchorpersonswithoutanysupervision.
Theexperimentsareperformedontwodifferentdatasets:newsphotographsandnewsvideos.ThefirstsetconsistsofthousandsofnewsphotographsfromYahoo!newswebsite.Thesecondsetis229broadcastnewsvideosprovidedbyNISTforTRECVID2004.Imagesfromthebothsetsaretakeninreallifeconditionsand,therefore,havealargevarietyofposes,illuminationsandexpres-sions.Theresultsshowthatproposedmethodoutperformsthetextonlybasedmethodsandprovidescuesforrecognitionoffacesonthelargescale.
Keywords:Facerecognition,faceretrieval,SIFTfeatures.
¨OZET
HABERLERDEKIONEMLIYUZLER:ONEMLI
KISILERVEONEMLIYUZNOKTALARI
DeryaOzkan
BilgisayarM¨uhendisli˘gi,,Y¨uksekLisansTezY¨oneticisi:Yard.Do¸c.Dr.PinarDuyguluS¸ahin
Temmuz,2007
Bualmada,haberfotoraflarndanoluangeniverikmelerindekiilerinsorgulan-masnsalayanbiryntemsunulmutur.Yntemisimveyzlerinilikilendirilmesinedayanmaktadr.Haberbaslndakiininismigeiyorisefotoraftadaokiininyznnbulunacavarsaymyla,ilkolaraksorgulananisimileilikilendirilmi,fotoraflardakitmyzlerseilir.Buyzlerarasndasorgukiisineaitfarklkoul,pozvezaman-lardaekilmi,pekokresminyannda,haberdeismigeenbakakiilereaityzleryadakullanlanyzbulmayntemininhatasndankaynaklananyzolmayanresimlerdebulunabilir.Yinede,ouzaman,sorgukiisineaitresimlerdahaokolup,bures-imlerbirbirinedierlerineolduundandahaokbenzeyeceklerdir.Bunedenle,yzlerarasndakibenzerliklerizgeselolarakbetimlendiinde,birbirineenokbenzeyenyzlerbuizgedeenyounbileenolacaktr.Bualmada,sorguismiyleilikilendirilmi,yzlerarasndabirbirineenokbenzeyenaltkmeyibulan,izgeyedayalbiryntemsunulmaktadr.
Anahtars¨ozc¨ukler:Y¨uztanıma,y¨uzsorgulama,SIFT.
v
Acknowledgement
IwouldfirstliketoexpressmygratitudetomyadvisorPinarDuyguluS¸ahinforherguidanceandsupportthroughoutmystudies.Besideshervaluablecommentsandteaching,shewasmymainsourceofmoralesupportbyhighlymotivatingmeduringmygraduatestudy.Iamhonouredtobeherfirstmastersstudent.IamverythankfultoSelimAksoyforhissuggestionsandvaluablecomments.IwaspleasedtobeapartoftheRETINAteam,andhavingsuchanicefriendshipwiththegroupmembers.ThetwoyearswiththeminEA522willbeaverypleasantmemoryforme.
¨ITAK˙ThisresearchispartiallysupportedbyTUBCareergrantnumber104E065andgrantnumber104E077.
vi
Contents
1Introduction1.1Motivation...............................1.2SummaryofContribution......................1.3
OrganizationoftheThesis......................
2Background2.1OnIntegrationofNamesandFaces.................2.2
OnFaceRecognition.........................2.2.1HolisticMethods.......................2.2.2LocalMethods........................2.2.3HybridMethods........................
2.3OntheUseofInterestPoints....................2.4
OntheUseofGraphTheoreticalMethodsinComputerVision..
3GraphBasedPersonFindingApproach3.1
Overview................................
vii
115811111415161718192121
CONTENTS3.2IntegratingNamesandFaces.....................3.3
ConstructingSimilarityGraphofFaces...............3.3.1GeometricalConstraint....................3.3.2UniqueMatchConstraints..................3.3.3
SimilarityGraphConstruction................
3.4GreedyGraphAlgorithmforFindingtheDensestComponent..3.5AnchorpersonDetectionandRemovalforNewsVideos......3.6
DynamicFaceRecognition......................3.6.1DegreeModeling.......................3.6.2
DistanceModeling......................
4Experiments4.1
DataSets...............................4.1.1NewsPhotographs......................4.1.2NewsVideos..........................
4.2EvaluationCriteria..........................4.3
ExperimentalResultsonNewsPhotographs............4.3.1MatchingPoints........................4.3.2GraphApproach.......................4.3.3
OnlineRecognition......................
4.4ExperimentalResultsonNewsVideos................
viii
22242627282931323233
36363637383939394040
CONTENTS4.4.1IntegratingFacesandNames................4.4.2AnchorDetection.......................4.4.3GraphApproach.......................
4.5AMethodforFindingtheGraphThresholdAutomaticaly....4.6
PerformanceAnalysis.........................
5Comparison5.1BaselineMethod...........................5.2
FeatureSelectionandSimilarityMatrixConstruction.......5.2.1FindingTrueMatchingPoints................
5.3
ExtractingSimilarGroupofFaces..................5.3.1k-nnApproach........................5.3.2
One-classClassification....................
6ConclusionsandFutureWork6.1Conclusions..............................6.2
FutureWork..............................
ADifferentFormsofNamesinNewsPhotographsix
4042424344
53535454555555
636365
72
ListofFigures
1.1Sampledetectedfacesinnewsphotographs[8],forwhichthenamePresidentGeorgeW.Bushappearsintheassociatedcaption....
1.2Samplenewsphotographsandtheirassociatedcaptions.Therecanbemorethanonefaceonthephotographormorethanonenameinthecaption..........................
1.3Sampleshotsfromnewsvideosandthespeechtranscripttextsalignedwiththoseshots........................
1.4Key-framesfromtwodifferentvideos.Thenumbersbeloweachimageshowthedistancetoshot,inwhichthename’Clinton’ismentioned.Notethatinbothcases,Clintondoesnotappearvi-suallyintheshotinwhichhisnameismentionedbutappearsinpreceding(upimage)orsucceedingshots(bottomimage).....
1.5Samplefacesfromnewsphotographs[8](onthetop),andnewsvideos(onthebottom).
.......................
x
2
3
4
5
8
LISTOFFIGURES1.6Thefourstepsofouroverallapproach.Step1:Limitthesearchspaceofaquerypersontotheimagesthatareassociatedwiththequeryname.Step2:Constructasimilaritygraphoffacesinthissearchspace.Step3:Findthelargestdensestcomponentofthegraphcorrespondingtothefacesofthequeryperson.Step4:Usetheresultofthepreviousapproachasamodelinrecognizingnewfaces..................................
2.1Themainstepsinthefacerecognitionprocess(takenfrom[54])..3.1
Thefirstimageontheleftshowsallthefeaturepointsandtheirmatchesbasedontheminimumdistance.Thesecondimageontherightshowsthematchesthatareassignedastrueaftertheapplicationofgeometricalconstraints.................
3.2ForapairoffacesAandB,letA1andA2betwopointsonA;andB1isapointonBwiththearrowsshowingthematchesandtheirdirection.OntheleftisamultipleassignmentwherebothpointsA1andA2onAmatchB1onB.Insuchacase,thematchbetweenA2andB2iseliminated.OntherightisaonewaymatchwhereB1isamatchforA1,whereasB1matchesanotherpointA2onA.ThematchofA1toB1iseliminated.ThematchofB1toA2remainsthesameifB1isalsoamatchforA2;otherwiseitiseliminated................................
3.3Anexampleforuniquematchconstraint.Matchesfromthelefttotherightimageareshownbyred,dashedlines,whereasmatchesfromrighttoleftareshownbyyellowlines.Theleftimageshowsthematchesassignedafterapplyinggeometricalconstraints,butbeforeapplyingtheuniquematchconstraints.Therightimageshowstheremainingmatchesafterapplyingtheuniquematchcon-straints.................................
xi
1015
27
28
28
LISTOFFIGURES3.4Examplesformatchingpoints.Notethat,evenforfaceswithdif-ferentsize,poseorexpressionsthemethodsuccessfullyfindsthecorrespondingpoints..........................
3.5Dissimilaritymatrixfor200imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,97oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the97trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.............
3.6Exampleofconvertingaweightedgraphtoabinarygraph.Nodesandtheirdistancesaregiveninthefirstimage.Theresultinggraphafterapplying0.65astheproximitythresholdisgiveninthesecondimage.Boldedgesaretheedgesthatremainafterconversion.Thefinaldensestcomponentofthisgraphiscircledinthelastimage.
3.7Anchorpersonprobleminnewsvideos.Afterintegratingnamesandfacesforaqueryperson,itishighlyprobablethateitheroneoftheanchorpersonswillbereturnedasthedensestcomponentofthelimitedsearchspace.Thefirstfigureontheleftcorrespondstoasample..............................
4.1Namesof23peopleareusedintheexperiments.Thetotalnumberoffacesassociatedwithanameisrepresentedbyredbarsandnumberofcorrectfacesbygreenbars.
...............
4.2Recall-precisioncurveof23peopleinthenewsphotographsdataset.Precisionandrecallvalueschangedependingonthethreshold.Weusedthresholdvaluesbetween0.55and0.65toshowtheeffect.Thethresholdusedintherestofourexperimentsismarkedwithred.
..................................
xii
29
30
34
35
45
46
LISTOFFIGURES4.3Recallandprecisionvaluesfor23peopleforgraphthresholdvalueof0.575.Bluebarsrepresentrecallandredbarsrepresentprecisionvaluesthatareachievedwiththeproposedmethod.Greenbarsareprecisionvaluesforthebaselinemethod,whichdoesnotusethevisualinformationandretrievesthefaceswhennameappearsinthecaption.
............................
4.4Sampleimagesretrieved(ontheleft)andsampleimagesnotre-trieved(ontheright)forthequerynames:GeorgeBush(top),ColinPowell(middle),HansBlix(bottom)..............
4.5Recallandprecisonvaluesoftheheld-outset(ontheleft)andtheconstructedmodel(ontheright)foronlinerecognitionwithdegreemodelingmethod............................
4.6Recallandprecisonvaluesoftheheld-outset(ontheleft)andtheconstructedmodel(ontheright)foronlinerecognitionwithdistancemodelingmethod.......................
4.7Recallvaluesoftheheld-outsetandtheconstructedmodelforeachpersonforK=10........................
4.8Precisonvaluesoftheheld-outsetandtheconstructedmodelforeachpersonforK=10........................
4.9ThefigureshowsfrequencyofBillClinton’svisualappearancewithrespecttothedistancetotheshotinwhichhisnameismentioned.Left:whenthewholedatasetisconsidered,right:whenthefacesappearingaroundthenamewithintheprecedingandthefollowingtenshotsareconsidered.OverthewholedatasetClintonhas240facesand213ofthemappearintheselectedrange.........
4.10TherelativepositionofthefacestothenameforBenjaminNe-tanyahu,SamDonaldson,SaddamHussein,andBorisYeltsinre-spectively................................
xiii
47
48
48
49
49
50
50
51
LISTOFFIGURES4.11Detectedanchorsfor6differentvideos................4.12Recall-precisionvaluesforrandomlyselected10newsvideosfor
thresholdvaluesvaryingbetween0.55and0.65...........4.13Sampleimagesretrievedforfivepersonqueriesinexperiments.
EachrowcorrespondstosamplesforClinton,Netanyahu,SamDonaldson,Saddam,Yeltsinqueriesrespectively...........4.14Precisionsvaluesachievedforfivepeopleusedinourtests.....5.1RecognitionratesoftheeigenfacemethodfordifferentKvalues.5.2
Examplesformatchingpoints.Firstcolumnformatchesfoundbyusingtheoriginalmatchingmetricofsift.Secondcolumnformatchesfoundbyapplyingtheproposedmethod..........5.3
Similaritymatrixfor201imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,98oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the98trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.............5.4
Recall-precisioncurveof23peopleinthetestset.Precisionandrecallvalueschangedependingonthethreshold...........5.5
Examplesformatchingpointsassignedbythehomographymatrixfoundafterransac...........................5.6
Recognitionratesofthek-nnapproachfordifferentKandkval-ues.Recallvaluesontheleftandprecisionontheright.Kisthepercentageoftheimagesusedfortesting,andkisthenumberofneighborsink-nn.
..........................
xiv
51
51
525254
59
60
60
61
61
LISTOFFIGURES5.7Similaritymatrixfor201imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,98oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the98trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.............
5.8Recall-precisioncurveof23peopleinthetestset.Precisionandrecallvalueschangedependingonthethreshold...........
xv
62
62
ListofTables
4.1RecognitionratesofdegreemodelingfordifferentKvalues.(Kpercentoftheimagesareusedastheheld-outset............
41
4.2RecognitionratesofdistancemodelingfordifferentKvalues.(Kpercentoftheimagesareusedastheheld-outset.........
41
4.3Numberoffacescorrespondingtothequerynameovertotalnum-beroffacesintherange[-10,10]and[-1,2]..............
41
4.4Numbersinthetableindicatethenumberofcorrectimagesre-trieved/totalnumberofimagesretrievedforthequeryname...
43
5.15.2
RecognitionratesoftheeigenfacemethodforfordifferentKvalues.54RecognitionratesofsupervisedmethodfordifferentKvalues.(Kisthepercentageoftheimagesthatareusedintesting;kisthenumberofneighboursused......................
56
5.3Recall-precisionratesoftwooneclassclassificationmethods:w1(nearestneighbordatadescriptionmethod)andw2(k-nearestneighbordatadescriptionmethod)(appliedontfidf’s).......
58
xvi
Chapter1Introduction
1.1
Motivation
Alongwiththerecentadvancesintechnology,largequantitiesofmulti-modaldatahasbecomemoreavailableandwidespread.Withitsemergence,effectiveandefficientretrieval,organizationandanalysisofsuchkindofbigdatahasbecomeachallengingproblemandarousedinterest.Newsphotographsonthewebandnewsvideosontelevisionaretwoexamplesofthistypeofdata.Theyacquirerichsourcesofinformationwherein.Hence,accessingthemisespeciallyimportantanditsimportancehasalsobeenacknowledgedbyNISTinTRECVIDvideoretrievalevaluation[1].
Peopleareusuallythemainsubjectofthestoriesinthenews.Therefore,queriesrelatedtoaspecificpersonisoftenadesiredtask.Ingeneralthesepeoplevisuallyappeararoundwhenhis/hernameismentionedinthenews.Onthisaccount,thecommonwaytoretrieveinformationrelatedtoapersonistosearchusinghis/hername.However,suchanapproachislikelytoyieldincorrectresultsduetothefollowingreasons:theremaybeotherpeopleinthesamestorythatalsoappearvisuallybesidesthequeryname,andthequerymaynotappearintheassociatedpictureevenifhis/hernameismentionedinthestory.Sotogetthetrueimagesofthequeryperson,afacedetectionshouldbeappliedtoextract
1
CHAPTER1.INTRODUCTION2
thedesiredfacesoftheperson.But,thismayalsocausesomenon-faceimagestoariseinthesearchresults,duetotheerrorsoffacedetectionalgorithmused.Figure1.1exemplifiessomeofthedetectedfacesinnewsphotographsthatareassociatedwiththequerynamePresidentGeorgeW.Bush.Evenifthereappearsthefacesofthequeryperson,therealsoappearsthefacesofotherpeopleandsomenon-faceimagesthatarealsoassociatedwiththesamename.
Figure1.1:Sampledetectedfacesinnewsphotographs[8],forwhichthenamePresidentGeorgeW.Bushappearsintheassociatedcaption.
Newsphotographsappearwithanassociatedcaptionontheweb(seeFig-ure1.2).Therecouldbemorethanonefaceonthephotographormorethanonenameinthecaption.Thence,itisindeterminatethatwhichnamegoeswithwhichface.Similarlyinnewsvideos,thereisaspeechtranscriptalignedwitheachshot(seeFigure1.3).Theface-nameassociationproblemisalsoencoun-teredfortheseimages.Moreover,innewsvideosthereisusuallyatimeshiftbetweentheappearanceofanameandtheappearanceofthefacebelongingtothatname.Therefore,usingasingleshottemporallyalignedwiththespeechmayyieldincorrectresults.Anotherproblem,whichismoreimportant,isthatthemostfrequentfaceusuallycorrespondstotheanchorpersonorreporterratherthanthefaceofthequeryname(seeFigure1.4).Onthesegrounds,inordertoretrievethecorrectimagesofaparticularperson,visualinformationmustbeincorporatedandthefacesofthepersonneedtoberecognized.
Ontheotherhand,facerecognitionisalongstandingandadifficultproblem(see[23,54,24,49,48]forrecentsurveys).Althoughmanydifferentapproacheshavebeenproposedforrecognizingfaces,mostofthecurrentfacerecognitionmethodsareevaluatedonlyincontrolledenvironmentsandforlimiteddatasets.However,forlargerandmorerealisticdatasetslikenewsphotographsand/orvideos,facerecognitionisstilldifficultanderror-proneduetothenoisyandcomplicatednatureofthesesets.Thesesetscontainlargevariationsinpose,
CHAPTER1.INTRODUCTION3
Figure1.2:Samplenewsphotographsandtheirassociatedcaptions.Therecanbemorethanonefaceonthephotographormorethanonenameinthecaption.illuminationandfacialexpression,whichcausethefacerecognitionproblemevenmoredifficult.
Recently,ithasbeenshownthattheperformanceofpersonqueriescanbeimprovedbyintegratingnameandfaceinformation[17,19,27,4,13,47].Whentextinformationisprovidedtogetherwiththevisualappearance,thefacerecog-nitionproblemcanbesimplifiedandtransformedintotheproblemoffindingassociationsbetweennamesandfaces[44,8,9].
Inthisstudy,weproposeamethodforimprovingtheperformanceofpersonqueriesinnewsdatasetsbyexploitingfrombothtextandvisualinformation.Asearchisstartedfirstbylookingforthenameofaquerypersoninthecaptionorinthespeechtranscripttext.Basedonthissearch,welimitoursearchspaceforthequerypersontotheimages/framesthatareassociatedwiththequeryname.Although,theremaybefacescorrespondingtootherpeopleinthestory,orsomenon-faceimagesduetothefacedetectionalgorithmused,thefacesofthequerypersonarelikelytobethemostfrequentlyappearingonesthananyotherpersoninitslimitedsearchspace.Eveniftheexpressionsorposesvary,differentappearancesofthefaceofthesamepersontendtobemoresimilartoeachotherthantothefacesofothers.Inotherwords,facesofthequeryperson
CHAPTER1.INTRODUCTION4
Figure1.3:Sampleshotsfromnewsvideosandthespeechtranscripttextsalignedwiththoseshots.
formsthelargestgroupofsimilarfacesinitslimitedsearchspace.
Inthiscontext,theproblemoffindingthefacesofthequerypersoninthelimitedsearchspaceisdualtothegraphproblemoffindingthelargestdensestcomponent.Tothisend,wefirstfindthesimilaritiesbetweenfacesinthesearchspaceandconstructasimilaritygraphoffaces,wherenodescorrespondtofacesandedgescorrespondtofacesimilarities.Then,wetransformourproblemtoagraphproblemoffindingthelargestdensestcomponentofthegraph.Thislargestdensestcomponentreferstothebiggestsetofhighlyconnectednodesinthegraph;thuslargestgroupofsimilarfacescorrespondingtothefacesofthequeryperson.Whenwefindthefacesofthequerypersonwiththeproposedapproach,thereturnedsolutionisalsousedasamodelforrecognizingnewfaces.Differentfromthemostofcurrentfacerecognitionsystems,wefindsimilaritybetweenthetwofacesbasedontheSIFTfeaturesextractedfromthosefaces.Theproposedmethodexploitsfromthescaleandilluminationinvariancechar-acteristicsofSIFTfeatures.Besides,italsoovercomestheproblemsinholisticappearanceorlocalfacialfeaturebasedfacerecognitionmodelsbybeingless
CHAPTER1.INTRODUCTION5
Figure1.4:Key-framesfromtwodifferentvideos.Thenumbersbeloweachimageshowthedistancetoshot,inwhichthename’Clinton’ismentioned.Notethatinbothcases,Clintondoesnotappearvisuallyintheshotinwhichhisnameismentionedbutappearsinpreceding(upimage)orsucceedingshots(bottomimage).
sensitivetovariationsinnoise,occlusion,andillumination;andbyworkingalsointheabsenceofanyofthefacialfeatures.
Theproposedmethodisnotasolutiontothegeneralfacerecognitionprob-lem.Rather,itisamethodtoincreasetheretrievalperformanceofthepersonqueriesinthelargedatasetswherenamesandfacesappeartogetherandwheretraditionalfacerecognitionsystemscannotbeused.Itdoesnotrequireatrainingstepforaspecificpersonandtherefore,thereisnolimitonthenumberofpeoplequeried.
Inthefollowingtwosection,webrieflydescribetheoverallapproach,andthenpresenttheorganizationofthesis.
1.2SummaryofContribution
Ourpersonfindingapproachisbasedonfirstlimitingthesearchspaceofaquerypersonbyusingtextinformationandthensolvingtheproblembytransforming
CHAPTER1.INTRODUCTION6
ittoagraphproblem.Afterfindingthefacesofthequeryperson,wecanusetheresultasamodelforrecognizingnewfaces.Theoverallapproachconsistsoffoursteps:constructingalimitedsearchspaceforaquerypersonbyusingtextandnameinformation,definingsimilaritiesbetweenfacesinthissearchspacetoformasimilaritygraphoffaces,findingthedensestcomponentofthisgraphwhichcorrespondstothefacesofthequeryperson,andfinallyusingtheresultasamodelfurtherinrecognizingnewfaces.ThemainstepsoftheproposedmethodaregiveninFigure1.6.
Inthefirststep,weusethetextinformationtolimitoursearchspaceforaqueryname.Itconsistsofsearchingforthequerynameinthecaptionorinthespeechtranscript,andchoosingtheimages/framesthatareassociatedwiththequeryname.Asstatedearlier,thereisusuallyatimeshiftbetweentheappearanceofanameandtheappearanceofthefacebelongingtothatnameinnewsvideos.Thisproblemishandledbytakingawindowaroundthename.Thesolutionis,ratherthansearchingthefacesonlyontheshotsincludingthenameoftheperson,alsotoincludetheprecedingandsucceedingshots.
Weassumethatinthislimitedsearchspace,facesofthequerypersonappearmorethanfacesofanyotherperson.Also,facesofthesamepersontendtobemoresimilartoeachotherthantothefacesofothers.Therefore,thefacesofthequerypersonformsthelargestgroupofsimilarfacesinthelimitedsearchspace.Hence,inthesecondstep,weassignasimilaritymeasurebetweeneachpairoffacesinthelimitedsearchspaceandrepresentthesesimilaritiesamongfacesinagraphstructure.Inthissimilaritygraph,nodescorrespondtothefacesandtheedgeweightscorrespondtotheassignedsimilarities.
Inthisstudy,thesimilaritybetweenfacesisrepresentedbyusinginterestpointsextractedfromtheface.WeuseLowe’sSIFTfeatures,whichhavebeenshowntobesuccessfulinrecognizingobjects[35,39,26]andfaces[28].DifferentfromtheoriginalmatchingmetricofSIFT,weassigntheinterestpointshav-ingtheminimumdistanceastheinitialmatchingpoints.Then,weapplytwoconstraintonthesematchedpoints:geometricalconstraintanduniquematchconstraint,toeliminatethefalsematchesandselectthebestmatchingpoints.
CHAPTER1.INTRODUCTION7
Finally,theaveragedistanceofmatchingpointsbetweentwofacesisusedtoassignthedistancebetweenthesetwofaces.UsingtheSIFTfeatures,webothexploitfromthescaleandilluminationinvariantpropertyofthesefeaturesandareabletoassignadistancemetricbetweentwofacesevenintheabsenceofanyoffacialfeatures.
Intheconstructedsimilaritygraph,thenodescorrespondingtothefacesofthequerypersonwillbemorestronglyconnectedtoeachotherthantoothernodescorrespondingtootherfacesinthegraph.Moreover,thequerypersonistheonewhosefaceusuallyappearsthemostfrequentlyinitssearchspace.Consideringallthese,theproblemistransformedintoagraphprobleminthethirdstepandsolvedbyagreedygraphalgorithmthatreturnsthelargestdensestcomponentofthegraph.Thislargestdensestcomponentreferstothesetofhighlyconnectednodesinthegraph;thusthesetofmostsimilarfacescorrespondingtothefacesofthequeryperson.
Thefinalstepinvolvesusingtheresultofthegraphalgorithmasamodeltorecognizenewfaces.Withtheproposedgraphalgorithm,wecanautomaticallyobtainthelabeledtrainingsetforlearningthemodelinrecognizingnewfaces.Weproposetwodifferenttechniquestousetheresultbasedon:degree,distance,andmatchnumbermodeling.Inallthosetechniques,themodelistrainedbyusingthereturnedgraphasthetrainingset.
Themethodhasbeenappliedontwodifferentnewsdatasets:newspho-tographsandnewsvideos.Thefirstdataset,namelythenewsphotographscollectedbyBergetal.[8],isquitedifferentfrommostoftheexistingdatasets(seeFigure1.5).Itconsistsoflargenumberofphotographswithassociatedcap-tionscollectedfromYahoo!NewsontheWeb.Photographsaretakeninreallifeconditionsratherthaninrestrictedandcontrolledenvironments.Therefore,theyrepresentalargevarietyofposes,illuminationsandexpressions.Theyaretakenbothindoorsandoutdoors.Thelargevarietyofenvironmentalconditions,occlusions,clothing,andagesmakethedatasetevenmoredifficulttoberecog-nized.
TheseconddatasetisbroadcastnewsvideosprovidedbyNISTforTRECVID
CHAPTER1.INTRODUCTION8
Figure1.5:Samplefacesfromnewsphotographs[8](onthetop),andnewsvideos(onthebottom).
2004videoretrievalevaluation[1].Duetothehighernoiselevelandlowerres-olution,newsvideosisaharderdatasettoworkwith.Inordertohandletheproblemduetoanchorpersonfaces,weaddamechanismtodetectandremovetheanchorpeople.Theanchorpersonistheonewhoappearsthemostfrequentlyineachnewsvideo.Hence,weapplythedensestcomponentalgorithmtoeachvideoseperatelyandautomaticallydetecttheanchorperson,sincethefacesoftheanchorpersonwillcorrespondtothebiggestdensestcomponentofthegraphformedbythefacesinthatvideo.
1.3OrganizationoftheThesis
Therestofthepaperisorganizedasthefollowing:
InChapter2,wesummarizerelatedpreviousworksonthesimilarproblems.inChapter3,wepresentadetailedexplanationoftheoverallgraphapproach:as-sociationofnamesandfacestolimitthesearchspaceforaqueryname,definitionofthesimilaritymeasuretoconstructthesimilaritygraph,findingthedensestcomponentofthegraph,andusingtheobtainedresultinrecognizingnewfaces.Themethodisalsoappliedonnewsvideosfordetectingtheanchorpeopleauto-matically.InChapter4,experimentalresultsaregivenalongwiththedescriptionofthedatasetsused.InChapter5,theproposedmethodiscomparedwithsomeotherpossiblemethods.Finally,wesummarizetheoverallworkinChapter6and
CHAPTER1.INTRODUCTION9
concludewithfutureresearch.
CHAPTER1.INTRODUCTION10
Figure1.6:Thefourstepsofouroverallapproach.Step1:Limitthesearchspaceofaquerypersontotheimagesthatareassociatedwiththequeryname.Step2:Constructasimilaritygraphoffacesinthissearchspace.Step3:Findthelargestdensestcomponentofthegraphcorrespondingtothefacesofthequeryperson.Step4:Usetheresultofthepreviousapproachasamodelinrecognizingnewfaces
Chapter2Background
Inthisthesis,weproposemethodforfindingandrecognizingfacesinnewsbyintegratingnamesandfaceswithagraphbasedapproach..Inthefollowingsections,wefirstdiscusssomeofthepreviousworkontheuseofnameandtextinformation.Then,wedefinetheproblemoffacerecognitionandimportanceofinterestpointsinliteratureonsolvingtherecognitionproblem.Laterinthelastsection,webrieflytalkontheuseofgraphtheoreticalmethodsincomputervision.
2.1OnIntegrationofNamesandFaces
Newsvideoandnewsphotographcollectionspossessdifferenttypesofresources,suchastext,speech,andvisuality.Recently,ithasbeenshownthateffectiveandefficientaccessingandutilizationofsuchmulti-modaldatacanbesimplifiedbyintegratingthedifferenttypesofresources.Inmanyofthepreviousworksinliterature,namesandfacesareassociatedforbetterqueryresults.
In[52],Yangetal.showedthatthecombinationoftextandfaceinformationallowsbetterretrievalperformancesinnewsvideos.Inthatwork,theyavoidthedifficultiesoffacerecognitionbyusingthetextinformationforselecting
11
CHAPTER2.BACKGROUND12
someshotsastheinitialresultsandapplyingfacerecognitiononthoseshots.Bythisway,theyaimtoreducethenumberoffacestoberecognized,henceimprovetheaccuracy.Thetimingbetweennamesandappearancesofpeopleismodeledbypropagatingthesimilarityscoresfromtheshotscontainingthequeryperson’snametotheneighboringshotsinawindow.TheEigenfacealgorithmisusedforfacerecognition.Inthepaper,ananchordetectorhasalsobeenbuiltbycombiningthreeinformationresources:colorhistogramfromimagedata,speakerIDfromaudiodata,faceinfofromfacedetection.TheyuseFisher’sLinearDiscrimant(FLD)toselectthedistinguishingfeaturesforeachsourceofdata.Then,theyunitetheselectedfeaturesintoanewfeaturevectortobeusedinclassification.Uponasimilarapproach,[16]usesthetextandimagefeaturestogethertoiterativelynarrowthesearchforbrowsingandretrievalofwebdocuments.
[14]alsounifiesthetextualandvisualstatisticsinasingle
indexingvectorforretrievalofwebdocuments.
Bergetal.[9]proposedamethodforassociatingthefacesinthenewspho-tographswithasetofnamesextractedfromthecaptions.Inthatpaper,theyfirstperformkernelprincipalcomponentanalysis(kPCA)toreducethedimen-sionalityoftheimagedataandlineardiscriminantanalysis(LDA)toprojectthedataintoitsdiscriminantcoordinates.EachimageisthenrepresentedwithbothavectorgainedafterthekPCAandLDAprocesses,andasetofassociatednamesextractedfromthecaption.Amodifiedversionofk-meansclusteringisusedtoassignalabelforeachimage.Clustersthatarefarawayfromthemeanareremovedfromthedata,anddiscriminantcoordinatesarere-estimated.Finally,clusterswhichshowhighfacialsimilarityaremerged.Theresultsgiveninthiswork,arethenimprovedin[8]byanalyzinglanguagemorecarefully.Inthelatterwork,theyalsolearnanaturallanguageclassifierthatcanbeusedtodeterminewhoispicturedfromtextalone.
[19]integratesnamesandfacesusingspeechtranscripts,andimprovestheretrievalperformanceofpersonqueriesonTRECVID2004.Itfirstsearchesoverthespeechtranscripttextandselectsthekey-framesthatarealignedwiththequeryname.Then,itappliesSchneiderman-Kanade’sfacedetectionalgorithmoneachkey-frame.However,manyfalsepositivesareproducedwithsuchan
CHAPTER2.BACKGROUND13
approach.Hence,skincolorinformationisusedtoeliminatethefalsepositives.TheprobabilityofapixelbeingaskinpixelismodeledusingaGaussianproba-bilitydistributiononHSVcolorspace.Then,threefeatures(colorfeature,PCA,ICA)areextractedfromthefacestobeuseingroupingsimilarfaces.Whilegrouping,itusesG-meansclusteringandstartsfromsmallnumberofclusters,K,andincreasesKiterativelyifsomeclustersfailtheGaussianitytest(Kolmogorov-Simirovtest)[25].Aftergrouping,eachclusterisrepresentedbyonerepresen-tativeface:theonethatistheclosesttothemeanofitscluster.Hence,theproposedmethodincreasesthespeedofthesystembyreducingthenumberofimagesprovidedtotheuser.Anchorfilteringisalsoexperimentedbyselectingtheanchorrepresentativesandremovingthemfromtherestofthecluster.In[2]amethodforautomaticallylabelingofthecharactersinTVorfilmbyusingbothvisualandtextualinformation.First,italignsthesubtitlesandthescriptfortaggingsubtitleswithspeakeridentity.Itdetectsfacesandthentrackstocomposefacetracks.Whileobtainingthefacetracks,thenumberofpointtrackswhichpassthroughfacesiscounted,andifthisnumberislargerelativetothenumberofpointtrackswhicharenotincommontobothfaces,amatchisdeclared.Torepresenttheappearanceofaface,ninefacialfeaturesarelocatedandtwolocalfeaturedescriptorsareused:siftandsimplepixel-wiseddescriptorformedbytakingthevectorofpixelsintheellipticalregionandnormalizingtoobtainlocalphotometricinvariance.Furthertousetheadditionalcues,aboundingboxispredictedforeachface,whichisexpectedtocontaintheclothingofthecorrespondingcharacter.Speakeraredetectedbyasimplelipmotiondetectionalgorithm.Then,eachunlabeledfacetrackisrepresentedbyasetoffaceandclothingdescriptors.Finally,aprobabilisticmodelisusedtoclassifythesetracksbasedontheweightedprobabilitiesofthefaceandclothappearancefeatures.
CHAPTER2.BACKGROUND14
2.2OnFaceRecognition
Facerecognitionisalong-standingandadifficultproblem,whichhasreceivedgreatattentionduetoitsdemandondifferentfieldslikecommercialandlawenforcementapplications.Theproblemhasarousedinterestinvarioussciencedomainssuchaspatternrecognition,computervision,imageanalysis,psychology,andneurosciences.Thereisadirectrelationamongthefindingsinanyofthosescience.Asstatedin[54],thereexistsstillmanyambiguousquestionsinvolvedintheprocessofhumanperceptionoffacesthatpsychologistsandneuroscientistsworkon,suchas:
•Isfacerecognitionauniqueanddifferentfromobjectrecognition?•Dowerecognizefacesasawholeorbyindividualfeatures?•Whichfeaturesaremoreimportantforfacerecognition?
•Ishumanfaceperceptioninvarianttochangesinview-point,lightingand/orexpressions?
Inthecontextofcomputervision,abroadstatementofthefacerecognitionproblemisdeclaredasthethis:Givenastilloravideoimage,identifyorverifyoneormorepersonsintheimagebyusingastoreddatabaseoffaces([54,49]).Therearethreemainstepsinvolvedintheconfigurationofagenericfacerecog-nitionsystem(seeFigure2.1):detectionofthefaceregionintheimage,featureextractionfromthefaceregion,recognition.
In[54]andin[48],thefacerecognitionmethodshavebeenputintooneofthethreecategories:holisticmethods,localmethodsorhybridmethods.Wegiveabriefsummaryofthethosethreeapproachesinthenextthreesubsections.
CHAPTER2.BACKGROUND15
Figure2.1:Themainstepsinthefacerecognitionprocess(takenfrom[54]).
2.2.1HolisticMethods
Holisticmethodsusetewholefaceastherawinputtothesystem.Inthosemethods,eachfaceimageisrepresentedwithavectorcomposedbyconcatenat-ingthegray-levelpixelvaluesoftheimage.Amongtheholisticmethods,themostdeclaredtechniquesusedarePrincipalComponentAnalysis(PCA),FisherDiscriminantAnalysis(FDA),LinearDiscriminantAnalysis(LDA),andInde-pendentComponentAnalysis(ICA).
ThemainideainPCAistoprojectthetrainingdataintoasub-dimensionalfeaturespace,inwhichbasisvectorscorrespondtothemaximumvariancedi-rectionintheoriginalimagespace.EachPCAbasisvectorwasreferredasan”eigenface”in[36,37]thatcanbedisplayedasasortofghostlyface.Intheclas-sificationstage,theinputfaceimageisfirstprojectedintothesubspacespannedbytheeigenfaces;whereeachfaceisrepresentedbyalinearcombinationoftheeigenfaces.Then,thenewfaceisclassifiedaccordingitspositioninthefacespacetothepositionsoftheknownindividuals.Laterinliterature,extensionsofthePrincipalComponentAnalysishavebeenproposedasin[53]byusingtwo-dimensionalPCAandin[50]byselectingdiscriminanteigenfacesforface
CHAPTER2.BACKGROUND16
recognition.
In[5],Bartlettetal.claimedthatICArepresentationsoffacesaresuperiortoPCA;henceICAcanperformbetterperformancesacrosssessionsandchangesinexpression.InICA,dataisprojectedontosomebasisthatarestatisticallyindependent.Italsoattemptstominimizessecond-orderandhigher-orderde-pendenciesintheinputdata.
Holisticmethodscanbeadvantageousinthecontextofcoveringtheglobalappearanceofthefaces.Oneotheredgeoftheholisticmethodsisthattheymaintainthediffusetextureandshapeinformation;hencecandifferentiatethefaces.However,representingeachfaceimagebyafeaturevectormakestheholis-ticmethodssensitivetovariationsinappearancecausedbyocclusion,changesinilluminationand/orexpressions.Toovercomethisproblem,localfeaturerepre-sentationshavebeendevelopedinliteraturethatarelesssensitivetochangesinappearance.
2.2.2LocalMethods
Theimportanceofhair,eyesandmouthinfacehumanperceptionhasbeenhighlightedinpsychologyandneurosciences([29,11]).Ithasalmostbeenclaimedin[11]thatnoseplaysanimportantroleinfacerecognition.Onthesegrounds,localmethodshavebeenpresentedthatrepresenteachfaceimagebyasetoflowdimensionalfeaturevector,whichusuallycorrespondtothefacialfeatureslikeeyes,mouth,andnose.
Mainly,therearetwoapproachesusedinlocalmethods:localfeature-basedmethodsandlocalappearance-basedmethods.Thegeometricrelationsofthefeaturesareconsideredinlocalfeature-basedmethods.Insomeprimitivestudiesofthistypes,onlythegeometricmeasures–suchasdistancebetweentheeyesorthesizeoftheeyes–havebeenconsidered[32,31].Then,in[38]Manjunathetal.proposedamethodthatpreservesboththelocalinformationandtheglobaltopologyoftheface.Themethodstoresboththelocalinformationandthe
CHAPTER2.BACKGROUND17
featureinformationofthedetectedfacialfeatures;andconstructsatopologicalgraphusingthesefeatures.Then,theclassificationstage,theproblemissolvedbyagraphmatchingscheme.
Evenifthemethodin[38]isadvantageoussinceitconsidersboththesimilarityofthelocalfeaturesandtheglobaltopology,itissensitivevariationsthatcausesanychangeinthistopology.Consequently,Ladesetal.[12]proposedElasticBunchGraphMatchingScheme(EBGM)thatisbasedonadeformabletopologygraph.Eventhoughthemethodresolvestheproblemcausedbythechangesinappearance,itcannotremovetheproblemcausedbytheocclusionofanyoffacialfeatures.
Motivatedbythefoundingsinpsychologythatasetofsimplelinescanchar-acterizethestructureofanobject,henceissufficienttorecognizeitsshape;afaceisrepresentedbytheFace-ARGin[42].Face-ARGconsistsofasetofnodesthatcorrespondstolinefeatures,andbinaryrelationsbetweenthem.UsingaFace-ARG,allthegeometricquantitiesandstructuralinformationofthefacecanbeencodedinanAttributedRelationalGraph(ARG)structure.Then,intheclassificationstage,partialARGmatchingisappliedontheconstructedFace-ARG’softhetestandreferencefaces.Theadvantageofthemethodoverrecentmethodsisthatitcanstillworkinpresenceofocclusionandexpressionchanges.Mostofthelocalfeature-basedmethodsaresensitivetoaccuratefeaturepointlocalization;whichisstillanunresolvedproblem.Hence,inlocalappearance-basedmethods,facialfeatureregionsaredetectedfirstandfeatures(suchasGaborwavelet[38]andHaarwavelet[34])areextractedfromtheseregionsintheimage.Differentclassifiersareappliedonthesefeaturesfinallyintheclassificationstep.
2.2.3HybridMethods
Itisstillaquestionifwerecognizefacesasawholeorbyindividualfeatures.Holisticmethodscovertheglobalfaceappearanceandcanmaintainthediffuse
CHAPTER2.BACKGROUND18
textureandshapeinformation.However,theyareverysensitivetothechangesinappearancethatmaybecausedbyocclusion,changesinilluminationand/orexpressions.Localmethodsislesssensitivetothosechanges,howeveraccuratelocalizationoffacialfeaturepointsstillremainsthatcanaffecttheperformanceofrecognition.Especiallyforimagesofsmallsize,facialfeaturedetectionisevenmoreproblematic;hencethelocalmethodscannotbeappliedinthatcase.Howtousethelocalfeaturestowithoutdisregardingtheglobalstructureisanotheropenproblem.Toovercomethoseproblemsandbenefitfromdifferentadvantagesofbothapproaches,hybridmethodscanbeused.Hybridmethodsaimtousebothholisticandlocalapproaches.However,howtousewhichfeaturesandwhichclassifiersinhybridmethodsisstillindeterminateandnotmuchinvestigatedinliteratureyet.
2.3OntheUseofInterestPoints
Recently,ithasbeenshownthatlocalimagefeaturesprovideagoodrepresenta-tionoftheimageforrecognitionandretrieval[6,41],whileglobalfeaturestendtobesensitivetoimagevariationsinviewpoint,poseandillumination.Amongthelocalfeatures,ScaleInvariantFeatureTransform(SIFT)techniquehasbeenshowntobesuccessfulinrecognizingobjects[35,39,26]andfaces[28,10].Thetechnique,whichhasfirstbeenpresentedin[35],providesdistinctiveinvariantfeaturesthatcanbeusedinreliablematchingbetweendifferentimagesofanob-jectorscene.Themostpowerfulaspectofthesefeaturesisthattheyareinvarianttoimagescaleandrotation,andpartiallyinvarianttochangeinilluminationand3Dcameraviewpoint.
In[10],applicationoftheSIFTapproachinthescopeoffaceauthenticationhasbeeninvested.Theredifferentmatchingschemesareproposedinthepaper:1.Minimumpairdistance,2.Matchingeyesandmouth,3.Matchingonaregulargrid.Thefirstschemeproposestocomputethedistancesbetweenallpairsofkeypointsintwoimagesandassigntheminimumdistanceasthematchingscore.Usingthethatfaceandmouthregionsprovidemostoftheinformationforface
CHAPTER2.BACKGROUND19
recognition,onlythefeaturesintheseregionsareusedinthesecondscheme.Finally,featurelocationsareconsideredinthethirdschemebydividingthefaceimageintogridsandmatchingthefeaturesofcorrespondinggrids.
Sivicetal.hasproposedapersonretrievalsystemin[28]thatrepresentseachfaceimageasacollectionofoverlappinglocalSIFTdescriptorsplacedatthefivefacialfeaturelocations(eyes,mouth,nose,andmiddlepointbetweentheeyes).Theyfirstusetrackingtoassociatefacesintoface-trackswithinashottoobtainmultipleexemplarsofthesameperson.Then,theyrepresenteachface-trackwithahistogramofprecomputedface-featureexemplars.Thishistogramisusedformatchingtheface-tracks;henceretrievingsetsoffacesacrossshots.
2.4OntheUseofGraphTheoreticalMethodsinComputerVision
Graphtheoreticalapproacheshaverecentlybeenusedincomputervisionprob-lemsduetotheirrepresentationalpowerandflexibility.Theyallowvisionprob-lemstobecastinastrongtheoreticalareaandaccesstothefulldepotofgraphalgorithmsdevelopedincomputerscienceandoperationsresearch.Themostcommongraphtheoreticalproblemsusedincomputervisionincludemaximumflow,minimumspanningtree,maximumclique,shortestpath,maximalcommonsubtree/subgraph,graphpartitioning,graphindexing,graphmatching,etc.[18].Graphpartitioningalgorithmsthathavebeenusedin[22,30,45],targetthetwotypicalapplicationsofcomputervision:imagesegmentationandperceptualgrouping.Theyaddresstheproblemofmakingcutsinaweightedgraphaccordingtoanappropriateminimumweightcriterion.Intheseworks,dataelements(i.e.imagepixelpoints)correspondtotheverticesinthegraph,andsimilaritybetweenanytwoverticescorrespondtotheedgeweightbetweenthosevertices.In[7],theproblemofcontentbasedimageretrievalhasbeensolvedbyagraphmatchingscheme.Themainideausedwastorepresentanimagequeryasanattributedrelationgraph,andselectasmallnumberofmodelimagegraphsthataresimilar
CHAPTER2.BACKGROUND20
tothequeryimagegraph.
[3]hasproposedagraphtheoreticclusteringmethodforimagegroupingandretrieval.Themotivationoftheworkwasthatanefficientretrievalalgorithmshouldbeabletoretrieveimagesthatarenotonlyclose(similar)tothequeryimagebutalsoclosetoeachother.However,mostoftheexistingfeatureextrac-tionalgorithmscannotalwaysmapvisuallysimilarimagestonearbylocationsinthefeaturespace.Henceintheretrievalstep,itisoftentoretrieveirrelevantimages(ornottoretrieverelevantimages)simplybecausetheyareclosetothequeryimage(orabitfarawayfromthequeryimage).Inthiscontext,theyretrievebestNmatchesforaqueryimage,andbestNmatchesofeachoftheretrievedimages.Agraphisconstructedwithallthoseretrievedimages,inwhichnodescorrespondstotheimagesandedgeweightscorrespondtothesimilarities.Then,theretrievalproblemistransformedintoandsolvedbytheproblemoffindingthesetofnodesinthegraph,thatarenotasdenseasmajorcliquesbutarecompactenoughwithinuserspecifiedthresholds.
Chapter3
GraphBasedPersonFindingApproach
3.1
Overview
Itislikelythatinthenews,apersonwillappearwhenhis/hernameismentioned.Followingupthiscue,westartsearchforapersonbyfirstlookingforthenameofthequerypersonandlimitoursearchspacetotheimagesthatareassociatedwiththatname.Althoughtheremightbethefacesofotherpeopleornon-faceimagesinthislimitedsearchspace,mostlyquerypersonwillbetheonethatappearsmorethananyotherindividual.Visually,facesofaparticularpersontendtobemoresimilartoeachotherthantofacesofotherpeople.Basedontheseassumptions,wetransformtheproblemtoagraphproblem,inwhichthenodescorrespondtofacesandtheedgescorrespondtothesimilaritiesbetweenfaces,andweseektofindthelargestdensestcomponentinthisgraph.Hence,ifwecandefineasimilaritymeasureamongthefacesinthelimitedsearchspaceandrepresentthesimilaritiesinagraphstructure,thentheproblemoffindingthemostsimilarfacescorrespondingtotheinstancesofqueryname’sfacecanbetackledbyfindingthedensestcomponentinthegraph.
Inthefollowingthreesections,wefirstexplainthestepsofthegraphbased
21
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH22
personfindingapproach:integratingnamesandfacestolimitthesearchspace,definingasimilaritymeasurebetweenfacestoconstructthesimilaritygraph,andthegreedygraphalgorithmtofindthelargestdensestcomponentofthegraph.Followingthosesections,wewillexplaintheuseofpersonfindingapproachforautomaticanchorpersondetectioninnewsvideos.Then,inthelastsectionwedefinetwomethodstousetheoutputofthethreesteppersonfindingschemelaterinrecognizingnewfaces.
3.2IntegratingNamesandFaces
Thefirststepofouralgorithminvolvesintegratingnameandfaceinformation.Inthisstep,weusethenameinformationtolimitoursearchspacetotheimagesaroundwhichthenameofthequerypersonappears.Tothisend,welookforthenameofthequerypersoninthecaptionorinthespeechtranscript;andchoosetheimagesthatareassociatedwiththequeryname.
Ontheweb,thenewsphotographsappearwiththecaptions.However,therecanbemorethanonefaceinaphotographandmorethanonenameinthecor-respondingcaption.Therefore,itisnotknownwhichfacegoeswithwhichname.Usingtheassumptionthatapersonislikelytoappearinaphotographwhenhis/hernameismentionedinthecaption,wereducethefacesetforaqueriedpersonbyonlychoosingthephotographsthatincludethenameofthatpersonintheassociatedcaption.However,aperson’snamecanappearindifferentforms.Forexample,thenamesGeorgeW.Bush,PresidentBush,U.S.President,andPresidentGeorgeBush,allcorrespondtothesameperson.Wemergedthesetofdifferentnamesusedforthesamepersontofindallfacesassociatedwithalldifferentnamesofthesameperson.Adetailedlistofdifferentformsofnamescorrespondingtoeachquerypersonusedinourexperimentsisgivenintheap-pendix.
Forthenewsvideos,theprobabilityofapersonappearingonthescreenishighwhenhis/hernameismentionedinthespeechtranscripttext.Thus,looking
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH23
fortheshotsinwhichthenameofthequerynameismentionedisagoodplacetostartsearchoverpeople.However,itisproblematicsincetherecanbeatimeshiftbetweentheappearanceofthepersonvisuallyandtheappearanceofhis/hername.
Recently,ithasbeenshowedthatthefrequencyofaperson’svisualappear-ancewithrespecttotheoccurrenceofhis/hernamecanbeassumedtohaveaGaussiandistribution[52].Weusethesameideaandsearchfortherangewherethefaceislikelytoappearrelativetothename.Asweexperimentedon“Clinton”query,weseethattakingthetenprecedingandthetensucceedingshotstogetherwiththeshotwherethenameismentionedisagoodapproximationtofindmostoftherelevantfaces(seeFigure4.9).However,thenumberoffacesinthisrange(whichwerefertoas[-10,10])canstillbelargecomparedtotheinstancesofthequeryname.Aswewillexplainintheexperiments,itisseenthattakingonlyoneprecedingandtwofollowingshots(whichwerefertoas[-1,2])isalsoagoodchoice.
Anotherprobleminnewsvideosisthatusuallythefacesoftheanchorpersonappeararoundaname.Forsolutiontothisproblem,weuseananchorpersondetectionmethodbasedonourgraphbasedapproachaswewillbeexplainedlater.
Integratingnamesandfacesproducesbetterretrievalperformancescomparedtosolelytext-basedmethods.However,theresultingsetmaystillcontainmanyfalsefacesduetothefollowingreasons:thequerypersonmaynotappearvisuallyevenifhis/hernameismentioned,theremaybeotherpeopleinthesamestorythatalsoappearvisuallytogetherwiththequeryperson,theremaybenon-faceimagesreturnedbythefacedetectionalgorithmused.However,thefacesofthequerypersonarelikelytobethemostfrequentlyappearingonesthananyotherpersoninthesamespace.Eveniftheexpressionsorposesvary,differentappearancesofthefaceofthesamepersontendtobemoresimilartoeachotherthantothefacesofothers.Inthefollowingsections,weexplainourstrategytoincreaseretrievalperformancebyfindingthecorrectfacesbyusingvisualsimilarities.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH24
3.3ConstructingSimilarityGraphofFaces
WerepresentthefaceswiththeinterestpointsextractedfromtheimagesusingtheSIFToperator[35].Lowe’sSIFToperator[35],haverecentlybeenshowntobesuccessfulinrecognizingobjects[39,26]andfaces,[28].TheSIFTtech-niqueconsistsoffourmainsteps:1.Scale-spaceextremadetection,2.Keypointlocalization,3.OrientationAssignment,4.Keypointdescriptor.
Inthefirststep,potentialinterestpointsareextractedbylookingforallscalesandimagelocations.Ascalespaceoftheimageisconstructedfirstbyconvolvingtheimagewithvariable-scaleGaussianfunction.LettheinputimagebeI(x,y)then,theGaussian-blurredimageL(x,y,σ)canberepresentedby:
L(x,y,σ)=G(x,y,σ)∗I(x,y),
where,
y21−x2+2G(x,y,σ)=e2σ.22πσ
Difference-ofGaussianfunction(DoG)isusedtodetectstablekeypointloca-tionsinthisscalespace.DoGoftwonearbyscalesseperatedbyconstantkisgivebby:
DoG(x,y,σ)=(G(x,y,kσ)−G(x,y,σ))∗I(x,y)
=L(x,y,kσ)−L(x,y,σ).
Localmaximaand/orminimaofDOFgivesthecandidatekeypointlocation,andiscomputedbycomparingeachsamplepointwithboththeeightneighborpointsonthesamescaleandthenineneighborpointsontwoimagesintheneighborscale.
Inthesecondstep,eachcandidatekeypointlocationisfittoanearbydatatodetermineitslocation,scaleandratioofprincipalcurvatures.First,athreshold
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH25
onminimumcontrastisappliedtoremovetheunstableextremawithlowcon-trast.Then,asecondthresholdisappliedontheratioofprinciplecurvaturestoeliminatethestrongresponsesofdifference-of-Gaussianfunctionalongedgeswhicharepoorlydetermined.
Anorientationisassignedtothekeypointlocationsinthethirdstep,basedonlocalimagegradientdirections.Anorientationhistogramisformedfromgra-dientorientationsofthesamplepointsaroundakeypointbyprecomputingtheirpixeldifferencesinascaleinvariantmanner.Peaksinthehistogramdenominatedominantdirectionsoflocalgradients;thusthe80percentofthehighestpeakinthehistogramisusedtoassigntheorientationofakeypoint.
Inthelaststep,adescriptorforeachkeypointiscomputedfromimagegra-dients.GradientsofpointswithinanarrayaroundthekeypointareweightedbyaGaussianwindowanditscontentissummarizedintoadescriptorarray,usingorientationhistograms.Thesefeaturesarethennormalizedtounitlengthandathresholdisappliedtothisunitvectorvaluestoreducetheeffectsofilluminationchange.
Wefirstuseaminimumdistancemetrictofindallmatchingpointsandthenremovethefalsematchesbyaddingsomeconstraints.Foreachpairoffaces,theinterestpointsonthefirstfacearecomparedwiththeinterestpointsonthesecondfaceandthepointshavingtheleastEuclideandistanceareassumedtobethecorrectmatches.However,amongthesetherecanbemanyfalsematchesaswell(seeFigure3.1).
Inordertoeliminatethefalsematches,weapplytwoconstraints:thege-ometricalconstraintandtheuniquematchconstraints.Geometricalconstraintexpectsthematchingpointstoappeararoundsimilarpositionsonthefacewhenthenormalizedpositionsareconsidered.Thematcheswhoseinterestpointsdonotfallinclosepositionsonthefaceareeliminated.Uniquematchconstraintensuresthateachpointmatchestoonlyasinglepointbyeliminatingmultiplematchestoonepointandalsobyremovingone-waymatches.Inthenexttwosubsections,wegivethedetailsofhowthoseconstraintsareapplied.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH26
3.3.1GeometricalConstraint
Weexpectthatmatchingpointswillbefoundaroundsimilarpositionsontheface.Forexample,thelefteyeusuallyresidesaroundthemiddle-leftofaface,evenindifferentposes.Thisassumptionpresumesthatthematchingpairofpointswillbeincloseproximitywhenthenormalizedcoordinates(therelativepositionofthepointsonthefaces)areconsidered.
Toeliminatefalsematcheswhicharedistantfromeachother,weapplyageometricalconstraint.Forthispurpose,werandomlyselectedasetofimagesof10people(5facesforeachperson).Then,wemanuallyassignedtrueandfalsematchesforeachcomparisonandusedthemastrainingsamplestoberunonaquadraticBayesnormalclassifier([43,51])toclassifyamatchedpointastrueorfalseaccordingtoitsgeometricaldistance.Thegeometricaldistance
√th
correspondingtotheiassignmentreferstoX2+Y2where
locX(i)locX(match(i))
−,
sizeX(image1)sizeX(image2)locY(i)locY(match(i))
−,
sizeY(image1)sizeY(image2)
X=
Y=
andlocXandlocYholdXandYcoordinatesofthefeaturepointsintheimages,sizeXandsizeYholdXandYsizesoftheimagesandmatch(i)corre-spondstothematchedkeypointinthesecondimageoftheithfeaturepointinthefirstimage.
InFigure3.1,matchesbeforeandaftertheapplicationofthisgeometricalconstraintareshownforanexamplefacepair.Mostofthefalsematchesareeliminatedwhenthepointsthatarefarawayfromeachotherareremoved.Therelativeanglebetweenthepointscouldalsobeusedasageometriccon-straint.However,sincethecloserpointscouldhaveverylargeangledifferences,itisnotreliable.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH27
Figure3.1:Thefirstimageontheleftshowsallthefeaturepointsandtheirmatchesbasedontheminimumdistance.Thesecondimageontherightshowsthematchesthatareassignedastrueaftertheapplicationofgeometricalconstraints.
3.3.2UniqueMatchConstraints
Aftereliminatingthepointsthatdonotsatisfythegeometricalconstraints,therecanstillbesomefalsematches.Usually,thefalsematchesareduetomultipleassignmentswhichexistwhenmorethanonepoint(e.g,A1andA2)areassignedtoasinglepoint(e.g,B1)intheotherimage,ortoonewayassignmentswhichexistwhenapointA1isassignedtoapointB1ontheotherimagewhilethepointB1isassignedtoanotherpointA2ornotassignedtoanypoint(Figure3.2).Thesefalsematchescanbeeliminatedwiththeapplicationofanotherconstraint,namelytheuniquematchconstraint,whichguaranteesthateachassignmentfromanimageAtoanotherimageBwillhaveacorrespondingassignmentfromimageBtoimageA.
Thefalsematchesduetomultipleassignmentsareeliminatedbychoosingthematchwiththeminimumdistance.Thefalsematchesduetoonewayassignmentsareeliminatedbyremovingthelinkswhichdonothaveanycorrespondingas-signmentfromtheotherside.AnexampleshowingthematchesbeforeandafterapplyingtheuniquematchconstraintsaregiveninFigure3.3andinFigure3.4.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH28
Figure3.2:ForapairoffacesAandB,letA1andA2betwopointsonA;andB1isapointonBwiththearrowsshowingthematchesandtheirdirection.OntheleftisamultipleassignmentwherebothpointsA1andA2onAmatchB1onB.Insuchacase,thematchbetweenA2andB2iseliminated.OntherightisaonewaymatchwhereB1isamatchforA1,whereasB1matchesanotherpointA2onA.ThematchofA1toB1iseliminated.ThematchofB1toA2remainsthesameifB1isalsoamatchforA2;otherwiseitiseliminated.
Figure3.3:Anexampleforuniquematchconstraint.Matchesfromthelefttotherightimageareshownbyred,dashedlines,whereasmatchesfromrighttoleftareshownbyyellowlines.Theleftimageshowsthematchesassignedafterapplyinggeometricalconstraints,butbeforeapplyingtheuniquematchconstraints.Therightimageshowstheremainingmatchesafterapplyingtheuniquematchconstraints.
3.3.3SimilarityGraphConstruction
Afterapplyingtheconstraintsandassumingthattheremainingmatchesaretruematches,wedefinethedistancebetweenthetwofacesAandBastheaveragevalueofallmatches.
N
dist(A,B)=
i=1
D(i)
,N
whereNisthenumberoftruematchesandD(i)istheEuclideandistance
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH29
Figure3.4:Examplesformatchingpoints.Notethat,evenforfaceswithdifferentsize,poseorexpressionsthemethodsuccessfullyfindsthecorrespondingpoints.betweentheSIFTdescriptorsofthetwopointsfortheithmatch.
Asimilaritygraphforallfacesinthesearchspaceisthenconstructedusingthesedistances.WecanrepresentthegraphasamatrixasinFigure3.5.Thematrixissymmetricandthevaluesonthediagonalareallzero.Foramoreclearvisualrepresentation,thedistancesforthefacescorrespondingtothepersonweareseekingareshowntogether.Clearly,thesefacesaremoresimilartoeachotherthantotheothers.Ourgoalistofindthissubsetwhichwillcorrespondtothedensestcomponentinthegraphstructure.
3.4GreedyGraphAlgorithmforFindingtheDensestComponent
Intheconstructedsimilaritygraph,facesrepresentthenodesandthedistancesbetweenthefacesrepresenttheedgeweights.Weassumethat,inthisgraphthenodesofaparticularpersonwillbeclosetoeachother(highlyconnected)anddistantfromtheothernodes(weaklyconnected).Hence,theproblemcanbetransformedintofindingthedensestsubgraph(component)intheentiregraph.TofindthedensestcomponentweadaptthemethodproposedbyCharikar[15]wherethedensityofsubsetSofagraphGisdefinedas
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH30
Figure3.5:Dissimilaritymatrixfor200imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,97oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the97trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.
f(S)=
|E(S)|
,|S|
whereE(S)={i,j∈E:i∈S,j∈S}andEisthesetofalledgesinG.Inotherwords,E(S)isthesetofedgesinducedbysubsetS.ThesubsetSthatmaximizesf(S)isdefinedasthedensestcomponent.
OurgoalistofindthesubgraphSwiththelargestaveragedegreethatisthesubgraphwiththemaximumdensity.Initially,thealgorithmpresentedin[15]startswiththeentiregraphGandsetsS=G.Then,ineachstep,thevertexwiththeminimumdegreeisremovedfromS.Thealgorithmalsocomputesthevalueoff(S)foreachstepandcontinuesuntilthesetSisempty.Finally,thesetS,thathasmaximumf(S)value,isreturnedasthedensestcomponentofthegraph.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH31
Inordertoapplytheabovealgorithmtotheconstructedsimilaritygraph,weneedtoconvertitintoabinaryform,sincethealgorithmdescribedaboveonlyworkswellforbinarygraphs.Thus,beforeapplyingit,weconvertouroriginaldissimilarityvaluesintoabinaryform,inwhich0indicatesnoedgeand1indicatesanedgebetweentwonodes.Thisconversioniscarriedoutbyapplyingathresholdonthedistancebetweenthenodes.Thisthresholdalsoconnoteswhatwedefineasnear-byand/orremote.AnexampleofsuchaconversionisgiveninFigure3.6.Intheexample,assumethat0.65isdefinedasourproximitythreshold.Inotherwords,ifthedistancebetweentwonodesislessthanorequalto0.65thenthesetwonodesarenear-by;thereforeweputanedgebetweenthem.Otherwise,noedgeismaintainedbetweenthesenodes,sincetheyarefarawayfromeachother.
3.5AnchorpersonDetectionandRemovalforNewsVideos
Whenwelookattheshotswherethequerynameismentionedinthespeechtranscript,itislikelythattheanchorperson/reportermightbeintroducingorwrappingupastory,withtheprecedingorsucceedingshotsbeingrelevant,butnotthecurrentone.Therefore,whentheshotsincludingthequerynameareselected,thefacesoftheanchorpersonwillappearfrequentlymakingouras-sumptionthatthemostfrequentfacewillcorrespondtothequerynamewrong.Hence,itishighlyprobablethattheanchorpersonwillbereturnedasthedensestcomponentbythepersonfindingalgorithm(seeFigure3.7).Thesolutionistodetectandremovetheanchorpersonbeforeapplyingthealgorithm
In[17],asupervisedmethodforanchorpersondetectionisproposed.Theyintegratecolorandfaceinformationtogetherwithspeaker-idextractedfromtheaudio.However,thismethodhassomedisadvantages.Firstofall,ithighlydependsonthespeaker-id,andrequirestheanalysisofaudiodata.Thecolorinformationisusefultocapturethecharacteristicsofstudiosettingswheretheanchorpersonislikelytoappear.But,whentheanchorpersonreportsfroman-otherenvironmentthisassumptionfails.Finally,themethoddependsonthe
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH32
factthatthefacesofanchorpeopleappearinlargesizesandaroundsomespecificpositions,butagaintheremaybecaseswherethisisnotthecase.
Inthisstudy,weusethegraphbasedmethodtofindtheanchorpeopleinanunsupervisedway.Theideaisbasedonthefactthat,theanchorpeopleareusuallythemostfrequentlyappearingpeopleinbroadcastnewsvideos.Fordifferentdaystheremaybedifferentanchorpeoplereporting,butgenerallythereisasingleanchorpersonforeachday.Hence,weapplythedensestcomponentbasedmethodtoeachnewsvideoseparately,tofindthepeopleappearingmostfrequently,whichcorrespondtotheanchorpeople.
3.6DynamicFaceRecognition
Theoverallschemeexplainedintheprevioussectionsreturnsasetofimagesclas-sifiedasthequeryperson(densestcomponent)andtherestasothers(outliers).Also,thegraphalgorithmworksonthewholesetofimagesinthesearchspaceofthequeryperson.Thus,whenanewfaceisencountered,thealgorithmneedstobere-runonthewholesettolearnthelabelofthenewface–querypersonoroutlier.However,sincetheschemereturnstheclassifiedimages,thisresultcanbeusedasamodeltorecognizenewfacesdynamicallyandcheckifitbelongtothefacesofthequeriedperson.Moreover,thistaskcanbeachievedwithoutanysupervision,sincetheschemeprovidesusthetrainingdatalabeledautomatically.Inthenexttwosubsections,weexplainhowtheoutputofthepersonfindingapproachcanbeusedinrecognizingnewfaces.Wemodelthereturnedsolutionintwowaystolearnthethresholdsfor:averagedegreeandaveragedistance.
3.6.1DegreeModeling
Asexplainedin3.4,thegreedydensestcomponentalgorithmworksiterativelybyremovingonenodefromthegraphuntilthereisonelastnodeleftinthegraph.Averagedensityofeachsubgraphiscomputedineachiterationand
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH33
finallythesubgraphwiththelargestaveragedensityisassignedasthedensestcomponent.Amongtheseiterations,thereisonelastnoderemovedfromthecurrentsubgraphthatresultsinthedensestcomponentinthenextiteration.Thislastnodecanbethoughtofasthebreakingpoint,andindicateanevidenceforthemaximumnumberoftotalconnections(edges)fromanoutliernodetoallthenodesinthedensestcomponent.Thistotalnumber–degreeofthenearestoutliertoallthefacesrecognizedasthequeryperson–canusedasathresholdinfurtherrecognition.Whenanewfaceisencountered,itsdegreetoallthefacesinthedensestcomponentiscomputedfirst.Then,thefaceislabeledasthequerypersonifitsdegreeisgreaterthanthefounddegreethreshold.
3.6.2DistanceModeling
Inthismethod,averagedistanceoftrue-trueandfalse-truematchesareused.Foreachnodeinthegraph,itsaveragedistancetoallthenodesinthedensestcomponent–hencethefacesofthequeryperson–arecomputed.Ifanodewasinthedensestcomponent,thenitsaveragedistanceislabeledasatrue-truematchdistance,elseafalse-truematchdistance.ThesedistancesarethentrainedwiththequadraticBayesnormalclassifier([43,51])tolearntheaveragedistancethresholdandclassifynewtestimagesbasedonitsaveragedistancetotrueimagesinthetrainingset.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH34
Figure3.6:Exampleofconvertingaweightedgraphtoabinarygraph.Nodesandtheirdistancesaregiveninthefirstimage.Theresultinggraphafterapplying0.65astheproximitythresholdisgiveninthesecondimage.Boldedgesaretheedgesthatremainafterconversion.Thefinaldensestcomponentofthisgraphiscircledinthelastimage.
CHAPTER3.GRAPHBASEDPERSONFINDINGAPPROACH35
Figure3.7:Anchorpersonprobleminnewsvideos.Afterintegratingnamesandfacesforaqueryperson,itishighlyprobablethateitheroneoftheanchorpersonswillbereturnedasthedensestcomponentofthelimitedsearchspace.Thefirstfigureontheleftcorrespondstoasample
Chapter4Experiments
4.1
DataSets
Themethodproposedinthisthesiswastestedontwodifferentdatasets:newsphotographsonthewebandbroadcastnewsvideosontelevision.Inthenexttwosubsections,webrieflydescribebothdatasetsusedinourexperiments.
4.1.1NewsPhotographs
ThedatasetconstructedbyBergetal.originallyconsistsofabouthalfamillioncaptionednewsimagescollectedfromYahoo!NewsontheWeb.Afterapplyingafacedetectionalgorithmandprocessingtheresultingfaces,theywereleftwithatotalof30,281detectedfaces[8].Eachimageinthissetisassociatedwithasetofnames.Atotalof13,292differentnamesareusedforassociation.Howevermorethanhalf(9,609)ofthemareusedonlyonceortwice.Also,aswementionedpreviously,aparticularpersonmaybecalledbydifferentnames.Forexample,thenamesusedforGeorgeWBushandtheirfrequencyare:GeorgeW(1485);W.Bush(1462);GeorgeW.Bush(1454);PresidentGeorgeW(1443);PresidentBush(905);U.S.President(722);PresidentGeorgeBush(44);PresidentBushs(2);PresidentGeorgeWBush(2);GeorgeWBush(2).
36
Wemergethesetof
CHAPTER4.EXPERIMENTS37
differentnamesusedforthesamepersonandthentaketheintersectiontofindfacesassociatedwithdifferentnamesofthesameperson.Adetailedlistofdiffer-entformsofnamescorrespondingtoeachquerypersonusedinourexperimentsisgivenintheappendix.
Generally,thenumberoffacesintheresultingsetislessthanthenumberofallnamessinceacaptionmayincludemorethanoneinstanceofthereferredname.Forexample,forBushthenumberoffacesis2,849whilethetotalnumberofallreferrednamesis7,528.Intheexperiments,thetop23peopleappearingwiththehighestfrequencies(morethan200times)areused.Figure4.1showsthetotalnumberoffacesassociatedwiththegivennameandthenumberofcorrectfacesforthe23peopleusedintheexperiments.
4.1.2NewsVideos
Theseconddatasetusedintheexperimentsisthebroadcastnewsvideospro-videdbyNISTforTRECVIDvideoretrievalevaluationcompetition2004[1].Itconsistsof229movies(30minuteseach)fromABCandCNNnews.Theshotboundariesandthekey-framesareprovidedbyNIST.SpeechtranscriptsextractedbyLIMSI[21]areusedtoobtaintheassociatedtextforeachshot.Fortheexperiments,wechoose5people,namelyBillClinton,BenjaminNe-tanyahu,SamDonaldson,SaddamHusseinandBorisYeltsin.Inthespeechtranscripttext,theirnamesappear991,51,100,149and78timesrespectively.ThefacedetectionalgorithmprovidedbyMikolajcyzk[40]isusedtoextractfacesfromkey-frames.Duetohighnoiselevelsandlowimageresolutionquality,thefacedetectorproducesmanyfalsealarms.Onrandomlyselectedtenvideos,in2942images,1395regionsaredetectedasfacesbutonly790ofthemarerealfacesand580facesaremissed.Intotal,31,724facesaredetectedoverthewholedataset.
CHAPTER4.EXPERIMENTS38
4.2EvaluationCriteria
Forevaluation,wegivetheexperimentalresultsbasedonrecallandprecisionvalues.Thesevaluesarecomputedasfollows:LetNbethetotalnumberoffacesreturnedbythealgorithmasthefacesofthequeryperson.AmongthoseN,letnbethetotalnumberoffacesthatreallybelongtothatperson.Then,theprecisionvalueofthisresultis:
precison=
n.N
Ifthereisatotalofmfacesinthewholedatasetthatbelongtothequeryperson,thentherecallvalueoftheresultis:
recall=
n.m
Weshouldalsodenotethatasthebaseline,weusetheimagesreturnedbythefacedetector.Hence,m(groundtruthofthequeryperson)iscomputedamongthosedetectedfaces.
Afterfindingtherecallandprecisionvaluesforeachquerypersonindividually,wefinallycomputetheweightedresultsforaveragerecallandprecisionvalues.Whatwemeanwithweightedisthatrecalland/orprecisionvalueofeachpersonisweightedbythenumberofimagesthatappearinhis/herlimitedsearchspace.Letrecall(i)betherecallvaluefortheithperson,andS(i)bethetotalnumberofimagesinitslimitedsearchspace.Then,weightedaveragerecallis:
S(i)∗recall(i)wavgrecall=.S(i)
Inthefollowingtwosubsections,wegivetheresultsoftheseexperimentsonbothsetsseparately:
CHAPTER4.EXPERIMENTS39
4.3
4.3.1
ExperimentalResultsonNewsPhotographs
MatchingPoints
Asthefirststep,thepointshavingtheminimumdistanceaccordingtotheirSIFTdescriptorsaredefinedasthematchingpoints.Thesepointsarefurthereliminatedusingthetwoconstraints.Afterthiseliminationprocess,73%ofallpossibletruematchesarekeptandweloseonly27%oftruematches.Amongtheseassignments,weachievedacorrectmatchingrateof72%.
4.3.2GraphApproach
Thesuccessofouralgorithmvarieswiththethresholdthatischosenwhilecon-vertingtheweighteddissimilaritygraphtoabinaryone.Forthenewspho-tographsdataset,averagerecallandprecisionvaluesbyvaryingthethresholdbetween0.55and0.65isgiveninFigure4.2.Basedonthesevalues,thethreshold0.575ischosentorepresenttherecallandprecisionvaluesforeachperson.Thethreshold0.575ischosentorepresenttherecallandprecisionvaluesforeachpersonThesevaluesaregiveninFigure4.3forthisthreshold.Averageprecisionvalueisobtainedas48%forthebaselinemethodwhichassumesthatallthefacesappearingaroundthenameiscorrect.Withtheproposed,methodweachieved68%recalland71%precisionvaluesontheaverage.Themethodcanachieveupto84%recall-asforGrayDavis-and100%precision-asforJohnAshcroft,HugoChavez,JiangZeminandAbdullahGul.Wehadinitiallyassumedthat,afterassociatingnames,truefacesofthequeriedpersonappearmorethananyotherpersoninthesearchspace.However,whenthisisnotthecase,thealgorithmgivesbadretrievalresults.Forexample,thereisatotalof913imagesassociatedwithnameSaddamHussein,butonly74ofthemaretrueSaddamHusseinimageswhile179ofthemareGeorgeBushimages.
CHAPTER4.EXPERIMENTS40
Toshowthatoursystemworksalsoonindividualsappearinginasmallnum-berofcaptions,weperformedexperimentson10peopleappearinglessthan35timesandobtainedaveragerecallandprecisionvalues85%and66%.Asanotherexperiment,wechangedthenumberofinstancesofafacebyremovingsomeofthecorrectfacesorbyaddingsomeincorrectfaces.For4peoplehavingaround200instancesandsimilarnumberoftrueandfalseimages(i)weremoved50oftrueimagesoffromeachoftheirsearchspace,(ii)weadded100falseimages.Originally,averagerecallandprecisionvalueswere63%and95%.Weobtained59%recallan89%precisionafter(i),and58%recalland70%precisionafter(ii).Althoughtheprecisionissomewhataffected,resultsarestillacceptable.SomesampleimagesretrievedandnotretrievedforthreepeoplefromthetestsetareshowninFigure4.4.
4.3.3OnlineRecognition
Therecognitionmethodsexplainedin3.6aretestedonthenewsphotographsdataset.Intheseexperiments,Kpercentageoftheimagesinthesearchspaceofaquerypersonisselectedastheheld-outsetandthegraphalgorithmisappliedontheremainingimagestogetthemodel.AverageresultsofthedegreemodelingmethodaregiveninFigure4.5andinTable4.1.AndtheaverageresultsofthedistancemodelingmethodaregiveninFigure4.6andinTable4.2.ForK=10,therecallandprecisionvaluesforeach23personisgiveninFigures4.7and4.8respectively.fordistancemodelingtechnique.
4.4
4.4.1
ExperimentalResultsonNewsVideos
IntegratingFacesandNames
Forbetterunderstandingofthedistributions,weplotthefrequencyoffacesrelativetothepositionofthenamesforthefivepeoplethatwehavechosenfor
CHAPTER4.EXPERIMENTS41
Table4.1:Recognitionratesofdegreemodelingcentoftheimagesareusedastheheld-outset.
K1020304050modelrecall0.680.680.670.660.66precision0.710.710.710.700.70testsetrecall0.690.700.700.700.69precision0.710.710.700.700.70fordifferentKvalues.(Kper6070800.610.680.700.66900.580.640.760.620.650.630.700.690.690.690.690.68Table4.2:RecognitionratesofdistancemodelingfordifferentKvalues.(Kpercentoftheimagesareusedastheheld-outset.
K102030405060708090modelrecall0.690.680.670.670.660.650.630.610.58precision0.710.710.700.700.700.700.690.680.65testsetrecall0.720.700.700.680.670.670.640.620.57precision0.720.720.720.720.720.720.710.710.69ourexperimentsinFigure4.10.Itisseenthattakingonlyoneprecedingandtwofollowingshots(whichwerefertoas[-1,2])isalsoagoodchoice.Table4.3showsthat,mostofthecorrectfacesfallintothisselectedrangebyremovingmanyfalsealarms.
Table4.3:Numberoffacescorrespondingtothequerynameovertotalnumberoffacesintherange[-10,10]and[-1,2].
RangeClintonNetanyahuDonaldsonSaddamYeltsin[-10,10]213/69059/383137/119718/100421/488[−1,2]160/24576/114102/33014/33219/157
CHAPTER4.EXPERIMENTS42
4.4.2AnchorDetection
Weappliedthedensestcomponentbasedmethodtoeachnewsvideoseparately,tofindthepeopleappearingmostfrequently,whichcorrespondtotheanchorpeople.Werunthealgorithmon229videosinourtestset,andobtainedaveragerecallandprecisionvaluesas0.90and0.85respectively.ImagesthataredetectedasanchorpersonintendifferentvideosaregiveninFigure4.11.
4.4.3GraphApproach
Inordertodetermineareasonablethresholdusedinconvertingtheweightedsim-ilaritygraphtoabinaryoneforthenewsvideos,werandomlyselected10videosandrecordedrecall-precisionvaluesofdifferentthresholdsforanchorpersonde-tection.ThesevaluesareplottedinFigure4.12.Furtherinourexperiments,weselectthepointmarkedwithacrossintherecall-precisioncurve,whichcorre-spondstothresholdof0.6.Thesamethresholdisusedbothforanchorpersondetectionandforpersonqueries.
Afterselectingtherangewherethefacesmayappearweapplythedensestcomponentalgorithmtofindthefacescorrespondingtothequeryname.WehaverecordedthenumberoftruefacesofthequerynameandtotalnumberofimagesretrievedasinTable4.4.Thefirstcolumnofthetablereferstototalnumberoftrueimagesretrievedvs.totalnumberoftrueimagesretrievedbyusingonlythespeechtranscripts-selectingtheshotswithininterval[-1,2].Thenumbersafterremovingthedetectedanchorpeoplebythealgorithmfromthetext-onlyresultsaregiveninthesecondcolumn.Andthelastcolumnisforapplyingthealgorithmtothisset,fromwhichtheanchorpeopleareremoved.TheprecisionvaluesaregiveninFigure4.14.SomesampleimagesretrievedforeachpersonareshowninFigure4.13.
Ascanbeseenfromtheresults,wekeepmostofthecorrectfaces(especiallyafteranchorpersonremoval),andwegetrejectmanyoftheincorrectfaces.Hencethenumberofimagespresentedtotheuserisdecreased.Also,ourimprovementin
CHAPTER4.EXPERIMENTS43
Table4.4:Numbersinthetableindicatethenumberofcorrectimagesretrieved/totalnumberofimagesretrievedforthequeryname.QuerynameText-onlyAnchorremovedMethodapplied
Clinton160/2457150/1765109/1047
NetanyahuSamDonaldson6/114102/3305/7481/2004/3267/67
Saddam14/33214/2279/110
Yeltsin19/15717/12210/57
precisionvaluesarerelativelyhigh.Averageprecisionofonlytextbasedresultsincreasesby29%afterancherpersonremoval,andby152%afterapplyingtheproposedalgorithm.
4.5AMethodforFindingtheGraphThresholdAutomaticaly
Thenormalizedcutmetricpresentedin[45]canbeusedforselectingthethresholdtoconverttheweightedgraphtoabinaryone.LetAbethesetofverticesofacutandVbethesetofallverticesingraphG.ThenthevalueofacutandnormalizedcutofAaredefinedasfollows:
uA,vV−A
cut(A,V)=w(u,v),
Ncut(A,V)=
cut(A,V)cut(A,V)+,
assoc(A,V)assoc(V−A,V)
wherew(u,v)isafunctionofsimilaritybetweennodesuandv;and
assoc(A,V)=uA,tVw(u,t)isthetotalconnectionfromnodesinAtoallnodesinthegraph.
Weobtaindifferentbinarygraphsbychoosingdifferentthresholdintherange0.55and0.65.Thengraphalgorihmisrunonthesegraphsseperatelyandthe
CHAPTER4.EXPERIMENTS44
normalizedcutvalueforeachgraphiscalculatedasdefinedasabove.Amongall,thethresholdsappliedinconstructingthegraphwithminimumNcutvalueisselected.Theoverallweightedrecallandprecisionvaluesareachievedas74.01and68.55respectively.
4.6PerformanceAnalysis
Theperformanceofoursystemismainlybasedoncomputingthesimilarityvaluessincewecompareeachfacewithallotherfacesinthesearchspace.Hence,thetimecomplexityofconstructingthesimilaritymatrixisO(N∗N),whereNisthetotalnumberofimagesinthelimitedsearchspaceofaqueryname.ThetimecomplexityofthegreedygraphalgorithmisO(N);andittakesconstanttimetocheckifanewimagebelongstothequerypersonaftertheresultofthegraphapproachismodeled.
Toformanexample,thesimilaritymatrixofasearchspacewith200picturesisconstructedin9minutesonaPentiumIV3GHzmachinewith2GBmemory;andittakeslessthan1secondtopartitionthisgraphwiththedensestcomponentalgorithm.
CHAPTER4.EXPERIMENTS45
Figure4.1:Namesof23peopleareusedintheexperiments.Thetotalnumberoffacesassociatedwithanameisrepresentedbyredbarsandnumberofcorrectfacesbygreenbars.
CHAPTER4.EXPERIMENTS46
Figure4.2:Recall-precisioncurveof23peopleinthenewsphotographsdataset.Precisionandrecallvalueschangedependingonthethreshold.Weusedthresholdvaluesbetween0.55and0.65toshowtheeffect.Thethresholdusedintherestofourexperimentsismarkedwithred.
CHAPTER4.EXPERIMENTS47
Figure4.3:Recallandprecisionvaluesfor23peopleforgraphthresholdvalueof0.575.Bluebarsrepresentrecallandredbarsrepresentprecisionvaluesthatareachievedwiththeproposedmethod.Greenbarsareprecisionvaluesforthebaselinemethod,whichdoesnotusethevisualinformationandretrievesthefaceswhennameappearsinthecaption.
CHAPTER4.EXPERIMENTS48
Figure4.4:Sampleimagesretrieved(ontheleft)andsampleimagesnotretrieved(ontheright)forthequerynames:GeorgeBush(top),ColinPowell(middle),HansBlix(bottom).
Figure4.5:Recallandprecisonvaluesoftheheld-outset(ontheleft)andtheconstructedmodel(ontheright)foronlinerecognitionwithdegreemodelingmethod.
CHAPTER4.EXPERIMENTS49
Figure4.6:Recallandprecisonvaluesoftheheld-outset(ontheleft)andtheconstructedmodel(ontheright)foronlinerecognitionwithdistancemodelingmethod.
Figure4.7:Recallvaluesoftheheld-outsetandtheconstructedmodelforeachpersonforK=10.
CHAPTER4.EXPERIMENTS50
Figure4.8:Precisonvaluesoftheheld-outsetandtheconstructedmodelforeachpersonforK=10.
Figure4.9:ThefigureshowsfrequencyofBillClinton’svisualappearancewithrespecttothedistancetotheshotinwhichhisnameismentioned.Left:whenthewholedatasetisconsidered,right:whenthefacesappearingaroundthenamewithintheprecedingandthefollowingtenshotsareconsidered.OverthewholedatasetClintonhas240facesand213ofthemappearintheselectedrange.
CHAPTER4.EXPERIMENTS51
Figure4.10:TherelativepositionofthefacestothenameforBenjaminNe-tanyahu,SamDonaldson,SaddamHussein,andBorisYeltsinrespectively.
Figure4.11:Detectedanchorsfor6differentvideos.
Figure4.12:Recall-precisionvaluesforrandomlyselected10newsvideosforthresholdvaluesvaryingbetween0.55and0.65.
CHAPTER4.EXPERIMENTS52
Figure4.13:Sampleimagesretrievedforfivepersonqueriesinexperiments.EachrowcorrespondstosamplesforClinton,Netanyahu,SamDonaldson,Saddam,Yeltsinqueriesrespectively.
Figure4.14:Precisionsvaluesachievedforfivepeopleusedinourtests.
Chapter5Comparison
Therearetwomainissuestobeanalyzedintheoverallpersonfindingapproach:definitionofthesimilaritymeasureingraphconstructionandthedensestcom-ponentalgorithminpartitioningthisgraph.Forcomparisonreasons,wefirstanalyzethesetwoissuesinthefollowingtwosections,andthencomparetheresultswiththepreviousapproachonthesamedataset.
5.1BaselineMethod
Theprinciplecomponentanalysis(pca)isawell-knownmethodthathasalsobeenusedinfacerecognitionaseigenfaces[36].Astocomparewithabaselinemethod,wehaveapplieditonthenewsphotographsdata,togiveanideaoftheperformanceofthetraditionalfacerecognitionmethods.Theexperimentsareconductedonthegroundtruthfacesofthetop23peopleusedinexperiments.Foreachperson,(100-K)percentoftheimagesareselectedfortraining.RemainingKpercentoftheimagesarethenclassifiedasbeingoneofthesepeopleinthetrainset.ThealgorithmisrunKtimeswithdifferentrandomgroupsofimagesfortestingandtraining.Recognitionrateiscalculatedbydivingthetrulylabeledimagesbytotalnumberofimagestested.TheaveragerecognitionratesofbothtestandtrainsetfordifferentKaregiveninTable5.1andin5.1,whichshows
53
CHAPTER5.COMPARISON54
offlowratesfortestimages.
Table5.1:RecognitionratesoftheeigenfacemethodforfordifferentKvalues.
K102030405060708090test51.9051.8350.4148.9146.3343.4839.6434.6528.13train98.5698.5798.5898.7498.8098.8999.1299.3099.45Figure5.1:RecognitionratesoftheeigenfacemethodfordifferentKvalues.
5.2FeatureSelectionandSimilarityMatrixConstruction
5.2.1FindingTrueMatchingPoints
Asstatedearlier,wehavenotusedtheoriginalmatchingmetricofSIFT,sinceitdoesnotworkwellforfaces.(SeeFigure5.2forsamplematchesfoundbyusingtheoriginalmatchingmetricandtheproposedmethodareshown.)Toseehowwelltheoriginalmetricperforms,wehaveconstructedthesimilaritygraphwithusingthematchesoftheoriginalmetricandthenappliedthedensestcomponentalgorithm.AsamplesimilaritymatrixcanbeseeninFigure5.3.TheperformanceresultsfordifferentgraphthresholdsareplottedinFigure5.4.Therecallandprecisionvaluesforthesamethresholdinouroriginaltests(0.575)arerecordedas0.91and0.59respectively.
Inasecondexperiment,wehaveappliedtheRansacalgorithm[20]tofindthe
CHAPTER5.COMPARISON55
homographymatrixandassigntruematchesamongall.ResultsofthistechniquesaregiveninFigure5.5.Asitcanbeperceivedfromtheresults,affineconstraintsdoesnotworkwellduetodeformabilitypropertyofthefaces.
5.3ExtractingSimilarGroupofFaces
Wecomparethegreedygraphalgorithmforfindingthedensestcomponentwithonewell-knownapproach:k-nearestneighborandtheone-classclassificationmethodsinthefollowingtwosubsections.Alltheexperimentsareconductedonnewsphotographsdataset,wherewehadachieved68%recalland71%preci-sionvaluesontheaverage.
5.3.1k-nnApproach
Inthelastexperimentsamethodsimilartothek-nearestneighborsclassificationhasbeenused.Foreachfaceinthetestset,wefindthedistancesofthatfacetoallthefacesinthetrainingset,andselectthenearestkfaces(k-neighbors).Ifnumberoftruefacesaregreaterthanthenumberoffalseonesinthisk-neighbors,thenthetestfaceisclassifiedasatrueface.Thetestsareconductedwithdifferentnumberoftrainingandtestingsets.TheresultsaregiveninTable5.2andinFigure5.6fordifferentK,whereKindicatespercentageoftheimagesareusedfortesting.Theresultsshowthatthegreedydensestcomponentalgorithmoutperformsthek-nnapproach.
5.3.2One-classClassification
Givenasetofdataitems,one-classclassificationmethodsaimtofindatargetclassagainsttheoutliers(??).Inotherwords,givenatestsample,itiseitheracceptedasbelongingtothetargetclassorrejected.Hence,one-classclassifica-tionapproachdiffersfromanyothertraditionalmultiortwo-classclassification
CHAPTER5.COMPARISON56
Table5.2:RecognitionratesofsupervisedmethodfordifferentKvalues.(Kisthepercentageoftheimagesthatareusedintesting;kisthenumberofneighboursused.
K102030405060708090k(neighbours)=11recall0.530.520.510.490.470.450.410.360.33precision0.360.360.350.340.340.330.320.310.31k(neighbours)=5recall0.610.610.600.580.560.540.510.470.40precision0.380.380.380.380.370.370.360.350.34k(neighbours)=3recall0.680.670.650.640.620.590.560.520.45precision0.410.400.400.400.390.390.380.370.36approachesbyholdingonlytheinformationofthetargetclass.Withthegreedydensestcomponentalgorithm[15],wealsoseektofindonlythenodesbelongingtothedensestcomponent(hencethefacesofthequeryperson)andassumeallothersareoutliers.Inthiscontext,themostsimilarproblemtotheproblemoffindingthedensestcomponentofthegraphisone-classclassificationproblem.Tothisend,wecomparetheone-classclassificationmethodsin??withthegraphalgorithmusedinthisstudy.
Thesimilaritygraphconstructedasdescribedin??keepsonlythedistancesamongfaces.Hence,theone-classclassificationmethodscannotbeappliedtothisgraph.Sotocomparethegreedygraphdensestcomponentmethodwithanyofthosemethods,weusedtheBag-Of-Featuresapproachasin[46,33]forgraphconstruction.Then,weappliedboththegreedygraphmethodandtwooftheone-classclassificationmethods(nearestneighbordatadescriptionmethodandk-nearestneighbordatadescriptionmethod)thatgaveusthebestresultsamongall.
Toconstructthegraph,wefirstextractedsiftfeaturesfromeachfaceimageandclusteredthesefeaturesusingk-meansclusteringinto50clusters.Then,ahistogramofthesizeofnumberofclusters(50)isformedforeachimageshowing
CHAPTER5.COMPARISON57
thedistributionofclusters.InInformationRetrieval,thefrequenciesoftheclus-tersareweightedby’termfrequencyinversedocumentfrequency(tf-idf)’whichiscomputedas:
nidNlogndni
tfidf=(5.1)
where,nidisnumberofoccurrencesoftermiindocumentd,ndistotalnum-beroftermsindocumentd,Nisthetotalnumberofdocumentsindatabaseandniisthenumberofdocumentsindatabasecontainingtermi.
Adaptingthesameapproach,wefindtheweightedfrequenciesoftheclustersandusethemasthefinalfeaturevectorforeachimage.Theone-classclassi-ficationmethodscanthenbeappliedonthesefeatures.Toapplythedensestcomponentgraphalgorithm,thesimilarityamongthefacesarefoundbynormal-izedscalarproductoftfidfvectorsbyusingthefollowingequation:
tfidf(f1)∗tfidf(f2)norm(f1)∗norm(f2)
distance(f1,f2)=(5.2)
where,tfidf(f1)istf-idfvectoroffaceimagef1andnorm(f1)isthenormoftfidfvectoroffaceimagef1.ThesimilaritymatrixconstructedasdescribedaboveforthequerynameHansBlixisshowninFigure5.7.Theprecision-recallcurveofthedensestcomponentgraphalgorithmforvaryinggraphthresholdsisgiveninFigure5.8.Recallandprecisionvaluesofone-class-classificationmethodsfordifferentK-foldvalidationtestsaregiveninTable5.3.Theresultsindicatethattheonce-classclassificationmethodsisnotsuperiortothegreedydensestcomponentalgorithmusedinourexperimentsforfindingthemostsimilarsetoffaces.
CHAPTER5.COMPARISON58
Table5.3:Recall-precisionratesoftwooneclassclassificationmethods:w1(near-estneighbordatadescriptionmethod)andw2(k-nearestneighbordatadescrip-tionmethod)(appliedontfidf’s).
trainingsetK=10K=20K=30K=40K=50recprerecprerecprerecprerecprew11.000.541.000.531.000.521.000.521.000.51w20.900.570.900.550.900.540.900.540.900.53testsetK=10K=20K=30K=40K=50recprerecprerecprerecprerecprew1.900.500.910.500.900.500.900.500.900.49w20.840.530.880.540.870.530.860.530.860.52CHAPTER5.COMPARISON59
Figure5.2:Examplesformatchingpoints.Firstcolumnformatchesfoundbyusingtheoriginalmatchingmetricofsift.Secondcolumnformatchesfoundbyapplyingtheproposedmethod.
CHAPTER5.COMPARISON60
Figure5.3:Similaritymatrixfor201imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,98oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the98trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.
Figure5.4:Recall-precisioncurveof23peopleinthetestset.Precisionandrecallvalueschangedependingonthethreshold.
CHAPTER5.COMPARISON61
Figure5.5:Examplesformatchingpointsassignedbythehomographymatrixfoundafterransac.
Figure5.6:Recognitionratesofthek-nnapproachfordifferentKandkvalues.Recallvaluesontheleftandprecisionontheright.Kisthepercentageoftheimagesusedfortesting,andkisthenumberofneighborsink-nn.
CHAPTER5.COMPARISON62
Figure5.7:Similaritymatrixfor201imagesinthesearchspaceforthenameHansBlix.Inthissearchspace,98oftheimagesaretrueHansBliximages,andtheremaining103arenot.Forvisualization,the98trueBliximagesareputonthetopleftofthematrix.Darkcolorscorrespondtolargersimilarityvalues.
Figure5.8:Recall-precisioncurveof23peopleinthetestset.Precisionandrecallvalueschangedependingonthethreshold.
Chapter6
ConclusionsandFutureWork
6.1
Conclusions
Inthispaper,weproposeagraphbasedmethodforqueryingpeopleinlargenewsphotographandvideocollectionswithassociatedcaptionsorspeechtranscripttexts.Givensimilaritymeasuresbetweenthefaceimagesinadataset,theproblemistransformedintoagraphprobleminwhichweseekthelargestdensestcomponentofthegraphcorrespondingtothelargestgroupofsimilarfaces.WeuseSIFTdescriptors[35]torepresenteachfaceimageanddefinethesimilarityvaluesbyusingtheaveragedistancesofthematchinginterestpoints.Then,weapplyagreedygraphalgorithm[15]tofindthedensestcomponentofthegraphcorrespondingtothefacesofthequeryperson.Inthepaper,wealsoproposetwodifferentmethodstousetheresultsofthegraphbasedpersonfindingapproachinfutherrecognitionofnewfaces.
Forlargerealisticdatasets,facerecognitionandretrievalisstilladifficultandanerror-proneproblemduetolargevariationsinpose,illuminationandexpressions.Inthisstudy,wehavedescribedamulti-modalapproachforqueryinglargenumbersofpeopleinsuchdatasets.Themethoddoesnotrequiretrainingforanyspecificpersonandthusitcanbeappliedtoanynumberofpeople.Withthisproperty,itissuperiortoanysupervisedmethodwhichrequireslabelingof
63
CHAPTER6.CONCLUSIONSANDFUTUREWORK64
largenumberofsamples.Theresultsachievedarealsoveryclosetosupervisedmethods.
Theexperimentsareconductedontwodifferentnewsdatasets.ThefirstsetconsistsofthousandsofnewsphotographswithassociatedcaptionscollectedfromYahoo!news.Thecaptionsareusedtolimitthenumberofimagesforaquerynameandonlytheimagesassociatedwiththenameareselected.Inthisdata,over20%increaseinprecisionisachievedcomparedtosolelytext-basedmethods.Forindividuals,upto84%recalland100%precisionvaluescanbeobtained.
Thesecondexperimentsareconductedon229broadcastnewsvideosarchive.Wefistusethespeechtranscriptsandselecttheneighboringshotsinwhichthenameofthequerynameappearstolimitoursearchspace.Applyingtheproposedpersonfindingalgorithmoneachvideoseparately,wedetecttheanchorpersonineachvideo.Then,weremovedetectedanchorpersonfromthesearchspaceofthequerynameandapplythealgorithmtotheremainingimages.Experimentsshowthatweimprovepersonsearchperformancesrelativetoonlytextbasedresults.Averageprecisionvaluesofonlytextbasedresultsareincreasedby29%afteranchorpersonremoval,andby152%afterapplyingtheproposedalgorithm.Thepersonfindingalgorithmalsoperformswellforanchorpersondetectionwithoutrequiringanysupervision.
Theproposedmethodisanoverallschemetofindandrecognizefacesinlargenewsphotographandvideocollections.Eachstepofthemethodcanbecommitedwithanothertecniques,forinstancesimilaritiesbetweenfacescanbeassignedwithadifferentapproachorthedensestcomponentcanbeextractedwithanothergraphpartitioningalgorithm.However,experimentsshowthatoursimilaritydefinitionworksbetterthanothertraditionalapproachesforthisdataset;andthegreedygraphalgorithmiscomparabletooneotherthemostpossibleapproach,namelyone-classclassification.
Oneoftheimportantremarkstobemadeonthemethodisthatevenifitisnotafacerecognitionschemeonthewhole,itisinstrumentalinreducingthenumberofimagespresentedtotheuserbyimprovingtheretrievalperformance
CHAPTER6.CONCLUSIONSANDFUTUREWORK65
ofbaselinemethods.
6.2FutureWork
Beforeapplyingthegreedydensestcomponentalgorithm,weconvertourweighedgraphconsistingofdissimilarityvaluesintoabinarygraph.However,thisignoressomeoftheinformation.Amethod,whichdoesnotviolatetheweightedpropertyofthegraph,mayyieldbetterresults.Inthisstudy,SIFTdescriptorsareusedtorepresentthesimilarityofthefaces.Otherrepresentationsorsimilaritymeasurescanalsobeusedtoconstructthegraphstructure.
In[28]setsoffaceexemplarsforeachpersonaregatheredautomaticallyinshotsfortrackinginvideo.Asimilarapproachcanbeadaptedandinsteadoftakingasinglefacefromeachshotbyonlyconsideringthekey-frames,facedetectioncanbeappliedtoallframestoobtainmoreinstancesofthesame.Thisapproachcanhelptofindbettermatchinginterestpointsandmoreexamplesthatcanbeusedinthegraphalgorithm.
Sincetheproposedapproachisanoverallscheme,itcanalsobeappliedtootherproblemssuchasobjectrecognitionorimageregionannotation.Forin-stance,inthecontextofregionannotation,annotationsofimagescanbeusedforlimitingthesearchspaceforaregion.Similary,inthislimitedspace,theregionofinterestisexpectedtoformthelargestsimilargroupofregions.
Bibliography
[1]Trecvideoretrievalevaluation
http://www-nlpir.nist.gov/projects/trecvid/,2004.
[2]J.Sivic.EveringhamandA.Zisserman.Hello!mynameis...buffy–auto-maticnamingofcharactersintvvideo.InProceedingsoftheBritishMachineVisionConference,2006.
[3]S.AksoyandR.M.Haralick.Graph-theoreticclusteringforimagegrouping
andretrieval.InIEEEInternationalConferenceonComputerVisionandPatternRecognition(CVPR),pages63–69,1999.
[4]K.Barnard,P.Duygulu,N.deFreitas,D.A.Forsyth,D.Blei,andM.Jor-dan.Matchingwordsandpictures.JournalofMachineLearningResearch,3,2003.
[5]M.Bartlett,H.Lades,andT.Sejnowski.Independentcomponentrepresen-tationsforfacerecognition,1998.
[6]S.Belongie,J.Malik,andJ.Puzicha.Shapematchingandobjectrecognition
usingshapecontexts,2001.
[7]S.Beretti,A.DelBimbo,andE.Vicario.Efficientmatchingandindexing
ofgraphmodelsincontent-basedretrieval.IEEETransactionsonPatternAnalysisandMachineIntelligence,23(10):1089–1105,2001.
[8]T.Berg,A.C.Berg,J.Edwards,andD.A.Forsyth.Who’sinthepicture.
InNeuralInformationProcessingSystems(NIPS),2004.
66
BIBLIOGRAPHY67
[9]T.Berg,A.C.Berg,J.Edwards,M.Maire,R.White,Y.-W.Teh,
E.Learned-Miller,andD.A.Forsyth.Namesandfacesinthenews.InIEEEConf.onComputerVisionandPatternRecognition(CVPR),2004.[10]M.Bicego,A.Lagorio,E.Grosso,andM.Tistarelli.Ontheuseofsift
featuresforfaceauthentication.InCVPRW’06:Proceedingsofthe2006ConferenceonComputerVisionandPatternRecognitionWorkshop,page35,Washington,DC,USA,2006.IEEEComputerSociety.
[11]V.Bruce.Recognizingfaces,1988.LawrenceErlbaumAssociates,London,
U.K.
[12]J.M.FellousC.vonderMalsburg,N.KrugerandL.Wiskott.Facerecogni-tionbyelasticbunchgraphmatching.InComputerAnalysisofImagesandPatterns1997,pages456–463,1997.
[13]C.Carson,S.Belongie,H.Greenspan,andJitendraMalik.Blobworld:Image
segmentationusingexpectation-maximizationanditsapplicationtoimagequerying.IEEETransactionsonPatternAnalysisandMachineIntelligence,24(8),2002.
[14]M.LaCascia,S.Sethi,andS.Sclaroff.Combiningtextualandvisualcues
forcontent-basedimageretrievalontheworldwideweb.InProc.IEEEWorkshoponContent-BasedAccessofImageandVideoLibraries,SantaBarbaraCAUSA,June1998.
[15]M.Charikar.Greedyapproximationalgorithmsforfindingdensecomponents
inagraph.InAPPROX’00:Proc.ofthe3rdInternationalWorkshoponApproximationAlgorithmsforCombinatorialOptimization,London,UK,2000.
[16]F.Chen,U.Gargi,L.Niles,andH.Schuetze.Multi-modalbrowsingof
imagesinwebdocuments.InProceedingsofSPIEDocumentRecognitionandRetrievalVI,1999.
[17]M-Y.ChenandA.Hauptmann.Searchingforaspecificpersoninbroadcast
newsvideo.InInternationalConferenceonAcoustics,Speech,andSignalProcessing(ICASSP’04),Montreal,Canada,May17-212004.
BIBLIOGRAPHY68
[18]S.Dickinson,M.Pelillo,andR.Zabih.Introductiontothespecialsection
ongraphalgorithmsincomputervision.IEEETransactionsonPatternAnalysisandMachineIntelligence,23(10):1049–1052,2001.
[19]P.DuyguluandA.Hauptmann.What’snews,what’snot?associatingnews
videoswithwords.InThe3rdInternationalConferenceonImageandVideoRetrieval(CIVR)Ireland,July21-23,2004.
[20]M.A.FischlerandR.C.Bolles.Randomsampleconsensus:aparadigmfor
modelfittingwithapplicationstoimageanalysisandautomatedcartography.Commun.ACM,24(6):381–395,1981.
[21]J.L.Gauvain,L.Lamel,andG.Adda.Thelimsibroadcastnewstranscrip-tionsystem.SpeechCommunication,37(1-2),2002.
[22]Y.Gdalyahu,D.Weinshall,andM.Werman.Self-organizationinvision:
Stochasticclusteringforimagesegmentation,perceptualgrouping,andim-agedatabaseorganization.IEEETransactionsonPatternAnalysisandMa-chineIntelligence,23(10):1053–1074,2001.
[23]R.Gross,S.Baker,I.Matthews,andT.Kanade.Facerecognitionacross
poseandillumination.InStanZ.LiandAnilK.Jain,editors,HandbookofFaceRecognition.SpringerVerlag,2004.
[24]R.Gross,J.Shi,andJ.Cohn.Quovadisfacerecognition?InThirdWork-shoponEmpiricalEvaluationMethodsinComputerVision,2001.
[25]G.HamerlyandC.Elkan.Learningthekink-means.InAdvancesinNeural
InformationProcessingSystems,volume17,2003.
[26]S.HelmerandD.G.Lowe.Objectrecognitionwithmanylocalfeatures.
InWorkshoponGenerativeModelBasedVision2004(GMBV),WashingtonD.C.,2004.
[27]N.IkizlerandP.Duygulu.Personsearchmadeeasy.InTheFourthIn-ternationalConferenceonImageandVideoRetrieval(CIVR),Singapore,2005.
BIBLIOGRAPHY69
[28]M.EveringhamJ.SivicandA.Zisserman.Personspotting:videoshot
retrievalforfacesets.InInternationalConferenceonImageandVideoRetrieval(CIVR),Singapore,2005.
[29]G.M.DaviesJ.W.ShepherdandH.D.Ellis.Studiesofcuesaliency,1981.
InPerceivingandRememberingFaces,G.M.Davies,H.D.Ellis,andJ.W.Shepherd,Eds.AcademicPress,London,U.K.
[30]I.JermynandH.Ishikawa.Globallyoptimalregionsandboundariesas
minimumratioweightcycles.IEEETransactionsonPatternAnalysisandMachineIntelligence,23(10):1075–1088,2001.
[31]Y.KayaandK.Kohayashi.Abasicstudyonhumanfacerecognition,1972.
FrontiersofPatternRecognition,S.Watanabe,ed.,p.265.
[32]MichaelDavidKelly.Visualidentificationofpeoplebycomputer.PhDthesis,
1971.
[33]S.Lazebnik,C.Schmid,andJ.Ponce.Beyondbagsoffeatures:Spatial
pyramidmatchingforrecognizingnaturalscenecategories.InCVPR,2006.[34]H.LeandH.Li.Recognizingfrontalfaceimagesusinghiddenmarkovmodels
withonetrainingimageperperson.InICPR’04:ProceedingsofthePatternRecognition,17thInternationalConferenceon(ICPR’04)Volume1,pages318–321,Washington,DC,USA,2004.IEEEComputerSociety.
[35]D.G.Lowe.Distinctiveimagefeaturesfromscale-invariantkeypoints.In-ternationalJournalofComputerVision,60(2),2004.
[36]A.PentlandM.Turk.Eigenfacesforrecognition.JournalofCognitiveNeu-rosicence,3(1):71–86,1991.
[37]A.P.PentlandM.A.Turk.Facerecognitionusingeigenfaces.InIEEECon-ferenceonComputerVisionandPatternRecognition,1991.
[38]B.S.Manjunath,R.Chellappa,andC.vonderMalsburg.Afeaturebased
approachtofacerecognition.InIEEEConf.onComputerVisionandPat-ternRecognition(CVPR),1992.
BIBLIOGRAPHY70
[39]K.MikolajczkandC.Schmid.Aperformanceevaluationoflocaldescriptors.
InIEEEConferenceonComputerVisionandPatternRecognition,2003.[40]K.Mikolajczyk.Facedetector.INRIARhone-Alpes,2004.Ph.DReport.[41]K.MikolajczykandC.Schmid.Indexingbasedonscaleinvariantinterest
points.InInternationalConferenceonComputerVision(ICCV),pagesI:525–531,2001.
[42]B.Park.Facerecognitionusingface-argmatching.IEEETrans.Pattern
Anal.Mach.Intell.,27(12):1982–1988,2005.Member-Kyoung-MuLeeandMember-Sang-UkLee.
[43]P.E.HartR.O.DudaandD.G.Stork.InPatternclassification.JohnWiley
andSons,2001.
[44]S.SatohandT.Kanade.Name-it:Associationoffaceandnameinvideo.
InProceedingsofIEEEConferenceonComputerVisionandPatternRecog-nition(CVPR),1997.
[45]J.ShiandJ.Malik.Normalizedcutsandimagesegmentation.IEEETrans-actionsonPatternAnalysisandMachineIntelligence,22(8):888–905,2000.[46]J.SivicandA.Zisserman.Videogoogle:Atextretrievalapproachtoobject
matchinginvideos.InICCV’03:ProceedingsoftheNinthIEEEInterna-tionalConferenceonComputerVision,page1470,Washington,DC,USA,2003.IEEEComputerSociety.
[47]C.G.M.SnoekandM.Worring.Multimodalvideoindexing:Areviewofthe
state-of-the-art.MultimediaToolsandApplications,25(1),2005.
[48]X.Tan,S.Chen,Z.Zhou,andF.Zhang.Facerecognitionfromasingle
imageperperson:Asurvey.PatternRecognition.,39(9):1725–1745,2006.[49]R.ChellappaW.Zhao.Image-basedfacerecognition:Issuesandmethods.
ImageRecognitionandClassification,pages375–402,2002.
BIBLIOGRAPHY71
[50]J.Wang,K.N.Plataniotis,andA.N.Venetsanopoulos.Selectingdis-criminanteigenfacesforfacerecognition.26(10):1470–1482,2005.
[51]A.Webb.InStatisticalPatternRecognition.JohnWileyandSons,2002.[52]J.Yang,M-Y.Chen,andA.Hauptmann.Findingpersonx:Correlating
nameswithvisualappearances.InInternationalConferenceonImageandVideoRetrieval(CIVR),DublinCityUniversityIreland,2004.
[53]J.Yang,D.Zhang,A.F.Frangi,andJ.Yang.Two-dimensionalpca:Anew
approachtoappearance-basedfacerepresentationandrecognition.IEEETrans.PatternAnal.Mach.Intell.,26(1):131–137,2004.
[54]W.Zhao,R.Chellappa,P.J.Phillips,andA.Rosenfeld.Facerecognition:
Aliteraturesurvey.ACMComputingSurveys,35(4):399–458,2003.
PatternRecognitionLetters,
AppendixA
DifferentFormsofNamesinNewsPhotographs
GeorgeBushGeorgeW(1485),W.Bush(1462),GeorgeW.Bush(1454),PresidentGeorgeW(1443),PresidentBush(905),U.S.President(722),PresidentGeorgeBush(44),PresidentBushs(2),PresidentGeorgeWBush(2),GeorgeWBush(2)
SaddamHussein(911),PresidentSaddam(356),PresidentSaddamHussein(351),PresidentSaddamHussien(3),PresidentSaddamHussiein(1),MinisterSaddam(1)ColinPowell(618),SecretaryofStateColinPowell(606)SecretaryColinPowell(4),CollinPowell(3),
SecretaryGeneralColinPowell(2),SecretaryPowell(1),SecretaryofStatePowell(1)
TonyBlair(502),PrimeMinisterTonyBlair(472),PremierTonyBlair(2),PrimeMinisterBlair(2),MrBlair(1),PrimeMinisterTonyBair(1)
PrimeMinisterJean(158),JeanChretien(155),MinisterJeanChretien(145),PrimeMinisterJeanChretien(145),PrimeMinisterJohnChretien(2),PrimeMinisterChretien(2)GerhardSchroeder(311),ChancellorGerhardSchroeder(283)ChancellorSchroeder(1),ChancellorGerhardSchroeders(2)
ChancellorGerhardSchroder(1),ChancellorGerhardSchoeder(1)
SaddamHusseinColinPowellTonyBlairJeanChretienGerhardSchroeder72
APPENDIXA.DIFFERENTFORMSOFNAMESINNEWSPHOTOGRAPHS73
JohnAshcroftDonaldRumsfeldArielSharonsJunichiroKoizumiHugoChavezGeneralKofiRohMoo-hyunLuladaJacquesChiracVladimirPutinAbdullahGulJiangZeminJohnPaulSilvioBerlusconiDavidBeckhamGrayDavisHansBlixJohnAshcroft(147),GeneralJohnAshcroft(146)AttorneyGeneralJohnAshcroft(143),U.S.Attorney(106)U.S.Attorney(2),U.SAttorney(1)DonaldRumsfeld(279),DonaldH(47),SecretaryDonaldRumsfeld(84),DonaldH.Rumsfeld(44),SecretaryDonaldH(26),H.Donald(13),
SecretaryofStateDonaldRumsfeld(4),SecretaryRumsfeld(6)MinisterAriel(249),PrimeMinisterAriel(248)PrimeMinisterArielSharons(2)
JunichiroKoizumi(156),PrimeMinisterJunichiro(151)PrimeMinisterJunichiroKoizumi(149),PrimeMinisterKoizumi(1)HugoChavez(194),PresidentHugoChavez(186),PresidentHugoChaves(1),PresidentChavez(3)GeneralKofi(124),SecretaryGeneralKofi(61)Secretary-GeneralKofi(60),GeneralGeneralKofi(1)
Secretary-GenaralKofi(1),Annan,U.N.(1),U.N.Secretary(57)U.N.Secretary-(39),U.N.General(9)
RohMoo-hyun(86),RohMoo-(93),PresidentRohMoo-hyun(55)President-electRohMoo-hyun(10),PresidentRoh(61),PresidentRohMoo-(61),President-ElectRohMoo(1)Lulada(119),PresidentLuizInacioLula(30),President-electLuizInacioLula(19),PresidentLula(7),PresidentLulaDa(5),President-electLulaDa(2)
PresidentLuizLulaDa(1),PresidentLuisInacioLula(1)
President-electLuisInacioLula(1),LuizInacio(105),LuisInacio(4)JacquesChirac(143),PresidentJacquesChirac(138)PresidentChirac(4),PresidentJaquesChirac(3)VladimirPutin(146),PresidentPutin(4)PresidentVladimirPutin(136)
AbdullahGul(84),MinisterAbdullahGul(57)PrimeMinisterAbdullahGul(47),PremierAbdullahGul(9)MinisterAbdullah(74)
JiangZemin(94),PresidentJiang(85),PresidentJiangZemin(85),GeneralSecretaryJiangZemin(2)JohnPaul(135),JohnPaulII(57),JohnPaulII(42)SilvioBerlusconi(113),PrimeMinisterSilvioBerlusconi(81)PremierSilvioBerlusconi(22),PrimeMinisterSivlioBerlusconi(1)DavidBeck(94),DavidBeckham(93),captainDavidBeckham(17),captainBeckham(5)
GrayDavis(109),Gov.GrayDavis(73),GovernorGrayDavis(26)HansBlix(168),InspectorHansBlix(17),Dr.HansBlix(13),U.N.HansBlix(3)
因篇幅问题不能全部显示,请点此查看更多更全内容