O.T.Yıdız,L.AkarunandH.L.Akın
Afastnearestneighbouralgorithmwithalogarithmicexpectedtestingtimeforsmallfeaturesizesispresented.IthasbeentestedonaRobotVisionapplicationforYUVspace.Thealgorithmtimerequirementisbetterthanamultilayerperceptrontrainedforthesamepurpose.
Thealgorithmcanbeeasilyextendedtok-nearestneighbourwithoutaffectingthetimecomplexity.Thedifferenceoccursinthelaststep.Wesearchtwoclosestinstancesateacharrayin1-nearestneighbour,whereaswesearchtheclosest2kinstancesink-nearestneighbour.Asanexample,assumethatwewanttofind1-nearestneighbourof(x1¼2,x2¼3,x3¼4)fortheexampledatasetinFig.1.ThetwoclosestinstancestothatinstanceareshowninFig.2.Ifwesearchtheseteninstances,wefindthat(x1¼2,x2¼2,x3¼4)isthenearestneighbour.
Introduction:1(k)nearestneighbouristhesimplestandmostextensivelystudiednonparametriclearningalgorithminmachinelearning.Althoughithasaverygoodperformancecomparedtoitsparametricopponentsandhasnolearningphaseduetoitsnonpara-metricnature,ithasaseriousdrawback:lineartestingtime.Tofindouttheclassofatestcase,itmustbecomparedwithN(numberofinstancesinthetrainingset)elementsandtheelement(kelements)withtheminimumdistancetothetestcasemustbefound.Tocompareoneinstancewithanother,severalfunctionsexist,suchasEuclidiandistance(L2norm),Mahalonobisdistance,etc.Eachofthesefunc-tionscanbethoughtasamappingfromdfeaturesoftheinstancetoasinglenumber(distance).So,theexpectedtestingtimeofthenearestneighbouralgorithmisO(dN).Severalsolutionshavebeenproposedforreducingthislineartestingtime.SomeofthemareK-dtrees[1],geometrichashing[2]andR-trees[3].InthisLetter,weproposeanovelalgorithmbasedonbinarysearchtoefficientlycalculate1(k)nearestneighbourinO(d!logN)time.Ifdissmall,i.e.lessthanorequalto5,andNissignificantlylargerthand,thealgorithmbehaveslikeanO(logN)algorithm.Wealsoshowthatthiscorrespondstoestimatingthereducedorderingofvectorsfromdifferentconditionalorderings[4].
Algorithm:Thenovelalgorithmmainlydependsonbinarysearchonsortedinstances.Asapreprocessingstep,d!sortedarraysareprepared.Ifaninstancehasdfeatures,thenthereared!possiblepermutationsofthefeatures.Foreachpossiblepermutation,allinstancesaresortedinalphabeticorderaccordingtothatpermutation.Thisisdefinedintheliteratureastheconditionalorderingofasetofvectors[4].The1-nearestneighbourwhichwearelookingforisthenearestvectorinthereducedorderingofthevectors.Wewillshowthatthiscanbeestimatedfromdifferentconditionalorderingsthatcorrespondtopermutationsofvectorcomponents.Foreachpermuta-tion,NelementscanbesortedinO(NlogN)timeusingthequicksortalgorithm.Soford!possiblepermutations,thepreprocessingsteprequiresO(d!NlogN)timeandO(d!N)memoryspace.Fig.1showsadatasetwithd¼3features,N¼10instancessortedin6¼3!differentways.Thepreprocessingstepofthealgorithmcanbethoughtasthelearningphaseofthenearestneighbouralgorithm.
Fig.1Datasetafterpreprocessingstep
Inthetesting,eachtestinstancextissearchedind!sortedarraysusingbinarysearch.Ifanexactmatchisfound,thereisnoneedtosearchotherarraysfor1-nearestneighbour.Otherwise,wesearchthetwoclosestinstancesateachofthed!arrays.Thesetwoclosestinstancesareleftandrightborderinstancesinthebinarysearch.BecausesearchinganelementinasortedarraywithNelementsusingbinarysearchtakesO(logN)time,thealgorithmhasatimecomplexityofO(d!logN).Afterfindingthetwoclosestinstancesateacharray,wesearchthese2d!instancesfor1-nearestneighbour.ComparedtothebinarysearchthisstepofthealgorithmtakesonlyO(d!)time,whichissmallerthansearchingarrays.
theproblem.First,becauseofthelightingconditions,usuallyalargenumberofthepixelsofthetestframesareexactlythesameasthepixelsinthetrainingframes.Assumingthatthelabellercorrectlylabelsthepixels,thenearestneighbouralgorithmwillhave100%performanceonthesepixels,whereasMLPwillnothavethesameperformance.Secondly,sincethecolourspaceissmoothlydistributed(smallchangesinY,U,Vvalueswillresultinnochangeincolourclassification),nearestneighbourwillbehavebetterthanMLP.
MLPhasfourinputunits,Y,U,V,andabiasunit,tenoutputunitsorcoloursandinourtrainingsevenhiddenunits.TotestaninstancewithMLP(seetherelatedcolumninTable1):
3Â7multiplicationsforthefirstlayer,and7Â10multiplicationsforthesecondlayer,total91,floatingpointmultiplicationsmustbecarriedout.Tosumupthemultiplicationstoobtainhiddenvaluesandoutputvalues91floatingpointadditionoperationsmustbeperformed
Therearealsosevensigmoidoperations,whicharemuchmorecostlythanmultiplicationbecauseoftheexponentiationoperation.Therearealsosevenadditionandsevendivisionoperationsinthesigmoidfunction
Tofindthebestclass,ninecomparisonsmustbeperformedontheoutputunits
Thenovelnearestneighbourhassixsortedarrayswith200000instancesineach.Totestaninstancewiththenovelnearestneighbour(seetherelatedcolumninTable1):
Searchinganinstanceinonearrayrequireslog(200000)¼17compar-isonoperations.Sixarraysrequire102comparisonoperations
Tofindnearestneighbourin12instances,36integermultiplications,36integeradditions,36integersubtractions,and11comparisonsmustbeperformed
OnepossibleextensionofthealgorithmmaybetoconvertY,U,Vsortedvaluesintoalongintegerandperformingalloperationsonthese
values.Then,thesecondstepinnearestneighbourwillreduceto11comparisonoperations.
Conclusion:Wehavedevelopedanovelalgorithmthatestimatesthenearestneighbourfromdifferentconditionalorderingsofvectorcomponents.Thealgorithmiscomputationallyadvantageousespe-ciallywhenthenumberoftrainingsamplesislarge.Wehaveappliedthealgorithmtoacolourquantisationprobleminarobotapplication.Wearecurrentlyworkingonthestatisticalanalysisofthealgorithm.#IEE2004
ElectronicsLettersonlineno:20040144doi:10.1049/el:20040144
24October2003
O.T.Yıldız,L.AkarunandH.L.Akın(.DepartmentOfComputerEngineering,Bog¸iUniversity,Bebek,Istanbul)˘azicE-mail:yildizol@cmpe.boun.edu.trReferences
12345
Bentley,J.L.,andWeide,B.W.:‘Optimalexpected-timealgorithmsfor
closestpointproblems’,ACMTrans.Math.Soft.,1980,6,(4),pp.563–580
Califano,A.,andMohan,R.:‘Multidimensionalindexingforrecognizingvisualshapes’.Proc.IEEEConf.ComputerVisionandPatternRecognition,Maui,HI,USA,1991,pp.28–34
Guttman,A.:‘R-trees:adynamicindexstructureforspatialsearching’.Proc.ACMSIGMOD,Boston,MA,USA,1984,pp.47–57
Hardie,R.C.,andArce,G.R.:‘RankinginRpanditsuseinmultivariateimageestimation’,IEEETrans.CircuitsSyst.VideoTechnol.,1991,1,(2),pp.197–209
Akin,H.L.,Pavlova,P.,andYildiz,O.T.:‘‘‘Cerberus2002’’Robocup2002:RobotSoccerWorldCupVI’.2002Int.RobocupSymp.Pre-proceedings,Fukuoka,Japan,2002,p.448
ELECTRONICSLETTERS5thFebruary2004Vol.40No.3
因篇幅问题不能全部显示,请点此查看更多更全内容