Birkbeck ePrints: an open access repository of the
research output of Birkbeck College
http://eprints.bbk.ac.uk Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. ACM Transactions in Internet Technology [in press]. This is an author-produced version of a paper to be published in ACM Transactions in Internet Technology (ISSN 1533-5399). This version has been peer-reviewed but does not include the final publisher proof corrections, published layout or pagination. All articles available through Birkbeck ePrints are protected by intellectual property law, including copyright law. Any use made of the contents should comply with the relevant law. Citation for this version: Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. London: Birkbeck ePrints. Available at: http://eprints.bbk.ac.uk/archive/00000268 Citation for the publisher’s version: Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. ACM Transactions in Internet Technology [in press]. http://eprints.bbk.ac.ukContact Birkbeck ePrints at lib-eprints@bbk.ac.uk
AStochasticModelfortheEvolutionoftheWeb
AllowingLinkDeletion
arXiv:cond-mat/0304316 v2 25 Mar 2004TrevorFenner,MarkLevene,andGeorgeLoizouSchoolofComputerScienceandInformationSystems
BirkbeckCollege,UniversityofLondon
LondonWC1E7HX,U.K.
{trevor,mark,george}@dcs.bbk.ac.uk
Abstract
Recentlyseveralauthorshaveproposedstochasticevolutionarymodelsforthegrowthofthewebgraphandothernetworksthatgiverisetopower-lawdistributions.Thesemodelsarebasedonthenotionofpreferentialattachmentleadingtothe“richgetricher”phenomenon.Wepresentageneralisationofthebasicmodelbyallowingdeletionofindividuallinksandshowthatitalsogivesrisetoapower-lawdistribution.Wederivethemean-fieldequationsforthisstochasticmodelandshowthat,byexaminingasnapshotofthedistributionatthesteadystateofthemodel,weareabletotellwhetheranylinkdeletionhastakenplaceandestimatetheprobabilityofdeletingalink.Applyingourmodeltoactualwebgraphdatagivessomeinsightintothedistributionofinlinksinthewebgraphandprovidesevidenceoflinkdeletionandtheextenttowhichthishasoccurred.Ouranalysisofthedataalsosuggestsapower-lawexponentofapproximately2.15ratherthanthewidelypublishedvalueof2.1.
1Introduction
f(i)=Ci−τ,
(1)
Power-lawdistributionstakingtheform
whereCandτarepositiveconstants,areabundantinnature[Sch91].Theconstantτiscalledtheexponentofthedistribution.Examplesofsuchdistributionsare:Zipf’slaw,whichstatesthattherelativefrequencyofwordsinatextisinverselyproportionaltotheirrank,Pareto’slaw,whichstatesthatthenumberofpeoplewhosepersonalincomeisaboveacertainlevelfollowsapower-lawdistributionwithanexponentbetween1.5and2,Lotka’slaw,whichstatesthatthenumberofauthorspublishingaprescribednumberofpapersisinverselyproportionaltothesquareofthenumberofpublications,andGutenberg-Richter’slaw,whichstatesthatthenumberofearthquakesoveraperiodoftimehavingacertainmagnitudeisroughlyinverselyproportionaltothemagnitude.
Recentlyseveralresearchershavedetectedasymptoticpower-lawdistributionsinthetopologyoftheWorld-Wide-Web[BKM+00,DKM+02]and,inparallel,researchersfromavarietyoffieldsaretryingtheexplaintheemergenceofthesepowerlawsintermsoftheevolutionofcomplexnetworks.See[AB02,DM02,New03]forcomprehensivesurveysofre-centworkinthisarea,detailingvariousstochasticmodelsthatcanexplaintheevolutionof
1
thewebandothernetworkssuchastheinternet,citationnetworks,collaborationnetworksandbiologicalnetworks.Acommonthemeinthesemodelsisthatofpreferentialattachment,whichresultsinthe“richgetricher”phenomenon,forexample,whennewlinkstowebpagesareaddedinproportiontothenumberofcurrentlyexistinglinkstothesepages.Relatedap-proachesarethegeneraltheoreticalmodelcoveringbothdirectedandundirectedwebgraphs[CF03],thestochasticmultiplicativeprocessinwhichnodesappearatdifferenttimesandtherateofadditionoflinksvariesbetweennodes[AH01],andthestatisticalphysicsapproachthatusestherateequationtechnique[KR02].
Toexplainsituationswherepurepreferentialattachmentmodelsfail,we[LFLW02]andothers[PFL+02]havepreviouslyproposedextensionsofthestochasticmodelfortheweb’sevolutioninwhichtheadditionoflinksisprescribedbyamixtureofpreferentialandnon-preferentialmechanisms.In[LFLW02],wedevisedageneralstochasticmodelinvolvingthetransferofballsbetweenurns;thisalsonaturallymodelsquantitiessuchasthenumbersofwebpagesinandvisitorstoawebsite,whicharenotnaturallydescribedingraph-theoreticterms.WenotethatoururnmodelisanextensionofthestochasticmodelproposedbySimoninhisvisionarypaperpublishedin1955[Sim55],whichwascouchedintermsofwordfrequenciesinatext.WealsoconsideredanalternativeextensionofSimon’smodelin[FLL02]byaddingapreferentialmechanismfordiscardingballsfromurns(correspondingtodeletingwebpages);thisresultsinanexponentialcutoffinthepower-lawdistribution.
Oururntransfermodelisastochasticprocess,inwhichateachstepwithprobabilitypanewball(whichmightrepresentawebpage)isaddedtothefirsturnandwithprobability1−paballinsomeurnismovedalongtothenexturn.Weassumethataballintheithurnhasipinsattachedtoit(whichmightrepresentweblinks).Itisknownthatthesteady-statedistributionofthismodelisapowerlaw,withexponentτ=1+1/(1−p)[Sim55,LFLW02].Asmentionedabove,thepower-lawdistributionbreaksdownwhenballsmaybediscarded,resultinginapower-lawdistributionwithanexponentialcutoff.Sothequestionarisesastowhetheritalsobreaksdownwhenremovalofpinsisallowed.Weanswerthisquestionherebyshowingthatthepower-lawdistributiondoesnotbreakdown,undertheconstraintthatmoreballsareaddedtothesystemthanareremoved.(Whentheonlyremainingpinisremovedfromaballinthefirsturnthatballisremovedfromthesystem.)Thismodelgivesamorerealisticexplanationfortheemergenceofpowerlawsincomplexnetworksthanthebasicmodelwithoutlinkdeletion(i.e.pinremoval).Considering,forexample,thewebgraph,themodificationwemaketothebasicmodelisthatafterawebpageischosenpreferentially,sayaccordingtothenumberofitsinlinks,thereisasmallprobabilitythatsomelinktothispagewillbedeleted.(Thisisequivalenttodeletingalinkchosenuniformlyatrandom.)Apossiblereasonforthismaybethatpopularwebpagescompeteforinlinks,sothatthenumberofinlinksapagehasacquiredwillfluctuatewithitspopularity.Thisisevident,forexample,intheriseandfallinthepopularityofseveralsearchengines.LinkdeletionhasalsobeenconsideredbyDorogovtsevandMendes[DM01]inthecontextofadifferentmodelinwhich,ateachtimestep,anewnodeisaddedwithjustonelink,whichpreferentiallyattachesitselftoanexistingnode.Inaddition,ateachtimestepmlinksbetweenexistingnodesareaddedtopreferentiallychosennodesandclinksbetweenexistingnodesaredeleted,againpreferentially.Theirconclusionthatlinkdeletionincreasesthepower-lawexponentisalsoobtainedhereforourstochasticmodel.
ConsiderapowerlawsuchasLotka’slaw[Nic89].Ifthisholds,thenaplotonalog-logscaleofthenumberofauthors(thefrequency)againstthenumberofpublications(thevalue)
2
shouldrevealastraightlinewithanegativeslopeofaround−2.Thereisanobviousproblemwithalog-logtransformationifanyofthefrequenciesarezero;thatistosay,whentherearevaluesv1andv2,withv1 Analternativeapproachistosquashthenon-zerofrequenciesuptowardsthey-axis(i.e.thefrequencyaxis),afterignoringthezerofrequencyvalues.Thisisequivalenttoomittingfromthedatasetthosevaluesforwhichtherearenoauthorshavingthatnumberofpub-licationsandthenrankingtheremainingvaluesinincreasingorder,i.e.renumberingthevaluesstartingfromone.Thismethodalsoseemssomewhatdubious,sincethepower-lawrelationshipshouldreallyinvolvetheactualvaluesratherthantheirranks. Thestandardtechniqueforfittingapower-lawdistributionuseslinearregressiononlog-logtransformeddata,whichisnotpossibleifgapsarepresent.Anotherapproachforhandlinggapsistoconsideronlythevaluesupuntilthefirstgap;thisapproachisonlyreasonableifthefirstgapdoesnotoccurattoolowavalue.Wecallthistheunrankedapproach,andnotethatthisapproachignoresthelargevaluesinthetailofthedistribution.Theunrankedapproachsuggestsonepossiblesolution,butotherapproaches,suchassmoothingorusingtheHillplot[DdR00],arepossible.Ad-hocapproacheshavethedisadvantagethattheyarehardtojustify,andtheHillplot,asoriginallydefined,isonlyapplicableafterthedatahasbeensorted,andthisalsoseemsdifficulttojustify. Anothersolutionforhandlinggapsisthesecondmethoddescribedaboveofrankingthevalueswithnon-zerofrequenciesandsquashingup;wecallthistherankedapproach.Ingeneral,weshouldnotexpecttherankedandunrankedapproachestoyieldthesamepower-lawexponent.Wewouldexpecttheexponentcomputedbylinearregressiononthelog-logtransformeddatafromtheunrankedapproachtobesomewhathigher.Oneofthefindingsofthispaperisthatforinlinksinthewebgraphtherankedandunrankedapproachesleadtoasmallbutnoticeabledifferencebetweentheexponents.(Ourpreviouscommentsindicatewhywehavemisgivingswiththerankedapproach.) Therestofthepaperisorganisedasfollows.InSection2wepresentanurntransfermodelthatextendsSimon’smodelbyallowingapin,chosenuniformlyatrandom,tobediscardedwithsomespecifiedprobability.InSection3wederivethesteadystatedistributionofthemodel,which,asstated,followsanasymptoticpowerlawlike(1).InSection4weshowthat,byexaminingasnapshotofthedistributionofballsinurnsatthesteadystate,weareabletotellwhetherremovalofpinshastakenplaceandestimatethepinremoval 3 probability.InSection5weutiliseoururnmodeltodescribeadiscretestochasticprocessthatsimulatestheevolutionofthedegreedistributionofinlinksinthewebgraph.Asaproofofconcept,weanalysetheMay1999datasetforinlinkspresentedin[BKM+00],whichweobtainedfromRaviKumaratIBM.Weareabletoshowthatourmodelisconsistentwiththedataanddeterminetheextenttowhichlinkdeletionhasoccurred.Wealsoinvestigatethediscrepancybetweentherankedandunrankedapproaches.Withtherankedapproach,usinglinearregressiononthelog-logtransformeddata,anexponentof2.1isobtained,asin[BKM+00].However,theunrankedapproachresultsinanexponentof2.15,whichisshowntobeconsistentwithourstochasticmodel.Althoughthedifferencebetween2.1and2.15maynotseemsignificant,ithasbeenremarkedin[BKM+00]that“2.1isinremarkableagreementwithearlierstudies”andin[DKM+02]thattheexponentis“reliablyaround2.1(withlittlevariation)”,whichjustifiesusinmakingadistinctionbetweentheexponentsobtainedusingtherankedandunrankedapproaches.Finally,inSection6wegiveourconcludingremarks. 2AnUrnTransferModel Wenowpresentanurntransfermodelforastochasticprocesswithurnscontainingballs(whichmightrepresentwebpages)thathavepins(whichmightrepresenteitherinlinksoroutlinks)attachedtothem.Ourmodelallowsforpinstobediscardedwithasmallprobability.ThismodelcanbeviewedasanextensionofSimon’smodel[Sim55].WenotethatthereisacorrespondencebetweentheBarab´asiandAlbertmodel[BA99],definedintermsofnodesandlinks,andSimon’smodel,definedintermsofballsandpins,aswasestablishedin[BE01].Essentially,thecorrespondenceisobtainedbynotingthattheballsinanurncanbeviewedasanequivalenceclassofnodesallhavingthesameindegree(oroutdegree). Weassumeacountablenumberofurns,urn1,urn2,urn3,...,whereeachballinurniisassumedtohaveipinsattachedtoit.Initiallyalltheurnsareemptyexcepturn1,whichhasoneballinit.LetFi(k)bethenumberofballsinurniatstagekofthestochasticprocess,soF1(1)=1.Then,fork≥1,atstagek+1ofthestochasticprocessoneoftwothingsmayoccur: (i)withprobabilityp,0 (ii)withprobability1−panurnisselected,withurnibeingselectedwithprobability proportionaltoiFi(k),andaballischosenfromurni;then, (a)withprobabilityr,0 theballisdiscarded(thisisequivalenttoremovinganddiscardingapinfromtheballchosenfromurni).Inthespecialcasewhenr=1,theprocessreducestoSimon’soriginalmodel. Wenotethatchoosingaballpreferentially(inproportiontothenumberofpins)isequiv-alenttoselectingapinuniformlyatrandomandchoosingtheballitisattachedto.Thus,withprobability(1−p)(1−r),apinchosenuniformlyatrandomisdiscarded. 4 SinceiFi(k)isthetotalnumberofpinsattachedtoballsinurni,theexpectedtotalnumberofpinsintheurnsatstagekisgivenby kE iFi(k) +(k−1)(p+(1−p)r−(1−p)(1−r))i=1 =1=1+(k−1)(1−2(1−p)(1−r)). (2) Correspondingly,theexpectedtotalnumberofballsintheurnsisgivenby kkE Fi(k) −(1−p)(1−r) i=1 =1+(k−1)p−1φj,(3) j=1 whereφj,1≤j≤k−1,istheexpectedprobabilityofchoosingurn1atstep(ii)ofstage j+1,i.e.φj=E F1(j) kk φj.(5) j=1 Inordertoensurethatthereareatleastasmanypinsinthesystemasthereareballs andthat,onaverage,moreballsareaddedtothesystemthanareremoved,werequirethefollowingconstraint,derivedfrom(2)and(3), 1−2r (1−p)(1−r) .(6) Thisimplies 2(1−p)(1−r)≤1, (7) whichobviouslyholdsforr≥1/2.Inequality(7)expressesthefactthattheexpectednumberofpinsshouldnotbenegative,andfollowsfrom(2). Wenotethatwecouldmodifytheinitialconditionssothat,forexample,urn1initiallycontainedδballs,δ>1,insteadofjustoneball.Itcanbeshown,fromthedevelopmentofthemodelbelow,thatanychangeintheinitialconditionswillhavenoeffectontheasymptoticdistributionoftheballsintheurnsasktendstoinfinity,providedtheprocessdoesnotterminatewithalloftheurnsempty.Morespecifically,theprobabilitythattheprocesswillnotterminatewithalltheurnsemptyisgivenby 1− (1−p)(1−r) 3DerivationoftheSteadyStateDistribution FollowingSimon[Sim55],wenowstatethemean-fieldequationsfortheurntransfermodel.Fori>1theexpectednumberofballsinurniisgivenby Ek(Fi(k+1))=Fi(k)+βkr(i−1)Fi−1(k)+(1−r)(i+1)Fi+1(k)−iFi(k), (9) whereEk(Fi(k+1))istheexpectedvalueofFi(k+1),giventhestateofthemodelatstagek,and 1−p βk= k(1−2(1−p)(1−r)) . Themotivationforthisapproximationisthatthedenominatorinthedefinitionofβk,thetotalnumberofpins,hasbeenreplacedbyitsexpectationgivenin(2).Thisisareasonableassumptionsincethenumberofpinsisthedifferencebetweentwobinomialrandomvariables, ˆkandwithhighprobabilitythiswillbeclosetoitsexpectedvalue.Weobservethatusingβ asthenormalisingfactorinsteadofβkresultsinanapproximationsimilartothatofthe“pkmodel”in[LFLW02]. ˆkandtakingtheexpectationsof(9)and(11),weobtainReplacingβkbyβ ˆkr(i−1)E(Fi−1(k))+(1−r)(i+1)E(Fi+1(k))−iE(Fi(k))E(Fi(k+1))=E(Fi(k))+βandrespectively. ˆk(1−r)2E(F2(k))−E(F1(k)),E(F1(k+1))=E(F1(k))+p+β (12)(13) Inordertosolve(12)and(13),wewouldliketoshowthatE(Fi(k))/ktendstoalimit fiasktendstoinfinity.Supposeforthemomentthatthisisthecase,then,providedtheconvergenceisfastenough,E(Fi(k+1))−E(Fi(k))tendstofi.By“fastenough”wemeanthatǫi,k+1−ǫi,kiso(1/k)forlargek,where E(Fi(k))=k(fi+ǫi,k). 6 Now,letting β=kβˆk= 1−p )gi+1 gi + (1−riβ and p 1 =1+ 1 gΓ(i+α+1+ρ) 1+ η i3 ,whereα,η,ρandCareconstants. UsingthebinomialtheoremandthefactthatΓ(x+1)=xΓ(x),weget gi−1 i+α 1+ η 1− η i3 = i+α+ρ 3=1+ρ i3 i2 +O i1 i+α+1 gi ≈ i3 ρ =1−i2 +O 1 ρα i − i3 +(1−r) 1− ρ i2 +O 1 7 (17) (19) iβ , whichsimplifiesto 1+ ρ i2 (1−r)(ρ+1)−(2r−1)α +O 1 iβ .(22) Forthistoholdforalllargeenoughvaluesofi,werequire ρ= 1 2r−1 .(24) Itisstraightforwardtoobtainmoretermsintheexpansionof(19)byamoredetailedanalysis,forexampleρ(1−r)(α+ρ+1)2η= −rα2 1−p 1 1−p .(27) 4RecognisingPinRemoval Inthissectionweshowthat,usingthemean-fieldequationsderivedinSection3,weareabletodetectwhetherpinremovalhastakenplacebyinspectingastaticsnapshotofthesystematthesteadystate. Basedon(2)and(3)wehave pins balls 1−2(1−p)(1−r) .(30) Fromthisandthefactthat∆≥φ,itfollowsthat p .φ≤ 1−2(1−p)(1−r)Using(7)and(26),thisimplies φ ρ. (31) (32) Nowsupposewearegivenρ,∆andφ.Thenfrom(26)and(30)wecanderivethefollowingequations: φ(ρ−1) p= 2 1+ φ ρ = p 1−2(1−p)(1−r) .(36) Itisevidentthat(35)and(36)canonlybeconsistentifr≈1.Thus,simulatingtheurnprocesswithp′=(ρ−1)/ρandr=1wouldresultinthesamevalueforρasthatobtainedfrompandtheoriginalvalueofr.However,inthepresenceofpinremoval,thevalueof∆(i.e.theasymptoticvalueofballs/pins)wouldbelessthanp′,theprobabilityofinsertinganewballintothefirsturn.Thisprovidesadiscriminatorbetweenthetwoprocesses.Asaresult,byexaminingthesaidsnapshotweareabletoascertainwhethertheprocessisconsistentwiththeurnmodelinwhichpinsmaybediscarded,i.e.withr<1.Weapplythisanalysisinthenextsectiontothedegreedistributionofinlinksinthewebgraph. 9 5AModelfortheEvolutionoftheWebGraph Wenowdescribeadiscretestochasticprocessforsimulatingtheevolutionofthedegreedistributionofinlinksinthewebgraph.Inthismodel,ballscorrespondtowebpagesandpinscorrespondtoinlinks.AteachtimestepthestateofthewebgraphisadirectedgraphG=(N,E),whereNisitsnodesetandEisitsarcset.InthisscenarioFi(k),i≥1,isthenumberofpages(nodes)inthewebgraphhavingiinlinks.Wenotethat,althoughwehavechosenitodenotethenumberofinlinks,icouldalternativelydenotethenumberofoutlinks,thenumberofpagesinawebsite,oranyotherreasonableparameterwewouldliketoinvestigate;see[LFLW02]forfurtherdetails. Considertheevolutionofthewebgraphwithrespecttothenumberofpageshavingiinlinksatthekthstepoftheprocess.InitiallyGcontainsjustasinglepage.Ateachsteponeofthreethingscanhappen.First,withprobabilitypanewpagehavingoneincominglinkisaddedtoG;thisisequivalenttoplacinganewballinurn1.Alternatively,withprobability1−papageischosen,theprobabilityofchoosingagivenpagebeingproportionaltoi,thenumberofinlinksthepagecurrentlyhas;thisisequivalenttopreferentiallychoosingaballfromurni.Then,withprobabilityrthechosenpagereceivesanewinlink;thisisequivalenttotransferringthechosenballfromurnitourni+1.Alternatively,withprobability1−raninlinktothepageisremoved,andifthepagehasnoinlinksremainingitisremovedfromthegraph;thisisequivalenttotransferringtheballtourni−1wheni>1,anddiscardingtheballfromthesystemifi=1. Asaproofofconcept,weusetheinlinksdatafromalargecrawlofthewebperformedduringMay1999[BKM+00];thedatasetweusedinthisanalysiswasobtainedfromRaviKumaratIBM.Afterremovingwebpageswithzeroinlinksfromthedataset,thereremainapproximately177millionpageswithatotalof1.466billioninlinks.Usingtherankedap-proach,linearregressiononthelog-logtransformeddatagivesanexponentof2.1052,whichisconsistentwiththevalueof2.1reportedin[BKM+00]andconfirmedsubsequentlyin[AB02];seealso[DKM+02]. Asdiscussedintheintroduction,wehavesomereservationsabouttheuseoftherankedapproach,asitusesranksratherthanvalues,therebyignoringthegaps.InBroderetal.’sMay1999dataset,thefirstgapappearsat3121,i.e.therewerenopagesfoundwith3121inlinks.Twopossiblereasonsforsuchagapare:(i)therewerenopagesinthewebhaving3121inlinks,or(ii)therewereoneormoresuchpagesbutthecrawldidnotcoverthesepages.Inanycase,thereisnoreasontobelievethatthewebwillnotcontainsuchapageinthefuture.Moreover,suchgaps,whichareinherentinpreferentialattachmentmodels,changeovertimeduetothegrowthofthewebgraphandthestochasticnatureoftheevolutionaryprocess. Onewaytoavoidthisissueistousetheunrankedapproachandcarryouttheregressiononlyonthedatavaluesprecedingthefirstgap.Usingthefirst3120datavaluesoftheinlinksdataset,theregressionyieldsanexponentof2.1535.InFigure1weshowthetworegressionlines,whichhavenegativeslopesof−2.1052,whenusingtherankedapproach,and−2.1535,whenusingtheunrankedapproach.(Recallthattheexponentofthepowerlawis1+ρ.)Fromtheinlinksdatawethenestimated∆andφ,obtaining∆≈balls/pins=0.1205andφ≈|urn1|/pins=0.0553.Usingtheseestimatesfor∆andφ,andρ=1.1535,wecomputedpandrfrom(33)and(34),respectively,obtainingp=0.0915andr=0.8280.Fromthe 10 109108May 1999 web dataline with slope 2.1535line with slope 2.1052107number of pages106105104103102101100100101102103104105rank of number of inlinksFigure1:Broderetal.May1999inlinkdata analysisinSection4,itisevidentthatlinkdeletionistakingplaceinthewebgraph,sincer<1.Theextentoflinkdeletionindicatedasaproportionofinsertionsanddeletionsis(1−p)(1−r)=0.156.Itwouldbeinterestingtocheckthisvalueempiricallybylookingatsequentialsnapshotsoftheweb. Tovalidateourmean-fieldanalysisweconducted10simulationrunseachfork=107stepswiththeabovevaluesofpandr.Overthe10runs,themeanfor∆was0.1197withstandarddeviation1.1×10−4,andcomputingφas|urn1|/pinsgaveameanof0.0617withstandarddeviation9.2×10−5.However,ifwecomputeφfrom(29)themeanis0.0589withstandarddeviation4.9×10−4,whichisclosertotheempiricalvalueof0.0553.Wesuggestthatthedifferencebetweenthetwoestimatesforφismainlyduetotheslowconvergenceof|urn1|/pins. Lastly,wecomputedρfromthesimulationresultsas pins witha1485MhzIntelPentium4processorand500MBofRAM,onaWindows2000platform.These1billionstepsimulations,with7000urns,eachtookover300hours.(Weconductedseveralmorerunsof1billionsteps,varyingthenumbersofurns,tovalidatetherobustnessoftheapproximation;asinglerunof2billionstepswith15000urnsgavesimilarresultstothosewereportbelow.) Forthefirst(second)runwith7000urns,wefoundthatthefirstemptyurnwasurnnumber2403(2395)andthatoveralltherewere5859(5914)non-emptyurns.Fortheunrankedapproach,linearregressiononthelog-logtransformeddataforthefirst2402(2394)urnsgaveanexponentof2.1603(2.1558).Ontheotherhand,fortherankedapproach,regressiononallthenon-emptyurnsgaveanexponentof2.1199(2.1095).Insummary,thesimulationsofourstochasticmodelareconsistentwiththeinlinksdataset,highlightingasmallbutnoticeabledifferenceintheexponentdependingonwhethertherankedorunrankedapproachisused.Infact,withourevolutionarymodelofthewebgraph,incommonwithallothers,thereisnoeasywayofhandlingthegapsand,moreover,theconceptofgapshasnomeaninginthecontextofanasymptoticmeanfieldanalysis. 6ConcludingRemarks WehavepresentedanextensionofSimon’sclassicalstochasticprocessthatallowsforpins(whichmightrepresentweblinks)tobediscarded,andhaveshownthatasymptoticallyitstillfollowsapower-lawdistribution.Givenasnapshotofasystem,themean-fieldequationsthatwehavederivedgiveestimatesoftheparameterspandr,whichcanthenbeinputtoourstochasticprocesssimulatingtheevolutionofthesystem.WeappliedouranalysistotheMay1999webcrawldata[BKM+00]todetecttheextenttowhichlinkdeletionhadtakenplace.Thevaluesofpandrthatwehaveobtainedindicatethatapproximately15%ofalllinkoperationsaredeletions.Wealsorananumberofsimulationstovalidatethemean-fieldanalysis. Aninterestingfindingthatcametolightwhenanalysingthedatawasthatthereisgoodevidencethattheexponentofthepowerlawforinlinksisinfactcloseto2.15ratherthantothewidelypublishedvalueof2.1.Althoughthedifferencebetweentheseexponentsissmall,weconsiderittobesignificantbecauseitsuggestsamorejustifiablewaytouselinearregressiontoobtainexponentsfrompowerlawdata. Itwouldbeinterestingtostudylinkdeletionthroughhistoricaldata,suchasthatprovidedbythewaybackmachine[Not02](http://www.waybackmachine.org),inordertogainmoreinsightintothedynamicaspectsoftheevolutionofthewebgraph.Inparticular,itwouldbedesirabletodeterminewhetherandhowtheexponent,andtheparameterspandr,arechangingovertime. Acknowledgements.TheauthorswouldliketothankRaviKumaratIBMforproviding uswiththeinlinksdataset.Wewouldalsoliketothanktherefereesfortheirconstructivecomments,whichhavehelpedustoimprovethepresentationoftheresults. References [AB02] R.AlbertandA.-L.Barab´asi.Statisticalmechanicsofcomplexnetworks.ReviewsofModernPhysics,74:47–97,2002. 12 [AH01][BA99][BE01] L.A.AdamicandB.A.Huberman.TheWeb’shiddenorder.CommunicationsoftheACM,44(9):55–59,2001. A.-L.Barab´asiandR.Albert.Emergenceofscalinginrandomnetworks.Science,286:509–512,1999. S.BornholdtandH.Ebel.WorldWideWebscalingexponentfromSimon’s1955model.PhysicalReviewE,64:035103–1–035104–4,2001. [BKM+00]A.Broder,R.Kumar,F.Maghoul,P.Raghavan,A.Rajagopalan,R.Stata, A.Tomkins,andJ.Wiener.GraphstructureintheWeb.ComputerNetworks,33:309–320,2000.[CF03][DdR00] C.CooperandA.M.Frieze.Ageneralmodelofwebgraphs.RandomStructuresandAlgorithms,22:311–335,2003. H.Drees,L.deHaan,andS.Resnick.HowtomakeaHillplot.AnnalsofStatistics,28:254–274,2000. [DKM+02]S.Dill,R.Kumar,K.McCurley,S.Rajagopalan,D.Sivakumar,andA.Tomkins. Self-similarityintheweb.ACMTransactionsonInternetTechnology,2:205–223,2002.[DM01][DM02][FLL02] S.N.DorogovtsevandJ.F.F.Mendes.Scalingpropertiesofscale-freeevolvingnetworks:Continuousapproach.PhysicalReviewE,35:056125,2001. S.N.DorogovtsevandJ.F.F.Mendes.Evolutionofnetworks.AdvancesinPhysics,51:1079–1187,2002. T.I.Fenner,M.Levene,andG.Loizou.Astochasticevolutionarymodelexhibitingpower-lawbehaviourwithanexponentialcutoff.CondensedMatterArchive,cond-mat/0209463,2002. P.L.KrapivskyandS.Redner.Astatisticalphysicsperspectiveonwebgrowth.ComputerNetworks,39:261–276,2002. [KR02] [LFLW02]M.Levene,T.I.Fenner,G.Loizou,andR.Wheeldon.Astochasticmodelforthe evolutionoftheWeb.ComputerNetworks,39:277–287,2002.[Man65] B.Mandelbrot.Informationtheoryandpsycholinguistics.InB.A.WolmanandE.N.Nagel,editors,ScientificPsychology:PrinciplesandApproaches,pages550–562.BasicBooks,NewYork,NY,1965. M.E.J.Newman.Thestructureandfunctionofcomplexnetworks.SIAMReview,45:167–256,2003. P.T.Nicholls.BibliometricmodelingprocessesandtheempiricalvalidityofLotka’slaw.JournaloftheAmericanSocietyofInformationScience,40:379–385,1989. G.R.Notess.Thewaybackmachine:Theweb’sarchive.March/April2002. 13 Online,26:59–61, [New03][Nic89] [Not02] [PFL+02] D.M.Pennock,G.W.Flake,S.Lawrence,E.J.Glover,andC.L.Giles.Winnersdon’ttakeall:Characterizingthecompetitionforlinksontheweb.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,99:5207–5211,2002. S.M.Ross.IntroductiontoStochasticDynamicProgramming.AcademicPress,NewYork,NY,1983. M.Schroeder.Fractals,Chaos,PowerLaws:MinutesfromanInfiniteParadise.W.H.Freeman,NewYork,NY,1991. H.A.Simon.Onaclassofskewdistributionfunctions.Biometrika,42:425–440,1955. [Ros83][Sch91][Sim55] 14 因篇幅问题不能全部显示,请点此查看更多更全内容