您好,欢迎来到意榕旅游网。
搜索
您的当前位置:首页A Stochastic Model for the Evolution of the Web Allowing Link Deletion

A Stochastic Model for the Evolution of the Web Allowing Link Deletion

来源:意榕旅游网


Birkbeck ePrints: an open access repository of the

research output of Birkbeck College

http://eprints.bbk.ac.uk Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. ACM Transactions in Internet Technology [in press]. This is an author-produced version of a paper to be published in ACM Transactions in Internet Technology (ISSN 1533-5399). This version has been peer-reviewed but does not include the final publisher proof corrections, published layout or pagination. All articles available through Birkbeck ePrints are protected by intellectual property law, including copyright law. Any use made of the contents should comply with the relevant law. Citation for this version: Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. London: Birkbeck ePrints. Available at: http://eprints.bbk.ac.uk/archive/00000268 Citation for the publisher’s version: Fenner, Trevor; Levene, Mark; and Loizou, George (2005) A stochastic model for the evolution of the web allowing link deletion. ACM Transactions in Internet Technology [in press]. http://eprints.bbk.ac.ukContact Birkbeck ePrints at lib-eprints@bbk.ac.uk

AStochasticModelfortheEvolutionoftheWeb

AllowingLinkDeletion

arXiv:cond-mat/0304316 v2 25 Mar 2004TrevorFenner,MarkLevene,andGeorgeLoizouSchoolofComputerScienceandInformationSystems

BirkbeckCollege,UniversityofLondon

LondonWC1E7HX,U.K.

{trevor,mark,george}@dcs.bbk.ac.uk

Abstract

Recentlyseveralauthorshaveproposedstochasticevolutionarymodelsforthegrowthofthewebgraphandothernetworksthatgiverisetopower-lawdistributions.Thesemodelsarebasedonthenotionofpreferentialattachmentleadingtothe“richgetricher”phenomenon.Wepresentageneralisationofthebasicmodelbyallowingdeletionofindividuallinksandshowthatitalsogivesrisetoapower-lawdistribution.Wederivethemean-fieldequationsforthisstochasticmodelandshowthat,byexaminingasnapshotofthedistributionatthesteadystateofthemodel,weareabletotellwhetheranylinkdeletionhastakenplaceandestimatetheprobabilityofdeletingalink.Applyingourmodeltoactualwebgraphdatagivessomeinsightintothedistributionofinlinksinthewebgraphandprovidesevidenceoflinkdeletionandtheextenttowhichthishasoccurred.Ouranalysisofthedataalsosuggestsapower-lawexponentofapproximately2.15ratherthanthewidelypublishedvalueof2.1.

1Introduction

f(i)=Ci−τ,

(1)

Power-lawdistributionstakingtheform

whereCandτarepositiveconstants,areabundantinnature[Sch91].Theconstantτiscalledtheexponentofthedistribution.Examplesofsuchdistributionsare:Zipf’slaw,whichstatesthattherelativefrequencyofwordsinatextisinverselyproportionaltotheirrank,Pareto’slaw,whichstatesthatthenumberofpeoplewhosepersonalincomeisaboveacertainlevelfollowsapower-lawdistributionwithanexponentbetween1.5and2,Lotka’slaw,whichstatesthatthenumberofauthorspublishingaprescribednumberofpapersisinverselyproportionaltothesquareofthenumberofpublications,andGutenberg-Richter’slaw,whichstatesthatthenumberofearthquakesoveraperiodoftimehavingacertainmagnitudeisroughlyinverselyproportionaltothemagnitude.

Recentlyseveralresearchershavedetectedasymptoticpower-lawdistributionsinthetopologyoftheWorld-Wide-Web[BKM+00,DKM+02]and,inparallel,researchersfromavarietyoffieldsaretryingtheexplaintheemergenceofthesepowerlawsintermsoftheevolutionofcomplexnetworks.See[AB02,DM02,New03]forcomprehensivesurveysofre-centworkinthisarea,detailingvariousstochasticmodelsthatcanexplaintheevolutionof

1

thewebandothernetworkssuchastheinternet,citationnetworks,collaborationnetworksandbiologicalnetworks.Acommonthemeinthesemodelsisthatofpreferentialattachment,whichresultsinthe“richgetricher”phenomenon,forexample,whennewlinkstowebpagesareaddedinproportiontothenumberofcurrentlyexistinglinkstothesepages.Relatedap-proachesarethegeneraltheoreticalmodelcoveringbothdirectedandundirectedwebgraphs[CF03],thestochasticmultiplicativeprocessinwhichnodesappearatdifferenttimesandtherateofadditionoflinksvariesbetweennodes[AH01],andthestatisticalphysicsapproachthatusestherateequationtechnique[KR02].

Toexplainsituationswherepurepreferentialattachmentmodelsfail,we[LFLW02]andothers[PFL+02]havepreviouslyproposedextensionsofthestochasticmodelfortheweb’sevolutioninwhichtheadditionoflinksisprescribedbyamixtureofpreferentialandnon-preferentialmechanisms.In[LFLW02],wedevisedageneralstochasticmodelinvolvingthetransferofballsbetweenurns;thisalsonaturallymodelsquantitiessuchasthenumbersofwebpagesinandvisitorstoawebsite,whicharenotnaturallydescribedingraph-theoreticterms.WenotethatoururnmodelisanextensionofthestochasticmodelproposedbySimoninhisvisionarypaperpublishedin1955[Sim55],whichwascouchedintermsofwordfrequenciesinatext.WealsoconsideredanalternativeextensionofSimon’smodelin[FLL02]byaddingapreferentialmechanismfordiscardingballsfromurns(correspondingtodeletingwebpages);thisresultsinanexponentialcutoffinthepower-lawdistribution.

Oururntransfermodelisastochasticprocess,inwhichateachstepwithprobabilitypanewball(whichmightrepresentawebpage)isaddedtothefirsturnandwithprobability1−paballinsomeurnismovedalongtothenexturn.Weassumethataballintheithurnhasipinsattachedtoit(whichmightrepresentweblinks).Itisknownthatthesteady-statedistributionofthismodelisapowerlaw,withexponentτ=1+1/(1−p)[Sim55,LFLW02].Asmentionedabove,thepower-lawdistributionbreaksdownwhenballsmaybediscarded,resultinginapower-lawdistributionwithanexponentialcutoff.Sothequestionarisesastowhetheritalsobreaksdownwhenremovalofpinsisallowed.Weanswerthisquestionherebyshowingthatthepower-lawdistributiondoesnotbreakdown,undertheconstraintthatmoreballsareaddedtothesystemthanareremoved.(Whentheonlyremainingpinisremovedfromaballinthefirsturnthatballisremovedfromthesystem.)Thismodelgivesamorerealisticexplanationfortheemergenceofpowerlawsincomplexnetworksthanthebasicmodelwithoutlinkdeletion(i.e.pinremoval).Considering,forexample,thewebgraph,themodificationwemaketothebasicmodelisthatafterawebpageischosenpreferentially,sayaccordingtothenumberofitsinlinks,thereisasmallprobabilitythatsomelinktothispagewillbedeleted.(Thisisequivalenttodeletingalinkchosenuniformlyatrandom.)Apossiblereasonforthismaybethatpopularwebpagescompeteforinlinks,sothatthenumberofinlinksapagehasacquiredwillfluctuatewithitspopularity.Thisisevident,forexample,intheriseandfallinthepopularityofseveralsearchengines.LinkdeletionhasalsobeenconsideredbyDorogovtsevandMendes[DM01]inthecontextofadifferentmodelinwhich,ateachtimestep,anewnodeisaddedwithjustonelink,whichpreferentiallyattachesitselftoanexistingnode.Inaddition,ateachtimestepmlinksbetweenexistingnodesareaddedtopreferentiallychosennodesandclinksbetweenexistingnodesaredeleted,againpreferentially.Theirconclusionthatlinkdeletionincreasesthepower-lawexponentisalsoobtainedhereforourstochasticmodel.

ConsiderapowerlawsuchasLotka’slaw[Nic89].Ifthisholds,thenaplotonalog-logscaleofthenumberofauthors(thefrequency)againstthenumberofpublications(thevalue)

2

shouldrevealastraightlinewithanegativeslopeofaround−2.Thereisanobviousproblemwithalog-logtransformationifanyofthefrequenciesarezero;thatistosay,whentherearevaluesv1andv2,withv1Onewayofdealingwiththeproblemofgapsissimplytoignoreallvalue-frequencypairswherethefrequencyiszero.However,ignoringgapsinthiswayseemsquestionable,sincethezerofrequencyvaluesdogiverelevantinformationaboutthedatasetandshouldnotbetreatedasmissingvalues.Ouruneasewiththisapproachisreinforcedbythefactthatcomputingtheexponentinthiswayresultsinamuchlowervalue;forexample,forthewebinlinkdata[BKM+00],itgivesavaluearound1.5,ratherthanthegenerallyacceptedfigureof2.1.

Analternativeapproachistosquashthenon-zerofrequenciesuptowardsthey-axis(i.e.thefrequencyaxis),afterignoringthezerofrequencyvalues.Thisisequivalenttoomittingfromthedatasetthosevaluesforwhichtherearenoauthorshavingthatnumberofpub-licationsandthenrankingtheremainingvaluesinincreasingorder,i.e.renumberingthevaluesstartingfromone.Thismethodalsoseemssomewhatdubious,sincethepower-lawrelationshipshouldreallyinvolvetheactualvaluesratherthantheirranks.

Thestandardtechniqueforfittingapower-lawdistributionuseslinearregressiononlog-logtransformeddata,whichisnotpossibleifgapsarepresent.Anotherapproachforhandlinggapsistoconsideronlythevaluesupuntilthefirstgap;thisapproachisonlyreasonableifthefirstgapdoesnotoccurattoolowavalue.Wecallthistheunrankedapproach,andnotethatthisapproachignoresthelargevaluesinthetailofthedistribution.Theunrankedapproachsuggestsonepossiblesolution,butotherapproaches,suchassmoothingorusingtheHillplot[DdR00],arepossible.Ad-hocapproacheshavethedisadvantagethattheyarehardtojustify,andtheHillplot,asoriginallydefined,isonlyapplicableafterthedatahasbeensorted,andthisalsoseemsdifficulttojustify.

Anothersolutionforhandlinggapsisthesecondmethoddescribedaboveofrankingthevalueswithnon-zerofrequenciesandsquashingup;wecallthistherankedapproach.Ingeneral,weshouldnotexpecttherankedandunrankedapproachestoyieldthesamepower-lawexponent.Wewouldexpecttheexponentcomputedbylinearregressiononthelog-logtransformeddatafromtheunrankedapproachtobesomewhathigher.Oneofthefindingsofthispaperisthatforinlinksinthewebgraphtherankedandunrankedapproachesleadtoasmallbutnoticeabledifferencebetweentheexponents.(Ourpreviouscommentsindicatewhywehavemisgivingswiththerankedapproach.)

Therestofthepaperisorganisedasfollows.InSection2wepresentanurntransfermodelthatextendsSimon’smodelbyallowingapin,chosenuniformlyatrandom,tobediscardedwithsomespecifiedprobability.InSection3wederivethesteadystatedistributionofthemodel,which,asstated,followsanasymptoticpowerlawlike(1).InSection4weshowthat,byexaminingasnapshotofthedistributionofballsinurnsatthesteadystate,weareabletotellwhetherremovalofpinshastakenplaceandestimatethepinremoval

3

probability.InSection5weutiliseoururnmodeltodescribeadiscretestochasticprocessthatsimulatestheevolutionofthedegreedistributionofinlinksinthewebgraph.Asaproofofconcept,weanalysetheMay1999datasetforinlinkspresentedin[BKM+00],whichweobtainedfromRaviKumaratIBM.Weareabletoshowthatourmodelisconsistentwiththedataanddeterminetheextenttowhichlinkdeletionhasoccurred.Wealsoinvestigatethediscrepancybetweentherankedandunrankedapproaches.Withtherankedapproach,usinglinearregressiononthelog-logtransformeddata,anexponentof2.1isobtained,asin[BKM+00].However,theunrankedapproachresultsinanexponentof2.15,whichisshowntobeconsistentwithourstochasticmodel.Althoughthedifferencebetween2.1and2.15maynotseemsignificant,ithasbeenremarkedin[BKM+00]that“2.1isinremarkableagreementwithearlierstudies”andin[DKM+02]thattheexponentis“reliablyaround2.1(withlittlevariation)”,whichjustifiesusinmakingadistinctionbetweentheexponentsobtainedusingtherankedandunrankedapproaches.Finally,inSection6wegiveourconcludingremarks.

2AnUrnTransferModel

Wenowpresentanurntransfermodelforastochasticprocesswithurnscontainingballs(whichmightrepresentwebpages)thathavepins(whichmightrepresenteitherinlinksoroutlinks)attachedtothem.Ourmodelallowsforpinstobediscardedwithasmallprobability.ThismodelcanbeviewedasanextensionofSimon’smodel[Sim55].WenotethatthereisacorrespondencebetweentheBarab´asiandAlbertmodel[BA99],definedintermsofnodesandlinks,andSimon’smodel,definedintermsofballsandpins,aswasestablishedin[BE01].Essentially,thecorrespondenceisobtainedbynotingthattheballsinanurncanbeviewedasanequivalenceclassofnodesallhavingthesameindegree(oroutdegree).

Weassumeacountablenumberofurns,urn1,urn2,urn3,...,whereeachballinurniisassumedtohaveipinsattachedtoit.Initiallyalltheurnsareemptyexcepturn1,whichhasoneballinit.LetFi(k)bethenumberofballsinurniatstagekofthestochasticprocess,soF1(1)=1.Then,fork≥1,atstagek+1ofthestochasticprocessoneoftwothingsmayoccur:

(i)withprobabilityp,0(ii)withprobability1−panurnisselected,withurnibeingselectedwithprobability

proportionaltoiFi(k),andaballischosenfromurni;then,

(a)withprobabilityr,0equivalenttoattachinganadditionalpintotheballchosenfromurni),or(b)withprobability1−r,theballistransferredtourni−1ifi>1,otherwise,ifi=1,

theballisdiscarded(thisisequivalenttoremovinganddiscardingapinfromtheballchosenfromurni).Inthespecialcasewhenr=1,theprocessreducestoSimon’soriginalmodel.

Wenotethatchoosingaballpreferentially(inproportiontothenumberofpins)isequiv-alenttoselectingapinuniformlyatrandomandchoosingtheballitisattachedto.Thus,withprobability(1−p)(1−r),apinchosenuniformlyatrandomisdiscarded.

4

SinceiFi(k)isthetotalnumberofpinsattachedtoballsinurni,theexpectedtotalnumberofpinsintheurnsatstagekisgivenby

󰀇󰀆kE

iFi(k)

+(k−1)(p+(1−p)r−(1−p)(1−r))i=1

󰀉

=1=1+(k−1)(1−2(1−p)(1−r)).

(2)

Correspondingly,theexpectedtotalnumberofballsintheurnsisgivenby

󰀇󰀆kkE

Fi(k)

−(1−p)(1−r)

i=1

󰀉

=1+(k−1)p󰀆−1φj,(3)

j=1

whereφj,1≤j≤k−1,istheexpectedprobabilityofchoosingurn1atstep(ii)ofstage

j+1,i.e.φj=E

󰀄

F1(j)

kk

󰀆

φj.(5)

j=1

Inordertoensurethatthereareatleastasmanypinsinthesystemasthereareballs

andthat,onaverage,moreballsareaddedtothesystemthanareremoved,werequirethefollowingconstraint,derivedfrom(2)and(3),

1−2r

(1−p)(1−r)

.(6)

Thisimplies

2(1−p)(1−r)≤1,

(7)

whichobviouslyholdsforr≥1/2.Inequality(7)expressesthefactthattheexpectednumberofpinsshouldnotbenegative,andfollowsfrom(2).

Wenotethatwecouldmodifytheinitialconditionssothat,forexample,urn1initiallycontainedδballs,δ>1,insteadofjustoneball.Itcanbeshown,fromthedevelopmentofthemodelbelow,thatanychangeintheinitialconditionswillhavenoeffectontheasymptoticdistributionoftheballsintheurnsasktendstoinfinity,providedtheprocessdoesnotterminatewithalloftheurnsempty.Morespecifically,theprobabilitythattheprocesswillnotterminatewithalltheurnsemptyisgivenby

1−

󰀊

(1−p)(1−r)

3DerivationoftheSteadyStateDistribution

FollowingSimon[Sim55],wenowstatethemean-fieldequationsfortheurntransfermodel.Fori>1theexpectednumberofballsinurniisgivenby

Ek(Fi(k+1))=Fi(k)+βkr(i−1)Fi−1(k)+(1−r)(i+1)Fi+1(k)−iFi(k),

󰀇

󰀉

(9)

whereEk(Fi(k+1))istheexpectedvalueofFi(k+1),giventhestateofthemodelatstagek,and

1−p

βk=

k(1−2(1−p)(1−r))

.

Themotivationforthisapproximationisthatthedenominatorinthedefinitionofβk,thetotalnumberofpins,hasbeenreplacedbyitsexpectationgivenin(2).Thisisareasonableassumptionsincethenumberofpinsisthedifferencebetweentwobinomialrandomvariables,

ˆkandwithhighprobabilitythiswillbeclosetoitsexpectedvalue.Weobservethatusingβ

asthenormalisingfactorinsteadofβkresultsinanapproximationsimilartothatofthe“pkmodel”in[LFLW02].

ˆkandtakingtheexpectationsof(9)and(11),weobtainReplacingβkbyβ

ˆkr(i−1)E(Fi−1(k))+(1−r)(i+1)E(Fi+1(k))−iE(Fi(k))E(Fi(k+1))=E(Fi(k))+βandrespectively.

ˆk(1−r)2E(F2(k))−E(F1(k)),E(F1(k+1))=E(F1(k))+p+β

󰀇

󰀉

󰀇

󰀉

(12)(13)

Inordertosolve(12)and(13),wewouldliketoshowthatE(Fi(k))/ktendstoalimit

fiasktendstoinfinity.Supposeforthemomentthatthisisthecase,then,providedtheconvergenceisfastenough,E(Fi(k+1))−E(Fi(k))tendstofi.By“fastenough”wemeanthatǫi,k+1−ǫi,kiso(1/k)forlargek,where

E(Fi(k))=k(fi+ǫi,k).

6

Now,letting

β=kβˆk=

1−p

)gi+1

gi

+

(1−riβ

and

p

1

=1+

1

gΓ(i+α+1+ρ)

󰀊

1+

η

i3

󰀃󰀃

,whereα,η,ρandCareconstants.

UsingthebinomialtheoremandthefactthatΓ(x+1)=xΓ(x),weget

gi−1

i+α

󰀊

1+

η

󰀃󰀃󰀊

1−

η

i3

󰀃󰀃

=

i+α+ρ

3=1+ρ

󰀃󰀃

i3

i2

+O

󰀊

i1

i+α+1

gi

i3

ρ

󰀃󰀃=1−i2

+O

󰀊

1

ρα

i

i3

󰀃󰀃

+(1−r)󰀊

1−

ρ

i2

+O

󰀊

1

7

(17)

(19)

,

whichsimplifiesto

1+

ρ

i2

󰀇

(1−r)(ρ+1)−(2r−1)α󰀉

+O

󰀊

1

.(22)

Forthistoholdforalllargeenoughvaluesofi,werequire

ρ=

1

2r−1

.(24)

Itisstraightforwardtoobtainmoretermsintheexpansionof(19)byamoredetailedanalysis,forexampleρ󰀁(1−r)(α+ρ+1)2η=

−rα2󰀂

1−p

󰀃󰀊

1

1−p

.(27)

4RecognisingPinRemoval

Inthissectionweshowthat,usingthemean-fieldequationsderivedinSection3,weareabletodetectwhetherpinremovalhastakenplacebyinspectingastaticsnapshotofthesystematthesteadystate.

Basedon(2)and(3)wehave

pins

balls

1−2(1−p)(1−r)

.(30)

Fromthisandthefactthat∆≥φ,itfollowsthat

p

.φ≤

1−2(1−p)(1−r)Using(7)and(26),thisimplies

φ

ρ.

(31)

(32)

Nowsupposewearegivenρ,∆andφ.Thenfrom(26)and(30)wecanderivethefollowingequations:

φ(ρ−1)

p=

2

󰀊

1+

φ

ρ

=

p

1−2(1−p)(1−r)

.(36)

Itisevidentthat(35)and(36)canonlybeconsistentifr≈1.Thus,simulatingtheurnprocesswithp′=(ρ−1)/ρandr=1wouldresultinthesamevalueforρasthatobtainedfrompandtheoriginalvalueofr.However,inthepresenceofpinremoval,thevalueof∆(i.e.theasymptoticvalueofballs/pins)wouldbelessthanp′,theprobabilityofinsertinganewballintothefirsturn.Thisprovidesadiscriminatorbetweenthetwoprocesses.Asaresult,byexaminingthesaidsnapshotweareabletoascertainwhethertheprocessisconsistentwiththeurnmodelinwhichpinsmaybediscarded,i.e.withr<1.Weapplythisanalysisinthenextsectiontothedegreedistributionofinlinksinthewebgraph.

9

5AModelfortheEvolutionoftheWebGraph

Wenowdescribeadiscretestochasticprocessforsimulatingtheevolutionofthedegreedistributionofinlinksinthewebgraph.Inthismodel,ballscorrespondtowebpagesandpinscorrespondtoinlinks.AteachtimestepthestateofthewebgraphisadirectedgraphG=(N,E),whereNisitsnodesetandEisitsarcset.InthisscenarioFi(k),i≥1,isthenumberofpages(nodes)inthewebgraphhavingiinlinks.Wenotethat,althoughwehavechosenitodenotethenumberofinlinks,icouldalternativelydenotethenumberofoutlinks,thenumberofpagesinawebsite,oranyotherreasonableparameterwewouldliketoinvestigate;see[LFLW02]forfurtherdetails.

Considertheevolutionofthewebgraphwithrespecttothenumberofpageshavingiinlinksatthekthstepoftheprocess.InitiallyGcontainsjustasinglepage.Ateachsteponeofthreethingscanhappen.First,withprobabilitypanewpagehavingoneincominglinkisaddedtoG;thisisequivalenttoplacinganewballinurn1.Alternatively,withprobability1−papageischosen,theprobabilityofchoosingagivenpagebeingproportionaltoi,thenumberofinlinksthepagecurrentlyhas;thisisequivalenttopreferentiallychoosingaballfromurni.Then,withprobabilityrthechosenpagereceivesanewinlink;thisisequivalenttotransferringthechosenballfromurnitourni+1.Alternatively,withprobability1−raninlinktothepageisremoved,andifthepagehasnoinlinksremainingitisremovedfromthegraph;thisisequivalenttotransferringtheballtourni−1wheni>1,anddiscardingtheballfromthesystemifi=1.

Asaproofofconcept,weusetheinlinksdatafromalargecrawlofthewebperformedduringMay1999[BKM+00];thedatasetweusedinthisanalysiswasobtainedfromRaviKumaratIBM.Afterremovingwebpageswithzeroinlinksfromthedataset,thereremainapproximately177millionpageswithatotalof1.466billioninlinks.Usingtherankedap-proach,linearregressiononthelog-logtransformeddatagivesanexponentof2.1052,whichisconsistentwiththevalueof2.1reportedin[BKM+00]andconfirmedsubsequentlyin[AB02];seealso[DKM+02].

Asdiscussedintheintroduction,wehavesomereservationsabouttheuseoftherankedapproach,asitusesranksratherthanvalues,therebyignoringthegaps.InBroderetal.’sMay1999dataset,thefirstgapappearsat3121,i.e.therewerenopagesfoundwith3121inlinks.Twopossiblereasonsforsuchagapare:(i)therewerenopagesinthewebhaving3121inlinks,or(ii)therewereoneormoresuchpagesbutthecrawldidnotcoverthesepages.Inanycase,thereisnoreasontobelievethatthewebwillnotcontainsuchapageinthefuture.Moreover,suchgaps,whichareinherentinpreferentialattachmentmodels,changeovertimeduetothegrowthofthewebgraphandthestochasticnatureoftheevolutionaryprocess.

Onewaytoavoidthisissueistousetheunrankedapproachandcarryouttheregressiononlyonthedatavaluesprecedingthefirstgap.Usingthefirst3120datavaluesoftheinlinksdataset,theregressionyieldsanexponentof2.1535.InFigure1weshowthetworegressionlines,whichhavenegativeslopesof−2.1052,whenusingtherankedapproach,and−2.1535,whenusingtheunrankedapproach.(Recallthattheexponentofthepowerlawis1+ρ.)Fromtheinlinksdatawethenestimated∆andφ,obtaining∆≈balls/pins=0.1205andφ≈|urn1|/pins=0.0553.Usingtheseestimatesfor∆andφ,andρ=1.1535,wecomputedpandrfrom(33)and(34),respectively,obtainingp=0.0915andr=0.8280.Fromthe

10

109108May 1999 web dataline with slope 2.1535line with slope 2.1052107number of pages106105104103102101100100101102103104105rank of number of inlinksFigure1:Broderetal.May1999inlinkdata

analysisinSection4,itisevidentthatlinkdeletionistakingplaceinthewebgraph,sincer<1.Theextentoflinkdeletionindicatedasaproportionofinsertionsanddeletionsis(1−p)(1−r)=0.156.Itwouldbeinterestingtocheckthisvalueempiricallybylookingatsequentialsnapshotsoftheweb.

Tovalidateourmean-fieldanalysisweconducted10simulationrunseachfork=107stepswiththeabovevaluesofpandr.Overthe10runs,themeanfor∆was0.1197withstandarddeviation1.1×10−4,andcomputingφas|urn1|/pinsgaveameanof0.0617withstandarddeviation9.2×10−5.However,ifwecomputeφfrom(29)themeanis0.0589withstandarddeviation4.9×10−4,whichisclosertotheempiricalvalueof0.0553.Wesuggestthatthedifferencebetweenthetwoestimatesforφismainlyduetotheslowconvergenceof|urn1|/pins.

Lastly,wecomputedρfromthesimulationresultsas

pins

witha1485MhzIntelPentium4processorand500MBofRAM,onaWindows2000platform.These1billionstepsimulations,with7000urns,eachtookover300hours.(Weconductedseveralmorerunsof1billionsteps,varyingthenumbersofurns,tovalidatetherobustnessoftheapproximation;asinglerunof2billionstepswith15000urnsgavesimilarresultstothosewereportbelow.)

Forthefirst(second)runwith7000urns,wefoundthatthefirstemptyurnwasurnnumber2403(2395)andthatoveralltherewere5859(5914)non-emptyurns.Fortheunrankedapproach,linearregressiononthelog-logtransformeddataforthefirst2402(2394)urnsgaveanexponentof2.1603(2.1558).Ontheotherhand,fortherankedapproach,regressiononallthenon-emptyurnsgaveanexponentof2.1199(2.1095).Insummary,thesimulationsofourstochasticmodelareconsistentwiththeinlinksdataset,highlightingasmallbutnoticeabledifferenceintheexponentdependingonwhethertherankedorunrankedapproachisused.Infact,withourevolutionarymodelofthewebgraph,incommonwithallothers,thereisnoeasywayofhandlingthegapsand,moreover,theconceptofgapshasnomeaninginthecontextofanasymptoticmeanfieldanalysis.

6ConcludingRemarks

WehavepresentedanextensionofSimon’sclassicalstochasticprocessthatallowsforpins(whichmightrepresentweblinks)tobediscarded,andhaveshownthatasymptoticallyitstillfollowsapower-lawdistribution.Givenasnapshotofasystem,themean-fieldequationsthatwehavederivedgiveestimatesoftheparameterspandr,whichcanthenbeinputtoourstochasticprocesssimulatingtheevolutionofthesystem.WeappliedouranalysistotheMay1999webcrawldata[BKM+00]todetecttheextenttowhichlinkdeletionhadtakenplace.Thevaluesofpandrthatwehaveobtainedindicatethatapproximately15%ofalllinkoperationsaredeletions.Wealsorananumberofsimulationstovalidatethemean-fieldanalysis.

Aninterestingfindingthatcametolightwhenanalysingthedatawasthatthereisgoodevidencethattheexponentofthepowerlawforinlinksisinfactcloseto2.15ratherthantothewidelypublishedvalueof2.1.Althoughthedifferencebetweentheseexponentsissmall,weconsiderittobesignificantbecauseitsuggestsamorejustifiablewaytouselinearregressiontoobtainexponentsfrompowerlawdata.

Itwouldbeinterestingtostudylinkdeletionthroughhistoricaldata,suchasthatprovidedbythewaybackmachine[Not02](http://www.waybackmachine.org),inordertogainmoreinsightintothedynamicaspectsoftheevolutionofthewebgraph.Inparticular,itwouldbedesirabletodeterminewhetherandhowtheexponent,andtheparameterspandr,arechangingovertime.

Acknowledgements.TheauthorswouldliketothankRaviKumaratIBMforproviding

uswiththeinlinksdataset.Wewouldalsoliketothanktherefereesfortheirconstructivecomments,whichhavehelpedustoimprovethepresentationoftheresults.

References

[AB02]

R.AlbertandA.-L.Barab´asi.Statisticalmechanicsofcomplexnetworks.ReviewsofModernPhysics,74:47–97,2002.

12

[AH01][BA99][BE01]

L.A.AdamicandB.A.Huberman.TheWeb’shiddenorder.CommunicationsoftheACM,44(9):55–59,2001.

A.-L.Barab´asiandR.Albert.Emergenceofscalinginrandomnetworks.Science,286:509–512,1999.

S.BornholdtandH.Ebel.WorldWideWebscalingexponentfromSimon’s1955model.PhysicalReviewE,64:035103–1–035104–4,2001.

[BKM+00]A.Broder,R.Kumar,F.Maghoul,P.Raghavan,A.Rajagopalan,R.Stata,

A.Tomkins,andJ.Wiener.GraphstructureintheWeb.ComputerNetworks,33:309–320,2000.[CF03][DdR00]

C.CooperandA.M.Frieze.Ageneralmodelofwebgraphs.RandomStructuresandAlgorithms,22:311–335,2003.

H.Drees,L.deHaan,andS.Resnick.HowtomakeaHillplot.AnnalsofStatistics,28:254–274,2000.

[DKM+02]S.Dill,R.Kumar,K.McCurley,S.Rajagopalan,D.Sivakumar,andA.Tomkins.

Self-similarityintheweb.ACMTransactionsonInternetTechnology,2:205–223,2002.[DM01][DM02][FLL02]

S.N.DorogovtsevandJ.F.F.Mendes.Scalingpropertiesofscale-freeevolvingnetworks:Continuousapproach.PhysicalReviewE,35:056125,2001.

S.N.DorogovtsevandJ.F.F.Mendes.Evolutionofnetworks.AdvancesinPhysics,51:1079–1187,2002.

T.I.Fenner,M.Levene,andG.Loizou.Astochasticevolutionarymodelexhibitingpower-lawbehaviourwithanexponentialcutoff.CondensedMatterArchive,cond-mat/0209463,2002.

P.L.KrapivskyandS.Redner.Astatisticalphysicsperspectiveonwebgrowth.ComputerNetworks,39:261–276,2002.

[KR02]

[LFLW02]M.Levene,T.I.Fenner,G.Loizou,andR.Wheeldon.Astochasticmodelforthe

evolutionoftheWeb.ComputerNetworks,39:277–287,2002.[Man65]

B.Mandelbrot.Informationtheoryandpsycholinguistics.InB.A.WolmanandE.N.Nagel,editors,ScientificPsychology:PrinciplesandApproaches,pages550–562.BasicBooks,NewYork,NY,1965.

M.E.J.Newman.Thestructureandfunctionofcomplexnetworks.SIAMReview,45:167–256,2003.

P.T.Nicholls.BibliometricmodelingprocessesandtheempiricalvalidityofLotka’slaw.JournaloftheAmericanSocietyofInformationScience,40:379–385,1989.

G.R.Notess.Thewaybackmachine:Theweb’sarchive.March/April2002.

13

Online,26:59–61,

[New03][Nic89]

[Not02]

[PFL+02]

D.M.Pennock,G.W.Flake,S.Lawrence,E.J.Glover,andC.L.Giles.Winnersdon’ttakeall:Characterizingthecompetitionforlinksontheweb.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,99:5207–5211,2002.

S.M.Ross.IntroductiontoStochasticDynamicProgramming.AcademicPress,NewYork,NY,1983.

M.Schroeder.Fractals,Chaos,PowerLaws:MinutesfromanInfiniteParadise.W.H.Freeman,NewYork,NY,1991.

H.A.Simon.Onaclassofskewdistributionfunctions.Biometrika,42:425–440,1955.

[Ros83][Sch91][Sim55]

14

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- yrrf.cn 版权所有

违法及侵权请联系:TEL:199 1889 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务