《數(shù)據(jù)庫(kù)系統(tǒng)》英文教學(xué)課件
數(shù)據(jù)庫(kù)系統(tǒng)英文教學(xué)課件,數(shù)據(jù)庫(kù)系統(tǒng),數(shù)據(jù)庫(kù),系統(tǒng),英文,教學(xué),課件
Concurrency ControlR&G-Chapter 17Smile,it is the key that fits the lock of everybodys heart.Anthony J.DAngelo,The College Blue BookReviewACID transaction semantics.Today:focus on Isolation propertySerial schedules safe but slowTry to find schedules equivalent to serial Conflicting OperationsNeed a tool to decide if 2 schedules are equivalentUse notion of“conflicting”operationsDefinition:Two operations conflict if:They are by different transactions,they are on the same object,and at least one of them is a write.Conflict Serializable SchedulesDefinition:Two schedules are conflict equivalent iff:They involve the same actions of the same transactions,andevery pair of conflicting actions is ordered the same wayDefinition:Schedule S is conflict serializable if:S is conflict equivalent to some serial schedule.Note,some“serializable”schedules are NOT conflict serializableA price we pay to achieve efficient enforcement.Conflict Serializability IntuitionA schedule S is conflict serializable if:You are able to transform S into a serial schedule by swapping consecutive non-conflicting operations of different transactions.Example:R(A)R(B)W(A)W(B)R(A)W(A)R(B)W(B)W(A)R(B)R(B)R(A)W(B)W(A)W(B)R(A)R(A)R(B)W(A)W(B)R(A)W(A)R(B)W(B)Conflict Serializability(Continued)Heres another example:Serializable or not?R(A)W(A)R(A)W(A)NOT!Dependency GraphDependency graph:One node per XactEdge from Ti to Tj if:An operation Oi of Ti conflicts with an operation Oj of Tj andOi appears earlier in the schedule than Oj.Theorem:Schedule is conflict serializable if and only if its dependency graph is acyclic.TiTjExampleA schedule that is not conflict serializable:The cycle in the graph reveals the problem.The output of T1 depends on T2,and vice-versa.T1T2ABDependency graphT1:R(A),W(A),R(B),W(B)T2:T1:R(A),W(A),R(B),W(B)T2:R(A),W(A),R(B),W(B)T1:R(A),W(A),R(B),W(B)T2:R(A),W(A),R(B),W(B)T1:R(A),W(A),R(B),W(B)T2:R(A),W(A),R(B),W(B)T1:R(A),W(A),R(B),W(B)T2:R(A),W(A),R(B),W(B)Two-Phase Locking(2PL)rules:Xact must obtain a S(shared)lock before reading,and an X(exclusive)lock before writing.Xact cannot get new locks after releasing any locks.SXSXLockCompatibilityMatrixTwo-Phase Locking(2PL),cont.2PL guarantees conflict serializabilitytime#locks heldrelease phaseacquisition phaseBut,does not prevent Cascading Aborts.Strict 2PLProblem:Cascading AbortsExample:rollback of T1 requires rollback of T2!Strict Two-phase Locking(Strict 2PL)protocol:Same as 2PL,except:Locks released only when transaction completes i.e.,either:(a)transaction has committed(commit record on disk),or (b)transaction has aborted and rollback is complete.T1:R(A),W(A),R(B),W(B),AbortT2:R(A),W(A)Strict 2PL(continued)#locks heldacquisition phasetimerelease all locks at end of xactNext.A few examplesLock_X(A)Read(A)Lock_S(A)A:=A-50Write(A)Unlock(A)Read(A)Unlock(A)Lock_S(B)Lock_X(B)Read(B)Unlock(B)PRINT(A+B)Read(B)B:=B+50Write(B)Unlock(B)Non-2PL,A=1000,B=2000,Output=?Lock_X(A)Read(A)Lock_S(A)A:=A-50Write(A)Lock_X(B)Unlock(A)Read(A)Lock_S(B)Read(B)B:=B+50Write(B)Unlock(B)Unlock(A)Read(B)Unlock(B)PRINT(A+B)2PL,A=1000,B=2000,Output=?Lock_X(A)Read(A)Lock_S(A)A:=A-50Write(A)Lock_X(B)Read(B)B:=B+50Write(B)Unlock(A)Unlock(B)Read(A)Lock_S(B)Read(B)PRINT(A+B)Unlock(A)Unlock(B)Strict 2PL,A=1000,B=2000,Output=?Venn Diagram for SchedulesAll SchedulesAvoid Cascading AbortSerialView SerializableConflict Serializable Which schedules does Strict 2PL allow?All SchedulesAvoid Cascading AbortSerialView SerializableConflict SerializableLock ManagementLock and unlock requests handled by Lock ManagerLM keeps an entry for each currently held lock.Entry contains:List of xacts currently holding lockType of lock held(shared or exclusive)Queue of lock requestsLock Management,cont.When lock request arrives:Does any other xact hold a conflicting lock?If no,grant the lock.If yes,put requestor into wait queue.Lock upgrade:xact with shared lock can request to upgrade to exclusiveLock_X(A)Lock_S(B)Read(B)Lock_S(A)Read(A)A:=A-50Write(A)Lock_X(B)ExampleDeadlocksDeadlock:Cycle of transactions waiting for locks to be released by each other.Two ways of dealing with deadlocks:preventiondetectionMany systems just punt and use TimeoutsWhat are the dangers with this approach?Deadlock DetectionCreate and maintain a“waits-for”graphPeriodically check for cycles in graphDeadlock Detection(Continued)Example:T1:S(A),S(D),S(B)T2:X(B)X(C)T3:S(D),S(C),X(A)T4:X(B)T1T2T4T3Deadlock PreventionAssign priorities based on timestamps.Say Ti wants a lock that Tj holds Two policies are possible:Wait-Die:If Ti has higher priority,Ti waits for Tj;otherwise Ti abortsWound-wait:If Ti has higher priority,Tj aborts;otherwise Ti waitsWhy do these schemes guarantee no deadlocks?Important detail:If a transaction re-starts,make sure it gets its original timestamp.-Why?SummaryCorrectness criterion for isolation is“serializability”.In practice,we use“conflict serializability,”which is somewhat more restrictive but easy to enforce.Two Phase Locking and Strict 2PL:Locks implement the notions of conflict directly.The lock manager keeps track of the locks issued.Deadlocks may arise;can either be prevented or detected.Concurrency Control Part 2R&G-Chapter 17The sequel was far better than the original!-NobodyOutlineLast time:Theory:conflict serializability,view serializabilityTwo-phase locking(2PL)Strict 2PLDealing with deadlocks(prevention,detection)Today:“advanced”locking issuesLocking granularityOptimistic Concurrency ControlLocking GranularityHard to decide what granularity to lock(tuples vs.pages vs.tables).why?Multiple-Granularity LocksShouldnt have to make same decision for all transactions!Data“containers”are nested:TuplesTablesPagesDatabasecontainsSolution:New Lock Modes,ProtocolAllow Xacts to lock at each level,but with a special protocol using new“intention”locks:Still need S and X locks,but before locking an item,Xact must have proper intension locks on all its ancestors in the granularity hierarchy.vIS Intent to get S lock(s)at finer granularity.vIX Intent to get X lock(s)at finer granularity.vSIX mode:Like S&IX at the same time.Why useful?TuplesTablesPagesDatabaseMultiple Granularity Lock ProtocolEach Xact starts from the root of the hierarchy.To get S or IS lock on a node,must hold IS or IX on parent node.What if Xact holds S on parent?SIX on parent?To get X or IX or SIX on a node,must hold IX or SIX on parent node.Must release locks in bottom-up order.Protocol is correct in that it is equivalent to directly settinglocks at the leaf levels of the hierarchy.TuplesTablesPagesDatabaseLock Compatibility MatrixvIS Intent to get S lock(s)at finer granularity.vIX Intent to get X lock(s)at finer granularity.vSIX mode:Like S&IX at the same time.ISIXSIXISIXSIXSXSX-TuplesTablesPagesDatabaseExamples 2 level hierarchyT1 scans R,and updates a few tuples:T1 gets an SIX lock on R,then get X lock on tuples that are updated.T2 uses an index to read only part of R:T2 gets an IS lock on R,and repeatedly gets an S lock on tuples of R.T3 reads all of R:T3 gets an S lock on R.OR,T3 could behave like T2;can use lock escalation to decide which.Lock escalation dynamically asks for coarser-grained locks when too manylow level locks acquiredIS IXSIXISIXSIX SXSXTuplesTablesJust So Youre Aware:Indexes2PL on B+-tree pages is a rotten idea.Why?Instead,do short locks(latches)in a clever wayIdea:Upper levels of B+-tree just need to direct traffic correctly.Dont need to be serializably handled!Different tricks to exploit thisIncluding the“rightlink”trick you peeked at in GiSTNote:this is pretty complicated!Just So Youre Aware:PhantomsSuppose you query for sailors with rating between 10 and 20,using a B+-treeTuple-level locks in the Heap FileI insert a Sailor with rating 12You do your query againYikes!A phantom!Problem:Serializability assumed a static DB!What we want:lock the logical range 10-20Imagine that lock table!What is done:set locks in indexes cleverlyRoadmapSo far:Correctness criterion:serializabilityLock-based CC to enforce serializabilityStrict 2PLDeadlocksHierarchical LockingTree latchingPhantomsNext:Alternative CC mechanism:OptimisticOptimistic CC(Kung-Robinson)Locking is a conservative approach in which conflicts are prevented.Disadvantages:Lock management overhead.Deadlock detection/resolution.Lock contention for heavily used objects.Locking is“pessimistic”because it assumes that conflicts will happen.What if conflicts are rare?We might get better performance by not locking,and instead checking for conflicts at commit time.Kung-Robinson ModelXacts have three phases:READ:Xacts read from the database,but make changes to private copies of objects.VALIDATE:Check for conflicts.WRITE:Make local copies of changes public.RVWValidationIdea:test conditions that are sufficient to ensure that no conflict occurred.Each Xact assigned a numeric id.Just use a timestamp.Assigned at end of READ phase.ReadSet(Ti):Set of objects read by Xact Ti.WriteSet(Ti):Set of objects modified by Ti.Test 1For all i and j such that Ti Tj,check that Ti completes before Tj begins.TiTjRVWRVWTest 2For all i and j such that Ti Tj,check that:Ti completes before Tj begins its Write phase ANDWriteSet(Ti)ReadSet(Tj)is empty.TiTjRVWRVWDoes Tj read dirty data?Does Ti overwrite Tjs writes?Test 3For all i and j such that Ti Tj,check that:Ti completes Read phase before Tj does ANDWriteSet(Ti)ReadSet(Tj)is empty ANDWriteSet(Ti)WriteSet(Tj)is empty.TiTjRVWRVWDoes Tj read dirty data?Does Ti overwrite Tjs writes?Applying Tests 1&2:Serial ValidationTo validate Xact T:valid=true;/S=set of Xacts that committed after Begin(T)/(above defn implements Test 1)/The following is done in critical section else Restart Tstartof critical sectionend of critical sectionComments on Serial ValidationApplies Test 2,with T playing the role of Tj and each Xact in Ts(in turn)being Ti.Assignment of Xact id,validation,and the Write phase are inside a critical section!Nothing else goes on concurrently.So,no need to check for Test 3-cant happen.If Write phase is long,major drawback.Optimization for Read-only Xacts:Dont need critical section(because there is no Write phase).Overheads in Optimistic CCRecord xact activity in ReadSet and WriteSetBookkeeping overhead.Check for conflicts during validationCritical section can reduce concurrency.Private writes have to go somewhere arbitraryCan impact sequential I/Os on read&write.Restart xacts that fail validation.Work done so far is wasted;requires clean-up.Optimistic CC vs.LockingDespite its own overheads,Optimistic CC can be better if conflicts are rareSpecial case:mostly read-only xactsWhat about the case in which conflicts are not rare?The choice is less obvious Optimistic CC vs.Locking (for xacts that tend to conflict)Locking:Delay xacts involved in conflictsRestart xacts involved in deadlocksOptimistic CC:Delay other xacts during critical section(validation+write)Restart xacts involved in conflictsObservations:Locking tends to delay xacts longer(duration of X locks usually longer than critical section for validation+write)could decrease throughputOptimistic CC tends to restart xacts more often more“wasted”resources decreased throughput if resources are scarceRule of thumb:locking wins unless you have lots ofspare resources.E.g.distributed system.Just So Youve Heard of ThemTwo more CC techniquesTimestamp CCEach xact has a timestamp.It marks it on data it touches.Restart a xact if it tries to mess with a data item from“the future”.Multiversion CCAllow objects from many timestamps to coexist.Restart a transaction if it tries to“slip in a version”that should have been seen by somebody that ran previously.SummaryLocking,contHierarchical Locking a critical extension to 2PLTree latches a critical issue in practicePhantom handling important in practiceOptimistic CC using end-of-xact“validation”Good if:Read-dominated workloadSystem has lots of extra resourcesMost DBMSs use lockingOCC used in some distributed systems,since restart resources are cheap,latency of locks expensive.GroupProjectsGuifengZhengSlide2SummaryGroupingo4ATMOSTpergroup.Tasks:oOnefreetopicproject(DBrelated)oCheckpoint1:ProposaloCheckpoint2:DBdesignoCheckpoint3:FinaloOne10min.presentationoprojectproposalofinalprojectpresentationoorotherDBrelatedtopicsSlide4CandidatetopicsImplementingaDBapplicationsystemwithfunctionsofCRUD.ImplementingasmallDBsystem,focusononeormorealgorithms,canbaseonopensourceproject,suchasC-Store(http:/db.csail.mit.edu/projects/cstore/)Participate in a national competition.Such as Imagine Cup by Microsoft with Windows Phone Projects.Can be none-DB related.Other DB-related topics.PresentationtopicsDBcoursetopics.Paperreport,fromOurtextbooks804referencesorotherpapers,picktheinterestingones.Yourprojectrelated,proposalorfinalpresent.Othertopic,myconfirmationrequired.Requirements:DueonWeek15Submit(1)PPTs+demovideo(2)Sourcecode(andthecompiledexecutablefiles)(3)Theprojectreportdocuments(includingintroduction,design,setupanddeploy,andresult,projectmanagementrecords)(4)Theindividualreportofeachteammembers(yourcontributions,andanythingelseyouwanttotalkabout)(5)votesofthetop10teams(basedontheirpresentationsandyourobservations,givecommentsof2-3sentences)(6)in-groupassessment(gradeeachotheringroup)Slide5GradepolicyBasicpointsoFinishallprojectscorrectly(w/oerror).(60points)oOntime(WEEK15).(10points)oDocuments,codes,presentation.(30points)ProposalDBdesignVotesIn-groupassessmentSlide6Gradepolicy(cont.)Bonuspoints:10oCreativeordifficulty(+2)oExtrafunctionsimplementation(+2)oFriendlyGUI(+2)oApplyadvanceddesignpatternorelse(+2)oExcellentPresentation(+2)oImagineCup(+2)oOthersSlide6Anyquestion?Welcometopostinourforum(coursehomepage)Slide7Principles of Database SystemsWelcome!Slide 2Who am ITeacher:Dr.Zheng Guifeng School of SoftwareSun Yat-Sen Uhttp:/ 3Course Homepagehttp:/ 4Who are you?Database knowledge?Computer experience?Expectation?Are you ready?Slide 5TitlePrinciples of Database SystemsSlide 6TextbookDatabase Management Systems(3rd Edition)Raghu Ramakrishnan and Johannes Gehrke.McGraw-Hill Publishing Co.2002Many other Chinese reference booksSlide 7Teaching hours51x2Slide 8Grading PolicyPrerequisites:Light programming experienceA bit of“mathematical maturity”Interest in IT/CSClass Participation+Random Quiz 20%Written Assignments 20%Midterm Exam 10%Projects 20%Final Exam 30%(may change if necessary)Must finish the term project to pass!Slide 9Intellectual Honesty and PlagiarismUniversity PolicyIllnessCollaborationPlagiarismSlide 10Any Question?Course homepagehttp:/ TAsDu Xin(杜鑫 師兄)Chen Xue(薛晨 師姐)Database SystemsLecture#1Guifeng ZHENGSchool of Software,SYSUFall,2010Slide 2Big Names in Database SystemsCompanyProductRemarksOracle Oracle 8i,9i,etc.Worlds 2nd largest software companyCEO,Larry Ellison,worlds 2nd richestIBMDB2Worlds 2nd largest after Informix acquisitionMicrosoftAccess,SQL ServerAccess comes with MS OfficeSybaseAdaptive ServerCEO John Chen,grown up in HKInformixDynamic ServerAcquired by IBM in 2001Samuel J.Palmisano Steve Ballmer John Chen Larry Ellison Slide 3What Is a Database?A large,integrated collection of datawhich models a real-world enterprise:Entitiesstudents,courses,instructors,TAs.RelationshipsHillary is currently taking DB course.Barack is currently teaching DB course.John is currently TA-ing DB course but took it last year.A Database Management System(DBMS)is a software package that stores and manages DBsSlide 4Databases are everywhere:non-webPolice stationTracking crime stats.Airline bookingsRetailers:Wal-Mart,etc.when to re-order,purchase patterns,data-miningSlide 5Databases are everywhere:webRetail:Amazon,etc.Search enginesSearchable DBs:IMDB,etc.Web2.0 sites:flickr=images+tagsCMS systems(Wikis,blog&forum software,etc.)Slide 6Databases involved in ordering a pizza?1.Pizza Huts DB2.Credit card records3.CC approval by credit agencies4.Phone companys records5.Caller IDnError-checking,anticrimeSlide 7Your wallet is full of DB recordsDrivers licenseCredit cardsSYSU CardMedical insurance cardMoney(serial numbers)Etc“You may not be interested in databases,but databases are interested in you.”-TrotskySlide 8Example of a Traditional DB AppSuppose we build a system We store:checking accountssavings accountsaccount holdersstate of each of each persons accountsSlide 9Can we do without a DBMS?Sure!Start by storing the data in files:checking.txt savings.txt customers.txtNow write C or Java programs to implement specific tasksSlide 10Doing it without a DBMS.Transfer$100 from Georges savings to checking:Read savings.txtFind&update the line w/“George”balance-=100Write savings.txtRead checking.txtFind&update the line w/“George”balance+=100Write checking.txtWrite a C program to do the following:Slide 11Problems without an DBMS.1.System crashes:Same problem even if reorderedHigh-volume (Rare frequent)2.Simultaneous access by many usersGeorge and Dick visit ATMs at same timeLock checking.txt before each usewhat is the problem?Read savings.txtFind&update the line w/“George.”Write savings.txtRead checking.txtFind&update the line w/“George”Write checking.txtCRASH!Slide 12Problems without a DBMS.3.Large data sets(100s of GBs,or TBs,)No indicesFinding“George”in huge flatfile is expensiveModifications intractable without better data structures“George”“Georgie”is very expensiveDeletions are very expensiveSlide 13Problems without an DBMS.5.Security?File system may lack security featuresFile system security may be coarse6.Application programming interface(API)?nInterfaces,interoperability7.How to query the data?Slide 14In homebrew system,must supportfailover/recoveryconcurrent usedeal with large datasets?securityinterop(互動(dòng)互動(dòng))?querying in what?DBMS as applicationQ:How does a DBMS solve these problems?A:See third part of course,but for nowSlide 15One big issue:Transaction processingGrouping of several queries(or other DB operations)into one transactionACID test properties Atomicityall or nothingConsistencyconstraints on relationshipsIsolationconcurrency controlsimulated solipsism(自閉)DurabilityCrash recoverySlide 16Atomicity&DurabilityAvoiding inconsistent stateA DBMS prevents this outcomexacts are all or nothingOne simple idea:log progress of and plans for each xactDurability:changes stay made(with log)Atomicity:entire xact is committed at onceSlide 17IsolationMany users concurrent executionDisk access is slow(compared to CPU)dont waste CPU keep runningInterweaving actions of different user programs but can lead to inconsistency:e.g.,two programs simultaneously withdraw from the same accountFor each user,should look like a single-user systemSimulated solipsismSlide 18IsolationContrast with a file in two NotepadsStrategy:ignore multiple userswhichever saves last winsfirst save is overwrittenContrast with a file in two WordsStrategy:blunt(生硬的)isolationOne can editTo the other its read-onlySlide 19ConsistencyEach xact(on a consistent DB)must leave it in a consistent statecan define integrity constraintschecks that the defined claims about the dataOnly xacts obeying them are allowedSlide 20A level up:data models(數(shù)據(jù)模型數(shù)據(jù)模型)Any DBMS uses a data model:collection of concepts for describing dataRelational data model:basically universalOracle,DB2,SQL Server,other SQL DBMSsRelations:table of rows&columnsa rels schema(關(guān)系模式關(guān)系模式)defines its fieldsThough some have OO extensionsSlide 21Data Schemas(模式模式)Schema:description of particular set of data,using some data model“Physical schema”Physical files on diskSchemaSet of relations/tables,with structureViews(“external schema”)Virtual tables(虛擬表)generated for user typesviewlogicalphysicalView definitionsLogical schema Physical schemaApplicationSlide 23Schema e.g.:college registrarSchema:Students(ssn:string,name:string,login:string,age:int,gpa:real)Courses(cid:string,cname:string,credits:int)Enrolled(sid:string,cid:string,grade:string)Physical schema:Relations stored as unordered text files.Indices on first column of each relViews:My_courses(cname:string,grade:string,credits:int)Course_info(ssn:string,name:string,status:string)Slide 24How the programmer sees the DBMSStart with SQL DDL to create tables:Continue with SQL to populate tables:CREATE TABLE Students(Name CHAR(30)SSN CHAR(9)PRIMARY KEY NOT NULL,Category CHAR(20);INSERT INTO StudentsVALUES(Hillary,123456789,undergraduate);Slide 25How the programmer sees the DBMSUltimately files,but complexStudents:Courses:Takes:Slide 26Querying:Structured Query LanguageFind all the students who have taken C20.0046:Find all the students who C20.0046 previously:Find the students names:SELECT SSN FROM TakesWHERE CID=C20.0046;SELECT SSN FROM TakesWHERE CID=C20.0046 AND Semester=Fall,2005;SELECT Name FROM Students,TakesWHERE Students.SSN=Takes.SSN AND CID=C20.0046 AND Semester=Fall,2005;Slide 27Database Industry(工業(yè),行業(yè),產(chǎn)業(yè)工業(yè),行業(yè),產(chǎn)業(yè))Relational databases are based on set theoryCommercial DBMSs:Oracle,IBMs DB2,Microsofts SQL Server,etc.Opensource:MySQL,PostgreSQL,etc.DBAs manage theseProgrammers write apps(CRUD,etc.)XML(“semi-structured data”)also importantSlide 28The Study of DBMSPrimary aspects:Data modelingSQLDB programmingDBMS implementationThis course covers all four(tho less of#4)Also will look at some more advanced areasXML,websearch,column-oriented DBs,RAID,RegExs,MapReduceSlide 29Course outlineDatabase design:Entity/Relationship modelsModeling constraintsThe relational model:Relational algebraTransforming E/R models to relational schemasSQLDDL&query languageSlide 30Course outlineProgramming for databasesSome DB implementationIndexes,sorting,xactsAdvanced topicsMay change as course progressespartly in response to audienceAlso“current events”Slashdot/whatever,Database Blog,etc.Slide 31SQL ReadingsMany SQL references available onlineGood online(free)SQL tutorials include:A Gentle Introduction to SQL(http:/ for Web Nerds(http:/ 32The Project:design end-to-end DB web appdata modelIdentify entities&their relationships relationscreation of DB in Oracle/MySQLPopulation with real(alistic)dataweb app for accessing/modifying dataIdentification of“interesting”questions&actionsProduce DBMS interfaceWork in pairs(/threes)Choose topic on your ownStart forming your group today!Slide 33Collaboration modelHomework and exams done individuallyProject done with your team members only,though can in general use any toolsNon-cited use of others problem solutions,code,etc.=plagiarismSee academic honesty policyContact me if youre at all unclear before a particular caseCite any materials used if youre at all unclear afterSlide 34On-going FeedbackDont be afraid to ask questionsSome parts will be abstract/mathematicalTopic selection will be partly based on student interestSlide 35So what is this course about,really?Languages:SQL(some XML)Data modelingSome theory!(rereading)Functional dependencies,normal formse.g.,how to find most efficient schema for dataSome DBMS implementation(algs&data structs)Algorithms and data structures(in the latter part)e.g.,indices make data much faster to find how?Lots of DB implementation and hacking for the projectSlide 36For next timeGet the bookSkim chapter 1Start reading chapter 2
收藏