Lecture 25 Advanced Topics (cont.) - Discussion of Follow-on Classes

6.046J Introduction to Algorithms, Fall 2005

Please use the following citation format:

Erik Demaine and Charles Leiserson, 6.046J Introduction to Algorithms, Fall 2005. (Massachusetts Institute of Technology: MIT OpenCourseWare). https://www.sodocs.net/doc/034433893.html (accessed MM DD, YYYY).

License: Creative Commons Attribution-Noncommercial-Share Alike. Note: Please use the actual date you accessed this material in your citation.

For more information about citing these materials or our Terms of Use, visit: https://www.sodocs.net/doc/034433893.html/terms


6.046J Introduction to Algorithms, Fall 2005

Transcript – Lecture 25

The last lecture of 6.046. We are here today to talk more about cache oblivious algorithms. Last class, we saw several cache oblivious algorithms, although none of them were too difficult. Today we will see two more advanced, more difficult cache oblivious algorithms. I figure we should do something advanced for the last class, just to get to some exciting climax. So without further ado, let's get started. Last time, we looked at the binary search problem.

Or, we looked at binary search, rather. And binary search did not do so well in the cache oblivious context. Some people asked me after class, is it possible to do binary search well, cache obliviously? And, indeed it is, with something called static search trees. So, this is really binary search. I mean, the abstract problem is: I give you N items, say presorted; build some static data structure so that you can search among those N items quickly.

And quickly, I claim, means log base B of N. Our goal is to get log base B of N; we know that we can achieve that with B trees when we know B. We'd like to do it when we don't know B, and that's what cache oblivious static search trees achieve. So here's what we're going to do. As you might suspect, we're going to use a tree. We're going to store our N elements in a complete binary tree. We can't use B trees because we don't know what B is, so we'll use a binary tree. And the key is how we lay out the binary tree. The binary tree will have N nodes. Or, you can put the data in the leaves; it doesn't really matter.

So, here's our tree. There are the N nodes. And we're storing them, I didn't say, in order, you know, in the usual way, in order in a binary tree, which makes it a binary search tree. So now we do a search in this thing. The search will just start at the root and walk down some root-to-leaf path. And at each point you know whether to go left or to go right, because things are in order. So we're assuming here that we have an ordered universe of keys.

So that's easy. We know that that will take log N time. The question is how many memory transfers? We'd like a lot of the nodes near the root to be somehow close together, in one block. But we don't know what the block size is. So what we're going to do is carve the tree at the middle level. We're going to use divide and conquer for our layout of the tree, how we order the nodes in memory. And the divide and conquer is based on cutting in the middle, which is a bit weird.

It's not our usual divide and conquer. And we'll see this more than once today. So, when you cut at the middle level, if the height of your original tree is log N, maybe log N plus one or something, it's roughly log N, then the height of the top half will be log N over two. And the height of the bottom pieces will be log N over two. How many nodes will there be in the top tree? N over two? Not quite. Two to the log N over two, square root of N. OK, so there will be about root N nodes up here. And therefore, there will be about root N subtrees down here, a couple hanging off of each leaf of the top tree.

OK, so we have these subtrees of size root N, and there are about root N of them. This is how we are carving our tree. Now, we're going to recurse on each of the pieces. I'd like to redraw this slightly, sorry, just to make it a little bit clearer. These triangles are really trees, and they are connected by edges to this tree up here. So what we are really doing is carving at the middle level of edges in the tree. And if N is not exactly a power of two, you have to round your level by taking floors or ceilings. But you cut roughly at the middle level of edges. There are a lot of edges there. You conceptually slice there. That gives you a top tree and several bottom trees, each of size roughly root N.

OK, and then we are going to recursively lay out these root N plus one subtrees, and then concatenate. So, this is the idea of the recursive layout. We saw recursive layouts with matrices last time. This is doing the same thing for a tree. So, I want to recursively lay out the top tree. So here's the top tree. And I imagine it being somehow squashed down into a linear array recursively. And then I do the same thing for each of the bottom trees. So here are all the bottom trees. And I squash each of them down into some linear order. And then I concatenate those linear orders. That's the linear order of the tree.

And you need a base case. And the base case, just a single node, is stored in the only order of a single node there is. OK, so that's a recursive layout of a binary search tree. It turns out this works really well. And let's quickly do a little example just so it's completely clear what this layout is because it's a bit bizarre maybe the first time you see it. So let me draw my favorite picture.

So here's a tree of height four or three, depending on how you count. We divide at the middle level, and we say, OK, that's the top tree. And then these are the bottom trees. So there are four bottom trees, four children hanging off the top tree. They each have the same size in this case; they should all be roughly the same size. And first we lay out the top thing, where we again divide at the middle level. We say, OK, this comes first. And then, the bottom subtrees come next, two and three. So, I'm writing down the order in which these nodes are stored in an array. And then, we visit this tree, so we get four, five, six. And then we visit this one, so we get seven, eight, nine.

And then this subtree, 10, 11, 12, and then the last subtree. So that's the order in which you store these 15 nodes. And you can build that up recursively. OK, so the structure is fairly simple, just a binary search tree, which we know and love, but stored in this funny order. This is not depth-first search order or level order, or lots of other natural things you might try, none of which work in the cache oblivious context.
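Here's a quick sketch of this layout in code (my own illustration, not from the lecture), using the usual implicit heap indexing where node i has children 2i and 2i+1; that indexing is my bookkeeping, not part of the lecture:

```python
def veb_layout(root, height):
    """Recursive ("van Emde Boas") layout of a complete binary tree:
    cut at the middle level, lay out the top tree, then each bottom
    tree, and concatenate. Nodes use heap indexing (children 2i, 2i+1).
    Returns the heap indices in storage order."""
    if height == 1:
        return [root]                   # base case: a single node
    top_h = height // 2                 # height of the top tree
    bot_h = height - top_h              # height of each bottom tree
    order = veb_layout(root, top_h)     # squash the top tree first
    # The top tree's leaves sit top_h - 1 levels below `root`; the
    # bottom trees hang off their children.
    leaves = [root * (1 << (top_h - 1)) + i for i in range(1 << (top_h - 1))]
    for leaf in leaves:
        for child in (2 * leaf, 2 * leaf + 1):
            order += veb_layout(child, bot_h)  # then each bottom tree
    return order

print(veb_layout(1, 4))
# -> [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
# The top tree (nodes 1, 2, 3) comes first, then the four bottom trees
# (4, 8, 9), (5, 10, 11), (6, 12, 13), (7, 14, 15): the "one, two,
# three, ..., fifteen" numbering from the example.
```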

This layout is pretty much the only thing that works. And the intuition is, well, we are trying to mimic all kinds of B trees at once. So, if you want a binary tree, well, that's the original tree; it doesn't matter how you store things. If you want a tree where the branching factor is four, well, then here it is. These blocks give you a branching factor of four. If we had more leaves down here, there would be four children hanging off of that node. And these are all clustered together, consecutive in memory. So, if your block size happens to be three, then this is a perfect way to store things for a block size of three.

If your block size happens to be 15 (and if we count, the number of nodes in here is 15), then this recursion will give you a perfect blocking in terms of 15. And in general, it's mimicking block sizes of 2^K - 1; think powers of two. OK, that's the intuition. Let me give you the formal analysis to make it clearer. So, we claim that there are order log base B of N memory transfers. That's what we want to prove, no matter what B is. So here's what we're going to do. You may recall last time, when we analyzed divide and conquer algorithms, we wrote our recurrence, and the base case was the key.

Here, in fact, we are only going to think about the base case, in a certain sense. We don't really have recursion in the algorithm. The algorithm is just walking down some root-to-leaf path. We only have recursion in the definition of the layout. So, we can be a little bit more flexible. We don't have to solve a recurrence. We are just going to think about the base case. I want you to imagine: you start with the big triangle. You cut it in the middle; you get smaller triangles. Imagine the point at which you keep recursively cutting.

So imagine this process. Big triangles halve in height each time. They're getting smaller and smaller; stop cutting at the point where a triangle fits in a block. And look at that time. The recursion actually goes all the way down, but in the analysis let's think about the point where one of these triangles, one of these boxes, fits in a block. So, I'm going to call this a recursive level.

So, I'm imagining expanding all of the recursions in parallel. This is some level of detail, some level of refinement of the trees, at which the tree you're looking at, the triangle, has size at most B. In other words, the number of nodes in that triangle is less than or equal to B. OK, so let me draw a picture. I want to draw sort of this picture, but where instead of nodes, I have little triangles of size at most B. So, the picture looks something like this. We have a little triangle of size at most B. It has a bunch of children which are subtrees of size at most B, the same size. And then, these are in a chunk, and then we have other chunks that look like that in the recursion, potentially.

OK, so I haven't drawn everything. There would be a whole bunch of subtrees, each of size between root B and B in fact, other triangles of this size. So here, I had to refine the entire tree, and then I refined each of the subtrees here and here at these levels. And then it turned out that after these two recursive levels, everything fits in a block. Everything has the same size, so at some point they will all fit within a block. And they might actually be quite a bit smaller than the block. How small?

So, what I'm doing is cutting the number of levels in half at each point. And I stop when the height of one of these trees is at most log B, because that's when the number of nodes there will be roughly B. So, how small can the height be? I keep dividing in half and stopping when it's at most log B, so the height is at least log B over two. It's at most log B, and at least half log B. Therefore, the number of nodes in here could be anywhere between the square root of B and B. So, a triangle could be smaller than a block by more than a constant factor, but I claim that doesn't matter. It could be as small as the square root of B; I'm not even going to write that down, because it doesn't play a role in the analysis. It's a worry, but it's OK, essentially because our bound only involves log B, not B.

So, here's what we do. We know that the height of one of these triangles of size at most B is at least a half log B. And therefore, look at a search path: when we do a search in this tree, we're going to start up here, and, I'm going to mess up the diagram now, we're going to follow some path, maybe I should have drawn it going down here. We visit some of these triangles, but it's a root-to-node path in the tree. So, how many of the triangles could it visit? Well, at most the height of the tree divided by the height of one of the triangles. So, this visits at most log N over half log B triangles, which looks good.

This is log base B of N, modulo a factor of two. Now, what we worry about is how many blocks a triangle occupies. One of these triangles should fit in a block, and we know, by the recursive layout, that it is stored in a consecutive region of memory. So, how many blocks could it occupy? Two, because of alignment: it might fall across the boundary of a block, but across at most one boundary. So, each triangle fits in one block's worth of space, but lies in at most two memory blocks of size B, depending on alignment. So, the number of memory transfers, in other words the number of blocks we read, because all we are doing here is reading in a search, is at most two blocks per triangle.

There are this many triangles, so it's at most 4 log base B of N, which is order log base B of N. And there are papers about decreasing this constant 4 with more sophisticated data structures. You can get it down to a little bit less than two, I think. So, there you go. Not quite as good as B trees in terms of the constant, but pretty good. And what's good is that this data structure works for all B at the same time. This analysis works for all B. So, if we have a multilevel memory hierarchy, no problem. Any questions about this data structure? This is already pretty sophisticated, but we are going to get even more sophisticated. OK, good, no questions. This is either perfectly clear, or a little bit difficult, or both. So, now, I debated with myself what exactly I would cover next. There are two natural things I could cover, both of which are complicated. My first result in the cache oblivious world was making this data structure dynamic. So, there is a dynamic B tree that's cache oblivious, which works for all values of B, and it gets log base B of N insert, delete, and search. This static structure just gets search in log base B of N. That dynamic structure, our first paper on it was damn complicated, and then it got simplified.

It's now not too hard, but it takes a couple of lectures in an advanced algorithms class to teach it. So, I'm not going to do that. But there you go, it exists. Instead, we're going to cover our favorite problem, sorting, in the cache oblivious context. And this is quite complicated, more than you'd expect, much more complicated than it is in a multithreaded setting, to get the right answer, anyway.

Maybe getting the best answer in a multithreaded setting is also complicated; the version we got last week was pretty easy. But before we get to cache oblivious sorting, let me talk about cache aware sorting, because we need to know what bound we are aiming for. And just to warn you, I may not get to the full analysis of the full cache oblivious sorting algorithm. But I want to give you an idea of what goes into it, because it's pretty cool, I think, a lot of ideas.

So, how might you sort? So, cache aware: we assume we can do everything. Basically, this means we have B trees. That's the only other structure we know. How would you sort N numbers, given that that's the only data structure you have? Right, just insert them into the B tree, and then do an in-order traversal. That's one way to sort, perfectly reasonable. We'll call it repeated insertion into a B tree. We know that in the usual setting, in BST sort, where you use a balanced binary search tree like red-black trees, that takes N log N time, log N per operation, and that's an optimal sorting algorithm in the comparison model; we're only thinking about the comparison model here.

So, how many memory transfers does this algorithm for sorting take? The number of memory transfers is a function of N; MT of N is? This is easy. N insertions, OK, and you also have to think about the in-order traversal. You have to remember back to your analysis of B trees, but this is not too hard. How long do the N insertions take? N log base B of N. How long does the traversal take? Less time. If you think about it, you can get away with N over B memory transfers for the traversal, so quite a bit less than the insertions. So the total is N log base B of N, which is bigger than N, and that's actually pretty bad. N memory transfers means essentially you're doing random access, visiting every element in some random order. It's even worse: there's even a log factor, though the log goes down by the log B in the base. This is actually a really bad sorting bound.

So, unlike with normal algorithms, where using a search tree is a good way to sort, in cache oblivious or cache aware sorting it's really, really bad. So, what's another natural algorithm you might try, given what we know about sorting? And, even cache oblivious: all the algorithms we've seen so far are cache oblivious. So, what's a good one to try? Merge sort. We did merge sort in multithreaded algorithms. Let's try merge sort, a good divide and conquer thing. I'm going to call it binary merge sort, because it splits the array into two pieces and recurses on the two pieces, so you get a binary recursion tree. So, let's analyze it. The number of memory transfers on N elements: I mean, it has a pretty good recursive layout, right? The two subarrays that we get when we partition our array are consecutive. So, we're recursing on this, recursing on this.

So, it's a nice cache oblivious layout. And this holds even for cache aware. This is a pretty good algorithm, a lot better than the B tree one, as we'll see. But, what is the recurrence we get? Here we have to go back to last lecture, when we were thinking about recurrences for recursive cache oblivious algorithms. I mean, the first part should be pretty easy. There's an O. Well, OK, let's put the O at the end, the divide and the conquer part at the end. The recursion is 2 MT of N over two, good.

All right, that's just like the merge sort recurrence, and it's the additive term that you should be thinking about. Normally, we would pay a linear additive term here, order N, because merging takes order N time. Now, merging is three parallel scans: the two inputs and the output. OK, they're not quite parallel; they're a bit funnily interleaved. But as long as your cache stores at least three blocks, this is linear time in this setting, meaning order N over B, which means you visit each block a constant number of times.

OK, that's the recurrence. Now, we also need a base case, of course. We've seen two base cases: one, MT of B, and the other, MT of whatever fits in cache. So, let's look at that one, because it's better. For some constant c, if I have an array of size cM, it fits in cache. Actually, probably c is one here, but I'll just be careful. For some constant, a problem of this size fits in cache, and in that case, the number of memory transfers is, anyone remember? We've used this base case more than once before. Do you remember?

Sorry? cM over B. I've got a big O, so M over B. Order M over B, because this is the size of the data. So, just to read it all in takes M over B. Once it's in cache, it doesn't really matter what I do, as long as I use linear space, for the right constant here. As long as I use linear space in the algorithm, I'll stay in cache, and therefore not have to write anything out until the very end, when I spend M over B to write it out.

OK, so I can't really spend more than M over B almost no matter what algorithm I have, as long as it uses linear space. So, this is a base case that's useful in pretty much any algorithm. That's the recurrence; now we just have to solve it. Let's see how good binary merge sort is. And again, I'm just going to give the intuition behind the solution to this recurrence. I won't use the substitution method to prove it formally, but this one's actually pretty simple. At the top of the recursion, actually I'm going to write it over here, otherwise I won't be able to see, at the top of the recursion, we have an N over B cost. I'll ignore the constants. There is probably also an additive one, which I'm ignoring here. Then we split into two problems of half the size. So, we get a half N over B, and a half N over B.

Usually this was N, half N, half N; you should recognize it from lecture one. So, the total on this level is N over B. The total on this level is N over B. And, you can prove by induction that every level is N over B. The question is how many levels there are. Well, at the bottom, dot, dot, dot, at the bottom of this recursion tree we should get something of size M, and then we're paying M over B. And in the base case, we're also paying M over B.

So, it's a good thing those match. They should. So here, we have a bunch of leaves, each of size M, each costing M over B. You can also compute the number of leaves: it's N over M. If you want to be extra sure, you should always check the leaf level; it's a good idea. So we have N over M leaves, each costing M over B, and that multiplies out to N over B also. So, every level here is N over B memory transfers. And the number of levels is? Log of N over M. Yep, that's right; I just didn't hear it right at first. We are starting at N and getting down to M. So, you can think of it as log N, the whole binary tree, minus log M, the subtrees, and that's the same as log of N over M, or however you want to think about it.

The point is that this is a log base two. That's where we are not doing so great. Still, this is actually a pretty good algorithm. So let me write the solution over here. The number of memory transfers on N items is going to be the number of levels times the cost of each level. So, this is N over B times log base two of N over M, which is a lot better than repeated insertion into a B tree. There, we were getting N times log N over log B, so only a log B savings over plain N log N; here we are getting a factor of B savings, N log N over B. In fact, we even made it a little bit smaller by dividing the N inside the log by M. That doesn't matter too much. The dividing by B is the big one.
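Written out in symbols (my transcription, not in the original):

```latex
MT(N) = 2\,MT\!\left(\frac{N}{2}\right) + \Theta\!\left(\frac{N}{B}\right),
\qquad MT(cM) = \Theta\!\left(\frac{M}{B}\right)
\;\Longrightarrow\;
MT(N) = \Theta\!\left(\frac{N}{B}\,\log_2\frac{N}{M}\right)
```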

OK, so we're almost there. This is almost an optimal algorithm. It's even cache oblivious, which is pretty cool. And the extra little step is that you should be able to get another log B factor improvement. I want to combine these two ideas: keep this factor B improvement over N log N, and keep the factor log B improvement over N log N, and get them together. So, first, before we do that cache obliviously, let's do it cache aware.

So, this is the third sorting algorithm, a cache aware one; the merge sort was also cache oblivious. So, how should I modify merge sort in order to do better? I mean, I have this log base two, and I want a log base B, more or less. How would I do that with merge sort? Yeah? Split into B subarrays, yeah. Instead of doing binary merge sort, and this is what I was hinting at here, instead of splitting into two pieces, recursing on the two pieces, and then merging them, I could split into potentially more pieces. And to do that, I'm going to use my cache. So the idea is B pieces. This is actually not the best thing to do, although B pieces does work. It's what I was hinting at, because I was saying I want a log B. It's actually not quite log B; it's log base M over B. OK, but let's see. What is the most pieces I could split into?

Right, well, I could split into N pieces. That would be good, wouldn't it? Only one recursive level. But I can't split into N pieces. Why? What goes wrong when I split into N pieces? That would be the ultimate. You can't merge, exactly. If I have N pieces, I can't merge in cache. In order to merge in cache, what I need is to be able to store an entire block from each of the lists that I'm merging. If I can store one block in cache for each of the lists, then it's a bunch of parallel scans. So this is like testing the limit of parallel scanning technology. If you have K parallel scans, and you can fit K blocks in cache, then all is well, because you can scan through each of those K arrays and have one block from each of the K arrays in cache at the same time.

So, that's the idea. Now, how many blocks can I fit in cache? M over B. That's the biggest I could do. So this will give the best running time among these kinds of merge sort algorithms. This is an M over B way merge sort. OK, so now we get a somewhat better recurrence. We split into M over B subproblems now, each of size N divided by M over B. And the claim is that the merge time is still linear, because we have just barely enough cache. Maybe I should describe this algorithm, because we've never really done non-binary merge sort. We divide into M over B equal-size subarrays instead of two. Here, we are clearly doing a cache aware algorithm.

We are assuming we know what M over B is. So, then we recursively sort each subarray, and then we conquer: we merge. And the reason the merge works is because we can afford one cache block per subarray. Actually, if you're careful, you also need one block for the output of the merged array before you write it out, so it should be M over B minus one ways. But let's ignore some additive constants. So this is the recurrence we get; see the sketch below. The base case is the same. And what improves here? The per-level cost doesn't change, I claim, because at the top we get N over B, just as before. Then we split into M over B subproblems, each of which costs a one over (M over B) factor times N over B. So, you add all those up, you still get N over B, because we are not decreasing the number of elements; we're just splitting them. There are now M over B subproblems, each one over (M over B) of the size.
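Here's a short sketch of that K-way merge sort (my illustration, not lecture code). heapq.merge stands in for the one-block-per-list parallel-scan merge, and K = M/B has to be known, which is exactly what makes this cache aware:

```python
import heapq

def kway_merge_sort(a, k):
    """Merge sort with branching factor k; think k = M/B, so the merge
    can keep one cache block per run plus one for the output."""
    if len(a) <= 1:
        return list(a)
    step = -(-len(a) // k)  # ceiling division: k consecutive segments
    runs = [kway_merge_sort(a[i:i + step], k) for i in range(0, len(a), step)]
    # Merge all k sorted runs at once; each run is scanned exactly once,
    # the in-memory analogue of k parallel scans.
    return list(heapq.merge(*runs))

# Example with a made-up M/B = 4:
print(kway_merge_sort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0], 4))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```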

So, just like before, each level will sum to N over B. What changes is the number of levels, because now we have a bigger branching factor. Instead of log base two, it's now log base the branching factor. So, the height of this tree is log base M over B of N over M, I believe. Let me make sure that agrees with what I have. Yeah. And if you're careful, this counts not quite the number of levels, but the number of levels minus one. So, I'm going to add a plus one here. And the reason why is that this is not quite the bound that I want.

So, we have log base M over B of N over M. What I really want inside the log is N over B. I claim that these are the same, because of that plus one. This may seem rather mysterious, but it's because I know what the sorting bound should be as I'm doing this arithmetic. So, I'm taking log base M over B of N over M. I'm not changing the base of the log. I'm just saying, well, N over M, that is N over B divided by M over B, because then the B's cancel and the M goes on the bottom. So, if I do that inside the log, I get log of N over B minus log of M over B, because I'm dividing. And now, log base M over B of M over B is one.

So, the plus one and the minus one cancel, and I get log base M over B of N over B, which is what I was aiming for. Why? Because that's the right bound as it's normally written. That's what we will be trying to get cache obliviously. So, that's the height of the recursion tree, and at each level we are paying N over B memory transfers. So, the overall number of memory transfers for this M over B way merge sort is the sorting bound.
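In symbols, the log manipulation and the resulting bound (my rendering):

```latex
1 + \log_{M/B}\frac{N}{M}
= 1 + \log_{M/B}\frac{N/B}{M/B}
= 1 + \log_{M/B}\frac{N}{B} - 1
= \log_{M/B}\frac{N}{B},
\qquad
MT(N) = \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right)
```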

This is, I'll put it in a box, the sorting bound. And it's very special, because it is the optimal number of memory transfers for sorting N items, even cache aware. This has been known since, like, 1983. This is the best you can do. It's a really weird bound, but if you ignore all the divided-by-B's, it's sort of like N times log base M of N. So that's a little more reasonable. But there are lots of divided-by-B's. So: the number of blocks in the input, times log base (the number of blocks in the cache) of the number of blocks in the input. That's a little bit more intuitive. That is the bound.

And that's what we are aiming for. Now, this algorithm crucially assumed that we knew what M over B was. Next, we are going to try to do it without knowing M over B, do it cache obliviously. And that is a result of only a few years ago. Are you ready? Everything clear so far? It's a pretty natural algorithm. We are going to try to mimic it, essentially, and do a merge sort, but not an M over B way merge sort, because we don't know how.

We're going to try to do essentially a square root of N way merge sort. If you play around, that's the natural thing to do. The tricky part is that it's hard to merge square root of N lists at the same time in a cache efficient way. We know that if the square root of N is bigger than M over B, you're hosed if you just do a straightforward merge. So, we need a fancy merge. We are going to do a divide and conquer merge, a lot like the multithreaded algorithms of last week: a divide and conquer merge so that no matter how many lists we are merging, as long as it's at most the square root of N, or actually the cube root of N, we can do it cache efficiently, OK? So, we'll do this, but we need a bit of setup first. That's where we're going: cache oblivious sorting. We want to get the sorting bound.

It turns out, to do cache oblivious sorting, you need an assumption about the cache size. This is kind of annoying, because we said, well, cache oblivious algorithms should work for all values of B and all values of M. But you can actually prove that you need an additional assumption in order to get this bound cache obliviously. That's a result of, like, last year, by Gerth Brodal. The assumption is fairly weak; that's the good news. We've actually made an assumption several times already: we said, well, assume the cache can store at least three blocks, or at least four blocks; it's reasonable to say the cache can store any constant number of blocks. This assumption is that the number of blocks your cache can store, M over B, is at least B to the epsilon.

This is saying your cache isn't, like, really narrow; it's about as tall as it is wide. This actually gives you a lot of slack. And we're going to use a simple version of this assumption: that M is at least B^2. This is pretty natural. It's saying that your cache is at least as tall as it is wide, where the rows are the blocks: the number of blocks is at least the size of a block. That's a pretty reasonable assumption, and if you look at caches these days, they all satisfy it, at least for some epsilon.

Pretty much universally, M is at least B^2 or so. And in fact, if you think back to our speed of light arguments from last time, B^2 or B^3 is actually the right thing. As you go out, I guess in 3-D, B^2 would be the surface area of the sphere out there. So this is the natural amount of space you should have at a particular distance. Assuming we live in a constant dimensional space, this assumption would be true. The B to the epsilon version even allows going up to 42 dimensions or whatever, so it's a pretty reasonable assumption. Good. Now, we are going to achieve this bound. And what we are going to do is use an N to the epsilon way merge sort, for some epsilon. And if we assume that M is at least B^2, the epsilon will be one third, it turns out.

So, we are going to do a cube root of N way merge sort. I'll start by giving you and analyzing the sorting algorithm, assuming that we know how to merge within a particular bound. Then we'll do the merge; the merge is the hard part. So let me give you the merge as a black box first. First of all, what does the merger do? The K way merger is called the K funnel, just because it looks like a funnel, as you'll see. A K funnel is a data structure, or let's say an algorithm that looks like a data structure, and it merges K sorted lists. So, suppose you already have K lists, and they're sorted, and assume that the lists are relatively long; we need some additional assumptions for this black box to work, and we'll be able to satisfy them as we sort.

We want the total size of those lists, adding up all the elements across all the lists, to be at least K^3; that's the assumption. Then, it merges these lists in essentially the sorting bound. Actually, I should really say theta K^3: I also don't want the total to be too much bigger than K^3. Sorry about that. So, the number of memory transfers that this funnel merger uses is the sorting bound on K^3 elements: K^3 over B, times log base M over B of K^3 over B, plus another K memory transfers.

Now, K memory transfers is pretty reasonable. You've got to at least start reading each list, so you've got to pay one memory transfer per list. But our challenge, in some sense, will be getting rid of this plus K. This is how fast we can merge; we'll do the merge itself afterwards. Now, assuming we have this, let me tell you how to sort. This is, fittingly enough, called funnel sort. But in a certain sense, it's really cube root of N way merge sort. We'll analyze it using this black box. So, in funnel sort, we are going to define K to be N to the one third, and apply this merger. What do we do? It's just like over here. We're going to divide our array into N to the one third pieces.

I mean, they should be consecutive subarrays; I'll call them segments of the array. For cache obliviousness, it's really crucial how these things are laid out. We're going to cut the array into consecutive chunks, N to the one third of them. Then I'm going to recursively sort them, and then I'm going to merge, using the K funnel, the N to the one third funnel. Now, why do I use one third? Well, because of this three. In order to use the N to the one third funnel, I need to guarantee that the total number of elements I'm merging is at least the cube of this number, K^3. The cube of this number is N, and that's exactly how many elements I have in total.

So this is exactly where I can apply the funnel. It requires that I have at least K^3 elements, so I can only use an N to the one third funnel. I mean, if it didn't have this requirement, I could just say, well, I have N lists, each of size one. That's clearly not going to work very well for our merger; intuitively, it's because this plus K will kill you. That would be a plus N, which is way too big. Here's a sketch of the overall sort.
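A sketch of funnel sort (mine, with the K funnel left as a black-box parameter, since we haven't built it yet):

```python
def funnel_sort(a, k_funnel_merge):
    """Funnel sort: split into N^(1/3) consecutive segments, recursively
    sort each, then merge all of them at once with a K funnel, where
    K = N^(1/3) so the funnel's K^3 total-input requirement is exactly N."""
    n = len(a)
    if n <= 8:                        # small base case: sort directly
        return sorted(a)
    k = max(2, round(n ** (1 / 3)))   # K = N^(1/3)
    step = -(-n // k)                 # ceiling division
    runs = [funnel_sort(a[i:i + step], k_funnel_merge)
            for i in range(0, n, step)]
    return k_funnel_merge(runs)       # the black box described above

# Any K-way merge can stand in for the funnel while testing:
import heapq
print(funnel_sort(list(range(20, 0, -1)),
                  lambda runs: list(heapq.merge(*runs))))
# -> [1, 2, 3, ..., 20]
```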

But we can use an N to the one third funnel, and this is how we sort. So, let's analyze this algorithm. Hopefully, it will give the sorting bound if I did everything correctly. This is pretty easy; the only thing that makes it messy is that I have to write the sorting bound over and over. This cost is the cost of the merge, at the root, where K^3 is N. So, let me write the recurrence first. The number of memory transfers on N elements is N to the one third times MT of N to the two thirds: N to the one third recursions, each of size N to the two thirds, plus the merge cost, where K^3 is N.

So, that's plus N over B times log base M over B of N over B, plus cube root of N; that last part is the additive plus K term. OK, so that's my recurrence. The base case will be the usual: MT of some constant times M is order M over B. So, we sort of know what we should get here. Well, not really. In all the previous recurrences, we had the same cost at every level, and that's where we got our log factor. Now, we already have a log factor, so we'd better not get another one. Right, this is the bound we want to prove. So, let me cheat here for a second.

All right, indeed. You may already be wondering: this plus N to the one third seems rather large. If it's bigger than the main term, we are already in trouble at the very top level of the recursion. I claim that it's OK. Let's look at the N to the one third. There is a base case here which covers all values of N that are at most some constant times M. So, if I'm not in the base case, I know that N is at least as big as the cache, up to some constant.

Now, the cache is at least B^2, we've assumed. (You can do all this with B to the one plus epsilon if you're more careful.) So, N is at least B^2, up to constants. And then, I always have trouble with these. This means that N divided by B is omega of root N. There are many things you could say here, and only one of them is right. Why? N at least B^2 says that the square root of N is at least B, and so N divided by B is at least N divided by the square root of N, which is the square root of N, if you check that all out. I'm going to go through this arithmetic relatively quickly, because it's tedious but necessary. And the square root of N is strictly bigger than the cube root of N. So that means that N over B is strictly bigger than N to the one third.
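Chaining that together, ignoring constant factors (my rendering):

```latex
N \ge cM \ \text{and}\ M \ge B^2
\;\Rightarrow\; B \le \sqrt{N}
\;\Rightarrow\; \frac{N}{B} \ge \frac{N}{\sqrt{N}} = \sqrt{N} > N^{1/3}
```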

Here we have N over B times something that's bigger than one, so the main term definitely dominates the plus N to the one third term in this case. As long as I'm not in the base case, I know N is at least order M, and that term disappears from my recurrence. OK, so, good. That was a bit close. Now, what we want to get is this running time overall. So, the recursive cost had better be small, better be at most a constant factor increase over this.

So, let's expand the recurrence. We get N over B, log base M over B, of N over B at the root. Then, we split into a lot of subproblems, N to the one third subproblems, and each one costs essentially this, but with N replaced by N to the two thirds. So that's N to the two thirds divided by B, oops, I forgot to divide by B out here, times log base M over B of N to the two thirds divided by B. That's the cost of one of these nodes, and there are N to the one third of them. What should they add up to? Well, there's an N to the one third, and there's an N to the two thirds here, and that multiplies out to N. So, we get N over B. This looks bad. This looks the same as the root, and we don't want to lose another log factor. But the good news is we have a two thirds in here.

This is what we get in total at this level. It looks like the sorting bound, but inside the log there's still a two thirds power. Now, a power of two thirds inside a log comes out as a multiplicative factor of two thirds. So, this is in fact two thirds times N over B, log base M over B of N over B, the sorting bound. So, this level costs two thirds of the sorting bound. And the root was the sorting bound, one times the sorting bound.

So, it's going down geometrically, yay! OK, I'm not going to prove it, but it's true: this level went down by a factor of two thirds, and the next level will also go down by a factor of two thirds, by induction. If you prove it at one level, it should be true at all of them; I'm going to skip the details there. So, we could check the leaf level just to make sure. That's always a good sanity check. At the leaves, we know our cost is M over B.

And how many leaves are there? Just like before, in some sense, we have N over M leaves. So in fact the total cost at the bottom is N over B. And it turns out that that's what you get. It looks funny, because you'd think that the bottom would actually be smaller than the top at some intuitive level. It's not. What's happening is you have this N over B times this log thing, whatever the log is; we don't care too much. Let's just call it log.

What you pay at the next level is two thirds times that log. And at the next level, it's four ninths times that log, and so on. So, it's geometrically decreasing until the log gets down to one, and then you stop the recursion. And that's when you get N over B with no log. So, what you're decreasing is the log, not the N over B part; the two thirds really belongs on the log. In fact, the number of levels here is log log N.

It's the number of times you have to divide a log by three halves before you get down to one, OK? But we don't actually need that. We don't care how many levels there are, because the costs are geometrically decreasing. There could be infinitely many levels; it's geometrically decreasing, and we get this as our running time: MT of N is the sorting bound, for funnel sort. So, this is great. As long as we can get a funnel that merges this quickly, we get a sorting algorithm that sorts as fast as it possibly can. I didn't write it on the board, but this is asymptotically optimal. Even if you knew what B and M were, this is the best that you could hope to do. And here, we are doing it no matter what B and M are.

Good. Get ready for the funnel. The funnel will be another recursion. So, this is a recursive algorithm inside a recursive algorithm. It's another divide and conquer, kind of like the static search trees we saw at the beginning of this lecture. So, these all tie together. All right, the K funnel. I'm calling it the K funnel because I want to think of it at a generic recursive level, not just N to the one third. We're going to recursively use square root of K funnels, and I need to achieve that merge bound. So, the recursion is like the static search tree, and a little bit hard to draw on one board, but here we go.

So, inside a K funnel we have square root of K funnels, recursively. We have a buffer up here. This is called the output buffer, and it has size K^3; just for kicks, let's suppose it's filled up a little bit. And we have some more buffers, and let's suppose they've been filled up by different amounts. Each of these has size K to the three halves; these are called the intermediate buffers. And then hanging off of them, we have more funnels: a square root of K funnel here, and a square root of K funnel here, one for each buffer, one for each child of the top funnel.

And then hanging off of these funnels are the input arrays. I'm not going to draw all K of them, but there are K input arrays, input lists let's call them, down at the bottom. So the idea is that we merge bottom-up in this picture. We start with our K input lists of total size at least K^3; that's what we're assuming we have. We cluster them into square root of K groups of square root of K lists each, and throw each group into a square root of K funnel that recursively merges those square root of K lists. The output of each of those funnels goes into a buffer, to sort of accumulate what the answer should be. These buffers have size exactly K to the three halves each, which might not be perfect: over the whole merge, on average K^3 divided by root K elements, that's K to the five halves, pass through each of them, since there's K^3 total and square root of K groups; a buffer just holds K to the three halves of them at a time and gets refilled. And at any moment some of these buffers will be fuller, some emptier. I've drawn it here: some of them have been emptied a bit more, depending on how the merge goes. But on average, these will all fill at the same rate. And then, we plug them into one more square root of K funnel, and out of that we get the output, of size K^3. So, that is roughly what should happen.

But in fact, some of these might fill first, and we have to do some merging above them to drain a buffer and make room for more stuff coming up. That's the picture. Now, before I actually tell you the algorithm, or analyze it, let's first just think about space, a very simple warm-up analysis. So, let's look at the space, excluding the input and output buffers. Why do I want to exclude those? Well, because I want to count each buffer only once, and this buffer is the input to one funnel and the output of another. So, in order to recursively count all the buffers exactly once, I'm only going to count these middle buffers, and then separately think about the overall output and input buffers. But those are sort of given: I need K^3 for the output, K^3 for the input. So ignore those overall, and if I count the middle buffers recursively, I'll get all the buffers.

So, then we get a very simple recurrence for space. S of K is roughly square root of K plus one, times S of square root of K, plus order K^2. The K^2 is because we have square root of K of these buffers, each of size K to the three halves. Work that out; does that sound right? It sounds an awful lot like K^3, but no, it's right: it's K to the three halves times the square root of K, which is K to the three halves plus one half, which is K to the four halves, which is K^2. Phew, OK, good; I'm just slow with my arithmetic here. So, K^2 total buffering at this level. You add up S of square root of K for each sub-funnel; the plus one is to take into account the top funnel, so it's the square root of K bottom funnels plus one.

There are many ways you could solve this recurrence. A natural one is, instead of looking at K, to look at log K, because here log K is getting divided by two. I'm just going to draw the recursion tree so you can see the intuition; but if you are going to solve it formally, you should probably substitute log K as the variable. So, we have a branching factor of square root of K plus one.

And then, the subproblems have size square root of K, so each contributes order K, square root of K squared, at the next level. And you keep going. I don't particularly care what the bottom looks like, because at the top we have K^2, and then we have about K times root K plus one cost at the next level, which is K to the three halves plus K. So, we go from K^2 down to K to the three halves plus K. This is super-geometric, like an exponential geometric decrease; it's decreasing really fast. So, the total is order K^2. That's my hand-waving argument. The cost is basically the size of the buffers at the top level. So that's the total space, and it's actually theta K^2, because I have a theta K^2 at the top.
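As a recurrence (my rendering of the board):

```latex
S(K) = \left(\sqrt{K} + 1\right) S\!\left(\sqrt{K}\right) + \Theta(K^2),
\qquad
\text{level costs } K^2,\; K^{3/2}+K,\; \dots
\;\Rightarrow\; S(K) = \Theta(K^2)
```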

We are going to need this in order to analyze the time; that's why I mentioned it. It's not just a good feeling that the space is not too big. In fact, the funnel is a lot smaller than the total input size. The input size is K^3. But that's not so crucial. What's crucial is that the funnel is K^2, and we'll use that in the analysis. OK, naturally, this thing is laid out recursively. You recursively store the top funnel. Then, for example, you write out each buffer as a consecutive array, in this case.

There's no recursion there; just write them all out one by one. Don't interleave them or anything; store them in order. And then you recursively write out the bottom funnels. Any way you do it recursively, as long as each funnel remains a consecutive chunk of memory and each buffer remains a consecutive chunk of memory, the time analysis that we are about to do will work.

OK, let me actually give you the algorithm that we're analyzing. To make the funnel go, we say: initially, all the buffers are empty; everything is at the bottom, in the input lists. And what we do is say, fill the root buffer. Fill this one. That's a recursive algorithm, which I'll define in a second: how to fill a buffer. Once it's filled, that means everything has been pulled up and merged. So that's how we get started. The merge algorithm is: fill the topmost output buffer.

OK, and now, here's how you fill a buffer. In general, if you expand out this recursion all the way, in the base case, I didn't mention, you sort of get a little node there. So, if you look at an arbitrary buffer in this picture that you want to fill, say this one's empty and you want to fill it, then immediately below it will be a vertex which has two children, two other buffers. Maybe they look like this. You have no idea how big they are, except they are the same size as each other; they could be a lot smaller than this one, or a lot bigger, we don't know. But in the end, you do get a binary structure out of this, just like we did with the binary search tree at the beginning. So, how do we fill this buffer? Well, we just merge the two child buffers as long as we can.

So, we merge the two children buffers as long as they are both non-empty. In general, the invariant will be, let me write down a sentence: whatever is in a buffer, and hasn't been consumed already, is a prefix of the merged output of the entire subtree beneath it. So this one is a partially merged subsequence of everything down here, and that one is a partially merged subsequence of everything down there. I can just merge, element by element off the tops, and that gives me output to put in the buffer above, until one of the children gets emptied. And we have no idea which one will empty first; it just depends on the order of the elements.

Whenever one of them empties, we recursively fill it, and that's it. That's the algorithm.

Whenever one empties, we recursively fill it. And at the base case, at the leaves, there's sort of nothing to do; I believe you just directly read from an input list. So, at the very bottom, if you have some node here that's trying to merge between these two lists, that's just a straightforward merge between two lists. We know how to do that with two parallel scans. So, in fact, we could merge the entire thing here and just spit it out to the buffer. Well, it depends how big the buffer is: we can only merge until the buffer fills. Here's a sketch of that fill routine.
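A toy sketch of the fill logic (my code; a real funnel sizes its buffers as described above and keeps the recursive memory layout, neither of which this models). Each merge node owns a bounded output buffer; filling merges the two children, recursively refilling any child whose buffer runs empty, and stops when the buffer is full or the inputs are exhausted:

```python
from collections import deque

class Node:
    """One merge node in the (conceptually recursive) funnel: a bounded
    output buffer fed by two children, or by an input list at a leaf."""
    def __init__(self, capacity, left=None, right=None, source=None):
        self.buf = deque()          # holds at most `capacity` elements
        self.capacity = capacity
        self.left, self.right = left, right
        self.source = deque(source) if source is not None else None
        self.exhausted = False      # True when the subtree below is done

    def fill(self):
        """Fill this node's buffer by merging its children, recursively
        refilling any child whose buffer empties; stop when full."""
        while len(self.buf) < self.capacity and not self.exhausted:
            if self.source is not None:            # leaf: read input list
                if self.source:
                    self.buf.append(self.source.popleft())
                else:
                    self.exhausted = True
                continue
            for child in (self.left, self.right):
                if not child.buf and not child.exhausted:
                    child.fill()                    # recursive refill
            ready = [c for c in (self.left, self.right) if c.buf]
            if not ready:
                self.exhausted = True               # everything consumed
            else:                                   # take the smaller head
                self.buf.append(min(ready, key=lambda c: c.buf[0]).buf.popleft())

# Tiny demo: a 4-way funnel as a binary tree of merge nodes.
leaves = [Node(2, source=s) for s in ([1, 5], [2, 6], [3, 7], [4, 8])]
mid = [Node(2, left=leaves[0], right=leaves[1]),
       Node(2, left=leaves[2], right=leaves[3])]
root = Node(100, left=mid[0], right=mid[1])
root.fill()
print(list(root.buf))   # -> [1, 2, 3, 4, 5, 6, 7, 8]
```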

Whenever the buffer we are trying to fill becomes full, we stop and pop up the recursive layers. So we keep merging until the target buffer fills, then stop and pop up. That's the algorithm for merging. Now, we just have to analyze it. It's actually not too hard, but it's a pretty clever analysis. And, to top it off, it's an amortization, your favorite. So we get one last practice at amortized analysis, in the context of cache oblivious algorithms. This is going to be a bit sophisticated; we are going to combine all the ideas we've seen. The main analysis idea is that we have this recursion in the construction. Imagine we take our K funnel, split it at the middle level to make a whole bunch of square root of K funnels, then cut those at their middle levels to get fourth root of K funnels, and so on, and so on; at some point the funnels we look at fit in cache.

Before, we said stop when a triangle fits in a block. Now, we're going to say that at some point, one of these funnels fits in cache; each of the funnels at that recursive level of detail fits in cache. We are going to analyze at that level, and call the funnel size there J. So, consider the first recursive level of detail, with funnels of size J, at which every J funnel we have fits, let's say, not only in cache, but such that four of them fit in cache.

Each fits in one quarter of the cache. We need to leave some cache free for doing other things, but I want to make sure that the J funnel fits. Now what does that mean? Well, we've analyzed the space. We know that the space of a J funnel is about J^2, some constant times J^2; call it C times J^2. So this is saying that C times J^2 is at most M over 4, one quarter of the cache.

That means a J funnel of that size sits in a quarter of the cache. So at this point in the recursion, we have a big tree of J funnels, with all sorts of buffers in between them, and each of the J funnels fits. So, let's think about one of those J funnels. Suppose J is, like, the square root of K, so that this is the picture, because otherwise I'd have to draw a bigger one. So, suppose this is a J funnel. It has a bunch of input buffers and one output buffer.

We just want to think about how the J funnel executes. And, for a long time, as long as its input buffers are all non-empty, this is just a merger. It's doing something recursively inside, but we don't really care. As soon as this whole thing swaps in, and actually, I should be drawing this, as soon as the funnel, the output buffer, and the input buffers swap in, in other words, you bring all those blocks in, you can just merge, and go on your merry way merging until some input empties or you fill the output.

So, let's analyze that. Suppose everything is in memory, because we know it fits. Well, I have to be a little bit careful. The input buffers are actually pretty big in total size: for a square root of K funnel, the total input size is K to the three halves, versus the funnel itself, which has size K. Let me draw a general picture with a J funnel, because otherwise the arithmetic is going to get messy.

We have a J funnel. Its size is C times J^2, we're supposing. The number of inputs is J, and their total size is pretty big: for a K funnel the total input size is K^3, so here it would be J^3. We can't afford to put all of that in cache; that's an extra factor of J. But we can afford one block per input, and for merging, that's all we need. I claim that I can fit the first block of each of these input buffers in cache at the same time, along with the J funnel. And so, for that duration, as long as all of that is in cache, this thing can merge at full speed, just like we were doing parallel scans. You use up a block down here, one of them empties, you go to the next block in that input buffer, and so on, just like the normal merge analysis with parallel arrays. At this point we're assuming that everything here fits in cache.

So, it's just like before. Of course, it's actually recursive inside, but we are analyzing it at this level. Now, I need to prove that you can fit one block per input. It's not hard; it's just computation. Basically, the way these funnels were designed was precisely so that you could fit one block per input buffer. And here's the argument. The claim is: you can also fit one memory block in the cache per input buffer, in addition to the one J funnel. So you can fit one block for each of its input buffers. This is for the J funnel specifically.

Not for any funnel; bigger funnels are way too big. So here's how we prove it. J^2 is at most a quarter M; that's what we assumed up there, actually C J^2, but I'm not going to bother with the C, because dropping it only makes the constraint weaker. So, the size of our funnel, about J^2, is at most a quarter of the cache. That implies that J, if we take square roots of both sides, is at most a half square root of M. Also, we know that B is at most the square root of M, because M is at least B squared. So, we put these together, and we get that J times B is at most a half M.

Now, what we are asking for here is J times B, because in a J funnel there are J input buffers, and one block each costs a space of B. So, for the input buffers we need J times B, and the claim is that that whole thing fits in half the cache. And we've only used a quarter of the cache for the funnel. So in total, we use three quarters of the cache, and that's all we'll use. That's good news. We can also fit one more block for the output; not too big a deal.
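The fit argument in one line (my rendering, dropping the constant C as the lecture does):

```latex
J^2 \le \frac{M}{4} \Rightarrow J \le \frac{\sqrt{M}}{2},
\qquad
M \ge B^2 \Rightarrow B \le \sqrt{M},
\qquad\Rightarrow\quad
J \cdot B \le \frac{M}{2},
\quad
\underbrace{\tfrac{M}{4}}_{\text{funnel}} + \underbrace{\tfrac{M}{2}}_{\text{input blocks}} \le \tfrac{3M}{4}
```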

So now, as long as this J funnel is running, if it's all in cache, all is well. What does that mean? Let me first analyze how long it takes to swap in this funnel. How long does it take to read in all the stuff of a J funnel, plus one block per input buffer? That's what it takes to get started. So, this is swapping in a J funnel, which means reading the J funnel in its entirety, and reading one block per input buffer. The cost of the swap-in is pretty natural: the size of the funnel divided by B, because that's just a linear scan to read it in, plus one memory transfer per input buffer.

These input buffers could be all over the place, because they're pretty big. So, let's say we pay one memory transfer for each input buffer just to get started, to read its first block. So the swap-in cost is about J^2 over B plus J. The claim, and here we need to do some more arithmetic, is that this is at most J^3 over B. Why is it at most J^3 over B? Well, this was the first level at

which things fit in cache. That means the next level bigger, the J^2 funnel, which has size about J^4, is bigger than the cache; otherwise we would have stopped there. So this is just more arithmetic; you can either believe me or follow along. We know that J^4 is at least M, and we know that M is at least B^2. Therefore, taking square roots of both sides, J^2 is at least B.

OK, so certainly J^2 over B is at most J^3 over B. But also J is at most J^3 over B, because J^2 is at least B. Hopefully that should be clear. That's just algebra. OK, so we're not going to use the tighter bound, because that's kind of complicated. We're just going to say, well, it costs J^3 over B to get swapped in. Now, why is J^3 over B a good thing? Because we know the total size of the inputs to the J funnel is J^3. So, to read all of the inputs to the J funnel takes J^3 over B. So, this is really just a linear extra cost to get the whole thing swapped in. That sounds good. Doing the merging would also cost J^3 over B. So, the swap-in costs J^3 over B, and merging all these J^3 elements, if they were all there in the inputs, would also take J^3 over B, because once everything is there, you're merging at full speed, B items per memory transfer on average.
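Written out, the swap-in bound is just this chain (constants suppressed; the funnel itself has size about J^2):

\begin{align*}
\text{swap-in cost} &= O\!\left(\frac{J^2}{B} + J\right), \\
J^4 \ge M \ge B^2 \;\Longrightarrow\; J^2 \ge B \;&\Longrightarrow\; J = \frac{J^3}{J^2} \le \frac{J^3}{B}, \\
\text{so swap-in cost} &= O\!\left(\frac{J^3}{B}\right),
\end{align*}

where J^2/B is at most J^3/B simply because J is at least 1.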

OK, the problem is you're going to swap out, which you may have imagined. As soon as one of your input buffers empties, let's say this one's almost gone, as soon as it empties, you're going to totally obliterate that funnel and swap in this one, in order to merge all the stuff there and fill this buffer back up. This is where the amortization comes in. And this is where the log factor comes in, because so far we've basically paid a linear cost.

We are almost done. So, we charge, sorry, I'm jumping ahead of myself. So, when an input buffer empties, we swap out, and we recursively fill that buffer. OK, I'm going to assume that there is absolutely no reuse, that is, the recursive filling completely swaps everything out and I have to start from scratch for this funnel. So, when that happens, I fill this buffer, and then I come back and, well, I swap it back in. So when the recursive call finishes, I swap back in. OK, so I recursively fill, and then I swap back in.
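To make the control flow concrete, here is a toy sketch of that lazy fill procedure in Python. This is my illustration, not code from the lecture: Node, fill, and the capacities are made-up names, buffers are plain lists, and no actual caching happens, so it shows only the recursion (refill an input buffer when it empties, then resume merging), not the memory-transfer behavior being analyzed.

    class Node:
        def __init__(self, capacity, children=(), leaf_items=None):
            self.capacity = capacity        # output buffer size for this node
            self.children = list(children)  # child merge nodes (empty at a leaf)
            self.leaf_items = leaf_items    # sorted input run, at leaves only
            self.buffer = []                # current output buffer contents
            self.exhausted = False          # True once no more output will come

    def fill(node):
        """Refill node's output buffer by merging its children's buffers."""
        if node.leaf_items is not None:     # leaf: hand out the next input chunk
            node.buffer = node.leaf_items[:node.capacity]
            del node.leaf_items[:node.capacity]
            node.exhausted = not node.buffer
            return
        while len(node.buffer) < node.capacity:
            for c in node.children:         # an input buffer emptied: swap out,
                if not c.buffer and not c.exhausted:
                    fill(c)                 # recursively fill it, then come back
            live = [c for c in node.children if c.buffer]
            if not live:                    # every input is used up
                node.exhausted = True
                return
            smallest = min(live, key=lambda c: c.buffer[0])
            node.buffer.append(smallest.buffer.pop(0))

    # Merging four sorted runs with a two-level funnel of these toy nodes:
    leaves = [Node(2, leaf_items=r) for r in ([1, 5], [2, 6], [3, 7], [4, 8])]
    mid = [Node(4, children=leaves[:2]), Node(4, children=leaves[2:])]
    root = Node(8, children=mid)
    out = []
    while root.buffer or not root.exhausted:
        if not root.buffer:
            fill(root)
        else:
            out.append(root.buffer.pop(0))
    print(out)   # [1, 2, 3, 4, 5, 6, 7, 8]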

And the swapping back in costs J^3 over B. I'm going to charge that cost to the elements that just got filled. So this is an amortized charging argument. How many elements are there? That's the only question. It turns out things are really good. Like here, for the square root of K funnel, each buffer has size K to the three halves. OK, so this is a bit complicated, but I claim that the number of elements that fill the buffer is J^3. So, if you have a J funnel, each of its input buffers has size J^3. It should be correct if you work it out. So, we're charging this J^3 over B cost to J^3 elements, which sounds like you're charging, essentially, one over B to each element.
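In other words (just dividing the two quantities from the last paragraph), the per-element charge from one swap-in is

\[
\frac{J^3/B \ \text{transfers per swap-in}}{J^3 \ \text{elements filled}} \;=\; \frac{1}{B} \ \text{transfers per element.}
\]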

Sounds great. That means, so you're thinking, overall there are N elements, and to each one you charge a one over B cost. That sounds like the total running time is N over B. That's a bit too fast for sorting. We lost the log factor. So, what's going on is that we're actually charging to one element more than once. And this is something we don't normally do, we've never done it in this class, but you can do it as long as you bound the number of times you charge. OK, and whenever you do a charging argument, you say, well, this doesn't happen too many times, because whenever this happens, that happens. You should prove that the thing you're charging to isn't charged to very many times. So here, I have a quantifiable thing that I'm charging to: elements. So, I'm saying that for each element that happens to come into this buffer, I'm going to charge it a one over B cost.

How many times does one element get charged? Well, each time it gets charged, it has moved into a new buffer. How many buffers could it move through? Well, it's just going up all the time. Merging always goes up. So, we start here and you go to the next buffer, and you go to the next buffer. The number of buffers you visit is the right log, it turns out. I don't know which log that is yet. So, the number of charges of a one over B cost to each element is the number of buffers it visits, and that's a log factor. That's where we get an extra log factor on the running time. It is the number of levels of J funnels that you can visit. So, it's log K divided by log J, if I got it right.

OK, and we're almost done. Let's wrap up a bit. Just a little bit more arithmetic, unfortunately. So, log K over log J. Now, J^2 is like M, roughly. It might be square root of M. But log J is basically log M; there are some constants there. So, the number of charges here is theta of log K over log M. So now, this is a bit, we haven't seen this in amortization necessarily, but we just need to count up the total amount of charging. All work gets charged to somebody, except the very initial swapping in, which we didn't charge to anybody. But every time we do some swapping in after that, we charge it to someone. So, how much does everything get charged? Well, there are N elements. Each gets charged a one over B cost, and the number of times it gets charged is log K over log M.

So therefore, the total cost is the number of elements, times a one over B cost, times this log factor. OK, it's actually plus K. We forgot about a plus K, but that's just to get started in the very beginning, and start on all of the input lists. OK, this is an amortization analysis to prove this bound. Sorry, what was N here? I assumed that I started out with K cubed elements at the bottom. The total number of elements in the bottom was theta of K^3. OK, so I should have written K^3, not N. This should be almost the same as this, OK, but not quite. This is log base M of K, and if you do a little bit of arithmetic, this should be K^3 over B times log base M over B of K over B, plus K. That's what I want to prove. Actually there's a K^3 inside the log here instead of a K, but that's just a factor of three. And this follows because we assume we are not in the base case. So, K is at least M, which is at least B^2, and therefore K over B is omega of square root of K. OK, so K over B is basically the same as K when you put it in a log. So here we have log base M; I turned it into log base M over B. That's even worse. It doesn't matter. And I have log of K; I replaced it with K over B, but K over B is basically square root of K. So in a log, that's just a factor of a half.
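Putting the pieces together, with N = Θ(K^3) elements at the leaves, the whole chain the lecture just walked through reads (a restatement only, with constants suppressed):

\begin{align*}
\text{total cost} &= O\!\left(K + N \cdot \frac{1}{B} \cdot \frac{\log K}{\log M}\right)
                   = O\!\left(K + \frac{K^3}{B}\,\log_M K\right) \\
                  &= O\!\left(\frac{K^3}{B}\,\log_{M/B}\frac{K}{B} + K\right),
\end{align*}

using K \ge M \ge B^2, so that \log(K/B) = \Theta(\log K) and \log(M/B) = \Theta(\log M).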

So that concludes the analysis of the funnel. We get this crazy running time, which is basically the sorting bound plus a little bit. We plug that into our funnel sort, and we get, magically, optimal cache-oblivious sorting, just in time.
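And just to show the shape of the overall algorithm, here is a structural sketch of funnel sort: split the input into segments of size about N^(2/3), sort each recursively, and merge the roughly N^(1/3) sorted runs. Python's built-in heapq.merge stands in for the K-funnel merger, so this captures only the recursion, not the cache behavior; the function name and the small base-case cutoff are my own choices.

    import heapq

    def funnel_sort(items):
        # Structural sketch: ~n^(1/3) segments of size ~n^(2/3), each sorted
        # recursively, then merged K-way.  heapq.merge is a stand-in for the
        # cache-oblivious K-funnel merger analyzed above.
        n = len(items)
        if n <= 8:                            # arbitrary small base case
            return sorted(items)
        seg = max(1, round(n ** (2 / 3)))     # segment size ~ n^(2/3)
        runs = [funnel_sort(items[i:i + seg]) for i in range(0, n, seg)]
        return list(heapq.merge(*runs))       # merge the sorted runs

    print(funnel_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))   # [0, 1, 2, ..., 9]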

Tuesday is the final. The final is more in the style of quiz one, so not too much creativity, mostly mastery of material. It covers everything. You don't have to worry about the details of funnel sort, but everything else. So it's like quiz one, but for the entire class. It's three hours long, and good luck. It's been a pleasure having you, all the students. I'm sure Charles agrees, so thanks everyone. It was a lot of fun.
