Database Discussion #12
Comments
Yes, you should use the schema builder for this task: http://four.laravel.com/docs/schema
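For illustration, a minimal sketch of the schema builder in use; the table and column names here are hypothetical rather than an agreed Symphony schema:

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Hypothetical example: declare a basic `fields` table in PHP,
// letting Laravel generate the underlying SQL.
Schema::create('fields', function (Blueprint $table) {
    $table->increments('id');   // auto-incrementing primary key
    $table->string('label');    // human-readable field label (assumed)
    $table->string('type');     // e.g. 'text', 'checkbox' (assumed)
    $table->timestamps();       // created_at / updated_at
});
```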
I'm looking forward to something like that, just jotting it down here :o)
That sounds like a much better way of approaching field handling in the database. The current Symphony approach of creating a new table for each field instance seemed a little... strange... Also, don't forget about migrations for any DB changes. And possibly even generators for some really quick setups. :)
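On the migrations point: a migration is just a class wrapping schema calls in up()/down() methods, so every change can be rolled back. A minimal sketch, with an assumed class and table name:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Hypothetical migration: up() applies the change, down() reverses it.
class CreateFieldsTable extends Migration
{
    public function up()
    {
        Schema::create('fields', function (Blueprint $table) {
            $table->increments('id');
            $table->string('label');
        });
    }

    public function down()
    {
        Schema::drop('fields');
    }
}
```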
NoSQL?
Good call. I was just thinking the same in terms of possible integration with other database types such as MongoDB. I'm not sure what Laravel provides in terms of support though, so I guess that would be the benchmark limitation.
I would hazard a guess that we can expect this to change. One of the benefits of using Eloquent, Schema and Migrations is the separation from SQL operations.
I've added some Laravel…
Forget extensions and delegates; they'll be obsolete since we have a real pub/sub system. Extensions shouldn't be registered in the database either.
That's what I thought. I have lots more to push today.
Thanks for doing this @designermonkey. Also, I would love to see the Pages and Sections removed from the database, like the Data Sources. These are the two main problems I have to deal with when working with a team on a Symphony project: they make syncing the DB export mandatory, and conflicts are almost impossible to merge manually.
I imagine they will be, but to begin with we could stick to what we know and scope it out to be in the database. I am a big believer that it should be XML based, and will be pushing for that myself.
Agree, wise approach.
Gist updated.
On entries (or everything), is it possible to work with SHA-1 timestamps as primary keys instead of auto-increment integers? Such keys could open a lot of doors (versioning, sync, pluggable stuff)...
I did a little research about this a while back, when we were discussing IDs and UIDs for XML based schemas, and how to match them throughout the schemas and database, and one point stuck out: using UIDs in a database goes against its natural method of relating data, and is therefore slower at indexing and returning data, although this speed decrease isn't noticeable until you get into the 1000s of rows in a table. That being said, it would be a great thing to do to relate data inside and outside of the database. Any XML schema will need a database record to ensure it is the same file if any of its values are changed. I'll try and find the resources I was reading again.
I've renamed this thread to allow discussion on more DB topics.
I've been researching again, and from what I see my comment above stands. There are advantages to using GUIDs in place of IDs, one of which is database replication, which becomes really easy to maintain when doing away with RDBMS-generated IDs. It all comes down to speed and performance versus portability of data across systems. As we want to make a system that allows data to be packaged and used across different installs (passing sections around, for example), GUIDs are the way to go.

One of the arguments against it, raised by @brendo a while back (although I forget where), was using IDs in URLs. Another issue we would encounter is that Eloquent and Laravel are designed to use RDBMS-generated IDs when inserting data, joining tables, ORM etc, so we would have a headache there.

I'm all for GUIDs personally, and have wanted to do this for a while, but we have to weigh up the pros and cons, and whether we could coerce the Laravel framework to use them fluently instead of IDs.
Actually, thinking about my comment about Eloquent and Laravel, I don't think it would be an issue: they would normally rely on an auto increment value, whereas we would be passing the value into a field without an auto increment. It should be fine then. We just have to weigh up the pros and cons, and what we actually want to achieve here. +1 from me though :o)
Some useful opinions and some pros and cons:
Seems like it has more benefits than disadvantages. It looks like Eloquent can also be extended when getting and setting the ID. And PHP has a ready-made function for this in uniqid.
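A sketch of how an Eloquent model might be pushed in that direction: disabling auto-increment and assigning a generated key in a creating hook is standard Eloquent machinery, but the model name and key recipe below are assumptions:

```php
<?php

use Illuminate\Database\Eloquent\Model;

// Hypothetical model using a generated string key instead of an
// auto-increment integer.
class Entry extends Model
{
    // Tell Eloquent not to expect an auto-incrementing id.
    public $incrementing = false;

    protected static function boot()
    {
        parent::boot();

        // Assign a unique id just before the row is first saved.
        static::creating(function ($entry) {
            $entry->id = uniqid('', true);
        });
    }
}
```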
Funny, those are two of the articles I just read :o) Something simple like the snippet below takes no time at all to produce a completely unique hash. And then, even if we only use the first 6-10 characters, we won't get any collisions.
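Presumably something along these lines:

```php
<?php

// Illustrative recipe: seed uniqid() with a random number and hash
// the result for a fixed-length, effectively unique string.
$hash = sha1(uniqid(mt_rand(), true));
```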
Ha, being top of Google gets results. ;) I've never really thought about GUIDs before (never had the need) but now they've been mentioned I can see their benefits. As you say, I can't really think of many use cases where the ID is called in the URL, and the benefits of having truly unique entry IDs do seem to outweigh a few extra bytes to store them. Laravel uses an application key which could be used for the seed, although that may just be in the full-stack version... Mcrypt would probably do the job anyway.
The only reason this has ever come up with Symphony in the past is when we were preparing to migrate the structure (Sections, Fields, Pages) to XML. To do that and allow portability between systems would need truly unique identifiers, and a system that could manage what to do when they were different: a database table of records of the unique IDs that were loaded, and a check of the filesystem folders in which they resided to check for changes/additions. In my opinion, if we are moving to this structure (which we are) it makes sense to change all of the things to GUIDs for ease of implementation. Files should have a GUID, and a 'change' GUID too to track revisions, but we have a thread all about that on the Symphony 2 repo. Edit: here's the link for reference: symphonycms/symphonycms#1002
Replication isn't that hard with IDs either. In MySQL, for example, you can use the auto_increment_increment config, which will make sure there are no more conflicts with IDs. Using GUIDs will be a solution to a few problems, but merging database records will always be a can of worms.
The auto increment offsets can indeed work magic.
This discussion has been on my mind for a while now, and the more I think about it, the more I think we are headed in the wrong direction. The biggest problem with the database design as it stands now is that it is very hard to understand and query (multiple joins, having to know exactly which id is what field, etc). We could do away with one problem by naming the tables better, as @designermonkey pointed out, but why wouldn't we merge all these tables into one? Imagine selecting an entry from a single table for its section: one row per entry, one column per field.
Doing it like this would also mean that we can use Eloquent as it is meant to be used. Thoughts?
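To make the proposal concrete, a hedged sketch: a hypothetical articles section stored as one table, one row per entry, one column per field attribute (all names invented):

```php
<?php

use Illuminate\Database\Eloquent\Model;

// Hypothetical: the `articles` table has columns such as
// title_value, title_handle, body_value...
class Article extends Model
{
    protected $table = 'articles';
}

// Fetching an entry is then a single query with no joins:
$article = Article::where('title_handle', 'my-first-post')->first();
echo $article->title_value;
```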
The concept of abstract sections and fields is one that bothered me for a long time when first looking at Symphony, but over the time I've used it, I've come to understand why it is a necessity in a way. Normalising it as much as possible is required to move forward, but to do what we know and love in Symphony is to use a method like the one I outline above. We wouldn't need too many joins, as each table would hold all the data for one field type.
I don't. I am working a lot in the database directly, so I can find my way. But I don't really understand why this level of normalization is better than having all fields of a section in a single table (as proposed by @creativedutchmen). However, I remember that (when Symphony 2 started) there were discussions about reducing the normalization level, but in the end it was considered the better solution. Maybe @allen can tell us the reasons. Or maybe you can, @designermonkey?
We have two concepts at play here (for those reading this who don't get it): abstraction and normalisation. Abstraction generally makes a database or system more manageable by software like PHP, whereas normalisation makes it more manageable by users. In the case of Symphony, the database is entirely managed by the software, and in itself has some abstract concepts at play. Symphony builds Sections and Fields as though they were tables in a database, albeit with a lovely GUI for users to interact with them. This lets users build systems as abstract or normal as the end user needs them to be. To have this work efficiently from a code perspective, abstraction at a database level is a must, otherwise it gets complicated, with more points of possible failure in the code interacting with the database.

**Normalised database**

A database where the underlying schema is normalised to a user's understanding level is very hard to manage from a code perspective, as the schema can be changed at a whim to whatever the user needs. For a system that will have a fixed schema, like Wordpress et al, this isn't a problem, but for a schema creation system like Symphony? Bad idea.

**Abstracted database**

It's much easier to manage a database with a set schema, one that employs rows over columns to configure a system's abstract concepts. Symphony is the database as far as the user is concerned; the database in reality becomes just an abstract storage engine.

We have to remind ourselves that our level of abstraction and customisation comes at a price, and that is normalisation at RDBMS level. It's just not possible to efficiently manage this. We can of course normalise it to some degree, making the understanding of the abstract concept easier (as I propose above), but complete normalisation isn't possible for a concept like Symphony. God knows I've argued with a Symphony database a lot and sometimes hated the concept, but using it over and over again has taught me why it is needed. It's a toss-up between system-level abstraction and customisation vs database normalisation and understanding. We can't have both, and it is one of the reasons a system like Symphony is so powerful.

What I propose is to remove that initial WTF??? moment when looking at a Symphony database by merging all the entry data tables into one per field. We can't, however, reduce everything to a single table per section, as we would lose that abstract ability we have in the admin interface.
No it isn't. Normalisation is about having every bit of data in a single location, and linking to the data rather than copying it (gross simplification, I know!). I would say that the current database design is highly normalised, but it is far from easy for a user to understand. In fact, normalised databases are much easier to work with from within code, as the database itself can handle many of the additional tasks (deleting children when a parent is deleted, for example), and the risk of corrupt/incomplete/missing data is reduced. To be honest I do not understand your point there.

I do appreciate your concern with index recreation whenever a section is modified, but keep in mind these are only very time consuming when the tables you're dealing with are huge. Mine certainly aren't, especially not in the period where schema changes occur often: in early development. To be honest I would prefer a much simpler layout to work with, even if that means I have to do a huge migration at the weekend or at night.
As a sidenote: we do not have to write these schema changes by hand; migrations would handle this for us.
I'm trying to understand this better.
Do I understand this right? If every simple field was allowed to insert columns in a "section table", that would be potentially complicated or even dangerous? (Unless "migrations" would handle this, as @creativedutchmen says.) Correct?
Same point as above, right? Generally, it might be a good idea to do some research on how experienced Laravel users organize data in the database.
I'd take a look at the DB structure of craftcms.
I would disagree, as all of the data isn't in one single place; it's in many tables that are joined together to create an entry. Normalised would mean that a single entry is in a single table, as you're proposing. Normalisation, keeping data in one location, makes it manageable by users. We're arguing the same point really.
Yep. Anyway, I have had this explained to me by @allen and @brendo before as to why it was done this way. I can't find the conversations in my emails though. Also, if I remember rightly, they explained it was done the way @creativedutchmen is proposing while trialling, but it didn't work right, or something like that. I won't speak for others, as I can't fully remember the conversations.
@designermonkey are you sure? For me normalization means reduction of data replication. That is, as long as the data is available in a single place (one table), it doesn't matter how many others it is joined with. So having the first name in one table and the last name in another is completely acceptable, as long as there is not another table containing the same user's first/last name. I'd say Symphony as a 'core' is very normalized. The way Symphony is built, this step falls far more on the developer building systems with Symphony rather than on Symphony itself.
Symphony already allows me to do this; whether I've got 1 table or 100, what users will ever interact with is the Symphony backend; no matter how many tables/links we generate in the background, for the user it's always a single entry. If on the other hand you're looking at developers, that's a totally different matter. I've worked on extensions and datasources which pulled entry data from the backend, and in all honesty it's not that complicated. Maybe what would make things simpler is a better API wrapper to make interaction with the database easier and more manageable for developers who do not want to delve into the Symphony database structure.
Well I'm obviously getting my terminology wrong, but that's not the point I'm trying to make. It's just semantics.
Swap normalised for simplified and it makes more sense then.
Edit: oops, I am too late to the party with this...
Not really. Every change in the schema would be catered for by a migration, which is completely separate from the way the entries are fetched. Because the schema is very simple and clean between migrations, filtering is quite easy. You could do something like the query sketched below. Because the schema is reflected in the Models we will be using, the ORM will know how to build the queries, even if they get more complex and involve relations. The biggest reason I can think of why Symphony hasn't used this kind of schema yet is that the migration tools and the ORM are very time-consuming projects. But since they have been developed for us, I think it would be very wise to just build upon them and not try to reinvent the wheel!
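A sketch of the kind of filter meant here, with assumed table and column names:

```php
<?php

use Illuminate\Support\Facades\DB;

// Assumed schema: one table per section, one column per field attribute.
$entries = DB::table('articles')
    ->where('published', 1)
    ->orderBy('date_value', 'desc')
    ->take(10)
    ->get();
```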
Wise words. If we trust in the framework (and we must), we should use as many parts of it as possible. This means "delegating work to someone/something we trust", as opposed to "using a bit of it, but trying to do the rest better anyway". In my experience the latter generally means a lot of work with little outcome.
I'm in complete agreement that we should use an ORM, and therefore Eloquent, and therefore not re-invent the wheel, but you're both missing the point, which is: we are not building a simplified data structure, we are allowing developers to build simple and complex data structures, and these need a more complex data structure at the RDBMS level, which is why Symphony is currently built the way it is. The problem we currently have with ORM is that our PHP code has no ORM built in; it has nothing to do with the database structure. The database structure is the way it is because our developers use the Symphony admin to build what is best described as a pseudo RDBMS on top of a real RDBMS. Also, @creativedutchmen, I don't understand where, in your example, you are getting `title.handle` from.
An ORM in this context takes objects (rows) from a single database table, and relates them in object terms to another single object (row) from another single database table, using foreign keys. This has nothing to do with table layout in this context of what we're trying to achieve, and what we currently achieve with Symphony. We have to continue to think about the bigger picture here of what Symphony allows developers to do.
(Sorry for going on and on about this.) A Field would be a model. Now, granted, I haven't figured out the details of how this would work, but you can start to see how the pieces could fit together.
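A rough sketch of the shape this could take; the models and relations below are guesses to illustrate the idea, not a worked design:

```php
<?php

use Illuminate\Database\Eloquent\Model;

class Section extends Model
{
    public function fields()
    {
        return $this->hasMany('Field');   // a section defines many fields
    }

    public function entries()
    {
        return $this->hasMany('Entry');   // and holds many entries
    }
}

class Field extends Model
{
    public function section()
    {
        return $this->belongsTo('Section');
    }
}

class Entry extends Model
{
    public function section()
    {
        return $this->belongsTo('Section');
    }
}
```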
I disagree here. Even though we are building complex structures, these structures can easily be built using the schema I proposed. There is absolutely no need to abstract the data as far as you are proposing, because the tools that make a simple schema work have already been written. Just imagine having a single table for each section, and that each entry is a row in that table, with each field being a column. Doesn't this scream Symphony to you? To me it does, because it's exactly the way the backend itself is laid out. Now, to say we are building structures that are too complex for this sounds strange to me, since that would mean they no longer fit within Symphony.
It is an example. Most fields will have a few bits of data they need to store; for a textfield this is value and handle. It made sense to me to prefix the field name with the attribute, but of course this is just an example. title_value could work just as well, or titleValue, or whatever. The title.handle would simply be the name of the column in the database.
I see what you mean with the dot now. I think we need @brendo's input here.
Seems to make sense to me, with some slight concerns.
However I look at this, what @creativedutchmen is proposing just won't work. Take my Address Location Field, for instance: the field itself has 14 columns alone, not to mention the meta data in the fields table.
I think it's an important discussion, and it's much better to have it here than in a chatroom or email conversation where we would lose it after a while :) Let me add though that every schema we pick now will have its benefits and downsides. I am a big fan of putting the priority on read performance and simplicity, but I might be taking it too far when it comes to writes and schema changes; I don't know.
Having that many fields is a sign that the section itself can be normalised quite a bit. Just because something is possible doesn't mean you should do it :)
Yes. This could definitely be a performance killer when you're adding a field with a default value to a complex section with hundreds of thousands of records, as the default value will have to be inserted for each of these entries. Now, I know this is not pleasant, but keep in mind the migration that deals with this can be executed at the weekend or in the evening. The thing I really like about this approach is that it closely resembles a database itself, so everybody with experience in database design will know how data can be structured better when performance problems occur, simply because the same rules apply (the database maps almost 1:1 to what you see in Symphony).
Sure, other things won't fit either: many-to-many relations, for example. But the good thing is: it doesn't matter. Simply put, every field should make sure its own data is properly normalised. When your extension needs 14 columns, this is a sign that you could group things and put them in a separate table. There is absolutely nothing wrong with this. The many-to-many relation needs a pivot table for the entries table to remain sane, so it should create one (see the sketch below). But much simpler fields like a checkbox, file upload or textbox do not need an extra table, and I think nobody who would create a website using Laravel from scratch would put them in one.
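For the many-to-many case, Eloquent's own convention is exactly that pivot table, via belongsToMany; a sketch with invented names:

```php
<?php

use Illuminate\Database\Eloquent\Model;

// Hypothetical: entries in one section related to entries in another
// through a pivot table, as a select-box-link style field might create.
class Article extends Model
{
    public function categories()
    {
        return $this->belongsToMany('Category', 'article_category');
    }
}

class Category extends Model
{
    public function articles()
    {
        return $this->belongsToMany('Article', 'article_category');
    }
}
```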
What John has said is accurate and reflects the design decisions we've made with Symphony to this point. So to put things succinctly:
A way to think about a normalised DB is to think of each table as a Lego block. At any point, Symphony can join Lego blocks to form a cohesive data chunk. It can also drop blocks if they're no longer needed. A less normalised structure is akin to chopping up a block that is meant to be whole. The analogy can be extended to thinking about performance too: it's easier to fetch a single block than it is to fetch pieces of associated blocks and then have to combine them on the go.

So the main reason why we chose the higher normal form for Symphony is to do with our thinking about large numbers. First, we tested a large number of columns versus a large number of tables. Our tests showed that there was no significant difference in performance when going with either. However, there were differences when it came to fetch and alter requests. Not surprisingly, it's faster for the DB to select from a single, albeit wide, table, but when it comes to altering the table, performance took a significant hit. At one point I think we even managed to corrupt the table when altering it.

With the highly normalised table structure, select queries performed several notches below the alternative. However, as we reduced the number of joins required (simulating fewer included elements in XML), performance started to equalise. Since the tables in the higher normalised form are all much leaner, there were situations where it was faster than a wide table. That said, this kind of performance gain is not something you can really bank on, since we just don't know how the system is going to be used. When it came to altering tables, it was quick and easy to do, given the highly normalised nature of the structure.

At one point, we did consider a hybrid approach: core fields were combined, while extension fields were normalised and joined. This idea went by the wayside when we decided all fields in Symphony were equal citizens. The nail in the coffin for us really came down to extensions. Knowing that the system did not have to worry about extensions messing up a single shared table is something we found very comforting. It's again important to point out that our decisions were made when an ORM was not considered.
I've missed an important paragraph above. When it comes down to thinking about large numbers, the complexity of the "monolithic table" started to approach that of the "small but many tables" structure. Select statements for the monolithic table became more and more sluggish, with fewer optimisation routes possible. With the "small but many" table structure, we were able to isolate problematic areas and selectively optimise them.

Lastly, to solve the performance problem with the highly normalised table structure, I proposed the index table solution back during Symposium 2010. A data source can optionally generate an index table. It combines all the required fields into a single table, thus eliminating all joins. As these are just index tables, they can be safely altered or purged without affecting the original table structure. These tables would be super fast and maintain the flexibility offered by a database, as opposed to static cache file solutions. These index tables are essentially a less normalised form of the originating tables that offers the performance benefit, yet avoids the issues of altering large, monolithic tables.
Allen said a lot of what was in my head. The thing that is yet to be determined is how extensions are going to play a part in Next. Having one table that extensions modify is a risk (I know this is not done directly; migrations/ORM would handle it most of the time), but still, the possibility that installing an extension may corrupt the entire table is one that I imagine most users would like to avoid. At the moment, because extensions are responsible for managing their own tables/structures, the risk is mitigated. Sure, your site might come down with an exception because of a failed MySQL query, but generally speaking, your data is safe.

An ORM isn't a magic bullet. It shifts the responsibility from individuals to those that manage the ORM code, so while it's fine to say "the ORM handles it", at the end of the day this is still going to invoke an SQL query that cannot be optimised. I'm actually anticipating a performance drop when switching to an ORM, but without numbers it's just a gut feeling at the moment :)
@allen I do appreciate the point about data corruption, and I think it should be a very important factor indeed. I am not so sure I agree with your points about performance, though. Right now the queries that every DS produces are so complex I can hardly understand why they are sometimes so slow. Rewriting the query from scratch normally fixes these problems, but this is a huge pain, as I have to go back and forth between tables to see which tables I have to join. My point being: because the structure is complex, it is hard to write a DS generator that produces efficient queries. For example: the SBL suffers from the dreaded n+1 syndrome.
Can you elaborate on the n+1 syndrome? You lost me there.
@designermonkey sure, it's where you create a query to fetch the children of a parent, then you query each child for its data (1 query to get the children, then N queries to get the data of each child). This is the reason I rewrite quite a few of my datasources: getting the number of queries down from 150+ for each datasource to one (or a handful, when the query gets too complex). At least, I suspect the SBL to be the main cause of this, as I noticed that datasources that relied on an SBL field were the worst in this case. It could be that the SBL itself is not the culprit but merely shows the effects better, but still. Does that clarify what I mean? By the way: before you object saying these queries are small, fast and cached: they might be, but when you introduce a bit of latency between your webserver and your database you'll feel the pain.
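In Eloquent terms (reusing the hypothetical Section model sketched earlier), the contrast looks like this; the first loop fires one query per section, the second collapses everything into two queries via eager loading:

```php
<?php

// N+1: one query for the sections, then one per section for entries.
$sections = Section::all();
foreach ($sections as $section) {
    $entries = $section->entries;   // lazy-loaded: a query each time
}

// Eager loading: two queries total, however many sections there are.
$sections = Section::with('entries')->get();
foreach ($sections as $section) {
    $entries = $section->entries;   // already loaded, no extra query
}
```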
It certainly does now, thanks.
Me? Object?? ;o) I think we may find that Eloquent and Fluent will be worse in this respect whatever we do. We may end up having to make commits to those classes to help make them more efficient, IMO, but I haven't tried yet, so will shut up there.
Some years old... how is it coming along? When designing a database, with an ORM or without, imagine the queries you will run against it. Example: building the Add/Edit Entry view. In designermonkey's schema you would join entryID > n:1 > sectionID > 1:n > field rows, and then query the data tables themselves. Basically, you hit the database all over the place to get very, very little data out of it. Because you use many dynamic table names, the server can't prepare the queries very well. Worse: you hit basically the same tables every time this section is used in the frontend.

An ORM lets you build schemas on the fly. I would suggest using field meta information to build one entry table for every section. The O means Object: Symphony's objects should be the blog post, the comment, the page, the user, the product, etc., not the abstract thing the system uses to manage all that stuff.

As an alternative you could use views (if the ORM provides such a thing). In that case you would have a view_blogposts joining the tables field_author, field_title and field_text by sectionID & entryID. This way you could also provide different views on the same objects. Basically, you could model many datasources as SQL views.
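The view idea could be prototyped with a raw statement, since a reader doesn't need to know it is selecting from a view; a sketch with invented table and column names:

```php
<?php

use Illuminate\Support\Facades\DB;

// Hypothetical: one "object" view per section, joining the per-field
// tables on entry_id so a blog post reads like a single row.
DB::statement("
    CREATE VIEW view_blogposts AS
    SELECT e.id AS entry_id,
           a.value AS author,
           t.value AS title,
           x.value AS text
    FROM entries e
    JOIN field_author a ON a.entry_id = e.id
    JOIN field_title  t ON t.entry_id = e.id
    JOIN field_text   x ON x.entry_id = e.id
");
```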
With the new project, and the fact we're using a new way of modelling the database (Eloquent), the question has come up about how we structure our tables.
Now, currently, we have a super efficient, super scary abstract structure which, without a lot of experience staring at it and trying to decode the sorcery that gave birth to its arcane form, is very, very hard to manage outside of Symphony. With Eloquent, we will be relinquishing some of that abstraction to the ORM, which from a programming and storage point of view is a very good idea.
This doesn't mean though that we need to fully leave our abstract structure behind us, and it's with that in mind that I've been doing some playing and thinking about what would be the easiest, safest and most efficient way of applying our method of working (Sections, Fields, Entries etc) to a less abstract data model.
With Eloquent (if you've not read about it, I suggest you do, it's really good) we would be building a Model for every table in the DB, and would also have our fields do the same.
Currently, we make an entry in the `fields` table for each Field, and then have the `field_fieldtype` table for meta data for that Field. Then, we add a data table for every field instance, using an ID for reference in the name. While very abstract, this is going to be a nightmare in the Next project.

Now, I'm thinking that with the new Models, we should continue to have a `fields` table, but instead of a single Field instance table and many tables for data, we have a Field schema table and a Field data table, like so (sketched below).

In the schema and data tables, there would be an `id` column, for uniqueness, and a `field_id` column to reference the relevant Field, which would allow multiple value fields to store their multiple values in the same table. Also in the data table would obviously be an `entry_id`. I've added some SQL statements below to illustrate this in practice (although I hear we may not need SQL in Laravel?)...
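A rough schema-builder sketch of the proposed trio of tables; the `id`, `field_id` and `entry_id` columns come from the description above, everything else is an assumption:

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// One `fields` table with a row per Field, as now.
Schema::create('fields', function (Blueprint $table) {
    $table->increments('id');
    $table->string('label');        // assumed
    $table->string('type');         // assumed
});

// One shared schema table: a row per field setting.
Schema::create('fields_schema', function (Blueprint $table) {
    $table->increments('id');       // for uniqueness
    $table->integer('field_id');    // references the relevant Field
    $table->string('setting');      // assumed
    $table->text('value');          // assumed
});

// One shared data table: a row per value, so multiple-value fields
// simply store several rows for the same entry/field pair.
Schema::create('fields_data', function (Blueprint $table) {
    $table->increments('id');
    $table->integer('field_id');
    $table->integer('entry_id');
    $table->text('value');          // assumed
});
```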