Working with arrays for storing data in MySQL

TL;DR : what is the best way to store array-like information (that is, more than one information) in MySQL tables?

Newbie here.

I’m starting a site that’s pretty much a Wikipedia but, instead of articles, it has only listed informations (I don’t have the pretension of beating Wikipedia or whatever, it’s a personal (open source, ASAP) project that will help me filling information I read). Example:

Instead of “Socrates” page:

"Socrates.

Socrates was a philosopher from Ancient Greece that lived in Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua…"

One gets:

"Socrates.

  1. Lived in Athens 400BC (source)
  2. Philosopher from Ancient Greece (source)
  3. Teacher of Plato (source)
  4. etc.

An important difference to Wikipedia (amongst other differences) is that all listed infos must have a source and may be voted by other users.

To implement this idea, I wanted to use JS for user interaction (e.g., filtering or ordering information), PHP for server scripts and MySQL to store all data (infos, comments, users, etc.). Thinking about the voting aspect, I thought about creating a table in the database for each information (in the former Socrates example, “Lived in Athens 400BC”) that is voted up or down. The “votes” table would have these columns:

CREATE TABLE IF NOT EXISTS votes (
    info_id INT UNSIGNED NOT NULL, -- the id of the information (e.g. "Lived in Athens 400BC", id number: 42) that is stored in another table called 'informations'
    upvoted_users MEDIUMTEXT NOT NULL, -- a string with the ids (stored along with 'username' etc in another table called 'users') of all the users that upvoted the information 
    downvoted_users MEDIUMTEXT NOT NULL
)  DEFAULT CHARSET UTF8;

The string for upvoted_users ids would be something like: 'userid1, userid2, user white space id3, ...'. This string would be parsed by PHP and, after calculating count(array(upvoted_users)) - count(array(downvoted_users)), the number would be sent to display on the page the voting number of the information.

My question: Am I doing this the right way? Suppose that instead of up/down voting I wanted to store a 0-10 [int] rating that the users could give, would storing rating_users as a string (userId:rate): 'userid1:4, userid2:9, user white space id3:0, ...' be good or is there a better way for doing it?

As an aside, where do I go to ask people to join my project? I thought about the section “Project Feedback” in this forum, but I don’t have the project ready yet.

Thanks a lot, and happy 2017!

You have the basic idea for the votes table correct. Instead of storing up votes and down votes as one line comma separated strings you really just want to have a row stored for each up/down vote based on info_id.

You will get your total by counting the number of rows returned for any given info_id.

As far as the data types go. You don’t need MediumText for the up/down votes. I imagine your user IDs are integers and not strings and these should match types between the tables. One because it makes sense that they should match and two for performance reasons.

1 Like

Thank you for the input! But I’m not sure I understood it correctly, you mean that I should create a table with more or less these columns? :

vote_id  |  info_id  |   user_id  |  up_or_down

Row example:

23            34           42         1 (up, from -1 or +1)

I think this is much saner, and has the benefit of tracking users votes etc. But do you think this could impact performance since every time a user accesses a webpage MySQL will have to create a temporary table for a specific info_id and sum() the up_or_down column? On the other hand, as a string, PHP would have to parse it anyway… Or maybe I could store the total votes value in the ìnfo table along with the info_id and just update it… I’m asking this because I just learned MySQL and I never used it, so I don’t know if in real life it can slow down things.

And I agree with what you said about ints instead of mediumtext, but did you have a specific case in mind for the performance reasons you mentioned? For example, because text is not stored in the row?

Thanks a lot for the explanations!

1 Like

Yes that is what I was talking about. You can store many (thousands upon thousands) rows, and you wont see any slowdown because you would call the count function built into MySQL to return the total # of records where info_id=info_id_of_choosen_topic so that is only giving you one value…MySQL is doing the count, not PHP (I mean you could also pull all records in PHP and get the count that way)…but doing that is a lot of overhead…you will learn that isn’t the best way to do it for performance.

The reason I mentioned having the data types the same is because you will probably be doing joins on your tables. That is where the performance comes into play. Also if they aren’t the same type there will be some conversions happening to make them the same for comparison (again for joins) that might produce unexpected results.

1 Like

Excellent, thank you very much for the replies!