I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.
The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.
There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.
The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1 & Unwanted2
For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.
Is there somehow I can run a program, Perl, Python or whatever to extract the data and automatically squirt it into a database?
I've searched online for something to help and did grab a Python script but it doesn't work for me and as my Python skills, like all my other programming skills amount to zero, I'm stuck, hence the question here.
I'd appreciate any help or pointers you could give.
On 12-Jun-11 11:14:10, Chris Walker wrote:
I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.
The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.
There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.
The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1 & Unwanted2
For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.
Is there somehow I can run a program, Perl, Python or whatever to extract the data and automatically squirt it into a database?
I've searched online for something to help and did grab a Python script but it doesn't work for me and as my Python skills, like all my other programming skills amount to zero, I'm stuck, hence the question here.
I'd appreciate any help or pointers you could give.
Can you elaborate on "I want the tables to be created from the names in column 3"?
Basically, this looks as though it would be a smooth little job using 'awk', with some backup from a simple shell wrapper, but the details would depend on how you want the extraction to come out at the end of the day. An example, maybe?
Ted.
-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 12-Jun-11 Time: 12:52:03 ------------------------------ XFMail ------------------------------
On 12/06/11 12:52, (Ted Harding) wrote:
On 12-Jun-11 11:14:10, Chris Walker wrote:
I have a spreadsheet which is currently in OpenOffice ods format but obviously can easily be saved in another format should I need to.
The spreadsheet holds roughly 6200 rows with 8 columns. I'd like to import 6 of those columns into an SQLite database and more to the point, do it automatically/programmatically rather than by hand.
There is another difficulty (it's difficult for me, it might not be for you!) is that I want the tables to be created from the names in column 3. Some of those columns are numerical and SQLite doesn't appear to like that.
The columns look like this :- Name1 Number Name2 Y/N Where Notes Unwanted1& Unwanted2
For Name2, the rows repeat so there might be half a dozen of the same names, like 1648, 8Sail, Abbeydale against the rows in Name1.
[snip]
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
I want the table to be called the Brewery Name e.g. Woodfordes.
Basically, this looks as though it would be a smooth little job using 'awk', with some backup from a simple shell wrapper, but the details would depend on how you want the extraction to come out at the end of the day. An example, maybe?
I hope my example above illustrates what I want to do.
Chris Walker cdw_alug@the-walker-household.co.uk wrote:
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
So you want multiple tables, with each tables name being the Brewery Name?
Why not have one big table, with the Brewery Name as a column?
Richard
On 13/06/11 10:20, Richard Parsons wrote:
Chris Walkercdw_alug@the-walker-household.co.uk wrote:
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
So you want multiple tables, with each tables name being the Brewery Name?
Why not have one big table, with the Brewery Name as a column?
I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.
Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.
Chris Walker cdw_alug@the-walker-household.co.uk wrote:
On 13/06/11 10:20, Richard Parsons wrote:
Chris Walkercdw_alug@the-walker-household.co.uk wrote:
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
So you want multiple tables, with each tables name being the Brewery Name?
Why not have one big table, with the Brewery Name as a column?
I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.
Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.
Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
Thanks Richard
On 13/06/11 10:39, Richard Parsons wrote:
Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
You probably hit the nail on the head with your first comment :-)
I'll take a look at the site. Thanks for that.
On 13 Jun 10:39, Richard Parsons wrote:
Chris Walker cdw_alug@the-walker-household.co.uk wrote:
On 13/06/11 10:20, Richard Parsons wrote:
Chris Walkercdw_alug@the-walker-household.co.uk wrote:
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
So you want multiple tables, with each tables name being the Brewery Name?
Why not have one big table, with the Brewery Name as a column?
I already have a big table of Brewery Names. I then want to be able to click on the Brewery Name and have SQL display a list of their beers.
Here I want to extract a list of the beers (plus other data) from the OO spreadsheet.
Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
So, what I'd be looking at is:
Table : Brewery Columns: id name description
Table : Beer Columns: id brewery_id name description
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
Cheers,
Brett Parker iDunno@sommitrealweird.co.uk wrote:
So, what I'd be looking at is:
Table : Brewery Columns: id name description Table : Beer Columns: id brewery_id name description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
That's a mouthful. :)
Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.
Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).
Richard
On 13 Jun 11:37, Richard Parsons wrote:
Brett Parker iDunno@sommitrealweird.co.uk wrote:
So, what I'd be looking at is:
Table : Brewery Columns: id name description Table : Beer Columns: id brewery_id name description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
That's a mouthful. :)
Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.
Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser, and reinvention of wheels is bad :)
It appears that there are python modules for interactive with openoffice, but I've not played with them yet, so I can't recommend them. Certainly previously when faced with a similar situation I've gone the csv and python route because it was the path of least resistance (and I was in a hurry...)
On 13-Jun-11 11:02:16, Brett Parker wrote:
On 13 Jun 11:37, Richard Parsons wrote:
Brett Parker iDunno@sommitrealweird.co.uk wrote:
So, what I'd be looking at is:
Table : Brewery Columns: id name description Table : Beer Columns: id brewery_id name description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
That's a mouthful. :)
Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.
Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,
Err, I respectfully disagree ... (in other words, b*****ks)!
Copy the following into a console:
awk 'BEGIN{FS=","; i=0}; {i=i+1; print $i}' << EOT Row 1,Field 1.1,Field 1.2,Field 1.3 Row 2,Field 2.1,Field 2.2,Field 2.3 Row 3,Field 3.1,Field 3.2,Field 3.3 Row 4,Field 4.1,Field 4.2,Field 4.3 EOT
and you will get:
Row 1 Field 2.1 Field 3.2 Field 4.3
(printing field i=1 from line 1, field i=2 from line 2, ... ).
So awk has parsed the CSV! The trick is to set the Field Separator (FS=",") to comma. After that you can do what you like.
Ted.
and reinvention of wheels is bad :)
It appears that there are python modules for interactive with openoffice, but I've not played with them yet, so I can't recommend them. Certainly previously when faced with a similar situation I've gone the csv and python route because it was the path of least resistance (and I was in a hurry...) -- Brett Parker
-------------------------------------------------------------------- E-Mail: (Ted Harding) ted.harding@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 13-Jun-11 Time: 12:35:30 ------------------------------ XFMail ------------------------------
On 13 Jun 12:35, Ted Harding wrote:
On 13-Jun-11 11:02:16, Brett Parker wrote:
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,
Err, I respectfully disagree ... (in other words, b*****ks)!
Copy the following into a console:
awk 'BEGIN{FS=","; i=0}; {i=i+1; print $i}' << EOT Row 1,Field 1.1,Field 1.2,Field 1.3 Row 2,Field 2.1,Field 2.2,Field 2.3 Row 3,Field 3.1,Field 3.2,Field 3.3 Row 4,Field 4.1,Field 4.2,Field 4.3 EOT
and you will get:
Row 1 Field 2.1 Field 3.2 Field 4.3
(printing field i=1 from line 1, field i=2 from line 2, ... ).
So awk has parsed the CSV! The trick is to set the Field Separator (FS=",") to comma. After that you can do what you like.
I respectfully call you a fucking moron and ask you what happens if your csv does something like:
"Row 1","This is a description, it's quite long and contains commas",97 "Row 2","This is a different description without commas",12
Now, please, ffs, learn that just because it's comma seperated doesn't mean that the field doesn't contain commas. The number of times I've sat and had to debug someone elses shitty code because rather than using a csv library that's been tested they decided that they could just split on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.
On 13/06/11 12:54, Brett Parker wrote:
On 13 Jun 12:35, Ted Harding wrote:
On 13-Jun-11 11:02:16, Brett Parker wrote:
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,
Err, I respectfully disagree ... (in other words, b*****ks)!
[snip]
I respectfully call you a fucking moron and ask you what happens if your csv does something like:
People, please. I'm glad I started this now!
on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.
Do you turn green? ;-)
I always preferred TSV or fixed width data rather than CSV because of the possibility of the data containing commas. Mine does here.
I'm away from the keyboard this afternoon and when I return I'll be looking at the website that Richard mentioned.
While I'm away, can you play nicely please?
On 13 Jun 13:57, Chris Walker wrote:
On 13/06/11 12:54, Brett Parker wrote:
On 13 Jun 12:35, Ted Harding wrote:
On 13-Jun-11 11:02:16, Brett Parker wrote:
I would *never* suggest parsing csv in bash. Ever. Python's csv module is lovely, and perl's DBD::CSV is fine. Awk hasn't got a csv parser,
Err, I respectfully disagree ... (in other words, b*****ks)!
[snip]
I respectfully call you a fucking moron and ask you what happens if your csv does something like:
People, please. I'm glad I started this now!
Well, you know, I take exception to people telling me "b*****ks" and give an example that fails on a trivial dataset...
on commas is no longer funny. And when people turn up on a list with a lot of technical people on it suggesting doing just this it makes me Very Angry. And you don't want to see me angry.
Do you turn green? ;-)
I always preferred TSV or fixed width data rather than CSV because of the possibility of the data containing commas. Mine does here.
Same problem, content can contain tabs... this is why the CSV spec ain't small, and actually covers many different delimeters, and what to do with them...
And as you've just proven my point (the content does contain commas...) I think it was a justified response :)
I'm away from the keyboard this afternoon and when I return I'll be looking at the website that Richard mentioned.
While I'm away, can you play nicely please?
I always play nice. It just might not seem that way occasionally... I'm saving hours of debugging and trying to work out "why did that break like that?" when given a non-trivial dataset.
On 13/06/11 11:37, Richard Parsons wrote:
Brett ParkeriDunno@sommitrealweird.co.uk wrote:
So, what I'd be looking at is:
Table : Brewery Columns: id name description Table : Beer Columns: id brewery_id name description
That looks much better to me. Now you need to write some sql that creates that database. Have a look at the sqlite documentation for that.
Writing the code would be a problem for me, so, I cheat! I use this - http://sourceforge.net/projects/sqlitebrowser/ under WindowsXP running in VirtualBox.
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
That's a mouthful. :)
Now back to your original question. Can you convert the spreadsheet to csv? That will make it easier to parse. Perhaps then give the first few lines of that csv, so we can help with parsing it.
Yes I can, but having seen the 'discussion' that erupted afterwards, I'd say TSV is preferred. Like others, I spent ages trying to fix data import for a certain large insurance company for mailshots.
Also, which scripting language do you want to do this in? Bash is probably easiest for me. Python will do it easily enough. Perl can do it, but I don't know perl (others on the list probably do). Someone suggested awk: that would probably be neater that bash/python, but I'm sure not as experienced in it. I'm sure there's lots more options (ruby etc).
I *might* be able to cope with Python. I have no experience of Perl but Python having similar structures to BBC Basic - my only programming experience - I'd be prepared to give that a go.
On 13/06/11 11:06, Brett Parker wrote:
On 13 Jun 10:39, Richard Parsons wrote:
Maybe, you need to re-think your proposed database structure. I'm no expert, but when you add data (like another Brewery Name) to a database you shouldn't need to add a new table. Have a look at this article which I think may help: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
So, what I'd be looking at is:
Table : Brewery Columns: id name description Table : Beer Columns: id brewery_id name description
Thus having a one to many relationship between brewery and beer, storing the id of the brewery in the beer means that you can change the fields of the brewery (in the event of it being renamed, or the description changing, etc), and that applies immediately to all beers of the brewery.
This is normal database normalisation.
I've created a couple of tables. This what they look like (csv output from the creator program) Brewery :- "ID","Name","Web_URL","Address"
ID is the 'TEXT_PRIMARY_KEY' and the rest of the fields are set as text.
Beers :- "ID","Beer_Name","Brewery_ID","ABV","Consumed_Yes_No","Where_Consumed","Tasting_Notes"
Again, ID is the 'TEXT_PRIMARY_KEY' and the rest of the fields are set as text. I wondered if I should have set Consumed_Yes_No as a BOOL element but I can't see how to do that in the program I'm using. Is SQL itself better at that or is it unnecessary?
So the first real question is, have I done that correctly? I wondered if I need an ID in the Beers table or could the Beer_Name be used as that? I realise though that lots of beers have the same name and the web site cautions against that.
Second question. Do I link the relationship between the tables in SQL code or in the program code that will make use of this data? I can see how to do the latter from this section of code :-
model->setTable("employee");
model->setRelation(2, QSqlRelation("city", "id", "name")); model->setRelation(3, QSqlRelation("country", "id", "name"));
The code is taken from one of the examples used here - http://doc.qt.nokia.com/4.7-snapshot/qsqlrelationaltablemodel.html
On 13/06/11 10:20, Richard Parsons wrote:
Chris Walkercdw_alug@the-walker-household.co.uk wrote:
Can you elaborate on "I want the tables to be created from the names in column 3"?
This isn't my data but imagine a list of pub names in Name1, a score in Number, Brewery Name in Name2, Yes/No in Y/N, a simple address in Where and random text in Notes.
So you want multiple tables, with each tables name being the Brewery Name?
That was my initial thought but I'm wading through the advice on the site you gave and now I'm far from sure that I'm doing things the right way at all.
Why not have one big table, with the Brewery Name as a column?
Give me a little while to digest the info on the web site, and I'll come back with some ideas, based on what I *think* I've learnt.