{"id":2654,"date":"2010-05-21T14:09:25","date_gmt":"2010-05-21T13:09:25","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=2654"},"modified":"2010-05-21T14:09:25","modified_gmt":"2010-05-21T13:09:25","slug":"matlab-tutorial-reading-csv-files","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=2654","title":{"rendered":"MATLAB Tutorial &#8211; Reading csv files"},"content":{"rendered":"<p>Reading <a href=\"http:\/\/en.wikipedia.org\/wiki\/Comma-separated_values\">comma separated value<\/a> (csv) files into <a href=\"http:\/\/www.mathworks.com\/\">MATLAB<\/a> is trivial as long as the csv file you are trying to import is trivial. For example, say you wanted to import the file <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/csv\/very_clean.txt\">very_clean.txt<\/a>, which contains the following data:<\/p>\n<pre>1031,-948,-76\r\n507,635,-1148\r\n-1031,948,750\r\n-507,-635,114<\/pre>\n<p>The following very simple command is all that you need:<\/p>\n<pre>&gt;&gt; veryclean = csvread('very_clean.txt')\r\n\r\nveryclean =\r\n\r\n        1031        -948         -76\r\n         507         635       -1148\r\n       -1031         948         750\r\n        -507        -635         114<\/pre>\n<p>In the real world, however, your data is rarely this nice and clean. One of the most common problems faced by MATLABing data importers is that of header lines. Take the file <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/csv\/quite_clean.txt\">quite_clean.txt<\/a>, for instance. This is identical to the previous example except that it contains two header lines:<\/p>\n<pre>These are some data that I made using my hand-crafted code\r\nDate:12th July 1996\r\n1031,-948,-76\r\n507,635,-1148\r\n-1031,948,750\r\n-507,-635,114<\/pre>\n<p>This is all too much for the <strong>csvread<\/strong> command:<\/p>\n<pre>&gt;&gt; data=csvread('quite_clean.txt')\r\n??? 
Error using ==&gt; dlmread at 145\r\nMismatch between file and format string.\r\nTrouble reading number from file (row 1, field 1) ==&gt; This\r\n\r\nError in ==&gt; csvread at 52\r\n    m=dlmread(filename, ',', r, c);<\/pre>\n<p>Not to worry, we can just use the more capable <a href=\"http:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/ref\/importdata.html\">importdata<\/a> command instead:<\/p>\n<pre>&gt;&gt; quiteclean = importdata('quite_clean.txt')\r\n\r\nquiteclean = \r\n\r\n        data: [4x3 double]\r\n    textdata: {2x1 cell}<\/pre>\n<p>The result above is a <a href=\"http:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/ref\/struct.html\">structure array<\/a> with two fields: our numerical values are in the field called data, and the two header lines are in textdata. Here&#8217;s how you get at the numbers.<\/p>\n<pre>&gt;&gt; quiteclean.data\r\n\r\nans =\r\n\r\n        1031        -948         -76\r\n         507         635       -1148\r\n       -1031         948         750\r\n        -507        -635         114<\/pre>\n<p>So far so good. How do we handle a file like <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/csv\/messy_data.txt\">messy_data.txt<\/a> though?<\/p>\n<pre>header 1;\r\nheader 2;\r\n1031,-948,-76, ,\"12\"\r\n507,635,-1148, ,\"34\"\r\n-1031,948,750, ,\"45\"\r\n-507,-635,114, ,\"67\"<\/pre>\n<p>This is the kind of file encountered by Walking Randomly reader <a href=\"https:\/\/www.walkingrandomly.com\/?p=1502#comment-24690\">&#8216;reen&#8217;<\/a> and it contains exactly the same numerical values as the previous two examples. Unfortunately, it also contains some cruft that makes life more difficult for us. 
Let&#8217;s bring out the big guns!<\/p>\n<h3>Using textscan to import csv files in MATLAB<\/h3>\n<p>When the going gets tough, the tough use <a href=\"http:\/\/www.mathworks.co.uk\/access\/helpdesk\/help\/techdoc\/ref\/textscan.html\">textscan<\/a>. Here&#8217;s the incantation for importing messy_data.txt:<\/p>\n<pre>fid=fopen('messy_data.txt');\r\ndata = textscan(fid,'%f %f %f %*s %*s','HeaderLines',2,'Delimiter',',','CollectOutput',1);\r\nfclose(fid)<\/pre>\n<p>The result is a one-element cell array that contains an array of doubles. Let&#8217;s get the array of doubles out of the cell:<\/p>\n<pre>&gt;&gt; data=data{1}\r\ndata =\r\n        1031        -948         -76\r\n         507         635       -1148\r\n       -1031         948         750\r\n        -507        -635         114<\/pre>\n<p>If the <strong>importdata<\/strong> command is a chauffeur then <strong>textscan<\/strong> is a Ferrari and I don&#8217;t know about you but I&#8217;d much rather be driving my own Ferrari than being chauffeured around (John Cook over at The Endeavour has <a href=\"http:\/\/www.johndcook.com\/blog\/2010\/04\/27\/chauffers-and-ferraris-revisited\/\">more to say on Ferraris and Chauffeurs<\/a>).<\/p>\n<p>Let&#8217;s deconstruct the above set of commands. The first thing to notice is that, unlike <strong>csvread<\/strong> and <strong>importdata<\/strong>, you have to explicitly open and close your file when using the <strong>textscan<\/strong> command. So, you open your file using <strong>fopen<\/strong>, which returns a file ID (stored in this example in <strong>fid<\/strong>).<\/p>\n<pre>fid=fopen('messy_data.txt');<\/pre>\n<p>The first argument to textscan is just this file ID, fid. Next you need to supply a conversion specifier, which in this case is<\/p>\n<pre>'%f %f %f %*s %*s'<\/pre>\n<p>The conversion specifier tells <strong>textscan<\/strong> what you want each row in your csv file to be converted to. 
<strong>%f<\/strong> means <em>&#8220;64-bit double&#8221;<\/em> and <strong>%s<\/strong> means <em>&#8220;string&#8221;<\/em>, so <strong>&#8216;%f %f %f %s %s&#8217;<\/strong> means <em>&#8220;3 doubles followed by 2 strings&#8221;<\/em> (we&#8217;ll get onto the asterisks in the original specifier later). You can use all sorts of data types in a conversion specifier, such as integers, quoted strings and pattern-matched strings, among others. Check out the MATLAB <a href=\"http:\/\/www.mathworks.co.uk\/access\/helpdesk\/help\/techdoc\/ref\/textscan.html\">documentation for textscan<\/a> for the full list, but an abbreviated list is shown below:<\/p>\n<pre>%d signed 32-bit integer\r\n%u unsigned 32-bit integer\r\n%f 64-bit double (you'll want this most of the time when using MATLAB)\r\n%s string<\/pre>\n<p>Now, in the command I used to import messy_data.txt, the conversion specifier contained some asterisks, such as <strong>%*s<\/strong>, so what do these mean? Quite simply, the asterisk just means <em>&#8216;ignore&#8217;<\/em>, so <strong>%*s<\/strong> means <em>&#8216;ignore the string in this field&#8217;<\/em>. So, the full meaning of my conversion specifier <strong>&#8216;%f %f %f %*s %*s&#8217;<\/strong> is <em>&#8220;read 3 doubles and ignore 2 strings&#8221;<\/em>, and textscan will do this for every row.<\/p>\n<p>The rest of the command is pretty self-explanatory but I&#8217;ll explain it anyway for the sake of completeness.<\/p>\n<pre>'HeaderLines',2<\/pre>\n<p>The file has 2 header lines, which should be ignored.<\/p>\n<pre>'Delimiter',','<\/pre>\n<p>The fields are delimited (a posh word for separated) by a comma.<\/p>\n<pre>'CollectOutput',1<\/pre>\n<p>If you supply a 1 (which stands for True) to the CollectOutput option then textscan will join consecutive output cells with the same data type into a single array. 
Since I want all of my doubles to be in a single array then this is the behaviour I went for.<\/p>\n<p>Finally, once you have finished textscanning, don&#8217;t forget to close your file<\/p>\n<pre>fclose(fid)<\/pre>\n<p>That&#8217;s pretty much it for this mini-tutorial &#8211; I hope you find it useful.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reading comma separated value (csv) files into MATLAB is trivial as long as the csv file you are trying to import is trivial. For example, say you wanted to import the file very_clean.txt which contains the following data 1031,-948,-76 507,635,-1148 -1031,948,750 -507,-635,114 The following, very simple command, is all that you need &gt;&gt; veryclean = [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,7,42],"tags":[],"class_list":["post-2654","post","type-post","status-publish","format-standard","hentry","category-matlab","category-programming","category-tutorials"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-GO","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/2654","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"h
ttps:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2654"}],"version-history":[{"count":19,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/2654\/revisions"}],"predecessor-version":[{"id":2673,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/2654\/revisions\/2673"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2654"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2654"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2654"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}