{"id":6453,"date":"2018-04-12T03:34:45","date_gmt":"2018-04-12T02:34:45","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=6453"},"modified":"2018-04-12T06:09:00","modified_gmt":"2018-04-12T05:09:00","slug":"string-sorting-in-r-appears-to-use-different-ordering-from-everyone-else","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=6453","title":{"rendered":"String sorting in R appears to use different ordering from everyone else"},"content":{"rendered":"<p><strong>Update<\/strong><br \/>\nA <a href=\"https:\/\/twitter.com\/walkingrandomly\/status\/984336870485176321\">discussion on Twitter<\/a> determined that this is down to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Locale_(computer_software)\">locales<\/a>: R sorts strings using the collation rules of the current locale, whereas the other languages below compare character codes. The practical upshot is that we can make R act the same way as the others by running<\/p>\n<pre><code>Sys.setlocale(\"LC_COLLATE\", \"C\")<\/code><\/pre>\n<p>which may or may not be what you should do!<\/p>\n<p><strong>Original post<\/strong><\/p>\n<p>While working on a project that involves using multiple languages, I noticed some tests failing in one language but not the other. Further investigation revealed that this was essentially because R's default sort order for strings is different from everyone else's.<\/p>\n<p>I have no idea how to tell R 'Use the sort order that everyone else is using'. 
Suggestions welcomed.<\/p>\n<p><strong>R 3.3.2<\/strong><\/p>\n<pre><code>sort(c(\"#b\",\"-b\",\"-a\",\"#a\",\"a\",\"b\"))\r\n\r\n[1] \"-a\" \"-b\" \"#a\" \"#b\" \"a\" \"b\"\r\n<\/code><\/pre>\n<p><strong>Python 3.6<\/strong><\/p>\n<pre><code>sorted({\"#b\",\"-b\",\"-a\",\"#a\",\"a\",\"b\"})\r\n\r\n['#a', '#b', '-a', '-b', 'a', 'b']\r\n<\/code><\/pre>\n<p><strong>MATLAB 2018a<\/strong><\/p>\n<pre><code>sort([{'#b'},{'-b'},{'-a'},{'#a'},{'a'},{'b'}])\r\n\r\nans =\r\n1\u00d76 cell array\r\n{'#a'} {'#b'} {'-a'} {'-b'} {'a'} {'b'}\r\n<\/code><\/pre>\n<p><strong>C++<\/strong><\/p>\n<pre><code>#include &lt;algorithm&gt;\r\n#include &lt;iostream&gt;\r\n#include &lt;string&gt;\r\n#include &lt;vector&gt;\r\n\r\nint main(){ \r\n\r\nstd::string mystrs[] = {\"#b\",\"-b\",\"-a\",\"#a\",\"a\",\"b\"}; \r\nstd::vector&lt;std::string&gt; stringarray(mystrs,mystrs+6);\r\nstd::vector&lt;std::string&gt;::iterator it; \r\n\r\n\/\/ std::sort compares std::string with operator&lt;, i.e. by char value, independent of the locale\r\nstd::sort(stringarray.begin(),stringarray.end());\r\n\r\nfor(it=stringarray.begin(); it!=stringarray.end();++it) {\r\n   std::cout &lt;&lt; *it &lt;&lt; \" \"; \r\n} \r\n\r\nreturn 0;\r\n} \r\n<\/code><\/pre>\n<p>Result:<\/p>\n<pre><code>#a #b -a -b a b\r\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Update A discussion on twitter determined that this was an issue with Locales. The practical upshot is that we can make R act the same way as the others by doing Sys.setlocale(&#8220;LC_COLLATE&#8221;, &#8220;C&#8221;) which may or may not be what you should do! 
Original post While working on a project that involves using multiple languages, [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[65,11,7,31,36],"tags":[],"class_list":["post-6453","post","type-post","status-publish","format-standard","hentry","category-cc","category-matlab","category-programming","category-python","category-r"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-1G5","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/6453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6453"}],"version-history":[{"count":11,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/6453\/revisions"}],"predecessor-version":[{"id":6464,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/6453\/revisions\/6464"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=645
3"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}