Chugalug Linux Users Group- Big Memory
CHUGALUG: Chattanooga Unix Gnu and Linux User Group
From: Eric Wolf ------------------------------------------------------ I'm currently trying to work with a really big data file (473GB) with some Python code. I'm building an index in RAM in Python with a set. Currently, I am running out of RAM (and VM) on my system with 8GB of RAM and 12GB of VM. I have two options: rewrite the code so it's slower but fits in my available memory or push it out somewhere where I can have the RAM to do the job. The "slower" bit may end up being a deal breaker because I anticipate the jobs to take a couple days even working straight from RAM. "Slower" might mean weeks or months. So I have time to explore finding someplace else to run this. So what I need is a platform that provides a reasonably current Python installation, 512GB of RAM and 2-3TB of disk. Looking on NewEgg, the biggest system I can build is a 256GB RAM box starting around $6K. I could build a system with 128GB of RAM and use a 512GB SSD for VM starting around $5K. The money isn't a deal breaker but it still doesn't guarantee I can achieve what I need - hours or days instead of weeks or months. The largest EC2 instance Amazon offers has only 68GB of RAM. I'll probably try that next just because it's a cheaper way to get out of my 8GB physical limitation. Cloud is more appealing because I really don't want to have to waste a day or two building a box (in addition to the purchasing headaches). And I may not need the system after this project. Are there any other options out there for large memory cloud systems? -Eric -=--=---=----=----=---=--=-=--=---=----=---=--=-=- Eric B. Wolf 720-334-7734
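A rough way to size the problem before renting or buying hardware is to measure a sample of keys in a set and extrapolate. The sketch below is only an illustration: the thread does not say what the keys look like, so the `record-...` strings and the one-billion-key count are made-up assumptions, and `estimate_set_bytes` is a hypothetical helper, not part of Eric's code.

```python
import sys

def estimate_set_bytes(sample_keys, total_keys):
    """Extrapolate the RAM a set of total_keys entries might need from a sample."""
    sample = set(sample_keys)
    if not sample:
        raise ValueError("sample_keys must not be empty")
    # Size of the set's hash table plus the size of each key object it holds.
    sample_bytes = sys.getsizeof(sample) + sum(sys.getsizeof(k) for k in sample)
    per_key = sample_bytes / len(sample)
    return int(per_key * total_keys)

if __name__ == "__main__":
    # Hypothetical key format and key count, chosen only for illustration.
    sample = [f"record-{i:012d}" for i in range(100_000)]
    estimate = estimate_set_bytes(sample, 1_000_000_000)
    print(f"~{estimate / 2**30:.1f} GiB for one billion keys")
```

The result is only a ballpark figure, since the set's hash table grows in steps and the real keys may be larger or smaller than the sample.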

=============================================================== From: wes ------------------------------------------------------ RAM drives striped into a RAID array across multiple EC2 instances connected by their internal network, if you launch all the instances in the same AZ. have fun with that! -wes

=============================================================== From: Eric Wolf ------------------------------------------------------ Wes, That's a pretty good option. The advantage is that I can get everything running under one instance with just the 68GB of RAM and play around with your idea in another instance. Damn I love the cloud! I wonder how hard it is to set up an EC2 instance as a RAM drive... -Eric -=--=---=----=----=---=--=-=--=---=----=---=--=-=- Eric B. Wolf 720-334-7734

=============================================================== From: Aaron welch ------------------------------------------------------ Can you capture your I/O information while you are building the index? I have built some large 500+GB shared-RAM ccNUMA installations and might be able to find a way to use fast disk to resolve some of your issues. -AW

=============================================================== From: wes ------------------------------------------------------ you won't be able to get it to boot from RAM, but you can create a RAM drive any time you like. HOWTOs abound on the interweb. the backplane within an AZ is gigabit, so you should be able to transfer data across the network much faster than you would to the local HD (forget the hell out of EBS), assuming not every other customer is doing so at the same time ;) the only real challenge I can foresee is how to effectively connect a RAM drive on one instance to the filesystem on another. SSHFS perhaps? encryption overhead could be a killer. -wes

=============================================================== From: Ryan Bales ------------------------------------------------------ You don't need big memory if you're able to distribute the load with something like MapReduce. I know GAE supports MapReduce, and I'm sure there are others. GAE also supports WSGI, so you're good to go with Python. ~Ryan Bales
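The partitioning idea behind MapReduce can be tried on a single machine, without GAE or any framework. The sketch below is not GAE's MapReduce API. It assumes the index only needs to collect unique, single-line string keys, which the thread does not confirm. The names `partition_keys` and `count_unique` and the bucket file layout are invented for illustration.

```python
import os
import tempfile
import zlib

def partition_keys(keys, out_dir, num_buckets=16):
    """'Map' step: write each key to a bucket file chosen by a stable hash."""
    paths = [os.path.join(out_dir, f"bucket-{i:04d}.txt") for i in range(num_buckets)]
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for key in keys:
            # crc32 is stable across runs, unlike the built-in hash() for str.
            bucket = zlib.crc32(key.encode("utf-8")) % num_buckets
            files[bucket].write(key + "\n")
    finally:
        for f in files:
            f.close()
    return paths

def count_unique(paths):
    """'Reduce' step: deduplicate one bucket at a time to bound RAM use."""
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Equal keys always land in the same bucket, so counts can be summed.
            total += len({line.rstrip("\n") for line in f})
    return total

if __name__ == "__main__":
    keys = ["a", "b", "a", "c", "b", "d"]
    with tempfile.TemporaryDirectory() as tmp:
        print(count_unique(partition_keys(keys, tmp, num_buckets=4)))
```

Only one bucket's set is in memory at a time, so with enough buckets each piece fits in the 8GB box. The price is an extra pass over the data on disk.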

=============================================================== From: Eric Wolf ------------------------------------------------------ Like I said, I'm being lazy with the code. Map-Reducing the problem is not lazy. -Eric -=--=---=----=----=---=--=-=--=---=----=---=--=-=- Eric B. Wolf 720-334-7734

=============================================================== From: Ryan Bales ------------------------------------------------------ Ah, I must've missed that :P ~Ryan Bales

=============================================================== From: Aaron welch ------------------------------------------------------ Hive running on a Cassandra ring would be easier. That gives you an SQL interface over a distributed node cluster with linear performance gains from adding new hosts. http://www.datastax.com/products/brisk -AW

=============================================================== From: Chad Smith ------------------------------------------------------ The more I read the more amazed I get... HALF A TERABYTE OF RAM!!!! it's like "1.21 JiggaWatts!!!" (I know it's Gigawatts, but that's not what the man said.) - Chad W Smith "I like a man who's middle name is W." - President George W. Bush - February 10, 2003 bit.ly/gwb-dubya

=============================================================== From: Eric Wolf ------------------------------------------------------ I think I'm going to take the next logical/lazy step and write the index to SQLite and let the library do the dirty work for me. I'm spending too much time thinking about this. And yeah, a half TB of RAM seems ridiculous but it's surprisingly doable. You can build a 1/4 TB RAM machine with parts from NewEgg for under $7K. Figure you guys have been talking about building systems with 1000s of processors for Bitcoin mining. Makes sense that RAM would work proportionally as well. We need a "NewEgg Index": What is the phattest machine that can be built from parts in stock at NewEgg? CPU: How many cores? What speed? RAM: TBs? Disk: PBs? GPU: 10K? The motherboard I was looking at could support 48 CPU cores, 256GB RAM but the rest gets harder because you wouldn't put too many drives in a single cabinet (just use NAS) and to get the GPU count up, you are using bus extenders... Thanks for the input... -Eric -=--=---=----=----=---=--=-=--=---=----=---=--=-=- Eric B. Wolf 720-334-7734
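A minimal sketch of what "write the index to SQLite" could look like, using Python's standard `sqlite3` module. The table name `seen`, the helper names, and the assumption that the index is a set of unique text keys are all illustrative guesses, not Eric's actual schema or code.

```python
import sqlite3

def open_index(path):
    """Open (or create) an on-disk index of unique keys."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY builds a B-tree that enforces uniqueness, much like a set.
    conn.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY)")
    return conn

def add_keys(conn, keys, batch_size=10_000):
    """Insert keys in batches; duplicates are silently skipped."""
    batch = []
    for key in keys:
        batch.append((key,))
        if len(batch) >= batch_size:
            conn.executemany("INSERT OR IGNORE INTO seen (key) VALUES (?)", batch)
            conn.commit()
            batch.clear()
    if batch:
        conn.executemany("INSERT OR IGNORE INTO seen (key) VALUES (?)", batch)
        conn.commit()

def contains(conn, key):
    """Membership test, the on-disk equivalent of `key in some_set`."""
    row = conn.execute("SELECT 1 FROM seen WHERE key = ?", (key,)).fetchone()
    return row is not None

if __name__ == "__main__":
    conn = open_index(":memory:")  # use a file path for a real on-disk index
    add_keys(conn, ["a", "b", "a"])
    print(contains(conn, "a"), contains(conn, "z"))
    conn.close()
```

Batching the inserts into larger transactions matters here, because committing after every row is usually far slower than committing once per batch.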